Subject: Re: FSBs and mechanized documentation
From: Rich Morin <rdm@cfcl.com>
Date: Sat, 11 Mar 2006 13:42:48 -0800

At 12:01 PM -0500 3/11/06, simo wrote:
> On Sat, 2006-03-11 at 00:56 -0800, Rich Morin wrote:
> The problem with harvesting mechanically bug databases or
> even source codes is quality.  How do you choose gold
> pieces and scrap waste?  A mechanized system is not able
> to distinguish between an insightful bug report and crap.
>
> If you end up with too much waste in the result people
> will not use it.

True, though we tolerate quite a bit of noise in, say, the
results that Google provides.  If the first page has two
hits, six plausible matches, and 40 misses, I'm happy.  If
you're looking for all of the bug reports that involve the
foo() and bar() functions, and the doc system can report
them in a second, it can save you a lot of time and effort.
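
A minimal sketch of that kind of lookup, assuming an inverted
index from function name to bug report.  The report IDs, the
report text, and the build_index() / reports_mentioning() names
are all invented for illustration; they don't come from any
existing system.

```python
import re
from collections import defaultdict

# Hypothetical bug reports; IDs and text are made up for illustration.
REPORTS = {
    101: "Crash when foo() is called after bar() returns NULL",
    102: "bar() leaks memory on error path",
    103: "foo() and baz() disagree about buffer size",
    104: "Deadlock: bar() holds lock while foo() waits",
}

# Matches identifiers written as "name()" in free text.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\(\)")

def build_index(reports):
    """Map each mentioned function name to the set of report IDs."""
    index = defaultdict(set)
    for rid, text in reports.items():
        for name in CALL_RE.findall(text):
            index[name].add(rid)
    return index

def reports_mentioning(index, *names):
    """Return sorted IDs of reports that mention every given name."""
    sets = [index.get(n, set()) for n in names]
    return sorted(set.intersection(*sets)) if sets else []

INDEX = build_index(REPORTS)
print(reports_mentioning(INDEX, "foo", "bar"))  # -> [101, 104]
```

The index is built once, so each query is just a set
intersection.  The regex will pick up some junk along with the
real function names, which is the sort of noise I'm saying we
can live with.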

My focus on entities and relationships, however, is not an
accident.  If a human has declared which types of entities
can have which types of relationships, and the instances of
those relationships are then filled in by humans or programs,
the results are likely to be highly reliable and very
specific.
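
Here is a rough sketch of what I mean, assuming a schema of
(entity type, relationship, entity type) triples.  The Store
class, the type names, and the sample entities are hypothetical,
chosen only to illustrate the idea.

```python
# Hypothetical schema: a human declares which entity types may be
# linked by which relationship types.
SCHEMA = {
    ("function", "calls", "function"),
    ("function", "defined_in", "file"),
    ("bug_report", "mentions", "function"),
}

class Store:
    def __init__(self, schema):
        self.schema = schema
        self.entities = {}      # name -> entity type
        self.relations = set()  # (subject, relationship, object)

    def add_entity(self, name, etype):
        self.entities[name] = etype

    def relate(self, subj, rel, obj):
        """Record an instance, rejecting anything the schema forbids."""
        key = (self.entities[subj], rel, self.entities[obj])
        if key not in self.schema:
            raise ValueError(f"schema does not allow {key}")
        self.relations.add((subj, rel, obj))

store = Store(SCHEMA)
store.add_entity("foo", "function")
store.add_entity("bar", "function")
store.add_entity("foo.c", "file")
store.relate("foo", "calls", "bar")         # e.g. found by a parser
store.relate("foo", "defined_in", "foo.c")  # e.g. entered by a human
```

Anything that doesn't fit the declared schema, such as a file
"calling" a function, is rejected when it is entered, whether
a person or a program supplied it.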

Of course, most of the relationships will be irrelevant to
the question at hand, even if they concern an item that the
user has specified.  A function may make dozens of function
calls and be called by hundreds of other functions.  It uses
specific data structures, is mentioned in certain documents,
and resides in a file which uses certain include files, etc.

Giving the user a way to winnow out "noise" relationships
isn't trivial, but there are some useful principles that can
be applied.  For example, we know that humans are much better
at pattern recognition and selection than they are at
specifying and remembering details.  So, we let the machine
present possible matches (and match criteria).  The human can
then navigate to the ones that seem most promising.
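
One way this might look, as a sketch only: the machine
summarizes the relationships that touch an item, grouped by
match criterion, and the human picks a group to expand.  The
sample data and the summarize() / select() names are invented
for illustration.

```python
from collections import Counter

# Hypothetical relationships touching one item, "foo".
RELATIONS = [
    ("foo", "calls", "bar"),
    ("foo", "calls", "baz"),
    ("main", "calls", "foo"),
    ("foo", "defined_in", "foo.c"),
    ("bug_101", "mentions", "foo"),
    ("bug_104", "mentions", "foo"),
]

def summarize(relations, item):
    """Count relationships touching item, by (relationship, direction)."""
    counts = Counter()
    for subj, rel, obj in relations:
        if subj == item:
            counts[(rel, "out")] += 1
        elif obj == item:
            counts[(rel, "in")] += 1
    return counts

def select(relations, item, rel, direction):
    """Expand one group: list the entities at the other end."""
    if direction == "out":
        return [o for s, r, o in relations if s == item and r == rel]
    return [s for s, r, o in relations if o == item and r == rel]

SUMMARY = summarize(RELATIONS, "foo")
for (rel, direction), n in sorted(SUMMARY.items()):
    print(f"{rel:12} {direction:4} {n}")

# The user recognizes "mentions / in" as the promising group.
print(select(RELATIONS, "foo", "mentions", "in"))
```

The user never has to remember a relationship name or type a
query; the summary lists the available criteria, and picking
one narrows the view.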

Again, I seem to have drifted off into design questions, but
I hope that this convinces you that the object isn't just to
collect and present random data.

-r
-- 
http://www.cfcl.com/rdm            Rich Morin
http://www.cfcl.com/rdm/resume     rdm@cfcl.com
http://www.cfcl.com/rdm/weblog     +1 650-873-7841

Technical editing and writing, programming, and web development