Subject: Re: mechanised documentation and my business model solution
From: Rich Morin <rdm@cfcl.com>
Date: Mon, 27 Mar 2006 10:43:55 -0800

At 9:30 AM -0800 3/27/06, Brian Behlendorf wrote:
> There's at least two wiki tools backed by SVN:

Thanks for the pointers!


> I'm a strong believer, though, that content specific to code should
> be built and versioned next to the code. ...

What he said.


> I suggest the answer lies less in using some alternative tool just
> for documentation (severing ties between docs and code) and instead
> adjusting the developer tools ...

There are some things that are difficult to do "on the fly" as you are
reading a file.  For example, let's say that the user is looking at a
function and wants to find all of the other places where it is used.
Unless s/he is very patient, this usage information needs to be indexed
in advance.
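A crude sketch of the idea (all file names and contents below are made up,
and a real indexer would use a parser rather than a regular expression):
scan each source file once, recording every site where a name is invoked,
so that "who uses foo?" becomes a lookup instead of a fresh search.

```python
import re
from collections import defaultdict

# Toy sources standing in for a real source tree (hypothetical content).
sources = {
    "a.c": "int main() { init(); run(); }",
    "b.c": "void helper() { run(); }",
    "c.c": "void other() { init(); }",
}

# Naive "call site" pattern: a word followed by an open paren.
# (Definitions match too; a parser would distinguish them.)
CALL = re.compile(r"\b(\w+)\s*\(")

def build_index(files):
    """Map each invoked name to the set of files that mention it."""
    index = defaultdict(set)
    for path, text in files.items():
        for match in CALL.finditer(text):
            index[match.group(1)].add(path)
    return index

index = build_index(sources)
print(sorted(index["run"]))   # → ['a.c', 'b.c']
```

Once the index exists, "find all the other places where this function is
used" is a single lookup, no matter how large the tree is.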

Making matters worse, the user might desire more complex queries.  For
example, I might want to find every case where two functions are used,
but a third one is not.  Doing a broad set of searches and sifting
through the results by hand might work, but it is not the right Way To
Do It.

In general, we can't know what information a programmer (or user, or
operator...) might need, let alone the ways in which the information
might be constrained, combined, etc.  So, we end up indexing everything
that might be of interest and using a query language (SQL, perhaps, but
suitably hidden from view) to extract particular results.  Although this
could be glued onto a version management system (preferably by some
loose coupling), it still sounds a lot like mechanized documentation.
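To make the "two functions, but not a third" query concrete, here is a
sketch using SQLite.  The schema -- a single uses(file, func) table --
and the data are my invention, not a proposal; the point is only that
once the usage information is indexed, the awkward search becomes one
query:

```python
import sqlite3

# Hypothetical usage index: one row per (file, function-used) pair.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE uses (file TEXT, func TEXT)")
db.executemany("INSERT INTO uses VALUES (?, ?)", [
    ("a.c", "open"), ("a.c", "read"), ("a.c", "close"),
    ("b.c", "open"), ("b.c", "read"),
    ("c.c", "open"),
])

# Files that use both open() and read(), but not close().
rows = db.execute("""
    SELECT DISTINCT u1.file
    FROM uses u1
    JOIN uses u2 ON u2.file = u1.file AND u2.func = 'read'
    WHERE u1.func = 'open'
      AND u1.file NOT IN (SELECT file FROM uses WHERE func = 'close')
""").fetchall()
print(rows)   # → [('b.c',)]
```

The SQL itself would, of course, be hidden behind some friendlier
front end.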


> I also feel that proper, searchable archives of mailing lists often ...

Agreed.  Several years ago, a friend suggested a cooperative system that
would store and retrieve the meaning of particular error messages, hints
about context and debugging strategies, etc.  I thought it sounded like
a cool idea, but nothing came of it.  Later, using Google to look up
error messages, I was struck by the way his wish had been granted.

However, the combination of Google and email archives is not a substitute
for well-structured documentation.  IMHO, some of the greatest benefits
of the web lie in the fact that search allows us to find structured
documents that someone has taken the time and trouble to create.  And, if
we can convince machines to do part of the work, why not do so?


> The reason we have Google rather than Xanadu is that creating
> structured data is much harder (per value received) than sorting and
> filtering unstructured data.

Structuring the entire content of the web is an unrealistic goal.  Clay
Shirky creates and demolishes this straw man in his essay:

  The Semantic Web, Syllogism, and Worldview
  http://www.shirky.com/writings/semantic_syllogism.html

However, he also notes that particular piles of data can and will be
tied together in useful ways.  Basically, the notion is that starting
from a single source of structured (or structurable) information is
much easier than trying to add structure to random sources.

In the case of software systems, we are starting with highly structured
and readily available information.  This allows Doxygen to produce useful
results from C/C++ code, even when the programmers haven't added any
comments at all.  I submit that a more comprehensive system could produce
even better results, because of network effects.
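As a toy illustration of why this works -- the structure is already in
the code -- a few lines of pattern matching can recover an index of
function signatures from completely uncommented C.  (Doxygen does this
with a real parser; the C sample and the pattern below are simplified
inventions of mine.)

```python
import re

# Uncommented C source, with no documentation at all (made-up example).
source = """
int add(int a, int b) { return a + b; }
static void log_msg(const char *msg) { /* ... */ }
double scale(double x) { return 2.0 * x; }
"""

# Simplistic signature pattern: [static] return-type name(args)
SIG = re.compile(r"^\s*(?:static\s+)?(\w+)\s+(\w+)\s*\(([^)]*)\)",
                 re.MULTILINE)

funcs = SIG.findall(source)
for ret, name, args in funcs:
    print(f"{name}({args}) -> {ret}")
```

Even this crude extraction yields a browsable index of every function,
its arguments, and its return type; a comprehensive system would have
far more to work with.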

-r
-- 
http://www.cfcl.com/rdm            Rich Morin
http://www.cfcl.com/rdm/resume     rdm@cfcl.com
http://www.cfcl.com/rdm/weblog     +1 650-873-7841

Technical editing and writing, programming, and web development