Subject: RE: business case for mechanized documentation
From: Rich Morin <rdm@cfcl.com>
Date: Thu, 13 Apr 2006 14:43:40 -0800

At 2:21 PM -0600 4/13/06, Anderson, Kelly wrote:
> ... the user -vs- programmer audience issue is the largest gulf.

True.  My present focus is on administrators, programmers, and the
occasional user who wants to investigate some underlying issue.  I
would like to build on this, eventually, to answer user questions
such as "where did this file come from?", but that's a longer-term goal.


> man worked initially because most everything in unix was command
> line, and geared towards programmers. ...

Agreed.  Let's set aside the general class of users for now.


> I think we can agree that ... the audience is critical.

The man pages work (to the extent that they do) because their users
are willing to RTFM (and any supplementary documents) until they have
a grasp of the situation.  So, a man page may not be very forthcoming
with background and context information, but the dedicated reader
can still "get there" with enough effort.  Not a perfect solution,
but one which matches the needs and resources of most developers.


> Ok. That's a great goal, how do you propose to accomplish it?

I have several essays online that cover some of my ideas on this
topic.  See http://www.cfcl.com/rdm/MBD if you are interested.


> Entities and relationships are a great modelling tool.  Deriving
> these automatically from source code, however, seems a very AI
> intensive task.  How do you propose to move your research away
> from AI and towards something concrete?

"AI intensive" is an understatement for the problem of automatic
harvesting of relationships (though some data mining packages are
able to approach this in some situations).  However, that's not my
goal for the moment.  Rather, I hope to define the ontology by
hand, then use mechanical means to populate it with instances.


This (simplistic) table shows the general approach:

              Entity                Relationship

  Class       Ent. Class (EC)       Rel. Class (RC)

  Instance    Ent. Instance (EI)    Rel. Instance (RI)

Here are some representative examples:

  EC:  file node, control file, program

  RC:  reads, writes

  EI:  /etc/passwd, /bin/passwd

  RI:  /bin/passwd reads and writes /etc/passwd

If the documentation system "knows" that a program may read (and
sometimes write) a control file, it can accept the assertion that
the /bin/passwd program (or really, a process embodying it) can do
both to /etc/passwd.
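To make the EC/RC/EI/RI distinction concrete, here is a minimal Python sketch of a hand-built ontology that accepts a relationship instance only when its subject and object belong to the entity classes its relationship class expects.  The dictionaries, the `accepts` function, and the (program, control file) signatures for "reads" and "writes" are illustrative assumptions, not part of any existing system.

```python
# Hypothetical hand-written ontology (assumed for illustration).

# Entity classes (EC).
ENTITY_CLASSES = {"file node", "control file", "program"}

# Relationship classes (RC): name -> (subject class, object class).
RELATIONSHIP_CLASSES = {
    "reads": ("program", "control file"),
    "writes": ("program", "control file"),
}

# Entity instances (EI), each tagged with one entity class.
ENTITY_INSTANCES = {
    "/etc/passwd": "control file",
    "/bin/passwd": "program",
}

# Sanity check: every class mentioned above is a declared entity class.
assert all(c in ENTITY_CLASSES for c in ENTITY_INSTANCES.values())
assert all(s in ENTITY_CLASSES and o in ENTITY_CLASSES
           for s, o in RELATIONSHIP_CLASSES.values())


def accepts(subject: str, relation: str, obj: str) -> bool:
    """Return True if (subject, relation, obj) fits its relationship class."""
    if relation not in RELATIONSHIP_CLASSES:
        return False
    subj_class, obj_class = RELATIONSHIP_CLASSES[relation]
    return (ENTITY_INSTANCES.get(subject) == subj_class
            and ENTITY_INSTANCES.get(obj) == obj_class)


# Relationship instances (RI) are kept only if they type-check.
candidates = [
    ("/bin/passwd", "reads", "/etc/passwd"),
    ("/bin/passwd", "writes", "/etc/passwd"),
    ("/etc/passwd", "reads", "/bin/passwd"),  # wrong direction: rejected
]
relationship_instances = [ri for ri in candidates if accepts(*ri)]

print(relationship_instances)
# [('/bin/passwd', 'reads', '/etc/passwd'), ('/bin/passwd', 'writes', '/etc/passwd')]
```

The point of the design is that the classes are written once by hand, while the instances can be added in bulk and checked against them automatically.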

Although I've already done this sort of thing using imperative
languages, the results weren't as scalable as I wanted.  However,
knowledge representation and reasoning systems (e.g., OpenCyc,
OpenLoom) are quite capable of storing this sort of information,
answering queries, etc.  I'm currently playing with OpenLoom, to
bring myself up to speed on it and evaluate its suitability.
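As a rough illustration of what storing this information and answering queries involves, the sketch below keeps facts as (subject, relation, object) triples and answers pattern queries in which None acts as a wildcard.  It is plain Python with hypothetical names (`TripleStore`, `tell`, `ask`); it is not the OpenLoom or OpenCyc API, which offer far richer representation and reasoning.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Minimal in-memory store of (subject, relation, object) facts."""

    def __init__(self) -> None:
        self.facts: set[Triple] = set()

    def tell(self, subject: str, relation: str, obj: str) -> None:
        """Record a fact; duplicates are ignored because facts is a set."""
        self.facts.add((subject, relation, obj))

    def ask(self, subject: Optional[str] = None,
            relation: Optional[str] = None,
            obj: Optional[str] = None) -> list[Triple]:
        """Return matching facts in sorted order; None matches anything."""
        return sorted(
            f for f in self.facts
            if (subject is None or f[0] == subject)
            and (relation is None or f[1] == relation)
            and (obj is None or f[2] == obj)
        )


kb = TripleStore()
kb.tell("/bin/passwd", "reads", "/etc/passwd")
kb.tell("/bin/passwd", "writes", "/etc/passwd")

# "Which programs write /etc/passwd?"
writers = [s for s, _, _ in kb.ask(relation="writes", obj="/etc/passwd")]
print(writers)  # ['/bin/passwd']
```

A real KR system would also let the ontology constrain what may be told and infer facts that were never stated explicitly; this sketch only does lookup.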


The major challenges, from my perspective, lie in

  *  creating the ontology (including relationships)

  *  harvesting the instance information from Unix,
     folding in expert knowledge, etc.

  *  using the knowledge to ease navigation, provide
     context, etc.
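For the harvesting step, some instance information can be gathered mechanically.  The sketch below assumes a deliberately naive heuristic: any regular file with an execute bit set in a given directory is recorded as an instance of the "program" entity class.  The function name `harvest_programs` is hypothetical, and real harvesting would also need to fold in expert knowledge about what each program reads and writes.

```python
import os
import stat

# Any of the owner, group, or other execute bits.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def harvest_programs(directory: str) -> dict[str, str]:
    """Return {path: "program"} for executable regular files in directory.

    Heuristic (an assumption): a regular file with any execute bit set
    is treated as an instance of the "program" entity class.  Symbolic
    links are skipped because os.lstat does not follow them.
    """
    instances = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        st = os.lstat(path)
        if stat.S_ISREG(st.st_mode) and st.st_mode & EXEC_BITS:
            instances[path] = "program"
    return instances
```

On a Unix system, `harvest_programs("/bin")` would return a dictionary mapping each executable regular file found there to "program"; those entries could then be fed into an ontology like the one sketched earlier.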

So, although there is some AI component to what I'm doing, it's
not as if I need to do any cutting-edge work there.  Because the
topic area (Unix) is relatively well defined, creating a (naive)
ontology is much easier than in many other domains.  My current
work on this (barely started, at present) is located at

  http://www.cfcl.com/rdm/mediawiki/index.php/AC_Index


> (Note: My definition of Artificial Intelligence is "The set of
> computational problems that have not yet been resolved well
> enough to earn their own label."

and thus be classified as "simple applications programming" (:-)

-r
-- 
http://www.cfcl.com/rdm            Rich Morin
http://www.cfcl.com/rdm/resume     rdm@cfcl.com
http://www.cfcl.com/rdm/weblog     +1 650-873-7841

Technical editing and writing, programming, and web development