Subject: Re: Heading back towards the mailing list subject
From: Rich Morin <>
Date: Mon, 27 Mar 2006 11:15:26 -0800

At 9:55 AM -0800 3/27/06, DV Henkel-Wallace wrote:
> Businesses that need to follow ISO 9000 ... need documentation
> tracking systems.

Yep.  So do the companies that are scrambling to meet Sarbanes-
Oxley requirements, to say nothing of defending against lawsuits,
dealing with Homeland Security "requests", etc.

> Typically these systems are outrageously priced...

That is a judgement call, but I'm quite certain that they are not
inexpensive.  There are several reasons for this, including legal
liability and the complexity of the problem space.

The hardest problem, however, is that the input material isn't
well structured.  Sure, you can track the documents, but how do
you track the concepts inside them?  Unlike computer source code,
the tokens in English text are context-dependent, ill-defined, etc.:

  "The question is", said Alice, "whether you can make words
  mean so many different things."

  "The question is," said Humpty Dumpty, "which is to be master
  - that's all."

    Lewis Carroll's "Through the Looking-Glass"

Nonetheless, there are some Open Source technologies that might be
relevant to this sort of work.

  Unstructured Information Management Architecture (UIMA) is an
  open, industrial-strength, scalable, and extensible platform
  for creating, integrating, and deploying unstructured information
  management solutions from combinations of semantic analysis and
  search components. (edited slightly)

  As today's information explosion generates greater and greater
  volumes of raw data, the challenge of storing and retrieving this
  information in the most efficient manner continues to grow, whether
  the data is stored on a local disk or distributed over the
  World-Wide Web.

  Managing Gigabytes helps you to meet this challenge by showing how
  to capitalize on new methods of compressing and accessing data,
  enabling you to store information more efficiently and locate
  specific items more quickly and cost-effectively than ever before.
  It uniquely covers fully-tested techniques for both text and image
  compression and shows how to construct a tailor-made electronic
  index for accessing text, scanned documents, and images.

  The Pentaho BI Project provides enterprise-class reporting,
  analysis, dashboard, data mining and workflow capabilities that
  help organizations operate more efficiently and effectively.  The
  software offers flexible deployment options that enable use as
  embeddable components, customized BI application solutions, and
  as a complete out-of-the-box, integrated BI platform.

I am compiling a list of Open Source tools that seem interesting
for use in Model-based Documentation.  Suggestions are welcome:

However, none of these tools is more than infrastructure for the
kinds of information-management tasks that modern corporations are
facing.  My personal take is that, if Open Source is going to be a
player here, it will have to get there by pulling together the efforts
of corporations that have the need and resources to make significant
contributions.  This isn't going to be done by a guy in a garage...

--            Rich Morin     +1 650-873-7841

Technical editing and writing, programming, and web development