Subject: Re: Mechanizing community information resources
From: "Forrest J. Cavalier III" <>
Date: Mon, 23 Sep 2002 12:34:24 -0400 (EDT)

Karsten (and others) please take a look at
for some ideas.

"Mechanizing information resources," as you call it,
is a problem I have thought a lot about (both on the
organizational side and the value/funding side).  I
have been integrating items into the "Programmers
Webliography" at RocketAware for several years.  There
are now 56,000 items in the database.

Dale Dougherty at O'Reilly was interested in the project
at the beginning. He and I met at LinuxWorld NYC about it
several years ago. I even caught some interest from Jon Erickson
(the editor at DDJ).  But in both cases it wasn't clear then
(or now) how it fit with their market, or even how it should
progress to create the most value, and for which community.

------------------------------------ gets about 1500 unique visitors a day,
but nearly all of them come from search engines. This
matches with the pattern of "path of least effort" that
people typically use to find information.  (They will
look on their own bookshelf before they ask a colleague,
for example.  And they will want to search a good general
search engine before going to specialty sites.)

Unless a significant percentage of visitors arrive
from a bookmark, or customers want to pay for a syndicated
subset, selling subscriptions is probably not viable.

Making an advertising play has become very iffy over the
last two years.  (Although back when I was talking to Dale
Dougherty, he claimed that ad space on sites
was selling for upwards of $40 CPM, which I found hard to
believe even then.)

To be honest, until there is a solid funding mechanism
I am kind of hoping that RocketAware doesn't get any more
successful.  It is hosted at, and fits in their
space and bandwidth limits.  If it grows much more I'd
have to throw real money and time at it to handle the

Since most of the visitors are basically coming from Google,
and the site already has a pretty high Google PageRank, I
expect the traffic to continue to increase steadily, not
as a big jump.  So I think I am safe to work on the spin-off

For the design of projects like this, two books I can
recommend are "Introduction to Cataloging and Classification"
by Wynar and Taylor, and "Developing Library and Information
Center Collections" by G. Edward Evans.  (Both are library
science texts I first borrowed from a relative who has an
MS in library science.)

There is a serious (and hard-to-solve) problem you won't
see until you get to about 20,000 entries.  As you add content,
your relevance to a community increases, but your relevance
to an individual decreases.

The first choice is to organize by subject and split categories
when they reach 100 to 200 entries.  That works until
you get to about 20,000 entries, at which point you will have
some subjects with 300 or so entries and no obvious way to
sub-categorize by subject.
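
To make the split rule concrete, here is a minimal sketch in Python.
It is only an illustration, not RocketAware's actual code: the function
name, the threshold of 200, and the sample subjects are all made up.
It counts entries per subject and reports the subjects that have
outgrown the threshold and are due for a split.

```python
from collections import Counter

# Assumed upper bound; the rule of thumb above is "100 to 200 entries".
SPLIT_THRESHOLD = 200


def subjects_needing_split(item_subjects, threshold=SPLIT_THRESHOLD):
    """Return (subject, count) pairs over the threshold, largest first."""
    counts = Counter(item_subjects)
    oversized = [(s, n) for s, n in counts.items() if n > threshold]
    return sorted(oversized, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: one subject label per catalogued item.
items = ["sockets"] * 300 + ["regex"] * 120 + ["threads"] * 210
print(subjects_needing_split(items))
# prints [('sockets', 300), ('threads', 210)]
```

The check only says which subjects are too big.  Deciding how to
divide a 300-entry subject that has no obvious sub-subjects is still
the hard, manual part.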

You could choose to sub-categorize by some other criteria,
but then you have a mixed hierarchy.  This is what freshmeat
and SourceForge do.  It works to keep the number of items
at a node reasonable, but it ends up forcing some visitors
to read through multiple categories for what they want.

RocketAware takes the approach of having a purely subject-based
classification, and then allowing the visitor to filter on
various items.  (SourceForge and Freshmeat (Trove) allow
filtering also, but it isn't powerful enough.)
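
A rough sketch of the idea in Python follows, under stated assumptions:
each item sits in exactly one subject, and anything that is not
subject matter (language, kind of resource, and so on) is stored as
an attribute that the visitor can filter on.  The `Item` class, the
`browse` function, and the attribute names are hypothetical and are
not taken from RocketAware.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    subject: str  # exactly one node in the subject tree
    attrs: dict = field(default_factory=dict)  # non-subject attributes (assumed)


def browse(items, subject, **filters):
    """Items under one subject whose attributes match every given filter."""
    return [
        it
        for it in items
        if it.subject == subject
        and all(it.attrs.get(key) == value for key, value in filters.items())
    ]


# Hypothetical catalog entries.
catalog = [
    Item("libfoo", "networking/sockets", {"language": "C", "kind": "library"}),
    Item("Sockets tutorial", "networking/sockets", {"language": "C", "kind": "article"}),
    Item("pysock", "networking/sockets", {"language": "Python", "kind": "library"}),
]

for it in browse(catalog, "networking/sockets", language="C", kind="library"):
    print(it.title)
# prints libfoo
```

The hierarchy stays purely subject-based, so a visitor never has to
look in two places for the same subject.  The filters narrow a large
node without adding a second kind of category to the tree.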

More importantly, I do all the classification of items
into the RocketAware subject tree myself.  Letting visitors and
untrained users do it leads to misplaced items (as
SourceForge and, to some extent, Freshmeat show).  Early
on I worked a lot on a classification user interface that
I could use efficiently.  I'd say it works pretty well,
since I can integrate items into a 300-subject tree by hand
at upwards of 100 items per hour, with bursts of 500 per hour
(depending on the source and pre-organization).


These and other details are probably off-topic for fsb, but I'd 
be willing to continue off-list.