Subject: Re: Data on open source business impact
From: "Tim O'Reilly" <tim@oreilly.com>
Date: Wed, 27 Oct 1999 15:58:50 -0700



Ben_Tilly@trepp.com wrote:
> 
> > Well, I'm a big fan of data, and I do think that there is a huge
> > opportunity for us in the open source/free software community to collect
> > useful data that the big market research firms will never think of.
> > (Some of you may have heard me talk about the impact of the netcraft
> > survey on business perception of the importance of Apache, for
> > instance.  This would have been completely overlooked by traditional
> > market researchers.  This is why we need to keep remembering that the
> > "hacker community" of friends and potential allies is considerably
> > larger than people who share any particular slice of this migration.)
> >
> So what data would you suggest collecting?  Here are a few items
> that I would consider relevant:
> 
>  - How many unique names appear on the credits in an average
>    free software distribution?
> 
I believe Paul Jones at Metalab is doing some work on this; I've been
encouraging him to do so, and I think he has some intriguing results. 
Not sure when they'll be published.

>  - How many downloads are there from popular sites X, Y, Z, with
>    (if possible) statistics by project, project type, licence, etc.

Agreed.  This would be great.  One problem is that we don't even have
a consensus ranking (something like the Media Metrix ranking of web
sites) that would allow us to identify the top download sites and start
tracking them.  The number of people who have told me "I have the
largest download site for free software" is significantly larger than
the number of possible contenders for that title :-)

All too often, someone says, "well, you can't trust these numbers
because there are so many other download sites as well."  Still, it
would be nice to do some homework, assemble stats from some of the
largest sites, and get some widely published data and trend graphs.

What sites should we be tracking/getting stats from?  What sites do
members of this list use for software download?  Metalab?  Freshmeat?
The site formerly known as cdrom.com?  CPAN?
your_favorite_technology.com or .org?
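
As a rough sketch of what assembling stats from several download sites
might look like, assuming each site could supply per-project download
counts: the site names, project names, and numbers below are made up
for illustration, and combine_downloads and rank_projects are
hypothetical helpers, not anything an existing site provides.

```python
from collections import defaultdict

# Hypothetical per-site download counts (site -> project -> downloads).
# Every name and number here is invented for illustration only.
site_counts = {
    "site_a": {"apache": 1200, "perl": 800},
    "site_b": {"apache": 300, "perl": 950, "gimp": 400},
}


def combine_downloads(site_counts):
    """Sum downloads per project across every reporting site."""
    totals = defaultdict(int)
    for projects in site_counts.values():
        for project, downloads in projects.items():
            totals[project] += downloads
    return dict(totals)


def rank_projects(totals):
    """Return (project, downloads) pairs, most downloaded first."""
    # Ties are broken alphabetically so the ordering is stable.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


totals = combine_downloads(site_counts)
ranking = rank_projects(totals)
for project, downloads in ranking:
    print(f"{project:10s} {downloads:6d}")
```

Summing across sites only gives a lower bound on real usage, since
mirrors and sites that do not report are not counted.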

> 
>  - How regularly do updates happen to free projects?

Again, I think that Paul's study addresses this.

> 
>  - For projects which use standard bug-tracking software, can we
>    get estimates on numbers of bugs, the turn-around times on bug
>    reports, etc?  Break this out by the severity of the bug.  The
>    second statistic is what the same statistics are for people who
>    had support contracts with companies who are in the business of
>    supporting this software.  (eg LinuxCare and Red Hat.)  If
>    comparative statistics can be drawn (eg through surveys) of the
>    response times from various vendors, the resulting study could
>    be *very* valuable...
> 

I know there are a couple of theses (e.g. at MIT) looking at this; I've
seen a couple, but I don't think they've been published yet.  But I
agree--getting these stats would be a good thing.
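
A minimal sketch of the turn-around statistic Ben describes, broken out
by severity, assuming a bug tracker can export a severity plus opened
and closed dates for each report.  The records below are invented, and
turnaround_by_severity is a hypothetical helper rather than part of any
real bug-tracking package.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical bug records: (severity, date opened, date closed).
# The data is made up purely to exercise the calculation.
bugs = [
    ("critical", date(1999, 10, 1), date(1999, 10, 2)),
    ("critical", date(1999, 10, 5), date(1999, 10, 8)),
    ("minor", date(1999, 9, 1), date(1999, 9, 15)),
    ("minor", date(1999, 9, 10), date(1999, 9, 20)),
    ("minor", date(1999, 9, 12), date(1999, 10, 12)),
]


def turnaround_by_severity(bugs):
    """Return bug count and median days-to-close for each severity."""
    days = defaultdict(list)
    for severity, opened, closed in bugs:
        days[severity].append((closed - opened).days)
    return {
        severity: {"count": len(values), "median_days": median(values)}
        for severity, values in days.items()
    }


stats = turnaround_by_severity(bugs)
for severity, row in sorted(stats.items()):
    print(severity, row["count"], row["median_days"])
```

The same function could be run over reports handled under a support
contract and over reports handled by the project itself, which would
give the side-by-side comparison suggested above.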

>  - In a survey of people who work with various pieces of software.
>    How many bugs are you aware of outstanding in current/your
>    versions of said software?  Specify by type of bug.  (This may
>    be subject to interpretation.  For instance I consider it a serious
>    design bug in JavaScript that the variables are untyped AND
>    the operators are untyped.  That may be the spec but one of
>    those two REALLY SHOULD be typed.  Programmers should
>    not be left wondering why 2+2 is 22!)
> 
> What other statistics am I missing..?

There are all kinds of cool things:

 - traffic on Usenet and mailing lists
 - web mentions and links to prominent sites
 - the effect of Slashdot postings on book sales (Amazon refers to the
   "slashdot effect" -- it's the only site where a single mention
   markedly affects sales)
 - the actual composition of typical Linux distributions (authorship
   of code--e.g. how much of Linux comes from FSF, Berkeley, etc.)

Also, what other market share figures might we be able to track, along
the lines of the Netcraft server survey and the Internet Operating
System Counter?  What kind of "hidden signatures" might tell us about
open source use at big companies, for instance?

(A good example of this latter point:  we sucked down the Amazon
purchase circle data, and were interested to note that Perl appeared in
more corporate purchase circle top-ten lists than any other technology
topic.  The only topic that showed up more often in the entire purchase
circle database was Harry Potter.)

The list could go on and on.  Anyone who'd like to work with us on this
kind of stuff, and has ideas for what we could track, is welcome to
start up a private dialogue with me or with Madeline Schnapp
(madeline@oreilly.com).  (Or to continue the discussion on this list...)
> 
> Ben

-- 
Tim O'Reilly @ O'Reilly & Associates, Inc.
101 Morris Street, Sebastopol, CA 95472
+1 707-829-0515, FAX +1 707-829-0104
tim@oreilly.com, http://www.oreilly.com