Subject: Free RDBMS market case (was Re: "I've got more programmers than you")
From: "Karsten M. Self" <kmself@ix.netcom.com>
Date: Wed, 3 Oct 2001 14:11:49 -0700
on Tue, Oct 02, 2001 at 11:20:15PM -0700, David Fetter (david@fetter.org) wrote:
> On Wed, Oct 03, 2001 at 01:59:31AM -0700, Tom Lord wrote:

> Free industrial-strength RDBMS?  Just forget it.  Nobody can afford to do
> the kind of exhaustive QC such a beast needs for free.  Yes, I know
> about PostgreSQL and like it a lot, but it will never catch up with
> Oracle, Sybase, Informix, etc. 'cause the project doesn't (and a crisp
> $20 says it won't so long as it's Free) have those kind of resources
> available.

Strongly disagreed.

I identified the RDBMS arena as one in which free software would make
significant inroads, May 10 of last year, in correspondence with a
reporter.  My comments at the time:

    Databases.  Quite possibly the next big area for a free software
    success.  Having struggled with installations of the leading
    proprietary databases (while installing the free stuff in my sleep),
    I'm convinced that IBM and Oracle have a bit of an issue in front of
    them.  They can hold the high ground, but the small and mid-range
    will fall, just as with PCs and serverspace.  PostgreSQL just got
    itself a company focused on the product.  Could be good.

My feeling then, and now, is that it is installed legacy systems tied to
RDBMSs, and not any inherent quality, scalability, or capability issues,
which will slow adoption.  It's similar to the desktop situation, but
with a higher level of standardization (SQL) and a smaller (but
significant) retraining issue.

This was followed shortly by a LinuxToday story expressing a similar
viewpoint:

    http://linuxtoday.com/news_story.php3?ltsn=2000-05-24-006-20-OP

I ran an altameter query around June 5 with the following results:

    http://x42.com/cgi-bin/altameter.cgi?q=oracle+informix+sybase+msql+mysql+sleepycat+%22sql+server%22+interbase+postgres+postgresql+ingres+db2

    ...that's a list of databases, including a number of free and
    proprietary products.

    Oracle tops the list with 4.7m hits.  MS SQL Server is second at
    1.1m.  In third place, with 911,928 hits, is MySQL.  Pretty
    impressive for a free product.  That beats out Sybase, Informix,
    DB2, Ingres, and Interbase -- all proprietary products.

Here are today's results for a similar query on Google (Altameter is
broken, though this may be a temporary issue).  Included are both pages
returned ("hits") and impressions per week ("query/wk") obtained through
the keyword ad service preview at Google, both serving as proxies of
interest and/or use:

    Database       Hits        Query/wk
    --------    ---------      --------
    Oracle:     2,180,000       441,900
    mysql:      1,220,000       152,600
    SQL Server: 1,070,000        96,600
    Sybase:       399,000        43,700
    DB2:          344,000        38,800
    Informix:     292,000        24,000
    Postgres:     240,000        17,200
    msql:         167,000         1,200
    Interbase:    106,000        20,300
    Ingres:        90,600             0
    Sleepycat:     18,200             0
    
Note that MySQL edges out SQL Server, and Postgres makes a strong
showing at 7th place, just below Informix.  Trying "Berkeley db" as a
proxy for Sleepycat results in 29,700 hits and 1,800 weekly queries.

Note that by both measures shown (and granting they're proxies),
free software has *surpassed* MS SQL Server, Sybase, Informix, and
Ingres, to claim second place in general interest, if not overall usage.
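
For anyone who wants to check the placings rather than eyeball them,
here is a minimal sketch in Python that ranks the table by each column.
The figures are copied from the table above; the names DATA, ranking,
and place are my own, chosen for illustration only.

```python
# Rows copied from the table above: (name, hits, queries per week).
DATA = [
    ("Oracle",     2_180_000, 441_900),
    ("mysql",      1_220_000, 152_600),
    ("SQL Server", 1_070_000,  96_600),
    ("Sybase",       399_000,  43_700),
    ("DB2",          344_000,  38_800),
    ("Informix",     292_000,  24_000),
    ("Postgres",     240_000,  17_200),
    ("msql",         167_000,   1_200),
    ("Interbase",    106_000,  20_300),
    ("Ingres",        90_600,       0),
    ("Sleepycat",     18_200,       0),
]

HITS, QUERIES = 1, 2  # column indexes into each row


def ranking(column):
    """Return database names ordered by the given column, highest first."""
    ordered = sorted(DATA, key=lambda row: row[column], reverse=True)
    return [row[0] for row in ordered]


def place(name, column):
    """Return the 1-based position of a database in the given ranking."""
    return ranking(column).index(name) + 1


if __name__ == "__main__":
    for name in ("mysql", "Postgres"):
        print(name, "hits place:", place(name, HITS),
              "query/wk place:", place(name, QUERIES))
```

Both columns are only proxies, so the placings say something about
interest and little about installed base.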

If you're not familiar with the concept of market conversion by low-end
penetration, I'd strongly recommend Clayton M. Christensen's The
Innovator's Dilemma, Harper Business, 1997, 2000, ISBN 0-06-662069-4.


As for the battle faced by free software databases, the problem is
eminently suited to free software adoption:

  - It's based on largely open standards (SQL, ODBC).

  - It's a technical problem.  Free software development still seems to
    favor technical domains over human interface ones.

  - There is an identifiable low-end market.  Proprietary database
    software is expensive.  Several proprietary entities have left the
    market by licensing products as free software (Borland Interbase,
    SAP), by failing (or nearly:  Ingres, Sybase), or through
    acquisition (Informix -> IBM).

  - The problem is amenable to an incremental approach.  Early takers
    were primitive, low performance, or both:  msql, MySQL, and early
    versions of Postgres.  MySQL still fails to provide key
    functionality, including a solid transactional system (I'm out of
    date on this; I'm aware that progress is being made, but understand
    it's not finalized), outer joins, and subqueries.  However, its speed
    advantages provide it a healthy niche.  Postgres by contrast is
    approaching the core SQL-standard featureset of comparable
    commercial products, but this has come slowly over time.

  - QC, if anything, seems by most reports to be facilitated, not
    hindered, by a free software approach.  Database and similar
    enterprise software is more often subscription than purchase based,
    a model which lends itself more to software quality and less to
    upgrade treadmills.  However, free software tends to beat even this
    record.  The reasons are anecdotal and multiple, but include:

    - Open code.  The source is visible.  Users may use it to clearly
      identify likely source(s) of bugs, or submit fixes.

    - Low stigma to admitting bugs.  There are generally few commercial
      sales to lose, so the reputational risk of identifying software
      issues is low.  The probability of being "outed" by the community
      on a particular issue is high.  Might as well beat them to the
      punch if you're aware of the problem.

    - Database QA is largely an issue of regression tests.  Which are
      eminently suited to automation.  Which can be distributed.  To
      users.  Who can run them themselves.  Who can show the results to
      The Boss who asks "but, does it do foo correctly?" or "how fast
      is it at bar?".
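
To make the regression-test point concrete, here is a minimal sketch of
the kind of suite a user could run and show to The Boss.  It is an
illustration under stated assumptions, not anyone's actual test suite:
it uses Python's bundled sqlite3 module purely as a stand-in engine (it
is not one of the databases discussed here), and the tables, test cases,
and the run_suite name are all hypothetical.  The cases exercise the
features mentioned above: outer joins, subqueries, and transaction
rollback.

```python
import sqlite3

# Hypothetical schema and data, used only for this illustration.
SETUP = """
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO dept VALUES (1, 'eng');
INSERT INTO dept VALUES (2, 'ops');
INSERT INTO emp  VALUES (1, 'ann', 1);
INSERT INTO emp  VALUES (2, 'bob', NULL);
"""

# Each case: (name, query, expected rows).
CASES = [
    ("outer_join",
     "SELECT e.name, d.name FROM emp e "
     "LEFT OUTER JOIN dept d ON e.dept_id = d.id ORDER BY e.id",
     [("ann", "eng"), ("bob", None)]),
    ("subquery",
     "SELECT name FROM dept WHERE id NOT IN "
     "(SELECT dept_id FROM emp WHERE dept_id IS NOT NULL) ORDER BY id",
     [("ops",)]),
]


def run_suite():
    """Run every case against a fresh in-memory database.

    Returns a dict mapping case name to True (pass) or False (fail).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    results = {}
    for name, sql, expected in CASES:
        results[name] = conn.execute(sql).fetchall() == expected

    # Transaction check: an uncommitted insert must vanish on rollback.
    conn.execute("INSERT INTO dept VALUES (3, 'tmp')")
    conn.rollback()
    count = conn.execute("SELECT COUNT(*) FROM dept").fetchone()[0]
    results["rollback"] = (count == 2)

    conn.close()
    return results


if __name__ == "__main__":
    for name, passed in run_suite().items():
        print(name, "PASS" if passed else "FAIL")
```

Pointing the same cases at a different engine would mean swapping the
connection line for that engine's driver; the cases themselves are
plain SQL.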

The main barriers to entry are:

  - Existing base.

  - Project financing advantages offered by existing players.

  - Potential standards "de-commodification" as practiced by Microsoft
    and exemplified by the cases of Kerberos, NDS, the current W3C
    "RAND" patent policy initiative (the term "uniform fee only", UFO,
    has been proposed by RMS and is strongly encouraged), and changes to
    or abandonment of the SMB filesharing protocol.

  - Legislative attacks such as the SSSCA which would call for mandatory
    copy prevention mechanisms to be incorporated in both software and
    hardware.  Either modality would be a potential death blow to a free
    software database.  A software requirement, which could be defeated
    by changes to source code, would likely criminalize a free
    software database, its use, manufacture, or distribution.  A
    hardware requirement locking disk access would likely make a free
    software RDBMS implementation nonfunctional.

Given a level playing field on the last two points, significant free
software database penetration is inevitable, if not a current fact.

Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What part of "Gestalt" don't you understand?              Home of the brave
  http://gestalt-system.sourceforge.net/                    Land of the free
   Free Dmitry! Boycott Adobe! Repeal the DMCA!  http://www.freesklyarov.org
Geek for Hire                      http://kmself.home.netcom.com/resume.html

