Subject: Re: Back to business [was: the Be thread]
From: Keith Bostic <bostic@abyssinian.sleepycat.com>
Date: Wed, 24 Nov 1999 12:22:32 -0500 (EST)

> From: Crispin Cowan <crispin@cse.ogi.edu>
>
> The kernel is much easier to contribute to than GCC.  The kernel is, in some
> sense, a very large library of components, with a consistent (syscall)
> interface and shared set of resources.  The rules one must follow to play in
> the kernel are relatively simple.

This is true for the Linux kernel, because, in a lot of ways,
it is a relatively simple kernel in which to work, e.g., it
isn't really multi-threaded.  This isn't nearly so true for
the Solaris kernel.

> From: Ian Lance Taylor <ian@airs.com>
>
> I don't really agree with this, actually.  To contribute a significant
> optimization or a new processor port to gcc, there's a fair amount you
> have to learn.  On the other hand, the sources include a long manual
> which is reasonably clear to a programmer and quite detailed.  It's
> fairly easy to contribute a new warning, for example--that was my
> first gcc patch (-Wmissing-prototypes).

I agree: there are things in any project that are useful and
that fall readily to the lots-of-developers approach.  I would
argue, without any proof, that this type of work tends to be
less "fun" for hackers.

In my own case, Berkeley DB, we get almost no contributions from
our user base.  Users find bugs, of course, but I can count on
two hands the number of submissions that included an attempt at
a fix, and I can think of exactly two bugs where the fix was
right.  We have gotten exactly one feature implemented outside
the group.  I can think of two reasons for this: 1) DB is a
complex piece of code (high-concurrency, threaded,
transactional, recoverable, B+tree), or 2) database systems work
is really algorithms hacking, and that doesn't appeal to many
people.
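
To give a rough feel for what a would-be contributor is up
against, here is approximately what a single transaction-protected
write looks like through the C API.  This is a sketch from memory,
untested, with all error handling omitted; the paths and names are
made up, and the exact entry points vary from release to release:

/*
 * Sketch only: untested, no error checking, API details vary
 * by release.
 */
#include <string.h>
#include <db.h>

int
main()
{
        DB_ENV *dbenv;
        DB *dbp;
        DB_TXN *txn;
        DBT key, data;

        /* Create and open a transactional environment. */
        (void)db_env_create(&dbenv, 0);
        (void)dbenv->open(dbenv, "/var/dbhome", DB_CREATE |
            DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL |
            DB_INIT_TXN, 0);

        /* Create and open a B+tree database in that environment. */
        (void)db_create(&dbp, dbenv, 0);
        (void)dbp->open(dbp, NULL, "example.db", NULL,
            DB_BTREE, DB_CREATE | DB_AUTO_COMMIT, 0644);

        /* A single transaction-protected put. */
        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = "fruit";
        key.size = sizeof("fruit");
        data.data = "apple";
        data.size = sizeof("apple");

        (void)dbenv->txn_begin(dbenv, NULL, &txn, 0);
        (void)dbp->put(dbp, txn, &key, &data, 0);
        (void)txn->commit(txn, 0);

        (void)dbp->close(dbp, 0);
        (void)dbenv->close(dbenv, 0);
        return (0);
}

And that's only the surface; the part a contributor actually has
to understand is the locking, logging, and recovery machinery
underneath those calls.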

I mentioned the Linux kernel above.  There are many software
problems that have traditionally been difficult to solve in a
distributed group, e.g., OS virtual memory.  Those problems have
usually fallen to small groups of people working on them
full-time.  (It's difficult to write complex software
solutions if it's not your day job.  Thinking creatively about
the VM and buffer management isn't best done after 8 hours of
selling TVs.)

So, that said, the interesting question for the manager of a
distributed open source project is: should I deliberately shun
complex solutions, in favor of a brute-force modular approach
that lets me have a less sophisticated programmer base, and
that requires less start-up time?  Software written that way is
likely to have fewer bugs than more complex solutions, and I'm
confident that CPUs will keep getting faster, so the performance
cost of brute force matters less every year.

For example, I believe log-structured filesystems are actually
easier to write than conventional filesystems, as long as you
ignore filesystem accounting (except in very general ways) and
cleaning when you're almost out of disk space.  If I were the
Linux filesystem guy, I might be inclined to say: "Disk space is
cheap.  Let's write a simple filesystem that requires lots of
disk space to run well, and simply refuse all user requests if
there's not a fair amount of free disk space."
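
A sketch of what that policy might look like in the write path;
the structure, the field names, and the threshold are all made up,
the point is only the "just say no" rule:

#include <errno.h>

/*
 * Hypothetical superblock summary: refuse anything that allocates
 * space unless a generous cushion of free disk remains.
 */
struct fs_usage {
        unsigned long total_blocks;
        unsigned long free_blocks;
};

#define FS_MIN_FREE_PCT 20      /* made-up threshold */

int
fs_space_ok(const struct fs_usage *u)
{
        if (u->free_blocks * 100 < u->total_blocks * FS_MIN_FREE_PCT)
                return (-ENOSPC);       /* just say no */
        return (0);
}

int
main()
{
        struct fs_usage u = { 1000000, 150000 };        /* 15% free */

        /* Exit 0 if the request would be (correctly) refused. */
        return (fs_space_ok(&u) == -ENOSPC ? 0 : 1);
}

With that rule in place, the cleaner never has to run when the
disk is nearly full, which is exactly the case the simple
filesystem gets to ignore.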

--keith