Subject: Re: Software architecture matters (was Re: Free software businesses)
From: Thomas Lord <lord@emf.net>
Date: Wed, 06 Sep 2006 11:52:30 -0700

Kragen Javier Sitaker wrote:

 > Sound interesting?

Yes.   I gather this grows out of some of your experience with
wheat, etc.


> So here's a laundry list of architectural features that support
> practical software freedom:
>
>   

People might not notice what you did.   You addressed the questions of
software architecture with respect to basic software freedoms, almost
but not quite in the same way RMS enumerates them.


    Kragen:

    1. The freedom to study a program that you use.
    2. The freedom to copy a program that you use.
    3. The freedom to modify a program that you use.
    4. The freedom to redistribute a program that you use (w/ or w/out
       modifications).

    RMS:

    0. The freedom to use a program you possess, without restrictions.
    1. The freedom to study the program.
    2. The freedom to redistribute the program to help your neighbors.
    3. The freedom to modify the program (including sharing modifications
        with the public)

(With the exception of the omission of "use", your list more closely
resembles the structure of the GPL.)

So, enclosed is a long point-by-point reply.   Hope it's helpful.  I
might be interested in learning more, perhaps off-list, about what you
are planning to work on.

-t



> 1- features that help in study:
>    a- runs on the user's machine, not on some distant server or in a
>       sealed box; that's what most of the previously cited essays are
>       about
>   

To be sure, the death of client-side computing is greatly
exaggerated, but I think you go too far in ruling out "distant
servers".

I've met and read about several projects that build one kind or
another of grid, and it looks to me like the prevailing bet these days
is that (node + Tb)/hours will be very inexpensive
in coming years.    "Free for (limited / select) experimental or
personal use" is already happening at multiple scales.

I think that if we value open source practices and licensing, or
anything close to them, we have to think about "distant servers"
as an important, emerging platform.

Operating at scale, with many users, applications on grid platforms
are *not* almost free (e.g., the operating costs for Google Search).
While there are projects such as RAD (at UCB) aimed at lowering
those costs, there are rather stubborn lower bounds set by
electricity and bandwidth costs.

Even if we try to stay within the constraints of P2P supercomputing,
then, at scale, we have application providers competing for what
will become a scarce resource ("screensaver hours" and bandwidth),
and the same need to monetize remains.   P2P supercomputing
has to compete against grids on price/performance and, as
that competition heats up, even the node providers want
a better rate of pay than "cool screen-saver".

So, we have this new rival resource -- commodity super-computing --
and more and more examples of how to use that platform to
manufacture profitable products.

Therefore, proprietary vendors will (continue to) concentrate hard
on the problem of monetizing services that can only run on "distant
servers".  They want to be in the business of buying up commodity
computing and bandwidth and turning a profit on services delivered
from that platform.   They have no a priori reason to respect an ideal
of maximizing the utility of client-side computing -- they'll (continue
to) intrude on traditionally client-side services.   They are pretty
much constrained to adopt "user pays" models and will often do
so in problematic ways (e.g., user payments in the form of viewing
ads or contributing data to closely-held socio-economic databases).

The good news is that developer access to this new platform is
already cheap and becoming cheaper.   The RAD vision of a
guy in his basement writing the next Google over a long weekend
is not particularly absurd.   Make a good pitch to Sun and you
can get some grid cycles.   Prices from Amazon are pretty
interesting.

The free software / open source communities have to embrace
that reality, not avoid it.

Where does that leave, say, Peru?   Well, in the short term such regions
continue to represent an opportunity for experts in providing
client-side utility, especially on modest computers.   How long will
it take unchecked, proprietary, grid-based services to remove that
opportunity?   It is not an accident that Mozilla -- a fancy terminal
emulator -- has such prominence among free software/open source
projects.

For all of us, it would be foolish to neglect client-side and LAN
computing, if only for reasons of survivability and perhaps ultimately
for reasons of energy efficiency.   But the food and water I stash away
in preparation for an earthquake don't eliminate my interest in the
municipal water system -- grids (and P2P supercomputing) aren't
platforms for open source to overlook.

>    b- interpretation (since source is more likely available) or a
>       reasonable simulacrum (as Python does)
>   

"interpretation" is an ambiguous term here and the aside about
Python just adds to my confusion.   I have no idea what you
mean by this point.


>    c- navigation tools --- go to definition, show uses; at least tag support
>   

There is a trap here unless you say "lightweight, hackable, navigation
platform" rather than "navigation tools".

Let us suppose that we wind up, as it sometimes seems we will, with
most programmer training consisting of orientation in the use of one
or two closely controlled (even if open source) monolithic whizz-bang
IDEs.    Such navigation tools do not liberate programmers to
manipulate source with freedom -- they constrain programmers to
focus on becoming more and more efficient at manipulating source
*within externally prescribed constraints*.   Line coders.   It's a
new version of what "programmer" meant in the 40s/50s: a skilled
clerical worker who, as part of a large team, could hand-compile
somebody else's program (while not especially understanding or
caring about the whole).

Closely related to a navigation platform is a transformation platform:
how difficult is it to write programs that transform other programs?
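
To make that concrete -- a toy sketch of my own, not anything from
your list, with made-up names -- here is the whole job for a language
with uniform, lisp-ish syntax, in a few lines of Python: read an
expression into nested lists, then transform it by ordinary list
processing.

    # Read one s-expression from a list of tokens into nested lists.
    def read_sexp(tokens):
        tok = tokens.pop(0)
        if tok != "(":
            return tok                    # an atom
        out = []
        while tokens[0] != ")":
            out.append(read_sexp(tokens))
        tokens.pop(0)                     # drop the closing ")"
        return out

    # A "program transformer": (if c a b) -> (if (not c) b a), recursively.
    def swap_if_branches(form):
        if isinstance(form, list):
            form = [swap_if_branches(f) for f in form]
            if form[:1] == ["if"] and len(form) == 4:
                _, c, a, b = form
                return ["if", ["not", c], b, a]
        return form

    tokens = "( if ( < x 0 ) ( - x ) x )".split()
    print(swap_if_branches(read_sexp(tokens)))

The point isn't this particular transformation; it's that a platform on
which such things stay cheap is worth more than any single navigation
feature.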



>    d- embedded documentation (doc strings: the poor man's Literate
>       Programming)
>   

I think so, too, although, in practice, even in systems where doc
strings have a long history (e.g., Emacs) I see a lower and lower
percentage of new ones being written in such a way as to be useful.

Perhaps, in part, we need work practices and pedagogical uses
of doc strings that better realize their value.
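
To say what I mean by "useful" -- a sketch in Python, with a made-up
function, not anything from your mail -- the doc strings worth writing
are the ones that state the contract and show a use, not the ones that
restate the name:

    import time

    def retry(thunk, attempts=3, delay=1.0):
        """Call THUNK until it returns without raising, at most ATTEMPTS times.

        Sleep DELAY seconds between attempts; re-raise the last
        exception if every attempt fails.  For example:

            retry(lambda: fetch_page(url), attempts=5)
        """
        for i in range(attempts):
            try:
                return thunk()
            except Exception:
                if i == attempts - 1:
                    raise
                time.sleep(delay)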



>    e- higher-level languages (less code to wade through)
>   

Less code, sure, but often greater flexibility and generality, also.

It is generally easier to transform high-level code, either by modification
or by changing the rules of its interpretation.    Closely related,
compilers for (well designed) higher-level languages can, in many
useful cases, discover that a particular application of a piece of code
reduces to a simpler case (for which more efficient code can be
generated) yet, at the same time, a different use of the same code
(when the reduction doesn't apply) is still compiled correctly.  (E.g.,
some uses of type inference.)

Of course, it is a lesson that goes back to the late 70s and 80s that
the pattern of designing domain specific languages (whether as
tiny tools in unix or complexes of macros in lisp) is often a great
way to solve a problem.
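
A tiny illustration of "changing the rules of its interpretation" -- my
own sketch, with an invented representation: one little expression
language, two interpreters, in Python.

    # Expressions are tuples like ("add", ("mul", "x", 3), 1); anything
    # else is a variable name or a literal.
    def evaluate(expr, env):
        if isinstance(expr, tuple):
            op, a, b = expr
            fn = {"add": lambda x, y: x + y,
                  "mul": lambda x, y: x * y}[op]
            return fn(evaluate(a, env), evaluate(b, env))
        return env.get(expr, expr)

    def show(expr):
        if isinstance(expr, tuple):
            op, a, b = expr
            return "(%s %s %s)" % ({"add": "+", "mul": "*"}[op],
                                   show(a), show(b))
        return str(expr)

    e = ("add", ("mul", "x", 3), 1)
    print(show(e))                  # (+ (* x 3) 1)
    print(evaluate(e, {"x": 2}))    # 7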



>    f- transparent syntax (Python 1.5.2 is the current optimum, perhaps
>       with Python-syntax list comprehensions added; compare Haskell list
>       comprehensions to the Python version to see how much difference in
>       clarity formally irrelevant syntax details make)
>   

Let's avoid the religious comparisons here.

One consideration, re syntax, is how amenable it is to lightweight
navigation and transformation.

Lisp syntax, for example, has advantages in editor implementation and
command sets, and advantages for the macro system (and other program
transformers).   The obvious trade-off is that it is harder to learn to
read and easier to write incomprehensible examples of.    Yet, if you
watch a skilled lisp hacker sling code in whatever version of Emacs she
uses, or a lisp team accomplish so much more with a much smaller
number of hackers, the trade-offs start to look plausible.   It's only
when you see how difficult it is to train line-coders to use lisp that
you begin to see the other side.
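
(To give a feel for the editor side of that claim -- a sketch of my
own, in Python rather than elisp -- with a uniform syntax, "step over
one expression", the heart of many structural editing commands, is
little more than parenthesis counting.)

    def forward_sexp(text, pos):
        """Return the index just past the expression starting at POS."""
        if text[pos] != "(":
            while pos < len(text) and text[pos] not in "() \n":
                pos += 1                  # an atom: scan to a delimiter
            return pos
        depth = 0
        while pos < len(text):
            if text[pos] == "(":
                depth += 1
            elif text[pos] == ")":
                depth -= 1
                if depth == 0:
                    return pos + 1
            pos += 1
        raise ValueError("unbalanced expression")

    code = "(define (square x) (* x x))"
    print(code[8:forward_sexp(code, 8)])  # -> (square x)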



>   g- live results display (as in a spreadsheet, or as in the XUL editor
>       window from the Mozilla Extension Developer's Extension; very
>       difficult for imperative languages!)
>   

Exploratory programming is a win for many applications, certainly.


>    h- generally, being more functional and less imperative
>   

Under the heading of "features that help in study" that's quite a claim.

I'm pretty sure I *agree* but I wouldn't raise it above the line of
informed opinion.   There is interesting evidence to the contrary (e.g.,
talk to "programming 101" teachers about the relative ease of teaching
simple imperative vs. simple functional programming -- the latter,
from what I hear, requires a much greater leap in understanding).



>    i- being able to look at values in live programs (e.g. the DOM
>       inspector in Mozilla, the JavaScript shell in the Extension
>       Developer's Extension, evaluating expressions on the fly in Emacs
>       and Squeak)
>   

At what cost?   A lightweight, easy-to-hack *platform* for debugging
is probably the way to get the greatest utility.

For example, I do a lot of C programming.   GDB is usually fantastic.
Once upon a time I got to see a C *interpreter* that offered even
fancier features for live interrogation of a program.    The problem,
in both cases?   The fancier they get, the slower and/or flakier.
The ancient, original debugger -- the print statement -- still takes
the day far more often than, 20 years ago, I would have guessed it
would by now.


>    j- Tim Berners-Lee's Rule of Least Power: for each part of your
>       system, write in the least powerful language that will do the
>       job --- in the sense of being furthest from Turing-complete.
>   

I wonder if he meant "language" in the lisper sense -- "the least
powerful abstraction".    It's often a good principle, in that light.

For example, "You extend this part by declaring new finite automata"
is often better than "You extend this part by writing a new, arbitrary
function."   The automaton makes it very clear what clients of the part
being extended expect, while the arbitrary function invites people to
abuse the extensibility hook.    The automaton declarations are easy to
read, analyze, and manipulate, whereas arbitrary functions can get
quite out of hand.
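
A sketch of the contrast, with invented names (in Python, for
concreteness rather than fidelity):

    # Extension by declaration: the extender hands the host a small
    # table.  The host can read it, check it, display it, analyze it.
    DIALOG_STEPS = {
        ("start",         "greet"): "awaiting-name",
        ("awaiting-name", "name"):  "done",
    }

    def step(state, event):
        return DIALOG_STEPS[(state, event)]

    # Easy to analyze mechanically -- e.g., enumerate every state:
    states = {s for (s, _e) in DIALOG_STEPS} | set(DIALOG_STEPS.values())

    # Extension by arbitrary function: the host can only call it and hope.
    def dialog_hook(state, event):
        # ... anything at all can happen in here ...
        return "done"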



> 2- features that help in copying:
>    a- being self-contained, like MacOS applications, not spread all over
>       dozens of directories like Microsoft Windows and Linux applications
>    b- being small, like Squeak, not large with zillions of dependencies
>       like .NET
>   

Sure, well-managed modularity does have something to do with copying.


> 3- features that help in modifying:
>    a- all the features that help in studying, of course
>    b- compilation, if any, must be rapid, perhaps incremental, and
>       transparent (Squeak excels here)
>    c- the ability to run new code without restarting the program or
>       deleting all your data
>   

Don't exaggerate or overlook the other sides of these issues.

The higher the demands you place on compilation the more risk
you create regarding well-managed modularity (your hairy compilation
system is present in the list of dependencies).

And while exploratory programming is a great tool, there are at
least three big problems to consider:

1) It is a mistake to rely on systems that "never shut down" because
    it is too easy to inject new code into such a system from which
    no recovery is possible.

2) Injecting new code into a running system is a great way to explore
    and often effective, but there is a trap: the new code can appear to
    work nicely only because the live environment happens to be in a
    state that a fresh start-up would not reproduce (sketched below).
    You restart the system, now with the new code, and suddenly less is
    working than was working in the first place.

3) The discipline of a compilation phase helps to ensure that there is
    always a piece of static text that can be transformed into the system
    you are working with.   Absent that -- as in "save the world" systems
    like Smalltalk or some lisp machines -- great confusion and
    irreproducibility  can ensue.
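
Here's a sketch of that second trap (Python, with invented names).  In
the live image an old start-up function has already put things into a
friendly state:

    cache = {}

    def setup():
        cache["config"] = {"mode": "fast"}

    setup()                      # ran long ago, in the running session

    # New code, injected into the running system, works -- but only
    # because the cache is already warm:
    def lookup_mode():
        return cache["config"]["mode"]

    print(lookup_mode())         # fine, in the live image

    # Suppose the exploration also replaced setup() with a version that
    # no longer fills the cache.  The live image keeps humming along;
    # the next cold start dies in lookup_mode() with a KeyError.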


> 4- features that help in redistribution:
>    a- all the features that help in copying, of course
>    b- for redistributing modifications, decentralized source control
>       helps a lot; we have a long way to go before I can automatically
>       accept changes from my friend Ping, queue for review changes from
>       my cousin Ben (who sometimes makes totally insane choices), and
>       ignore changes from that guy who tried to root my box last year,
>       while also giving me the ability to instantly switch back to
>       running the "official" version to see if I can reproduce a bug
>    c- capability security and getting serious about the Principle of
>       Least Authority would go a long way toward letting other people
>       try your code, the way you can get them to try your JavaScript
>       toys just by sending them a link
>       (http://pobox.com/~kragen/sw/js-calc.html --- graphing SIMD RPN calc!)
>
> Many of these features are present in Tom's example of the web site
> (1a, 1b, 1e, almost 1g, 1j, 2a in the case of PHP, 2b, 3b, and 3c)
> but, as far as I can tell, none of them are in the mainstream of
> open-source desktop development in GNOME and KDE.  A large part of
> Canonical Ltd.'s R&D seems to be devoted to 4b, thanks largely to Tom.
>
>   

Thank you for noticing.



-t