Subject: Re: IC's mystery patent claims
From: Rob Cameron <>
Date: Fri, 29 Sep 2006 09:38:22 -0700

On September 27, 2006 04:41 pm, Thomas Lord wrote:
> A community patent review shouldn't be limited to putting up
> the patent and asking for prior art or ex-post-facto opinions
> about what is or is not.   
> Instead, when possible, it should be a statement of the maximum
> number of non-patentable aspects of the problem definition and
> a request for people to sketch solutions and/or pony up code.
> For example, given just what we know about the IC patent
> application ---
> I'd want to look into adapting the bitmap rotation algorithm publicized
> in the book "Smalltalk-80, The Language and its Implementation"
> to convert a UTF-8 stream for bit-parallel processing.   Are there
> other fairly obvious methods?   How hard, really, are any of these
> to reduce to practice?   What about some of the other elements like
> encoding form conversion or lexical analysis?


Although I will say a bit more about our technology below, I really
would encourage that this kind of discussion be taken to the Community
Patent Review site and e-mail list.   This is an important initiative.
It has the attention and support of major IT players.   I don't know 
how the process will work out, but the intent appears to allow 
community development and discussions of questions like these.
However, it is a process defined to work within the current USPTO
framework (and with USPTO support), so legislative reform discussion
probably does not belong there.

I have been struggling to reply to you because we have quite
different e-mail styles.   You are quick, prolific and share your
opinions readily.   I try to take a more contemplative, academic
approach.  I hesitate to make judgements, particularly when all 
the evidence is not in.

I was put off by your chiding with respect to the term "code unit"
in discussing UTF-8 and UTF-16.   My usage follows directly
the standard terminology in which a UTF-8 code unit is 8 bits
and a UTF-16 code unit is 16 bits.   They do not fall in one-to-one
correspondence because characters encoded using sequences
of one, two or three UTF-8 code units correspond to a single UTF-16
code unit, while characters encoded using four UTF-8 code units
correspond to two UTF-16 code units in a surrogate pair.

I will admit that your phrase "UTF-8 to UTF-16 encoding form 
conversion" is technically more precise than "UTF-8 to UTF-16 
transcoding."   But my vernacular usage is less wordy and 

I was also put off by your comment that you suspected I was
making "apples and oranges" comparisons between my work
on SIMD-acceleration of XML and other recent work in high-performance
XML.   At least two of these groups, one at Intel (Apparao and Bhat,
Beacon 2004) and one at IBM (Kostoulas, et al, WWW 2006) refer
to SIMD techniques as potentially of value.

The straightforward application of SIMD for character processing
based on a traditional byte-oriented approach has some application
to things like memcpy, strlen and strcmp, but doesn't work out
so well with Unicode and XML.   In UTF-8 to UTF-16 conversion,
48 bytes of UTF-8 data may correspond to anywhere from 32
to 96 bytes of UTF-16 data.   ASCII characters within UTF-8
expand to 2 bytes of UTF-16 and three-byte sequences compress
to 2.   General UTF-8 data will have mixtures of varying sequence
lengths, so the problem of taking three 16-byte registers of UTF-8
data and converting them to two to six 16-byte registers of UTF-16
data involves substantial interregister and intraregister byte
movement.     Nobody has reported success that I know.

Our approach to SIMD processing involves parallel bit streams
that allow us to process 128 code unit positions at a time.
The slides Larry posted shows the nice example of UTF-8 validation
to illustrate the approach.   UTF-8 to UTF-16 conversion is more
complex, requiring substantial bit-level movement within 

You mention bit matrix rotation.   We have cited the very nice
book Hacker's Delight by Warren in our prior art submission with
respect to transposition.    Warren also presents a compress
algorithm that is useful for bit movement.   However, his method
requires a (log2 n)**2 preprocessing phase, where our SIMD methods
requires only log2 n, a substantial improvement (from 49 down
to 7) for 128 bits.

We are not trying to make our techniques a mystery.   Our plan
is to submit our patent applications to an open process of 
mediated peer review.   Separately, I am also preparing papers
to demonstrate the techniques without the "patentese", as
well as open source software.     I still have a lot of work to do.