From: Angelo Farina [farina@pcfarina.eng.unipr.it]
Sent: 16 November 2003 15:27
To: 'music-dsp@shoko.calarts.edu'
Cc: 'guille@atc.creative.com'; 'guille@ccrma.stanford.edu'
Subject: RE: [music-dsp] Microsoft gets a patent on partitioned
convolution

Anders,
The inventor of the "new" Microsoft algorithm is Guillermo Garcia, a researcher at the Creative Advanced Technology Center, Scotts Valley, CA, and a Ph.D candidate at the CCRMA of the Stanford University, working with prof. Julius O. Smith.
Here is his home page: http://ccrma-www.stanford.edu/~guille/
Approximately one year ago, he presented an AES paper about this algorithm:

"Optimal Filter Partition for Efficient Convolution with Short Input/Output Delay", G. Garcia, Audio Engineering Society (AES) 113th Convention, October 2002, Los Angeles, CA.
Pre-Print n. 5660

I was present during his presentation, which was quite interesting. After the presentation some questions were made, and I vaguely remember to have asked him (probably privately, after the official period for questions was finished) why he did consider the number of multiply-adds a reliable indicator of the "computational cost" of convolution algorithms, when it is now widely accepted that the number of memory accesses (exceeding the L1 cache size) is the real cost indicator, and consequently the traditional unpartitioned convolution, which is cited as the lower-cost solution in several parts of his paper, is in reality a suboptimal algorithm, which is easily outperformed by Your BruteFIR (or by Christian Knufinke's Super Impulse Reverb), employing optimized uniform partitioning.
It is interesting to note how mr. Garcia cites in his paper most of the known prior art about partitioned convolution, including Soo and Pang, Ferrara, Gardner and the Lake DSP patent. 
Both in the paper and in the Microsoft patent, it is assumed that partitioned convolution is a long-term, well known, unpatentable approach. The Gardner-Lake patent is considered only as a particular rule for determining the "optimal" size of the partitions, and it appears how, if one employs a different partitioning rule, he is not infringing the Lake patent.
The paper is mostly focused on an "optimal" algorithm for finding alternative sizes of not-uniform partitions, based on the Viterbi method. However, during the presentation, mr. Garcia also claimed to have "invented" a trick which makes uniform-size convolution much faster, that is accumulating frequency-domain blocks and later making a single IFFT for each block, instead of making a separate IFFT for each of them, and later summing their corresponding time-domain blocks.
It must be noted that he seemed to be not in knowledge of the 1988 Kulp AES paper, which clearly described exactly the same trick, providing also calculations of the "cost" of the algorithm substantially identical to those made by Garcia himself. From what he said both during the presentation and during the following answers to questions, I am quite convinced that mr Garcia did not really know about the Kulp paper, and he re-invented on his own the frequency-domain summation trick, thinking that it was completely new. The guy seemed to me absolutely sincere on this point, and he is not the kind of people who willingly stoles intellectual property of anyone else... 
Re-reading the Microsoft patent after reading mr. Garcia's paper, it is quite clear to me what are the two inventive steps claimed by Microsoft:
1) The trick about summing the frequency-domain blocks BEFORE IFFT.
2) The optimized algorithm for finding the sizes of not-uniform partitions The two things substain each other, and only together they can be claimed to be a true invention.
In fact, the Gardner (Lake) algorithm cannot employ frequency-domain summation, because it employs blocks of unequal size, and consequently the "assembly" of the results can only be performed in time domain, after several separate IFFT operations: consequently, the computational cost is high, and nowadays better results can be obtained through uniform-partitioned convolution.
On the other hand, very short latencies cannot be obtained with uniform-partitioned convolution, which is very efficient due to frequency-domain summation, but is locked to a fixed block lenght which gives the overall latency. 
But Garcia proposes a "fractal" subdivision: the first "long" block is subdivided again in N shorter, equal lenght blocks (which can be efficiently convolved with a subprocess employing again equal-size partitioned convolution), consequently the latency can be reduced with very little cost increase.
He even proposes to go further in the process, subdividing again the first of these shorter blocks in even shorter, equal size sub-blocks... This process can be repeated infinitely, providing theoretically "zero" latency without much increase in the total cost.
In substance, if the Kulp paper did not exist, the Microsoft patent would be a real inventive step, perfectly valid, not relying on the Lake algorithm, providing simultaneously high efficiency and very short latency.
Unfortunately (for Microsoft) the Kulp paper does exist, so anyone can now employ this approach without risks of actions from Microsoft...
(Incidentally, the Kulp paper also completely invalidates the EU patent recently got from Lake).
The only part of the Microsoft's patent which can survive is just the usage of the Viterbi method for optimizing the sizes of the partitions. This appears to be very clever in the Garcia's paper, but remember that he employs the "wrong" metric for assessing the "cost" of the algorithm: number of multiply-add operations instead of number of external-memory read-write operations.
Consequently, if one optimizes for the "real" maximum efficiency, instead of the number of multiply-adds (and perhaps employing a different optimization approach, Viterbi is not the only one available!), also this part of the patent can be circumvented (and the final implementation should of course be even faster than the Garcia's one, but this can only be proved comparing the actual performances on a given hardware platform).
If my above analysis is correct, this means that:
- the US patent office already recognized that substantially the Lake patent is just a "rule" for subdiving the impulse response in blocks having lenght conforming to a certain optimization approach, and if a different optimization approach is employed (as claimed by Microsoft), a novel, independent (not extension) patent can be issued.
- the main advantage of the Microsoft's optimization method is that it relies on the fact that the re-assembly of the results of multiple blocks can be performed very efficiently in frequency domain if the blocks have the same size. As this is clearly NOT Garcia's invention, because Kulp reported this in 1988 (and it was probably known well before), any other optimization algorithm different from Microsoft's one, but also based on the frequency-domain-summation benefit, should be allowed without patent infringiment.

I temporarily place a copy of Garcia's paper and of Kulp's paper on my web server, in a not-public area reserved just for Music-DSP readers:
HTTP://pcfarina.eng.unipr.it/AES-113
The papers will stay there until mr. Williams from AES discovers them, and asks me to remove them (as it usually happens after a while...) From the same directory You can also download a single PDF file containing the Microsoft patent (I do not think that this is copyrighted...)

Disclaimer: be aware that copying the AES PDF files on Your computer could be an infringiment of copyright of the Audio Engineering Society. I legally own the CDs containing the AES preprints, so I am allowed to let everyone I like to "read" them, but in general readers are not allowed to physically copy ("save") the files on their own permanent storage units.

Excuse me for the long post, far exceeding the usual size of typical Music-DSP posts.
I hope to have contributed on this topic with some first-hand information. However, I suppose that the real information could only come if mr. Garcia join us on this list, and share his knowledge with us. As he is a very clever researcher on DSP algorithms for audio processing, I suppose that some postings from him would really be illuminating for all of us...
So I am sending a copy of this message also to him (I do not like to post something about one's work without letting him know...), inviting him to read the music-dsp archives for previous messages about this topic, and hoping that he will like to explain what really is claimed in the Microsoft patent.
Bye!

Angelo Farina
University of Parma, ITALY
HTTP://pcfarina.eng.unipr.it

> -----Original Message-----
> From: music-dsp-admin@shoko.calarts.edu Yes, I re-read it and see that 
> you are right. They use the term "overlap-add"
> just for plain addition. They leave out the overlap-save and 
> overlap-add alltogether.
> 
> Anyway, the patent is incorrectly constructed. In their prior art they 
> refer to Gardner's paper, and clearly refer to partitioned convolution 
> as an existing approach. Still, they claim that prior art anyway. This 
> type of erroneous construction is not uncommon.
>