Latency vs. performance
• The software implementation is based on frequency-domain convolution (overlap-and-save), which inherently introduces some latency.
• Furthermore, audio stream I/O on a PC is always buffered, so the buffer size adds an intrinsic latency of its own.
• BruteFIR distinguishes itself from other convolvers by implementing partitioned convolution: the impulse response (IR) is subdivided into many segments of equal length, which reduces the latency to twice the length of one segment instead of twice the length of the whole IR.
• On modern CPUs, partitioned convolution is also more efficient than traditional unpartitioned overlap-and-save: it reduces the CPU load by 20-50% and can bring the overall latency below 100 ms.
• Very efficient FFT implementations are freely available (Intel NSP, FFTW), so the computing power of a PC is enough for real-time convolution of 20 IRs at 44.1 kHz, 32 bits, each 65,536 points long. The demonstration machine, installed in room 22, is an old Pentium-II 400 MHz.
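As a rough worked example of the latency figures above, the sketch below applies the "twice the block length" rule to the 65,536-point IR at 44.1 kHz quoted in the text. The segment length of 2,048 samples is an assumption chosen for illustration, not a value taken from BruteFIR's configuration, and the I/O buffer latency is ignored.

```python
SAMPLE_RATE_HZ = 44_100   # sampling rate quoted in the text
IR_LENGTH = 65_536        # IR length in samples, quoted in the text
SEGMENT_LENGTH = 2_048    # ASSUMED partition size, for illustration only


def latency_ms(block_length, sample_rate_hz=SAMPLE_RATE_HZ):
    """Latency of twice the block length, expressed in milliseconds."""
    return 2 * block_length / sample_rate_hz * 1000.0


# Unpartitioned overlap-and-save: the block is the whole IR.
unpartitioned_ms = latency_ms(IR_LENGTH)

# Partitioned convolution: the block is a single segment.
partitioned_ms = latency_ms(SEGMENT_LENGTH)

# Number of equal-length segments the IR is split into.
num_segments = IR_LENGTH // SEGMENT_LENGTH

print(f"unpartitioned: {unpartitioned_ms:.1f} ms")
print(f"partitioned:   {partitioned_ms:.1f} ms over {num_segments} segments")
```

Under these assumptions the unpartitioned latency is close to 3 seconds, while 32 segments of 2,048 samples give roughly 93 ms, which is consistent with the "less than 100 ms" figure above.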
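The partitioning idea can be illustrated with a minimal sketch of uniformly partitioned overlap-and-save convolution. This is a textbook formulation, not BruteFIR's actual code: the function names (`fft`, `ifft`, `partitioned_convolve`, `direct_convolve`) are hypothetical, and a small pure-Python radix-2 FFT stands in for an optimized library such as FFTW so that the example stays self-contained. The segment length is assumed to be a power of two.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT with the 1/n scaling applied."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def partitioned_convolve(signal, ir, seg):
    """Convolve signal with ir using IR segments of length seg (power of two)."""
    out_len = len(signal) + len(ir) - 1
    n_parts = -(-len(ir) // seg)            # ceiling division
    ir = list(ir) + [0.0] * (n_parts * seg - len(ir))
    # Transform each IR segment once, zero-padded to 2*seg points.
    H = [fft(ir[p * seg:(p + 1) * seg] + [0.0] * seg) for p in range(n_parts)]

    n_blocks = -(-out_len // seg)
    x = list(signal) + [0.0] * (n_blocks * seg - len(signal))

    prev = [0.0] * seg
    # Frequency-domain delay line: spectrum of the input p blocks ago.
    fdl = [[0j] * (2 * seg) for _ in range(n_parts)]
    y = []
    for b in range(n_blocks):
        cur = x[b * seg:(b + 1) * seg]
        fdl = [fft(prev + cur)] + fdl[:-1]
        acc = [0j] * (2 * seg)
        for Xp, Hp in zip(fdl, H):
            for k in range(2 * seg):
                acc[k] += Xp[k] * Hp[k]
        block = ifft(acc)
        # Overlap-and-save: keep only the second half, which is alias-free.
        y.extend(v.real for v in block[seg:])
        prev = cur
    return y[:out_len]


def direct_convolve(signal, ir):
    """Reference time-domain convolution, used only to check the result."""
    y = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            y[i + j] += s * h
    return y


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
ir = [0.5, -1.0, 0.25, 2.0, 1.0, -0.5]
fast = partitioned_convolve(signal, ir, seg=2)
slow = direct_convolve(signal, ir)
max_error = max(abs(a - b) for a, b in zip(fast, slow))
print("max error vs. direct convolution:", max_error)
```

Each output block needs only one forward FFT of 2*seg points, one inverse FFT, and a multiply-accumulate per segment, so a block of output can be produced as soon as one segment of input has arrived rather than after the whole IR length.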