Ambisonics is a spatial audio multichannel format widely employed for capturing, manipulating, transmitting and rendering complex audio scenes, where sound comes from many directions simultaneously in a complete 3D environment. Ambisonics is the standard audio format for VR videos on YouTube and Facebook, and is employed in videogames and interactive VR experiences. Ambisonics is also the underlying technology of other spatial audio formats, such as MPEG-H.
In Ambisonics, a multichannel audio file contains the whole spatial information. However, each channel does not represent the sound coming from a discrete direction, as it does in other multichannel audio formats. Instead, each Ambisonics channel represents the sound captured by a virtual microphone with a rather unusual directivity pattern and orientation. These directivities are mathematically defined as spherical harmonic functions. They form a hierarchical family of patterns of increasing order, as shown here:
Spherical Harmonics functions of order 0 to 4.
Nowadays most Ambisonics recordings are made at 2nd, 3rd or even 4th order (hence with 9, 16 or 25 channels). These are called HOA (Higher Order Ambisonics), to distinguish them from the obsolete First Order Ambisonics (FOA, 4 channels) used since 1973.
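The channel counts above follow from the fact that order n contributes 2n + 1 spherical harmonics. A full-sphere stream of order N therefore carries (N + 1)^2 channels. Here is a minimal Python check; the function name is just an illustrative choice:

```python
def ambisonics_channels(order: int) -> int:
    """Number of channels in a full-sphere Ambisonics stream of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    # Order n adds 2n + 1 spherical harmonics; summing n = 0..N gives (N + 1)^2.
    return (order + 1) ** 2


for n in range(1, 5):
    kind = "FOA" if n == 1 else "HOA"
    print(f"order {n} ({kind}): {ambisonics_channels(n)} channels")
```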
HOA recordings can be made with microphone arrays such as those shown below:
Zylia ZM1, 3rd order (19 capsules)
Eigenmike EM32, 4th order (32 capsules)
The raw multichannel audio streams coming from these microphone arrays are usually called A-format. These signals must be converted to Ambisonics by an encoding filter matrix, whose outputs are the signals corresponding to the spherical harmonic patterns shown above. Such an Ambisonics-encoded stream of signals is usually called B-format.
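The encoding step can be pictured as a matrix of FIR filters. Each B-format output channel is the sum of every A-format capsule signal convolved with its own filter. The sketch below assumes the filter matrix `filters[out][capsule]` is already known. Real arrays obtain it from the manufacturer or from a calibration procedure, which is not shown here. The toy filters in the usage lines are made up purely for illustration.

```python
from typing import List

Signal = List[float]


def convolve(x: Signal, h: Signal) -> Signal:
    """Full linear convolution of signal x with FIR filter h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def encode_a_to_b(a_format: List[Signal], filters: List[List[Signal]]) -> List[Signal]:
    """B[out] = sum over capsules c of conv(A[c], filters[out][c]).

    Illustrative sketch: assumes all capsule signals share one length and
    all filters share one length; the filter values are not real data.
    """
    n_samples, taps = len(a_format[0]), len(filters[0][0])
    b_format = []
    for row in filters:
        if len(row) != len(a_format):
            raise ValueError("each output needs exactly one filter per capsule")
        out = [0.0] * (n_samples + taps - 1)
        for capsule, h in zip(a_format, row):
            for k, v in enumerate(convolve(capsule, h)):
                out[k] += v
        b_format.append(out)
    return b_format


# Toy example: 2 capsules, 2 outputs, single-tap "filters" (a static sum/difference matrix).
a = [[1.0, 2.0, 3.0], [1.0, 0.0, -1.0]]
toy_filters = [[[0.5], [0.5]], [[0.5], [-0.5]]]
print(encode_a_to_b(a, toy_filters))
```

With single-tap filters the matrix reduces to a static gain matrix. Real encoding filters are frequency dependent, because they must also compensate the array geometry.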
A number of software tools exist for creating Ambisonics signals, manipulating them, and rendering them to various listening systems (such as loudspeakers and headphones). Here we list a number of "plugin suites" in VST format that are free to use:
O3A Core, Ambix, MCFX, IEM, Sparta, FB Spatial Workstation.
There are many others, but the ones listed above are those I use most frequently. These suites also contain effects to be applied to Ambisonics B-format multichannel tracks. Most of these effects are linear, so applying them to Ambisonics signals causes no particular issues. However, a sound engineer may also want to apply some non-linear effects, such as de-noising, compression and limiting. Although some of these effects are available in the suites listed above, this page instead provides a general method for applying ANY non-linear effect to a generic Higher Order Ambisonics soundtrack up to order 3 (16 channels). The method uses Adobe Audition CC and the Sparta VST plugin suite, converting the signals from Ambisonics to SPS format and back.
SPS stands for Spatial PCM Sampling. In its most general definition, SPS is a collection of virtual microphones whose directivity patterns uniformly cover the surface of a sphere. The most basic case is a set of 8 coincident cardioid microphones pointing to the vertices of a cube (this layout is called Mach1). Another SPS format, used successfully in the past for making colour maps, is SPS-32. It consists of 32 4th-order cardioids pointing in the same directions as the capsules of the Eigenmike.
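For the basic 8-channel case, here is a short sketch under a few assumptions: an idealized first-order cardioid pattern 0.5 + 0.5·cos θ, and an arbitrary axis convention (x forward, y left, z up). The channel ordering produced here is not claimed to match the Mach1 specification. The sketch generates the eight cube-vertex aiming directions and evaluates how strongly each virtual cardioid picks up a plane wave from a given direction.

```python
import itertools
import math


def cube_vertex_directions():
    """Unit vectors to the 8 vertices of a cube centred on the origin (ordering is arbitrary)."""
    s = 1.0 / math.sqrt(3.0)
    return [(sx * s, sy * s, sz * s) for sx, sy, sz in itertools.product((1, -1), repeat=3)]


def cardioid_gain(aim, source):
    """Idealized first-order cardioid: 0.5 + 0.5 * cos(angle between aim and unit source vector)."""
    cos_angle = sum(a * b for a, b in zip(aim, source))
    return 0.5 + 0.5 * cos_angle


source = (1.0, 0.0, 0.0)  # plane wave arriving from straight ahead
for d in cube_vertex_directions():
    print(tuple(round(c, 3) for c in d), round(cardioid_gain(d, source), 3))
```

Because the eight directions sum to zero, the eight gains always add up to 4 whatever the source direction. This is one way to see that the layout covers the sphere evenly.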
These cardioid-based, old-style SPS signals are usually labelled "P-format" (PCM format). Their polar patterns are cardioids, so the signal they capture is always "in phase" with respect to an omni microphone. Unlike supercardioids, hypercardioids or figure-of-8 microphones, they have no "negative polarity" lobes. Here you see a family of cardioid microphones of orders 1 to 10:
Cardioid microphones of orders 1 to 10
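Higher-order cardioids like those in the figure are commonly written as ((1 + cos θ)/2)^n. Assuming that definition, this sketch evaluates the pattern and its -3 dB half-angle. It shows that the beam narrows as the order grows while the gain never goes negative.

```python
import math


def cardioid_n(theta: float, order: int) -> float:
    """Gain of an order-n cardioid, assumed to be ((1 + cos theta) / 2) ** n."""
    return ((1.0 + math.cos(theta)) / 2.0) ** order


def half_power_angle_deg(order: int) -> float:
    """Angle from the axis where the gain falls to 1/sqrt(2) (-3 dB), in closed form."""
    # ((1 + cos t) / 2) ** n = 2 ** -0.5  ->  cos t = 2 * 2 ** (-0.5 / n) - 1
    return math.degrees(math.acos(2.0 * 2.0 ** (-0.5 / order) - 1.0))


for n in (1, 2, 4, 10):
    print(f"order {n}: -3 dB at {half_power_angle_deg(n):.1f} degrees off axis")
```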
Recently a new class of SPS streams has been developed. It uses a more uniform arrangement of the microphone directions over the sphere, ensuring perfect coverage. It also employs multi-lobed polar patterns with negative-polarity lobes, obtained by computing "sampling ambisonics decoding" (SAD) functions. The optimal number and directions of the virtual microphones over the sphere are defined with the "T-design" method. For this reason, this special subset of SPS signals is named "T-format", to emphasize that the directions follow a T-design distribution with a suitable number of channels.
The nice thing about this approach is that, using T-format (SAD virtual microphones and T-design geometry), the conversion from HOA to SPS and back to HOA can be fully reconstructive (lossless), provided that enough SPS channels are used.
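Here is a sketch of why the round trip can be lossless. It assumes real spherical harmonics with N3D normalisation (other normalisations only change scale factors). N is the Ambisonics order and L is the number of T-format channels. Let **Y** be the L × (N+1)² matrix whose k-th row holds the harmonics evaluated at the k-th T-design direction **d**_k. Suppose the directions form a spherical t-design with t ≥ 2N. Every product of two harmonics of order ≤ N is a polynomial of degree ≤ 2N, so its average over the L directions equals its average over the sphere:

$$
\frac{1}{L}\sum_{k=1}^{L} Y_i(\mathbf{d}_k)\,Y_j(\mathbf{d}_k)
= \frac{1}{4\pi}\int_{S^2} Y_i\,Y_j\,d\Omega = \delta_{ij}
\quad\Longrightarrow\quad
\frac{1}{L}\,\mathbf{Y}^{\mathsf T}\mathbf{Y} = \mathbf{I}.
$$

Take the SAD decoder as **D** = **Y**/L (B-format to T-format) and the encoder as **E** = **Y**ᵀ (T-format back to B-format). Then **E D** = **I**, so the round trip is lossless. This requires L ≥ (N+1)², which is why T-format needs more channels than B-format. The SPARTA plugins may split the scale factors differently between decoder and encoder.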
The following figure compares a 4th-order cardioid (P-format 32) with the SAD virtual microphone obtained by decoding a 4th-order HOA signal to a 36-microphone T-design array (T-format 36):
Cardioid microphone of order 4 (left) and SAD virtual microphone of order 4 (right)
The SAD virtual microphone is clearly narrower than the pure 4th-order cardioid, although it has significant side lobes, some of which (small, in red) have negative polarity.
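The comparison can be reproduced numerically under some assumptions: N3D-normalised harmonics, a basic sampling decoder without max_rE weighting, and the ((1 + cos θ)/2)^n cardioid definition. By the addition theorem, Σₘ Yₙₘ(a)·Yₙₘ(b) = (2n+1)·Pₙ(a·b). The SAD virtual microphone pattern is therefore proportional to Σₙ (2n+1)·Pₙ(cos θ). This sketch evaluates it at order 4 next to the 4th-order cardioid.

```python
import math


def legendre(n_max: int, x: float) -> list:
    """Legendre polynomials P_0(x) .. P_n_max(x) via Bonnet's recurrence."""
    p = [1.0, x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]


def sad_pattern(theta: float, order: int) -> float:
    """Basic SAD virtual microphone (assumed: N3D, no max_rE), normalised to 1 on its axis."""
    p = legendre(order, math.cos(theta))
    return sum((2 * n + 1) * p[n] for n in range(order + 1)) / (order + 1) ** 2


def cardioid_n(theta: float, order: int) -> float:
    """Order-n cardioid ((1 + cos theta) / 2) ** n."""
    return ((1.0 + math.cos(theta)) / 2.0) ** order


N = 4
grid = [math.pi * i / 1800 for i in range(1801)]  # 0..180 degrees in 0.1-degree steps
# Angle at which the 4th-order cardioid is 3 dB down.
theta_3db = math.acos(2.0 * 2.0 ** (-0.5 / N) - 1.0)
print("SAD gain at the cardioid's -3 dB angle:", round(sad_pattern(theta_3db, N), 3))
print("most negative SAD value:", round(min(sad_pattern(t, N) for t in grid), 3))
print("most negative cardioid value:", min(cardioid_n(t, N) for t in grid))
```

At the angle where the cardioid is only 3 dB down, the SAD pattern has already dropped well below that level. Further off axis it dips below zero, which gives the negative-polarity lobes.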
The advantage, however, is that with enough channels in the T-format stream, a fully reconstructive conversion (B-format => T-format => B-format) is possible. For perfect reconstruction, T-format streams generally require more channels than the corresponding B-format:
|Ambisonics order||B-format channels||T-format channels|
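The round trip can be checked numerically for the 2nd-order case of the example below. That case has 9 B-format channels and 12 T-format channels. Here the 12 directions are taken as the vertices of a regular icosahedron, which form a spherical 5-design, so t ≥ 2N = 4 holds. Whether SPARTA's T-design(12) preset uses exactly this orientation is an assumption. The sketch also assumes N3D real spherical harmonics in ACN order, with the decoder Y/L and the encoder Yᵀ. The plugins may use different scaling and axis conventions internally, so this illustrates the principle rather than reproducing them. Ambix files use SN3D, hence the two conversion helpers.

```python
import math
import random

PHI = (1.0 + math.sqrt(5.0)) / 2.0       # golden ratio
ACN_ORDER = [0, 1, 1, 1, 2, 2, 2, 2, 2]  # Ambisonics order of each ACN channel


def icosahedron():
    """Unit vectors to the 12 vertices of a regular icosahedron (a spherical 5-design)."""
    pts = []
    for a in (-1.0, 1.0):
        for b in (-PHI, PHI):
            pts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    r = math.sqrt(1.0 + PHI * PHI)
    return [(x / r, y / r, z / r) for x, y, z in pts]


def sh2_n3d(d):
    """Real spherical harmonics up to order 2, ACN order, N3D normalisation."""
    x, y, z = d
    s3, s5, s15 = math.sqrt(3.0), math.sqrt(5.0), math.sqrt(15.0)
    return [1.0, s3 * y, s3 * z, s3 * x,
            s15 * x * y, s15 * y * z, s5 / 2.0 * (3.0 * z * z - 1.0),
            s15 * x * z, s15 / 2.0 * (x * x - y * y)]


def sn3d_to_n3d(b):
    return [v * math.sqrt(2 * n + 1) for v, n in zip(b, ACN_ORDER)]


def n3d_to_sn3d(b):
    return [v / math.sqrt(2 * n + 1) for v, n in zip(b, ACN_ORDER)]


Y = [sh2_n3d(d) for d in icosahedron()]  # 12 rows (T-format channels) x 9 columns


def b_to_t(b_sn3d):
    """SAD decoder: t_k = (1/L) * sum_i Y[k][i] * b_i (input in Ambix/SN3D scaling)."""
    b = sn3d_to_n3d(b_sn3d)
    return [sum(yk[i] * b[i] for i in range(9)) / len(Y) for yk in Y]


def t_to_b(t):
    """Encoder: b_i = sum_k Y[k][i] * t_k, returned in Ambix/SN3D scaling."""
    return n3d_to_sn3d([sum(Y[k][i] * t[k] for k in range(len(Y))) for i in range(9)])


# Random 2nd-order frames with channel #7 (ACN 6) silent, mimicking the Octomic example.
rng = random.Random(0)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(9)] for _ in range(100)]
for f in frames:
    f[6] = 0.0
t_frames = [b_to_t(f) for f in frames]
back = [t_to_b(t) for t in t_frames]
err = max(abs(u - v) for f, g in zip(frames, back) for u, v in zip(f, g))
print("max round-trip error:", err)
```

All 12 T-format channels carry signal. After re-encoding, channel #7 comes back silent to within rounding error, which is the behaviour observed in Audition in the example.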
The conversion from B-format to T-format and back is easily done with two of the SPARTA plugins, AmbiDec and AmbiEnc.
In T-format, non-linear effects such as denoising, compression and limiting have no adverse consequences on the spatial information, and the position in space of each sound source is preserved. Applying the same non-linear effects directly to a B-format stream, instead, can easily scramble the spatial information.
Consequently, the recommended strategy for applying these non-linear effects (which are available as standard effects in Adobe Audition CC) without ever leaving this excellent wave editor is the following. Convert the B-format track to T-format, apply the effect to the T-format track, and convert the result back to B-format.
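This strategy can be sketched generically in three steps. First decode to T-format. Then let an arbitrary effect process each T-format channel as an ordinary mono track. Finally re-encode. To keep the example small it runs at first order on the 4 directions of a regular tetrahedron, with N3D spherical harmonics. The tetrahedron is a spherical 2-design, so t ≥ 2N holds. `process_in_t_format` and `soft_clip` are hypothetical names. The tanh soft clipper merely stands in for whatever non-linear effect Audition would apply.

```python
import math
from typing import Callable, List, Tuple

Direction = Tuple[float, float, float]


def tetrahedron() -> List[Direction]:
    """4 directions of a regular tetrahedron: a spherical 2-design, enough for order 1."""
    s = 1.0 / math.sqrt(3.0)
    return [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)]


def sh1_n3d(d: Direction) -> List[float]:
    """First-order real spherical harmonics, ACN order (W, Y, Z, X), N3D normalisation."""
    x, y, z = d
    r3 = math.sqrt(3.0)
    return [1.0, r3 * y, r3 * z, r3 * x]


def process_in_t_format(
    b_frames: List[List[float]],
    directions: List[Direction],
    sh: Callable[[Direction], List[float]],
    effect: Callable[[List[float]], List[float]],
) -> List[List[float]]:
    """Hypothetical helper: B-format -> T-format, effect per T channel, T-format -> B-format."""
    Y = [sh(d) for d in directions]  # L rows (T channels) x M columns (B channels)
    L, M = len(Y), len(Y[0])
    # 1) Sampling decoder Y / L, frame by frame.
    t_frames = [[sum(Y[k][i] * b[i] for i in range(M)) / L for k in range(L)] for b in b_frames]
    # 2) The (possibly non-linear) effect sees each T-format channel as a mono track.
    t_channels = [effect(list(ch)) for ch in zip(*t_frames)]
    # 3) Encoder Y^T, frame by frame.
    return [[sum(Y[k][i] * t[k] for k in range(L)) for i in range(M)] for t in zip(*t_channels)]


def soft_clip(channel: List[float]) -> List[float]:
    """Stand-in non-linear effect: tanh soft clipping."""
    return [math.tanh(v) for v in channel]


b_in = [[0.5, 0.1, -0.2, 0.3], [0.0, 0.4, 0.1, -0.6], [1.2, -0.3, 0.2, 0.9]]
print(process_in_t_format(b_in, tetrahedron(), sh1_n3d, soft_clip))
```

With an identity effect the output equals the input, and a plain gain on every T-format channel becomes the same gain on the B-format. Only the effect itself changes the signal; the conversion contributes nothing.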
Here we present an example of this processing, starting with a 2nd-order recording made with an Octomic. Please note that this microphone has only 8 capsules, hence channel #7 is empty. After the de-noising process, channel #7 should still be silent if everything has been done properly.
First we open a 9-channel, 2nd-order Ambix recording (note that channel #7 is silent):
We now need to create a new 12-channel "working track" and copy the Ambix recording into its first 9 channels (disabling the last three):
Now we re-enable the last three channels, and invoke the Sparta AmbiDec plugin as follows:
Note that the T-design(12) preset was used, and a 2nd-order SAD decoder without max_rE was specified at both low and high frequencies.
As expected, the resulting T-format signal makes use of all the 12 channels:
Now that the signal is in T-format (SPS), we can safely apply any non-linear effect, for example denoising.
We select a portion of the noise at the end, after the music stopped, and apply the Audition "Noise Reduction" effect:
After capturing the noise print (with the Capture Noise Print button), we click Select Entire File and then Apply. This way the whole recording is denoised, with this result:
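Audition's Noise Reduction algorithm is proprietary, so the sketch below is NOT what Audition does. It is a textbook magnitude spectral subtraction that follows the same two-step workflow: capture a noise print from a noise-only stretch, then subtract it from the whole file. In the T-format workflow it would be run independently on each of the 12 channels. Non-overlapping rectangular frames and a naive DFT keep it short and dependency-free. A practical implementation would use overlapping windowed frames and an FFT. All function names are hypothetical.

```python
import cmath
import math
import random
from typing import List


def dft(frame: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (acceptable for short frames)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec: List[complex]) -> List[float]:
    """Inverse of dft(); the tiny imaginary residue of a Hermitian spectrum is dropped."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n for t in range(n)]


def capture_noise_print(noise: List[float], frame_len: int) -> List[float]:
    """Average magnitude spectrum over the noise-only frames (the 'noise print')."""
    frames = [noise[i:i + frame_len] for i in range(0, len(noise) - frame_len + 1, frame_len)]
    mags = [[abs(c) for c in dft(f)] for f in frames]
    return [sum(m[k] for m in mags) / len(mags) for k in range(frame_len)]


def spectral_subtract(signal: List[float], noise_print: List[float], strength: float = 1.0) -> List[float]:
    """Subtract the noise print from each frame's magnitude spectrum, keeping the phase."""
    frame_len = len(noise_print)
    out: List[float] = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        cleaned = []
        for c, n in zip(dft(signal[i:i + frame_len]), noise_print):
            mag = abs(c)
            cleaned.append(c * max(mag - strength * n, 0.0) / mag if mag > 0.0 else 0j)
        out.extend(idft(cleaned))
    out.extend(signal[len(out):])  # a trailing partial frame is passed through untouched
    return out


def rms(x: List[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


rng = random.Random(1)
noise = [rng.gauss(0.0, 0.1) for _ in range(1024)]
profile = capture_noise_print(noise[:512], frame_len=64)  # like "Capture Noise Print"
cleaned = spectral_subtract(noise, profile)               # like "Select Entire File" + "Apply"
print("noise RMS before/after:", round(rms(noise), 4), round(rms(cleaned), 4))
```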
We must now convert back from T-format to B-format, which requires invoking the Sparta AmbiEnc plugin with the following settings:
Again note that the order was set to 2, and that the preset for T-design(12) was selected. The result is an Ambix 2nd-order (9 channels) output:
Note how the noise in the last part was properly removed, and that channel #7 is silent, as expected.
We now need to get rid of the 3 additional empty channels. We deselect them (by clicking the blue labels "10", "11" and "12" to the right of the waveforms) and use Edit - Copy to New to copy the first 9 channels into a new de-noised Ambix soundtrack, which can be saved with a proper name:
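The channel bookkeeping around the two plugins can be expressed as a few list operations. That means building the 12-channel working track before AmbiDec, checking that channel #7 is still silent, and dropping channels 10 to 12 after AmbiEnc. This is a hypothetical helper sketch in plain Python over frames of samples, not something Audition exposes. The 1e-6 silence threshold is an arbitrary assumption.

```python
from typing import List

Frame = List[float]  # one sample of every channel


def make_working_track(b_frames: List[Frame], n_total: int) -> List[Frame]:
    """Copy the B-format channels into the first slots of a wider track; the rest are silent."""
    n_b = len(b_frames[0])
    if n_total < n_b:
        raise ValueError("working track must have at least as many channels as the B-format")
    return [list(f) + [0.0] * (n_total - n_b) for f in b_frames]


def silent_channels(frames: List[Frame], tol: float = 1e-6) -> List[int]:
    """1-based indices (as labelled in Audition) of channels whose peak stays below tol."""
    return [c + 1 for c in range(len(frames[0])) if max(abs(f[c]) for f in frames) < tol]


def drop_extra_channels(frames: List[Frame], n_keep: int, tol: float = 1e-6) -> List[Frame]:
    """Keep the first n_keep channels, refusing to discard channels that are not silent."""
    extra = [c for c in silent_channels(frames, tol) if c > n_keep]
    if len(extra) != len(frames[0]) - n_keep:
        raise ValueError("the channels to be dropped are not silent")
    return [f[:n_keep] for f in frames]
```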
The same procedure can be followed for compression, limiting, de-hissing, dereverberation, adaptive click/pop elimination, and similar effects.
Audition CC comes with dozens of high-quality non-linear effects, all operating perfectly on multichannel tracks of up to 32 channels.
These effects are very dangerous if applied to B-format tracks, as they can alter the delicate gain/phase balance among the spherical harmonics signals, entirely disrupting the spatial information.
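A tiny first-order example shows the danger. It assumes N3D scaling (ACN order W, Y, Z, X) and a source in the horizontal plane, and it estimates the apparent azimuth as atan2(Y, X). The limiter is reduced to a static hard clip of one B-format sample at a threshold of 0.9. Real limiters act on a smoothed envelope over time, so this is only a simplification. Because X and Y are clipped by different amounts, their ratio changes and so does the apparent direction.

```python
import math


def encode_fo_n3d(azimuth_deg: float, elevation_deg: float = 0.0, amplitude: float = 1.0):
    """First-order N3D B-format (ACN order: W, Y, Z, X) of a plane wave."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    x, y, z = math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el)
    r3 = math.sqrt(3.0)
    return [amplitude, amplitude * r3 * y, amplitude * r3 * z, amplitude * r3 * x]


def hard_limit(values, threshold):
    """Static per-channel hard clip (a crude stand-in for a limiter)."""
    return [max(-threshold, min(threshold, v)) for v in values]


def apparent_azimuth_deg(b):
    """Horizontal direction estimated from the first-order Y and X channels."""
    _, y, _, x = b
    return math.degrees(math.atan2(y, x))


b = encode_fo_n3d(30.0)
limited = hard_limit(b, 0.9)  # limiter applied directly to B-format
print("azimuth before/after limiting:", round(apparent_azimuth_deg(b), 1), round(apparent_azimuth_deg(limited), 1))
```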
T-format, instead, is very robust to these artefacts. Any modification of the balance among its channels simply makes the sound coming from some directions more or less attenuated or amplified, which is probably what is wanted.
Consider, for example, a recording with an intrusive noisy fan. The de-noise effect will significantly reduce the gain only of the SPS microphone pointing toward the fan, leaving all other sound sources unaffected.
All the contents are Copyright by Angelo Farina, 2020