Until recently, technologies for coding audio signals, such as redundancy reduction and sophisticated source and receiver models, did not incorporate the spatial characteristics of the source and receiving ends. Spatial audio coding achieves much higher compression ratios than conventional coders by representing multi-channel audio signals as a downmix signal plus side information that describes the perceptually relevant spatial information.
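The downmix-plus-side-information idea can be illustrated with a minimal sketch: sum the channels into a mono downmix and, per time frame and frequency band, store the inter-channel level difference (ICLD) as side information. The function name `encode_spatial`, the uniform band split, and the frame size are illustrative assumptions; real coders covered in this book (BCC, MPEG Surround) use perceptual band partitions and additional cues such as ICTD and ICC.

```python
import numpy as np

def encode_spatial(left, right, n_bands=8, frame=1024):
    """Toy spatial-audio encoder: mono downmix + per-band inter-channel
    level differences (ICLD) as side information. Illustrative only."""
    downmix = 0.5 * (left + right)        # sum signal, sent to a core coder
    icld = []                             # level differences in dB
    for start in range(0, len(left) - frame + 1, frame):
        L = np.fft.rfft(left[start:start + frame])
        R = np.fft.rfft(right[start:start + frame])
        bands = np.array_split(np.arange(len(L)), n_bands)
        frame_cues = []
        for b in bands:
            pl = np.sum(np.abs(L[b]) ** 2) + 1e-12   # left band power
            pr = np.sum(np.abs(R[b]) ** 2) + 1e-12   # right band power
            frame_cues.append(10 * np.log10(pl / pr))
        icld.append(frame_cues)
    return downmix, np.array(icld)

# Example: right channel is the left at half amplitude,
# so every band shows an ICLD of about 6 dB (10*log10(4)).
rng = np.random.default_rng(0)
left = rng.standard_normal(4096)
right = 0.5 * left
downmix, icld = encode_spatial(left, right)
```

The decoder would invert this: distribute the downmix over the output channels with per-band gains derived from the transmitted ICLDs, which is why the side information can be coded with only a few kbit/s.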
Written by experts in spatial audio coding, Spatial Audio Processing:
- reviews psychoacoustics (the relationship between physical measures of sound and the corresponding percepts) and spatial audio sound formats and reproduction systems;
- brings together the processing, acquisition, mixing, playback, and perception of spatial audio, with the latest coding techniques;
- analyses algorithms for the efficient manipulation of multiple, discrete and combined spatial audio channels, including both MP3 Surround and MPEG Surround;
- shows how the same insights into source and receiver models can also be applied to the manipulation of audio signals, such as the synthesis of virtual auditory scenes employing head-related transfer function (HRTF) processing and stereo-to-N-channel audio upmixing.
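HRTF-based synthesis of a virtual source, mentioned in the last point, amounts to filtering a mono signal with the head-related impulse responses (HRIRs) for the desired direction. The sketch below assumes toy HRIRs (a pure delay and attenuation standing in for interaural time and level differences); the function name `render_binaural` and the HRIR values are illustrative, and a measured HRIR set would be used in practice.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Sketch of HRTF-based virtual source rendering: filter a mono
    source with the HRIRs of the desired direction for each ear."""
    return (np.convolve(mono, hrir_left),
            np.convolve(mono, hrir_right))

# Toy HRIRs for a source on the listener's left: the right ear receives
# the signal later (interaural time difference, here ~0.6 ms) and
# attenuated (interaural level difference, here 6 dB).
fs = 48000
itd_samples = int(0.0006 * fs)
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[itd_samples] = 0.5

mono = np.random.default_rng(1).standard_normal(2048)
left_ear, right_ear = render_binaural(mono, hrir_l, hrir_r)
```

Presented over headphones, the two ear signals evoke a source localized to the left; chapters 7 and 8 develop parameterized versions of exactly these binaural cues.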
Audio processing research engineers and audio coding research and implementation engineers will find this an insightful guide. Academic audio and psychoacoustics researchers, including postgraduate and third- and fourth-year students taking courses in signal processing, audio and speech processing, and telecommunications, will also benefit from the information inside.
1.1 The human auditory system.
1.2 Spatial audio reproduction.
1.3 Spatial audio coding.
1.4 Book outline.
2.2 Spatial audio playback systems.
2.2.1 Stereo audio loudspeaker playback.
2.2.2 Headphone audio playback.
2.2.3 Multi-channel audio playback.
2.3 Audio coding.
2.3.1 Audio signal representation.
2.3.2 Lossless audio coding.
2.3.3 Perceptual audio coding.
2.3.4 Parametric audio coding.
2.3.5 Combining perceptual and parametric audio coding.
2.4 Matrix surround.
3 Spatial Hearing.
3.2 Physiology of the human hearing system.
3.3 Spatial hearing basics.
3.3.1 Spatial hearing with one sound source.
3.3.2 Ear entrance signal properties and lateralization.
3.3.3 Sound source localization.
3.3.4 Two sound sources: summing localization.
3.3.5 Superposition of signals each evoking one auditory object.
3.4 Spatial hearing in rooms.
3.4.1 Source localization in the presence of reflections: the precedence effect.
3.4.2 Spatial impression.
3.5 Limitations of the human auditory system.
3.5.1 Just-noticeable differences in interaural cues.
3.5.2 Spectro-temporal decomposition.
3.5.3 Localization accuracy of single sources.
3.5.4 Localization accuracy of concurrent sources.
3.5.5 Localization accuracy when reflections are present.
3.6 Source localization in complex listening situations.
3.6.1 Cue selection model.
3.6.2 Simulation examples.
4 Spatial Audio Coding.
4.2 Related techniques.
4.2.1 Pseudostereophonic processes.
4.2.2 Intensity stereo coding.
4.3 Binaural Cue Coding (BCC).
4.3.1 Time-frequency processing.
4.3.2 Down-mixing to one channel.
4.3.3 Perceptually relevant differences between audio channels.
4.3.4 Estimation of spatial cues.
4.3.5 Synthesis of spatial cues.
4.4 Coding of low-frequency effects (LFE) audio channels.
4.5 Subjective performance.
4.6 Generalization to spatial audio coding.
5 Parametric Stereo.
5.1.1 Development and standardization.
5.1.2 aacPlus v2.
5.2 Interaction between core coder and spatial audio coding.
5.3 Relation to BCC.
5.4 Parametric stereo encoder.
5.4.1 Time/frequency decomposition.
5.4.2 Parameter extraction.
5.4.4 Parameter quantization and coding.
5.5 Parametric stereo decoder.
5.5.1 Analysis filterbank.
5.5.5 Synthesis filterbanks.
5.5.6 Parametric stereo in enhanced aacPlus.
6 MPEG Surround.
6.2 Spatial audio coding.
6.2.2 Elementary building blocks.
6.3 MPEG Surround encoder.
6.3.2 Pre- and post-gains.
6.3.3 Time-frequency decomposition.
6.3.4 Spatial encoder.
6.3.5 Parameter quantization and coding.
6.3.6 Coding of residual signals.
6.4 MPEG Surround decoder.
6.4.2 Spatial decoder.
6.4.3 Enhanced matrix mode.
6.5 Subjective evaluation.
6.5.1 Test 1: operation using spatial parameters.
6.5.2 Test 2: operation using enhanced matrix mode.
7 Binaural Cues for a Single Sound Source.
7.2 HRTF parameterization.
7.2.1 HRTF analysis.
7.2.2 HRTF synthesis.
7.3 Sound source position dependencies.
7.3.1 Experimental procedure.
7.3.2 Results and discussion.
7.4 HRTF set dependencies.
7.4.1 Experimental procedure.
7.4.2 Results and discussion.
7.5 Single ITD approximation.
7.5.2 Results and discussion.
8 Binaural Cues for Multiple Sound Sources.
8.2 Binaural parameters.
8.3 Binaural parameter analysis.
8.3.1 Binaural parameters for a single sound source.
8.3.2 Binaural parameters for multiple independent sound sources.
8.3.3 Binaural parameters for multiple sound sources with varying degrees of mutual correlation.
8.4 Binaural parameter synthesis.
8.4.1 Mono down-mix.
8.4.2 Extension towards stereo down-mixes.
8.5 Application to MPEG Surround.
8.5.1 Binaural decoding mode.
8.5.2 Binaural parameter synthesis.
8.5.3 Binaural encoding mode.
9 Audio Coding with Mixing Flexibility at the Decoder Side.
9.2 Motivation and details.
9.2.1 ICTD, ICLD and ICC of the mixer output.
9.3 Side information.
9.3.1 Reconstructing the sources.
9.4 Using spatial audio decoders as mixers.
9.5 Transcoding to MPEG Surround.
10 Multi-loudspeaker Playback of Stereo Signals.
10.2 Multi-channel stereo.
10.3 Spatial decomposition of stereo signals.
10.3.1 Estimating p_s,b, A_b and p_n,b.
10.3.2 Least-squares estimation of s_m, n_1,m and n_2,m.
10.3.4 Numerical examples.
10.4 Reproduction using different rendering setups.
10.4.1 Multiple loudspeakers in front of the listener.
10.4.2 Multiple front loudspeakers plus side loudspeakers.
10.4.3 Conventional 5.1 surround loudspeaker setup.
10.4.4 Wave field synthesis playback system.
10.4.5 Modifying the decomposed audio signals.
10.5 Subjective evaluation.
10.5.1 Subjects and playback setup.
10.5.3 Test method.
Frequently Used Terms, Abbreviations and Notation.
Terms and abbreviations.
Notation and variables.
Christof Faller received an MS (Ing) degree in electrical engineering from ETH Zurich, Switzerland, in 2000, and a PhD degree for his work on parametric multi-channel audio coding from EPFL, Switzerland, in 2004. From 2000 to 2004 he was with the Speech and Acoustics Research Department at Bell Laboratories, Lucent Technologies and Agere Systems (a Lucent company), where he worked on audio coding for digital satellite radio, including parametric multi-channel audio coding. He is currently a part-time postdoctoral employee at EPFL. In 2006 he founded Illusonic LLC, an audio and acoustics research company. Dr Faller has won a number of awards for his contributions to spatial audio coding, MP3 Surround, and MPEG Surround. His main current research interests are spatial hearing and spatial sound capture, processing, and reproduction.