Sound Field Reconstruction
3D Immersive Audio with Six Degrees of Freedom
Algorithmic Techniques
The perception of a live music concert relies not only on the tonal qualities of the sound but also on its spatial characteristics, including the placement of the different instruments and the directions from which they are heard. Spatial audio encompasses a variety of techniques aimed at recreating a recorded sound environment in a way that preserves these spatial features for the listener. It has numerous applications, especially in virtual and extended reality scenarios. Within the framework of the REPERTORIUM project, it is essential for both the real-time and offline reproduction of recorded or live concerts.
When developing a spatial audio application, the first consideration is the choice of appropriate techniques and hardware for recording, processing, and reproducing the sound environment (i.e., the concert). Many spatial audio methods rely on fundamental sound field representations, such as the plane wave decomposition or the spherical harmonic decomposition. These representations prove particularly valuable when employing higher-order microphone arrays capable of capturing the spatial characteristics of a sound scene.
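As an illustration, the sketch below encodes one snapshot of microphone signals into third-order spherical harmonic coefficients via least squares. The 32-capsule open array, its randomised capsule directions, and the omission of the radial (rigid-sphere) equalisation filters are simplifying assumptions made for the example.

```python
# Minimal sketch: encoding signals from a spherical microphone array
# into real spherical harmonic (ambisonic) coefficients.
import numpy as np
from scipy.special import sph_harm

def real_sh(order, azi, col):
    """Real spherical harmonics for directions (azimuth azi, colatitude
    col), returned as an array of shape (len(azi), (order + 1)**2)."""
    Y = np.zeros((len(azi), (order + 1) ** 2))
    for n in range(order + 1):
        for m in range(-n, n + 1):
            Ynm = sph_harm(abs(m), n, azi, col)  # complex SH of degree n
            if m < 0:
                y = np.sqrt(2) * (-1) ** m * Ynm.imag
            elif m > 0:
                y = np.sqrt(2) * (-1) ** m * Ynm.real
            else:
                y = Ynm.real
            Y[:, n * (n + 1) + m] = y            # ACN channel ordering
    return Y

# Hypothetical capsule directions of a 32-channel spherical array.
rng = np.random.default_rng(0)
azi = rng.uniform(0.0, 2.0 * np.pi, 32)
col = np.arccos(rng.uniform(-1.0, 1.0, 32))

Y = real_sh(3, azi, col)                 # 32 x 16 encoding matrix
p = rng.standard_normal(32)              # one snapshot of capsule signals
coeffs = np.linalg.lstsq(Y, p, rcond=None)[0]  # SH-domain representation
```

In practice, the least-squares encoding is preceded by radial filters that compensate for the scattering of the array body; they are omitted here for brevity.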
Once a suitable sound field representation is established, the sound field can be faithfully replicated using a secondary source system. This project concentrates on binaural rendering through headphones, in which signals are delivered directly into the listener’s ear canals using an acoustic model of the human head based on the listener’s Head-Related Transfer Function (HRTF). Headphone reproduction also allows for head tracking, which preserves a realistic experience of the sound environment even as the listener’s head moves; in fact, this feature enables three-degrees-of-freedom (3DoF) navigation within the rendered sound field.
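A minimal sketch of the head-tracking principle for a single virtual source is given below: the tracked head yaw is subtracted from the source azimuth before selecting the HRIR pair used for convolution. The `hrirs` dictionary (azimuth in degrees mapped to left/right head-related impulse responses, e.g., loaded from a SOFA file) and the nearest-neighbour selection are assumptions made for illustration; practical renderers interpolate between measured HRIRs.

```python
# Minimal sketch of head-tracked binaural rendering for one source.
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(x, src_azi_deg, head_yaw_deg, hrirs):
    """x: mono source signal; hrirs: hypothetical dict mapping azimuth
    in degrees -> (left, right) impulse responses. Returns a 2 x N array."""
    rel = (src_azi_deg - head_yaw_deg) % 360.0  # direction in head coordinates
    # Pick the measured direction closest to the relative azimuth.
    azi = min(hrirs, key=lambda a: min(abs(a - rel), 360.0 - abs(a - rel)))
    h_left, h_right = hrirs[azi]
    return np.stack([fftconvolve(x, h_left), fftconvolve(x, h_right)])
```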
Moreover, when addressing the task of online spatial rendering of concerts, the acquisition and processing of the microphone array signals must be carried out in real time. Applications must operate with minimal latency, enabling the live streaming of concerts recorded through microphone arrays placed at various positions within the performance space. Consequently, solutions must be efficient and lightweight while maintaining a high standard of audio quality.
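In practice, the latency constraint translates into block-wise linear filtering. The sketch below shows the basic overlap-add pattern on which such a real-time chain can be built; the block size and the uniform (non-partitioned) convolution are simplifying assumptions.

```python
# Minimal sketch of block-wise overlap-add FIR filtering for streaming.
import numpy as np

def stream_filter(blocks, h, block_size=256):
    """Filter an iterable of length-`block_size` input blocks with the
    FIR filter `h`, yielding output blocks of the same length."""
    tail = np.zeros(len(h) - 1)            # convolution tail carried over
    for x in blocks:
        y = np.convolve(x, h)              # length block_size + len(h) - 1
        y[: len(tail)] += tail             # overlap-add the previous tail
        tail = y[block_size:].copy()       # remainder for the next block
        yield y[:block_size]
```

The algorithmic latency is set by the block size, so shorter blocks reduce latency at the price of more frequent processing calls.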
Data-Driven Techniques
In recent years, spatial audio applications have developed rapidly thanks to the availability of devices equipped with spatial distributions of microphones organised in arrays. Typically, 3D audio is captured by means of spherical microphone arrays referred to as Higher-Order Microphones (HOMs). However, reproduction is customarily limited to a fixed location corresponding to the recording position of the HOM. This creates a mismatch with the video modality, for which navigation technologies are well established in the literature, and it has raised the need for solutions providing six-degrees-of-freedom (6DoF) spatial audio [1,2].
6DoF spatial audio allows a user to navigate a recorded sound scene, defining a so-called “walkthrough” application. When considering the practical implementation of sound field navigation, an important aspect is the computational cost of the adopted sound field reconstruction algorithm. For real-time streaming of recorded sound, a straightforward solution is the linear interpolation of the signals acquired by a distribution of HOMs, since such interpolation can be implemented through linear filtering at a limited computational cost [3] (see the sketch below). Unfortunately, interpolation strategies are known to suffer from spectral colouration and localisation errors [4]. Parametric approaches represent an appealing alternative thanks to their improved performance with respect to plain interpolation strategies [5]. For after-concert experiences, more accurate reconstruction can be achieved at the cost of increased computational complexity; here, accurate solutions based on non-parametric models have been proposed [6]. A third family of sound field reconstruction techniques has recently emerged: deep learning paradigms have proved effective for the reconstruction of acoustic fields [7]. Although promising, the adoption of such data-driven solutions for real-life recordings has not yet been investigated.
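For reference, a minimal sketch of such an interpolation baseline: the ambisonic signals captured by several HOMs are mixed with weights that decay with the distance between the listener and each array. The inverse-distance weighting law and the omission of any time alignment or re-encoding are simplifying assumptions.

```python
# Minimal sketch of distance-weighted interpolation of HOM signals.
import numpy as np

def interpolate_hom(signals, mic_pos, listener_pos, eps=1e-6):
    """signals: (num_arrays, channels, samples) ambisonic captures;
    mic_pos: (num_arrays, 3) array positions; listener_pos: (3,).
    Returns the (channels, samples) weighted mix at the listener."""
    d = np.linalg.norm(mic_pos - listener_pos, axis=1)  # listener-array distances
    w = 1.0 / (d + eps)                                 # inverse-distance weights
    w /= w.sum()                                        # preserve overall gain
    return np.tensordot(w, signals, axes=1)
```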
In REPERTORIUM, the research will first focus on the analysis of optimal microphone distributions and on the effectiveness of parametric and non-parametric state-of-the-art sound field reconstruction techniques. Parametric methods describe the sound field through a compact signal model typically composed of just a few parameters, e.g., the source location or the direct and reverberant signal components. Their low computational cost and flexibility make this class of techniques an appealing solution for 6DoF sound field navigation, suitable for online streaming [8]. Unlike parametric methods, non-parametric techniques rely on solutions of the wave equation (e.g., spherical waves, plane waves) to describe the acoustic field captured by the sensors. Although accurate, such methodologies suffer from high computational costs. Recently, this problem has been mitigated using compressive sensing, which shifts the cumbersome computation to a preliminary analysis stage; in the reconstruction, a sparse set of sources is employed, with a relevant advantage in terms of computational cost [9].
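A toy example of the compressive sensing idea is sketched below, assuming a free-field model at a single frequency: the pressure at the microphones is expressed as a sparse combination of candidate point sources, whose weights are estimated with a basic iterative shrinkage-thresholding (ISTA) solver. Once the sparse weights are known, reconstructing the field at any listening position reduces to a cheap matrix-vector product. The dictionary, the solver, and all parameters are illustrative assumptions, not the specific method of [9].

```python
# Toy sparse (compressive sensing) sound field reconstruction sketch.
import numpy as np

def greens(rcv, src, k):
    """Free-field Green's functions from candidate sources `src` (S, 3)
    to receivers `rcv` (M, 3) at wavenumber k; returns an (M, S) matrix."""
    r = np.linalg.norm(rcv[:, None] - src[None], axis=-1)
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

def ista(D, p, lam=0.1, n_iter=200):
    """Solve min_x 0.5 * ||D x - p||^2 + lam * ||x||_1 for sparse x."""
    L = np.linalg.norm(D, 2) ** 2               # Lipschitz constant of gradient
    x = np.zeros(D.shape[1], dtype=complex)
    for _ in range(n_iter):
        g = x - D.conj().T @ (D @ x - p) / L    # gradient step
        x = np.exp(1j * np.angle(g)) * np.maximum(np.abs(g) - lam / L, 0.0)
    return x

# Usage: x = ista(greens(mic_pos, candidate_pos, k), pressure)
# Reconstruction at new points: greens(new_pos, candidate_pos, k) @ x
```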
In a second stage, the research effort will be placed on the development of novel hybrid sound field reconstruction solutions using HOMs and linear arrays that are flexible in terms of computational cost and hardware requirements, including data-driven methodologies [10]. Finally, techniques based on beamforming and post-filtering of the Higher-Order Ambisonics (HOA) signals will be developed to obtain the acoustic “minus-one”, i.e., a rendering of the ensemble with a selected instrument removed.
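As a sketch of the beamforming building block, the function below steers a first-order beam from B-format signals; subtracting a suitably post-filtered beam aimed at one instrument from the full mix is one conceivable route to a “minus-one” rendering. The ACN channel ordering (W, Y, Z, X), the SN3D normalisation, and the cardioid weighting are assumptions made for the example; the actual REPERTORIUM processing will operate on higher-order signals.

```python
# Minimal sketch of a steerable first-order ambisonic beamformer.
import numpy as np

def foa_beam(b, azi, col, a=0.5):
    """b: (4, samples) B-format signals in ACN order [W, Y, Z, X] with
    SN3D normalisation; steer towards (azimuth azi, colatitude col).
    a = 0.5 yields a cardioid pattern towards the target direction."""
    u = np.array([np.sin(col) * np.sin(azi),    # Y component
                  np.cos(col),                  # Z component
                  np.sin(col) * np.cos(azi)])   # X component
    return a * b[0] + (1.0 - a) * (u @ b[1:])
```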
References
[1] T. Ciotucha, A. Ruminski, T. Zernicki, and B. Mróz, “Evaluation of Six Degrees of Freedom 3D Audio Orchestra Recording and Playback using multi-point Ambisonics interpolation,” AES Convention Paper 10459, May 2021.
[2] E. Patricio, M. Skrok, and T. Zernicki, “Recording and Mixing of Classical Music Using Non-Adjacent Spherical Microphone Arrays and Audio Source Separation Algorithms,” AES Engineering Brief 525, October 2019.
[3] E. Patricio, M. Skrok, and T. Zernicki, “Recording and Mixing of Classical Music Using Non-Adjacent Spherical Microphone Arrays and Audio Source Separation Algorithms,” AES Engineering Brief 525, October 2019.
[4] J. G. Tylka and E. Y. Choueiri, “Soundfield Navigation using an Array of Higher-Order Ambisonics Microphones,” in Proc. AES International Conference on Audio for Virtual and Augmented Reality, September 2016.
[5] J. G. Tylka and E. Y. Choueiri, “Fundamentals of a Parametric Method for Virtual Navigation Within an Array of Ambisonics Microphones,” J. Audio Eng. Soc., March 2020.
[6] S. Koyama and L. Daudet, “Sparse representation of a spatial sound field in a reverberant environment,” IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 1, pp. 172–184, 2019.
[7] F. Lluis, P. Martinez-Nuevo, M. Bo Møller, and S. Ewan Shepstone, “Sound field reconstruction in rooms: Inpainting meets super-resolution,” The Journal of the Acoustical Society of America, vol. 148, no. 2, pp. 649–659, 2020.
[8] M. Pezzoli, F. Borra, F. Antonacci, S. Tubaro, and A. Sarti, “A parametric approach to virtual miking for sources of arbitrary directivity,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2333–2348, 2020.
[9] M. Pezzoli, M. Cobos, F. Antonacci, and A. Sarti, “Sparsity-Based Sound Field Separation in the Spherical Harmonics Domain,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, May 2022.
[10] M. Pezzoli, D. Perini, A. Bernardini, F. Borra, F. Antonacci, and A. Sarti, “Deep Prior Approach for Room Impulse Response Reconstruction,” Sensors, vol. 22, no. 7, p. 2710, 2022.