
4. Spatial Audio: Immerse your app in spatial audio


Apple's spatial audio precisely places surround channels at fixed positions in space, so that users experience immersive surround sound that stays in place as they turn their heads or move their devices.


This is not just a traditional surround sound effect: the simulation treats the iOS device as a sound source fixed at a position in space.


In a way, spatial audio is more like the "localization" of sound.


It's a revolutionary technology that brings a cinematic surround sound listening experience to mobile devices.


So what I will share with you today is spatial audio: its related APIs, supported devices, audio formats, and Up-Mix (a technique that renders ordinary stereo as spatial audio).


Let me present these in four parts:



What is Spatial Audio?

Disadvantages of stereo

Before introducing spatial audio, let's take a look at ordinary stereo technology. Stereo has two disadvantages in terms of listening experience:


  1. Stereo generally uses two channels to construct a three-dimensional sound image, but because it lacks position information, the soundstage we perceive is extremely limited. Whether we use headphones or stereo speakers, we cannot hear sounds placed behind us, above us, and so on, so stereo makes it difficult to create a lifelike surround sound experience.

  2. When we wear headphones, the sound comes from the tiny speakers in the earpieces, and the position of those speakers is fixed relative to our head. When we move our head, the speakers move with it, so the perceived position of the sound never changes relative to us. This listening experience is called the "in-head experience", and it differs from how we hear sound in a theater or in real life, where the position of a sound source stays fixed even when we turn our heads.


Spatial audio

Spatial audio is a technology that delivers a surround sound listening experience. It is based on psychoacoustics and sound localization: by subtly varying the audio parameters received by the left and right ears, such as loudness, frequency, and timing, it creates an engaging virtual sound field. Using spatial audio, developers can provide a surround music experience like being at a concert, or build a video game with interactive scenes that gives players an immersive adventure.


Spatial audio out-of-head experience

Unlike the stereo experience, when using spatial audio the virtual sound field we perceive is static and does not move with our head. With headphones that support spatial audio, the inertial measurement units in the playback device measure the head's current motion to determine the listener's head pose, and the playback device then dynamically adjusts the audio rendering output to keep the sound field static as the pose changes. This is called the out-of-head experience: the position of the sound source feels fixed, which sounds more natural, realistic, vivid, and rich. The spatial audio effect is maintained even on a turning bus or in a banking airplane.



Spatial audio bandwidth adaptation


One of the problems that must be faced when providing a spatial audio service is that the user's bandwidth is limited and unstable. Audio resources that support surround sound generally have multiple channels, such as 5.1, 7.1, and Dolby Atmos, and spatial audio supports these multi-channel sources best. However, with more channels and higher bit rates, multi-channel audio files are much larger and consume more bandwidth during transmission. In a video service with limited bandwidth, the multi-channel audio also competes with the video for bandwidth, degrading picture quality and resulting in a poor audio-visual experience.


So is there any way to provide spatial audio effects without the file-size problem?


The good news is that Apple uses Up-Mix technology to render stereo sources into a 5.1-channel-like effect. Stereo sources are far smaller than multi-channel sources, and many applications may not have multi-channel audio at all, only ordinary stereo. With Up-Mix, an app can use its existing library of stereo sources to provide users with spatial audio, and according to Apple, the results after Up-Mix are outstanding.



Apple recommends using the HLS protocol for distributing audio resources. HLS is a streaming protocol with good bandwidth adaptability: it can deliver different versions of an audio resource to match different bandwidth conditions. Based on HLS and Up-Mix, developers can provide a bandwidth-adaptive spatial audio service:


  • When the user's bandwidth is good, multi-channel audio resources can be delivered directly to give the user the best listening experience. Apple natively supports spatial audio for multi-channel audio resources, so delivering them directly requires no changes at the software level.


  • When the user's bandwidth is insufficient for a high-quality listening experience, the audio is seamlessly downgraded to a stereo source, and spatial audio processing is provided through Up-Mix. If head tracking was active before the downgrade, it continues to work after the transition. When bandwidth recovers, full multi-channel spatial audio processing is restored as well.


Loudness equalization

To adapt to user bandwidth, the HLS protocol delivers different versions of an audio resource. The metadata of codecs such as AAC, xHE-AAC, and Dolby Digital Plus contains important loudness-equalization parameters. For example, dialnorm in Dolby Digital Plus determines the output volume of the decoder, and the DRC metadata in xHE-AAC dynamically adjusts the output amplitude, suppressing loud passages and moderately boosting quiet ones. Apple recommends providing appropriate parameter values when distributing audio resources so that loudness is balanced across the different versions and switching between them is seamless. See Apple's official documentation for more information.



API


Ultimately, whether spatial audio is enabled depends on several factors:

  • Developers decide whether their audio content supports spatial audio

  • Whether the playback device supports spatial audio

  • Whether the user has enabled spatial audio in Control Center or in the Bluetooth settings

Apple provides several APIs to set, query, or listen to these parts.


Set the spatial audio resource formats allowed by the app

As I mentioned before, spatial audio natively supports multi-channel audio and can also support stereo audio through Up-Mix. Developers can configure the spatial audio resource formats allowed by their app through the allowedAudioSpatializationFormats property on AVPlayerItem or AVSampleBufferAudioRenderer.


The property takes an AVAudioSpatializationFormats option set, whose values are:

  • monoAndStereo: allow spatialization of mono and stereo content (via Up-Mix)

  • multichannel: allow spatialization of multi-channel content

  • monoStereoAndMultichannel: allow spatialization of mono, stereo, and multi-channel content
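
A minimal sketch of how this could be set (assuming playerItem is an AVPlayerItem your app has already created):

    import AVFoundation

    // `playerItem` is a hypothetical, previously created AVPlayerItem.
    // Allow the system to spatialize mono/stereo sources (via Up-Mix)
    // as well as multi-channel sources for this item.
    playerItem.allowedAudioSpatializationFormats = .monoStereoAndMultichannel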


Declare spatial audio support

In AVAudioSession, developers can declare to the system that their application supports multi-channel spatial audio through setSupportsMultichannelContent. If the user has not enabled the spatial audio option in Control Center or Bluetooth settings, the system will display a prompt when multi-channel content is played; at present, Apple has not specified what form this prompt takes. If your app uses AVPlayer, the system manages this prompt for you.
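
A minimal sketch of this declaration on the shared audio session might look like the following:

    import AVFoundation

    let session = AVAudioSession.sharedInstance()
    do {
        // Declare that this app can play multi-channel, spatialized content.
        try session.setSupportsMultichannelContent(true)
        try session.setActive(true)
    } catch {
        print("Failed to configure the audio session: \(error)")
    }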


Check if spatial audio is enabled and monitor changes

Use the isSpatialAudioEnabled property to check whether the current audio route supports spatial audio:
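
For example, a simple check over the outputs of the current route (a sketch; isSpatialAudioEnabled is a property of AVAudioSessionPortDescription):

    import AVFoundation

    // Spatial audio is in effect if any output port of the current route reports it enabled.
    let isSpatialAudioOn = AVAudioSession.sharedInstance()
        .currentRoute.outputs
        .contains { $0.isSpatialAudioEnabled }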


When the audio route changes, the audio session sends an AVAudioSession.routeChangeNotification. We can observe this notification, check the isSpatialAudioEnabled property again, and respond accordingly:
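
A sketch of such an observer, reusing the route check from above:

    import AVFoundation

    NotificationCenter.default.addObserver(
        forName: AVAudioSession.routeChangeNotification,
        object: nil,
        queue: .main
    ) { _ in
        let spatialNow = AVAudioSession.sharedInstance()
            .currentRoute.outputs
            .contains { $0.isSpatialAudioEnabled }
        // React here, e.g. switch between multi-channel and stereo variants.
        print("Spatial audio enabled on current route: \(spatialNow)")
    }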


When the user changes the spatial audio preference in Control Center or Bluetooth settings, AVAudioSession sends a spatialPlaybackCapabilitiesChangedNotification. We can read the spatial-audio-enabled status carried by the notification through the AVAudioSessionSpatialAudioEnabledKey:
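
A sketch of reading that key from the notification's userInfo:

    import AVFoundation

    NotificationCenter.default.addObserver(
        forName: AVAudioSession.spatialPlaybackCapabilitiesChangedNotification,
        object: nil,
        queue: .main
    ) { notification in
        if let enabled = notification.userInfo?[AVAudioSessionSpatialAudioEnabledKey] as? Bool {
            // The user toggled spatial audio in Control Center or Bluetooth settings.
            print("Spatial audio preference changed, now enabled: \(enabled)")
        }
    }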



Feature Availability


Let's take a look at which devices support spatial audio from both a hardware and operating system perspective.


Playback devices with spatial audio playback capabilities include:
  • Built-in speakers: 2018 and later MacBook, iPhone, and iPad Pro models

  • Built-in speakers: 2020 iPad Pro

  • Built-in speakers: 2021 iMac

  • With AirPods Pro or AirPods Max, you can experience spatial audio on devices whose built-in speakers do not support spatial audio playback, for example 2016 and later iPhone and iPad models

  • AirPods Pro and AirPods Max

Note that to enable the spatial audio capabilities of AirPods Pro/AirPods Max, the devices they are connected to must run at least macOS Big Sur, iOS 14, or iPadOS 14.


One more thing to be aware of: at WWDC, Apple did not place Apple Silicon Macs in the category of built-in speakers that support spatial audio, but in the category that requires AirPods Pro/AirPods Max, with a note that spatial audio is supported only for audio accompanying video, that is, audiovisual content.


The availability of spatial audio in terms of operating systems is as follows:

Starting with macOS Catalina, iOS 13, and iPadOS 13, the 2018 and later MacBook, iPhone, and iPad Pro models support spatial audio through their built-in speakers for the following playback methods:

  • AVPlayerItem

  • Any HTTP source specified via the WebKit <video> tag

By default, available multi-channel audio is selected for spatialization.


In macOS Big Sur, iOS 14, and iPadOS 14, Apple added support for AirPods Pro and AirPods Max, which bring spatial audio to 2016 and later iPhone and iPad devices.


By default, available multi-channel audio is still what gets selected for spatialization.


Finally, in the latest macOS Monterey, iOS 15, iPadOS 15, and tvOS 15, Apple provides spatial audio support for the following playback methods:

  • AVPlayerItem

  • AVSampleBufferAudioRenderer

  • WebKit extension

  • W3C extension

  • MSE extension. Note that there is no dedicated API for MSE, but the availability of spatial audio support can be queried through the Media Capabilities API's AudioConfiguration dictionary.

By default, spatialization is applied to all mono, stereo, and multi-channel sources where available and possible. For audio-only playback with AVSampleBufferAudioRenderer, only multi-channel audio is spatialized by default.



Resources:


DRC Metadata


HLS Authoring Specification for Apple Devices


 
