If you are not a programmer do not be scared to read it as it is also understandable for non-programmers!
So thank you so much Andreas for taking the time to write and share with us all!
Latency in audio applications is probably one of the most discussed and also one of the most annoying issues on the Android platform. Understanding and handling latency the right way can be a mighty jungle, especially if you’re a “normal” developer, and not a scientist.
This article is focused on output latency on Android devices, not input or round-trip latency. Hopefully someday I’ll be able to write about input latency as well, but so far input and round-trip latency have not been an issue in my applications. So the term latency in this article always means output latency. Also please forgive me if I leave out some scientific details. It is neither my goal nor am I able to write a scientific paper about latency on Android. What you read is my personal experience with the different aspects of output latency on the Android platform.
Output Latency, what is it?
In short, output latency is the time from the moment you press a button or a piano key until you hear the sound from the speakers. And, output latency in audio applications is something we all want to get rid of.
The complete output latency in a musical application, which includes live playing, is a combination of the following 3 main factors:
1. Control Input Latency (e.g. display reaction time)
2. Application Latency (everything that happens in the app layer)
3. Audio System Latency (everything that happens in the system layer)
Control Input Latency (e.g. display reaction time)
The Control Input Latency is the time from the moment you touch the screen (or an external MIDI keyboard) until the audio system gets notified by the Android OS to do something. It is influenced by various factors, which strongly depend on your device and Android version. It can vary from a few milliseconds up to 300ms or even more. The Control Input Latency is under full control of the Android OS and the underlying hardware. There’s no way to optimize or measure it from inside an app. But you can get rid of a good part of it by using a MIDI controller/keyboard. The reaction time of an external MIDI keyboard is usually around 30-40ms faster than the on-screen controls. This may surprise you, but the undisputed king regarding display reaction time is still the Google/Samsung Galaxy Nexus (2011).
Audio Output Latency (everything after the Control Input Latency)
The Audio Output Latency is the time from the moment when an application starts to play a sound until you hear it from the speakers. The Audio Output Latency is hardware, operating system and app dependent. A good part of it can be optimized from inside the app (as long as the hardware and operating system allows it). The Audio Output Latency can vary from ~35ms up to over 250ms. Sure, there are apps that report latencies down to 10ms, but this is not the complete thing (more about this later).
Application Latency (everything that happens in the app layer)
“Application Latency” is not an official term. I call it that because it happens in the main application, the audio app. It means the time from the moment when an application starts to play a sound (technically when it starts to fill an audio buffer) until that buffer is passed (enqueued) to the underlying audio system (AudioTrack or OpenSLES). This part is under direct control of the audio application. It depends on the defined audio system main buffer size and the app internal buffering.
AudioTrack is the out of the box system, which guarantees to run stable on every Android device.
It was not designed for real-time audio applications, but since it’s the one and only ready-to-use system, it is used in most audio apps. AudioTrack has a device dependent minBufferSize which can be obtained by invoking AudioTrack.getMinBufferSize(). In short, AudioTrack has full control over the minBufferSize as well as over the way the buffers are handled (once a buffer is passed to the AudioTrack system). The lowest minBufferSize ever reported by AudioTrack comes from the Google/Samsung Galaxy Nexus (2011) and corresponds to an application latency of 39ms at a sample rate of 44100Hz. On modern non-Nexus devices, minBufferSizes corresponding to around 80ms are more likely. Using smaller buffers with AudioTrack than the reported minBufferSize usually results in an initialization error.
The native OpenSLES system on the other hand allows more control. The buffer size as well as the way the buffers are handled are the responsibility of the app developer. OpenSLES allows smaller buffers than AudioTrack, of course only as long as a device can handle it. The smallest OpenSLES buffer size that works well in the G-Stomper environment corresponds to an application latency of 10ms on Android 5.x and 20ms on Android 4.4 (both with a Nexus 9).
The application latency can be calculated with a simple formula:
AudioTrack buffer latency (ms) = audioTrackByteBufferSize * 1000 / sampleRateHz / bytesPerSample / numChannels
Internal buffer latency (ms) = internalFloatBufferSize * 1000 / sampleRateHz
Now take the max of these two values and you have the Application Latency.
On the Android platform, this value can vary from ~10ms up to ~200ms.
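The two formulas and the final max can be written down as a small Java helper. This is only a sketch of the arithmetic, not code from G-Stomper: the class and method names are made up for illustration, and I assume that the internal float buffer size is counted in frames (samples per channel), because the formula does not divide it by the channel count.

```java
// Minimal sketch of the Application Latency arithmetic.
// Class and method names are hypothetical, chosen for this example only.
final class ApplicationLatency {

    // Latency (ms) represented by the AudioTrack byte buffer.
    static double audioTrackBufferMs(int audioTrackByteBufferSize, int sampleRateHz,
                                     int bytesPerSample, int numChannels) {
        return audioTrackByteBufferSize * 1000.0 / sampleRateHz / bytesPerSample / numChannels;
    }

    // Latency (ms) represented by the app internal float buffer.
    // Assumption: the size is given in frames (samples per channel).
    static double internalBufferMs(int internalFloatBufferSize, int sampleRateHz) {
        return internalFloatBufferSize * 1000.0 / sampleRateHz;
    }

    // The Application Latency is the larger of the two values.
    static double applicationLatencyMs(int audioTrackByteBufferSize, int internalFloatBufferSize,
                                       int sampleRateHz, int bytesPerSample, int numChannels) {
        return Math.max(
                audioTrackBufferMs(audioTrackByteBufferSize, sampleRateHz, bytesPerSample, numChannels),
                internalBufferMs(internalFloatBufferSize, sampleRateHz));
    }

    public static void main(String[] args) {
        // Example values only: 7056 bytes, 16-bit stereo at 44100Hz, 1024 internal frames.
        System.out.println(applicationLatencyMs(7056, 1024, 44100, 2, 2) + " ms");
    }
}
```

With the example values, the AudioTrack buffer works out to 40ms and the internal buffer to roughly 23ms, so the AudioTrack buffer determines the Application Latency.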
Audio System Latency (everything that happens in the system layer)
One of the biggest mistakes regarding output latency is that most apps report only the Application Latency. This of course looks nice (e.g. Nexus 7 2013/AudioTrack: 40ms), but it is only half the truth.
Passing a buffer to AudioTrack, for example, only means that the buffer was enqueued to the AudioTrack internal buffer queue. You never know exactly how much time will pass before the buffer actually comes out as sound from the speakers. The time from the moment when a buffer is passed to the audio system until you actually hear it from the speakers is what I call the “Audio System Latency”.
The Audio System Latency comes in addition to the Application Latency and strongly depends on the audio system internal buffer pipeline (buffer queue, resampling, D/A conversion, etc.). Regarding low latency, this is the most significant part of the latency chain, which reveals the obvious problem of AudioTrack. With AudioTrack, you don’t have any control over its internal buffer pipeline, and there’s no way to force a buffer to pass through it more quickly. What you can do is to prepare the buffers as final as possible, e.g. do the resampling in the audio application and always pass the buffers at the system’s native sample rate. Unfortunately this does not change the latency, but it avoids glitches due to Android internal resampling.
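To illustrate what “do the resampling in the audio application” can look like, here is a minimal sketch that converts a mono float buffer to a different sample rate with plain linear interpolation. Linear interpolation is my choice for brevity only; it is a simple, low-quality method and not necessarily what any particular app uses. The class and method names are hypothetical.

```java
// Minimal sketch: resample a mono float buffer with linear interpolation.
// Names are hypothetical; a production resampler would use a better filter.
final class LinearResampler {

    static float[] resample(float[] input, int sourceRateHz, int targetRateHz) {
        if (sourceRateHz <= 0 || targetRateHz <= 0) {
            throw new IllegalArgumentException("sample rates must be positive");
        }
        if (input.length == 0) {
            return new float[0];
        }
        // Number of output samples covering the same duration as the input.
        int outLength = (int) ((long) input.length * targetRateHz / sourceRateHz);
        float[] output = new float[outLength];
        // Distance in input samples between two consecutive output samples.
        double step = (double) sourceRateHz / targetRateHz;
        for (int i = 0; i < outLength; i++) {
            double pos = i * step;
            int i0 = Math.min((int) pos, input.length - 1);
            int i1 = Math.min(i0 + 1, input.length - 1); // clamp at the last sample
            double frac = pos - i0;
            output[i] = (float) (input[i0] * (1.0 - frac) + input[i1] * frac);
        }
        return output;
    }

    public static void main(String[] args) {
        // Example: 10ms of silence at 44100Hz converted to an assumed native rate of 48000Hz.
        float[] nativeRate = resample(new float[441], 44100, 48000);
        System.out.println(nativeRate.length + " samples");
    }
}
```

The native rate of 48000Hz in the example is an assumption; on a real device it has to be queried from the system.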
I’ve measured Audio System Latencies of over two times more than the Application Latency. In other words, if the Application Latency is 80ms, it can easily be that the full output latency is more than 240ms, which is ridiculous for a real time application.
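The arithmetic behind that example is simple enough to sketch in a few lines of Java. The ratio between Audio System Latency and Application Latency is an input you have to measure yourself; the factor of 2 used here is just the example from the paragraph above, and the class and method names are made up for illustration.

```java
// Minimal sketch: Application Latency plus an Audio System Latency given as a
// multiple of the Application Latency. Names are hypothetical.
final class OutputLatencyEstimate {

    static double audioOutputLatencyMs(double applicationLatencyMs, double systemToApplicationRatio) {
        double audioSystemLatencyMs = applicationLatencyMs * systemToApplicationRatio;
        return applicationLatencyMs + audioSystemLatencyMs;
    }

    public static void main(String[] args) {
        // 80ms Application Latency with a system latency of two times that value.
        System.out.println(audioOutputLatencyMs(80.0, 2.0) + " ms");
    }
}
```

Note that this still excludes the Control Input Latency, which comes on top.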
What did Samsung do in their Professional Audio SDK to achieve such low latencies?
I’m no scientist, but it’s quite obvious that they reduced the audio pipeline (application to speaker) to a minimum, and they did a very good job with impressive results. Unfortunately the SDK is for Samsung devices only, but it’s certainly great pioneering work, and maybe it’ll motivate others to catch up. There’s a nice video presentation of the Samsung Professional Audio SDK on YouTube: https://www.youtube.com/watch?v=7r455edqQFM
For (supported) Samsung devices, it’s definitely a good thing to consider the integration of their SDK.
What can you do as an app developer to get a faster audio pipeline?
Go native! Using the native OpenSLES reduces the Audio System Latency significantly. Even if you work with the same buffer size as with AudioTrack, you’ll notice a big difference, especially on newer Android versions.
Using OpenSLES does not implicitly mean “low latency”, but it definitely allows lower latencies than AudioTrack, because all audio buffers are written directly to the audio hardware, without the AudioTrack API and Dalvik/ART runtime overhead. This means the audio pipeline is shorter and therefore faster.
“The Audio Programming Blog” provides good tutorials regarding the OpenSLES integration:
Also helpful is this article on GitHub:
The “Google I/O 2013 - High Performance Audio” presentation gives a good overview about low latency audio on Android in general.
Will the G-Stomper apps get OpenSLES support?
Yes, definitely. Actually, OpenSLES is already integrated as an experimental additional audio system. In the current version 4.0.4, it is exclusively available for Nexus devices. The upcoming version 4.0.5 will introduce OpenSLES to all 1 GHz quad-core (or faster) devices running Android 4.2 (or higher). The default will still be AudioTrack, but users with supported devices will get a notification and will be able to manually switch to OpenSLES in the G-Stomper setup (Setup dialog / Audio / Audio System / OpenSL).
How can the full output latency be measured?
Unfortunately there’s no proper way to automatically calculate the full output latency (Control Input Latency + Application Latency + Audio System Latency) from inside an app. The only way to get real numbers is to measure it.
There’s an article on android.com, which shows a way to measure the full audio output latency (Application Latency + Audio System Latency) using an oscilloscope and the device’s LED indicator:
But honestly, not everyone has that equipment.
Here’s a simple way to measure full output latency:
The only things you need are a microphone, a PC with a graphical audio editor installed, and an Android device. While recording on the PC, hold the microphone close to the screen, and tap some button or piano key on the Android screen that is supposed to play a sound. Be sure to tap the screen hard enough, so that the tap is audible. The microphone will record both the physical finger tap and the audio output. Then, in the audio editor on the PC, measure the gap between the two peaks (finger tap and audio output).
Be sure to make more than one recording and take the average of the measured times. Especially the display reaction time may vary over multiple taps.
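If your audio editor shows peak positions in samples rather than milliseconds, the conversion and the averaging can be sketched like this. The class and method names as well as the sample positions are made up for illustration; they are not real measurements.

```java
// Minimal sketch: convert peak positions (in samples) to a gap in ms and
// average several recordings. Names and numbers are hypothetical.
final class LatencyMeasurement {

    // Gap between the finger tap peak and the audio output peak, in ms.
    static double gapMs(long tapPeakSample, long soundPeakSample, int recordingSampleRateHz) {
        return (soundPeakSample - tapPeakSample) * 1000.0 / recordingSampleRateHz;
    }

    // Arithmetic mean of the measured gaps.
    static double averageMs(double[] gapsMs) {
        if (gapsMs.length == 0) {
            throw new IllegalArgumentException("at least one measurement is required");
        }
        double sum = 0.0;
        for (double gap : gapsMs) {
            sum += gap;
        }
        return sum / gapsMs.length;
    }

    public static void main(String[] args) {
        int rate = 44100; // sample rate of the PC recording
        double[] gaps = {
            gapMs(44100, 48510, rate),   // illustrative positions only
            gapMs(132300, 137151, rate),
            gapMs(220500, 224469, rate)
        };
        System.out.println(averageMs(gaps) + " ms");
    }
}
```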
There might also be a quite significant difference between the headphone jack (with an external speaker connected) and the device internal speakers. Using the device internal speakers may result in higher latencies because of the post processing, which is usually done for internal (low quality) speakers.
This is definitely not the most scientific and also not the most precise approach, but it’s precise enough to give you an idea of the real output latency. You’ll be surprised by the results.