
How Is Audio Transmitted In Multimedia?


In multimedia, audio is transmitted by converting analog sound waves into a digital format, compressing the data with a codec, and sending the data packets over a network using a streaming protocol. The end-user device then reverses this process to play the sound.

The digitization process

Sound is a continuous, analog wave that must be converted into a discrete digital format that a computer can understand. This process is handled by an Analog-to-Digital Converter (ADC).

  • Sampling: The ADC takes thousands of "snapshots," or samples, of the analog waveform every second. The sampling rate determines how many snapshots are taken. For example, CD-quality audio is typically sampled at 44.1 kHz, meaning 44,100 samples are taken per second. A higher sampling rate captures more detail and produces a more accurate representation of the original sound.
  • Quantization: Each sample's amplitude (the height of the wave) is measured and converted into a numerical value. The number of possible values is determined by the bit depth. A higher bit depth provides a wider dynamic range and greater precision. For instance, 16-bit audio offers 65,536 possible amplitude values per sample.
  • Creating digital data: The result is a stream of binary data (0s and 1s) that represents the audio signal. For stereo sound, two independent streams are created. (A short sketch of sampling and quantization follows this list.)
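To make the numbers above concrete, here is a minimal Python sketch (standard library only) that simulates both steps: it samples a 440 Hz sine wave at the CD rate, quantizes each sample to a signed 16-bit integer, and then computes the resulting uncompressed data rate. The constant and function names are illustrative, not part of any real ADC API.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (CD quality)
BIT_DEPTH = 16        # bits per sample
CHANNELS = 2          # stereo

def sample_sine(freq_hz: float, duration_s: float) -> list[int]:
    """Sample a sine wave and quantize each sample to a signed 16-bit value."""
    max_amplitude = 2 ** (BIT_DEPTH - 1) - 1  # 32,767 for 16-bit audio
    n_samples = int(SAMPLE_RATE * duration_s)
    return [
        round(max_amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(n_samples)
    ]

samples = sample_sine(440.0, 1.0)   # one second of the note A4
print(f"{len(samples):,} samples")  # 44,100

# Uncompressed (PCM) data rate: sampling rate x bit depth x channels.
bits_per_second = SAMPLE_RATE * BIT_DEPTH * CHANNELS
print(f"{bits_per_second / 1e6:.2f} Mbps")                    # 1.41 Mbps
print(f"{bits_per_second * 60 / 8 / 1e6:.1f} MB per minute")  # 10.6 MB
```

That last figure is why the next step exists: at roughly 10 MB per minute, a three-minute song would occupy over 30 MB uncompressed.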

Encoding and compression

Raw digital audio is large: as the sketch above shows, CD-quality stereo works out to about 1.4 Mbps, or roughly 10 MB per minute. To reduce storage and bandwidth requirements, especially for transmission over networks, the audio is compressed using a codec (coder/decoder); a short transcoding sketch follows the list below.

  • Lossy compression: This method permanently discards some audio data to achieve significant file size reduction. The discarded information is typically less perceptible to the human ear. It is used for most audio streaming and distribution.
    • MP3: One of the most common and widely supported lossy formats.
    • AAC (Advanced Audio Coding): Offers better audio quality than MP3 at similar bitrates and is the standard on platforms such as iTunes and YouTube, as well as in digital radio.
    • Opus: An open, royalty-free codec optimized for low-latency interactive applications like VoIP and video conferencing.
  • Lossless compression: This method reduces file size without discarding any data. The decoded audio is a perfect, bit-for-bit replica of the original.
    • FLAC (Free Lossless Audio Codec): A popular open format preferred by audiophiles for its pristine quality.
    • ALAC (Apple Lossless Audio Codec): A similar format commonly used in the Apple ecosystem.
  • Uncompressed: Some formats, like WAV and AIFF, contain uncompressed audio. These formats are used in professional audio production where fidelity is the top priority, but they result in very large file sizes.
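In practice, encoding is almost always delegated to an existing codec implementation rather than written by hand. The sketch below, which assumes the ffmpeg command-line tool is installed and uses hypothetical filenames, transcodes an uncompressed WAV file twice: once with a lossy codec (AAC at 128 kbps) and once with a lossless one (FLAC).

```python
import subprocess

SOURCE = "master.wav"  # hypothetical uncompressed PCM source file

jobs = [
    ("aac",  "128k", "distribution.m4a"),  # lossy: small, streaming-friendly
    ("flac", None,   "archive.flac"),      # lossless: bit-perfect, larger
]

for codec, bitrate, outfile in jobs:
    cmd = ["ffmpeg", "-y", "-i", SOURCE, "-c:a", codec]
    if bitrate is not None:
        cmd += ["-b:a", bitrate]  # a target bitrate only applies to the lossy job
    cmd.append(outfile)
    subprocess.run(cmd, check=True)
```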

Transmission over networks

The compressed digital audio data is then sent over a network using a specific streaming protocol. These protocols define how the data is packaged into "chunks" or packets, transmitted, and reassembled on the receiving end.

  • HTTP-based streaming: This is the standard method for most online audio and video streaming. It uses common web servers to deliver media.
    • HLS (HTTP Live Streaming): Developed by Apple, it segments audio (and video) into small files listed in a plain-text playlist and is highly compatible across devices and browsers (see the playlist sketch after this list).
    • MPEG-DASH (Dynamic Adaptive Streaming over HTTP): An international standard for adaptive streaming that works similarly to HLS but is codec-agnostic.
  • Real-time protocols: These are designed for low-latency, real-time communication.
    • WebRTC (Web Real-Time Communication): Enables direct, peer-to-peer audio and video communication within web browsers, perfect for video conferencing.
    • SRT (Secure Reliable Transport): An open-source protocol that ensures secure, low-latency, high-quality streaming even over unpredictable networks, often used in professional broadcasting.
  • Older protocols:
    • RTMP (Real-Time Messaging Protocol): Historically used for streaming with Adobe Flash, it is now primarily used for ingesting live streams into a media server before conversion to more modern protocols like HLS or DASH for delivery.
    • RTSP (Real-Time Streaming Protocol): Once widely used for media playback, it is now found mostly in IP cameras and other surveillance and CCTV systems.
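To show what "packaged into chunks" looks like in practice, here is a sketch that generates an HLS-style media playlist for a handful of pre-encoded segments. The segment filenames and durations are made up; a real packager (such as ffmpeg's hls muxer) would produce both the segment files and this index.

```python
import math

# Hypothetical, pre-encoded media segments: (filename, duration in seconds).
segments = [("seg000.ts", 6.0), ("seg001.ts", 6.0), ("seg002.ts", 4.2)]

lines = [
    "#EXTM3U",
    "#EXT-X-VERSION:3",
    # Target duration must be at least the longest segment, rounded up.
    f"#EXT-X-TARGETDURATION:{math.ceil(max(d for _, d in segments))}",
    "#EXT-X-MEDIA-SEQUENCE:0",
]
for name, duration in segments:
    lines.append(f"#EXTINF:{duration:.3f},")  # duration of the next segment
    lines.append(name)                        # URI the player fetches next
lines.append("#EXT-X-ENDLIST")                # marks a finished (on-demand) stream

print("\n".join(lines))
```

A player downloads this playlist, then fetches and plays the listed segments in order. For a live stream, the playlist is refreshed periodically and the #EXT-X-ENDLIST tag is omitted until the broadcast ends.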

Decoding and playback

When the audio data packets arrive at the end-user's device, the process is reversed.

  1. Buffering: Packets are temporarily stored in a buffer to absorb network fluctuations and ensure a constant playback rate (a minimal sketch follows this list).
  2. Decoding: The codec on the receiving end (the "decoder" part of the codec) decompresses the audio data back into its original digital form.
  3. Digital-to-analog conversion: A digital-to-analog converter (DAC) turns the digital audio data back into a continuous analog electrical signal.
  4. Playback: This electrical signal is sent to a speaker or headphones, which convert it into the audible sound waves you hear.
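Step 1 is worth a closer look, since it is what keeps playback smooth over a jittery network. Below is a toy jitter-buffer sketch, with an assumed class name and prefill policy, that refuses to start (or resume) playback until a minimum number of packets has accumulated.

```python
from collections import deque
from typing import Optional

class PlaybackBuffer:
    """Toy jitter buffer: hold packets until a minimum depth is reached,
    so short network stalls do not interrupt playback."""

    def __init__(self, prefill: int = 5):
        self._queue: deque = deque()
        self._prefill = prefill
        self._playing = False

    def receive(self, packet: bytes) -> None:
        self._queue.append(packet)  # packets arrive at the network's pace

    def next_packet(self) -> Optional[bytes]:
        # Hold off until enough packets are buffered to ride out jitter.
        if not self._playing:
            if len(self._queue) < self._prefill:
                return None           # still buffering
            self._playing = True
        if not self._queue:
            self._playing = False     # underrun: pause and rebuffer
            return None
        return self._queue.popleft()  # hand one packet to the decoder

buf = PlaybackBuffer(prefill=2)
buf.receive(b"packet-0")
print(buf.next_packet())  # None: buffer not yet full
buf.receive(b"packet-1")
print(buf.next_packet())  # b'packet-0': playback begins
```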