Audio formats
Before modern Web audio programs appeared, file formats such as AU, AIFF, WAV and MIDI accounted for most of the sound heard on the Web. Many of the formats described below use some or all of three elements: compression schemes that reduce file size, a server that streams the content, and a player (or plug-in) that handles playback on the end user's computer.
Pseudo-streaming of files occurs when the file is cached to disk and can begin playing before the file has fully downloaded. True streaming, on the other hand, occurs when a part of the file is loaded into a buffer in the computer's memory and plays from there as it is streamed, without saving the file to the listener's computer at all.
The list of formats below is by no means an exhaustive survey of audio file formats used on the Internet. As with much else on the Web, there are many different solutions and competing technologies which have been developed to address the problems of file size and bandwidth.
AIFF and AIFF-C (.aif, .aiff, .aifc)
AIFF stands for Audio Interchange File Format and was developed by Apple for storing sound. The Macintosh OS includes support for playing and creating AIFF files. AIFF is a flexible file format, allowing the specification of arbitrary sampling rates, sample sizes, numbers of channels, and application-specific format chunks. AIFF-C is the variant of the format that also allows compressed sample data.
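AIFF keeps these parameters in its "COMM" (common) chunk, using big-endian integers and storing the sample rate as an 80-bit IEEE 754 extended-precision number. The sketch below decodes the first 18 bytes of a COMM chunk body. It assumes the chunk has already been located, ignores the extra compression fields that AIFF-C appends, and does not handle infinity or NaN; the function names are illustrative.

```python
import math
import struct


def decode_extended80(raw: bytes) -> float:
    """Decode a big-endian 80-bit IEEE 754 extended float (used for AIFF sample rates)."""
    if len(raw) != 10:
        raise ValueError("an 80-bit extended float is exactly 10 bytes")
    # 16 bits of sign + exponent, then a 64-bit mantissa with an explicit integer bit.
    sign_exp, mantissa = struct.unpack(">HQ", raw)
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exponent = sign_exp & 0x7FFF
    if exponent == 0 and mantissa == 0:
        return 0.0
    # value = mantissa * 2**(exponent - bias - 63), with bias 16383.
    # Infinity/NaN (exponent 0x7FFF) is not handled in this sketch.
    return sign * math.ldexp(mantissa, exponent - 16383 - 63)


def parse_comm_chunk(body: bytes) -> dict:
    """Read channels, frame count, bits per sample and sample rate from a COMM chunk body."""
    if len(body) < 18:
        raise ValueError("COMM chunk body must be at least 18 bytes")
    channels, frames, bits = struct.unpack(">hLh", body[:8])
    return {
        "channels": channels,
        "frames": frames,
        "bits": bits,
        "sample_rate": decode_extended80(body[8:18]),
    }
```

For example, the ten bytes `40 0E AC 44 00 00 00 00 00 00` decode to 44100.0, the CD sampling rate.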
Amiga IFF (.iff)
The Amiga Interchange File Format is used to transfer documents to and from Commodore Amiga computers.
Its sound files (the 8SVX form) store 8-bit samples.
Audio CD
CD audio tracks can be read directly from an audio CD and converted into sound files. Since CD audio data is 44.1 kHz, 16-bit stereo, the resulting files can be quite large.
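The size follows directly from the format parameters: sample rate × channels × bytes per sample × duration. A small sketch (the helper name is illustrative) with CD audio as the default:

```python
def pcm_size_bytes(seconds: float, sample_rate: int = 44100, bits: int = 16, channels: int = 2) -> int:
    """Size in bytes of uncompressed PCM audio (defaults match CD audio)."""
    frames = int(seconds * sample_rate)     # one frame = one sample per channel
    return frames * channels * bits // 8


# One second of CD audio: 44,100 frames x 2 channels x 2 bytes = 176,400 bytes.
print(pcm_size_bytes(1))    # 176400
print(pcm_size_bytes(60))   # 10584000, roughly 10 MB per minute
```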
AVR (.avr)
Created by Audio Visual Research, this is a popular sound file format on the 680x0-based Atari ST computers. It can contain data at any sampling rate, in mono or stereo, at 8 or 16 bits.
DVI ADPCM (.adpcm)
This is the Intel/DVI ADPCM (Adaptive Differential Pulse Code Modulation) format. It compresses 16-bit samples at a ratio of 4 to 1. It is unique among the various ADPCM formats in that it is very fast; like all ADPCM formats, it is lossy.
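The core of the DVI/IMA ADPCM algorithm is small enough to sketch. Each 16-bit sample becomes a 4-bit code describing the difference from a predicted value, which is where the 4-to-1 ratio comes from, and the step size adapts through two lookup tables. The tables below are the ones commonly reproduced from the IMA recommendation; block headers, nibble packing and channel interleaving differ between file formats and are left out, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

# Step-size table (89 entries) and index-adjustment table used by IMA/DVI ADPCM.
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8] * 2


def _step(code: int, predictor: int, index: int) -> Tuple[int, int]:
    """Apply one 4-bit code to the decoder state; return the new (predictor, index)."""
    step = STEP_TABLE[index]
    diff = step >> 3
    if code & 4:
        diff += step
    if code & 2:
        diff += step >> 1
    if code & 1:
        diff += step >> 2
    predictor = predictor - diff if code & 8 else predictor + diff
    predictor = max(-32768, min(32767, predictor))  # stay within the 16-bit range
    index = max(0, min(88, index + INDEX_TABLE[code]))
    return predictor, index


def encode(samples: Sequence[int], predictor: int = 0, index: int = 0) -> List[int]:
    """Encode 16-bit samples into 4-bit codes (values 0-15)."""
    codes = []
    for sample in samples:
        step = STEP_TABLE[index]
        delta = sample - predictor
        code = 0
        if delta < 0:
            code = 8          # sign bit
            delta = -delta
        if delta >= step:
            code |= 4
            delta -= step
        if delta >= step >> 1:
            code |= 2
            delta -= step >> 1
        if delta >= step >> 2:
            code |= 1
        # Track the decoder's state so encoder and decoder stay in sync.
        predictor, index = _step(code, predictor, index)
        codes.append(code)
    return codes


def decode(codes: Sequence[int], predictor: int = 0, index: int = 0) -> List[int]:
    """Decode 4-bit codes back into (approximate) 16-bit samples."""
    out = []
    for code in codes:
        predictor, index = _step(code, predictor, index)
        out.append(predictor)
    return out
```

Because the encoder runs the same state update as the decoder, both sides always agree on the predicted value; the loss comes only from quantizing each difference to four bits.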
GSM 6.10 (.gsm, .au.gsm)
This compression algorithm is the European GSM 06.10 standard for full-rate speech transcoding, prI-ETS 300 036, which uses RPE/LTP (regular pulse excitation/long-term prediction) coding at 13 kbit/s. It was developed for the European digital cellular phone system to make the most of limited bandwidth. It is optimized for speech reproduction: it analyzes short sections of speech and derives a mathematical description of each one using a model of the human vocal tract. It is used in many Internet phone applications. The ".au.gsm" format consists of a series of 33-byte frames of mono audio sampled at 8000 Hz.
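The numbers above fit together as follows, assuming each frame covers 160 samples (20 ms at 8000 Hz), the GSM 06.10 frame length. The codec produces 260 bits of coded speech per frame, so 50 frames per second give the 13 kbit/s figure; stored as 33 whole bytes (264 bits), a file runs at 13,200 bit/s, about a tenth of the size of the same speech as 16-bit PCM. The constant and function names are illustrative.

```python
FRAME_BYTES = 33          # size of one frame in a .gsm / .au.gsm file
SAMPLES_PER_FRAME = 160   # GSM 06.10 codes speech in 20 ms frames
SAMPLE_RATE = 8000        # mono, 8000 Hz


def gsm_file_stats(num_frames: int) -> dict:
    """Duration and sizes for a stream of GSM 06.10 frames."""
    samples = num_frames * SAMPLES_PER_FRAME
    return {
        "seconds": samples / SAMPLE_RATE,
        "gsm_bytes": num_frames * FRAME_BYTES,
        "pcm16_bytes": samples * 2,   # the same audio as 16-bit linear PCM
    }


frames_per_second = SAMPLE_RATE // SAMPLES_PER_FRAME   # 50
file_bit_rate = FRAME_BYTES * 8 * frames_per_second    # 13,200 bit/s
print(gsm_file_stats(50))
# {'seconds': 1.0, 'gsm_bytes': 1650, 'pcm16_bytes': 16000}
```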
IMA ADPCM
This is a cross-platform standard from the Interactive Multimedia Association for sound playback. The basic algorithm is the same as in DVI ADPCM, but Apple and Microsoft store the data in different ways. Both mono and stereo sounds are supported at an arbitrary sampling rate; however, the compression algorithm only accepts 16-bit samples.
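To make the Apple/Microsoft difference concrete, here is the block arithmetic for the two layouts as they are commonly documented. Apple's IMA4 is assumed to pack 64 samples per channel into 34-byte packets (a 2-byte header plus 32 bytes of 4-bit codes); Microsoft's IMA ADPCM WAV blocks are assumed to start with a 4-byte header per channel that also carries the first sample. Treat both layouts as assumptions to check against the format documentation; the names are illustrative.

```python
APPLE_IMA4_PACKET_BYTES = 34    # per channel: 2-byte header + 32 bytes of codes (assumed)
APPLE_IMA4_PACKET_SAMPLES = 64  # 32 bytes x 2 four-bit codes per byte


def ms_ima_samples_per_block(block_align: int, channels: int) -> int:
    """Samples per channel in one Microsoft IMA ADPCM block (assumed layout).

    Each channel's header holds one 16-bit sample plus the step index (4 bytes);
    the rest of the block holds 4-bit codes, two per byte.
    """
    code_bytes = block_align - 4 * channels
    return code_bytes * 2 // channels + 1   # +1 for the sample stored in the header


# Compression relative to 16-bit PCM (2 bytes per sample):
apple_ratio = (APPLE_IMA4_PACKET_SAMPLES * 2) / APPLE_IMA4_PACKET_BYTES   # about 3.8 : 1
ms_ratio = (ms_ima_samples_per_block(256, 1) * 2) / 256                   # about 3.9 : 1
```

Both ratios fall a little short of the nominal 4 to 1 because each block spends a few bytes on header state.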
MIDI (.mid, .midi, .kar)
Musical Instrument Digital Interface is primarily a standard for communication between musical instruments. A MIDI file stores a composition as the events that happened during the performance: it does not contain digitized audio data, only information about which notes were played, in a time-line format. General MIDI (GM) adds a standard assignment of instrument sounds, so that a file plays back with the intended instruments on any GM-compatible device. QuickTime 2.0 and later supports General MIDI data in QuickTime movies.
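In a Standard MIDI File, that time line is recorded as a delta time before each event, written as a variable-length quantity: seven bits per byte, with the top bit set on every byte except the last. A minimal sketch of encoding and decoding such values (function names are illustrative):

```python
from typing import Tuple


def encode_vlq(value: int) -> bytes:
    """Encode a delta time as a MIDI variable-length quantity (max 0x0FFFFFFF)."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("MIDI variable-length quantities hold 0 to 0x0FFFFFFF")
    out = [value & 0x7F]                    # last byte: top bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)   # earlier bytes: top bit set
        value >>= 7
    return bytes(reversed(out))


def decode_vlq(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, offset just past the quantity)."""
    value = 0
    for i in range(4):
        byte = data[offset + i]
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, offset + i + 1
    raise ValueError("variable-length quantity longer than 4 bytes")


print(encode_vlq(0x80).hex())   # 8100
```

Small delta times, which are by far the most common, take a single byte.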
General MIDI is an evolution of the MIDI standard, which was created in 1983 after the different sound companies agreed to build an interface able to connect all their music products. General MIDI, defined by the MMA (MIDI Manufacturers Association) in the USA and the JMSC (Japan MIDI Standards Committee) in Japan, was officially launched in 1991, as sequencers and music computers were becoming more popular.
The most important advantage of a MIDI system is that you can plug several devices together.
A typical MIDI device has three connectors:
- The first is "MIDI IN"
- The second is "MIDI OUT"
- The third is "MIDI THRU"
The "MIDI IN" socket is the input where MIDI data arrives.
The "MIDI OUT" socket is the output that sends MIDI data to other devices.
The "MIDI THRU" socket is an output that passes on a copy of the data arriving at "MIDI IN", so that several devices can be chained together.
Another advantage is that all MIDI devices are compatible with one another, whether they are sequencers, rhythm boxes or sound mixers.
The information carried by MIDI includes which notes are played (their pitch), how hard they are played, their timing (the rhythm) and which instrument sound (tone) is used.
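On the wire, pitch and loudness travel as numbers rather than as sound. A minimal sketch of building Note On and Note Off messages, assuming the common convention that note 69 is A4 tuned to 440 Hz in equal temperament (the receiving instrument, not the message, decides the actual sound); the helper names are illustrative.

```python
def _check(channel: int, note: int, velocity: int) -> None:
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity 0-127")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte MIDI Note On message."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 64) -> bytes:
    """Build a 3-byte MIDI Note Off message."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


def note_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency of a MIDI note number (69 = A4)."""
    return a4_hz * 2 ** ((note - 69) / 12)


print(note_on(0, 60, 100).hex())   # 903c64 (middle C on the first channel)
print(round(note_to_hz(60), 2))    # 261.63
```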
MOD (.mod, .s3m, .mtm)
MOD files originated on the Amiga, but because of their flexibility and the extremely large number of MOD files available, MOD players are now available for a variety of machines (IBM PC, Mac, SPARCstation, etc.). This is not really a sound format but a music format: it stores digitized instruments along with a musical score, so a lengthy composition takes up a very small amount of data.
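Because a MOD file is a score plus instruments, its header describes song data rather than audio parameters. In the widely used 31-instrument layout (an assumption here; early 15-instrument files lack it), a 20-byte title is followed by 31 instrument records of 30 bytes, a song-length byte, a restart byte and a 128-entry pattern order table, which puts a four-character tag at byte offset 1080 (20 + 930 + 1 + 1 + 128). This sketch reads the title and guesses the channel count from that tag; only a few common tags are recognized.

```python
from typing import Optional

FOUR_CHANNEL_TAGS = {b"M.K.", b"M!K!", b"FLT4"}


def mod_title(header: bytes) -> str:
    """Song title: the first 20 bytes, padded with NUL bytes."""
    return header[:20].split(b"\x00", 1)[0].decode("ascii", errors="replace")


def mod_channels(header: bytes) -> Optional[int]:
    """Guess the channel count from the tag at offset 1080, or None if unrecognized."""
    if len(header) < 1084:
        return None
    tag = header[1080:1084]
    if tag in FOUR_CHANNEL_TAGS:
        return 4
    if tag == b"FLT8":
        return 8
    if tag[1:] == b"CHN" and tag[:1].isdigit():    # e.g. b"6CHN", b"8CHN"
        return int(tag[:1])
    return None   # possibly an old 15-instrument file, or a tag not covered here
```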
MPEG Audio (.mp, .mp2, .mp3, .m1a, .m2a, .mpg, .mpeg, .swa)
MPEG stands for the "Moving Picture Experts Group", working under the joint direction of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). This group works on standards for the coding of moving pictures and associated audio. MPEG audio files can use Layer I, II or III. Each higher layer adds complexity to the format and takes more effort to encode and decode, but also gives higher playback quality at a given bit rate. MPEG audio comes in MPEG-1 and MPEG-2 flavors; the planned MPEG-3 standard was folded into MPEG-2, so "MP3" refers to Layer III, not to an "MPEG-3". MPEG data can be in stereo or mono and decompresses to 16-bit resolution. MPEG compression is a lossy algorithm based on perceptual encoding, which can achieve high compression rates without a noticeable decrease in quality. Typical compression ratios are around 10 to 1.
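For CD-quality input, the ratio follows directly from the bit rates. The 128 kbit/s figure below is a common MP3 setting chosen for illustration, not a value the MPEG standard fixes; the helper names are illustrative.

```python
CD_BIT_RATE = 44100 * 16 * 2   # 1,411,200 bit/s of uncompressed CD-quality PCM


def compression_ratio(coded_bit_rate: int, pcm_bit_rate: int = CD_BIT_RATE) -> float:
    """How many times smaller the coded stream is than the PCM it replaces."""
    return pcm_bit_rate / coded_bit_rate


def bytes_per_minute(bit_rate: int) -> int:
    return bit_rate * 60 // 8


print(compression_ratio(128_000))   # 11.025
print(bytes_per_minute(128_000))    # 960000, about a megabyte per minute
```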
MP3 (MPEG-1, Layer III)
MP3 uses a compression ratio capable of bringing file sizes down to approximately a megabyte per minute of music. Its lossy compression scheme removes information that the human ear is unlikely to perceive. These techniques contribute to the near-CD audio quality that has made the MP3 format extremely popular. With a suitable server and player, MP3 can also be streamed. MP3 stands for MPEG-1 Layer III and is a very efficient way to store music on a computer.
MP3 files are usually 10 to 12 times smaller than standard 44.1 kHz, 16-bit stereo WAV files.
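An MP3 stream is a sequence of frames, each beginning with a 4-byte header that announces its bit rate and sample rate. The sketch below handles MPEG-1 Layer III only (MPEG-2 and the other layers use different tables) and rejects free-format streams. Each such frame holds 1152 samples, which gives a frame length of 144 × bit rate ÷ sample rate bytes, plus one byte when the padding bit is set. The function name is illustrative.

```python
# MPEG-1 Layer III tables, indexed by the header's 4-bit and 2-bit fields.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]


def parse_mp3_frame_header(header: bytes) -> dict:
    """Decode a 4-byte MPEG-1 Layer III frame header."""
    if len(header) < 4:
        raise ValueError("need 4 header bytes")
    word = int.from_bytes(header[:4], "big")
    if ((word >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("no frame sync")
    if ((word >> 19) & 0x3) != 0b11 or ((word >> 17) & 0x3) != 0b01:
        raise ValueError("not MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(word >> 10) & 0x3]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved value")
    padding = (word >> 9) & 0x1
    mode = (word >> 6) & 0x3            # 0b11 means single channel (mono)
    return {
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "channels": 1 if mode == 0b11 else 2,
        "padding": padding,
        # 1152 samples per frame / 8 bits per byte = 144
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }
```

For instance, the header bytes `FF FB 90 00` describe a 128 kbit/s, 44.1 kHz stereo frame of 417 bytes.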
PARIS (.paf)
Native format for the Ensoniq PARIS digital audio editing system. PARIS stands for "Professional Audio Recording Integrated System." It can contain 8-, 16- or 24-bit data in mono or stereo.
QT (QuickTime)
Movies can be created without a video channel and used as a sound format. QuickTime accepts different sample rates and bit depths and, beginning with version 3.0, was the first format to offer full functionality on Windows as well as Mac OS. QuickTime 4.0, which allows for considerable compression, supports streaming audio and video, while earlier versions support pseudo-streaming of files.
RealAudio (.ra, .ram)
RealAudio uses true streaming technology: a listener can jump to a single point in the broadcast without loading the preceding material, which saves time. RealAudio also produces significant file size reductions. The latest versions of the RealAudio server and player software can handle multiple encodings of a single file, so that different versions (and qualities) can be served to each user depending on the bandwidth available.
RMF (Rich Music Format)
Beatnik's audio file format is unusual in that it can contain recorded audio and MIDI sequences at the same time. File sizes are usually extremely small and the audio required for a Web site's interface can be downloaded in a single file. Beatnik's Player and JavaScript Music Object are required to play back RMF files. Beatnik's JavaScript library allows the Beatnik Player to be scripted in order to produce interactive audio on the page.
Shockwave Audio (.swa)
Like MP3, Shockwave Audio is based on MPEG audio compression and produces high-quality sound in small files.
Sound Blaster VOC (.voc)
This is the Creative Voice format used by Creative Labs' Sound Blaster hardware in IBM-compatible computers, and it is optimized for that hardware. It specifies the sampling rate as a divisor of an internal clock and is not as flexible as the other general formats. Data can be segmented, and portions of silence can be added.
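The clock-based rate is where the inflexibility shows. In a minimal sketch assuming the commonly documented rule for mono data in the basic sound-data block (a one-byte time constant, with rate = 1,000,000 / (256 − time constant)), 8000 Hz is exact but 11025 Hz can only be approximated. The function names are illustrative.

```python
VOC_CLOCK_HZ = 1_000_000   # assumed: rate = 1 MHz / (256 - time_constant), mono data


def voc_time_constant(sample_rate: float) -> int:
    """Nearest one-byte time constant for a requested (mono) sample rate."""
    divisor = round(VOC_CLOCK_HZ / sample_rate)
    if not 1 <= divisor <= 256:
        raise ValueError("sample rate out of range for a one-byte time constant")
    return 256 - divisor


def voc_actual_rate(time_constant: int) -> float:
    """Sample rate a given time constant really produces."""
    return VOC_CLOCK_HZ / (256 - time_constant)


tc = voc_time_constant(11025)
print(tc, round(voc_actual_rate(tc), 1))   # 165 10989.0
```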
Sun Audio (AU) and NeXT (.au, .snd)
A common audio file format on UNIX systems. The format allows arbitrary sampling rates and multi-channel sounds. It supports a number of sound codecs, including µ-law, A-law, various linear formats with different sample sizes, floating-point samples, native DSP samples and G.72x ADPCM compression. Most files start with the four-character signature ".snd". Originally developed by Sun, the .au format is a very straightforward audio format; unfortunately, it isn't widely supported outside the UNIX community.
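The header behind that signature is simple: after the ".snd" magic come five big-endian 32-bit fields (data offset, data size, encoding, sample rate, channel count). A sketch that writes and reads a minimal 24-byte header; only a few encoding codes are listed, and the helper names are hypothetical.

```python
import struct

# A few of the encoding codes defined for the .au header.
AU_ENCODINGS = {1: "8-bit mu-law", 2: "8-bit linear PCM", 3: "16-bit linear PCM",
                6: "32-bit float", 27: "8-bit A-law"}
AU_UNKNOWN_SIZE = 0xFFFFFFFF   # data size field value meaning "unknown"


def make_au_header(sample_rate: int, channels: int, encoding: int = 3,
                   data_size: int = AU_UNKNOWN_SIZE) -> bytes:
    """Minimal 24-byte header: magic plus five big-endian 32-bit fields."""
    return struct.pack(">4s5I", b".snd", 24, data_size, encoding, sample_rate, channels)


def read_au_header(blob: bytes) -> dict:
    magic, data_offset, data_size, encoding, rate, channels = struct.unpack(">4s5I", blob[:24])
    if magic != b".snd":
        raise ValueError("missing .snd signature")
    return {
        "data_offset": data_offset,   # audio starts here; larger than 24 if an annotation follows
        "data_size": data_size,
        "encoding": AU_ENCODINGS.get(encoding, "code %d" % encoding),
        "sample_rate": rate,
        "channels": channels,
    }
```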
Windows WAVE (.wav)
Developed by Microsoft and IBM, WAVE is the common audio file format on Windows. WAV files may be compressed or uncompressed, but even when compressed they are comparatively large. Like Sun Audio, the format specifies an arbitrary sampling rate, number of channels and sample size. It also allows a number of application-specific blocks within the file, and it supports a plethora of different compression formats.
WAVE audio files are one of the common formats used to store and play audio data. They support variable sampling frequencies, multiple channels, and a number of compression algorithms. The following gives the minimal requirements necessary to save audio data in this format; it doesn't address compression and only considers sampled audio data.
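A minimal sketch of such a file, assuming uncompressed 16-bit little-endian PCM (format tag 1): a RIFF header, a "fmt " chunk describing the audio, and a "data" chunk holding the samples. The function name is illustrative.

```python
import struct
from typing import Sequence


def pcm16_wav_bytes(samples: Sequence[int], sample_rate: int = 8000, channels: int = 1) -> bytes:
    """Build a minimal uncompressed 16-bit PCM WAVE file in memory.

    `samples` holds interleaved values (L, R, L, R, ... for stereo) in -32768..32767.
    """
    bits = 16
    block_align = channels * bits // 8          # bytes per sample frame
    byte_rate = sample_rate * block_align
    data = struct.pack("<%dh" % len(samples), *samples)   # little-endian 16-bit

    # "fmt " chunk: 16-byte body, format tag 1 = uncompressed PCM.
    fmt_chunk = struct.pack("<4sIHHIIHH", b"fmt ", 16, 1, channels,
                            sample_rate, byte_rate, block_align, bits)
    # "data" chunk: the samples themselves (always an even size for 16-bit data).
    data_chunk = struct.pack("<4sI", b"data", len(data)) + data
    # RIFF header: its size field covers everything after the first 8 bytes.
    body = b"WAVE" + fmt_chunk + data_chunk
    return struct.pack("<4sI", b"RIFF", len(body)) + body


wav = pcm16_wav_bytes([0, 1000, -1000, 0])
print(len(wav))   # 52 = 44-byte header + 4 samples x 2 bytes
```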
A WAVE file consists of a number of chunks. Each chunk includes an identifier, the size of the chunk in bytes, and any data associated with the chunk. Two chunks are required in order to successfully save sampled audio waveforms: a format chunk and the sample data chunk. The main advantage of this chunk structure is that when parsing a WAVE file you don't need to interpret every chunk type, but can skip over the ones you don't need or don't understand.
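Skipping unknown chunks needs only each chunk's identifier and size. A minimal sketch, assuming the RIFF rule that a chunk with an odd size is followed by one pad byte; the function names are illustrative.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_wave_chunks(blob: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (chunk_id, chunk_body) for each chunk in a RIFF/WAVE file."""
    riff, riff_size, form = struct.unpack("<4sI4s", blob[:12])
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos, end = 12, min(len(blob), 8 + riff_size)
    while pos + 8 <= end:
        chunk_id, size = struct.unpack("<4sI", blob[pos:pos + 8])
        yield chunk_id, blob[pos + 8:pos + 8 + size]
        # Skip the body plus a pad byte if the size is odd (chunks are word-aligned).
        pos += 8 + size + (size & 1)


def find_chunk(blob: bytes, wanted: bytes) -> Optional[bytes]:
    """Return the body of the first chunk with the given ID, ignoring all others."""
    for chunk_id, body in iter_wave_chunks(blob):
        if chunk_id == wanted:
            return body
    return None
```

A reader that only wants the samples can call `find_chunk(blob, b"data")` and never look inside any other chunk.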
The WAV format is based on RIFF, the "Resource Interchange File Format", which was created by Microsoft and IBM.
Common RIFF form types include:
| RIFF form type | Meaning |
|---|---|
| PAL_ | Palette file |
| RDIB | Bitmap |
| RMID | MIDI format file |
| RMMP | Movie file |
| WAVE | Sampling file |
A RIFF file of the WAVE type has the .WAV extension. Its structure holds the sample rate, the number of bits per sample (8 or 16), the type of signal (mono or stereo) and data about any loop the sample contains.
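Python's standard `wave` module reads the sample rate, sample width and channel count of uncompressed PCM files, though it does not report loop data. A small sketch that writes a short silent file in memory and reads those fields back; the helper names are illustrative.

```python
import io
import wave


def make_silence(seconds: float, rate: int = 22050, channels: int = 2, sample_width: int = 2) -> bytes:
    """Write a silent PCM WAV file into memory using the standard library."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)      # bytes per sample: 1 = 8-bit, 2 = 16-bit
        wav.setframerate(rate)
        wav.writeframes(bytes(int(seconds * rate) * channels * sample_width))
    return buf.getvalue()


def describe_wav(blob: bytes) -> dict:
    """Read the format fields back out of a WAV file's header."""
    with wave.open(io.BytesIO(blob), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "bits": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
            "frames": wav.getnframes(),
        }


print(describe_wav(make_silence(0.5)))
# {'sample_rate': 22050, 'bits': 16, 'channels': 2, 'frames': 11025}
```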