Many thanks to Walter Knapp, Dan Dugan, Raimund Specht and other
participants in this group for helping make clear that psychoacoustic
coding should not be considered "data compression." Rather, it is an
exercise in "data reduction."
ATRAC, MP3, AAC and other psychoacoustic sound encoders don't save
compressed pieces of the original sounds. Instead, they save
instructions for making sounds. Using ATRAC or MP3, these
instructions are 80% to 90% smaller than the original digitized sound
file.
At the time of playback, the sound decoder follows the instructions
to re-create the original sounds with an acceptable level of
fidelity. Exactly what "acceptable" means to you will determine
which psychoacoustic approach, if any, you choose to use.
Psychoacoustic data reduction takes advantage of two characteristics
of human hearing: frequency masking and temporal masking.
Researchers discovered that when a tone is played at a fixed volume,
a second tone that is slightly lower or higher in pitch can't be
detected by human hearing until the volume of the second tone passes
a threshold. This threshold is much louder than the volume needed to
detect the second tone in a quiet setting. Thus one sound frequency
masks the other.
The threshold volume varies across frequency ranges. Human hearing
has the lowest threshold in the 2 to 4 kHz range, where it is most
sensitive.
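
To make the idea concrete, here is a rough sketch of frequency masking. It is a toy, not what any real encoder does: the function names, the 10 dB drop and the 25 dB-per-octave slope are all numbers I made up for illustration, and real masking curves are not symmetric like this one.

```python
import math

def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        drop_db=10.0, slope_db_per_octave=25.0):
    """Toy level (dB) a second tone must exceed to be heard next to a masker.

    drop_db and slope_db_per_octave are illustrative assumptions only.
    """
    octaves_apart = abs(math.log2(probe_hz / masker_hz))
    return masker_db - drop_db - slope_db_per_octave * octaves_apart

def is_audible(masker_hz, masker_db, probe_hz, probe_db,
               quiet_threshold_db=0.0):
    """True if the probe tone rises above both the quiet and masked thresholds."""
    threshold = max(quiet_threshold_db,
                    masked_threshold_db(masker_hz, masker_db, probe_hz))
    return probe_db > threshold

# A 70 dB tone at 1000 Hz hides a 40 dB tone at 1100 Hz in this toy model,
# but the same 40 dB tone two octaves away (4000 Hz) gets through.
print(is_audible(1000, 70, 1100, 40))
print(is_audible(1000, 70, 4000, 40))
```

An encoder built on this idea would simply not spend any data on the 1100 Hz tone.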
Temporal masking is easier to explain: After a loud sound is heard,
it takes a short time before human hearing can detect a soft sound.
Loud sounds mask softer ones when made at the same time, too.
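
Temporal masking can be sketched the same way. Again this is only a toy: the function name, the straight-line recovery and the 200 ms recovery time are assumptions I picked for illustration, not measured values.

```python
def forward_masked_threshold_db(loud_db, ms_since_loud, recovery_ms=200.0):
    """Toy level (dB) a soft sound must exceed shortly after a loud one.

    Assumes hearing recovers in a straight line over recovery_ms,
    which is an illustrative simplification.
    """
    if ms_since_loud >= recovery_ms:
        return 0.0
    remaining = 1.0 - ms_since_loud / recovery_ms
    return loud_db * remaining

# Threshold right after an 80 dB sound, then 50 ms and 200 ms later.
for ms in (0, 50, 200):
    print(ms, forward_masked_threshold_db(80, ms))
```

Any soft sound that falls under this threshold during the recovery period is a candidate for being left out.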
The psychoacoustic model takes advantage of temporal and frequency
masking by ignoring sound data that can't be detected by human
hearing. This substantially reduces the instructions needed to
re-create the recorded sound.
Once the instruction set is created by the ATRAC, MPEG or AAC encoder,
it is passed through a lossless data compression step to reduce the
size of the file even further.
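
Here is a small sketch of why a lossless step still pays off after the psychoacoustic step. MP3, for instance, uses Huffman coding at this stage; Python's zlib is used below only as a convenient stand-in, and the list of values is made up to look like data in which the masked components have already been set to zero.

```python
import zlib

# Hypothetical encoder output: mostly zeros, because masked
# components were discarded earlier. These values are invented.
quantized = [0, 0, 3, -1, 0, 0, 0, 2] * 64
raw = bytes(v & 0xFF for v in quantized)   # one byte per value

packed = zlib.compress(raw, 9)             # lossless compression
restored = zlib.decompress(packed)         # exact round trip

print(len(raw), "bytes before,", len(packed), "bytes after")
print("identical after decompression:", restored == raw)
```

Nothing is lost in this step: the decoder gets back exactly the values the encoder produced.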
The Moving Picture Experts Group (MPEG) originally defined three ways
to employ psychoacoustic principles in the reduction of audio data.
The three models are called Layers 1, 2 and 3.
MPEG-1 Layer 1 (MP1): The psychoacoustic model uses only frequency
masking. The resulting file is typically about 25% of the size of the
original digitized data.
MP2: Uses frequency masking and some temporal masking. The resulting
file is 12% to 16% of the size of the original data.
MP3: Uses both frequency and temporal masking, with improved
frequency masking techniques, and detects stereo redundancy. The
resulting file is typically 8% to 10% of the size of the original.
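
To put those percentages in more familiar terms, this sketch converts them to bit rates. It assumes the original is CD-format audio (44.1 kHz, 16-bit, stereo); that assumption and the helper name reduced_kbps are mine.

```python
# Assumed source format: CD audio (44.1 kHz, 16 bits per sample, stereo).
CD_BITS_PER_SECOND = 44_100 * 16 * 2

def reduced_kbps(fraction_of_original):
    """Bit rate in kbit/s after reducing CD audio to the given fraction."""
    return CD_BITS_PER_SECOND * fraction_of_original / 1000

# Size fractions quoted for each layer, as (low, high).
layers = {"MP1": (0.25, 0.25), "MP2": (0.12, 0.16), "MP3": (0.08, 0.10)}
for name, (low, high) in layers.items():
    print(f"{name}: {reduced_kbps(low):.0f} to {reduced_kbps(high):.0f} kbit/s")
```

Under that assumption the figures work out to roughly 353 kbit/s for MP1, 169 to 226 kbit/s for MP2 and 113 to 141 kbit/s for MP3.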
ATRAC uses all of the above techniques, and produces a decoder
instruction file that is about 18% to 20% of the original. That's
how 74 minutes of CD-quality audio can be squeezed onto a 160 MB
MiniDisc.
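
The arithmetic can be checked with a few lines. This assumes "CD-quality" means 44.1 kHz, 16-bit stereo and counts a megabyte as one million bytes; the helper name megabytes is just for this sketch.

```python
# Assumed source format: CD audio (44.1 kHz, 2 bytes per sample, stereo).
CD_BYTES_PER_SECOND = 44_100 * 2 * 2

def megabytes(minutes, fraction_of_original=1.0):
    """Size in MB (10**6 bytes) of CD audio reduced to the given fraction."""
    return minutes * 60 * CD_BYTES_PER_SECOND * fraction_of_original / 1_000_000

print(f"uncompressed: {megabytes(74):.0f} MB")
print(f"at 18%:       {megabytes(74, 0.18):.0f} MB")
print(f"at 20%:       {megabytes(74, 0.20):.0f} MB")
```

Seventy-four minutes comes to about 783 MB uncompressed, and 18% to 20% of that is about 141 to 157 MB, which fits on the disc.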
Pretty much indistinguishable from magic!
--oryoki