naturerecordists


Subject: Re: some respect for ATRAC
From: "Raimund Specht" <>
Date: Tue, 29 Apr 2003 09:07:00 -0000
oryoki, thanks for summarizing the principles of psychoacoustic
coding.

In the meantime, Evert provided a practical test with his MiniDisc
recorder. It turned out that ATRAC handles at least the less demanding,
band-limited test signals very well (see the last section of this page:
www.avisoft-saslab.com/compression/compression.htm).

However, and more importantly, another horrific effect showed up. Evert
used an optical S/PDIF interface (SoundBlaster Live!) to transfer the
test file between the PC and his MiniDisc recorder. To get the best
signal-to-noise ratio, he first converted the sample rate of the
artificial test signal from 44.1 to 48 kHz and then transferred the
data. The resulting spectrogram shows severe artifacts:

http://www.avisoft.de/compression/compressed48MD.gif

These resampling artifacts at the higher signal frequencies seem to
originate from the real-time sample rate conversion. Such effects
might also occur when transferring data from a DAT recorder to a PC.
It is therefore often very important to match the sampling rates of
the digital audio components (as Dan Dugan mentioned recently).
Unfortunately, these effects often go unnoticed on listening, because
the lower-frequency components are not affected.
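To see why a sample rate conversion can damage the high frequencies while leaving the low ones nearly intact, here is a minimal Python sketch. It assumes a deliberately crude converter that linearly interpolates between neighbouring samples; the converters in the SoundBlaster Live! and in the MiniDisc recorder are certainly more elaborate, and the function names are hypothetical. It resamples test tones from 44.1 to 48 kHz and measures the error against an ideal 48 kHz sine.

```python
import math

def linear_resample(x, src_rate, dst_rate):
    # Hypothetical crude converter: linear interpolation between input samples
    n_out = int(len(x) * dst_rate / src_rate)
    out = []
    for n in range(n_out):
        t = n * src_rate / dst_rate  # output sample position in input-sample units
        i = int(t)
        frac = t - i
        if i + 1 >= len(x):
            break
        out.append((1 - frac) * x[i] + frac * x[i + 1])
    return out

def resampling_error_db(freq_hz, src_rate=44100, dst_rate=48000, n=4410):
    # Error power of the converted tone relative to the signal power, in dB
    x = [math.sin(2 * math.pi * freq_hz * k / src_rate) for k in range(n)]
    y = linear_resample(x, src_rate, dst_rate)
    ideal = [math.sin(2 * math.pi * freq_hz * k / dst_rate) for k in range(len(y))]
    err = sum((a - b) ** 2 for a, b in zip(y, ideal)) / len(y)
    sig = sum(b * b for b in ideal) / len(y)
    return 10 * math.log10(err / sig)

for f in (1000, 5000, 15000):
    print(f, "Hz:", round(resampling_error_db(f), 1), "dB")
```

With this kind of converter the error for low tones grows roughly with the fourth power of frequency, so a 1 kHz tone comes through almost untouched while a 15 kHz tone is badly distorted, which is the same pattern as "inaudible" damage confined to the upper part of the spectrogram.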

Regards,
Raimund

--- In  "oryoki2000" <> wrote:
>
> Many thanks to Walter Knapp, Dan Dugan, Raimund Specht and other
> participants in this group for helping make clear that psychoacoustic
> coding should not be considered "data compression." Rather, it is an
> exercise in "data reduction."
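The difference between lossless compression and data reduction can be shown in a few lines of Python. Here zlib stands in for a lossless coder, and the hypothetical `reduce_to_bits` helper simply throws away low-order bits; a real psychoacoustic coder decides what to discard far more cleverly, so this only illustrates the difference in reversibility.

```python
import math
import struct
import zlib

def pcm16_bytes(samples):
    # Pack signed 16-bit samples as little-endian bytes, as stored in a WAV file
    return struct.pack("<%dh" % len(samples), *samples)

def reduce_to_bits(samples, bits):
    # Hypothetical "data reduction": keep only the top `bits` bits of each sample
    shift = 16 - bits
    return [(s >> shift) << shift for s in samples]

# A 440 Hz test tone with a little deterministic "noise" in the low bits
samples = [int(12000 * math.sin(2 * math.pi * 440 * k / 44100)) + (k * 7919 % 13) - 6
           for k in range(4410)]
raw = pcm16_bytes(samples)

# Lossless compression: decompressing gives back every byte
print(zlib.decompress(zlib.compress(raw)) == raw)  # True

# Data reduction: the discarded bits are gone for good
reduced = reduce_to_bits(samples, 4)
print(reduced == samples)  # False
```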
>
> ATRAC, MP3, AAC and other psychoacoustic sound encoders don't save
> compressed pieces of the original sounds. Instead, they save
> instructions for making sounds. Using ATRAC or MP3, these
> instructions are 80% to 90% smaller than the original digitized
> sound file.
>
> At the time of playback, the sound decoder follows the instructions
> to re-create the original sounds with an acceptable level of
> fidelity. Exactly what "acceptable" means to you will determine
> which psychoacoustic approach, if any, you choose to use.
>
> Psychoacoustic data reduction takes advantage of two characteristics
> of human hearing: frequency masking and temporal masking.
>
> Researchers discovered that when a tone is played at a fixed volume,
> a second tone that is slightly lower or higher in pitch can't be
> detected by human hearing until the volume of the second tone passes
> a threshold. This threshold is much louder than the volume needed to
> detect the second tone in a quiet setting. Thus one sound frequency
> masks the other.
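Frequency masking can be sketched as a threshold that spreads out from a masker tone on the Bark (critical-band) scale. The Bark formula below is the Zwicker and Terhardt approximation; the 10 dB offset and the 27 and 10 dB/Bark slopes are hypothetical illustration values, not the parameters of ATRAC, MP3 or any particular encoder.

```python
import math

def hz_to_bark(f_hz):
    # Zwicker & Terhardt (1980) approximation of the critical-band rate in Bark
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)

def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        offset_db=10.0, slope_below=27.0, slope_above=10.0):
    # Hypothetical triangular spreading: the threshold sits offset_db under the
    # masker and falls off linearly (dB per Bark), more steeply below the masker
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = slope_above if dz > 0 else slope_below
    return masker_db - offset_db - slope * abs(dz)

def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    # The probe tone is inaudible if it stays below the masked threshold
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)

print(is_masked(1000, 70, 1100, 50))  # close in pitch: True (masked)
print(is_masked(1000, 70, 4000, 50))  # far away in pitch: False (audible)
```

A 50 dB tone is easily heard in quiet, but with these assumed parameters it disappears next to a 70 dB tone only 100 Hz away.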
>
> The masking threshold also varies with frequency. Human hearing is
> most sensitive, i.e. has its lowest threshold of audibility, in the
> 2 to 4 kHz range.
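The frequency dependence of hearing sensitivity can be illustrated with Terhardt's widely used approximation of the threshold in quiet (the absolute threshold of hearing). This is a sketch for illustration; it does not claim that ATRAC or any MPEG encoder uses exactly this curve.

```python
import math

def threshold_in_quiet_db(f_hz):
    # Terhardt's approximation of the absolute threshold of hearing, in dB SPL
    f = f_hz / 1000.0  # the formula is written in kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

# Scan 20 Hz .. 20 kHz in 10 Hz steps and find where hearing is most sensitive
freqs = range(20, 20001, 10)
most_sensitive = min(freqs, key=threshold_in_quiet_db)
print(most_sensitive, "Hz:", round(threshold_in_quiet_db(most_sensitive), 1), "dB SPL")
```

The minimum lands a little above 3 kHz, slightly below 0 dB SPL, while the threshold rises steeply toward both low and very high frequencies.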
>
> Temporal masking is easier to explain: after a loud sound is heard,
> it takes a short time before human hearing can detect a soft sound.
> Loud sounds also mask softer ones that occur at the same time.
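Post-masking can be sketched as a threshold that starts just below the level of the loud sound when it stops and then decays. The exponential shape, the 40 ms time constant and the 10 dB offset are all hypothetical values chosen for illustration; measured post-masking curves have a different shape, and pre-masking is ignored here.

```python
import math

def post_masking_threshold_db(t_ms, masker_db, quiet_db=0.0,
                              offset_db=10.0, tau_ms=40.0):
    # Hypothetical post-masking curve: when the masker stops (t_ms = 0) the
    # threshold is offset_db below it, then decays exponentially to quiet_db
    start = masker_db - offset_db
    return quiet_db + (start - quiet_db) * math.exp(-t_ms / tau_ms)

def audible_after_masker(t_ms, masker_db, probe_db):
    # A soft sound is audible only once it rises above the decaying threshold
    return probe_db > post_masking_threshold_db(t_ms, masker_db)

# A 30 dB click 5 ms and 200 ms after an 80 dB sound ends
print(audible_after_masker(5, 80, 30))    # False: still masked
print(audible_after_masker(200, 80, 30))  # True: the threshold has decayed
```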
>
> The psychoacoustic model takes advantage of temporal and frequency
> masking by ignoring sound data that can't be detected by human
> hearing. This substantially reduces the instructions needed to
> re-create the recorded sound.
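In practice "ignoring" inaudible data usually means spending fewer bits on it. MPEG-1 Layers 1 and 2, for instance, allocate quantizer bits per subband from the signal-to-mask ratio (SMR), using the rule of thumb that each bit buys about 6.02 dB of signal-to-noise ratio. The sketch below assumes that rule; the band levels and the 15-bit cap are hypothetical.

```python
import math

def allocate_bits(signal_db, mask_db, max_bits=15):
    # Bits per band from the signal-to-mask ratio, ~6.02 dB of SNR per bit.
    # Bands whose signal lies under the mask get 0 bits, i.e. are not coded.
    bits = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m
        bits.append(0 if smr <= 0 else min(max_bits, math.ceil(smr / 6.02)))
    return bits

# Hypothetical levels for three bands: loud, barely above the mask, masked
print(allocate_bits([60, 40, 20], [30, 35, 25]))  # [5, 1, 0]
```

The quantization noise in each coded band then stays roughly under the mask, so it is masked just like the discarded components.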
>
> Once the instruction set is created by the ATRAC, MP3 or AAC encoder,
> it is passed through a lossless data compression step to reduce the
> size of the file even further.
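MP3 and AAC use Huffman coding for this lossless step. The sketch below builds a Huffman code from the data itself, unlike the predefined tables those formats actually use, and the quantized values are hypothetical; it only shows why long runs of small values code compactly.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    # Build a prefix code (symbol -> bit string) from symbol frequencies
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, {symbol: code so far})
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, counter, merged))
        counter += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix code: first match is the symbol
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical quantized values: mostly zeros, a few larger values
q = [0] * 60 + [1] * 20 + [-1] * 15 + [2] * 4 + [-3] * 1
code = huffman_code(q)
bits = encode(q, code)
print(len(bits), "bits vs", 3 * len(q), "with a fixed 3-bit code")  # 165 vs 300
```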
>
> The Moving Picture Experts Group (MPEG) originally defined three ways
> to employ psychoacoustic principles in the reduction of audio data.
> The three schemes are called Layers 1, 2 and 3.
>
> MPEG-1 Layer 1 (MP1): The psychoacoustic model uses only frequency
> masking. The resulting file is typically about 25% of the size of the
> original digitized data.
>
> MPEG-1 Layer 2 (MP2): Uses frequency masking and some temporal
> masking. The resulting file is 12% to 16% of the original data.
>
> MPEG-1 Layer 3 (MP3): Both frequency and temporal masking are
> employed, improved frequency masking techniques are used, and stereo
> redundancy is detected. The resulting file is typically 8% to 10% of
> the size of the original.
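To translate these size percentages into the bit rates people usually quote, a quick calculation against the CD rate (44.1 kHz, 16-bit, stereo) helps. The percentages are the ones from the post; the calculation is only illustrative.

```python
CD_KBPS = 44100 * 16 * 2 / 1000  # 1411.2 kbit/s for 16-bit stereo at 44.1 kHz

ratios = {
    "MP1 (about 25%)": (0.25, 0.25),
    "MP2 (12% to 16%)": (0.12, 0.16),
    "MP3 (8% to 10%)": (0.08, 0.10),
}
for name, (lo, hi) in ratios.items():
    print(f"{name}: {lo * CD_KBPS:.0f} to {hi * CD_KBPS:.0f} kbit/s")
```

The MP3 range works out to roughly 113 to 141 kbit/s, which brackets the common 128 kbit/s setting (about 9.1% of the CD rate).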
>
> ATRAC uses all of the above techniques, and produces a decoder
> instruction file that is about 18% to 20% of the original. That's
> how 74 minutes of CD-quality audio can be squeezed onto a 160 MB
> MiniDisc.
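The MiniDisc figure can be checked with a little arithmetic. The sketch assumes ATRAC's standard stereo rate of 292 kbit/s, a number not stated in the post, and uses decimal megabytes.

```python
CD_BYTES_PER_S = 44100 * 2 * 2  # 44.1 kHz, 16-bit (2 bytes), stereo: 176,400 bytes/s
seconds = 74 * 60

pcm_mb = CD_BYTES_PER_S * seconds / 1e6   # uncompressed 74 minutes
atrac_mb = 292_000 / 8 * seconds / 1e6    # assumes ATRAC at 292 kbit/s stereo
print(round(pcm_mb), "MB of PCM ->", round(atrac_mb), "MB of ATRAC,",
      round(atrac_mb / pcm_mb * 100, 1), "% of the original")
```

This gives about 783 MB of PCM shrinking to about 162 MB, or roughly 20.7% of the original, just above the 18% to 20% quoted and consistent with a disc of around 160 MB.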
>
> Pretty much indistinguishable from magic!
>
> --oryoki




The University of NSW School of Computer and Engineering takes no responsibility for the contents of this archive. It is purely a compilation of material sent by many people to the naturerecordists mailing list. It has not been checked for accuracy nor its content verified in any way. If you wish to get material removed from the archive or have other queries about the archive e-mail Andrew Taylor at this address: andrewt@cse.unsw.EDU.AU