
Re: Sound editing & sonogram software on PC

From: Walter Knapp
Date: Thu, 13 Mar 2003 16:13:34 -0500
Lang Elliott wrote:
> Steve:
> 
> I'm not a physiologist, but I am familiar with research that shows human
> hearing is octave-based, more or less. In other words, our ability to
> discriminate slight frequency differences correspond closely to an octave
> model, at least through the midrange of human hearing.

On a fairly loose level this is so, mostly in that our sensitivity to 
frequency is roughly logarithmic. More precisely, our sensors are 
concentrated at voice frequencies and taper off in both directions from 
there.

One of the problems for nature recording is software that forces the 
mold of western/european musical scales onto natural sound. Those scales 
are only one interpretation of human hearing abilities. The result is 
software that has trouble dealing with non-human sounds, and tests of 
sound quality that simply measure how well our recordings fit a musical 
scale or the like.

I make a strong effort to avoid getting trapped by music software or 
tests. The sonograms themselves, even the log ones, are little used in 
music because they are not in musical scales.

> If sonograms are presented as an intuitive picture of sounds, then the
> vertical frequency axis should show a doubling of frequencies. The
> arithmetic scale is very misleading in this respect because it severely
> "compresses" low pitched sounds and severely "expands" high pitched sounds
> in the frequency dimension. This creates a misleading picture and makes high
> pitched bird songs look as if they are much more frequency variable than
> they actually sound to the human ear.

It's certainly true that linear scaling of frequency is wrong. I'm not 
convinced that the log scales are a perfect representation either. In 
fact, for sounds in the primary human vocal range there may be an 
advantage to the linear scale. When working on sound I sometimes 
compensate my log displays by changing the endpoints of the display to 
enlarge this area. In a full scale log display the vocal range gets too 
little emphasis and the low end too much.
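To put rough numbers on the linear versus log comparison, here is a 
minimal sketch that computes what fraction of the display height a band 
occupies on each kind of axis. The display endpoints (20 Hz to 20 kHz, and 
a narrowed 100 Hz to 8 kHz) and the 300-3400 Hz "voice band" are 
assumptions chosen for illustration, not figures from any particular 
program.

```python
import math

def axis_fraction(f_lo, f_hi, axis_lo, axis_hi, log_scale):
    """Fraction of the display height taken up by the band f_lo..f_hi (Hz)."""
    if log_scale:
        return math.log(f_hi / f_lo) / math.log(axis_hi / axis_lo)
    return (f_hi - f_lo) / (axis_hi - axis_lo)

FULL = (20.0, 20000.0)    # assumed full-range display endpoints
NARROW = (100.0, 8000.0)  # assumed narrowed endpoints
BANDS = {
    "low octave": (50.0, 100.0),
    "voice band": (300.0, 3400.0),  # assumed band, for illustration only
    "high octave": (5000.0, 10000.0),
}

# Compare how much of the axis each band gets on a linear and a log display.
for name, (lo, hi) in BANDS.items():
    lin = axis_fraction(lo, hi, *FULL, log_scale=False)
    log = axis_fraction(lo, hi, *FULL, log_scale=True)
    print(f"{name:12s} linear {lin:7.2%}   log {log:7.2%}")

# Effect of changing the endpoints of a log display on the voice band.
voice_full = axis_fraction(300.0, 3400.0, *FULL, log_scale=True)
voice_narrow = axis_fraction(300.0, 3400.0, *NARROW, log_scale=True)
print(f"voice band on log axis: full {voice_full:.2%}, narrowed {voice_narrow:.2%}")
```

On the log axis both octaves get the same height, while on the linear 
axis the 5000-10000 Hz octave gets 100 times the height of the 50-100 Hz 
octave. Narrowing the endpoints gives the voice band a larger share of 
the log display.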

> As I understand it, FFT analyses are not at all based on the physiology of
> human hearing.  FFTs are based on constant bandwidth analysis. In other
> words, if the analysis bandwidth that's chosen is, let's say 50 Hz, then
> equal emphasis would be given to analysis of a full octave at 50-100 Hz as
> would be given to a tiny fraction of an octave at 10,000-10,050 Hz. And the
> arithmetic sonogram reflects this bias. This has little relationship to the
> way we hear or the way birds hear. It is simply the result of constant
> bandwidth analysis. It is a mathematical constraint.

The FFT itself is not so limited. It's a matter of how you program the 
analysis. Certainly the sonograms in Spark XL are not doing the log plot 
from a linear frequency assignment, or if they are, it's being 
compensated later in the program before display. I do not worry about 
this. The same amplitude will end up the same color regardless of the 
frequency range it occurs in, and that's what's important. How a 
programmer achieved this evenness in display is something I don't worry 
much about.
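For reference, the constant-bandwidth point in the quoted text can be 
put in numbers. The sketch below counts how many analysis bins of a fixed 
width land in each octave. The 50 Hz bandwidth is the figure from the 
quoted example. The function name and the choice of bins centred at whole 
multiples of the bandwidth are assumptions for illustration, and nothing 
here describes how Spark XL or any other program actually works.

```python
def bins_in_band(f_lo, f_hi, bandwidth):
    """Count bins centred at k*bandwidth (k = 0, 1, 2, ...) with
    f_lo <= centre < f_hi. All arguments are whole numbers of Hz."""
    first = -(-f_lo // bandwidth)  # ceiling division: first bin index in band
    end = -(-f_hi // bandwidth)    # first bin index at or above f_hi
    return end - first

BANDWIDTH = 50  # Hz, the figure used in the quoted example

# Successive octaves starting at 50 Hz: 50-100, 100-200, ... 6400-12800.
octaves = [(50 * 2**i, 50 * 2**(i + 1)) for i in range(8)]
counts = [bins_in_band(lo, hi, BANDWIDTH) for lo, hi in octaves]

for (lo, hi), n in zip(octaves, counts):
    print(f"{lo:6d}-{hi:6d} Hz: {n:4d} bins")
```

Each octave up holds twice as many bins as the one below it, so a log 
display built from such an analysis has very few bins to draw the low 
octaves from and many for the high ones.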

There is a problem in that FFT blocksize has a big influence on the 
resolution at various frequencies. That's not the frequency banding you 
describe, but how many samples are in the analysis block, i.e. how much 
time each increment of analysis covers. The lower the frequency, the 
more samples must be included to resolve the details. But you cannot 
make the block arbitrarily large, or eventually the higher frequencies 
are less well resolved in time. What's probably needed is a sonogram 
where the blocksize used in the analysis is varied by frequency. That 
would add considerably to the processing required. I doubt even the 
fastest current desktops could do that at high resolution in realtime. 
I'd love to have the option even if it only worked on a static clip.
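The blocksize tradeoff can be sketched with two small functions. The 
first gives the bin spacing and the time span of one block. The second 
picks a power-of-two blocksize long enough to hold a fixed number of 
cycles of a given frequency, which is one possible way of varying the 
blocksize by frequency (the same general idea as what is usually called 
constant-Q analysis). The 44100 Hz sample rate, the 8-cycle target and 
both function names are assumptions for illustration only.

```python
def block_resolution(sample_rate_hz, block_size):
    """Return (bin spacing in Hz, block duration in seconds) for one FFT block."""
    freq_res = sample_rate_hz / block_size
    time_res = block_size / sample_rate_hz
    return freq_res, time_res

def block_size_for(freq_hz, sample_rate_hz, cycles=8):
    """Smallest power-of-two blocksize holding at least `cycles` periods
    of freq_hz. The default of 8 cycles is an arbitrary illustrative choice."""
    needed = cycles * sample_rate_hz / freq_hz
    size = 1
    while size < needed:
        size *= 2
    return size

SAMPLE_RATE = 44100  # Hz, assumed

# Fixed blocksize: finer frequency detail always costs coarser time detail.
for n in (256, 1024, 4096, 16384):
    df, dt = block_resolution(SAMPLE_RATE, n)
    print(f"block {n:6d}: {df:8.2f} Hz per bin, {dt * 1000:8.2f} ms per block")

# Blocksize varied by frequency: long blocks for low tones, short for high.
for f in (100, 1000, 10000):
    n = block_size_for(f, SAMPLE_RATE)
    print(f"{f:6d} Hz -> block of {n} samples")
```

The product of bin spacing and block duration is always 1, which is why 
no single blocksize can be sharp in both frequency and time. Varying the 
blocksize means a separate analysis pass for each frequency range, which 
is where the extra processing comes from.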

> If scientists want the most biologically significant analysis of bird songs,
> then the analysis bandwidth should be varied through the frequency domain
> based on actual physiological measurements of frequency discrimination
> ability in the birds themselves.

Each species is genetically programmed differently. Even each sex may be 
different. Take for instance the Coqui frog, where the difference is so 
fine that the males discriminate the "co" part of the call and the 
females the "qui" part. And the frequency difference is not that much. 
It's in the very genetic makeup of these frogs, the construction of their 
ears.

So to do it as you suggest would mean you would have to have a different 
sonogram frequency profile for each species and/or sex. Down that path 
is the old way, where each researcher wrote their own program for their 
specific work. That consumes a great deal of valuable research time. I 
once did it this way, and it got to where all my time was spent arguing 
with computers instead of doing science.

I personally consider that such concentration on the computer display is 
of little value. We identify the differences, but rarely construct a 
display for them. We are capable of visualizing or describing what such 
a display would look like much faster than getting a computer to do it. 
As long as we can communicate our findings we are all set.

And what would a scientist do about a large frog chorus with 6 or 8 
species involved, each species genetically programmed differently?

That's why we tend to keep the display on human terms.

Even for humans a sonogram is only a representation of the sound. It's 
our experience comparing that display to the sounds that allows us to use 
it. There is no way you could have a perfect display, as human hearing 
varies from individual to individual. It even varies with our mood and 
attitude. Sonograms, for instance, don't take into account the variation 
of our hearing with age. If they did, little would show up on the upper 
half or third of a full range sonogram. And the low end would taper off 
too.

The sonogram is more to tell us what sound energy is there. We have to 
interpret what we might hear of that.

Walt




