Psychoacoustics is the study of the subjective human perception of
sound. In effect, it is the study of the psychology of acoustical
perception.
Background
In many applications of acoustics and audio signal processing, it is
necessary to know what humans actually hear.
Sound, which consists of air
pressure waves, can be accurately measured with sophisticated
equipment. However, understanding how these waves are received and
mapped into thoughts in the brain is not trivial. Sound is a
continuous analog signal which (assuming infinitely small air
molecules) can theoretically contain an infinite amount of
information (there being an infinite number of frequencies, each
containing both magnitude and phase information).
Recognizing features important to perception
enables scientists and engineers to concentrate on audible
features and ignore less important features of the involved
system. What humans hear is not only a physiological question of
the features of the ear but also, very much, a psychological one.
Limits of perception
The human ear can usually hear sounds in the range from 20 Hz to
about 20 kHz. With age, the range shrinks, especially at the upper
limit. Frequencies below the lower limit cannot be heard, but loud
sounds at those frequencies can be felt on the skin.
Frequency resolution of the ear is, in the middle range, about 2 Hz.
That is, changes in pitch larger than 2 Hz can be perceived. However,
even smaller pitch differences can be perceived through other means.
For example, the interference of two tones of slightly different
pitch can often be heard as a slow, periodic fluctuation in loudness
at the difference frequency. This effect is called beating.
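The beating effect can be illustrated with a short sketch. It assumes two equal-amplitude sine tones; the function names and the example frequencies (440 Hz and 441 Hz) are illustrative choices, not part of the text above. The envelope follows from the identity sin(a) + sin(b) = 2 cos((a-b)/2) sin((a+b)/2).

```python
import math

def beat_frequency(f1_hz, f2_hz):
    """Rate (in Hz) of the loudness fluctuation heard when two tones are mixed."""
    return abs(f1_hz - f2_hz)

def mixed_tone(f1_hz, f2_hz, t):
    """Sum of two unit-amplitude sine tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t) + math.sin(2 * math.pi * f2_hz * t)

def envelope(f1_hz, f2_hz, t):
    """Slowly varying amplitude envelope of the mixed tone.

    From sin(a) + sin(b) = 2 cos((a-b)/2) sin((a+b)/2), the fast carrier
    at the mean frequency is scaled by |2 cos(pi (f1 - f2) t)|.
    """
    return abs(2 * math.cos(math.pi * (f1_hz - f2_hz) * t))
```

For tones at 440 Hz and 441 Hz, the envelope rises and falls once per second, even though a 1 Hz change in pitch is below the roughly 2 Hz resolution quoted above.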
The "intensity" range of audible sounds is enormous. Our eardrums
are sensitive only to sound pressure. The lower limit of audibility
is defined as 0 dB, but the upper limit is not as clearly defined.
The upper limit is more a question of the level at which the ear
will be physically harmed (see also hearing disability). This limit
also depends on the duration of exposure. The ear can sometimes be
exposed to short periods of sound at 120 dB without harm, but long
exposure to sounds of 80 dB can harm the ear.
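To give a sense of how large this range is, the sketch below converts between sound pressure and decibels. It assumes that the decibel figures above are sound pressure levels relative to the conventional reference of 20 micropascals in air (the text does not state the reference explicitly); the function names are illustrative.

```python
import math

# Assumption: levels are dB SPL, referenced to 20 micropascals (RMS) in air.
P_REF_PA = 20e-6

def pressure_to_db_spl(p_rms_pa):
    """Sound pressure level in dB for an RMS pressure given in pascals."""
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)

def db_spl_to_pressure(level_db):
    """RMS sound pressure in pascals for a level given in dB SPL."""
    return P_REF_PA * 10.0 ** (level_db / 20.0)

# Pressure ratio between a 120 dB sound and the 0 dB reference.
ratio_120_db = db_spl_to_pressure(120.0) / db_spl_to_pressure(0.0)
```

Under this assumption, a 120 dB sound has a million times the pressure amplitude of a sound at 0 dB.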
A more rigorous exploration of the lower limits of audibility shows
that the minimum level at which a sound can be heard is frequency
dependent. By measuring this minimum intensity for test tones of
various frequencies, a frequency-dependent Absolute Threshold of
Hearing (ATH) curve may be derived. Typically, the ear shows a peak
of sensitivity (i.e., its lowest ATH) between 1 kHz and 5 kHz, though
the threshold changes with age, with older ears showing decreased
sensitivity above 2 kHz.
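The shape of such a curve can be sketched with a closed-form approximation. The formula below is a widely quoted fit attributed to Terhardt and often used in audio coding literature; it is not taken from the text above, it describes a young listener with good hearing, and the function names and the search grid are illustrative assumptions.

```python
import math

def ath_db_spl(f_hz):
    """Approximate absolute threshold of hearing in dB SPL at frequency f_hz.

    Analytic fit attributed to Terhardt (an assumption of this sketch,
    not a measurement): valid roughly over the audible range.
    """
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

def most_sensitive_frequency(f_low_hz=20, f_high_hz=20000, step_hz=10):
    """Frequency on a coarse grid where the approximated threshold is lowest."""
    return min(range(f_low_hz, f_high_hz + 1, step_hz), key=ath_db_spl)
```

With this approximation the lowest threshold falls between 1 kHz and 5 kHz, in line with the sensitivity peak described above, and the threshold rises steeply toward both ends of the audible range.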
The ATH is the lowest of the equal-loudness contours.
Equal-loudness contours indicate the sound pressure levels (in dB),
over the range of audible frequencies, that are perceived as being
of equal loudness. Equal-loudness contours were first measured by
Fletcher and Munson at Bell Labs in 1933 using pure tones reproduced
via headphones, and the data they collected are called
Fletcher-Munson curves. Because subjective loudness was difficult
to measure, the Fletcher-Munson curves were averaged over many
subjects.
Robinson and Dadson refined the process in 1956 to obtain a new set
of equal-loudness curves for a frontal sound source measured in an
anechoic chamber. The Robinson-Dadson curves were standardized as
ISO 226 in 1986. In 2003, ISO 226 was revised using data collected
from 12 international studies.
What do we hear?
Human hearing works basically like a spectrum analyzer: the ear
resolves the spectral content of the pressure wave without respect
to the phase of the signal. In practice, though, some phase
information can be perceived. The inter-aural (i.e. between-ears)
phase difference is a notable exception, providing a significant
part of the directional sensation of sound. The filtering effects
of head-related transfer functions provide another important
directional cue.
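The spectrum-analyzer view can be made concrete with a small sketch: two signals built from the same tones, but with one tone shifted in phase, have different waveforms yet identical magnitude spectra. The plain discrete Fourier transform, the signal length, and the chosen frequency bins below are illustrative assumptions, and the sketch says nothing about how the ear actually performs its analysis.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of the discrete Fourier transform for bins 0..n/2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def two_tone(n, bin_a, bin_b, phase_b):
    """n samples of two sine tones; the second tone is shifted by phase_b radians."""
    return [
        math.sin(2 * math.pi * bin_a * i / n)
        + math.sin(2 * math.pi * bin_b * i / n + phase_b)
        for i in range(n)
    ]

in_phase = two_tone(64, 3, 5, 0.0)
shifted = two_tone(64, 3, 5, math.pi / 2)  # same tones, different relative phase
```

Comparing `dft_magnitudes(in_phase)` with `dft_magnitudes(shifted)` gives the same values in every bin, although the two sample lists differ.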
Masking effects
In some situations an otherwise clearly audible
sound can be masked by another sound. For example, conversation at
a bus stop can be completely impossible if a loud bus is driving
past. This phenomenon is called masking. A weaker sound is masked
if it is made inaudible in the presence of a louder sound.
If two sounds occur simultaneously and one is masked by the other,
this is referred to as simultaneous masking. A sound close in
frequency to the louder sound is more easily masked than one that
is far from it in frequency. For this reason, simultaneous masking
is also sometimes called frequency masking. The tonality of a sound
partially determines its ability to mask other sounds. A sinusoidal
masker, for example, requires a higher intensity to mask a
noise-like maskee than a noise-like masker does to mask a sinusoid.
Computer models which calculate the masking caused by sounds must
therefore classify their individual spectral peaks according to
their tonality.
Similarly, a weak sound emitted soon after the
end of a louder sound is masked by the louder sound. In fact, even
a weak sound just before a louder sound can be masked by
the louder sound. These two effects are called forward and
backward
temporal masking, respectively.
Psychoacoustics in software
The psychoacoustic model enables high-quality lossy signal
compression by describing which parts of a given digital audio
signal can be removed (or aggressively compressed) safely -- that
is, without significant loss in the quality of the sound. It
explains, for
example, how a sharp clap of the hands might seem painfully loud
in a quiet library, but hardly noticeable after a car backfires on
a busy, urban street. It might seem as if this would provide
little benefit to the overall compression ratio, but
psychoacoustic analysis routinely leads to compressed music files
that are 10 to 12 times smaller than high quality original masters
with very little discernible loss in quality. Such compression is
a feature of nearly all modern lossy audio compression formats.
Some of these formats include MP3, Ogg Vorbis, Musicam (used in
digital radio -- DAB, or DR -- in Europe and elsewhere, based on
Eureka 147), and the compression used in MiniDisc, to mention a few
common audio compression standards.
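The quoted ratio can be checked with simple arithmetic. The sketch below assumes that the "high quality original master" is CD-quality PCM (44.1 kHz, 16 bits, stereo) and that the compressed file uses a constant 128 kbit/s; both figures are illustrative assumptions, not values given in the text.

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def compression_ratio(original_bps, compressed_bps):
    """How many times smaller the compressed stream is than the original."""
    return original_bps / compressed_bps

# Assumed master: CD-quality PCM. Assumed compressed rate: 128 kbit/s.
cd_bps = pcm_bitrate_bps(44_100, 16, 2)
ratio = compression_ratio(cd_bps, 128_000)
```

Under these assumptions the uncompressed stream runs at 1,411,200 bit/s, so the compressed file is about 11 times smaller, within the 10 to 12 range mentioned above.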
Psychoacoustics is based heavily on
human anatomy, especially the
ear's limitations in perceiving sound as outlined previously. To
summarize, these limitations are:
- High frequency limit
- Absolute Threshold of Hearing
- Absolute Threshold of Pain
- Temporal masking
- Simultaneous masking
Because the ear is not at peak perceptive capacity near these
limits, a compression algorithm can assign a lower priority to
sounds that fall outside the range of human hearing or are masked.
By carefully shifting bits away from the unimportant components and
toward the important ones, the algorithm ensures that the sounds
the listener hears most clearly are of the highest quality.
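The idea of shifting bits toward the audible components can be sketched as a greedy allocation loop. This is a simplified illustration, not the algorithm of any particular codec: it assumes the signal is already split into frequency bands, that a psychoacoustic model has supplied a signal-to-mask ratio (SMR) in dB for each band, and that every extra quantizer bit lowers the quantization noise by roughly 6.02 dB. The function name, the per-band cap, and the example SMR values are hypothetical.

```python
DB_PER_BIT = 6.02  # rule of thumb: noise reduction per extra quantizer bit

def allocate_bits(signal_to_mask_db, total_bits, max_bits_per_band=15):
    """Greedily hand out bits to the bands where noise is most audible.

    signal_to_mask_db: per-band SMR in dB (hypothetical model output).
    Returns the number of bits assigned to each band.
    """
    bits = [0] * len(signal_to_mask_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio: how far quantization noise exceeds the
        # masking threshold in each band with the current allocation.
        nmr = [smr - DB_PER_BIT * b for smr, b in zip(signal_to_mask_db, bits)]
        candidates = [i for i in range(len(bits))
                      if bits[i] < max_bits_per_band and nmr[i] > 0.0]
        if not candidates:
            break  # all remaining noise is already masked
        worst = max(candidates, key=lambda i: nmr[i])
        bits[worst] += 1
    return bits

# Hypothetical SMR values: band 2 lies below its masking threshold.
example = allocate_bits([30.0, 12.0, -5.0, 20.0], total_bits=10)
```

In this example the fully masked band receives no bits at all, and the loop stops early when the bit budget is large enough to push the noise in every band below its masking threshold.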
Psychoacoustics and music
Psychoacoustics covers many topics and produces discoveries that
are relevant to music, its composition, and its performance. Some
musicians, such as Benjamin Boretz, consider some or all of the
results of psychoacoustics to be meaningful only in a musical
context.
Yet to be done:
- Bark scale, Equivalent rectangular bandwidth (ERB), Mel scale and
other scales
- Loudness, that is, perceived volume, Bel, sone
- Perception of non-existent sounds, such as the missing fundamental
frequency, and other auditory illusions. Compare to the telephone,
which transmits 400 Hz to 3400 Hz.
- Auditory Scene Analysis (incl. 3D-sound
perception, localisation, etc.)
See also