From the December 1990 meeting,
BASS Vol. 19 No. 3
Preliminary
Discussion by E. Brad Meyer
Brad Meyer began with a
transparency detailing the steps through which a piece of classical
music reaches the ears of the audiophile. First, of course, is
the composer. Next come the players (and their instruments)
and the conductor, who realize the composer's ideas in acoustic
form. The sound goes out through the air into a hall, and is
changed by microphones into an electrical signal. These electrical
signals go through a mixer, and thence into a master recorder.
Then comes the editing, and perhaps processing, reverb, and
remixing. The result is transferred to some consumer music carrier--LP
once upon a time, today analog cassette or CD. Next comes the
home player, followed by connecting wires, preamp and a power
amp, speaker wires and loudspeakers, where the signal is transformed
back into sound. Finally there is the listening room. Mark Fishman
noted, to laughter, that Meyer had left out the influence of
the power plant and the ac lines.
Meyer drew a box around
the links in the chain over which the audiophile has control:
the sequence from the music carrier through the listening room.
Within this box lies the subject matter for consumer audio publications.
Tonight, however, Meyer
was going to focus his attention on the very end of the chain:
the listener. Many things affect how a listener perceives the
sound. Among them are hearing limits, experience, fatigue, mood
(one's own and that of others), and pharmacological substances
both medical and recreational. Ambient lighting affects mood
and is thus a factor, as Meyer discovered early on in his audio
pursuits. Lighting just the speakers and leaving the rest of
the room dark makes the sound more vivid and dramatic. Meyer
suggested that people try this, and also try darkening the whole
room.
Listener hearing acuity
of course is a major factor. Meyer mentioned that one now can
get tested out to 20 kHz instead of just the standard 8 kHz
[see the May 1991 meeting summary, in v19/1--PSH]. Meyer had
had his ears' hearing thresholds (not the same thing as frequency
response) measured and found that he has measurable loss in
low-level detection at 12 kHz and a lot more above that. He
found it a sobering experience, as have many of us.
Meyer pointed out that the
ear's equal-loudness curves tend to bunch at the frequency extremes.
This means that once the highest and lowest sounds are above
the hearing threshold, a small change in level produces a larger
change in perceived loudness than the same change in the mid-band.
This became painfully obvious during his high-frequency hearing
tests. For example, at 18 kHz Meyer's threshold is 106 dB SPL.
At 104 dB he could not hear it at all, and yet at 106 dB he yanked the phones from
his head. Fishman quoted Bob Berkovitz as saying that if the
sound is not audible it does not damage the ear even if its
level is quite high.
Returning to the playback
chain, Meyer went on to say that typically the audiophile can
affect only a small part of it--the playback system and the
listening room (both acoustically and how it may be made to
influence mood). There is no control over the recording process,
although Meyer suggested that those who have the opportunity
to do live recording really should try it--it is dismaying how
much influence microphone choice and placement have on the recorded
sound.
Meyer speculated that audiophiles
have paid so much attention to trivial aspects of the playback
chain, such as cables and ac power, because the CD has eliminated
the audible distortion once introduced in getting the signal from
the master tape to the playback preamp, hitherto an area ripe
for great fussiness.
Things are a lot less interesting now for those looking for
controllable detail.
Mark Fishman brought up
an interesting comment from J. Gordon Holt on memory: Holt now
has better memory than hearing. His memory now hampers his enjoyment
of many musical performances because he misses the sheen of
the violin and the delicacy of the cymbals and triangle, which
he remembers but no longer hears. The discrepancy bothers him.
David Moran suggested that Holt might find it helpful to employ
wider-dispersion tweeters and, theoretically, some judicious
equalization, to get more audible treble into the reverberant
field.
Alvin Foster reported a
more cheerful result, saying that his own memory helps add sheen
to the strings rather than detracting from his current listening
enjoyment. Dan Banquer commented that, as a musician, he has
always felt that nothing is like being in the middle of the
music. No matter how many millions of dollars of equipment one
has, it cannot recreate the experience of performing. Meyer
added that he has a BSO violinist friend who complains that
the BSO broadcasts do not have enough string sound. Meyer asked
him how often he has listened to the BSO from out in the audience.
[This again poses the question of what "viewpoint"
the sound should be created for and/or played back from--PSH.]
The ABX Comparator
Historically, it was an
interest in tracking down the source of perceived differences
in the playback chain that led to the construction (by David
Clark and associates) of ABX boxes like the one Meyer was to
demonstrate at this meeting. During the next portion of the
evening he introduced the ABX box and played with it a bit to
show how it worked. The system assembled for the meeting comprised
an Apt preamp, Audio Dynamics power amp (Japanese, class AB,
bipolar), the Allison 205 3-piece satellite/woofer system (lightly
equalized with a dbx 10/20 to boost the low bass and help ameliorate
a presence wrinkle), and an AR turntable fitted with a JH Formula
Four arm and Stanton cartridge.
The ABX comparator switches
between two sources. The box has three buttons on the remote
and three LEDs on the front panel, labeled A, B, and X (hence
the product name). There is another pair of buttons, labeled
Down and Up, which change the numeric display on the unit. When
the box is powered on, it generates 100 random assignments of
X to either A or B, one for each possible displayed number on
a two-digit readout (00 to 99). A Reset button on the main control
unit returns the sequence to test number 01. Pushing A connects
source A to the output, and likewise for button B. Pushing X
connects the box-selected source, which is either A or B. Neither
the operator of the box nor the listeners have any notion of
which source is X until the answers are read out at the end
of the test. This kind of test is called double-blind, as neither
the tester nor the tested knows the answers.
During the test the subjects
(or the tester) switch among A, B, and X and then mark on an
answer sheet whether X is A or B. The test is repeated for a
series of separate trials. At the end of a series, pushing the
Answer button reveals the identities of X for all trials. In
answer mode, the X LED lights together with the LED of the source
that X was assigned to: if X was A for trial number 01, for example,
the LEDs for X and A are both lit.
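The box's bookkeeping is simple enough to model in a few lines. The following is a hypothetical Python sketch of the behaviour just described, not the actual hardware logic: the names `AbxSequence` and `score` are invented for illustration, and wrapping the display from 99 back to 00 is an assumption. It draws the 100 assignments once, keeps X fixed within each trial, and scores an answer sheet against the revealed key.

```python
import random


class AbxSequence:
    """Hypothetical model of the ABX box's hidden sequence of X assignments."""

    def __init__(self, seed=None):
        rng = random.Random(seed)
        # Drawn once at power-on, one assignment per display number 00..99,
        # and fixed thereafter, so X never changes within a trial.
        self._x = [rng.choice("AB") for _ in range(100)]
        self.trial = 1

    def reset(self):
        # Reset returns the sequence to test number 01.
        self.trial = 1

    def up(self):
        # Assumption: the two-digit display wraps from 99 back to 00.
        self.trial = (self.trial + 1) % 100

    def down(self):
        self.trial = (self.trial - 1) % 100

    def press(self, button):
        """Source routed to the output, i.e. what the listener hears."""
        return self._x[self.trial] if button == "X" else button

    def answers(self, first, last):
        """Answer mode: reveal X for display numbers first..last."""
        return [self._x[t] for t in range(first, last + 1)]


def score(sheet, key):
    """Number of answer-sheet entries that match the revealed key."""
    return sum(s == k for s, k in zip(sheet, key))


box = AbxSequence(seed=1990)
box.reset()
sheet = []
for _ in range(10):
    sheet.append(random.choice("AB"))  # a listener who is only guessing
    box.up()
print(score(sheet, box.answers(1, 10)), "of 10 correct")
```

Because the listener in the example is only guessing, the score will tend to hover around 5 of 10 over repeated runs.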
The ABX box is designed
to determine how reliably the listener can detect differences.
Preconceptions affect perception and conclusions [in other words,
not only is seeing believing, but believing is also seeing--Ed.],
hence the need to keep the listener blind (single-blind testing). Double-blind testing is
required because the tester almost invariably (and unpredictably)
influences the test subject(s). One of many well-known examples
occurred when a group of psychology students tested many subjects
for IQ. The subjects were impartially tested for IQ beforehand,
and then sorted into two groups with similar IQ ranges. The
testers were told that group A was exceptionally intelligent
while group B was not. For each group, the testers were to read
the same script while administering the test. The result was
that the group touted as smart to the test-givers scored statistically
significantly better than the group labeled stupid. Somehow
the testers conveyed their expectations about performance while
reading the same instructions to the two groups, and the groups
responded to the cues.
Listening
Demonstrations consisted
of a range of comparative-listening tests using different musical
sources, including PCM-F1 tapes and LPs, with two different
devices inserted into the B path and compared with a straight-wire
bypass in the A path. This kind of line-level comparison is
easy to do well; switching at the higher amplifier and speaker levels can cause problems.
Meyer also has a high-current relay box (an extra-cost option)
for switching amplifiers or loudspeakers. The large relays in
this box make a soft clunk that is different for the two sources
and is audible in a quiet room; Meyer has identified X 10 out
of 10 times without any signal! While the sound is quiet enough
to be masked when any music is playing, testing hygiene dictates
that the relay box be enclosed or otherwise muffled.
Meyer handed out a sheet
photocopied from the ABX manual showing the typical level
matching required for reliable detection of differences between
sources with 1/3-octave frequency-response aberrations. When the
aberrations span a wider part of the spectrum, level matching
becomes increasingly critical: the required match drops to less
than 1/3 dB, especially in the ear-sensitive 2-5 kHz region.
Acuity (the ability to hear a difference) also sometimes depends
on how close the level at a given frequency is to the threshold
of hearing. At threshold, a small increase in level will make
the sound audible and enable the listener to distinguish A from
B reliably when they differ.
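To give a sense of how tight a 1/3 dB match is: a level difference in decibels corresponds to a voltage ratio of 10^(dB/20). The sketch below assumes levels are matched by measuring the signal voltage at the two outputs; the voltages used are hypothetical.

```python
import math


def db_to_ratio(db):
    """Voltage (amplitude) ratio for a level difference in dB."""
    return 10 ** (db / 20)


def ratio_to_db(ratio):
    """Level difference in dB for a voltage ratio."""
    return 20 * math.log10(ratio)


print(f"1/3 dB = {100 * (db_to_ratio(1 / 3) - 1):.1f}% difference in voltage")
# Hypothetical readings: 1.040 V at one output, 1.000 V at the other.
print(f"1.040 V vs 1.000 V = {ratio_to_db(1.040 / 1.000):.2f} dB")
```

So a mismatch that a voltmeter shows as roughly 4 percent is already at the 1/3 dB level.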
Steve Owades noted that
the use of the ABX box does not reduce bias in results due to
peer pressure when the box is used with more than one listener
at a time. Visible or audible reactions from surrounding listeners
may influence a subject's answer. Such bias makes the answers
dependent--what one listener chooses is influenced by what his
or her peers choose. This may invalidate the statistical
analysis of the results, which assumes that the trials are independent.
The Tests
Meyer first demonstrated
the operation of the ABX box by disconnecting the signal feed
to the B inputs. This simpleminded procedure--comparing an audible
signal with no signal--has proven helpful in clarifying how
the box works for those who, for example, fail to pick up the
point that the assignment of X remains fixed within each trial.
The 18 subjects present went through the exercise of writing
their answers for X on the sheet. The result: 17 correct answers
and one abstention, from someone who deemed the test too obvious
to dignify with an answer.
Next Meyer inserted a Technics
SH-9010 parametric equalizer in the B loop and set the 3 kHz
slider for a 3 dB boost. The Q knob was set to 0.7 (the broadest
setting, for a bandwidth of about two octaves). Playing pink
noise through the system makes this alteration easy to hear,
and the group got a score of 18/18 without difficulty. With
choral music, whose broad frequency range makes it a good test
for response aberrations, the score was 16/17.
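The relation between the Q setting and the quoted bandwidth can be checked with the standard formula for a filter whose Q is the centre frequency divided by the width between its -3 dB points, with the centre at the geometric mean of those points. Whether Technics defines Q on the SH-9010 exactly this way is an assumption; equalizer makers differ.

```python
import math


def q_to_octaves(q):
    """Bandwidth in octaves for a given Q, taking Q = f0 / (f2 - f1) with
    f0 the geometric mean of the -3 dB frequencies f1 and f2."""
    return (2 / math.log(2)) * math.asinh(1 / (2 * q))


def octaves_to_q(n):
    """Inverse of q_to_octaves."""
    return math.sqrt(2 ** n) / (2 ** n - 1)


print(f"Q = 0.7 -> {q_to_octaves(0.7):.2f} octaves")
print(f"2 octaves -> Q = {octaves_to_q(2):.3f}")
```

Under this definition Q = 0.7 comes out at a little under two octaves, consistent with the bandwidth quoted for the broadest setting.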
The next test was much tougher:
The 9010 was left in the circuit, but with all sliders set to
their midpoints. Unlike some consumer equalizers, the semi-pro
Technics has controls that really do what they say (boost, cut,
or stay flat), and the response is quite flat in this condition
except for a slight droop in the top octave. To make things
more difficult, we heard only the choral music for this trial.
The group got 7/17 correct.
The last two trials were
bypass tests of the Sony PCM-F1 digital processor. The F1's
video output was looped back to the input and the processor
was set to a gain of 1.0 and connected to input B. The signal
source was an LP made by Meyer and Peter Mitchell of organist
James Johnson--the same production whose digital version has
been excerpted on the first and second Stereophile test CDs.
The LP was made from an analog master, so we really were comparing
an analog source directly with an F1-digitized version. The
results on the two trials were 9/15 and 7/15; the total was
16/30, 53% correct.
Analyzing Results
If listeners are really
not able to detect any difference between A and B (whatever
they believe), or if they are simply guessing, the outcome will tend
toward 50 percent correct (and 50 percent incorrect) answers
as the sample size increases. When the listeners can tell the
difference easily (as with the pink-noise test of the 3 kHz
boost) the result will be all answers correct. When the difference
is subtle and some can detect it reliably while others cannot,
the number of correct answers should lie between half and all
correct.
[Author's note: The number
of correct answers can fall below half if the trials are not
independent, i.e., if someone in the audience is influencing
others. Meyer told a story of an AES workshop he and Mitchell
gave when the box generated a run of successive trials in
which X was B. Many people selected A on one difficult trial,
apparently thinking that it was about time to get an A--which
is, of course, a form of dependence, though dependence on
previous trials and not on the other subjects in the room.
In an AES preprint by my brother and me (presented to the
BAS several years ago) we suggested a much more complicated
distributional function to analyze the data, which would help
reduce the effects of dependent trials--PSH.]
Depending on the number
of trials, there is a definite number of correct answers at or
above which one can say that the probability of a listener's getting
that number by chance is less than five percent. This is what
is known as a 95% confidence level. Assuming independence, with
six trials one has to get all six correct to satisfy this criterion.
With 24 trials, 17 correct answers is the threshold. The percentage
of correct answers needed to qualify for "reliably hearing differences"
decreases as the number of independent trials increases.
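These thresholds follow from the binomial distribution, assuming each trial is an independent 50/50 guess. A minimal sketch (the function names are illustrative only) computes the one-sided chance probability of a score and the smallest qualifying score for a given number of trials:

```python
from math import comb


def p_at_least(k, n):
    """Chance of k or more correct out of n when every answer is a 50/50 guess."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def min_correct(n, alpha=0.05):
    """Smallest score whose chance probability is below alpha, or None
    if even a perfect score over n trials is not rare enough."""
    for k in range(n + 1):
        if p_at_least(k, n) < alpha:
            return k
    return None


for n in (4, 6, 10, 16, 24, 30):
    k = min_correct(n)
    if k is None:
        print(f"{n:2d} trials: no score qualifies")
    else:
        print(f"{n:2d} trials: {k:2d} correct ({k / n:.0%}), p = {p_at_least(k, n):.3f}")
```

For 6 and 24 trials this reproduces the thresholds of 6 and 17 given above; with 4 trials no score qualifies, and by 30 trials the required fraction has fallen to about two thirds.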
Stereophile carried out
a double-blind test and then examined the results of only those
subjects who got high scores. They concluded that this group
had demonstrated the ability to hear differences. This, however,
is statistically invalid: even if every subject were guessing,
in a large group as many as 1 out of 20 subjects would be expected
to satisfy the 95% criterion by chance alone. (The criterion gives
95% confidence that any one guessing subject will fail; the 5%
who pass anyway form this group.) To ascertain whether there really is a golden-eared group,
they should have selected the high scorers and used them for
another series of trials.
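The multiple-comparisons problem can be made concrete. Assuming every subject is purely guessing and all trials are independent, the sketch below computes how many members of a panel would pass a given criterion by luck alone. The panel of 100 subjects with 24 trials each is hypothetical and is not Stereophile's actual test.

```python
from math import comb


def p_at_least(k, n):
    """Chance of k or more correct out of n when every answer is a 50/50 guess."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def chance_passers(subjects, trials, threshold):
    """For a panel in which every subject is guessing, return the per-subject
    pass rate, the expected number of passers, and the probability that at
    least one subject passes."""
    p = p_at_least(threshold, trials)
    return p, subjects * p, 1 - (1 - p) ** subjects


# Hypothetical panel: 100 guessing subjects, 24 trials each, pass mark 17.
rate, expected, any_pass = chance_passers(100, 24, 17)
print(f"pass rate {rate:.3f}, expected passers {expected:.1f}, "
      f"P(at least one) {any_pass:.2f}")
```

With these numbers about three subjects would be expected to pass by luck, and at least one passer is nearly certain. Because scores are whole numbers, the actual per-subject rate at a given pass mark sits at or below 5 percent (about 3.2 percent for 17 of 24).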
The tests we took showed
clear audibility to a confidence level well over 95% for the
first three tests, and null results for the last three. The
tests were conducted patiently and fairly, under generally good
conditions; for example, there was a minimum of cross-comment.
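These verdicts can be checked with the same one-sided binomial calculation, using the scores reported for this meeting and excluding the single abstention in the first test. The sketch treats each listener's answer as an independent trial, an assumption that, as Owades pointed out, group listening may violate.

```python
from math import comb


def p_at_least(k, n):
    """Chance of k or more correct out of n when every answer is a 50/50 guess."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# (test, correct, answered) as reported for this meeting
results = [
    ("B input disconnected", 17, 17),
    ("3 kHz +3 dB boost, pink noise", 18, 18),
    ("3 kHz +3 dB boost, choral", 16, 17),
    ("equalizer flat, choral", 7, 17),
    ("PCM-F1 bypass, trial 1", 9, 15),
    ("PCM-F1 bypass, trial 2", 7, 15),
]

for name, k, n in results:
    p = p_at_least(k, n)
    verdict = "audible" if p < 0.05 else "null"
    print(f"{name:32s} {k:2d}/{n:2d}  p = {p:.2g}  {verdict}")
```

The first three tests fall far below the 5 percent line, while the last three are entirely consistent with guessing.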
Meyer noted that people
typically get touchy, even grouchy, when two blind-compared
pieces of equipment are very similar. It should be noted here
that some high-end reviewers hold that long-term listening to
each piece of equipment produces more reliable answers than
short-period ABX switching, which they feel is less revealing,
although there is good evidence that, to the contrary,
quick comparison increases acuity. In any case, contrary to
popular misconception, there is no law against leaving the ABX
box in position A for a month, then switching to B the next
month, and finally to X during a third month.
[Guest's addendum: Drawing on
my own experience, I tried to switch among A, B, and X at
moments that would be the most revealing of differences. Still,
these tests were necessarily conducted with fairly rapid switching.
Needless to say, the system and room were familiar to none
of us. The conditions were obviously not the best, and finally,
as always, a negative result does not conclusively prove the
nonexistence of anything. The test can and should be made
more sensitive when possible by using the subject's own system
and room, and by repeating musical selections through both
signal paths (a repeated-music test) rather than switching
back and forth while a selection is playing (a running-music
test).
The stress of the tests
did indeed tell after a while. Even the temperate Poh Ser
Hsu was heard to snap at someone two rows ahead of him to
quit moving his head around! While this was a less-than-ideal
test, then, I must point out that claims by writers like Robert
Harley that blind tests necessarily generate such stresses
are without foundation--EBM]