Understanding Natural Sound
Jan 13, 2012 2:32 PM, By Bob McCarthy
Tips for achieving the best acoustics.
Our aural system coexists with our visual sense, and the brain expects the results to match. The range and location seen by our eyes should match what our ears hear. The audio range finder is the amount of tilt in the frequency response, and the audio localization is done by the mechanisms just discussed.
The last part is timing. Sound is slower than light, so natural sound transmission means that the sound will always be behind what we see. Interestingly, though, we are amazingly easy to dupe on this. We can easily be fooled and, in fact, often get annoyed when we are not. I will explain with some examples. If we are 33 meters (about 110ft.) from a person speaking on stage, we should expect the sound to arrive approximately 100 milliseconds behind what we see. That is natural. But if there were video projection where we could see moving lips, we would complain that the “sound system” (i.e., natural sound) is out of sync. I watch fireworks and I am surprised over and over that sound and light are out of sync. I am firmly convinced no one would notice if the sound were magically accelerated to the speed of light for these events. Movies prove it. Such magic acoustics are found in the explosions in Hollywood films, where no matter the distance (and even in outer space), we are always in time. So when it comes to syncing to video, we need not mimic natural sound behavior but rather improve on it in order to be perceived as natural.
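To put numbers on the delay, here is a minimal Python sketch of the distance-to-delay arithmetic. The speed of sound of 343 m/s (dry air at roughly 20 °C) and the function name `acoustic_delay_ms` are assumptions for illustration; the real figure drifts with temperature.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C


def acoustic_delay_ms(distance_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Return the time, in milliseconds, for sound to travel distance_m."""
    return 1000.0 * distance_m / speed


# How far behind the picture the natural sound arrives at a few distances.
for distance in (10, 33, 100):
    delay = acoustic_delay_ms(distance)
    print(f"{distance:>4} m -> {delay:6.1f} ms behind what we see")
```

At 33 meters this comes out a little under 100 milliseconds, which is the delay a video feed of the same performer would expose.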
How Speakers Relate to Natural Sound
Now that we know how natural sound is transmitted and received, we can look at how we can push the envelope and make magic sound without detection. Let’s first look at the things to avoid. Visible speakers would be a start, but they’re not actually the most important. A set of black boxes can disappear from the mind very quickly if we cannot detect their usage. But even the most cleverly hidden speaker will give up its position if it breaks one of the big rules.
First, there can’t be distortion, hum, noise, rattles, or buzzes. No matter how loud a human shouts, they don’t go into clipping or feedback. The sound system must have enough headroom to be free from these risks. On the other side, the noise floor must be under control so there is no audible line-frequency hum, hiss, or digital quantization noise.
A less obvious matter is the frequency range of the speaker system. There is nothing in natural acoustics that limits the frequency range of sound transmission. The limits are our ears and our speaker systems. To mimic natural sound requires a full-range sound system, not one that covers just the vocal range. Systems with truncated frequency extremes are easily detected by the absence of the full spectrum found in natural sound. Conversely, a system that exaggerates either the LF or HF regions will also be very easily detected. Equalization will be covered shortly.
Maintaining a plausible link between the visual placement of the original sound source and its amplified copy is critical to the illusion of natural sound. The location of the speaker(s) will be the most critical parameter. The ideal speaker locations would image well to the performers and have enough acoustic gain to be stable. How about chest-mounted, battery-powered speakers for all performers?
There is no way to win the localization game without good locations, and yes, the plural is intentional. Since we can’t place speakers on the performers, we are left with two options: go up or go out. We will need to do both. Because localization errors can be detected in either the horizontal or vertical plane, we will need to have positions that can accomplish this for a wide variety of seats. A solo center cluster can be effective for a balcony, providing a centered horizontal image and, unless it is too high, an acceptable vertical image. By contrast, this location would create extreme vertical image distortion at the floor level. Now let’s go the other way: left/right systems alone. These can be placed low on the sidewalls, which gives us a realistically low vertical image. Unfortunately it will pull the off-center seats to one side or the other in the horizontal plane.
Therefore a triangular configuration (at least) is required to improve our chances of realistic localization. The center helps to mitigate our sensitive horizontal localization mechanism. The level and timing relationship between the three sound systems and the fourth source (the original) will determine the sound image placement for each different seat in the hall. The geometry makes it absolutely impossible to develop a perfect timing and level scheme that will guide all seats to the original image source. This is a very complicated subject on its own, but suffice it to say that, although timing relationships get the most press on this matter, level is the dominant parameter.
First, we must remember that level is half the game in the horizontal plane and practically all of it in the vertical. Level-based localization will hold up over much larger proportions of a hall, especially in large spaces. Level relationships between sources are ratios, and therefore they scale well. Timing relationships are not ratios; they are differences. And the trouble is we have only about 7 milliseconds of difference to work with in the horizontal plane before the game is over. That is roughly 2.4 meters of path length difference.
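The ratio-versus-difference point can be checked with a short Python sketch. It assumes two equal-output point sources in free field (inverse-square law, no room reflections) and a 343 m/s speed of sound; the function names and the 5 m / 6 m seat geometry are hypothetical, chosen only to show the scaling.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C


def level_difference_db(d_near_m, d_far_m):
    """Level offset between two equal sources (free-field, inverse-square)."""
    return 20.0 * math.log10(d_far_m / d_near_m)


def time_difference_ms(d_near_m, d_far_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Arrival-time offset between the same two sources."""
    return 1000.0 * (d_far_m - d_near_m) / speed


def path_difference_m(offset_ms, speed=SPEED_OF_SOUND_M_PER_S):
    """Path-length difference that produces a given time offset."""
    return speed * offset_ms / 1000.0


# Same seat geometry (5 m to one speaker, 6 m to the other), scaled up.
for scale in (1, 2, 4):
    near, far = 5.0 * scale, 6.0 * scale
    print(f"x{scale}: level {level_difference_db(near, far):.2f} dB, "
          f"time {time_difference_ms(near, far):.2f} ms")

print(f"7 ms corresponds to {path_difference_m(7.0):.2f} m of path difference")
```

As the hall is scaled up, the level offset stays put while the time offset grows in proportion, so the timing budget is used up first in a large room.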
The shape of the room will help guide the strategy here. Narrow rooms can do well with big left/right systems and a small centerfill to help keep the horizontal image inward. Wide rooms, and particularly fan-shaped rooms, will favor a big central system and smaller left/right systems to bring the image down. All configurations benefit from frontfills, which provide both vertical and horizontal help.
Equalizing for a Natural Sound
A human speaker is a point source with a typical coverage angle of about 60 degrees and, like a loudspeaker, is more directional in the highs than in the lows. That does not mean we need a 60-degree speaker system, but we do need to keep the system free from the frequency response distortions of multiple sources known as comb filtering. We can create combing by cupping our hands around our mouth. We don’t want our speakers to sound like this; therefore, we must minimize multi-sourcing between speakers and strong reflections. This is a giant subject of its own, big enough to write a whole book about (I did), but we will touch on it briefly here. The system must be designed with extreme care regarding the overlap between speakers. If speakers have lots of overlap, they must be kept physically close together. If they are far apart, then the overlap must be minimized. Otherwise a large proportion of our listening area will have large peaks and dips that cannot be solved effectively with an equalizer.
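To see why spacing and overlap both matter, here is a minimal Python sketch of the textbook two-arrival comb filter: one signal summed with a delayed copy of itself. It is an idealized model (two coherent arrivals, no room, no driver response), and the function names and the 1 ms example delay are assumptions for illustration.

```python
import cmath
import math


def comb_response_db(freq_hz, delay_ms, relative_level_db=0.0):
    """Level of (direct + delayed copy) relative to the direct sound alone."""
    gain = 10.0 ** (relative_level_db / 20.0)
    phase = -2.0 * math.pi * freq_hz * delay_ms / 1000.0
    magnitude = abs(1.0 + gain * cmath.exp(1j * phase))
    return 20.0 * math.log10(magnitude) if magnitude > 0.0 else float("-inf")


def null_frequencies_hz(delay_ms, max_hz):
    """Cancellation frequencies for equal-level arrivals: odd multiples of 1/(2*delay)."""
    nulls = []
    k = 0
    while True:
        freq = (2 * k + 1) * 1000.0 / (2.0 * delay_ms)
        if freq > max_hz:
            return nulls
        nulls.append(freq)
        k += 1


delay_ms = 1.0  # hypothetical offset, roughly a third of a meter of path difference
print("nulls up to 3 kHz:", null_frequencies_hz(delay_ms, 3000.0))
for level_db in (0.0, -6.0, -12.0):
    peak = comb_response_db(1000.0, delay_ms, level_db)
    dip = comb_response_db(500.0, delay_ms, level_db)
    print(f"second arrival at {level_db:5.1f} dB: peak {peak:+.1f} dB, dip {dip:+.1f} dB")
```

Shorter delays (speakers close together) push the first null up in frequency, and a weaker second arrival (less overlap) makes the ripple shallower. Those are the two levers the design rule above relies on.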
One of equalization’s primary roles in maintaining natural sound is range setting. This is done by how flat we make the system. The frequency areas of interest are the upper and lower extremes: below 100Hz and above 4kHz. These relate to mimicking the natural effects of sound transmission over distance described above. If a system is razor flat from top to bottom, then we are sonically placing the listener so close that there is minimal LF addition from the room and HF reduction from the air. Of course we want to move the audience closer but not so close it becomes uncomfortable or grossly out of proportion. A system with lows too far reduced or highs boosted above flat is patently unnatural since you are placing the listener practically in the source. Therefore we let the low end gradually rise and the high end gradually fade until the sonic range of the system is adjusted to the scale appropriate to maintain a magnified natural sound.
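One way to picture the range-setting tilt is as a target curve that is flat through the middle and bends at the two extremes. The Python sketch below is purely illustrative: the 100Hz and 4kHz corners come from the text, but the straight-line shape and the 1 dB per octave default slopes are assumptions, not recommended values.

```python
import math

LF_CORNER_HZ = 100.0    # from the text: the low extreme lies below 100Hz
HF_CORNER_HZ = 4000.0   # from the text: the high extreme lies above 4kHz


def target_tilt_db(freq_hz, lf_rise_db_per_octave=1.0, hf_fall_db_per_octave=1.0):
    """Hypothetical target level: flat mid-band, rising lows, fading highs."""
    if freq_hz < LF_CORNER_HZ:
        return lf_rise_db_per_octave * math.log2(LF_CORNER_HZ / freq_hz)
    if freq_hz > HF_CORNER_HZ:
        return -hf_fall_db_per_octave * math.log2(freq_hz / HF_CORNER_HZ)
    return 0.0


# A steeper tilt sonically places the listener farther away; zero tilt is "razor flat".
for freq in (25.0, 50.0, 100.0, 1000.0, 4000.0, 8000.0, 16000.0):
    print(f"{freq:>8.0f} Hz -> {target_tilt_db(freq):+.1f} dB")
```

Setting both slopes to zero reproduces the razor-flat case, and larger slopes correspond to a more distant sonic perspective.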
One of the characteristics of loudspeakers not found in natural sources is the acoustical crossover. Amazing story: I worked with a performer who was told for years that her voice had a crossover at 800Hz and that she should only use a particular speaker system with the same crossover. And we wonder why people are skeptical of sound engineers!
Our speaker systems sub-divide the spectrum to maximize power, frequency range, and control of coverage. Since audible response transitions are a clue to unnatural sound, we must take great care at the crossover. On one hand there are advantages to making sharp transitions to minimize interaction during the hand-off. On the other hand, sharp transitions are the most easily audible. Consider the movement from a front-loaded cone driver to a horn at 1kHz. The cone’s directional pattern may be very wide as it transitions into a narrow horn. This can make an easily detected transition as a source moves up the scale.
There is much more to this subject, but hopefully this will provide some insight into how to provide the most natural sound from your speaker system.