# The Modern Audio Analyzer

Mar 12, 2014 10:52 AM, By Bob McCarthy | Posted by Jessaca Gutierrez

Understanding dual-channel FFT audio analyzers

**Window Functions**

This is where the middle section, the time record, comes into play. The time record is a modified version of the raw sampled waveform, making it ready for the transform. We cannot place limits on the program material coming through our system. We have to be able to measure any signal, and we don’t have all day. We need our time record to give us a reasonably valid representation for all frequencies within the bandwidth. We will give up perfection for the few frequencies that are perfect multiples to gain equality for all.

The remedy is a time “window,” a mathematical simulation of a gain control between the buffered waveform and our second stage: the time record. Audio engineers can visualize the window function as a time-triggered gate. Let’s resume with our example 5-millisecond time record. At the start (0 milliseconds) the window is closed, and it begins to open at some point after. Eventually, but no later than the midpoint in time (2.5 milliseconds), the window is fully open. The second half mirrors the first, and the window closes at the 5-millisecond end point. The final product is a modified version of the original waveform with amplitude weighting that favors the middle over the beginning and ending.
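To make the gain-control picture concrete, here is a minimal sketch that applies a Hann window to a 5-millisecond record. The 48kHz sample rate, the 1kHz test tone, and the names `hann_window` and `time_record` are assumptions chosen for illustration; they are not taken from any particular analyzer.

```python
import math

SAMPLE_RATE_HZ = 48000      # assumption: the article does not state a sample rate
RECORD_MS = 5               # the article's example time record

def hann_window(n_samples):
    """Hann window: gain 0 at both ends, rising smoothly to 1 at the midpoint."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

# 241 samples span exactly 5 ms (240 intervals), so sample 120 sits at 2.5 ms.
n_samples = SAMPLE_RATE_HZ * RECORD_MS // 1000 + 1
window = hann_window(n_samples)

# A raw buffered waveform (hypothetical 1 kHz cosine, nonzero at both ends).
raw = [math.cos(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE_HZ)
       for n in range(n_samples)]

# The time record is the raw waveform multiplied, sample by sample,
# by the window gain.
time_record = [sample * gain for sample, gain in zip(raw, window)]

for ms in (0.0, 1.25, 2.5, 3.75, 5.0):
    index = int(round(ms * SAMPLE_RATE_HZ / 1000))
    print(f"{ms:4.2f} ms  gain={window[index]:.3f}  "
          f"raw={raw[index]:+.3f}  windowed={time_record[index]:+.3f}")
```

The printed gain column traces the gate described above: closed at 0 milliseconds, fully open at 2.5 milliseconds, and closed again at 5 milliseconds.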

How does this help? Because every windowed record starts and ends closed, we are assured that the Fourier transform’s assumption, that we can place the time records end to end, has been satisfied. There are certain to be costs to modifying the waveform, however, and the costs vary by program material. Among other things, sine waves have distortion added to them, and transients may be ignored if they arrive at the beginning or end of the time record.

There are different versions of the window functions and each has its own favorite source material. (None are fans of Zeppelin though.) They are all the same at the beginning (closed) and middle (fully open) but differ in the shape of the rise and fall between. The Hann and Blackman-Harris windows are often used for random sources (noise or music), while the Flattop window is favored for sine wave testing. How big are these errors? Most of them are 40dB-80dB down from a full-scale signal. Perfect? No. End of the world? No. (Our time record closed first.)
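A rough way to see the size of these errors is to transform a sine wave that does not fit a whole number of cycles into the record, once with no window and once with a Hann window, and read the level at a frequency far from the tone. The sketch below does this with a plain, slow DFT. The 48kHz sample rate, the 1,100Hz tone, and the 10kHz observation point are assumptions picked for illustration, and the numbers it prints apply only to this one case.

```python
import cmath
import math

SAMPLE_RATE_HZ = 48000   # assumption: not stated in the article
N = 240                  # 5 ms at 48 kHz, so bins fall every 200 Hz
TONE_HZ = 1100.0         # 5.5 cycles per record: not a multiple of 200 Hz

def hann_window(n_samples):
    """Hann window: closed at both ends, fully open in the middle."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def dft_magnitudes(samples):
    """Magnitude of a plain DFT from 0 Hz up to half the sample rate."""
    n_total = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * n / n_total)
                    for n, s in enumerate(samples)))
            for k in range(n_total // 2 + 1)]

def db_below_peak(magnitudes, bin_index):
    """Level of one bin relative to the largest bin, in dB."""
    floor = 1e-300  # guards against log10(0)
    return 20.0 * math.log10(max(magnitudes[bin_index], floor) / max(magnitudes))

tone = [math.sin(2.0 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ) for n in range(N)]
windowed_tone = [s * w for s, w in zip(tone, hann_window(N))]

unwindowed_mags = dft_magnitudes(tone)
hann_mags = dft_magnitudes(windowed_tone)

FAR_BIN = 50  # 50 * 200 Hz = 10 kHz, far away from the 1.1 kHz tone
unwindowed_db = db_below_peak(unwindowed_mags, FAR_BIN)
hann_db = db_below_peak(hann_mags, FAR_BIN)

print(f"level at 10 kHz, no window:   {unwindowed_db:7.1f} dB re peak")
print(f"level at 10 kHz, Hann window: {hann_db:7.1f} dB re peak")
```

Energy that shows up at 10kHz here is pure error, since the source is a single 1,100Hz tone. Swapping in a different window formula in place of `hann_window` is the quickest way to compare the trade-offs between window types.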

**The Frequency Response**

Now we have a full set of linear frequency response data, with amplitude and phase. The amplitude part is straightforward: bigger is bigger. The phase part is not as straightforward because the phase is just a position on a circle relative to … something. A steady sine wave will have a single phase value. But if the input signal is random noise, it will have random phase. Simply put, we can’t do anything with a single channel of phase data unless we have a time reference to compare it to, which is to say relative phase. As we will see, that is one of the key benefits of the dual-channel version of the FFT analyzer.

The next step is the move from linear to log. One option is to just stretch and squeeze the frequency response display to make the linear data fit in the log display. Visualize a Slinky. This is a video solution for an audio problem. The problem is not display; it’s resolution. We have very low resolution in the low-frequency range and very high resolution in the high end. Our 5-millisecond example has frequency data points every 200Hz. That’s 0Hz, 200Hz, 400Hz ... 19,600, 19,800, 20,000. Notice that we skipped over the subwoofers? On the other hand we have 50 slices in the octave between 10kHz and 20kHz.
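The arithmetic behind those numbers is short enough to check directly: the spacing between data points is the reciprocal of the time record length. The sketch below counts how many points land in a few octaves. The function name `points_in_octave` and the convention of counting the upper edge but not the lower one are illustrative choices.

```python
RECORD_MS = 5        # the article's example time record
TOP_HZ = 20000       # upper edge of the bandwidth in the example

# Spacing between frequency data points = 1 / record length.
bin_spacing_hz = 1000 / RECORD_MS                   # 200 Hz for a 5 ms record
n_bins = int(TOP_HZ / bin_spacing_hz)               # 100 steps from 0 Hz to 20 kHz
bin_frequencies = [k * bin_spacing_hz for k in range(n_bins + 1)]

def points_in_octave(frequencies, low_hz, high_hz):
    """Count data points above low_hz and up to (including) high_hz."""
    return sum(1 for f in frequencies if low_hz < f <= high_hz)

for low_hz in (40, 80, 160, 1250, 10000):
    count = points_in_octave(bin_frequencies, low_hz, 2 * low_hz)
    print(f"{low_hz:>6} Hz to {2 * low_hz:>6} Hz: {count} data points")
```

The 40Hz to 80Hz octave comes up empty, which is the skipped subwoofer range, while the top octave holds 50 points.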

If we take a longer time record, we get more resolution in the low end (a good thing) and high end (too much of a good thing). This is yet another battle with infinity. Instead, let’s try a compromise approach: We can take both short and long time records. Take the best and leave the rest.

The scheme is a sequential doubling of time records. If we start with 5 milliseconds, then we also take 10 milliseconds, 20 milliseconds, 40 milliseconds, and onward. Each time we double the time span, we halve the sample frequency. This results in the same number of data points, one octave lower. The key is that we use only the top octave in each one (which in our example is regarded as a 48 points/octave format). The chosen octave derived from each time record is spliced together with the others to make a composite full-range response with the same resolution per octave. Inside each octave the data is linear data stretched to a log display. But since each linear span is just one octave, the stretching is barely detectable.
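The splicing scheme can be sketched as a table of records, each contributing only its top octave. This is an idealized count under stated assumptions: a 48kHz starting sample rate, ten octaves, and the hypothetical function name `composite_bands`. With these round numbers each octave comes out at 50 points, near the 48 points/octave figure quoted above; a real analyzer's exact bin selection may differ.

```python
def composite_bands(base_record_ms=5.0, top_hz=20000.0,
                    base_sample_rate_hz=48000.0, n_octaves=10):
    """Build one band per time record, keeping only that record's top octave."""
    bands = []
    for octave in range(n_octaves):
        scale = 2 ** octave
        record_ms = base_record_ms * scale           # 5, 10, 20, 40 ... ms
        sample_rate_hz = base_sample_rate_hz / scale  # halved at every doubling
        spacing_hz = 1000.0 / record_ms              # 200, 100, 50 ... Hz
        band_top_hz = top_hz / scale                 # 20k, 10k, 5k ... Hz
        n_bins = int(round(band_top_hz / spacing_hz))  # same count for every record
        # Keep only the bins in the top octave: above band_top/2, up to band_top.
        freqs = [k * spacing_hz for k in range(n_bins // 2 + 1, n_bins + 1)]
        bands.append({"record_ms": record_ms,
                      "sample_rate_hz": sample_rate_hz,
                      "spacing_hz": spacing_hz,
                      "frequencies": freqs})
    return bands

bands = composite_bands()
for band in bands:
    freqs = band["frequencies"]
    print(f"{band['record_ms']:7.0f} ms record, {band['spacing_hz']:9.4f} Hz spacing: "
          f"{len(freqs)} points from {freqs[0]:.2f} Hz to {freqs[-1]:.2f} Hz")

# Splice the octaves into one composite frequency axis, low to high.
composite = sorted(f for band in bands for f in band["frequencies"])
print(f"composite response: {len(composite)} points")
```

Every row shows the same number of points, which is the whole purpose of the scheme: constant resolution per octave from the bottom of the range to the top.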

**The Next Step: Dual Channel**

We have gone to a lot of trouble to create a high-resolution frequency response. How is this anything more than a high-definition version of the old 1/3-octave realtime analyzer? It’s not significantly different until you add the second channel, and then it is a whole new ball game. Any single-channel analyzer can give you the basic amplitude response, but it takes two channels to tell time (phase). With two channels we turn our analyzer into a differential input: a device that sees the difference between its two channels. When placed across the input and output of any single device (or series of devices) the analyzer will give us the transfer function (i.e., the difference between the two points).
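As a minimal sketch of the differential idea, the code below feeds random noise through a stand-in "device" (a gain of 0.5 and a three-sample delay), transforms both channels, and divides output by input at each frequency. Everything here is an assumption for illustration: the device, the record size, and the use of a circular delay, which keeps the math exact without a window. Real analyzers average many windowed records rather than dividing a single pair.

```python
import cmath
import math
import random

N = 64                  # samples per record (small, so the plain DFT stays fast)
SAMPLE_RATE_HZ = 48000  # assumption: not stated in the article
DELAY_SAMPLES = 3       # hypothetical device delay
GAIN = 0.5              # hypothetical device gain (about -6 dB)

def dft(samples):
    """Plain DFT from 0 Hz up to half the sample rate."""
    n_total = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * n / n_total)
                for n, s in enumerate(samples))
            for k in range(n_total // 2 + 1)]

random.seed(1)
# Channel 1: the signal going into the device (random noise as program material).
reference = [random.gauss(0.0, 1.0) for _ in range(N)]
# Channel 2: the signal coming out (scaled and circularly delayed copy).
measured = [GAIN * reference[(n - DELAY_SAMPLES) % N] for n in range(N)]

X = dft(reference)
Y = dft(measured)

# Transfer function: output spectrum divided by input spectrum, bin by bin.
transfer = [y / x for x, y in zip(X, Y)]

for k in (1, 4, 8, 16):
    freq_hz = k * SAMPLE_RATE_HZ / N
    level_db = 20.0 * math.log10(abs(transfer[k]))
    phase_deg = math.degrees(cmath.phase(transfer[k]))
    print(f"{freq_hz:8.0f} Hz  level={level_db:6.2f} dB  phase={phase_deg:8.2f} deg")
```

Neither channel has a meaningful phase on its own, since both are noise. The division cancels the random part and leaves only what the device did: a flat level drop and a phase that falls steadily with frequency, which is the signature of a pure delay.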

The RTA does not know processor from speaker from room, our show from a forklift, or whether the sound is arriving directly or has made six trips to Timbuktu and back. The dual-channel FFT analyzer even has a function (coherence) that tells you whether or not the analyzer knows the answer or is guessing. Knowing all of these things will not assure that you will make good decisions, but it surely ups the probability. The RTA has a simplistic worldview. It only reads amplitude over frequency, so an RTA user is inclined to believe that the solutions must be in amplitude adjustment (i.e., equalization). The modern dual-channel FFT informs us of so much more (phase, signal to noise, echo structure, and more) that we are guided to see the sound system challenges in their full complexity. This makes for better diagnosis, which makes for better treatment. Solution options such as speaker aiming, splay angle, delay, crossover alignment, and level tapering are all aided by the information the FFT provides.
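The article does not say how any given analyzer computes coherence, so the following is only a sketch of the standard textbook definition: the magnitude-squared cross-spectrum, averaged over many records, divided by the product of the two averaged power spectra. The record size, number of averages, noise levels, and the function name `coherence` are all assumptions for illustration.

```python
import cmath
import math
import random

N = 32          # samples per record
AVERAGES = 32   # number of records averaged together

def dft(samples):
    """Plain DFT from 0 Hz up to half the sample rate."""
    n_total = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * n / n_total)
                for n, s in enumerate(samples))
            for k in range(n_total // 2 + 1)]

def coherence(noise_level, rng):
    """Magnitude-squared coherence between a reference and a noisy copy of it."""
    n_bins = N // 2 + 1
    power_x = [0.0] * n_bins   # accumulated input power per bin
    power_y = [0.0] * n_bins   # accumulated output power per bin
    cross_xy = [0j] * n_bins   # accumulated cross-spectrum per bin
    for _ in range(AVERAGES):
        x = [rng.gauss(0.0, 1.0) for _ in range(N)]
        # Output = input plus unrelated noise (a forklift, say).
        y = [xi + noise_level * rng.gauss(0.0, 1.0) for xi in x]
        X, Y = dft(x), dft(y)
        for k in range(n_bins):
            power_x[k] += abs(X[k]) ** 2
            power_y[k] += abs(Y[k]) ** 2
            cross_xy[k] += X[k].conjugate() * Y[k]
    return [abs(cross_xy[k]) ** 2 / (power_x[k] * power_y[k])
            for k in range(n_bins)]

rng = random.Random(7)
clean = coherence(0.0, rng)   # output is exactly the input
noisy = coherence(3.0, rng)   # output is swamped by unrelated noise

print(f"mean coherence, clean path: {sum(clean) / len(clean):.3f}")
print(f"mean coherence, noisy path: {sum(noisy) / len(noisy):.3f}")
```

A value near 1 means the output at that frequency is consistently related to the input from record to record, so the transfer function can be trusted. A value well below 1 means unrelated energy dominates, and the analyzer is, in the article's terms, guessing.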

We can tell time and detect distortion, compression, and even changes in sound speed, all while the band is playing and the audience is dancing. Even better, the multiple-time-windowed frequency response closely resembles how our ears detect the tonal shape of transmitted sound waves. This makes for a superior tool for guiding the equalization process.

In my next column we will engage the second channel and show how all of this is put to use for system optimization.
