Digital Video for AV Integrators
When you transform AV into digital bits and bytes, it opens up a world of possibilities, and a few challenges. Here is everything you need to know about moving digital video around an AV installation.
IN THE BEGINNING
It all starts with video images captured from our analog world by special sensors known as charge-coupled devices (CCDs) and complementary metal oxide semiconductors (CMOS). The output voltages from these sensors are quantized into strings of bits that correspond to relative voltage levels created by bright and dark areas of an image. By quantizing each sample of the red, green, and blue video signals with more bits (increasing the bit depth), the analog-to-digital conversion can more accurately reproduce those signals when they are ready to be viewed.
Why is that important? Early flat-panel displays commonly used eight bits per color channel, or 256 red x 256 green x 256 blue levels. That equals a total of 16.7 million possible colors, which would seem like more than enough. But it really isn’t. Images sampled at this bit depth often exhibit abrupt changes between shades of colors that resemble contours on a topographic map, creating unwanted banding artifacts. That’s why video cameras sample at greater bit depths, and it’s also why professional flat-panel displays are moving to 10-bits-per-channel sampling to create smoother gradients.
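The jump from 8-bit to 10-bit color is easy to see with a little arithmetic. Here is a quick sketch (Python, purely for illustration) of how bits per channel translate into the total number of colors a pixel can reproduce:

```python
# Illustrative only: counting colors per pixel at different bit depths.
def colors(bits_per_channel: int) -> int:
    """Total colors for an RGB pixel at the given bits per channel."""
    levels = 2 ** bits_per_channel  # quantization levels per channel
    return levels ** 3              # red x green x blue combinations

print(colors(8))   # 16,777,216 -> the familiar "16.7 million colors"
print(colors(10))  # 1,073,741,824 -> over a billion, for smoother gradients
```

Two extra bits per channel buys 64 times as many colors, which is why banding that shows up at 8 bits smooths out at 10.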
One thing that’s interesting about digital component video (YCbCr) is that the brightness (luminance) signal contains most of the picture detail. That means we can sample the color information at half the rate, or even lower. So if we determine that four samples of luminance are needed, but only two samples of each of the color-difference signals are required, we come up with the ratio 4:2:2, which happens to be a very common format for professional digital video (it forms the basis of the ITU standard BT.601).
In contrast, digital cinema cameras capture video in a red, green, and blue (RGB) format and must preserve as much detail in each color channel as possible. Accordingly, these high-performance cameras use a 4:4:4 sampling ratio, which results in extremely large files.
On the other hand, digital TV programs on cable and satellite as well as movies recorded to DVD use a 4:2:0 sampling ratio, reducing the color detail by half again from the 4:2:2 standard to conserve bandwidth. (And you probably didn’t even notice.)
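The savings from these sampling ratios come down to simple counting. The sketch below (an illustration, not code from any real codec) tallies luminance and color-difference samples over the 4-pixel-wide, 2-row reference block that the J:a:b notation describes, and compares each ratio to full 4:4:4 sampling:

```python
# Sketch of how J:a:b chroma subsampling ratios translate to data savings.
# Reference block: 4 pixels wide by 2 rows tall.
#   j = luma samples per row
#   a = chroma samples in the first row
#   b = chroma samples in the second row
def relative_size(j: int, a: int, b: int) -> float:
    """Data size relative to full 4:4:4 sampling of the same block."""
    luma = 2 * j              # Y samples across both rows
    chroma = 2 * (a + b)      # Cb + Cr samples across both rows
    full = 2 * 4 * 3          # 4:4:4 baseline: three full-rate channels
    return (luma + chroma) / full

print(relative_size(4, 4, 4))  # 1.0   -> digital cinema, full color detail
print(relative_size(4, 2, 2))  # ~0.67 -> BT.601-style professional video
print(relative_size(4, 2, 0))  # 0.5   -> DVD and broadcast
```

So 4:2:0 carries half the raw data of 4:4:4, and the viewer rarely notices, because the detail lives mostly in the luminance channel.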
PACK AND SHIP
The whole concept of digital video compression revolves around the idea of redundancy, that is, redundancy between adjacent frames of video. If we shoot one second of video (30 frames interlaced, or 60 frames progressive-scan), there are bound to be parts of each frame that don’t change, or change only a little over time.
If we can come up with a system that analyzes each frame of video and identifies the parts that change versus the parts that don’t, we can record and distribute that video quite efficiently, much more efficiently than with an analog signal, where every frame is repeated with all of its redundancies and the full bandwidth of a TV channel is required to pass the signal, no matter its resolution.
And that’s exactly how a video codec works. In everyday use, we speak of a codec as the system by which a video stream is encoded and decoded, using MPEG, JPEG, or wavelet processes. The piece of hardware that actually performs the compression is a video encoder.
There are several ways to compress video signals, but the most common is based on a principle known as discrete cosine transform (DCT). In a nutshell, DCT reduces the elements of a video image to mathematical coefficients. It is the heart of both the JPEG (Joint Photographic Experts Group) and MPEG (Moving Pictures Experts Group) standards, and is widely used for encoding everything from videos that you shoot on your $150 digital camcorder to those you shoot on a $10,000 broadcast camera.
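To make the idea concrete, here is a minimal, textbook implementation of the 8x8 two-dimensional DCT-II (real encoders use fast integer approximations, so treat this as an illustration of the math, not production code). Notice how a flat, featureless block reduces to a single coefficient, which is exactly the kind of redundancy a compressor exploits:

```python
import math

# Minimal 8x8 2-D DCT-II, the transform at the heart of JPEG and MPEG.
N = 8

def dct_2d(block):
    """Transform an 8x8 block of pixel values into frequency coefficients."""
    def c(k):  # normalization factor
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

# A uniform gray block: all the energy lands in the single DC coefficient,
# and every AC coefficient is (numerically) zero.
flat = [[128] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0]))  # DC term: 8 * 128 = 1024
print(abs(coeffs[3][5]) < 1e-9)  # True: no AC energy in a flat block
```

Blocks with little detail produce mostly near-zero coefficients, which compress down to almost nothing; that is the whole trick.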
While JPEG is used primarily for still images and digital cinema, MPEG is the standard for almost all compressed video. The MPEG system starts with a string of video frames, known as a group of pictures (GOP), which can be almost any length, but is typically 15 frames long, or a half-second.
The first video frame in the sequence is a complete, self-contained picture that serves as the reference for compressing the frames that follow, and is known as an intracoded frame, or I-frame for short. (I-frames can also be called key frames.) Each I-frame has all of its picture information encoded into eight-pixel-by-eight-pixel blocks.
The second MPEG frame type, the predictive frame (P-frame), looks at the data in a previous reference frame and determines what is actually changing between the two frames. Elements that change in position, color, or luminance are re-encoded, while elements that do not change are simply repeated. This allows for even greater compression of the video signal.
A third frame type, the bi-directional predictive frame (B-frame), looks both forward and backward at reference frames to determine which pixels need to be re-encoded as new, and which pixels can simply be repeated.
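The division of labor between I-frames and predictive frames can be sketched with a toy example (this illustrates the principle only; it is nothing like a real MPEG bitstream): the reference frame is stored in full, while the next frame is stored as just the pixels that changed:

```python
# Toy illustration of P-frame-style prediction: store only what changed.
def p_frame_delta(reference, current):
    """Encoder side: {index: new_value} for pixels that differ."""
    return {i: cur for i, (ref, cur) in enumerate(zip(reference, current))
            if ref != cur}

def apply_delta(reference, delta):
    """Decoder side: rebuild the frame from the reference plus the delta."""
    frame = list(reference)
    for i, value in delta.items():
        frame[i] = value
    return frame

i_frame = [10, 10, 10, 10, 200, 200, 10, 10]    # full reference frame
next_frame = [10, 10, 10, 10, 10, 200, 10, 10]  # one pixel changed

delta = p_frame_delta(i_frame, next_frame)
print(delta)  # {4: 10} -> one pixel stored instead of eight
print(apply_delta(i_frame, delta) == next_frame)  # True
```

Notice that the decoder cannot rebuild anything without the reference frame, which is exactly why a lost I-frame is so disruptive.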
Clearly, I-frames are critical to digital video playback. If the system drops an I-frame, the picture freezes up or disappears altogether until the next I-frame comes along. This is what commonly causes drop-out on satellite and terrestrial DTV signals: the decoder can’t resume converting the compressed stream to video until it has another reference point.