Temporally Scalable Video Compression

It is our goal to provide visual communication services to all users, regardless of their individual network bandwidths, quality-of-service, or terminal capabilities. Scalable video coding algorithms are key to enabling reliable "universal access" to visual communications over a variety of channels. Several algorithms have been proposed that allow scalability of video resolution, frame rate, and visual quality (SNR). The focus of this research, being investigated by Gregory Conklin, is the development of video coding schemes that provide frame rate scalability with superior low-rate video quality.

Temporal Subband Decomposition

One common method of providing frame rate scalability is to apply a temporal subband decomposition, which filters the video along the time axis into a low-pass subband and a high-pass subband, each at half the original frame rate. The full-rate video can be decoded using both the low-pass and high-pass subbands, while half-rate video can be decoded using only the low-pass subband. The resulting half-rate video sequence can then be passed through a second subband decomposition to provide quarter-rate video, and so on.

Because each frame of the reduced-rate video is a linear combination of frames of the full-rate video, motion in the reduced-rate video tends to get "blurred". To reduce this blurring, length-2 Haar filters are used in the subband decompositions, so that each output frame combines only two input frames. Effectively, this scheme generates a sum sequence and a difference sequence, both of which can be coded and transmitted separately.
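The sum and difference sequences can be illustrated with a minimal sketch. It assumes each frame is a flat list of pixel intensities and that the filters are scaled as an average and a half-difference (a common alternative scales both by 1/sqrt(2)); the function names haar_analysis and haar_synthesis are illustrative and do not come from the original work.

```python
def haar_analysis(frames):
    """Split a frame sequence into low-pass (average) and high-pass
    (half-difference) subbands, each at half the input frame rate.

    Each frame is a flat list of pixel intensities (an assumption made
    to keep this sketch dependency-free).
    """
    if len(frames) % 2:
        raise ValueError("need an even number of frames")
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        low.append([(x + y) / 2 for x, y in zip(a, b)])   # "sum" frame
        high.append([(x - y) / 2 for x, y in zip(a, b)])  # "difference" frame
    return low, high


def haar_synthesis(low, high):
    """Rebuild the full-rate sequence from both subbands."""
    frames = []
    for l, h in zip(low, high):
        frames.append([s + d for s, d in zip(l, h)])  # first frame of the pair
        frames.append([s - d for s, d in zip(l, h)])  # second frame of the pair
    return frames


# Four tiny 2-pixel "frames" at full rate.
full_rate = [[10, 20], [12, 20], [30, 40], [34, 44]]

low, high = haar_analysis(full_rate)   # `low` alone is the half-rate video
quarter, high2 = haar_analysis(low)    # second level: quarter-rate video

print(low)      # [[11.0, 20.0], [32.0, 42.0]]
print(quarter)  # [[21.5, 31.0]]
print(haar_synthesis(low, high) == full_rate)  # True
```

A decoder that receives only the low-pass subband plays the half-rate video directly; one that also receives the high-pass subband can recover the full-rate sequence exactly.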

Left: 7.5 fps video using subband decomposition
Right: 7.5 fps video using temporal subsampling

Above is a comparison of quarter-rate (7.5 frames/second) video. The sequence on the left is the result of using the low-pass subband of a two-level subband decomposition; that is, each frame is the average of 4 consecutive full-rate video frames. The sequence on the right is the result of subsampling frames of the full-rate sequence; that is, all but every fourth frame are dropped.

Notice that blurring is evident in the sequence on the left, while aliasing can be seen in the sequence on the right (the ball bounces off the paddle without coming in contact with it). Also be aware that the "shimmering", especially evident in the background, is due to color quantization to 256 colors (the sequences are shown as animated GIF files) and is not a result of the subband or subsampling processes.
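The difference between the two quarter-rate sequences can be reproduced on a toy signal. The sketch below assumes a one-dimensional "frame" (a single row of pixels) containing one bright pixel that moves one position per frame, and assumes that subsampling keeps the first frame of every group of four; the function names are illustrative only.

```python
def quarter_rate_lowpass(frames):
    """Average each group of 4 consecutive frames, which is what the
    low-pass subband of a two-level averaging Haar decomposition yields."""
    out = []
    for i in range(0, len(frames) - 3, 4):
        group = frames[i:i + 4]
        out.append([sum(px) / 4 for px in zip(*group)])
    return out


def quarter_rate_subsample(frames):
    """Keep every fourth frame and drop the rest."""
    return frames[::4]


# A bright pixel (value 100) moving one position per frame across an
# 8-pixel row, for 8 full-rate frames.
frames = [[100 if x == t else 0 for x in range(8)] for t in range(8)]

blurred = quarter_rate_lowpass(frames)
dropped = quarter_rate_subsample(frames)

print(blurred[0])  # [25.0, 25.0, 25.0, 25.0, 0.0, 0.0, 0.0, 0.0]
print(dropped[0])  # [100, 0, 0, 0, 0, 0, 0, 0]
print(dropped[1])  # [0, 0, 0, 0, 100, 0, 0, 0]
```

In the averaged sequence the object is smeared over the four positions it visited (blurring), while in the subsampled sequence it stays sharp but jumps four positions between frames, so the intermediate motion is never shown (temporal aliasing).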

Camera Pan Compensation

In an attempt to reduce the blurring effects seen in a temporal subband decomposition, camera pan compensation may be used. Here, motion between consecutive frames is modeled as a single overall motion vector. Thus, motion in video due to a camera pan can be accurately predicted over the entire frame. Once the camera pan has been determined, frames are pre-distorted so that objects (or the background) in two consecutive frames are located at similar pixel locations. This not only reduces blurred motion in the low-pass subband (sum frame), but also improves compression by reducing data in the high-pass subband (difference frame).
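The effect of pre-distorting a frame before forming the sum and difference frames can be sketched on a single row of pixels. This example assumes a purely horizontal, integer-valued pan that is already known, and it uses a circular shift at the frame boundary; boundary handling is not described here, so that choice and the helper names are assumptions.

```python
def shift_row(row, dx):
    """Circularly shift a row of pixels right by dx (negative dx shifts left)."""
    n = len(row)
    return [row[(x - dx) % n] for x in range(n)]


def sum_and_difference(a, b):
    """Length-2 Haar pair: average (low-pass) and half-difference (high-pass)."""
    low = [(p + q) / 2 for p, q in zip(a, b)]
    high = [(p - q) / 2 for p, q in zip(a, b)]
    return low, high


def energy(frame):
    """Sum of squared samples, a rough proxy for how much data must be coded."""
    return sum(p * p for p in frame)


frame0 = [0, 0, 50, 100, 50, 0, 0, 0]
pan = 2                              # assumed known camera pan, in pixels
frame1 = shift_row(frame0, pan)      # the whole scene moved 2 pixels right

# Without compensation: the sum frame is smeared, the difference frame is busy.
low_plain, high_plain = sum_and_difference(frame0, frame1)

# With compensation: undo the pan on frame1 before filtering.
aligned = shift_row(frame1, -pan)
low_comp, high_comp = sum_and_difference(frame0, aligned)

print(energy(high_plain))  # 6250.0
print(energy(high_comp))   # 0.0
print(low_comp)            # [0.0, 0.0, 50.0, 100.0, 50.0, 0.0, 0.0, 0.0]
```

Here the compensated difference frame is exactly zero only because the toy scene is a pure pan; in real video, anything that moves differently from the global vector still contributes to the difference frame.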

Frame from the 30 fps "flower garden" sequence

Above is a frame from the full-rate (30 fps) "flower garden" sequence. This sequence is a slow "drive-by" of a house and garden. While all motion in this sequence is in the same direction, it is not considered a camera pan, since objects in the sequence move at different rates depending on their distance from the viewer.

Frame from the 7.5 fps "flower garden" sequence without "camera pan compensation"

Above is a frame from the quarter-rate sequence generated from the low-pass subband. Notice that objects in the frame are blurred by an amount that depends on their relative speed in the frame.

Frame from the 7.5 fps "flower garden" sequence with "camera pan compensation"

With camera pan compensation, we can see (above) that motion in the background is no longer blurred. Objects in the foreground, however, do remain blurred, since their motion in the frame differs from the motion of the background. In this case, the foreground appears less blurred than it does without compensation, since these objects are moving in the same direction as the background.

Here, the camera pan is determined for the "current frame" using a "reference frame" as follows:

1) Perform edge enhancement on the current and reference frames.
2) Zero-pad the resulting images so that their width and height are powers of 2.
3) Use the FFT to determine the horizontal and vertical translation of the current frame that produces the highest correlation with the reference frame.
4) Using this "coarse estimate" of the camera pan as a starting point, refine the camera pan estimate to 1/8 pel accuracy for 16x16 pixel blocks.
5) Take the camera pan to be the motion vector shared by the majority of the 16x16 blocks.
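Steps 2, 3, and 5 of the procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this work: it uses a small pure-Python radix-2 FFT, integer-pel circular cross-correlation, and a simple vote over per-block vectors supplied by the caller. Edge enhancement (step 1) and the 1/8 pel block refinement (step 4) are omitted, and all function names are hypothetical.

```python
import cmath
from collections import Counter


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(img, inverse=False):
    """2-D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in img]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def pad_pow2(img):
    """Step 2: zero-pad so that width and height are powers of 2."""
    h, w = len(img), len(img[0])
    H = 1 << (h - 1).bit_length()
    W = 1 << (w - 1).bit_length()
    return [row + [0] * (W - w) for row in img] + [[0] * W for _ in range(H - h)]


def coarse_pan(current, reference):
    """Step 3: return (dy, dx), the shift of `current` relative to
    `reference` that gives the highest circular cross-correlation."""
    fc, fr = fft2(pad_pow2(current)), fft2(pad_pow2(reference))
    cross = [[a * b.conjugate() for a, b in zip(ra, rb)]
             for ra, rb in zip(fc, fr)]
    corr = fft2(cross, inverse=True)  # scale factor does not move the peak
    H, W = len(corr), len(corr[0])
    _, py, px = max((corr[y][x].real, y, x)
                    for y in range(H) for x in range(W))
    # Peak positions past the midpoint correspond to negative shifts.
    dy = py - H if py > H // 2 else py
    dx = px - W if px > W // 2 else px
    return dy, dx


def majority_vector(block_vectors):
    """Step 5: pick the motion vector shared by the most blocks."""
    return Counter(block_vectors).most_common(1)[0][0]


# A 6x6 reference frame with a small feature, and a current frame in
# which the feature has moved 1 pixel down and 2 pixels right.
reference = [[0] * 6 for _ in range(6)]
reference[1][1], reference[1][2], reference[2][1] = 9, 5, 3
current = [[0] * 6 for _ in range(6)]
current[2][3], current[2][4], current[3][3] = 9, 5, 3

print(coarse_pan(current, reference))  # (1, 2)
print(majority_vector([(1, 2), (1, 2), (0, 0), (1, 2), (3, 1)]))  # (1, 2)
```

Computing the correlation through the FFT evaluates every candidate translation at once, which is why the images are first padded to power-of-2 dimensions.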

Conjecture, Current and Future Work

As can be seen above, using temporal subband decomposition to generate low-rate video gives poor visual results. Even with the addition of camera pan compensation, the low-rate video still does not look as good as video produced by temporal subsampling (dropping frames). Furthermore, camera pan compensation can be very computationally intensive, and is designed to improve coding for only a small subset of typical video scenes. Generally, objects in a scene move independently of each other. In most cases, the camera is panned to keep the moving object of interest still in the image. With camera pan compensation, this object will be blurred while the background remains as clear as possible. In addition, when the camera is "zooming", the camera pan estimation can mistakenly judge the motion to be a "pan" and produce terrible visual results such as jittering.

In order to provide frame rate scalability with improved visual results, alternative coding schemes are being considered. Block-based motion compensation (as in MPEG) combined with spatial subband/wavelet decomposition can provide the desired frame rate scalability. Questions that need to be answered include how best to combine block-based motion compensation with spatial scalability and SNR scalability, and how to provide improved resilience to transmission errors.
