It is our goal to provide visual communication services to all
users, regardless of their individual network bandwidths, quality-of-service,
or terminal capabilities. Scalable video coding algorithms are
key to enabling reliable "universal access" to visual
communications over a variety of channels. Several algorithms
have been proposed that allow scalability of video resolution,
frame rate, and visual quality (SNR). The focus of this research,
being investigated by Gregory Conklin,
is the development of video coding schemes that provide frame rate scalability
with superior low-rate video quality.
Temporal Subband Decomposition
One common method of providing frame rate scalability is to apply
a temporal subband decomposition. The full-rate video can then
be decoded using both the low-pass and high-pass subbands, while
half-rate video can be decoded using only the low-pass subband.
The resulting half-rate video sequence can in turn be passed through
a second subband decomposition to provide quarter-rate video, and
so on.
Because each frame of the reduced-rate video is a linear
combination of frames of the full-rate video, motion in the
reduced-rate video tends to get "blurred". To reduce this blurring,
length-2 Haar filters are used in the subband decompositions.
Effectively, this scheme generates a sum sequence and a difference
sequence, both of which can be coded and transmitted separately.
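The sum and difference sequences can be illustrated with a minimal Python sketch. It assumes each frame is a flat list of pixel values and uses the averaging normalization (sum and difference each divided by 2); the function names and the normalization are illustrative assumptions, not taken from the project's code.

```python
def haar_analysis(frames):
    """Split an even-length list of frames into low-pass and high-pass subbands.

    Assumed normalization: low = (a + b) / 2, high = (a - b) / 2
    for each pair of consecutive frames (a, b).
    """
    if len(frames) % 2:
        raise ValueError("need an even number of frames")
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        low.append([(x + y) / 2 for x, y in zip(a, b)])   # "sum" frame
        high.append([(x - y) / 2 for x, y in zip(a, b)])  # "difference" frame
    return low, high


def haar_synthesis(low, high):
    """Rebuild the full-rate frames from both subbands."""
    frames = []
    for l, h in zip(low, high):
        frames.append([x + y for x, y in zip(l, h)])  # a = low + high
        frames.append([x - y for x, y in zip(l, h)])  # b = low - high
    return frames


# Four tiny two-pixel "frames" at full rate.
frames = [[10, 20], [14, 20], [30, 40], [30, 44]]
low, high = haar_analysis(frames)        # low alone is the half-rate video
restored = haar_synthesis(low, high)     # both subbands give back full rate
```

Applying `haar_analysis` again to `low` would give the quarter-rate low-pass subband.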
[Animated sequences: subband decomposition (left), temporal subsampling (right)]
Above is a comparison of quarter-rate (7.5 frames/second) video. The sequence on the left is the low-pass subband of a two-level subband decomposition; that is, each frame is the average of 4 consecutive full-rate video frames. The sequence on the right is the result of subsampling the full-rate sequence; that is, dropping all but every fourth frame.
Notice that blurring is evident in the sequence on the left, while aliasing can be seen in the sequence on the right (the ball bounces off the paddle without coming into contact with it). Also be aware that the "shimmering", especially evident in the background, is due to color quantization to 256 colors (you are watching animated GIF files) and is not a result of the subband or subsampling processes.
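The difference between the two quarter-rate sequences can be reproduced on a toy signal. The sketch below is only an illustration under simple assumptions: frames are one-dimensional rows of pixels, a single bright pixel moves one position per frame, and the helper names are hypothetical.

```python
def lowpass_quarter_rate(frames):
    """Two-level Haar low-pass: each output frame is the mean of 4 input frames."""
    out = []
    for i in range(0, len(frames) - 3, 4):
        group = frames[i:i + 4]
        out.append([sum(px) / 4 for px in zip(*group)])
    return out


def subsample_quarter_rate(frames):
    """Temporal subsampling: keep every fourth frame, drop the rest."""
    return frames[::4]


# Eight 1-D frames; a bright pixel moves one position to the right per frame.
width = 8
frames = [[100 if x == t else 0 for x in range(width)] for t in range(8)]

blurred = lowpass_quarter_rate(frames)
dropped = subsample_quarter_rate(frames)

# blurred[0] smears the pixel over four positions at a quarter of its brightness;
# dropped[1] keeps it sharp, but it has jumped four positions since dropped[0].
print(blurred[0])
print(dropped[0], dropped[1])
```

The averaged frames trade sharpness for smooth motion, while the subsampled frames stay sharp but skip the intermediate positions, which is the aliasing visible in the ball-and-paddle example.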
Camera Pan Compensation
In an attempt to reduce the blurring effects seen in a temporal subband decomposition, camera pan compensation may be used. Here, motion between consecutive frames is modeled as a single overall motion vector, so motion due to a camera pan can be accurately predicted over the entire frame. Once the camera pan has been determined, frames are pre-distorted so that objects (or the background) in two consecutive frames are located at similar pixel locations. This not only reduces blurred motion in the low-pass subband (sum frame), but also improves compression by reducing data in the high-pass subband (difference frame).
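One generic way to obtain a single overall motion vector is an exhaustive search over integer shifts that minimizes the mean absolute difference between two frames. The sketch below shows that idea only; it is an assumption for illustration and not necessarily how the pan is determined in this work. Frames are lists of rows, and `estimate_pan`, `compensate`, and `window` are hypothetical names.

```python
def estimate_pan(ref, cur, radius=2):
    """Search integer shifts (dx, dy) so that cur[y + dy][x + dx] best
    matches ref[y][x], using mean absolute difference over the overlap."""
    h, w = len(ref), len(ref[0])
    if radius >= min(h, w):
        raise ValueError("search radius must be smaller than the frame")
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            total = count = 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(cur[y + dy][x + dx] - ref[y][x])
                    count += 1
            cost = total / count
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def compensate(cur, dx, dy):
    """Shift cur so its content lines up with the reference frame.
    Border pixels are filled by clamping coordinates (an assumed choice)."""
    h, w = len(cur), len(cur[0])
    return [[cur[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


def window(x0, y0, size=6, scene_width=10):
    """Crop a synthetic ramp scene; stands in for a camera view."""
    return [[(y0 + y) * scene_width + (x0 + x) for x in range(size)]
            for y in range(size)]


ref = window(2, 2)
cur = window(3, 2)             # camera moved one pixel to the right
dx, dy = estimate_pan(ref, cur)
aligned = compensate(cur, dx, dy)
print(dx, dy)                  # scene content moved one pixel left: -1 0
```

After compensation, `aligned` matches `ref` everywhere except the clamped border column, so the difference frame of the pair is almost empty and the sum frame is not blurred by the pan.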
[Image: frame from the full-rate "flower garden" sequence]
Above is a frame from the full-rate (30 fps) "flower garden" sequence. This sequence is a slow "drive-by" of this house and garden. Although all motion in this sequence is in the same direction, it is not considered a camera pan, since objects in the sequence move at different rates depending on their distance from the viewer.
[Image: quarter-rate frame without "camera pan compensation"]
Above is a frame from the quarter-rate sequence generated from the low-pass subband. Notice that each object in the frame is blurred to a degree that depends on its relative speed in the frame.
[Image: quarter-rate frame with "camera pan compensation"]
With camera pan compensation we can see (above) that motion in the background is no longer blurred. Objects in the foreground, however, remain blurred because their motion in the frame differs from that of the background. In this case, the foreground nevertheless appears less blurred than before, since these objects are moving in the same direction as the background.
Here, the camera pan is determined for the "current frame"
using a "reference frame" by...
Conjecture, Current and Future Work
As can be seen above, using temporal subband decomposition to generate low-rate video gives poor visual results. Even with the addition of camera pan compensation, the low-rate video still does not look as good as temporal subsampling (dropping frames). Furthermore, camera pan compensation can be very computationally intensive, and is designed to improve coding for only a small subset of typical video scenes. Generally, objects in a scene move independently of each other. In most cases, the camera is panned to keep the moving object of interest still in the image. With camera pan compensation, this object will be blurred while the background remains as clear as possible. In addition, when the camera is "zooming", the pan estimate can mistakenly treat the motion as a "pan" and produce severe visual artifacts such as jittering.
In order to provide frame rate scalability with improved visual results, alternative coding schemes are being considered. Block-based motion compensation (as in MPEG), combined with spatial subband/wavelet decomposition, can be used to provide the desired frame rate scalability. Questions that remain to be answered include how best to combine block-based motion compensation with spatial scalability and SNR scalability, and how to provide improved resilience to transmission errors.
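For readers unfamiliar with block-based motion compensation, the sketch below shows a generic full-search block matcher: each block of the current frame gets its own motion vector pointing to the best-matching block in the reference frame. It is a textbook-style illustration with hypothetical names and parameters (block size `n`, search `radius`), and it does not address the open question of how to combine this with wavelet decomposition.

```python
def block_sad(ref, cur, bx, by, dx, dy, n):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(n) for x in range(n))


def block_motion_vectors(ref, cur, n=4, radius=2):
    """Full search: for each n-by-n block of cur, find the displacement (dx, dy)
    into ref with the lowest SAD. Ties go to the shortest vector."""
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h - n + 1, n):
        for bx in range(0, w - n + 1, n):
            best = None
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Skip candidates that fall outside the reference frame.
                    if not (0 <= by + dy and by + dy + n <= h
                            and 0 <= bx + dx and bx + dx + n <= w):
                        continue
                    cost = block_sad(ref, cur, bx, by, dx, dy, n)
                    key = (cost, abs(dx) + abs(dy))
                    if best is None or key < best[0]:
                        best = (key, (dx, dy))
            vectors[(bx, by)] = best[1]
    return vectors


def frame_with_object(x0, y0, size=8):
    """Black frame with a small 2x2 object whose top-left corner is at (x0, y0)."""
    frame = [[0] * size for _ in range(size)]
    for y, row in enumerate([[50, 60], [70, 80]]):
        for x, value in enumerate(row):
            frame[y0 + y][x0 + x] = value
    return frame


ref = frame_with_object(5, 5)
cur = frame_with_object(6, 5)   # the object moved one pixel to the right
vectors = block_motion_vectors(ref, cur)
print(vectors[(4, 4)])          # block containing the object: (-1, 0)
```

Unlike a single pan vector, the static blocks keep a zero vector while only the block containing the moving object is displaced, which is why this approach can handle objects that move independently.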
Related Links