ITU-T H.264 (also called MPEG-4 Part 10 or AVC) is the latest generation of video compression algorithm. H.264 provides significant improvement over previous compression algorithms. For example, H.264 can provide the same video quality as MPEG-2 at about one-half the data rate.
H.264 gets its improvements by providing a significant number of tools that can be used in the video compression algorithm. Some of the more significant tools are described below.
H.264 is a Discrete Cosine Transform (DCT) based algorithm like many of its predecessors. It also uses I, P, and B Frames for forward and backward prediction; however, some profiles also use SI and SP Frames to allow seamless switching between streams.
H.264 gets a lot of its compression performance improvement from Intra-Frame Prediction. Normally, I-Frames consist of the input video pixels DCT transformed and quantized, similar to a JPEG frame. H.264 provides for prediction within the I-Frame using the pixels above and to the left of a block to predict the pixels within the block. A variety of prediction modes are available which can be used to match the block being coded. This prediction results in improved compression efficiency within the I-Frames.
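The idea of predicting a block from its neighbours can be sketched in a few lines of Python. This is a minimal illustration, not an encoder: it covers only three of the 4×4 prediction modes (vertical, horizontal, and DC), assumes that both the row above and the column to the left are available, and uses hypothetical function names (`predict_4x4`, `sad`, `best_mode`) chosen for this example. The mode is picked by the smallest sum of absolute differences, which is one common cost measure but not the only one an encoder might use.

```python
def predict_4x4(above, left, mode):
    """Build a 4x4 prediction from the 4 pixels above and the 4 pixels to the left."""
    if mode == "vertical":
        # Each column repeats the pixel directly above the block.
        return [list(above) for _ in range(4)]
    if mode == "horizontal":
        # Each row repeats the pixel directly to the left of the block.
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":
        # Flat block at the rounded mean of the eight neighbouring pixels.
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unknown mode: {mode}")


def sad(a, b):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_mode(block, above, left):
    """Pick the mode whose prediction is closest to the block being coded."""
    modes = ("vertical", "horizontal", "dc")
    return min(modes, key=lambda m: sad(block, predict_4x4(above, left, m)))


def residual(block, prediction):
    """Difference that would go on to be transformed and quantized."""
    return [[x - p for x, p in zip(rb, rp)] for rb, rp in zip(block, prediction)]


# A block of vertical stripes is matched exactly by the vertical mode.
above = [10, 20, 30, 40]
left = [12, 12, 12, 12]
block = [[10, 20, 30, 40] for _ in range(4)]
mode = best_mode(block, above, left)
res = residual(block, predict_4x4(above, left, mode))
```

When the prediction matches well, the residual is all zeros or close to it, which is what makes the transformed and quantized block cheap to code.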
Another tool that H.264 makes use of is variable block sizes for motion compensation. Earlier compression algorithms typically use a fixed 16×16 macroblock when doing motion compensation. H.264 allows each macroblock to be split into smaller partitions, each with its own motion vector, resulting in improved predictions. The block sizes include 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4. This allows block boundaries to be aligned closely with the edges of moving objects. Motion vector resolution has also been refined to a quarter pixel, allowing more precise alignment of the prediction blocks.
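Quarter-pixel resolution means that a motion vector can point between the stored pixels, so the in-between sample values have to be interpolated. The sketch below shows the idea in one dimension for luma samples: a half-sample value comes from a 6-tap filter with weights (1, -5, 20, 20, -5, 1)/32, and a quarter-sample value is the rounded average of its two nearest integer and half samples. The function names and the edge handling (repeating the border sample) are assumptions made for this example; a real decoder works in two dimensions and has more cases.

```python
def clip8(v):
    """Clamp a value to the 8-bit pixel range."""
    return max(0, min(255, v))


def split_mv(mv_q):
    """Split a motion vector component in quarter-pixel units
    into (integer pixel offset, quarter-pixel fraction 0..3)."""
    return mv_q >> 2, mv_q & 3


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1] using the 6-tap filter."""
    taps = (1, -5, 20, 20, -5, 1)
    last = len(row) - 1
    # Samples i-2 .. i+3; indices outside the row repeat the border sample
    # (an assumption for this sketch).
    acc = sum(t * row[min(max(i - 2 + k, 0), last)] for k, t in enumerate(taps))
    return clip8((acc + 16) >> 5)


def quarter_pel(row, i):
    """Quarter-sample value just right of row[i]: the rounded average of
    the integer sample and the neighbouring half sample."""
    return (row[i] + half_pel(row, i) + 1) >> 1


ramp = [0, 10, 20, 30, 40, 50]
h = half_pel(ramp, 2)      # between 20 and 30
q = quarter_pel(ramp, 2)   # a quarter of the way from 20 to 30
```

A motion vector component of 9 in quarter-pixel units, for instance, splits into 2 whole pixels plus a fraction of 1/4, so the prediction for that block would be read from interpolated quarter-sample positions.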
H.264 also allows multiple reference frames. Normally, a P-Frame uses a single prior frame as the basis for the prediction, and a B-Frame uses a single previous frame and a single future frame. H.264 allows up to 16 frames to be used to build up a prediction for the current frame. Weighting can also be used to combine blocks from different references.
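As a rough illustration of weighting, the sketch below blends two prediction blocks taken from different reference frames. It is a simplified form: the per-reference offsets are left out, and the parameter names (`w0`, `w1`, `log_wd`) are hypothetical labels for this example. With equal weights the result reduces to a plain rounded average of the two blocks; unequal weights favour one reference, which can help with fades.

```python
def clip8(v):
    """Clamp a value to the 8-bit pixel range."""
    return max(0, min(255, v))


def weighted_bipred(p0, p1, w0=32, w1=32, log_wd=5):
    """Blend two prediction blocks with integer weights.

    The weights are expressed relative to 2**log_wd, and the sum is shifted
    by one extra bit because two references are combined. Offsets are
    omitted in this sketch.
    """
    rounding = 1 << log_wd
    return [
        [clip8((a * w0 + b * w1 + rounding) >> (log_wd + 1)) for a, b in zip(r0, r1)]
        for r0, r1 in zip(p0, p1)
    ]


ref_a = [[100, 100]]   # block from one reference frame
ref_b = [[200, 50]]    # block from another reference frame

average = weighted_bipred(ref_a, ref_b)                # equal weights
favour_a = weighted_bipred(ref_a, ref_b, w0=48, w1=16) # 3:1 in favour of ref_a
```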
Finally, a new version of the DCT algorithm has been defined which is based on integer math, as opposed to the original DCT algorithm that required floating point. This new integer DCT is based on 4×4 blocks as opposed to the original 8×8 blocks. Because the integer DCT gives exactly the same results in the encoder and the decoder, it eliminates round-off errors.
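The exactness of the integer transform can be demonstrated with a short sketch. The forward step below uses the 4×4 integer core matrix, so it needs only additions, subtractions, and multiplications by small integers. The inverse shown here is an assumption made for illustration: it undoes the forward step using the fact that the rows of the matrix are orthogonal with squared norms 4, 10, 4, and 10, whereas the standard itself folds this scaling into the quantization step. The function names are hypothetical.

```python
# 4x4 integer core transform matrix.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]
# Squared norm of each row of CF (CF times its transpose is diag(4, 10, 4, 10)).
NORMS = [4, 10, 4, 10]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_4x4(x):
    """Y = CF * X * CF^T, computed entirely in integers."""
    return matmul(matmul(CF, x), transpose(CF))


def inverse_4x4(y):
    """Recover X exactly from Y using integer arithmetic only.

    Each coefficient is scaled by 400 / (norm_row * norm_col), which is always
    an integer (25, 10, or 4), and the final result is divided by 400.
    """
    scaled = [[y[k][l] * (400 // (NORMS[k] * NORMS[l])) for l in range(4)]
              for k in range(4)]
    x400 = matmul(matmul(transpose(CF), scaled), CF)
    return [[v // 400 for v in row] for row in x400]


block = [
    [5, 11, 8, 10],
    [9, 8, 4, 12],
    [1, 10, 11, 4],
    [19, 6, 15, 7],
]
coeffs = forward_4x4(block)
restored = inverse_4x4(coeffs)
```

Because every step stays in integers, the restored block is identical to the input, with no dependence on how a particular processor rounds floating-point values.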