This paper focuses on prediction optimality in spatially scalable video coding. It draws inspiration from an estimation-theoretic prediction framework for quality (SNR) scalability developed earlier by our group, which achieved optimality by fully accounting for relevant information from the current base layer (e.g., quantization intervals) and the enhancement layer, to efficiently calculate the conditional expectation that forms the optimal predictor. It was central to that approach that all layers reconstruct approximations to the same original transform coefficient. In spatial scalability, however, the layers encode different resolution versions of the signal. To approach optimality in enhancement layer prediction, this paper departs from existing spatially scalable codecs, which employ pixel domain resampling to perform interlayer prediction. Instead, it incorporates a transform domain resampling technique that ensures that the base layer quantization intervals are accessible and usable at the enhancement layer despite the differing signal resolutions; these intervals, in conjunction with prior enhancement layer information, enable optimal prediction. A delayed prediction approach that complements this framework for spatially scalable video coding is then provided, which further exploits future base layer frames for additional enhancement layer coding performance gains. Finally, a low-complexity variant of the proposed estimation-theoretic prediction approach is devised, which approximates the conditional expectation by switching between three predictors depending on a simple condition involving information from both layers, and which retains significant performance gains. Simulations provide experimental evidence that the proposed approaches substantially outperform the standard scalable video codec and other leading competitors.
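To make the notion of a conditional-expectation predictor concrete, the following Python sketch computes the expected value of a single transform coefficient given (i) a prior enhancement layer prediction and (ii) the base layer quantization interval known to contain the coefficient. It is only an illustration under stated assumptions: the prediction residual is assumed to be zero-mean Laplacian with scale `b`, and the names `et_predict` and `truncated_laplacian_mean` are hypothetical. The model and parameters actually used in the paper may differ.

```python
import math


def _half_line_moments(l, h, b):
    """Mass and first moment of a Laplacian(0, b) density over [l, h], 0 <= l <= h."""
    el, eh = math.exp(-l / b), math.exp(-h / b)
    mass = 0.5 * (el - eh)
    moment = 0.5 * ((l + b) * el - (h + b) * eh)
    return mass, moment


def truncated_laplacian_mean(lo, hi, b):
    """E[z | lo <= z <= hi] for z ~ Laplacian(0, b) (assumed residual model)."""
    if b <= 0 or lo >= hi:
        raise ValueError("require b > 0 and lo < hi")
    mass, moment = 0.0, 0.0
    if hi > 0:  # part of the interval on the non-negative side
        m, f = _half_line_moments(max(lo, 0.0), hi, b)
        mass, moment = mass + m, moment + f
    if lo < 0:  # part on the negative side, handled by symmetry
        m, f = _half_line_moments(max(-hi, 0.0), -lo, b)
        mass, moment = mass + m, moment - f
    if mass <= 0.0:
        # Numerical underflow far in the tail: fall back to the midpoint (assumption).
        return 0.5 * (lo + hi)
    return moment / mass


def et_predict(x_pred, interval_lo, interval_hi, b):
    """Conditional expectation of the coefficient given the prior prediction
    x_pred and the quantization interval [interval_lo, interval_hi]."""
    return x_pred + truncated_laplacian_mean(interval_lo - x_pred,
                                             interval_hi - x_pred, b)
```

When the prior prediction sits at the center of the interval, the estimate equals that prediction; when the prediction lies outside the interval, the estimate is pulled inside it, toward the boundary nearest the prediction.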