SECTION I

## INTRODUCTION

Version 1 of the High Efficiency Video Coding (HEVC) standard [1] targets applications with 4:2:0 chroma formats at 8–10 bits per sample. HEVC was also expected to be attractive for 4:2:2, 4:4:4, and higher bit-depth applications, given the improved compression efficiency for 4:2:0 applications [2]. Examples of these application scenarios include the following.

• Content production in a digital video broadcasting delivery chain: This application commonly employs 4:2:2 chroma format at 10 bits per sample.
• Storage and transmission of video captured by a professional camera: 4:4:4 chroma format and $\text{R}'\text{G}'\text{B}'$ color space may be used for this application.
• Compression of high dynamic range (HDR) content: Up to 16 bits per sample may be used in this application.
• Improved lossless compression: This is used for video signals in content preservation and/or medical imaging.
• Coding of screen content: In addition to the preceding applications, a consideration of noncamera view or mixed content in the 4:4:4 chroma format with 8 or 10 bits per sample is in the scope of emerging consumer applications such as wireless display video.

Given the above applications not covered by HEVC Version 1, the Visual Coding Experts Group (VCEG) of ITU-T and the Moving Picture Experts Group (MPEG) of ISO/IEC decided to jointly develop the Range Extensions (RExt) of HEVC, for inclusion in Version 2 of the standard [3]. The effort started at the 10th meeting of the Joint Collaborative Team on Video Coding (JCT-VC) in Stockholm in July 2012 with the establishment of an ad hoc group. Its working objective was to gather the requirements, source material, and coding conditions, to set up the experiments, and to examine different implementations proposed as the code base for the RExt development activities [4], [5]. The final RExt text specification draft was submitted for approval in April 2014 [6].

The RExt development was primarily guided by the design principle of extending the existing HEVC Version 1 coding tools with minimal divergence from original design intentions. Moreover, whenever deemed appropriate, new coding tools were added and the existing coding tools were modified. This was the case when considering the use of video content such as screen content or $\text{R}'\text{G}'\text{B}'$ source material, greater flexibility, and for lossless and near-lossless coding conditions.

This paper provides an overview of HEVC RExt in three steps.

1. This paper describes how the design was extended to support additional video content formats, i.e., higher bit depths and chroma formats other than 4:2:0.
2. This paper describes how improvements to compression efficiency and throughput were achieved.
3. This paper describes how the profiles and levels were defined to address various applications.

The improved compression performance of HEVC RExt over the Fidelity Range Extensions (FRExt) of H.264/AVC is demonstrated with the coding results using different types of content. The remainder of this paper is organized as follows.

Section II highlights the specific features of HEVC RExt and briefly describes the underlying design principle and tools. Section III presents the mandatory changes to the HEVC Version 1 coding tools for enabling the support of chroma formats other than 4:2:0 as well as higher bit depth. New coding tools introduced by RExt are described in Section IV, while modifications to the existing Version 1 tools are described in Section V. An overview on the RExt specific profiles and levels is provided in Section VI and a comparison of compression efficiency for HEVC RExt and H.264/AVC FRExt is presented in Section VII. Section VIII concludes this paper.

The results presented in this paper use various common test conditions (CTCs) that were established during the RExt development for coding tool evaluation. Unless otherwise stated, the results are presented using the CTC of [7]. When evaluating lossy coding performance, the results are expressed in terms of Bjøntegaard delta rate (BD-rate [8]) reductions for the luma component. For lossless coding, the results are expressed as percentage bit-rate savings and derived as
$$\frac{\text{rate}_{\text{test}} - \text{rate}_{\text{ref}}}{\text{rate}_{\text{ref}}} \cdot 100\%. \tag{1}$$
In both cases, the reference data for comparison are generated with any of the new tools being disabled.
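The lossless measure above can be written as a small helper. This is only an illustrative sketch: the function name `bitrate_change_percent` and its argument names are assumptions, not part of the CTC documents. With the formula as stated, a negative value corresponds to a bit-rate saving of the test configuration over the reference.

```python
def bitrate_change_percent(rate_test: float, rate_ref: float) -> float:
    """Relative bit-rate change of the test run versus the reference, in percent.

    Implements (rate_test - rate_ref) / rate_ref * 100; negative values
    mean that the test configuration needs fewer bits than the reference.
    """
    if rate_ref <= 0:
        raise ValueError("rate_ref must be positive")
    return 100.0 * (rate_test - rate_ref) / rate_ref
```

For example, a test bitstream of 90 kbit against a reference of 100 kbit gives a change of -10%, i.e., a 10% saving.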

SECTION II

## FEATURE HIGHLIGHTS AND DESIGN CONSIDERATIONS

The main objective of HEVC RExt is the support for 4:2:2 and 4:4:4 chroma formats and sample bit depths beyond 10 bits per sample. In addition, extended functionalities and increased coding efficiency are intended to be provided by HEVC RExt to meet particular application scenarios. These include coding of screen content and direct coding of $\text{R}'\text{G}'\text{B}'$ source material, coding of auxiliary pictures such as alpha planes or depth maps, and very high bit-rate and lossless video coding.

The following sections contain a brief summary of the key features and tools of RExt by which these objectives are achieved. For the sake of presentation, a distinction is made between coding tools that are not included in HEVC Version 1 and modifications to the existing HEVC Version 1 tools. The latter category of tools includes a further differentiation between mandatory and non-mandatory modifications to the existing HEVC Version 1 tools. As a paramount design principle in the RExt development, the introduction and modification of coding tools were only adopted when sufficient benefit was present. Specifically, the benefit against the incremental cost of any divergence from the HEVC Version 1 design was considered. Prominent examples for a conservative design decision, i.e., in favor of the existing 4:2:0 HEVC design, are given in the following list.

• Interpolation of fractional-sample positions for inter-picture prediction, where the number of filter taps is kept lower for chroma (four taps) than for luma (seven or eight taps).
• Intra-picture prediction modes, where the number of explicitly signaled prediction modes is kept significantly lower for chroma (five modes) than for luma (35 modes).
• Support for interlace coding beyond the existing metadata scheme.
• Single quadtree syntax for partitioning the luma and chroma components into coding blocks (CBs) and transform blocks (TBs).

In addition, the specification text [3] includes a syntax element for enabling a separate color plane coding mode for the 4:4:4 format. When this flag is enabled, each of the three color components is separately processed, which seems to contradict the fourth item in the above list. However, due to a negative cost–benefit balance in view of the targeted applications, it is not supported by any HEVC RExt profile. Moreover, a further flag improving the weighted prediction for bit depths beyond 12 bits is obsolete due to the absence of 16-bit inter-predicted profiles.

Direct coding in the $\text{R}'\text{G}'\text{B}'$ domain implies that typically $\text{G}'$ would be interpreted as luma ($\text{Y}'$), and $\text{B}'$ and $\text{R}'$ as chroma (Cb and Cr) components. Hence, the naming convention of luma for the first component and chroma for the two additional components will be retained unchanged, regardless of the underlying color space of the input signal for the rest of this paper. As a general prerequisite, it is assumed that the reader is familiar with the basic concepts and coding tools of HEVC Version 1, as presented in [1] and [9].

### A. Mandatory Modifications for 4:2:2 and 4:4:4 Support

The basic support for 4:2:2 and 4:4:4 chroma formats is achieved by modifications to the residual quadtree (RQT) interpretation for the chroma components. The following two modifications are necessary for these extended chroma formats.

1. TB Partitioning: Adaptation of the chroma TB partitioning to account for the different chroma sampling rates, horizontally and vertically, of the extended chroma formats.
2. Chroma Intra Prediction: Adaptation of the intra-picture prediction mode applied in chroma components for the 4:2:2 chroma format.

### B. New Coding Tools Introduced by RExt

Three new coding tools were integrated into HEVC RExt: two deal specifically with processing and coding of the chroma components, and the third targets the lossless and near-lossless operation modes. The two chroma-related tools can each be enabled by a separate flag in the picture parameter set (PPS), while the third tool provides two types of operation, each of which can be activated by a separate flag in the sequence parameter set (SPS). Note that specific options are only available for certain RExt profiles, as will be detailed in Section VI. In the following list, each of these three new coding tools is briefly introduced, while a more detailed exposition is given in Section IV.

1. Cross-Component Prediction (CCP): Based on a linear model, each chroma TB is adaptively predicted by its colocated reconstructed luma TB. This block-adaptively switched CCP tool is available only for 4:4:4 chroma format and exploits the remaining statistical dependencies between the luma and both chroma component residual signals.
2. Adaptive Chroma Quantization Parameter (ACQP) Offset: A mechanism at the coding unit level allows the signaling and application of variable offsets for the derivation of the chroma quantization parameter (QP).
3. Residual Differential Pulse Code Modulation (RDPCM): For the use with lossy and lossless coding modes, where the inverse transform (and for lossless, also the scaling) stage is skipped, a sample-based horizontal and vertical differential pulse code modulation (DPCM) for the residual signal is employed. RDPCM is activated by two types: 1) acting only on intra-picture predicted blocks and 2) acting only on inter-picture predicted blocks.

### C. Additional Modifications to the Existing HEVC Version 1 Tools

Whenever deemed reasonable in view of the above-mentioned cost–benefit design principle, the existing coding tools in HEVC Version 1 were reused and appropriately modified in order to serve the specific needs of the applications targeted by HEVC RExt. In the following list, all modifications to the existing coding tools of HEVC Version 1 are briefly highlighted.

1. Filtering for Smoothing of Samples in Intra-Picture Prediction: Filtering of samples for intra-picture prediction can be completely disabled for all components by a flag in the SPS.
2. Transform Skip Mode (TSM) and Transform Quantizer Bypass (TQB) Mode: The use of the TSM is allowed for TB sizes larger than $4\times 4$ by signaling the maximum TB size in the PPS. Moreover, by the use of two corresponding flags in the SPS, a modification of the context modeling for the significance map and a rotation of the $4\times4$ residual signal can be activated. Both modifications improve the entropy coding stage of transform skipped residual signals.
3. Truncated Rice Binarization: The use of an alternative sub-block (SB)-persistent initialization procedure for the Rice parameter, which controls the adaptive binarization process of transform coefficient levels, can be activated by a flag in the SPS.
4. Internal Accuracy and $k$th Order Exp-Golomb (EGk) Binarization: By the use of a flag in the SPS, an extended precision can be enabled for the inverse transform as well as for the coefficient level parsing process. Moreover, by the use of the same flag, an alternative EGk binarization process with limited prefix length is invoked.
5. Decoding of Bypass Bins: For increasing the throughput in high bit-depth decoding, an alignment process prior to the bypass decoding operation for transform coefficient level data can be activated by a corresponding flag in the SPS. This has the effect that multiple bypass-coded bins can be decoded by a single bit masking and shift operation, albeit at the expense of an increase in bit rate.

Modifications to the existing HEVC Version 1 tools, as briefly presented above, are only available for particular RExt profiles, similar to the new coding tools introduced by RExt. More details on the modifications themselves and on their use in specific profiles can be found in Sections V and VI, respectively.

In the following section, the mandatory and implicitly given modifications of HEVC Version 1 are presented. These modifications are tied to the use of chroma formats other than 4:2:0 and higher bit depths, and include areas such as TB structuring, scanning and scaling of transform coefficient levels, deblocking, intra-picture prediction, and sample adaptive offset (SAO).

SECTION III

## MANDATORY MODIFICATIONS OF HEVC VERSION 1 FOR 4:2:2, 4:4:4, AND HIGHER BIT-DEPTH SUPPORT

Several modifications are necessary to support 4:2:2 and 4:4:4 chroma formats. This is mainly due to different sampling structures relative to the 4:2:0 chroma format. Some changes are obvious and straightforward, whereas others are more involved.

One of the more obvious mandatory modifications is the coding and prediction block partitioning. Since a single partitioning syntax is transmitted for all components, the partitioning of the chroma components only needs to be adjusted according to the different sampling ratios. Furthermore, motion vectors, given in quarter-sample precision of the luma component, need to be horizontally scaled for 4:2:2 chroma components. The SAO filtering was adjusted by removing the limitation of the scaling value to 10 bits per sample. This is achieved by introducing a flexible scaling value signaled in the PPS.

The situation is more complex when dealing with the generalization of TB partitioning and intra-picture prediction with regard to different chroma formats. Consequently, the two following sections deal with each of the issues separately.

### A. Transform Block Partitioning and Related Changes

The RQT [1], [10] determines the partitioning of CBs into TBs, for both the luma and chroma components. In HEVC Version 1, a transform unit (TU) is composed of either one luma TB greater than $4\times 4$ or four $4\times 4$ luma TBs, together with two chroma TBs and the associated syntax structures, as illustrated in the top row of Fig. 1. The reason for this behavior is that, for the 4:2:0 chroma format, the RQT is allowed to split an $8\times 8$ luma TB but not the corresponding $4\times 4$ chroma TBs, since that would lead to a subdivision into $2\times 2$ chroma TBs, which are not supported in HEVC. This, in turn, implies that the RQT syntax need not be altered for the 4:2:2 and 4:4:4 chroma formats. Instead, it is sufficient to adapt the interpretation of the existing RQT syntax.

Fig. 1. Composition of TUs for different chroma formats and block sizes $N$ (specified in luma samples). The numbering of TBs indicates their coding order.

This reinterpretation of the RQT in terms of constituting TUs is applicable to both the 4:2:2 and 4:4:4 chroma formats and is shown in the middle and bottom rows of Fig. 1, respectively. Since in the 4:4:4 case, luma and chroma TBs always have the same spatial resolution, splitting of an $8\times 8$ luma TB also involves splitting of the corresponding $8 \times 8$ chroma TBs, which is permitted. This leads to a minimum TU size, both in terms of luma and chroma samples of $4\times 4$ (bottom row, right graphic in Fig. 1).

For the 4:2:2 chroma format, chroma components are sampled at the same rate vertically and at half the rate horizontally as compared with the sampling of the luma component. This results in a rectangular array of chroma samples for each chroma component, as depicted in the middle, gray-shaded row of Fig. 1, and thus, would result in rectangular-shaped chroma TBs for each TU. Instead of introducing rectangular transform logic, the pre-existing square transform logic is reused by splitting the rectangular arrays of chroma samples into two square TBs for each chroma component: a pair of a top and a bottom TB for each chroma component [11]. This also implies that two coded block flags are required to control the TBs for a given chroma component. Similar to the 4:2:0 chroma format, a TU in the 4:2:2 chroma format is composed of either one luma TB greater than $4\times 4$ or four luma TBs, each of size $4\times 4$, but together with two pairs of chroma TBs. Fig. 1 also illustrates the coding order of all TBs within a TU for each of the different chroma formats and block sizes.
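The TU composition described above can be summarized in a short sketch. The function name `chroma_tbs_per_component` and the string labels for the chroma formats are hypothetical and chosen only for illustration. The argument `n` is the TU size in luma samples, and the result is the number of square chroma TBs per chroma component together with their side length.

```python
def chroma_tbs_per_component(chroma_format: str, n: int) -> tuple:
    """Return (number of chroma TBs per component, TB side length) for a TU.

    n is the TU size in luma samples. For 4:2:0 and 4:2:2 the smallest TU
    covers 8x8 luma samples (four 4x4 luma TBs), because 2x2 chroma TBs are
    not supported; for 4:4:4 the smallest TU is 4x4.
    """
    if chroma_format == "4:2:0":
        if n < 8:
            raise ValueError("smallest 4:2:0 TU covers 8x8 luma samples")
        return (1, n // 2)          # one (n/2)x(n/2) TB per component
    if chroma_format == "4:2:2":
        if n < 8:
            raise ValueError("smallest 4:2:2 TU covers 8x8 luma samples")
        # The (n/2)-wide, n-tall chroma array is split into a top and a
        # bottom square TB, each with its own coded block flag.
        return (2, n // 2)
    if chroma_format == "4:4:4":
        if n < 4:
            raise ValueError("smallest 4:4:4 TU covers 4x4 luma samples")
        return (1, n)               # chroma has the luma resolution
    raise ValueError("unsupported chroma format: " + chroma_format)
```

For a 16x16 TU, this gives one 8x8 chroma TB per component in 4:2:0, a top/bottom pair of 8x8 TBs in 4:2:2, and one 16x16 TB in 4:4:4.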

During the development of RExt, deblocking across the boundaries of a pair of reconstructed TBs in 4:2:2 chroma format (as illustrated by dashed lines in the middle row of Fig. 1) was deemed unnecessary, thereby minimizing the changes between the 4:2:0 and 4:2:2 design. In terms of transform coefficient scanning, additional, nondiagonal scan orders are made available for $8\times 8$ chroma TBs in the 4:4:4 chroma format, whereas for 4:2:0 and 4:2:2 chroma formats only $4\times 4$ chroma TBs can use nondiagonal scan orders.

Another modification for the 4:4:4 chroma format is required due to the introduction of $32\times 32$ chroma TBs, which are not present in HEVC Version 1. If quantization matrices (or scaling lists as denoted in the HEVC specification text [3]) are used, the matrix for this block size is derived from the matrix associated with $16\times 16$ chroma TBs [12]. This approach avoids signaling a separate scaling list for $32\times 32$ chroma TBs, thus minimizing the overhead in the PPS.

### B. Intra-Picture Prediction and Related Changes

In HEVC Version 1, no distinction is made between luma and chroma components regarding the interpretation of modes of intra prediction, because all color components utilize the same ratio between horizontal and vertical sampling rates. For instance, a mode that corresponds to predicting along a line 45° to the horizontal will utilize the same intra-prediction processing for the luma and chroma components and will predict along a 45° line in their respective arrays of samples. However, for 4:2:2, due to the different horizontal and vertical sampling rates in chroma components, the approach taken for 4:2:0 would result in, e.g., a 45° line through the array of chroma samples corresponding to a 27° line through the array of luma samples, and vice versa.
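The 27° figure follows from the 2:1 horizontal subsampling: one horizontal step in the chroma array spans two luma sample positions, so a 45° chroma line has a slope of 1/2 on the luma grid, and $\arctan(1/2) \approx 26.57°$. The sketch below reproduces this arithmetic; the function names are illustrative assumptions and do not correspond to the normative mapping table.

```python
import math


def chroma_to_luma_angle_deg(chroma_angle_deg: float) -> float:
    """Angle (to the horizontal) on the luma grid of a line drawn at
    chroma_angle_deg through a 4:2:2 chroma sample array."""
    theta = math.radians(chroma_angle_deg)
    dx_chroma, dy = math.cos(theta), math.sin(theta)
    # One chroma sample horizontally equals two luma samples in 4:2:2.
    return math.degrees(math.atan2(dy, 2.0 * dx_chroma))


def luma_to_chroma_angle_deg(luma_angle_deg: float) -> float:
    """Inverse relation: angle in the chroma array of a luma-grid line."""
    theta = math.radians(luma_angle_deg)
    dx_luma, dy = math.cos(theta), math.sin(theta)
    return math.degrees(math.atan2(dy, dx_luma / 2.0))
```

Under these assumptions, a 45° line in the chroma array maps to about 26.57° on the luma grid, while a 45° luma line maps to about 63.43° in the chroma array, i.e., roughly 27° from the vertical.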

Although modifying the chroma intra-picture prediction process was considered [4], it was decided that modifying the mode passed into the prediction process for chroma would minimize the divergence from the HEVC Version 1 design. A mapping table has therefore been introduced [13], which modifies the chroma prediction mode to compensate for the difference in sampling rates used in the 4:2:2 chroma format. This mapping is also used when determining the coefficient scanning pattern for $4\times 4$ chroma TBs in the 4:2:2 chroma format.

SECTION IV

## NEW CODING TOOLS INTRODUCED BY REXT

Three dedicated tools are introduced by RExt, namely, CCP, ACQP offset, and RDPCM. Both CCP and ACQP target the chroma components, and the latter increases the flexibility for controlling the chroma QPs, whereas CCP is a purely compression efficiency coding tool. RDPCM was already included in H.264/AVC, but its application space is extended to include the lossy operation mode for HEVC. The detailed aspects and technical description are given for the aforementioned tools in the following section.

### A. Cross-Component Prediction

Statistical dependencies among the components of color spaces having absolute amplitudes (e.g., $\text{R}'\text{G}'\text{B}'$) are usually exploited by representing the video data in color spaces with the chroma components having amplitudes relative to the luma component, such as $\text{Y}'\text{C}_{b}\text{C}_{r}$. However, a small but still significant correlation, especially locally, typically remains after a fixed color space conversion. Furthermore, it is desirable for some applications, e.g., screen content, to directly compress in $\text{R}'\text{G}'\text{B}'$. In order to target this situation, linear luma to chroma prediction schemes were proposed to the JCT-VC. In linear model schemes, the prediction result $y$ is a weighted amount of the predictor’s value $x$ and an offset value $\beta$, as denoted in the following formula with $\alpha$ and $\beta$ being the model parameters:
$$y = \alpha \cdot x + \beta. \tag{2}$$

Different approaches to the design of such schemes were investigated during the RExt development, including the linear model chroma [14] and a residual-based [15] approach that had previously been proposed for the HEVC Version 1. Two main observations were made during the development.

1. Backward-adaptive approaches would burden the decoder with additional complexity while resulting in almost the same compression efficiency as forward signaling techniques [16].
2. For the subsampled chroma formats, the bit-rate reductions are lower than in 4:4:4 and a specification of up- (chroma) or down-sampling (luma) filters, to align the difference in spatial dimension between the luma and the chroma blocks, would be necessary.

Consequently, the forward-driven scheme in [17], including a modification mainly to the syntax element binarization in [18], finally led to the CCP specification used in all 4:4:4 RExt profiles of HEVC Version 2.

CCP operates in the spatial residual domain, and the slope parameter $\alpha$ of the linear model is transmitted in the bitstream for each chroma TB [19], [20] within a TU. It is sufficient to transmit only the slope parameter because it is assumed that the offset parameter $\beta$ is always close to zero. Specifically, it is assumed that the expected values of residual signals are equal to zero. Furthermore, due to the level in which the prediction is applied, i.e., for the RQT leaves, CCP can be effectively applied to a partial area of the prediction unit (PU), or for multiple PUs, when the CU is inter predicted. In the following sections, a detailed description is given for the residual reconstruction process, the slope parameter coding, and how the reference software encoder derives the slope parameter during its rate–distortion optimization process.

#### 1) Chroma Residual Reconstruction

From the decoder’s perspective, after the parsing and the reconstruction of the slope parameter $\alpha$ and the quantized residuals for a chroma TB, the chroma residuals are modified as follows when the luma and the chroma sample bit depths are equal:
$$r_{\text{chroma}} = \hat{r}_{\text{chroma}} + \left\lfloor \frac{\alpha \cdot \hat{r}_{\text{luma}}}{8} \right\rfloor \tag{3}$$
where $r$ denotes the final residual sample and $\hat{r}$ denotes the residual sample reconstructed from the bitstream. Note that the luma residuals are unchanged, i.e., $\forall r_{\text{luma}} \in \text{TB}_{\text{luma}} : r_{\text{luma}} = \hat{r}_{\text{luma}}$. In the case of unequal bit depths between luma and chroma, the luma residuals, i.e., the predictor signal, are adjusted to the chroma bit depth before the multiplication operation. The application of CCP does not take place when $\alpha = 0$.
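A minimal sketch of this reconstruction step for the equal-bit-depth case is given below. The function name and the list-of-rows block layout are assumptions made for illustration. The division by 8 with flooring is realized as an arithmetic right shift by 3, which in Python also floors for negative operands.

```python
def ccp_reconstruct_chroma(res_chroma, res_luma, alpha):
    """Apply cross-component prediction to one chroma TB (equal bit depths).

    res_chroma, res_luma: residual blocks reconstructed from the bitstream,
    given as lists of rows with identical dimensions (4:4:4 only).
    alpha: slope parameter, one of {0, +-1, +-2, +-4, +-8}.
    """
    if alpha == 0:
        # CCP is not applied; return an unmodified copy.
        return [row[:] for row in res_chroma]
    return [
        [c + ((alpha * l) >> 3) for c, l in zip(chroma_row, luma_row)]
        for chroma_row, luma_row in zip(res_chroma, res_luma)
    ]
```

For instance, with $\alpha = 4$, a luma residual of $-3$ contributes $\lfloor -12/8 \rfloor = -2$ to the colocated chroma residual, and a luma residual of 8 contributes 4.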

#### 2) Syntax Signaling

Up to two syntax elements are transmitted for each chroma TB when the corresponding luma TB (i.e., at the same spatial location) consists of transmitted residuals (i.e., $\exists \hat{r}_{\text{luma}} \in \text{TB}_{\text{luma}} : \hat{r}_{\text{luma}} \neq 0$). The syntax element log2_res_scale_abs_plus1 specifies the absolute value of $\alpha$ and the syntax element res_scale_sign_flag specifies the sign when $\alpha \neq 0$. log2_res_scale_abs_plus1 is transmitted using truncated unary binarization [21], with a cutoff value equal to four, and $|\alpha|$ is reconstructed if $\mathit{log2\_res\_scale\_abs\_plus1} \neq 0$ as
$$|\alpha| = 2^{\mathit{log2\_res\_scale\_abs\_plus1} - 1}. \tag{4}$$
Due to the truncated unary binarization and the reconstruction rule in (4), the permitted values for $\alpha$ are $\{0, \pm 1, \pm 2, \pm 4, \pm 8\}$. Furthermore, in combination with the normalization as denoted in (3), the slope factor is effectively in $\{0, \pm(1/8), \pm(1/4), \pm(1/2), \pm 1\}$.
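The reconstruction rule above can be sketched as follows. The function name is hypothetical, and the sign convention (a flag value of 1 selecting the negative slope) is an assumption of this sketch, since the text only states that the flag specifies the sign.

```python
def ccp_alpha(log2_res_scale_abs_plus1: int, res_scale_sign_flag: int = 0) -> int:
    """Reconstruct the slope parameter alpha from the two syntax elements."""
    if not 0 <= log2_res_scale_abs_plus1 <= 4:
        raise ValueError("log2_res_scale_abs_plus1 must be in 0..4")
    if log2_res_scale_abs_plus1 == 0:
        return 0  # CCP not applied; no sign flag is transmitted
    magnitude = 1 << (log2_res_scale_abs_plus1 - 1)
    # Assumption: res_scale_sign_flag == 1 denotes a negative slope.
    return -magnitude if res_scale_sign_flag else magnitude
```

Enumerating all combinations of the two syntax elements yields exactly the nine permitted values $\{0, \pm 1, \pm 2, \pm 4, \pm 8\}$.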

In total, up to five context-coded bins are transmitted, i.e., up to four bins to specify log2_res_scale_abs_plus1 and optionally one bin for res_scale_sign_flag. For each bin, a separate context model is employed and different context model sets are used for each chroma component. This context modeling scheme was chosen due to the different probability distributions of the slope parameter for different input color spaces and different chroma components. For example, the distribution of the slope parameter is concentrated around 0 for $\text{Y}'\text {C}_{b}\text {C}_{r}$ content, while the distribution is concentrated close to ±1 for $\text{R}'\text{G}'\text{B}'$ content. In this context, a finer quantization of absolute slope parameter values greater than 1/2 results in insignificant improvement for $\text{R}'\text{G}'\text{B}'$ content, leading to the nonuniform permitted values of $\alpha$ as a balanced tradeoff between different slope parameter distributions and signaling overhead.
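The bin count can be checked with a sketch of truncated unary binarization, in which a value $v$ below the cutoff is written as $v$ one-bins followed by a terminating zero-bin, and the terminating bin is omitted when $v$ equals the cutoff. The function name is illustrative, and the context selection per bin is not modeled.

```python
def truncated_unary(value: int, c_max: int) -> list:
    """Truncated unary bin string of value with cutoff c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must be in 0..c_max")
    bins = [1] * value
    if value < c_max:
        bins.append(0)  # terminating bin, omitted at the cutoff
    return bins
```

With a cutoff of four, the longest bin string has four bins; together with the optional sign bin, this gives the stated maximum of five context-coded bins per chroma TB.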

#### 3) Rate–Distortion Optimization

In general, the best $\alpha$ in the rate-distortion (RD) sense has to be derived by the encoder. A brute-force strategy, i.e., evaluating the RD cost for all permitted $\alpha$ values, can be expensive in terms of run time for software or logic for hardware. Therefore, the HM reference software implementation employs an algorithm to reduce the combinations tested to two: the RD cost for $\alpha = 0$ (i.e., CCP is disabled for the current chroma TB) is evaluated and compared against that for $\alpha = \alpha_{c}$, where $\alpha_{c}$ is derived as
\begin{align} \alpha_{1} &= \frac{\text{cov}(\mathbf{r}_{\text{luma}}, \mathbf{r}_{\text{chroma}})}{\text{var}(\mathbf{r}_{\text{luma}})} \tag{5} \\ \alpha_{c} &= \text{sign}(\alpha_{1}) \cdot \text{LUT}_\alpha(|\alpha_{1}|) \tag{6} \\ \text{LUT}_\alpha(x) &= \begin{cases} 0, & x < \dfrac{1}{16} \\[6pt] 1, & x \in \left[\dfrac{1}{16}, \dfrac{3}{16}\right) \\[8pt] 2, & x \in \left[\dfrac{3}{16}, \dfrac{3}{8}\right) \\[8pt] 4, & x \in \left[\dfrac{3}{8}, \dfrac{3}{4}\right) \\[8pt] 8, & x \geq \dfrac{3}{4}. \end{cases} \tag{7} \end{align}
In the above equations, cov and var are the approximations of empirical estimators for the covariance and the variance, respectively, i.e., the implementation assumes the expected values $E(\mathbf{r}_{\text{luma}})$ and $E(\mathbf{r}_{\text{chroma}})$ to be equal to 0 due to the signal being residual errors, and hence, the calculation of the mean is skipped. Furthermore, $\mathbf{r}$ denotes a vector consisting of all residual samples for the corresponding TB. The intermediate value $\alpha_{1}$ is quantized to the permitted values of $\alpha$ using the lookup table $\text{LUT}_\alpha$.
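The candidate derivation can be sketched directly from these equations. The code below is not the HM implementation: the function names are assumptions, floating-point arithmetic is used for readability, and a zero-variance luma block is mapped to $\alpha_c = 0$ as a guard that the text does not specify. Because the means are assumed to be zero, the covariance and variance reduce to sums of products, and their common normalization cancels.

```python
def lut_alpha(x: float) -> int:
    """Quantize |alpha_1| to a permitted magnitude using the stated thresholds."""
    if x < 1 / 16:
        return 0
    if x < 3 / 16:
        return 1
    if x < 3 / 8:
        return 2
    if x < 3 / 4:
        return 4
    return 8


def derive_alpha_candidate(r_luma, r_chroma) -> int:
    """Derive alpha_c from flat sequences of luma and chroma residual samples."""
    var = sum(l * l for l in r_luma)          # zero-mean variance (unnormalized)
    if var == 0:
        return 0                              # guard: no luma signal to predict from
    cov = sum(l * c for l, c in zip(r_luma, r_chroma))
    alpha_1 = cov / var
    magnitude = lut_alpha(abs(alpha_1))
    return magnitude if alpha_1 >= 0 else -magnitude
```

As an example, for luma residuals (4, -2, 6, 0) and chroma residuals (2, -1, 3, 0), $\alpha_1 = 28/56 = 0.5$, which falls in $[3/8, 3/4)$ and gives $\alpha_c = 4$; the encoder would then compare the RD cost of $\alpha = 4$ against that of $\alpha = 0$.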

#### 4) Reported Performance

CCP provides BD-rate reductions for all CTCs. The most notable results are the improvements for $\text{R}'\text{G}'\text{B}'$ and screen content. BD-rate reductions of 13%–18% and 21%–26%, for bit-rate ranges targeting consumer and professional applications, are reported in [19] for regular camera captured and for screen $\text{R}'\text{G}'\text{B}'$ content, respectively. For the corresponding content in 4:4:4 $\text{Y}'\text {C}_{b}\text {C}_{r}$, the BD-rate reductions are 0.2%–1.4% for regular content and 1.5%–3.6% for screen content. All reported values were generated using the random access prediction structure. Further evidence is given in [17] that, by appropriately modifying the encoder control, the direct coding of $\text{R}'\text{G}'\text{B}'$ content may result in BD-rate savings relative to that of $\text{Y}'\text {C}_{b}\text {C}_{r}$ content.

### B. Adaptive Chroma QP Offset

HEVC includes mechanisms to signal and vary the luma QP used for scaling transform coefficients prior to application of the inverse transform. One technique is referred to as delta QP and is applied at the CU level. In general terms, a chroma QP for a given TB is subsequently derived from the luma QP using (in Version 1) per-component offsets signaled in both the PPS and in the slice header. During the RExt development, several use-cases were given in [22] and [23] showing that increased flexibility could be desirable for non-4:2:0 chroma formats.

RExt extend the Version 1 functionality by providing an additional CU-level signaling mechanism for the chroma QP derivation process, used in all 4:2:2 and 4:4:4 RExt profiles. To avoid the potentially expensive overhead of frequently signaling an absolute offset, a table comprising up to six predefined pairs of offsets can be signaled in the PPS. Each pair defines two independent ACQP offsets, one for each chroma component, with each offset being in the range of −12 to 12, inclusive.

Each CU may control the application of any ACQP offset, wherein the first TU with a coded chroma residual may signal an enabling flag and an index into the offset table. Similar to the encoding mechanism of delta QP, a maximum CU depth at which an index may be signaled is configured in the PPS. All CUs below this maximum depth use the offset most recently signaled in CU scan order, unless within a CTU no offset has previously been signaled. No signaling occurs for CUs using the TQB mode.

Two context models are used for the coding of the syntax elements relating to ACQP. One context model is dedicated for the coding of the enabling flag and another is used for all bins resulting from the truncated unary binarization of the index.

ACQP provides greater flexibility to encoder designers over the Version 1 design, which required the slice-level or PPS-level QP offsets to be decided prior to coding a slice (wherein all CUs used the selected offset). In addition, ACQP may be used to extend the maximum allowed QP variation in Version 1, where the sum of the slice and PPS QP offsets for a given component must be in the range −12 to 12, inclusive. When ACQP mode is enabled, the QP offset range is increased to −24 to 24, inclusive.
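The range extension can be illustrated with a small sketch that only models the offset arithmetic described in this section. All names are hypothetical, and the subsequent mapping from the offset luma QP to the actual chroma QP is deliberately not modeled, since it is not described here.

```python
MAX_TABLE_ENTRIES = 6           # up to six predefined offset pairs in the PPS
OFFSET_MIN, OFFSET_MAX = -12, 12


def validate_offset_table(table) -> None:
    """Check a list of (cb_offset, cr_offset) pairs against the stated limits."""
    if len(table) > MAX_TABLE_ENTRIES:
        raise ValueError("at most six offset pairs may be signaled")
    for pair in table:
        for offset in pair:
            if not OFFSET_MIN <= offset <= OFFSET_MAX:
                raise ValueError("each ACQP offset must lie in [-12, 12]")


def total_chroma_qp_offset(pps_offset, slice_offset, table, component,
                           cu_flag, cu_index=0) -> int:
    """Sum of PPS, slice, and (if enabled for the CU) ACQP table offsets.

    component: 0 for the first chroma component, 1 for the second.
    """
    validate_offset_table(table)
    base = pps_offset + slice_offset
    if not OFFSET_MIN <= base <= OFFSET_MAX:
        raise ValueError("PPS + slice offset must lie in [-12, 12]")
    extra = table[cu_index][component] if cu_flag else 0
    return base + extra
```

With a base offset at the Version 1 limit of −12 and a table entry of −12, the combined offset reaches −24, which matches the extended range stated above.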

### C. Residual DPCM

HEVC Version 1 specifies two modes of operation that allow the transform stage to be bypassed while retaining the use of the entropy coding stage, namely, TSM and TQB. Both modes reflect the demand for simple but effective support for particular applications in HEVC Version 1. TSM was introduced to improve the compression efficiency for screen content, and its usage can be signaled for $4\times 4$ TBs. TQB bypasses both the transform and quantization stages and provides the option to compress a CU without distortion, i.e., losslessly. A detailed description of the HEVC Version 1 lossless coding mode is available in [24].

However, for advanced consumer and professional applications, such as desktop sharing using wireless displays and archiving, higher compression efficiency is desirable and new coding tools were investigated during the development of RExt. RDPCM, as specified in H.264/AVC FRExt, was initially considered as a starting point. During the RExt development, this tool was extended to include the support for lossy coding and use in inter-picture predicted blocks, improving the coding efficiency and helping to address the aforementioned applications. Although RDPCM introduces limited serialization to the processing, parallelism is possible across rows and columns.

#### 1) Lossless Operation Mode

RDPCM is the application of sample-based reconstruction along either the horizontal or vertical directions to reduce the redundancy among residuals. From the encoder’s perspective, let $r(x,y)$ be the elements of an $N\times N$ residual block, and let $\tilde{r}_{d}(x,y)$ be the residuals obtained after applying RDPCM along a direction $d$, with $d$ being either horizontal (hor) or vertical (ver). In lossless coding mode, i.e., when TQB is selected, $\tilde{r}_{\text{hor}}(x,y)$ and $\tilde{r}_{\text{ver}}(x,y)$ are given as
\begin{align} \tilde{r}_{\text{hor}}(x,y) &= \begin{cases} r(x,y), & x = 0 \\ r(x,y) - r(x-1, y), & \text{otherwise} \end{cases} \tag{8} \\ \tilde{r}_{\text{ver}}(x,y) &= \begin{cases} r(x,y), & y = 0 \\ r(x,y) - r(x, y-1), & \text{otherwise}. \end{cases} \tag{9} \end{align}
Reconstruction by the decoder is the output of accumulators that sum up residual samples over the column or row for vertical or horizontal directions, respectively.
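A minimal sketch of the lossless forward and inverse operations is given below. The function names and the `block[y][x]` list-of-rows layout are assumptions made for illustration. The inverse processes samples in raster order, so the previously reconstructed neighbor is always available.

```python
def rdpcm_forward(block, direction):
    """Encoder side: difference each residual against its left ("hor")
    or upper ("ver") neighbor; the first column or row is kept as is."""
    if direction not in ("hor", "ver"):
        raise ValueError("direction must be 'hor' or 'ver'")
    out = [row[:] for row in block]
    for y, row in enumerate(block):
        for x, r in enumerate(row):
            if direction == "hor" and x > 0:
                out[y][x] = r - block[y][x - 1]
            elif direction == "ver" and y > 0:
                out[y][x] = r - block[y - 1][x]
    return out


def rdpcm_inverse(diff, direction):
    """Decoder side: accumulate the differences along the row or column."""
    if direction not in ("hor", "ver"):
        raise ValueError("direction must be 'hor' or 'ver'")
    out = [row[:] for row in diff]
    for y, row in enumerate(diff):
        for x, d in enumerate(row):
            if direction == "hor" and x > 0:
                out[y][x] = d + out[y][x - 1]
            elif direction == "ver" and y > 0:
                out[y][x] = d + out[y - 1][x]
    return out
```

Since no quantization is involved, `rdpcm_inverse(rdpcm_forward(block, d), d)` returns the original block for either direction. The rows (for "hor") or columns (for "ver") are independent of each other, which is the source of the parallelism mentioned earlier.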

#### 2) Lossy Operation Mode

In lossy coding mode with TSM applied to a given TB, an encoder would generally use reconstructed samples when performing RDPCM. Let $\hat {r}(x,y)$ denote the reconstructed residual sample, i.e., after inverse quantization, at spatial location $(x,y)$. Then, $\tilde {r}_{\text {hor}}(x,y)$ and $\tilde {r}_{\text {ver}}(x,y)$ are given as follows, where $Q(\cdot )$ denotes the quantization operator:

\begin{align} \tilde {r}_{\text {hor}}(x,y) &= \begin{cases} Q(r(x,y)), & x = 0 \\ Q(r(x,y) - \hat {r}(x-1, y)), & \text {otherwise} \end{cases} \\ \tilde {r}_{\text {ver}}(x,y) &= \begin{cases} Q(r(x,y)), & y = 0 \\ Q(r(x,y) - \hat {r}(x, y-1)), & \text {otherwise}. \end{cases} \end{align}

Again, reconstruction by the decoder is the output of accumulators that sum up the scaled residual samples over the column or row for vertical or horizontal directions, respectively. Moreover, to alleviate encoder complexity, sign data hiding [25] is disabled when RDPCM is applied for lossy coding.
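The closed prediction loop described above can be sketched as follows. The uniform quantizer with step size `step` is a hypothetical stand-in for $Q(\cdot)$ and for the scaling process (the actual HEVC quantizer is QP-driven and more involved), and all function names are assumptions for this illustration.

```python
def quantize(v, step):
    """Hypothetical uniform quantizer standing in for Q(.): round to nearest level."""
    sign = -1 if v < 0 else 1
    return sign * ((abs(v) + step // 2) // step)


def dequantize(level, step):
    """Hypothetical scaling (inverse quantization) matching quantize()."""
    return level * step


def rdpcm_lossy_encode(r, step, direction):
    """Return (levels, rec): coded levels and the reconstruction the decoder will obtain."""
    n = len(r)
    levels = [[0] * n for _ in range(n)]
    rec = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            if direction == "hor":
                pred = rec[y][x - 1] if x > 0 else 0   # reconstructed left neighbor
            else:
                pred = rec[y - 1][x] if y > 0 else 0   # reconstructed upper neighbor
            levels[y][x] = quantize(r[y][x] - pred, step)
            rec[y][x] = pred + dequantize(levels[y][x], step)
    return levels, rec


def rdpcm_lossy_decode(levels, step, direction):
    """Decoder: accumulate the scaled levels along rows (hor) or columns (ver)."""
    n = len(levels)
    rec = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            if direction == "hor":
                acc = rec[y][x - 1] if x > 0 else 0
            else:
                acc = rec[y - 1][x] if y > 0 else 0
            rec[y][x] = acc + dequantize(levels[y][x], step)
    return rec


residual = [[12, 14, 15, 15],
            [-3, -2, 0, 5],
            [7, 7, 8, 20],
            [0, 1, -1, -9]]
levels, enc_rec = rdpcm_lossy_encode(residual, 4, "hor")
dec_rec = rdpcm_lossy_decode(levels, 4, "hor")
```

Because the encoder predicts from reconstructed rather than original residuals, the decoder's accumulator output matches the encoder's reconstruction, and the quantization error of each sample stays within half a step instead of accumulating along the row or column.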

#### 3) Implicit and Explicit RDPCM

RExt provides two types of RDPCM, implicit and explicit, which differ in how the direction $d$ is derived at the decoder. Implicit RDPCM is applied only for intra-predicted blocks whose prediction direction is either horizontal or vertical. For implicit RDPCM, $d$ corresponds to the prediction direction and no signaling is required. Conversely, explicit RDPCM is applied only to inter-predicted blocks and $d$ is signaled in the bitstream, since no implicit direction can be inferred from other PU data. When implicit RDPCM is enabled, boundary smoothing for horizontal and vertical intra-prediction directions is disabled for TQB CUs.

For each TB and color component, a flag is coded to indicate whether RDPCM is applied, and, if this is the case, a second flag indicates the direction. The luma and chroma components use a separate context model set for each flag. Both implicit and explicit RDPCM can be enabled at the sequence level by configuring two flags (implicit_rdpcm_enabled_flag and explicit_rdpcm_enabled_flag) in the SPS.
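The derivation of the direction $d$ described above can be summarized in a short decision sketch. The function and parameter names are hypothetical, the Boolean `transform_bypassed` stands for "TSM or TQB is used for this TB", and the mode indices 10 and 26 are the HEVC intra prediction mode numbers for the pure horizontal and vertical directions.

```python
INTRA_HORIZONTAL = 10  # HEVC intra prediction mode index, pure horizontal
INTRA_VERTICAL = 26    # HEVC intra prediction mode index, pure vertical


def rdpcm_direction(is_intra, intra_mode, transform_bypassed,
                    implicit_enabled, explicit_enabled,
                    explicit_flag=False, explicit_dir_vertical=False):
    """Return "hor", "ver", or None (RDPCM not applied) for one TB.

    implicit_enabled / explicit_enabled mirror the two SPS flags; explicit_flag
    and explicit_dir_vertical are illustrative names for the two coded flags.
    """
    if not transform_bypassed:
        return None  # RDPCM is only used when the transform is bypassed
    if is_intra:
        # Implicit RDPCM: the direction follows the intra prediction direction.
        if implicit_enabled and intra_mode == INTRA_HORIZONTAL:
            return "hor"
        if implicit_enabled and intra_mode == INTRA_VERTICAL:
            return "ver"
        return None
    # Explicit RDPCM: usage and direction are signaled for inter-predicted blocks.
    if explicit_enabled and explicit_flag:
        return "ver" if explicit_dir_vertical else "hor"
    return None
```

For example, an intra-predicted TB with mode 26 and the transform bypassed yields `"ver"` without any additional signaling, whereas an inter-predicted TB yields a direction only when the first coded flag indicates that RDPCM is applied.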

#### 4) Reported Performance

The compression efficiency of both implicit and explicit RDPCM [26]–[28] was assessed over the test set and coding configurations agreed for RExt development [7]. Average bit-rate savings of up to 5.7% have been reported for lossless coding and of up to 3.3% for lossy coding. The improvements are mostly achieved for $\text{R}'\text{G}'\text{B}'$ screen content material. Implicit and explicit RDPCM for lossy coding generally provide a more significant BD-rate reduction for screen content than for other forms of content, since TSM is often selected in this category of test material, thereby allowing RDPCM to be applied more frequently.

SECTION V

## EXTENSIONS OF HEVC VERSION 1 TOOLS

The introduction of dedicated coding tools increases the compression efficiency and extends the flexibility for applications such as $\text{R}'\text{G}'\text{B}'$ content or lossless compression. However, modifications to the existing coding tools, present in the Version 1 design, can also improve the compression efficiency for applications targeted by RExt, e.g., on higher bit rates/depths (including lossless and near-lossless), chroma sampling formats other than 4:2:0, and different input characteristics, such as screen content. These modifications are chosen due to their good balance between compression efficiency improvement and cost in terms of complexity or design changes. They are clustered into four different categories in the following description: smoothing for intra prediction, TSM and TQB, Truncated Rice binarization, and support for high bit-rate/-depth coding.

### A. Smoothing for Intra Prediction

In the Version 1 design, neighboring reference samples may be smoothed prior to intra prediction using predefined low-pass filters. This filtering process depends on the intra-prediction mode or direction in use and results in improved RD performance for lossy operation points. The chroma signal is generally already subsampled, often using a low-pass filter, and hence filtering the chroma reference samples would not result in improved compression efficiency. Accordingly, the filtering process is not applied to the reference samples of the chroma components in 4:2:0 and 4:2:2 chroma formats. However, this is not the case for the 4:4:4 chroma format, and hence the luma filtering process is also applied to the chroma components.

In addition to these implicit modifications, a flag included in the SPS provides the capability to completely disable the filtering process. This can be suitable for screen content that contains different signal characteristics or lossless applications.

### B. Transform Skip and Transform Quantizer Bypass

To fulfill the demand for improved compression performance of screen content without the introduction of additional dedicated coding tools, TSM is not restricted to $4\times4$ TBs in all 4:4:4 profiles and in the 16 bit monochrome profile. This is achieved by the introduction of a syntax element (log2_max_transform_skip_block_size_minus2) in the PPS, controlling the maximum TB size for which TSM can be used. Furthermore, scaling lists are not applied to TBs using TSM, with the exception of $4\times 4$ TBs in order to keep compatibility with the Version 1 design.

Modifications in the entropy coding stage for TSM and TQB mode further improve the compression efficiency. Two extensions were introduced to reflect the fact that the residual signal is not compacted anymore, i.e., residual signal energy is no longer concentrated in the top-left residual coefficients of a TB, due to the absence of the transform stage. Both extensions are controlled by flags introduced in the SPS for RExt.

#### 1) Context Modeling for the Significance Map

A significance map specifies the presence of nonzero valued residual samples (transform coefficient levels when using transforms) for each spatial location within a TB, and is scanned using predefined scan patterns. When the transform stage is bypassed, the probability of significance does not increase for low-frequency scan positions in the TB. Instead, the probability of significance tends to be uniform across all scan positions in the TB. In order to avoid interference with the context models used for coding the significance maps of TBs when the transform stage is not bypassed, a separate single context model can be employed for the coding of the significance map when TSM or TQB are used [29].

#### 2) Rotation of Residual Samples

Without the energy compacting property of the transform, the following is observed for intra predicted $4\times 4$ TBs using either TSM or TQB: the absolute magnitudes of the residual samples are usually greater with increasing spatial distance from the top and left border of the TB. The reason is that the predictor signal, i.e., the reference samples located at the top and left border of the TB, tends to become less accurate with increasing spatial distance. In order to exploit this observation, the residual samples are rotated by 180°, which is equivalent to a horizontal plus vertical flipping of the TB. The result is a statistical model for the absolute residual samples that is similar to that of the absolute transform coefficient levels, and can be exploited by the existing binarization and context modeling approach of the Context-Based Adaptive Binary Arithmetic Coding (CABAC) design in Version 1. Note that the reordering can be realized by applying a forward direction to the existing scan patterns, i.e., without increasing the memory storage requirements.
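The equivalence between rotating the block and reversing the scan direction can be checked with a small sketch. The helper names are assumptions for this example, and the up-right diagonal scan is generated here only for illustration.

```python
def rotate_180(block):
    """Rotate a square block by 180 degrees (horizontal plus vertical flip)."""
    return [row[::-1] for row in block[::-1]]


def up_right_diagonal_scan(n):
    """Positions (x, y) of an n x n block in up-right diagonal order."""
    order = []
    for d in range(2 * n - 1):
        for x in range(n):
            y = d - x
            if 0 <= y < n:
                order.append((x, y))
    return order


# Residual magnitudes that grow away from the top-left corner, as observed
# for intra-predicted blocks without a transform.
block = [[0, 1, 2, 3],
         [1, 2, 3, 4],
         [2, 3, 5, 6],
         [3, 4, 6, 9]]

scan = up_right_diagonal_scan(4)
rotated = rotate_180(block)

# Reading the rotated block along the scan ...
via_rotation = [rotated[y][x] for (x, y) in scan]
# ... gives the same sequence as reading the original block along the reversed scan.
via_reversed_scan = [block[y][x] for (x, y) in reversed(scan)]
```

Both lists are identical, so no rotated copy of the block needs to be stored: traversing the existing scan table in the opposite direction has the same effect. After rotation, the largest magnitude (9 in this example) sits at the top-left position, where the Version 1 coefficient coding expects the largest levels.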

#### 3) Reported Performance

The dedicated context modeling of the significance map and the reordering of the residual samples improve the coding efficiency in use cases in which TSM and TQB are often employed, i.e., for screen content and lossless compression. Both modifications result in reported bit-rate savings of up to $\sim 0.6$%, and of up to $\sim 2.4$% in lossless operation mode [29].

### C. Truncated Rice Binarization

For applications targeted by RExt, i.e., increased bit rates/depths and enhanced screen content support, the model distribution of transform coefficient levels is usually maintained but has different distribution parameters, e.g., the absolute transform coefficient levels tend to be larger for such applications. This aspect was addressed while maintaining the entropy coding structure of HEVC Version 1, by only adjusting the controlling parameters of the adaptive binarization of absolute transform coefficient levels.

#### 1) Version 1

In general, TBs larger than $4\times 4$ are always divided into $4\times 4$ processing units, referred to as SBs [30], for both binarization and context modeling. The binarization of absolute transform coefficient levels specified for CABAC in Version 1 is backward adaptively controlled by previous absolute levels within the same SB. This adaptive and combined Truncated Rice/Exp-Golomb binarization was introduced in Version 1 to increase the number of bins coded in the low-complexity bypass mode of CABAC while maintaining RD performance [31]. For the consumer application oriented operation points, for which Version 1 had been developed, it is sufficient to initialize the Rice parameter $k$ equal to 0 ($k_{\text {init}} = 0$) at the beginning of each SB. Within each SB, $k$ is updated as follows with $k_{\text {max}} = 4$ and $c$ being the reconstructed absolute transform coefficient level:

$$k_{\text {next}} = \begin{cases} \min ( k_{\text {max}}, k + 1 ), & c > 3 \cdot 2^{k} \\ k, & \text {otherwise}. \end{cases}$$

The rule in (12) is modified as follows in all 4:4:4 and 16 bit RExt profiles.
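The Version 1 update rule can be traced with a minimal sketch. The function names are assumptions for this illustration, and the input is simply a list of absolute levels $c$ in coding order within one SB.

```python
K_MAX = 4  # upper limit on the Rice parameter in Version 1


def next_rice_parameter(k, c, k_max=K_MAX):
    """Backward-adaptive update: increase k when the level c exceeds 3 * 2^k."""
    if c > 3 * (1 << k):
        return min(k_max, k + 1)
    return k


def rice_parameters_for_sb(levels):
    """Return the Rice parameter used for each level of one SB (k starts at 0)."""
    k = 0
    used = []
    for c in levels:
        used.append(k)          # k applied to binarize the current level
        k = next_rice_parameter(k, c)
    return used


trace = rice_parameters_for_sb([1, 4, 7, 30, 100, 200, 1])
```

For this input, `trace` is `[0, 0, 1, 2, 3, 4, 4]`: the parameter grows by at most one per coded level, never decreases within the SB, and saturates at $k_{\text{max}} = 4$.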

#### 2) Modification of Truncated Rice Binarization

Due to the changes in the distribution parameters of absolute transform coefficient levels for screen content and high bit rates/depths, the restriction on $k_{\text {max}}$ is removed. Furthermore, based on the fact that the first absolute transform coefficient level within an SB tends to be larger than for Version 1 applications, the initialization is modified as follows. Let $s$ be a counter of a set containing four elements selected according to the current TB’s color component (luma/chroma) and whether the block has been transformed. Then, for each SB of the current TB, $k_{\text {init}}$ is derived based on the counter $s$ of the same category as

$$k_{\text {init}} = \left \lfloor{ \frac {s}{4} }\right \rfloor .$$

The counter is updated at most once per SB using the value of the first coded coeff_abs_level_remaining syntax element, denoted by $\omega$, of the SB as

$$s_{\text {next}} = \begin{cases} s + 1, & \omega \geq 3 \cdot 2^{\left \lfloor{ \frac {s}{4}}\right \rfloor } \\ s - 1, & 2 \cdot \omega < 2^{\left \lfloor{ \frac {s}{4}}\right \rfloor } \wedge s > 0 \\ s, & \text {otherwise}. \end{cases}$$

The counter values are treated similarly to the context models of CABAC. They are initialized to be equal to 0 whenever the context models of CABAC are initialized.
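The counter-based initialization can be sketched as follows. The class name, the dictionary keyed by `(is_luma, is_transformed)`, and the method names are assumptions made for this example; only the two formulas are taken from the text above.

```python
class RiceStatCounter:
    """One persistent counter s; k_init is derived from it for each SB."""

    def __init__(self):
        self.s = 0  # reset whenever the CABAC context models are initialized

    def k_init(self):
        return self.s // 4

    def update(self, omega):
        """Update once per SB from the first coded coeff_abs_level_remaining value."""
        threshold = 1 << (self.s // 4)
        if omega >= 3 * threshold:
            self.s += 1
        elif 2 * omega < threshold and self.s > 0:
            self.s -= 1


# Four categories: luma/chroma combined with transformed/not transformed.
counters = {(is_luma, is_transformed): RiceStatCounter()
            for is_luma in (True, False)
            for is_transformed in (True, False)}

luma_skip = counters[(True, False)]
history = []
for omega in (3, 3, 3, 3, 5, 0):
    luma_skip.update(omega)
    history.append((luma_skip.s, luma_skip.k_init()))
```

Here `history` ends up as `[(1, 0), (2, 0), (3, 0), (4, 1), (4, 1), (3, 0)]`: four consecutive SBs with a large first remaining level are needed before $k_{\text{init}}$ rises to 1, which makes the adaptation slower than the per-level update inside an SB, and a single small value moves the counter back down.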

#### 3) Reported Performance

The modified Truncated Rice binarization achieves bit-rate savings when operating at high bit rates and when applied to screen content. For 8- and 10-bit lossless configurations, the bit-rate saving is up to 3.6% for regular content, while for screen content the bit-rate saving is up to 21%. For the corresponding lossy configurations, bit-rate savings of up to 4% have been observed [32].

### D. High Bit-Depth and High Bit-Rate Coding

High bit-depth applications, such as those dealing with medical content or the output of specialized imaging sensors, typically employ up to 16 bits per sample; hence, RExt supports the coding of up to 16 bit input. Furthermore, coding of such data at a very high quality level leads to very high bit rates.

During the RExt development, the CTC [7] was extended to include a test condition for very high data rate applications. The quality level being targeted implies that the peak signal-to-noise ratio (PSNR) should be increased at a rate of approximately 6 dB per additional input source bit. Moreover, at the operating point under consideration for these applications, the serial nature of CABAC and its throughput required examination. At this operating point, the transform coefficient levels are the dominant portion of the bitstream, compared to the CU, PU, TU signaling, prediction mode, and prediction parameters. Therefore, to increase the achievable throughput, it is sufficient to address the coding of transform coefficient levels. In particular, due to the Truncated Rice binarization scheme, only bypass-coded bins need to be considered. HEVC RExt includes three additional extensions to handle high bit-depth and high bit-rate coding, as described in the following sections.

#### 1) Internal Accuracy

The output of the transform coefficient level parsing, scaling, and intermediate values between stages of the inverse transform process are clipped to signed 16 bit integers in HEVC Version 1. Moreover, the scaled coefficients passing through the inverse transform are independent of the bit depth. Then, at the output stage of the inverse transform, a shift operation is applied that normalizes the data to the correct range for the selected bit depth. This approach maximizes the precision obtained from multipliers present in hardware or software implementations for different bit depths.

Although sufficient for 8- and 10-bit video data, an internal representation with restrictions to 16 bits would not be suitable for high bit-depth input. The limit is increased for all 16-bit RExt profiles by provision of an extended precision mode. When the extended precision mode is enabled, the internal accuracy and the maximum coded transform coefficient value are increased to $\max (16,\textit {bitDepth}+7)$ (signed) bits. Maintaining this increased accuracy through the inverse transform stage was shown to improve the linearity between PSNR and bit-rate, which is beneficial for rate control [33].
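As a worked illustration of the $\max (16,\textit {bitDepth}+7)$ rule, the following sketch computes the signed word length and the resulting value range of a coefficient. The function names are assumptions for this example.

```python
def coefficient_bits(bit_depth, extended_precision):
    """Signed word length of coefficients and inverse-transform intermediates."""
    if extended_precision:
        return max(16, bit_depth + 7)
    return 16  # Version 1 behavior: fixed signed 16-bit clipping


def coefficient_range(bit_depth, extended_precision):
    """Smallest and largest representable value for the given word length."""
    bits = coefficient_bits(bit_depth, extended_precision)
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


table = {depth: coefficient_bits(depth, True) for depth in (8, 10, 12, 16)}
```

With the mode enabled, `table` is `{8: 16, 10: 17, 12: 19, 16: 23}`, so 16-bit input uses 23-bit signed values, while disabling the mode always gives the range `(-32768, 32767)`.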

Although explored in [33], the inverse transform matrix coefficients are not altered by the use of the extended precision mode. Instead, it is strongly recommended that an encoder uses a higher precision forward transform. The default forward transform uses the same 6-bit transform matrix coefficients as used in the inverse transform specified in HEVC Version 1, and provides sufficient performance for 8- and 10-bit operation. The higher precision forward transform affords a more accurate representation of the inverse of the inverse transform specified in HEVC [34]. Note that the high bit-depth coding conditions of the CTC [7] mandate the use of the higher precision forward transform, as provided in the HM reference software. The effect of using the higher precision forward transform over using the default forward transform (as used during the development of HEVC Version 1) is illustrated in Fig. 2. Data points for Fig. 2 were generated without the use of nontransformed coding paths, i.e., TSM was disabled. Fig. 2 shows that the luma PSNR for the 6-bit forward transform matrix coefficients plateaus as the bit rate increases. This is due to the mismatch between the default forward transform and the inverse transform. For 14-bit matrix coefficients, as used in the higher precision forward transform, no such limitation exists. Thus, a linear relationship between PSNR and bit-rate is maintained.

Fig. 2. Effect of forward transform coefficient matrix accuracy (DCT- and DST-based) on compression performance generated using HM 16.2 using a range of QPs and an all-intra configuration.

#### 2) Binarization of Transform Coefficient Levels

If the HEVC Version 1 binarization of transform coefficient levels were used when extended precision mode is enabled, then the maximum bypass code length (although extremely rare) would be 46 bypass-coded bins. To reduce decoder complexity, when extended precision mode is enabled, a different coefficient binarization is utilized that limits the length of the Exp-Golomb prefix according to the internal accuracy. If the prefix limit is reached, the suffix length is set to a value that can represent the remaining bins. As a consequence, the maximum number of bypass-coded bins required to code a coefficient is limited to 32. This tool has negligible impact on actual coding performance [35], [36] but curtails the worst case coefficient codeword length to that of HEVC Version 1.

#### 3) Coding of Bypass-Coded Bins

As the data rate increases, the bitstream consists of proportionally more bypass-coded bins, and their processing becomes a significant overhead for software and hardware implementations. The coding of these bypass-coded bins was therefore examined during the RExt development in the context of very high bit-rate.

CABAC utilizes two internal states: the 9-bit $\textit {ivlCurrRange}$ and the current $\textit {ivlOffset}$. To decode a bypass bin $\textit {binVal}$, the bitstream feeds into the lower bits of $\textit {ivlOffset}$, as denoted by the function read_bits(1), as bits are consumed by the process:
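A minimal Python sketch of this serial bypass decoding step is given below. It follows the shift, compare, and conditional-subtract structure described in the text; the `BitReader` class and the function name `decode_bypass` are assumptions introduced for the example, and only `read_bits` is named in the text.

```python
class BitReader:
    """Hypothetical bitstream reader over a list of 0/1 values."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.pos = 0

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            bit = self.bits[self.pos] if self.pos < len(self.bits) else 0
            self.pos += 1
            value = (value << 1) | bit
        return value


def decode_bypass(ivl_offset, ivl_curr_range, reader):
    """Decode one bypass bin; returns (bin_val, updated ivl_offset)."""
    # Shift the offset left and feed one bitstream bit into its lowest position.
    ivl_offset = (ivl_offset << 1) | reader.read_bits(1)
    # Conditional subtraction: the bin is 1 when the offset reaches the range.
    if ivl_offset >= ivl_curr_range:
        return 1, ivl_offset - ivl_curr_range
    return 0, ivl_offset


reader = BitReader([1, 0, 1])
offset, rng = 100, 300
bins = []
for _ in range(3):
    bin_val, offset = decode_bypass(offset, rng, reader)
    bins.append(bin_val)
```

In this run the decoded bins are `[0, 1, 0]` and the final offset is 205. Each bin depends on the offset left behind by the previous one, which is the serial dependency discussed next.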

This serial conditional subtraction process complicates implementation at very high data rates, e.g., hardware implementations require long sequential logic paths. To reduce the implementation design complexity at very high data rates, $\textit {ivlCurrRange}$ is set to 256 immediately prior to the coding of the coefficient bypass bins. This adjustment to the range allows simplification of the conditional subtraction process to bitwise expressions: the $n$ bypass bins to be decoded are directly visible in a concatenation of the CABAC $\textit {ivlOffset}$ variable and the bitstream. As a consequence, decoding of bypass-coded bins can be implemented using a shift register:
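The simplification can be sketched in Python as follows. With the range fixed at 256, the compare-and-subtract step reduces to splitting a concatenated word, so $n$ bins are obtained with one shift and one mask. The function names are assumptions for this example, and a compact serial reference decoder is included only to check the equivalence.

```python
def decode_bypass_serial(ivl_offset, ivl_curr_range, bits):
    """Reference: one shift, compare, and conditional subtraction per bin."""
    bins = []
    for b in bits:
        ivl_offset = (ivl_offset << 1) | b
        if ivl_offset >= ivl_curr_range:
            bins.append(1)
            ivl_offset -= ivl_curr_range
        else:
            bins.append(0)
    return bins, ivl_offset


def decode_bypass_aligned(ivl_offset, bits):
    """Aligned mode (range == 256): decode len(bits) bins with shifts and masks."""
    n = len(bits)
    word = ivl_offset                  # 8 significant bits, since offset < 256
    for b in bits:                     # concatenate the offset and n bitstream bits
        word = (word << 1) | b
    bin_field = word >> 8              # the n decoded bins, first bin in the MSB
    new_offset = word & 0xFF           # the low 8 bits remain as the new offset
    bins = [(bin_field >> (n - 1 - i)) & 1 for i in range(n)]
    return bins, new_offset


example = decode_bypass_aligned(179, [0, 1, 1, 0])
```

For an offset of 179 and incoming bits `0, 1, 1, 0`, both decoders return the bins `[1, 0, 1, 1]` and a new offset of 54. The aligned version has no data-dependent branch, which is what permits a shift-register implementation.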

When the bitstream is aligned, the top bit of $\textit {ivlOffset}$ will always be 0, since it is a requirement and property of the general CABAC coding process that $\textit {ivlOffset} < \textit {ivlCurrRange}$ before and after each CABAC operation. Hence, the $n$ decoded values are in the top-but-one $n$ bits of $\textit {ivlOffset}$.

This bypass alignment mode in the CABAC coding process causes a small BD-rate penalty for the benefit of simplifying high-throughput design of the entropy decoder. To alleviate the BD-rate penalty, the bypass alignment mode is only applied immediately before the bypass coding of coefficient data in each $4\times 4$ SB when the bypass data includes coefficient magnitude data (not simply sign bits). This provides an upper limit of 16 conventionally coded bypass-coded bins per $4\times 4$ SB, which occurs when an SB includes only sign bits as bypass-coded data; for an SB, where the bypass-coded data includes magnitude data, all bypass-coded bins in that SB are aligned. The conditional application of the bypass alignment mode results in a BD-rate increase of 0.5% for the high bit-depth coding conditions, and up to 1% for the All-Intra RExt test conditions [37], [38].

SECTION VI

## PROFILES AND LEVELS

The set of coding tools specified in RExt for HEVC is not necessarily required by each of the different application scenarios. Moreover, mandating a decoder to implement all the tools would be prohibitively expensive for many applications. In order to keep the decoder complexity appropriate for different application scenarios, video coding standards such as HEVC define subsets of tools known as profiles.

Profile definition limits decoder complexity in terms of support for various coding tools. In addition, aspects such as the coded picture buffer (CPB) size, picture size, frame rate, and bit rate are constrained using a combination of a level and a tier. The following section presents the additional profiles defined in RExt, grouping them according to the supported chroma formats. A description of the levels concludes this section.

### A. Profiles

Version 2 specifies 21 profiles for RExt, in addition to the Main, Main 10, and Main Still Picture profiles of Version 1, to cover a wider spectrum of video coding applications. In both versions of HEVC, the profiles are defined to generally form an onion-like structure in terms of bit depth, chroma sampling format, and permitted prediction modes (intra prediction or both intra prediction and inter prediction), as illustrated in Fig. 3. Specifically, a decoder conforming to a profile supporting a given bit depth and chroma sampling format must also be able to decode bitstreams encoded with a profile having a lower supported bit depth or chroma sampling format (lower spatial dimension for chroma), or both. However, two exceptions to this rule exist. Although monochrome profiles are introduced in RExt, the definition of Version 1 profiles is unchanged and hence they are not considered as a subset of the pre-existing Version 1 profiles (as denoted in the inter-profile coordinates of Fig. 3). This allows existing, conforming Version 1 decoders to also conform to Version 2. Furthermore, the High Throughput 4:4:4 16 Intra profile does not follow the aforementioned onion-like structure due to the different application spaces that necessitated a modified entropy coding design.

Fig. 3. Illustration of the onion-like structure of the specified RExt profiles. Intra profiles are only permitted to use intra-picture prediction, while inter profiles are permitted to use both intra-picture and inter-picture prediction. The arrows denote the inclusion of a profile specifying a lower bit-depth or a chroma format having lower spatial dimension in chroma, or both. An exception is given for the Version 1 profiles, which do not include the RExt monochrome profiles for backward compatibility, and the high-throughput profile. Furthermore, the illustration shows that almost all RExt coding tools are available for 4:4:4 and 16-bit profiles, ACQP for 4:2:2 and 4:4:4 profiles, and extended precision for 16-bit profiles.

A detailed overview of the RExt profiles is given in Table I. Table I lists all profiles defined in RExt and Version 1, along with their maximum bit depth, supported chroma format, and associated coding tool options, with the latter also illustrated in Fig. 3. In general, ACQP is specified for 4:2:2 and 4:4:4 profiles, whereas CCP, RDPCM, and the modifications to TSM, TQB, and Truncated Rice binarization are specified for 4:4:4 and 16-bit profiles. The two high bit-rate/depth tools can be found in the 16-bit profiles, as also shown in Fig. 3 and Table I. Moreover, in Table I, the profiles are divided into two main categories: video profiles and still picture profiles, the latter specifying a class of bitstreams, each consisting of a single picture. Video profiles are further classified into intra and inter, where profiles categorized as intra are prohibited from using inter-picture prediction. In the following description, relevant applications for the RExt profiles are briefly discussed.

Table I Profiles Defined in HEVC Version 1 and RExt and the Availability of Features

#### 1) Monochrome Profiles

Monochrome content, i.e., content having one color component, is used in magnetic resonance imaging applications, where high bit depths (usually greater than 10) are used and lossless or near-lossless compression is required to avoid coding artifacts that could interfere with diagnosis. In these applications, however, high compression efficiency remains desirable and therefore RDPCM should be used. Other examples of monochrome content are alpha channels for video editing and depth maps for 3D video coding. These signals usually have 8 bits per sample and do not require particular coding tools to achieve good compression efficiency. To address all applications using monochrome content, the Monochrome, Monochrome 12, and Monochrome 16 profiles have been specified, whereby the Monochrome profile is expected to be mainly used for compression of alpha channels and depth maps while the remaining two profiles are suitable for applications such as medical imaging, which require high bit depths and compression efficiency.

The monochrome profiles are not limited to the coding of video with one color component; they can also be used as auxiliary layers as defined in the scalable extension of Version 2 [3]. Examples of information conveyed in auxiliary layers include alpha planes and 3D depth maps, with associated side information (e.g., value for opaque and transparent samples in alpha planes) transmitted using dedicated supplemental enhancement information messages.

#### 2) 4:2:0 and 4:2:2 Profiles

Applications using 4:2:0 and 4:2:2 video content that are not covered by the Main or Main 10 profiles of Version 1 are generally related to broadcast, e.g., for content contribution and distribution.

For contribution, the 4:2:2 chroma format is commonly used, with bit depths of up to 12 bits per sample. Content with 4:2:0 chroma format at 12 bits could be included for distribution applications, e.g., for future ultra high-definition (UHD) services, where HDR video is expected to be considered. Both contribution and distribution generally deal with camera captured content, which does not significantly benefit from the RExt coding tools (although ACQP may be beneficial).

To address these application scenarios, the Main 12, Main 4:2:2 10, and Main 4:2:2 12 profiles have been included, with two variants; one supporting intra-picture prediction only and another supporting both intra-picture and inter-picture prediction.

#### 3) 4:4:4 and High-Throughput Profiles

High fidelity content is used in applications such as studio and professional content production where high bit rates are common, since high fidelity is required and bit depths of up to 16 bits per sample are considered. In addition, consumer applications using screen content (e.g., desktop sharing) are emerging. Such applications typically use the 4:4:4 chroma format in $\text{R}'\text{G}'\text{B}'$ color space to better preserve the sharp details associated with this content. For both application domains, high compression efficiency is required and therefore all the RExt coding tools should be available.

The Main 4:4:4, Main 4:4:4 10, and Main 4:4:4 12 profiles have been specified for intra only and both intra coding and inter coding to address these application scenarios. These profiles support all chroma formats and RExt tools. In addition, for intra only coding, the Main 4:4:4 16 Intra and High Throughput 4:4:4 16 Intra profiles are specified to target applications where high bit rates and high bit depths are employed. One example of such applications is codecs embedded in professional cameras, which are expected to use the High Throughput 4:4:4 16 Intra profile, which supports the tools described in Section V-D.

For still picture use cases, two RExt profiles have been defined, supplementing the HEVC Version 1 Main Still Picture profile, and all still picture profiles have been augmented with a new level (8.5) that removes restrictions on picture size and the number of tiles and slice segments.

### B. Levels

Picture size and frame rate are the two main parameters defining the levels in HEVC Version 1. No new levels and tiers were introduced in RExt relative to Version 1. Instead, additional constraints and parameters have been specified to account for higher bit rates associated with the applications envisaged for RExt. This approach enables interoperability between intra profiles and inter profiles and limits the maximum CPB size.

#### 1) Support for High Bit Rates

Given the introduction of profiles to support extended chroma formats and bit depths higher than 10 bits per sample, it is expected that the bit rates associated with RExt profiles will be higher than those associated with Version 1 profiles. To account for these increased bit rates, the FormatCapabilityFactor parameter has been included in Version 2: a 16-bit profile has double the maximum bit rate of a corresponding 8-bit profile; a 4:4:4 profile has double the maximum bit rate of a corresponding 4:2:0 profile.

#### 2) Interoperability Between Intra Profiles and Inter Profiles

By exploiting the temporal redundancy between frames, inter profiles can achieve better compression efficiency than their intra counterparts. Therefore, it is expected that the bit rates associated with intra profiles are higher than those of the inter ones. However, in some applications, it may be required that an inter-profile compliant decoder is capable of decoding an intra-profile compliant bitstream. To enable this interoperability, a constraint flag (general_lower_bit_rate_constraint_flag) is defined to set the minimum compression ratio for intra profiles to the one defined for inter profiles. This flag is set to 1 for inter profiles and may be 0 or 1 for intra profiles. By removing the constraint, the bit rate can be doubled, thereby halving the minimum compression ratio: for the high tier (HT) of Level 5.1, a compression ratio as low as 20:1 is possible for RExt Main profiles, or 40:1 for main tier (MT), Level 4.1 (the exception being for Main 10 Intra, where the ratios are higher due to the correspondence with the Main 10 profile of Version 1).

This mechanism is also used to increase the defined bit rates twelvefold for the High Throughput 4:4:4 16 Intra profile, with the MT having a bit rate (in general) three times higher than the HT of the corresponding level in the Main 4:4:4 16 Intra profile. The HT of the High Throughput 4:4:4 16 Intra profile thereby allows compression ratios as low as 2:1 for UHD resolution.

#### 3) Maximum CPB Size

The maximum CPB size also scales with the maximum bit-rate value described above. In particular, for the RExt profiles defined in Version 2, the maximum CPB size remains at 1 s equivalent when general_lower_bit_rate_constraint_flag is 1, but is only 0.5 s when general_lower_bit_rate_constraint_flag is 0.

#### 4) Summary

With this definition of constraints, a full spectrum of operating points exists, from very low to very high compression ratios, and may be indicated by a conformant bitstream as an application requires.

SECTION VII

## PERFORMANCE EVALUATION

With the finalization of the RExt development, HEVC now includes the specification for 4:2:2 and 4:4:4 profiles (among others). Included in the profiles are chroma coding tools, dedicated high bit-rate (and near-lossless) coding tools, improved lossless coding tools, and improved coding tools for nonregular camera captured content including mixed content and screen content. The experiments described in the following present a brief performance overview relative to H.264/AVC, but are focused on just two applications. For this purpose, the following section is divided into three parts. The first part describes the experimental setup. The second part gives an overview on the lossy compression performance for mostly regular, i.e., camera captured, 4:2:2 and 4:4:4 content. In the final part, the performance results for lossless compression using a test set consisting of mainly mixed and screen content are presented. The performance presented in the following refers to the average results across groups of video sequences. For the results on a per sequence basis, the interested readers are referred to [39]. Subjective performance is reported in [40], and the results show that the bit-rate reduction over H.264/AVC FRExt is more than 50%.

### A. Experimental Setup

For the conducted experiments, the HEVC reference software implementation (HM) 16.2 was used, as a representative implementation of an HEVC encoder and decoder, for generation of the candidate data points. Comparison with H.264/AVC was performed using the JM 18.6 reference software implementation to generate the reference data points.

The presented data use the encoder configurations, tested QP ranges, and source video outlined in the JCT-VC’s CTCs for RExt development [7]. The HM is configured according to the CTC and the JM is configured using the High profile, or nearest suitable equivalent for the chroma format of the source material. For the latter purpose, default configuration files that correspond to those of the HM CTC are used; these are included in the JM software package.

The results for the lossy case are expressed in terms of BD-rate reduction. In the case of lossless encoding, the results are presented as percentage bit rate savings, when compared to the reference bit rate.
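Both metrics can be sketched in a few lines. The following is a minimal illustration, not the tooling used for the reported results: it assumes the common BD-rate formulation with exactly four rate points per codec, a cubic fit of log10(bit rate) against PSNR, and averaging over the overlapping PSNR interval. All function names are hypothetical. With this sign convention, a negative BD-rate means the test codec needs fewer bits than the reference.

```python
import math


def _lagrange(xs, ys):
    """Return the polynomial that passes through all (xs, ys) points."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly


def _simpson(f, a, b):
    """Simpson's rule on one interval; exact for polynomials up to degree 3."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(ref_rates, ref_psnrs, test_rates, test_psnrs):
    """Average bit-rate difference in percent of test vs. reference at equal PSNR."""
    if not all(len(v) == 4 for v in (ref_rates, ref_psnrs, test_rates, test_psnrs)):
        raise ValueError("this sketch expects exactly four rate points per codec")
    # Cubic through four points: log10(rate) as a function of PSNR.
    p_ref = _lagrange(ref_psnrs, [math.log10(r) for r in ref_rates])
    p_test = _lagrange(test_psnrs, [math.log10(r) for r in test_rates])
    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(ref_psnrs), min(test_psnrs))
    hi = min(max(ref_psnrs), max(test_psnrs))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    avg_log_diff = (_simpson(p_test, lo, hi) - _simpson(p_ref, lo, hi)) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


def lossless_saving(ref_bits, test_bits):
    """Percentage bit-rate saving of the test bitstream relative to the reference."""
    return 100.0 * (ref_bits - test_bits) / ref_bits
```

As a sanity check, a test codec that produces the same PSNR values at 70% of the reference bit rate at every point yields a BD-rate of -30%, i.e., a 30% bit-rate reduction.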

### B. Lossy Performance for Regular Content

The performance of HM for lossy coding is investigated using the video material from the regular content coding conditions of [7]. This test set primarily consists of regular content having 4:2:2 and 4:4:4 chroma formats, with the latter including both $\text{Y}'\text {C}_{b}\text {C}_{r}$ and $\text{R}'\text{G}'\text{B}'$ content.

Three QP ranges, as specified in [7], are simulated, i.e., the main tier (MT), the high tier (HT), and the super high tier (SHT).$^{2}$ Furthermore, three different temporal structures, referred to as all-intra, random access, and low delay, are simulated in combination with the specified QP ranges. In the all-intra configuration, each picture is coded using a single intra slice only. For random access, intra pictures are inserted at regular intervals of about 1 s. Furthermore, the temporal structure of random access uses hierarchical B pictures and the group-of-pictures size is set equal to eight. The low delay configuration uses bipredicted blocks for inter-picture prediction, and the pictures are coded in display order to minimize the system latency, i.e., to avoid delays due to picture reordering.
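The difference between the random access and low delay structures can be illustrated by comparing coding order with display order within one group of pictures. The sketch below assumes a dyadic hierarchy in which the last picture of the group is coded first and each interval is then bisected recursively. This ordering is an assumption made for illustration; the exact structure is defined by the CTC configuration files. The function names are hypothetical.

```python
def hierarchical_b_coding_order(gop_size):
    """Coding order (as display positions 1..gop_size) for a dyadic hierarchy."""
    order = [gop_size]  # the picture closing the group is coded first

    def bisect(lo, hi):
        # Code the middle picture of (lo, hi), then recurse into both halves.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)
        bisect(lo, mid)
        bisect(mid, hi)

    bisect(0, gop_size)
    return order


def low_delay_coding_order(num_pictures):
    """Low delay: pictures are coded in display order, so no reordering occurs."""
    return list(range(1, num_pictures + 1))


random_access_order = hierarchical_b_coding_order(8)
low_delay_order = low_delay_coding_order(8)
```

Under this assumption, the random access order for a group of eight pictures is 8, 4, 2, 1, 3, 6, 5, 7, so a decoder must buffer pictures before display, whereas the low delay order 1 to 8 requires no reordering.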

An overview of the lossy coding performance of HM 16.2 relative to JM 18.6, using the above conditions, is given in Table II. For clarity, Table II contains BD-rate values of the luma component only. HM, as the representative implementation of HEVC, outperforms JM for all input source material and QP ranges.

Table II Lossy Coding Performance of HM 16.2 Relative to JM 18.6 for Mainly Regular Content

The random access MT configuration, which plays an important role for different consumer and professional applications, yields bit-rate reductions higher than 30%. Also notable is the highest bit-rate reduction, which is achieved for 4:4:4 $\text{Y}'\text {C}_{b}\text {C}_{r}$ low delay configuration with an average value of $\sim 39.8$%, and the ability of HEVC to compress $\text{R}'\text{G}'\text{B}'$ content, which shows improvements similar to those for 4:4:4 $\text{Y}'\text {C}_{b}\text {C}_{r}$ content, mainly due to the CCP scheme.

### C. Lossless Coding Performance for Mixed and Screen Content

The lossless coding performance of HM is investigated using video material from the screen content coding conditions. This test set comprises various content types, including animated content, mixed and screen content, and 4:2:0 chroma formatted content.

A summary of the lossless coding performance for mixed and screen content is given in Table III. Table III uses source material classifications as defined in the CTC [7]. In particular, Class F sequences are screen content sequences in the 4:2:0 chroma format with various spatial dimensions, Class B sequences represent regular 4:2:0 $\text{Y}'\text {C}_{b}\text {C}_{r}$ HD content, and the RangeExt class contains a sample of 4:2:2 and 4:4:4 chroma format content from the regular test set used in the aforementioned lossy test case.

Table III Lossless Performance of HM 16.2 Relative to JM 18.6 for Mixed Content

The bit-rate reductions for the screen content classes range from $\sim 10$% to $\sim 13.2$%. This is mainly due to coding tools such as RDPCM and modified Truncated Rice binarization. Outside the screen content scope, up to 1.9% bit-rate reduction is achieved for the Class B content, while up to 1.3% is observed for 4:2:2 and 4:4:4 regular content.

SECTION VIII

## CONCLUSION

This paper has presented an overview of the RExt for HEVC Version 2, which was jointly developed by experts of ITU-T VCEG and ISO/IEC MPEG. The primary focus of the development, i.e., the support of advanced consumer and professional applications, is achieved by the specification of profiles for monochrome, 4:2:2, 4:4:4 chroma format, and high bit depths. To provide this support, changes to the Version 1 design were minimized as far as possible, while for applications requiring an improved compression efficiency, new tools were introduced. The RExt development resulted in the definition of 21 RExt profiles, which ensure that a broad range of applications using chroma formats other than 4:2:0 and bit depths higher than 10 bits per sample will benefit from the adoption of Version 2 of HEVC. Moreover, no new levels or tiers were introduced, but only constraints and parameters to guarantee interoperability and span a wide spectrum of operating points. As a demonstration of the achievement, average BD-rate reductions ranging from 25% to 36% are measured for the HEVC Main 4:4:4 profiles compared with the High 4:4:4 Predictive profile of H.264/AVC, depending on the content format and the temporal coding structure.

APPENDIX

All the JCT-VC documents can be found in the JCT-VC document management system at http://phenix.int-evry.fr/jct/. All cited VCEG documents are also publicly available and can be downloaded at http://wftp3.itu.int/av-arch in the video-site folder.

### Acknowledgment

The authors would like to thank all the experts of the involved standardization organizations, who cannot be individually mentioned here. The Range Extensions are the results of their joint efforts and contributions.

## Footnotes

This paper was recommended by Associate Editor T. Wiegand.

$^{1}$This work presents values as magnitudes rather than the corresponding signed value range; therefore, internal accuracies are indicated as $\textit {bitDepth}+6$.

$^{2}$In this context, the term tier denotes a set of QP values, and not the set of levels presented in Section VI.
