Low Complexity Quantization in High Efficiency Video Coding

Rate-distortion optimized quantization (RDOQ) provides an excellent trade-off between rate and distortion in High Efficiency Video Coding (HEVC), leading to notable improvements in rate-distortion performance. However, its high computational cost burdens the encoder in real-world video compression applications. In this paper, we provide a comprehensive review of low complexity quantization techniques in HEVC, covering both fast RDOQ and all-zero block detection. In particular, fast RDOQ relies on rate and distortion models for rate-distortion cost estimation, such that the most appropriate quantized coefficient is selected in a low complexity way. All-zero block detection identifies blocks whose coefficients all quantize to zero before transform and quantization, so that these stages can be skipped and the complexity further reduced. The relationship between the two techniques is also discussed, and moreover, we envision the future design of low complexity quantization in the upcoming Versatile Video Coding (VVC) standard.


I. INTRODUCTION
With the fast development of network technology and acquisition devices, videos play an increasingly critical role in numerous applications, ranging from industrial production to consumer entertainment. The explosive growth of video data creates an urgent demand for more efficient compression technologies to improve video coding efficiency. Video coding technologies have evolved over the past few decades. After the standardization of the most prevalent standard H.264/AVC [1] in 2003, the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) collaborated to develop the High Efficiency Video Coding (HEVC) standard [2], which reduces bit-rates by around 50% at the same perceptual quality compared to H.264/AVC. HEVC was finalized in 2013, a milestone in video coding.
The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney.
Compared with its ancestor H.264/AVC, more efficient coding tools were explored and adopted by the HEVC standard. More specifically, HEVC employs flexible block tree structures to better adapt to the local characteristics of the videos, such as the coding unit (CU), prediction unit (PU) and transform unit (TU) [3]. Moreover, the intra prediction modes are expanded from 9 to 35, with the goal of better interpreting complicated texture directions [4]. Meanwhile, a series of advanced inter prediction techniques have been adopted to remove temporal redundancies [5]. Regarding quantization, the hard-decision quantization has gradually evolved into soft quantization strategies [6], and rate distortion optimization (RDO) is introduced to the quantization process, leading to rate distortion optimized quantization (RDOQ) [7], by which the most efficient quantization level for each individual coefficient can be determined in the sense of RDO. Regarding the removal of statistical redundancy, context-adaptive binary arithmetic coding (CABAC) [8] is utilized, which combines context modeling with binary arithmetic coding and removes statistical redundancies in a lossless manner. Recently, the continuous development of video coding technologies has led to the next generation video coding standard, Versatile Video Coding (VVC) [9], whose development was launched in 2018.
In the hybrid video coding framework, almost all the modules have been enhanced during the development of VVC, such as more flexible coding partitions [10], advanced intra/inter predictions [11]-[16], as well as advanced transform cores [17] supported for signal energy compaction. Moreover, along with the development of video coding standards, a series of video coding techniques have also been proposed, including extended quad-tree (EQT) [18], [19], history-based motion vector prediction [16], [20] and cross-component linear model prediction [21], [22]. These techniques have also been shown to significantly improve the coding performance. Regarding quantization, which serves as one of the core stages in the hybrid video coding framework, the trellis-coded quantization scheme has been introduced in [23], where the quantization candidates are elaborately mapped into a trellis graph collaborating with a state-transfer mechanism.
Generally speaking, quantization schemes have evolved from hard-decision quantization, which relies on the input transform coefficient and quantization step only, toward soft-decision quantization, which optimizes the quantization process based on RDO. Given a quantization parameter (QP), the uniform hard-decision quantization straightforwardly maps a transform coefficient to the corresponding quantized level, and has been widely adopted in early video codecs. Moreover, the uniform hard-decision quantization with a dead-zone was adopted in H.264/AVC, wherein the rounding offset is determined by the distribution of residual coefficients [24]. With soft-decision quantization, the inter-dependencies among the quantized residuals within one transform block (TB) are also taken into account during the determination of the quantization level. In particular, the RD costs of quantization candidates are elaborately evaluated, such that the derived quantization levels, determined in a soft manner, can strike an excellent trade-off between the coding bits of the residuals and the quantization distortions. It was reported that soft-decision quantization could bring 6% to 8% bit-rate savings at the cost of high computational complexity compared with the conventional hard-decision quantization with dead-zone [25]. However, as the residual coding bits must be calculated synchronously through entropy coding, the high complexity of soft-decision quantization could hinder its application.
In the literature, numerous schemes have been developed to achieve soft-decision quantization in video coding. It was implemented with trellis searching in H.263+ and H.264/AVC [26], wherein the transform coefficients and the context states are deployed in a trellis graph that delicately represents the combination of the available quantization candidates. Moreover, the associated RD cost of each quantization candidate is attached to the trellis branch, such that the optimal path can be decided by dynamic programming or Viterbi search. However, it is acknowledged that executing a full trellis search in quantization involves extremely high computational complexity. In view of this, trellis searching is simplified into RDOQ, by which sub-optimal quantization can be achieved. In particular, RDOQ has been widely employed in the H.264/AVC, HEVC and AVS2 [27] encoders; it examines a limited number of quantization candidates, and finally the one with the minimum RD cost for the current transform coefficient is retained.
In this paper, we focus on quantization in video coding, which is of prominent importance in controlling the distortion level and coding bit-rate by reproducing the residuals with different quantization levels. The advanced quantization techniques adopted in HEVC are first reviewed, following which the low complexity quantization techniques are introduced. The aim of the developed fast quantization techniques is to infer the best quantized coefficient in an efficient way, and numerous approaches have been proposed towards this goal from different perspectives, as illustrated in Fig. 1. More specifically, the systematic review is conducted based on the categories of low complexity quantization techniques. In particular, we divide them into two categories: fast RDOQ and all zero block (AZB) detection. For fast RDOQ, we introduce the statistics-based and RD model based methods. For AZB detection, we review the genuine AZB and pseudo AZB detection methods. As such, all aspects that could lead to low complexity quantization in the literature have been considered. Finally, we discuss future quantization optimization techniques in the upcoming VVC standard, in which more advanced quantization techniques have been adopted. Overall, the aim of this paper is not limited to providing a review of the low complexity implementation of quantization in HEVC; it is also highly anticipated that it could shed light on developing low complexity quantization optimization schemes for VVC in a principled way.

II. QUANTIZATION IN HEVC
In this section, we revisit the quantization in the HEVC standard. In principle, RDOQ serves as an optimization tool that refines the hard-decision quantization strategy without requiring any change at the decoder. Given the transform coefficient C_{i,j} and quantization step size Q_step, the quantization level l_{i,j} of the hard-decision quantization can be formulated as follows,

l_{i,j} = sign(C_{i,j}) * floor( |C_{i,j}| / Q_step + f ),    (1)

where f represents the rounding offset, which is usually set according to the slice type [24].
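Eqn. (1) can be sketched in a few lines of Python; the rounding offsets below (1/3 for intra, 1/6 for inter) are the values commonly cited for H.264/AVC-style dead-zone quantization and are used here as illustrative assumptions:

```python
import math

def hard_quantize(c, q_step, f=1/3):
    """Uniform hard-decision quantization with rounding offset f.

    f = 1/3 (intra) or 1/6 (inter) are illustrative slice-type-dependent
    offsets; the actual encoder sets f per slice type."""
    level = math.floor(abs(c) / q_step + f)
    return int(math.copysign(level, c)) if level else 0

# Example with Q_step = 10 and the intra offset:
print(hard_quantize(27.0, 10))   # -> 3   (2.7 + 0.33 floors to 3)
print(hard_quantize(-13.0, 10))  # -> -1
print(hard_quantize(2.0, 10))    # -> 0   (dead-zone: 0.2 + 0.33 < 1)
```

Note how the small offset f < 0.5 widens the zero bin, which is exactly the dead-zone behavior discussed above.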
In RDOQ, the RDO strategy is embedded to pursue the optimal quantized coefficient. More specifically, the target of RDO is to minimize the distortion D under the constraint of the coding-bit budget R_T, which can be expressed as follows,

min_{l_{i,j} in L_{i,j}} D,  s.t.  R <= R_T,    (2)

where L_{i,j} denotes the set of quantization candidates of the transform coefficient at position (i, j). To convert such a constrained problem into an unconstrained one, the Lagrangian multiplier λ is introduced, leading to the following optimization problem,

min_{l_{i,j} in L_{i,j}} J,  where  J = D + λ * R.    (3)

RDOQ selects the optimal quantization levels according to Eqn. (3). More specifically, there are two main procedures in RDOQ. First, a pre-quantization is conducted for a transform coefficient C_{i,j} following the reverse scan order (diagonal, or vertical/horizontal allowed for certain blocks). The quantization candidates l^{ceil}_{i,j} and l^{floor}_{i,j} can be derived as follows,

l^{ceil}_{i,j} = ceil( |C_{i,j}| / Q_step ),  l^{floor}_{i,j} = l^{ceil}_{i,j} - 1.    (4)

As such, the optimal quantization level can be selected accordingly,

l̂_{i,j} = arg min_{l_{i,j} in {l^{ceil}_{i,j}, l^{floor}_{i,j}}} J.    (6)

Note that RDOQ and residual entropy coding are applied based on the coefficient group (CG) [28], which is defined as a 4 × 4 sub-block of coefficients within one TB. The second step of RDOQ aims to determine whether the current CG can be quantized to an all-zero CG based on RD examination, wherein the RD costs of the original quantized CG and the all-zero CG are respectively calculated. Meanwhile, the position of the last non-zero coefficient is checked in the sense of RDO following the traversing order.
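The per-coefficient candidate selection above can be sketched as follows; the toy rate model is a stand-in for the CABAC bit estimate (a real encoder derives R from the entropy-coder context state), and the function names and λ value are ours:

```python
import math

def rdoq_level(c, q_step, lam, rate_fn):
    """Pick the RD-optimal level among {l_ceil, l_floor} for one
    coefficient, minimizing J = D + lambda * R.

    rate_fn is an illustrative stand-in for the CABAC bit estimate."""
    l_ceil = math.ceil(abs(c) / q_step)
    candidates = [l_ceil, max(l_ceil - 1, 0)]
    best, best_cost = 0, float("inf")
    for l in candidates:
        dist = (abs(c) - l * q_step) ** 2   # SSE in the transform domain
        cost = dist + lam * rate_fn(l)      # J = D + lambda * R
        if cost < best_cost:
            best, best_cost = l, cost
    return int(math.copysign(best, c)) if best else 0

# Toy rate model: ~2 bits per unit of magnitude, 0 bits for level 0.
toy_rate = lambda l: 2 * l
print(rdoq_level(26.0, 10, lam=5.0, rate_fn=toy_rate))   # -> 3
print(rdoq_level(26.0, 10, lam=11.0, rate_fn=toy_rate))  # -> 2
```

The second call illustrates the soft-decision behavior: a larger λ makes the cheaper (lower) level win even though it incurs more distortion.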
In RDOQ, the distortion is typically defined as the sum of squared error (SSE) between the original transform coefficients and the dequantized coefficients, and the coding bits are obtained through CABAC coding. As such, RDOQ involves a considerable number of RD cost calculations. The RD calculation and checking procedure must be iteratively applied to each individual transform coefficient within a CG, and each individual CG within one TB. Each transform coefficient is associated with at least two quantization candidates. Furthermore, the context model updating in CABAC further burdens the computational complexity. Therefore, it is highly desirable to investigate fast RDOQ schemes to facilitate the application of RDOQ. In the literature, numerous works have been devoted to accelerating RDOQ. One typical category concentrates on simplifying the quantization procedure, and another attempts to detect all zero blocks in advance to bypass the tedious processes including transform, quantization, entropy coding, inverse quantization and inverse transform.

III. FAST RDOQ
RDOQ aims to determine an optimal set of quantization results that achieves the lowest RD cost for a TB. Such RD-based determination undoubtedly brings compression performance gains while in turn increasing the computational complexity. Experimental results in [29] on the HEVC test platform show that RDOQ can achieve around 3% to 5% BD-Rate [30] savings at the cost of a 12% to 25% encoding time increase. In the literature, there are two main strategies to achieve low complexity RDOQ: the statistics-based methods and the RD model based methods.

A. STATISTICS-BASED FAST RDOQ
The statistics-based approaches aim to empirically skip the RDOQ according to statistical analyses, avoiding unnecessary computations in the recursive RDO process of the encoder.
In [31], a special block type named All Quantized Zero Block except DC Coefficient (AQZB-DC) is detected, which occupies around 20% of the non-zero blocks. Moreover, statistical results reveal that over 30% of the DC coefficients in AQZB-DC blocks retain the pre-quantized level l^{ceil}_{i,j}. As such, a prediction model is investigated for AQZB-DC blocks, which adaptively regulates the quantization level of the DC coefficient. In this way, the RDOQ procedure can be bypassed for this type of TB.
The residual quad-tree structure implicitly aggravates the computational burden of quantization [32], since the RDOQ is repetitively invoked under the recursive TB partitioning structure. Moreover, statistical results show that RDOQ has negligible influence on the TB size determination, and the TB partitioning accuracy still reaches 95% if the hard-decision quantization is employed. As such, the authors in [32] proposed to directly apply the hard-decision quantization in the TB decision rounds, and to employ the RDOQ only after obtaining the best TB sizes. This method reduces the quantization complexity by 27% with a BD-Rate loss of 0.25% under the low delay P configuration.
In [33], an RDOQ bypass scheme is proposed based on the statistics of transform coefficients. In particular, even though RDOQ achieves considerable performance improvements for HEVC, it does not always alter the quantized level compared to the hard-decision quantization. One particular example is when the current TB is an all-zero block after pre-quantization, in which case RDOQ is unnecessary. Moreover, if the quantization outcomes of the hard-decision quantization are identical to those of RDOQ, calculating the RD cost is rather wasteful. According to the statistical experiments, it is found that when all the transform coefficients within one TB are smaller than a threshold governed by the quantization step size, the current TB can be directly determined to be a zero TB without RDOQ. Moreover, based on the statistical result that when the sum of the absolute quantized coefficients in one TB is smaller than a given threshold, the non-zero coefficients occupy only a small fraction, RDOQ is bypassed and hard-decision quantization is invoked to economize the encoding time.
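The two bypass checks described above can be sketched as a simple per-TB decision; the two threshold values are illustrative placeholders (not the ones derived in [33], which depend on the quantization step size and coded statistics):

```python
def choose_quantizer(coeffs, q_step, zero_thr_scale=0.75, sum_thr=4):
    """Per-TB decision: (a) declare an all-zero TB, (b) fall back to
    hard-decision quantization, or (c) run full RDOQ.

    zero_thr_scale and sum_thr are illustrative placeholders; [33]
    derives its thresholds from Q_step and coding statistics."""
    zero_thr = zero_thr_scale * q_step
    if all(abs(c) < zero_thr for c in coeffs):
        return "all-zero"            # skip transform/RDOQ entirely
    pre_levels = [round(abs(c) / q_step) for c in coeffs]
    if sum(pre_levels) < sum_thr:
        return "hard-decision"       # few non-zeros: RDOQ unlikely to help
    return "rdoq"

print(choose_quantizer([2.0, -3.5, 1.0], q_step=10))    # -> all-zero
print(choose_quantizer([12.0, -9.0, 3.0], q_step=10))   # -> hard-decision
print(choose_quantizer([52.0, 41.0, -39.0], q_step=10)) # -> rdoq
```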
In [34], the authors proposed to simplify the selection of quantization candidates and the search for the last non-zero coefficient in RDOQ. In particular, the conditional probability P(l̂_{i,j} = l^{ceil}_{i,j} | l^{ceil}_{i,j} = L) is evaluated, where l̂_{i,j} denotes the quantization level selected by RDOQ for position (i, j), as defined in Eqn. (6), and L can be 1, 2, 3 or larger than 3. Statistical results under the random access configuration with varied QPs reveal that when L is larger than 3, RDOQ deviates from the pre-quantized level in only about 4% of the cases on average, indicating that the quantization level is prone to remain unchanged from the pre-quantization result, such that RDOQ can be skipped. Furthermore, a simplification of the last non-zero coefficient search is investigated for 4 × 4 TBs, for which the search scope is shrunk to the first four non-zero coefficients.
In [35] and [29], based on the observation that RDOQ tends to adjust the quantization level ''1'' to ''0'' for coefficients located in the high-frequency domain of larger TBs, an early quantization level decision scheme is proposed, which forces the quantization level to zero without the RDOQ process [29], [35]. The decision condition is expressed in terms of p_CG, the explicit position of the current CG following the scanning order, N_CG, the total number of CGs within one TB, and W, the TB width. The proposed scheme brings 12.84% time savings for quantization, and the BD-Rate [30] loss is 0.21% under the all intra configuration. In addition, considering that for some sequences the probability of the adjustment case (i.e., P(l̂_{i,j} = 0 | l^{ceil}_{i,j} = 1)) is less than 70%, the authors propose to employ an adaptive rounding offset for calculating l^{ceil}_{i,j} during the pre-quantization stage, where the rounding offset f is adjusted adaptively [35]. The proposed adaptive rounding offset achieves 15.29% quantization time savings with very negligible BD-Rate loss (0.01% on average) under the all intra configuration.

B. RATE DISTORTION MODELS FOR FAST RDOQ
To obtain the RD cost with respect to each quantization candidate, the actual entropy coding together with the context modeling and updating must be carried out, which is considered to be the major component that aggravates the computational burden. Therefore, the key to achieving low complexity RDOQ is to establish an accurate RD model that estimates the RD cost instead of actually encoding. In the literature, various rate and distortion models have been investigated [36]-[39], generally with the aim of efficient bit allocation, rate control, and fast RDO decisions. However, only a few RD modeling studies targeting the acceleration of quantization have been conducted [33], [40]-[42].
Considering the fact that RDOQ adjusts quantization levels by comparing the RD costs of l^{ceil}_{i,j} and l^{floor}_{i,j}, the RD cost difference with respect to the different quantization candidates is derived. In particular, Lee et al. [33] formulated a simplified level adjustment method with a ΔJ estimation model as follows [33],

ΔJ = ΔD + λ * ΔR,    (10)

wherein ΔD and ΔR denote the differences of the distortions and rates between l^{ceil}_{i,j} and l^{floor}_{i,j} for the coefficient at position (i, j). By involving the fractional expression l^{float}_{i,j} of the pre-quantization result, ΔD is given by [33]

ΔD = Q_step^2 * (1 - 2b),

where b is the decimal part of l^{float}_{i,j}. In this way, the dequantization process can be safely removed. Regarding the rate estimation, a series of syntax elements, such as ''significant_flag'', ''greater_than_one'', ''greater_than_two'' and ''remaining_level'', are involved in the coding of the quantized coefficient, such that the coding bit difference between l^{ceil}_{i,j} and l^{floor}_{i,j} can be represented as the sum of the per-syntax-element rate differences [33].

TABLE 1. The estimation of ΔR with three syntax elements by referencing the value of l^{ceil}_{i,j} [33].
The rate differences of the first three syntax elements can be deduced according to the value of l^{ceil}_{i,j}, as illustrated in Table 1. The explicit values of those four syntax elements can be obtained through a look-up table defined in the HEVC test model.
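The ΔJ-based level adjustment can be sketched as below. The ΔD term uses the decimal-part identity ΔD = Q_step^2(1 − 2b); the rate difference delta_r is simply passed in by the caller as a stand-in for the table lookup, and the function name is ours:

```python
import math

def adjust_level(c, q_step, lam, delta_r):
    """Choose between l_ceil and l_floor by the sign of
    dJ = dD + lambda * dR, without explicit dequantization.

    delta_r stands in for the tabulated rate difference between
    l_ceil and l_floor (cf. Table 1 of [33])."""
    l_float = abs(c) / q_step
    l_ceil = math.ceil(l_float)
    b = l_float - math.floor(l_float)          # decimal part of l_float
    delta_d = (q_step ** 2) * (1.0 - 2.0 * b)  # D(l_ceil) - D(l_floor)
    delta_j = delta_d + lam * delta_r
    return l_ceil if delta_j < 0 else l_ceil - 1

# b = 0.9: the ceiling level is much closer, so it wins despite its
# extra rate; b = 0.1: the floor level wins.
print(adjust_level(29.0, 10, lam=5.0, delta_r=1.0))  # -> 3
print(adjust_level(21.0, 10, lam=5.0, delta_r=1.0))  # -> 2
```

Note the sketch assumes a non-integer l_float (otherwise b = 0 and the candidates degenerate), which is the typical case in practice.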
A ΔJ model has also been established in [40] for low complexity RDOQ. Typically, the coding bits of the sign flag and the bits for representing the position of the last significant coefficient are additionally involved in the ΔR estimation. Subsequently, by setting the ΔJ in Eqn. (10) to zero, a threshold T_{l_{i,j}} can be derived [40]. Herein, β is a scaling factor in the transition between λ and QP that is defined in the HM platform as follows [43],

λ = β * 2^{(QP-12)/3}.

In this manner, the optimal quantization level l̂_{i,j} can be determined [40]. In [41], the philosophy of comparing rate-distortion costs based on ΔJ is employed again, wherein the coding bits of the residual coding syntax elements are inferred from statistical probabilities and information entropy. Besides, based on the rate estimation in [41], a parallel RDOQ scheme for the HEVC open-source encoder x265 [44] is proposed in [45], with which the RDO procedures can be executed in parallel on a GPU, leading to real-time encoding of 4K sequences. Moreover, Yin et al. [25] proposed a soft-decision quantization scheme which reveals great benefits in enhancing the throughput of hardware implementations.
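The λ-QP mapping above can be checked numerically; β = 0.57 is a typical intra-slice scaling factor in the HM software and is used here as an assumed placeholder:

```python
def hm_lambda(qp, beta=0.57):
    """HM-style Lagrange multiplier: lambda = beta * 2^((QP - 12) / 3).

    beta = 0.57 is an assumed (typical intra-slice) scaling factor;
    the exact value depends on the encoder configuration."""
    return beta * 2.0 ** ((qp - 12) / 3.0)

# lambda doubles every 3 QP steps, i.e. every doubling of Q_step:
print(round(hm_lambda(22), 2))
print(round(hm_lambda(25), 2))
```

Since Q_step also doubles every 6 QP steps, this exponential coupling keeps the D and λR terms of J on comparable scales across the QP range.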
To establish the rate and distortion models for RDOQ, Cui et al. [42] proposed to model the transform coefficients with hybrid Laplacian distributions at the TB level. In HEVC, residual quad-tree partitioning is employed, leading to TB sizes varying from 4 × 4 to 32 × 32, such that the behaviors of the coefficient distributions are distinct. The proposed hybrid Laplacian distributions contain a succession of models with different parameters, with the goal of better accommodating the characteristics of the varied TB sizes. Moreover, the transform types of the 4 × 4 TUs are taken into account in the modeling, namely DCT-II, DST-VII and transform skip. The hybrid Laplacian distribution can be formulated as follows [42],

f_k(x) = ω_k * (λ_k / 2) * e^{-λ_k |x|},

where λ_k denotes the Laplacian parameter and ω_k is a weighting factor. k denotes the TB layer index: layer 0 to layer 3 correspond to the TUs with sizes from 32 × 32 down to 4 × 4, while layer 4 and layer 5 indicate the 4 × 4 TBs that employ DST-VII and transform skip, respectively. An online updating strategy is involved for the parameter refinement. After obtaining the model parameters, the cumulative probability with respect to different quantization levels can be derived, such that the coding bits can be acquired by integrating the self-information of the quantized symbols. Furthermore, the estimated coding bits R̂_k are derived according to a linear mapping of the self-information r_k,

R̂_k = ξ * r_k + γ,

wherein the linear model parameters ξ and γ are initialized and updated according to least-squares regression. In terms of the quantization distortion modeling, the quantization level originating from the hard-decision quantization is employed, with which the SSEs between the dequantized coefficients and the original transform coefficients are regarded as the quantization distortion. In this manner, the RD cost of each quantization candidate can be derived. Finally, an estimated optimal quantization result can be obtained analytically by minimizing the RD cost over the quantization candidates [42], where an off-line trained parameter α is used for adjusting the model accuracy. All-zero CGs can also be effectively determined according to a threshold [42]: if all the transform coefficients within one CG satisfy the threshold condition of Eqn. (19), such a CG can be determined as an all-zero CG.
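The self-information-based rate estimate can be sketched as follows. The sketch models the coefficient magnitude as exponential with rate λ (the magnitude of a Laplacian), uses mid-point quantization intervals, and plugs the self-information into the linear mapping R̂ = ξ·r + γ; the ξ and γ values are placeholders, not the trained parameters of [42]:

```python
import math

def level_bits(l, lam, q_step, xi=1.0, gamma=0.5):
    """Estimate coding bits for quantized magnitude l (l >= 0) from the
    self-information of its quantization interval under a Laplacian
    magnitude model (|x| ~ exponential with rate lam).

    xi and gamma are placeholder linear-mapping parameters."""
    lo = max(l - 0.5, 0.0) * q_step
    hi = (l + 0.5) * q_step
    p = math.exp(-lam * lo) - math.exp(-lam * hi)  # P(level == l)
    r = -math.log2(p)                              # self-information in bits
    return xi * r + gamma                          # R_hat = xi * r + gamma

# Rarer (larger) levels cost more bits, as expected of an entropy model:
for l in range(4):
    print(l, round(level_bits(l, lam=0.1, q_step=10), 2))
```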

IV. ALL ZERO BLOCK DETECTION
All zero blocks (AZBs), for which the prediction signals are directly reused as the reconstructed pixels, are commonly observed, especially in low bit-rate coding scenarios. The quantized coefficient levels within an AZB are all zero, such that detecting an AZB before transform and quantization is beneficial for economizing the encoding computational resources. In this manner, the encoding procedures, such as transform, quantization, residual coding, inverse quantization and inverse transform, can be straightforwardly skipped.
There have been numerous works focusing on the forecast of AZBs [31], [32], [46]-[53]. In particular, Wang and Kwong [48] proposed to detect the zero quantized coefficients with a hybrid model, which is typically designed for 4 × 4 blocks with the integer DCT transform in H.264/AVC. The spatial domain residuals are modeled with the Gaussian distribution, and subsequently, multiple levels of determination thresholds with respect to the sum of absolute differences (SAD) are derived for the detection of the zero coefficients. To accommodate the Hadamard transform invoked by H.264/AVC, the hybrid model is adjusted accordingly in [49], where the sum of absolute transformed differences (SATD) is used to replace the SAD of [48].
Compared with H.264/AVC, HEVC adopts a series of advanced prediction technologies, leaving more room for improving AZB detection. Moreover, considering the RDOQ, which quantizes the coefficients in a soft manner in HEVC, more AZBs are generated, since the all-zero cases may achieve superior RD performance. In addition, HEVC introduces larger TB sizes (i.e., 16 × 16, 32 × 32), making zero block detection more challenging, as the larger TUs involve more coefficients with distinct properties. As such, the AZB detection methods investigated for H.264/AVC may not be applicable to HEVC.
To better collaborate with the RDOQ process in HEVC, several investigations concentrate on the detection of two types of AZBs, namely the genuine AZB (G-AZB) and the pseudo AZB (P-AZB) [50]-[52]. In particular, G-AZB denotes the TBs that can be quantized to AZBs through the hard-decision quantization, while P-AZB represents those that could potentially be set to AZBs through RDOQ. For a clearer explanation, the hard-decision quantization in Eqn. (1) is equivalently interpreted as follows,

l_{i,j} = sign(C_{i,j}) * ( (|C_{i,j}| * M + offset) >> Q_sh ),    (20)

where M is a multiplication factor relevant to the QP, offset denotes the scaled rounding offset relying on the slice type, and Q_sh depends on the QP, the TB size and the coding bit-depth [2].
For a G-AZB, the absolute value of l_{i,j} should be less than 1, such that given the TB size W and quantization parameter QP, the detection threshold for an individual DCT coefficient C_{i,j} can be described as follows [31], [33], [50]-[52],

T = (2^{Q_sh} - offset) / M.    (21)

The threshold T has been widely employed for the detection of AZBs. Cui et al. [50] proposed a hybrid AZB detection method for HEVC. Initially, the Walsh-ordered Hadamard transform is employed for 4 × 4 and 8 × 8 TUs to replace the DCT, in an effort to reduce the computational complexity, while the DCT is retained for 16 × 16 and 32 × 32 TUs. The associated SATD for the different TB sizes is extracted and normalized. Two G-AZB detection thresholds are proposed, wherein the first threshold T^{(1)}_{SATD} is derived by adding up the single-coefficient threshold in Eqn. (21) over the W × W coefficients of the TB [50]. Subsequently, by modeling the prediction residuals with the Laplacian distribution, another G-AZB detection threshold is derived [50], in which R is a relevance matrix, A denotes the sparse matrix, and H_ω represents the core of the Walsh-ordered Hadamard transform defined in [53]. As such, the G-AZB threshold with respect to the SATD can be obtained [50].
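The single-coefficient threshold of Eqn. (21) follows directly from requiring (|C|·M + offset) >> Q_sh < 1. The sketch below verifies this with illustrative integer parameters (M = 26214, offset = 2^18, Q_sh = 20 are assumptions, not values taken from the HEVC scaling tables):

```python
def int_quantize(c, m, offset, q_sh):
    """Integer hard-decision quantization: l = (|C|*M + offset) >> Q_sh,
    with the sign restored afterwards."""
    level = (abs(c) * m + offset) >> q_sh
    return -level if c < 0 else level

def azb_threshold(m, offset, q_sh):
    """Largest |C| that still quantizes to zero, from
    (|C|*M + offset) >> Q_sh < 1  <=>  |C| < (2^Q_sh - offset) / M."""
    return (2 ** q_sh - offset) / m

m, offset, q_sh = 26214, 1 << 18, 20   # illustrative parameters
t = azb_threshold(m, offset, q_sh)
print(t)                                          # every |C| < t gives level 0
print(int_quantize(int(t) - 1, m, offset, q_sh))  # -> 0
print(int_quantize(int(t) + 1, m, offset, q_sh))  # -> 1
```

A G-AZB detector only needs to compare each |C_{i,j}| (or a transform-domain proxy such as SATD) against t, avoiding the quantization loop entirely.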
In [51], the authors proposed to modify the Hadamard transform based all-zero block detection of [54] to better adapt to the transform and quantization characteristics of HEVC. In particular, the G-AZB detection thresholds T_H with respect to the different TB sizes are defined in [51]. Since the Hadamard transform is performed by default for 4 × 4 and 8 × 8 TUs on the HM platform, employing Hadamard-based AZB detection for the smaller TUs does not bring additional computational cost. However, for larger TUs such as 16 × 16 and 32 × 32, the computational burden of the Hadamard transform is much heavier. Therefore, the uniformity of the Hadamard coefficients within larger TUs is evaluated. First, the 16 × 16 and 32 × 32 TUs are divided into 8 × 8 sub-blocks, on which the 8 × 8 Hadamard transform is conducted. Subsequently, the top-left DC coefficients of the 8 × 8 sub-blocks are extracted, forming 2 × 2 and 4 × 4 DC blocks. Then the Hadamard transform is performed again on the DC blocks, termed the DC Hadamard transform. It should be noted that the DC coefficient of each 8 × 8 sub-block can be efficiently obtained by adding up the residuals in the spatial domain. If all the coefficients are smaller than T_H, the TB can be determined as an AZB.
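The DC Hadamard step for a 16 × 16 TB can be sketched in pure Python (unnormalized transforms; the function name is ours). Each 8 × 8 sub-block DC is obtained as the plain sum of its residuals, exactly as noted above, and a 2 × 2 Hadamard transform is then applied to the four DCs:

```python
def dc_hadamard_16x16(residual):
    """For a 16x16 residual block: take the sum (the un-normalized
    Hadamard DC) of each 8x8 sub-block, then apply a 2x2 Hadamard
    transform to the four DC values. Returns the four DC-Hadamard
    coefficients to be compared against the AZB threshold."""
    dc = [[sum(residual[r][c]
               for r in range(br * 8, br * 8 + 8)
               for c in range(bc * 8, bc * 8 + 8))
           for bc in range(2)] for br in range(2)]
    a, b, c, d = dc[0][0], dc[0][1], dc[1][0], dc[1][1]
    return [a + b + c + d, a - b + c - d, a + b - c - d, a - b - c + d]

# A flat residual of all ones: the four sub-block DCs are 64 each, so only
# the first DC-Hadamard coefficient survives; the rest cancel out.
flat = [[1] * 16 for _ in range(16)]
print(dc_hadamard_16x16(flat))  # -> [256, 0, 0, 0]
```

This is why the check is cheap for large TUs: only sums over sub-blocks and a tiny 2 × 2 (or 4 × 4) transform are needed, instead of a full 16 × 16 Hadamard transform.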
Based on the G-AZB detection threshold in [51], Fan et al. [52] additionally introduced a lower and a higher SAD threshold to classify the all-zero and non-all-zero blocks. The lower threshold T^{low}_{SAD} is set as (d/100) * T, where d denotes the TB depth. If the SAD of the TB is smaller than the lower threshold, the TB can be determined as a G-AZB. Moreover, the higher threshold T^{high}_{SAD} is defined in [52] based on the quantization error |C_{i,j} - Ĉ_{i,j}|, which should be lower than the T derived in Eqn. (21) for the specific QP and TB size, together with a margin that is empirically set in [52]. The P-AZB detection is then performed on the non-G-AZBs. In [51], the threshold for deducing a P-AZB is enlarged to twice the threshold T_H, and, to prevent false determinations, the conditions for P-AZB detection are tightened. Essentially, the key to the P-AZB determination is whether the RD cost of the AZB is lower than that of the non-all-zero block (NZB), where the associated RD costs are denoted as J_AZB and J_NZB. In particular, R_AZB denotes the coding bits of an AZB, which is approximated to be 1. Moreover, D_AZB can be estimated as the sum of the squared residual coefficients r_{i,j} in the spatial domain.
As such, J_AZB can be described accordingly [52], and D_NZB can likewise be formulated [52]. By exploiting the rate estimation scheme in [55], the R_NZB in [52] is estimated with the self-information, where, considering the different TB sizes, the transform coefficients are modeled with the generalized Gaussian distribution.
Regarding the P-AZB detection in [50], to better adapt to the characteristics of the larger TUs, the TB is divided into a high-frequency region and a low-frequency region according to the QP and the maximum allowed QP value QP_max, from which the size of the low-frequency region is derived [50]. The P-AZB detection is specifically performed on the low-frequency region. In particular, the TB is regarded as a P-AZB if the maximum transform coefficient within the low-frequency region is smaller than a threshold scaled by ξ, which is empirically set to 2.2 [50]. To further investigate the P-AZBs incurred by RDOQ, RD comparisons between J_AZB and J_NZB are conducted. In particular, J_AZB [50] is derived based on the term e defined in Eqn. (34). Moreover, D_NZB can be obtained by off-line training of the quantization distortions, and R_NZB is estimated with a linear combination of the SATD and the number of Hadamard transform coefficients that are larger than a threshold. Consequently, two determination ranges are derived to forecast the P-AZB.

V. RATE-DISTORTION PERFORMANCE AND CODING COMPLEXITY
The performances of the existing fast RDOQ and AZB detection algorithms in terms of RD performance and encoding complexity are presented and discussed in this section. In particular, the RD performance is measured by the BD-Rate [30] of the luma component, where a positive BD-Rate indicates a loss of compression performance. For the fast RDOQ algorithms, the quantization time saving ΔT_Q is used to evaluate the computational efficiency,

ΔT_Q = (T^{anc}_Q - T^{pro}_Q) / T^{anc}_Q × 100%,

where T^{anc}_Q and T^{pro}_Q denote the quantization time of the anchor encoder and of the encoder with the proposed fast RDOQ scheme, respectively. Moreover, for the AZB detection algorithms, the time saving ΔT_TQ regarding the transform, quantization, inverse quantization and inverse transform is measured as follows,

ΔT_TQ = (T^{anc}_TQ - T^{pro}_TQ) / T^{anc}_TQ × 100%,

where T^{pro}_TQ and T^{anc}_TQ denote the time consumed by transform, quantization and the associated inverse processes with and without AZB detection, respectively. The fast RDOQ schemes and AZB detection schemes are all implemented on the HEVC test platform (different versions), wherein the fast RDOQ schemes are evaluated under the all intra (AI), random access (RA) and low delay (LD) configurations. Since AZBs are rare under the AI configuration, the performance of the AZB detection algorithms is mainly validated under the RA and LD configurations.
The performances of several fast RDOQ schemes, including Cui et al. [42], He et al. [41], Lee et al. [33], Xu et al. [29], Wang et al. [40], Zhang et al. [32] and Wang et al. [31], are presented in Table 2, Table 3 and Table 4. The versions of the test platforms, as well as the performance on individual sequences, are all presented. The hybrid Laplacian based fast RDOQ scheme [42] strikes an excellent trade-off between coding performance and computational complexity, where around 70% of the quantization time can be saved compared to the conventional RDOQ, along with only a 0.3% BD-Rate increase. Moreover, He et al.'s method [41] achieves competitive acceleration, whereas the coding performance loss is slightly higher. The performance of the RDOQ bypass method combined with the ΔJ estimation model proposed by Lee et al. [33] is relatively conservative, as it introduces very negligible performance loss with moderate speedup. The statistics-based fast RDOQ schemes proposed in [29], [32], [40] and [31] bring 30% to 40% quantization time reductions.
Furthermore, the performance of the AZB detection schemes proposed by Cui et al. [50], Fan et al. [52] and Lee et al. [53] is tabulated in Table 5. Fan et al.'s method achieves the highest time savings (over 40%) in terms of transform and quantization, while its BD-Rate loss is around 0.5%. In addition, Cui et al.'s method [50] predicts the AZBs, including the G-AZBs and P-AZBs, more precisely, such that more than 20% time savings are achieved with only 0.06% performance loss. It is observed that Cui et al.'s method [50] performs extremely well on the 4K sequences in Class A1, where even performance gains can be noticed together with 20% to 26% time savings, providing enlightenment for the optimization of 4K video application scenarios.

VI. DISCUSSIONS
The fast RDOQ and all-zero block detection have interesting connections. Both aim to alleviate the burden of the complicated rate-distortion cost calculations involved in deducing the optimal coefficient level. However, fast RDOQ focuses on inference at the coefficient level, while all-zero block detection works at the block level. As such, they can seamlessly work together towards low complexity quantization optimization. In particular, traditional all-zero block detection methods rely on thresholds derived from statistical data, combined with the traits of hard-decision quantization. When cooperating with RDOQ, the RD models can be refined, as the level with the minimal RD cost is finally selected. As such, a collaborative design based on fast RDOQ and all-zero block detection is beneficial, since eliminating the all-zero blocks before conducting the fast RDOQ yields reasonable optimization results in reducing the overall computational complexity.
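The collaborative pipeline described above can be sketched as follows. This is a minimal illustration rather than an implementation from any of the surveyed papers: the sum-of-absolute-residuals threshold test, the crude logarithmic rate model, and the two-candidate level decision are simplified placeholders standing in for the statistical thresholds and RD models of the actual schemes.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II via matrix multiplication (square blocks)."""
    n = block.shape[0]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c @ block @ c.T

def rd_cost_model(c_abs, level, q_step, lam=10.0):
    """Crude model-based RD cost: squared quantization error plus a
    lambda-weighted logarithmic rate estimate standing in for the
    entropy-coded bits."""
    dist = (c_abs - level * q_step) ** 2
    rate = np.log2(level + 1.0) + 1.0
    return dist + lam * rate

def encode_block_low_complexity(residual, q_step, azb_threshold):
    """Illustrative low complexity quantization pipeline:
    1) all-zero block detection on the residual (block level),
    2) fast RDOQ-style level decision per coefficient otherwise."""
    # Block-level AZB test: if the residual energy is below the threshold,
    # skip transform/quantization entirely and signal an all-zero block.
    if np.sum(np.abs(residual)) < azb_threshold:
        return np.zeros(residual.shape, dtype=np.int64)

    # Otherwise transform and decide each coefficient level.
    coeffs = dct2(residual)
    levels = np.zeros(coeffs.shape, dtype=np.int64)
    for idx, c in np.ndenumerate(coeffs):
        # Fast RDOQ-style decision: compare the hard-decision level
        # against level-1 using the model cost instead of full RDO.
        l_hdq = int(round(abs(c) / q_step))
        candidates = [max(l_hdq - 1, 0), l_hdq]
        costs = [rd_cost_model(abs(c), l, q_step) for l in candidates]
        levels[idx] = int(np.sign(c)) * candidates[int(np.argmin(costs))]
    return levels
```

The design point to note is the ordering: the cheap block-level test runs first, so the per-coefficient loop is only entered for blocks that cannot be inferred as all-zero.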
In VVC, a trellis-based dependent quantization scheme is adopted, wherein two quantizers are employed and switched among four transition states under the control of the parity of the quantization levels. Meanwhile, the number of quantization candidates is doubled, and the associated RD costs are arranged in a trellis graph. As such, the computational complexity of the trellis-based quantization is elevated compared with RDOQ. Inspired by the existing works on fast RDOQ and all-zero block detection for HEVC, a promising low complexity quantization design for VVC should enjoy a series of desired properties. First, the coefficient speculations should be of extremely low complexity and friendly to real applications. Second, the low complexity quantization optimization algorithm for VVC should well accommodate the trellis structure. Third, since the entropy coding scheme of residuals in VVC has been advanced accordingly to better adapt to the quantization behaviors, the RD models should be re-established. Last but not least, the multiple transform selection strategy introduces new types of transform cores, which should be taken into account in the RD modeling as well.
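The two-quantizer, four-state mechanism can be sketched from the reconstruction side. In the VVC dependent quantization design, states 0 and 1 use quantizer Q0 (reconstruction levels at even integer multiples of the step size), states 2 and 3 use Q1 (odd multiples plus zero), and the state advances according to the parity of each coded level. The sketch below illustrates only this state machine; the encoder-side trellis search that selects the levels by Viterbi-style RD minimization, which is the complexity bottleneck discussed above, is omitted.

```python
# State transition table of VVC dependent quantization:
# next_state = Q_STATE_TRANS[state][level & 1]
Q_STATE_TRANS = [[0, 2], [2, 0], [1, 3], [3, 1]]

def dq_reconstruct(levels, delta):
    """Reconstruct a scan-ordered list of quantization levels with the
    two-quantizer, four-state dependent quantization scheme."""
    state = 0
    recon = []
    for k in levels:
        if state < 2:
            # Q0: even integer multiples of the step size.
            value = 2 * k * delta
        else:
            # Q1: odd integer multiples of the step size, plus zero.
            sign = 1 if k > 0 else (-1 if k < 0 else 0)
            value = (2 * k - sign) * delta
        recon.append(value)
        # The parity of the current level drives the state transition,
        # which in turn selects the quantizer for the next coefficient.
        state = Q_STATE_TRANS[state][k & 1]
    return recon
```

Because the quantizer for each coefficient depends on the parities of the preceding levels, level decisions cannot be made independently per coefficient as in RDOQ, which is precisely why the candidates are arranged in a trellis.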
With the surge of deep learning, quantization has also found applications in other scenarios such as feature compression [56]-[58], end-to-end learned compression [59], [60] and deep neural network compression [61]-[63]. Though the majority of state-of-the-art quantization methods in these tasks still rely on hard-decision scalar quantization, it is highly expected that soft-decision and vector quantization techniques can further improve the performance, along with the advanced quality assessment techniques developed for the corresponding modalities. To enable these techniques in real applications, methodologies analogous to the low complexity quantization in video coding are also indispensable. Compared to the existing methods, better rate-distortion performance with only marginally higher complexity is expected to be achieved. Moreover, such low complexity quantization design philosophies in video coding could have interesting connections with recently developed compression techniques such as network pruning. A learning based quantization that directly maps the to-be-quantized signals to their optimal representation is also highly desirable in these compression tasks.

VII. CONCLUSIONS
Quantization plays a critical role in balancing rate and distortion in current video coding standards. In this paper, we review the low complexity quantization techniques in HEVC, and envision the future development of quantization optimization in the VVC standard. In essence, fast quantization techniques rely on deriving the best quantized index without going through the tedious transform, quantization and entropy coding processes. In the future, it is also anticipated that the quantized coefficients can be intelligently determined in a low complexity and high efficiency manner based on recent advances in machine learning, leading to better performance in terms of the rate-distortion-complexity cost.

FIGURE 1. Taxonomy of the low complexity quantization schemes.

TABLE 2. Coding performance and quantization time savings of fast RDOQ methods under AI configuration.

TABLE 3. Coding performance and quantization time savings of fast RDOQ methods under RA configuration.

TABLE 4. Coding performance and quantization time savings of fast RDOQ methods under LD configuration.

TABLE 5. Coding performance and quantization time savings of AZB detection methods under RA and LD configurations.