Rate-Distortion Optimization Based on Two-Pass Encoding for HEVC

Rate-distortion optimization (RDO) is a crucial technique in block-based hybrid video encoders; it determines the coding options for each coding unit so as to achieve the best rate-distortion (R-D) performance. However, the current RDO implemented in the High Efficiency Video Coding (HEVC) test model (HM) is far from globally optimal, as it ignores the interaction among coding options. Recent studies have shown that dependent RDO methods can improve coding efficiency by exploiting the R-D dependency among coding units, but these methods do not consider the R-D characteristics of the coding units. In this paper, we propose a two-pass encoding based RDO method that combines both the R-D dependency and the R-D characteristics to further improve coding efficiency. First, a frame is encoded with the original HEVC scheme to obtain the R-D model of each coding tree unit (CTU) and the bit budget of the frame. Second, an optimization equation that combines the R-D model and the R-D dependency is established to adaptively determine the Lagrange multiplier and quantization parameter (QP) for each CTU. Finally, the current frame is encoded again with the new scheme. Experimental results show that, compared with the original HM, the proposed method obtains average R-D performance improvements of 5.3% and 5.6% under the low-delay B and low-delay P configurations, respectively.


I. INTRODUCTION
With the rapid development of electronic information technology and the spread of video acquisition devices, digital video has become the main carrier of multimedia information, but the volume of uncompressed digital video data is enormous. To improve compression performance and to ensure interoperability among video codecs from different manufacturers, several video coding standards have been developed since the 1980s. High Efficiency Video Coding (HEVC) [1] is the international video coding standard that succeeds H.264/Advanced Video Coding (AVC) [2], and it still adopts the traditional hybrid coding structure, including prediction, transform, quantization, and entropy coding. Video coding standards specify the decoder and the syntax of the bitstream, but they do not specify how the encoder is implemented. Therefore, designers can flexibly develop coding strategies according to the requirements of a practical application in order to achieve the best coding performance.

The associate editor coordinating the review of this manuscript and approving it for publication was Jinjia Zhou.

Rate-distortion optimization (RDO) [3] is a very important technology in video coding, which is used to select coding parameters and to regulate the trade-off between coding distortion and consumed rate. For a video divided into a series of coding units, the RDO problem can be represented as

$$\min \sum_{i=1}^{N} D_i \quad \text{s.t.} \quad \sum_{i=1}^{N} R_i \le R_T, \tag{1}$$

where $D_i$ and $R_i$ denote the distortion and rate of the $i$-th coding unit, $N$ is the total number of coding units, and $R_T$ is the total available rate budget. Solving (1) means finding the optimal bit allocation for each coding unit under the rate budget constraint, so as to minimize the distortion of the compressed video. In H.264 and HEVC encoders, the constrained optimization problem (1) is converted into an unconstrained one by introducing a Lagrange multiplier, and each basic coding unit is processed individually and independently in coding order. Due to the rate-distortion (R-D) dependency among coding units, the RDO method implemented in the HEVC test model (HM) does not achieve the optimal solution. Some recent studies focus on how to exploit the R-D dependency to further improve coding efficiency. For the HEVC hierarchical coding structure, some studies select the quantization parameter (QP) adaptively instead of using a fixed QP setting.
For example, by exploring inter-frame dependencies, [4]-[8] proposed adaptive QP cascading (QPC) methods under the low-delay (LD) configuration, and [9]-[12] presented adaptive QPC methods under the random access (RA) configuration. These QPC methods adaptively set different QPs for the frames to be encoded according to both the coding structure and changes in video content. In addition, Ropert et al. [13] modeled the temporal distortion propagation at block level and set different QPs for the coding units within a frame. On the basis of [13], Bichon et al. [14] further investigated the characteristics of the skip mode and incorporated the skip probability into the temporal distortion propagation model for an adaptive QPC algorithm at coding unit level. Furthermore, Jia et al. [15] studied adaptive Lagrange multiplier selection for QPC schemes.
In the HEVC reference software HM [16], multiple QP (MQP) optimization is an alternative scheme, where a user-defined parameter "MaxDeltaQP" determines the number of QP candidates [17]. The QP candidates are tried during encoding to identify the one with the smallest R-D cost. Such MQP optimization substantially increases the coding complexity, in proportion to the number of QP candidates tried. Some recently improved RDO methods [18]-[22] focus on Lagrange multiplier adaptation according to the temporal R-D dependency. In [18], an efficient Lagrange multiplier selection method is presented for HEVC under the RA configuration by analyzing the temporal relationship between distortion and rate. In [22], a temporal dependence based RDO method is proposed for HEVC under the LD configuration, where the optimal Lagrange multiplier is estimated by adaptively selecting a scaling factor according to the video motion characteristics at different temporal layers. By considering only the direct distortion propagation, Guo et al. [23] proposed a Lagrange multiplier adaptation method based on pre-encoding for HEVC under the LD configuration. Based on a source distortion temporal propagation (SDTP) model [24], Gao et al. proposed a temporally dependent RDO (TD-RDO) for HEVC under the LD configuration [25] and a layer-based TD-RDO method under the RA configuration [26]. From the perspective of rate propagation, Li et al. [27] also proposed a Lagrange multiplier adaptation approach. In addition, Wang et al. [28] presented a joint RDO framework that considers the dependency among coding decisions and incorporates the Lagrange multiplier as one of the parameters to be optimized.
Rate control [29] can also be considered a special RDO problem involving R-D model estimation, bit allocation, and the decision of QP or Lagrange multiplier. Zhang et al. [30] proposed a two-pass rate control to provide constant coding quality, while Wang et al. [31], [32] proposed two-pass encoding based rate control methods to improve coding quality measured by structural similarity (SSIM). By modeling the relationship between rate and Lagrange multiplier, Li et al. [33] proposed a rate control algorithm in the λ domain, which determines the Lagrange multiplier according to the bit budget of a frame or coding tree unit (CTU). Based on the framework of rate control in the λ domain, [34]-[36] presented several optimal bit allocation approaches to improve the R-D performance.
The above RDO methods improve the R-D performance to some extent by exploiting the dependency among coding units. However, they hardly consider the R-D characteristics of the coding units themselves. In this paper, we present an improved RDO method based on two-pass encoding for HEVC under the LD configuration, where the Lagrange multipliers and QPs of the CTUs to be encoded are determined adaptively according to both the R-D dependency and the R-D characteristics. The experimental results show that the proposed method achieves a significant R-D performance improvement over the original HM under the LD B-frame (LDB) and LD P-frame (LDP) configurations.
The rest of the paper is organized as follows. We first review the background knowledge in Section II, where the TD-RDO and recursive Taylor expansion (RTE) are introduced. We subsequently analyze the R-D characteristics of coding units in Section III. Section IV presents the two-pass encoding based RDO method in detail. Experimental results are presented in Section V. Finally, Section VI concludes the paper.

II. BACKGROUND KNOWLEDGE
In this section, we first review the TD-RDO and introduce the distortion propagation factor, which quantitatively measures the R-D dependency among coding blocks. We then introduce the RTE method, which is employed to solve the equation established in Section IV.

VOLUME 9, 2021

A. TEMPORALLY DEPENDENT RATE-DISTORTION OPTIMIZATION
In general, the constrained optimization problem (1) can be converted into an unconstrained form by the Lagrange multiplier method as follows,

$$\min J = \sum_{i=1}^{N} D_i + \lambda_g \sum_{i=1}^{N} R_i, \tag{2}$$

where $J$ is the Lagrangian cost function and $\lambda_g$ is called the global Lagrange multiplier. Typically, the coding decisions of the units are assumed to be independent, which greatly simplifies the solution of (2), and each coding unit is then encoded individually and independently. In the HEVC reference software HM, the independent RDO for encoding the current CTU is

$$\min J_i = D_i + \lambda_{HM} R_i, \tag{3}$$

where $J_i$ is the Lagrangian cost function of the $i$-th CTU, and $\lambda_{HM}$ is the Lagrange multiplier used in HM, which is determined by both the coding structure and the QP. In fact, due to the strong R-D dependency among CTUs in the temporal domain, the coding decision of the current CTU affects the achievable R-D performance of subsequent frames. Therefore, the independent RDO used in HM is far from achieving the globally optimal R-D performance. By analyzing the correlation among coding decisions of different coding blocks, Yang et al. [24] proposed the SDTP model based TD-RDO for the IPPP coding structure in H.264. The TD-RDO is expressed as

$$\min J_i = (1 + k_i) D_i + \lambda_g R_i, \tag{4}$$

where $k_i$ is called the temporal propagation factor (TPF), which indicates the influence of the distortion of the $i$-th coding block on the distortions of subsequent coding blocks. Theoretically, the TPF can be expressed as

$$k_i = \frac{\partial \sum_{j=i+1}^{M} D_j}{\partial D_i}, \tag{5}$$

where $\partial D_i$ is the distortion change of the $i$-th coding block, and $\partial \sum_{j=i+1}^{M} D_j$ is the resulting distortion change of the subsequent blocks that directly or indirectly refer to the $i$-th coding block.
In [24], to obtain the TPF of each coding block within the current frame in H.264, a temporal propagation chain as shown in figure 1 is first constructed by a simplified forward motion search [37] on the original video sequence, and the SDTP model is then used to estimate the distortion of the coding blocks in the temporal propagation chain. Once the TPF of each coding block is obtained, TD-RDO can be implemented within the framework of the independent RDO, because the forms of (3) and (4) are very similar. For the HEVC hierarchical coding structure, multiple propagation chains with branches need to be constructed to estimate the TPF of a CTU because of the complex reference relationship. In [25], an extension of TD-RDO is proposed for HEVC under the LD configuration, where the TPFs are estimated with multiple propagation chains with branches. In our proposed two-pass encoding based RDO method, the TPF is used to quantitatively measure the R-D importance of CTUs.
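The difference between the independent decision and the temporally dependent decision can be illustrated with a minimal sketch. It assumes the TD-RDO cost has the form $(1 + k_i) D_i + \lambda_g R_i$, which is equivalent to using the effective multiplier $\lambda_g / (1 + k_i)$; the function name and the candidate (distortion, rate) pairs are hypothetical and only serve the illustration.

```python
def best_option(candidates, lambda_g, k=0.0):
    """Pick the (distortion, rate) candidate with the smallest Lagrangian cost.

    k is the temporal propagation factor (TPF) of the block. With k = 0 the
    cost reduces to the independent RDO cost D + lambda * R (assumed form).
    """
    def cost(option):
        distortion, rate = option
        return (1.0 + k) * distortion + lambda_g * rate

    return min(candidates, key=cost)


# Two hypothetical coding options for one block: (distortion, rate).
options = [(100.0, 10.0), (60.0, 30.0)]

# Independent RDO prefers the cheaper option.
independent_choice = best_option(options, lambda_g=3.0)

# A block that is heavily referenced (k = 1) justifies spending more bits.
dependent_choice = best_option(options, lambda_g=3.0, k=1.0)
```

A larger TPF makes distortion more expensive relative to rate, so blocks that propagate their distortion to many later blocks receive more bits.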

B. RECURSIVE TAYLOR EXPANSION SOLUTION
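The RTE method is used to solve the equation established in Section IV. Therefore, we give a brief review of the RTE method proposed in [38]. Specifically, the aim of the RTE method is to solve for $x$ in an equation of the following form,

$$\sum_{i=1}^{N} \left(\frac{a_i}{x}\right)^{b_i} = T, \tag{6}$$

where $a_i$, $b_i$, and $T$ are all known parameters. To deal with the different exponents $b_i$ in (6), the exponential term is rewritten by the Taylor expansion as follows,

$$\left(\frac{a_i}{x}\right)^{b_i} = e^{b_i \ln\frac{a_i}{x}} = 1 + b_i \ln\frac{a_i}{x} + \frac{b_i^2}{2!}\ln^2\frac{a_i}{x} + \frac{b_i^3}{3!}\ln^3\frac{a_i}{x} + \cdots \tag{7}$$

Then, an approximation of the exponential term is obtained by discarding the fourth- and higher-order terms in (7),

$$\left(\frac{a_i}{x}\right)^{b_i} \approx 1 + b_i \ln\frac{a_i}{x} + \frac{b_i^2}{2}\ln^2\frac{a_i}{x} + \frac{b_i^3}{6}\ln^3\frac{a_i}{x}. \tag{8}$$

Accordingly, (6) can be approximated by

$$\sum_{i=1}^{N} \left[ 1 + b_i \ln\frac{a_i}{x} + \frac{b_i^2}{2}\ln^2\frac{a_i}{x} + \frac{b_i^3}{6}\ln^3\frac{a_i}{x} \right] = T. \tag{9}$$

Let $X = \ln x$ (namely $x = e^X$); then (9) simplifies to a cubic equation,

$$A X^3 + B X^2 + C X + D = 0, \tag{10}$$

where $A = -\frac{1}{6}\sum_{i} b_i^3$, $B = \frac{1}{2}\sum_{i} \left(b_i^2 + b_i^3 \ln a_i\right)$, $C = -\sum_{i} \left(b_i + b_i^2 \ln a_i + \frac{1}{2} b_i^3 \ln^2 a_i\right)$, and $D = \sum_{i} \left(1 + b_i \ln a_i + \frac{1}{2} b_i^2 \ln^2 a_i + \frac{1}{6} b_i^3 \ln^3 a_i\right) - T$. By applying the Shengjin formula [38], when the discriminant of the cubic equation satisfies $\Delta = F^2 - 4EG > 0$, (10) has only one real root, which is obtained as

$$X = \frac{-B - \left(\sqrt[3]{Y_1} + \sqrt[3]{Y_2}\right)}{3A}, \tag{11}$$

where $Y_{1,2} = BE + \frac{3A\left(-F \pm \sqrt{F^2 - 4EG}\right)}{2}$, $E = B^2 - 3AC$, $F = BC - 9AD$, and $G = C^2 - 3BD$. Finally, an approximate value of $x$ is obtained as

$$x = e^{X}. \tag{12}$$

The approximation error of the above scheme is caused by the truncation of higher-order terms in the Taylor expansion. To address this issue, a decay rate term is defined as the ratio of consecutive terms of the expansion,

$$\delta = \left| \frac{\left(b_i \ln\frac{a_i}{x}\right)^{n} / n!}{\left(b_i \ln\frac{a_i}{x}\right)^{n+1} / (n+1)!} \right| = \frac{n+1}{\left| b_i \ln\frac{a_i}{x} \right|}. \tag{13}$$

If the value of $\delta$ is very large, the truncated terms decay rapidly, which decreases the approximation error of the above scheme. To increase $\delta$, the value of $\ln(a_i/x)$ should approach zero (i.e., $a_i/x \to 1$), in which case $\delta$ approaches infinity. To this end, (6) can be rewritten as

$$\sum_{i=1}^{N} \left(\frac{a_i}{\hat{x}}\right)^{b_i} \left(\frac{\hat{x}}{x}\right)^{b_i} = T, \tag{14}$$

where $\hat{x}$ is a pre-estimated solution of (6), so that the term $(a_i/\hat{x})^{b_i}$ is a known number. It has been proved in [38] that when $|\ln(\hat{x}/x)| < |\ln(a_i/x)|$, the approximation error of solving (14) is smaller than that of solving (6) by the above Taylor expansion method. To further reduce the approximation error, the Taylor expansion process can be iterated by using the solution of (14) as the next input $\hat{x}$.
In practice, after three or fewer iterations, the RTE method obtains a closed-form solution of (6) with an extremely small approximation error.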

III. RATE-DISTORTION MODEL
Video coding is a kind of lossy source compression. Intuitively, for a given video source, the coding distortion decreases as the rate budget increases. The relationship between rate and distortion characterizes the behavior of video coding. According to rate-distortion theory [3], the R-D function gives the lower bound on the rate achievable at a given distortion. In HEVC, a CTU is a basic coding unit on which RDO is performed individually to determine the coding parameters, including coding modes, motion vectors, reference frame indices, and quantized transform coefficients.
To investigate the relationship between rate and distortion for CTUs in HEVC, we carried out the following simulation experiment.
In the simulation experiment, the HEVC reference software HM16.7 is used as the test platform, and the encoder is modified to record the rate and distortion of each CTU. The encoder configuration files are "encoder_lowdelay_main.cfg" and "encoder_lowdelay_P_main.cfg", and the main encoder parameters are set as in Table 1, which complies with the common test conditions (CTC) [39] specified by the Joint Collaborative Team on Video Coding (JCT-VC). Each sequence is encoded four times with QP values of 22, 27, 32, and 37. Using the four R-D points of a CTU, we can fit the R-D curve of that CTU. A large amount of statistical data shows that a power function represents the R-D curve of most CTUs well, and figure 2 shows several examples. The power function based R-D model is expressed as

$$D = \eta R^{\theta}, \tag{15}$$

where $\eta$ and $\theta$ are two model parameters; the value of $\eta$ is always greater than zero, while the value of $\theta$ is always less than zero. Once these two parameters are determined, the R-D function of a CTU is obtained. However, determining $\eta$ and $\theta$ by fitting R-D curves as above requires four-pass encoding. According to rate-distortion optimization for video compression [3], the Lagrange multiplier used in RDO is the negative slope of the tangent line of the R-D curve, which can be expressed as

$$\lambda = -\frac{\partial D}{\partial R} = -\eta \theta R^{\theta - 1}. \tag{16}$$

After one-pass encoding, the rate $R$, distortion $D$, and Lagrange multiplier $\lambda$ of a CTU are all known. Combining (15) and (16), the model parameters are expressed as

$$\theta = -\frac{\lambda R}{D}, \tag{17}$$

$$\eta = \frac{D}{R^{\theta}}. \tag{18}$$

Therefore, according to (17) and (18), the R-D model of each CTU within a frame can be obtained by one-pass encoding.
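The one-pass parameter estimation can be checked numerically with a short sketch. It assumes the power model $D = \eta R^{\theta}$ and $\lambda = -\partial D / \partial R$, from which $\theta = -\lambda R / D$ and $\eta = D / R^{\theta}$ follow; the function names and the sample values are hypothetical.

```python
import math


def model_distortion(eta, theta, rate):
    """Distortion predicted by the power model D = eta * R^theta."""
    return eta * rate ** theta


def model_lambda(eta, theta, rate):
    """Negative slope of the R-D curve: lambda = -eta * theta * R^(theta - 1)."""
    return -eta * theta * rate ** (theta - 1.0)


def fit_rd_model(rate, distortion, lam):
    """Recover (eta, theta) of one CTU from a single (R, D, lambda) triple."""
    theta = -lam * rate / distortion
    eta = distortion / rate ** theta
    return eta, theta


# Round trip with hypothetical ground-truth parameters (eta > 0, theta < 0).
true_eta, true_theta, rate_bpp = 40.0, -0.7, 0.05
d = model_distortion(true_eta, true_theta, rate_bpp)
lam = model_lambda(true_eta, true_theta, rate_bpp)
est_eta, est_theta = fit_rd_model(rate_bpp, d, lam)
```

Because a single triple determines both parameters exactly under this model, the accuracy of the estimate in practice depends on how well the power function matches the actual R-D behavior of the CTU near the operating point.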

IV. TWO-PASS ENCODING BASED RATE-DISTORTION OPTIMIZATION
In the dependent RDO methods mentioned above, such as adaptive QPC and Lagrange multiplier adaptation, the R-D dependency is exploited to optimize bit allocation. However, these methods do not consider the R-D characteristics of CTUs. In this section, we propose a two-pass encoding framework that combines the R-D model and the TPF to optimize the bit allocation among CTUs under a frame-level bit budget. In the two-pass encoding framework, each frame is encoded twice in succession. In the first pass, the original HEVC scheme is used to encode the frame, but the bitstream is not output; instead, the number of bits consumed by the whole frame and the distortion, rate, and Lagrange multiplier of each CTU in the current frame are recorded. After the first encoding, the bit budget of the current frame and the R-D model of each CTU are available. Then an equation combining the R-D models and TPFs of the CTUs is established to optimize the bit allocation under the bit budget of the current frame. By solving the equation, the optimized Lagrange multiplier and QP of each CTU in the current frame are obtained and used in the second-pass encoding. Specifically, the proposed method is described as follows.

A. PERFORMING FIRST ENCODING
The purpose of the first pass is to collect the related data, including the Lagrange multiplier used, the number of bits consumed by the current frame, and the rate and distortion of each CTU. Therefore, the first encoding adopts the original HEVC scheme to set the QP and Lagrange multiplier, and the bitstream is not output after encoding. Specifically, the QP of the current frame is determined by both the input QP and the frame layer. In the LD configuration, a group of pictures (GOP) includes four frames whose layers are 3, 2, 3, and 1, respectively. The QP of the $n$-th frame is set as

$$QP_n = QP_0 + L_n, \tag{19}$$

where $QP_0$ is the input QP and $L_n$ is the layer of the $n$-th frame.
Then the corresponding Lagrange multiplier is calculated from $QP_n$ as

$$\lambda_n = W_L \cdot 2^{(QP_n - 12)/3}, \tag{20}$$

where $W_L$ is a weighting coefficient related to the frame layer; the lower the layer is, the smaller the coefficient is. In the original HEVC scheme, all CTUs within a frame adopt the same Lagrange multiplier and QP to perform the RDO based mode decision. After the first encoding, the model parameters ($\eta$ and $\theta$) of all CTUs in the current frame are obtained according to (17) and (18). Note that, to avoid the situation in which $R$ is very small because a CTU adopts the skip mode, the value of $R$ is clipped to a certain range, as given in (21). In addition, the encoder needs to be restored to its state before the first encoding in order to prepare for the second encoding.
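The first-pass parameter setting can be sketched as follows. The sketch assumes that the frame QP is the input QP plus the frame layer and that the Lagrange multiplier follows the usual HM form $W_L \cdot 2^{(QP - 12)/3}$; the function names are hypothetical, and the layer weight is passed in by the caller because its values are not given in the text.

```python
# Frame layers within one GOP of the LD configuration, as stated in the text.
GOP_LAYERS = (3, 2, 3, 1)


def frame_layer(frame_index):
    """Layer of the frame with 1-based index inside the sequence (I frame excluded)."""
    return GOP_LAYERS[(frame_index - 1) % len(GOP_LAYERS)]


def frame_qp(input_qp, frame_index):
    """First-pass QP, assuming QP_n = QP_0 + L_n."""
    return input_qp + frame_layer(frame_index)


def frame_lambda(qp, layer_weight):
    """First-pass Lagrange multiplier, assuming lambda = W_L * 2^((QP - 12) / 3)."""
    return layer_weight * 2.0 ** ((qp - 12) / 3.0)


# QPs of the first GOP for an input QP of 32.
first_gop_qps = [frame_qp(32, n) for n in range(1, 5)]
```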

B. PERFORMING SECOND ENCODING
The second encoding adopts a new scheme to encode the current frame; it outputs the bitstream and stores the reconstructed image as usual. Specifically, according to (16), the number of bits consumed by the $i$-th CTU can be expressed as

$$R_i = M_i \left( \frac{\lambda_i}{-\eta_i \theta_i} \right)^{\frac{1}{\theta_i - 1}}, \tag{22}$$

where $M_i$ is the number of pixels in the $i$-th CTU, and $\lambda_i$ is the Lagrange multiplier adopted for encoding the $i$-th CTU.
Taking the TD-RDO into consideration, according to (4), the Lagrange multiplier of the $i$-th CTU should be

$$\lambda_i = \frac{\lambda_g}{1 + k_i}. \tag{23}$$

Therefore, the number of bits consumed by the $i$-th CTU is expressed as

$$R_i = M_i \left( \frac{\lambda_g}{-\eta_i \theta_i (1 + k_i)} \right)^{\frac{1}{\theta_i - 1}} = \left( \frac{a_i}{\lambda_g} \right)^{b_i}, \tag{24}$$

where $a_i = -\eta_i \theta_i (1 + k_i) M_i^{\,1 - \theta_i}$ and $b_i = 1 / (1 - \theta_i)$. Both $a_i$ and $b_i$ are known after the first encoding. Taking the number of bits consumed by the current frame in the first encoding as the bit budget of the second encoding, we obtain the following equation,

$$\sum_{i=1}^{N} \left( \frac{a_i}{\lambda_g} \right)^{b_i} = T_f, \tag{25}$$

where $N$ is the number of CTUs in a frame, and $T_f$ is the available bit budget of the current frame. In (25), only $\lambda_g$ is unknown. Since (25) has the same form as (6), it can be solved by the RTE method to obtain the value of $\lambda_g$. Then the Lagrange multipliers of all CTUs can be calculated by (23).
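The frame-level allocation can be sketched numerically. The sketch assumes the per-CTU model $R_i = M_i \left(\lambda_i / (-\eta_i \theta_i)\right)^{1/(\theta_i - 1)}$ with $\lambda_i = \lambda_g / (1 + k_i)$, which gives $a_i = -\eta_i \theta_i (1 + k_i) M_i^{1 - \theta_i}$ and $b_i = 1 / (1 - \theta_i)$. For brevity it finds $\lambda_g$ by bisection in the log domain as a stand-in for the RTE method; the function names and CTU parameters are hypothetical.

```python
import math


def ctu_coefficients(eta, theta, k, num_pixels):
    """Coefficients (a_i, b_i) such that the CTU bits equal (a_i / lambda_g)^b_i."""
    b = 1.0 / (1.0 - theta)
    a = -eta * theta * (1.0 + k) * num_pixels ** (1.0 - theta)
    return a, b


def frame_bits(lambda_g, coeffs):
    """Total bits of the frame predicted for a given global multiplier."""
    return sum((a / lambda_g) ** b for a, b in coeffs)


def solve_lambda_g(coeffs, bit_budget, lo=1e-6, hi=1e9, iterations=200):
    """Bisection on log(lambda_g); frame_bits decreases monotonically in lambda_g."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if frame_bits(mid, coeffs) > bit_budget:
            lo = mid  # too many bits: lambda_g must grow
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical CTUs: (eta, theta, TPF k, number of pixels).
ctus = [(30.0, -0.8, 0.5, 4096), (12.0, -1.1, 2.0, 4096), (55.0, -0.6, 0.0, 4096)]
coeffs = [ctu_coefficients(*ctu) for ctu in ctus]

# Build a budget from a known multiplier and recover it.
budget = frame_bits(45.0, coeffs)
lambda_g = solve_lambda_g(coeffs, budget)
ctu_lambdas = [lambda_g / (1.0 + k) for _, _, k, _ in ctus]
```

CTUs with a larger TPF receive a smaller Lagrange multiplier and therefore more bits, while the sum of the predicted bits stays at the frame budget.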
In the second encoding, each CTU adopts a different Lagrange multiplier. Therefore, the QP of each CTU should be adjusted accordingly. The QP is set by the QP refinement according to the Lagrange multiplier [40] as follows,

$$QP_i = 4.2005 \ln \lambda_i + 13.7122. \tag{26}$$

Note that the first-pass and second-pass encodings use the same reference frames, which are the previously reconstructed images produced by the second encoding. In addition, considering that the coding quality of the I frame has a great influence on the coding distortion of the subsequent frames, the QP of the I frame is appropriately reduced according to the input QP range as follows,

$$QP_I = QP_0 - \Delta QP, \tag{27}$$

where the value of $\Delta QP$ ranges from 0 to 4 depending on the input QP. Specifically, $\Delta QP$ is set as given in (28). Finally, combining the previous description, Algorithm 1 summarizes the process of the two-pass encoding based RDO method.
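The mapping from a CTU Lagrange multiplier to its QP can be sketched as below. The sketch assumes that the refinement of [40] takes the form used in the λ-domain rate control of HM, $QP = 4.2005 \ln \lambda + 13.7122$; rounding to the nearest integer and clipping to the HEVC range [0, 51] are additional assumptions of this illustration, and the function name is hypothetical.

```python
import math


def qp_from_lambda(lam, qp_min=0, qp_max=51):
    """CTU-level QP from its Lagrange multiplier (assumed HM-style mapping)."""
    qp = int(round(4.2005 * math.log(lam) + 13.7122))
    return max(qp_min, min(qp_max, qp))  # clip to the valid QP range (assumption)


# A smaller multiplier (more important CTU) maps to a smaller QP.
qp_important = qp_from_lambda(20.0)
qp_regular = qp_from_lambda(60.0)
```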

V. EXPERIMENTAL RESULTS
In this section, we evaluate the performance of the proposed two-pass encoding based RDO method on the HM16.7 platform. We first introduce the simulation setup and the selected competitors. Then we present the experimental data and compare the R-D performance. Finally, we analyze the rate distribution and coding quality, and discuss the encoding complexity.

A. SIMULATION SETUP
The proposed method was implemented in the HEVC reference software HM16.7 for performance comparisons. The original HM16.7 without rate control is set as the benchmark. In addition, the following five methods are selected as competitors. The multiple QP optimization with 5 QP candidates is referred to as MQP-5 [16]. The temporal dependency based Lagrange multiplier adaptation, which is our previous work, is referred to as TD-LMA [23]. The temporally dependent rate-distortion optimization for low-delay hierarchical video coding, which we ported from HM13.0 to HM16.7, is referred to as TD-RDO [25]. The λ domain optimal bit allocation algorithm, which is integrated in the rate control of HM16.7, is referred to as λ-OBA [34]. The optimal bit allocation at frame level for rate control, which is our previous work, is referred to as OBA-F [35].
The simulation environment is set as suggested by the CTC [39], and the LDB and LDP configurations were tested. The test sequences are all 16 video sequences from Classes B, C, D, and E suggested by the CTC, with varying resolutions and motion characteristics. According to the CTC, each video sequence was encoded by each method at four rate points. Specifically, the original HM16.7, the proposed method, the MQP-5 [16], the TD-LMA [23], and the TD-RDO [25] set the input QP to 22, 27, 32, and 37. The λ-OBA [34] and the OBA-F [35] set the target bit rate to the four bit rates generated by the original HM16.7 with the four input QPs (22, 27, 32, 37). Table 2 and table 3 provide the rate and peak signal-to-noise ratio (PSNR) generated by the different methods under the LDB and LDP configurations, where the PSNR refers to that of the luminance component. The four test sequences shown in the tables have different resolutions and are selected from Classes B, C, D, and E, respectively. In the tables, the maximum PSNR in each row is shown in bold. We can see that the proposed method achieves the maximum PSNR in most cases at the same input QP. By contrast, the PSNRs obtained by the TD-LMA [23] and the TD-RDO [25] are lower in almost all cases. However, this does not mean that the compression efficiency of the TD-LMA [23] and the TD-RDO [25] is worse than that of the original HM16.7, because the rates generated by the TD-LMA [23] and the TD-RDO [25] are also relatively low among the seven methods at the same input QP.

B. PERFORMANCE COMPARISON
The λ-OBA [34] and OBA-F [35] are optimization methods for R-λ model based rate control. Therefore, the rates generated by these two methods are identical to those generated by the original HM16.7, and from table 2 and table 3 we can directly observe the compression efficiency of λ-OBA [34] and OBA-F [35] relative to the original HM16.7. The other competitors, in contrast, are RDO methods that do not strictly limit the number of bits consumed during encoding. In most cases, at the same input QP, the rates generated by the TD-LMA [23] and TD-RDO [25] are slightly lower than that generated by the original HM16.7, while the rates generated by the MQP-5 [16] and the proposed method are slightly higher. Therefore, we cannot directly observe the compression efficiency of these competitors relative to the original HM16.7 from table 2 and table 3. To visualize the comparison of compression efficiency, figure 3 shows the R-D curves of two sequences encoded by the seven methods under the LDB and LDP configurations, respectively. In the figure, the higher the curve, the better the coding quality at the same rate. It can be seen that for the sequence BasketballDrill under the LDB configuration, all competitors outperform the original HM16.7, and the coding quality of the proposed method is the best at low, medium, and high bit rates. Moreover, for the sequence FourPeople under the LDP configuration, the coding quality of the proposed method is also the best at low and medium bit rates, but it is slightly worse at high bit rates.
For a more comprehensive comparison of the R-D performance, the Bjøntegaard delta bit rate (BD-rate) is used as the metric, and the original HM16.7 without rate control is set as the benchmark. Note that the BD-rate indicates the bit rate saving of the tested method against the benchmark at the same coding quality; a positive value means a performance loss, while a negative value means a performance improvement. Table 4 and table 5 provide the BD-rates of each test sequence for the above six methods under the LDB and LDP configurations, respectively. In these tables, Y, U, and V indicate the BD-rates calculated from the PSNR of the luminance and the two chrominance components, respectively. From table 4 and table 5, we can see that for the luminance component, the R-D performance of all competitors except the λ-OBA [34] is better than that of the benchmark. For the proposed two-pass encoding based RDO method, average R-D performance improvements of 5.3% and 5.6% are observed for the luminance component under the LDB and LDP configurations, respectively, and a bit rate saving of up to 13% is achieved for the sequence FourPeople under the LDP configuration. A more reasonable bit allocation is the reason for this significant R-D performance improvement: the proposed method not only considers the R-D dependency but also incorporates the R-D characteristics to optimize bit allocation at CTU level. In addition, improving the coding quality of the first frame also helps to improve the coding performance of the whole video sequence, especially for test sequences with a relatively fixed background, such as the sequences in Class E. Overall, the proposed method achieves better R-D performance than the other competitors.

C. DISCUSSIONS
The proposed two-pass encoding based RDO method combines both the R-D dependency among coding units and the R-D characteristics of coding units to adaptively determine the Lagrange multiplier and QP of each CTU. Therefore, it obtains better R-D performance than the other competitors. To further verify the effectiveness of the proposed method, we give a subjective quality comparison of two test sequences encoded by the original HM16.7 and by the proposed method. As is well known, for a given frame, the more bits consumed, the better the coding quality. Therefore, to make the subjective quality comparison convincing, figure 4 first shows the number of bits and the corresponding PSNR of each frame encoded by the original HM16.7 and by the proposed method. We can see that the number of bits consumed by the same frame in the HM16.7 and in the proposed method is almost the same, but the PSNR of the proposed method is always higher than that of the HM16.7. We select the 82nd frame of BasketballDrill and the 78th frame of FourPeople for subjective quality comparison, because these frames consume a similar number of bits under the two methods. Figure 5 shows the decoded pictures and their locally enlarged regions, where (a) and (b) are the 82nd frame of BasketballDrill encoded by the original HM16.7 and by the proposed method, respectively. From the locally enlarged regions, we can see that the basketball hoop and the floor in the proposed method are clearer than those in the original HM16.7, and the wood grain of the floor is effectively preserved by the proposed method. In figure 5, (c) and (d) are the 78th frame of FourPeople encoded by the original HM16.7 and by the proposed method, respectively. Similarly, we can see that the proposed method obtains better subjective quality.
Note that, for the 78th frame of FourPeople, the total number of bits generated by our method is 1152, while the total number of bits generated by the original HM16.7 is 1176, which is slightly more than that generated by our method.
Regarding the encoding complexity, since the proposed algorithm performs two-pass encoding, the encoding time is expected to increase by about 100%. In addition, the MQP-5 [16] and TD-LMA [23] are also time-consuming methods. Therefore, table 6 presents the encoding time ratio of these three methods relative to the baseline, computed as the ratio of the geometric means of the encoding times over all sequences and four QPs. We can see that the encoding complexity of our approach is about twice that of the original HEVC encoder. Note that the bitstream generated by the proposed scheme is HEVC compliant, so the decoding complexity does not change.

VI. CONCLUSION
In this paper, we have presented a two-pass encoding based RDO method, whose goal is to further improve the compression efficiency of HEVC encoders. The existing dependent RDO methods adjust Lagrange multipliers or QPs by exploiting temporal dependencies among coding units, and achieve R-D performance improvements to some extent. Different from the existing methods, the proposed algorithm adaptively determines the Lagrange multiplier and QP of each CTU according to not only the R-D dependency among coding units but also the R-D characteristics of the coding units. Experimental results show that, under the low-delay B and low-delay P configurations, the proposed method achieves average bit rate savings of 5.3% and 5.6%, respectively, with about twice the encoding complexity of the original HEVC encoder. Improving the speed of the second pass by reusing coding information from the first pass is an interesting topic for future work.