Optimal CTU-Level Rate Control Model for HEVC Based on Deep Convolutional Features

This paper proposes an optimal rate control model based on deep neural network (DNN) features to improve the coding tree unit (CTU)-level rate control of high efficiency video coding (HEVC) for conversational videos. The proposed algorithm extracts high-level features from the original and previously reconstructed CTU blocks using a predefined DNN model, the visual geometry group (VGG-16) network. Then, the correlation between the high-level features and the quantization parameter (QP) values of previously coded CTUs is explored to capture subjective visual characteristics and to estimate the CTU-level rate control model parameters (alpha and beta) and the bit allocation of each CTU. In addition, this paper proposes a new Lambda estimation model for each CTU that improves the relationship between Lambda and the estimated bits per pixel, in order to control the rate and distortion. Furthermore, the Lambda and QP boundary settings are adjusted based on the proposed perceptual model to ensure the rate control accuracy of each CTU. Experimental results show that, compared with the rate control model in HM-16.20 under the low-delay-P configuration, the proposed algorithm achieves higher bitrate accuracy and an average BD-rate gain based on the PSNR, SSIM, and MSSSIM metrics.


I. INTRODUCTION
Rate control is important in all video coding applications for optimizing visual quality by appropriately allocating bits at each rate control stage, namely the group-of-pictures level, the picture level, and the block level, for a given bitrate. An excellent rate control model must assign an accurate number of bits at each stage while maintaining good visual quality in the compressed video. Conversely, a rate control model can be designed to require a lower bitrate, alleviating bandwidth bottlenecks while maintaining the same subjective image quality [1]. Many rate control algorithms, for example, TM5 for MPEG-2 [2], VM8 for MPEG-4 [3], and those for H.264/AVC, have been studied extensively for different video coding standards [4]. During the standardization of high efficiency video coding (HEVC), the latest video coding standard, which adopts several advanced technologies [6]-[15], two rate control models were proposed and implemented in the HM reference software. The pixel-wise unified rate quantization (URQ) model [16], [17] was the first rate control algorithm adopted, in HM-6.0. The URQ model selects the quantization parameter (QP) applied to a coding tree unit (CTU) block so as to meet a target bitrate expressed at the pixel level. However, this is effective only when all coding parameters other than the QP are fixed. Moreover, the QP computation suffers from the well-known 'chicken and egg' dilemma: the QP is needed before rate-distortion optimization (RDO), whereas the content statistics used to compute it become available only after RDO. Hence, the rate-distortion (R-D) performance and bitrate accuracy of URQ are not optimal at the CTU level. Another rate control approach introduced for HEVC is the R-λ model [18], [19], which computes the target bitrate of a CTU in the λ domain and demonstrated better rate control performance than several existing algorithms. However, the R-λ model for HEVC faces several challenges in estimating proper model parameters.
The associate editor coordinating the review of this manuscript and approving it for publication was Byung-Gyu Kim.
At the CTU level, the R-λ model cannot achieve optimal performance for higher-resolution videos in low-bitrate scenarios [20], especially for videos coded under a low-delay configuration [21], [22]. Owing to the characteristics of the human visual system (HVS), a video sequence contains considerable perceptual redundancy that can be further exploited to significantly improve both the coding efficiency and the rate control performance of HEVC. Note that CTU-level rate control is essential for regulating bit allocation, improving coding performance, and comprehensively controlling the visual quality of a compressed video. Regarding the use of perceptual redundancy to improve coding efficiency, our previous study found that a perceptually adaptive frame-level QP can significantly improve the coding efficiency of HM-16.20 by exploiting the subjective characteristics of a video sequence [23].
This paper presents a CTU-level rate control algorithm for the HEVC encoder based on deep neural network (DNN) features. Our investigations confirm that the existing R-λ model estimation does not accurately represent the relationship between λ and the model parameters (α and β). Hence, the estimation process does not yield a proper R-λ relationship, which leads to inaccurate rate control and a failure to minimize distortion. Several studies have demonstrated that the efficiency of CTU-level rate control in HEVC can be improved by reducing the error due to the selected R-D model [24]-[26] and the error due to inaccurately estimated model parameters [27]-[30]. In this study, the proposed algorithm establishes a new model for estimating α and β that improves their correlation with the estimated λ, based on the high-level features and QP values of previously coded CTUs. As in our previous study [23], high-level features are obtained from the original and reconstructed CTUs using a pre-trained visual geometry group (VGG-16) network model [23], [31]. Based on the proposed α and β estimation models, the R-λ estimation for each CTU is then reformulated to minimize distortion over the entire frame at a given target bitrate. The QP estimation and the boundary settings of λ and QP are also adjusted based on the extracted high-level features to maintain visual quality and ensure the bit accuracy of each CTU. Evaluated against HM-16.20 and other algorithms using the PSNR, SSIM, and MSSSIM indices, the proposed algorithm demonstrates significant coding gains with notable visual quality enhancement.
The remainder of this paper is organized as follows. Section II briefly discusses CTU-level rate control in HM and related works. Section III details the proposed CTU-level rate control for HM. The proposed algorithm is evaluated in Section IV, and the paper is concluded in Section V.

II. CURRENT STATE OF R−λ MODEL FOR CTU-LEVEL RATE CONTROL ALGORITHM IN HEVC ENCODER
The R-λ rate control model treats λ as a crucial parameter of the rate control loop and estimates it as

$$\lambda = \alpha \times bpp^{\beta} \quad (1)$$

where bpp is the bits-per-pixel term corresponding to the estimated bitrate of a CTU, and α and β are content-dependent tuning parameters that are updated after encoding and applied to the co-located CTU of subsequent frames. In the HEVC reference software, α is initialized to +3.2003 and β to −1.367. Different initial values of α and β are claimed to have little impact on the R-D performance and bitrate accuracy of the compressed videos [19]. However, during the updating process, neither model parameter can accurately fit the relationship between distortion and bitrate for each CTU [24]-[30], because the actual λ value λ real is, in general, not equal to α × (bpp real)^β. The interrelationship of α and β with λ is therefore also critical to the performance of the R-λ rate control model. Here, bpp real and λ real denote, respectively, the actual consumed bpp and the actual λ used to calculate the QP after encoding each CTU. Once the bpp of a CTU is estimated, λ is computed from Eq. (1), and this estimate then affects the subsequent boundary adjustment of λ, the QP estimation, and the QP boundary adjustment; see [18], [19], [32], [33] for details.

Many studies have attempted to improve CTU-level rate control for HEVC. Li et al. [33] proposed a weight-based R-λ rate control that uses the visual variation of pixels within a CTU to enhance the perceptual quality of conversational videos. However, their visual attention model affects only the region of interest, which may cause imbalanced bit allocation in other regions and degrade the accuracy of the R-λ model. Li et al. [28] also proposed an optimal bit allocation (OBA) algorithm based on a recursive Taylor expansion, arguing that the bit allocation of the existing CTU-level R-λ model is not optimal for the low-delay-P (LDP) case. However, this algorithm does not consider the initial QP, which leads to marginal gains when sequences are coded with higher initial QP values. Li et al. [29] proposed a model parameter estimation (MPE) algorithm for α and β in CTU-level rate control by analyzing how the CTU scan order correlates with the complexity of the CTU contents. However, it does not examine the bit allocation, λ, and QP estimation models, which limits its performance. Wang et al. [30] presented a rate control algorithm based on an improved λ parameter that suppresses bit fluctuation and improves video quality. However, it does not adequately account for the characteristics of the test sequences; in short, it estimates the model parameters inaccurately. Zhou et al. [26] designed an SSIM-based R-D model to improve CTU-level rate control. Their algorithm formulates CTU-level bit allocation as a global optimization problem but disregards several model estimation issues present in CTU-level rate control, which limits its overall performance.
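To make Eq. (1) concrete, the following minimal sketch evaluates the R-λ model with the initial parameters quoted above (α = 3.2003, β = −1.367) and converts the resulting λ to a QP. The λ-to-QP mapping QP = 4.2005 ln λ + 13.7122 is the HM-style mapping that reappears, with an extra VGG term, in Eq. (9) of Section III; the rounding, the clipping to [0, 51], and the 1500-bit CTU budget are assumptions chosen only for illustration.

```python
import math

# Initial R-lambda parameters quoted in the text.
ALPHA_INIT = 3.2003
BETA_INIT = -1.367


def estimate_lambda(bpp: float, alpha: float = ALPHA_INIT,
                    beta: float = BETA_INIT) -> float:
    """Eq. (1): lambda = alpha * bpp ** beta, for a positive bpp."""
    if bpp <= 0:
        raise ValueError("bpp must be positive")
    return alpha * bpp ** beta


def lambda_to_qp(lam: float) -> int:
    """HM-style mapping QP = 4.2005 * ln(lambda) + 13.7122.

    Rounding to an integer and clipping to [0, 51] are illustrative
    assumptions (8-bit video, no per-CTU boundary adjustment).
    """
    qp = 4.2005 * math.log(lam) + 13.7122
    return max(0, min(51, int(round(qp))))


if __name__ == "__main__":
    target_bits = 1500       # hypothetical bit budget of one CTU
    n_pixels = 64 * 64       # 64x64 CTU
    bpp = target_bits / n_pixels
    lam = estimate_lambda(bpp)
    print(f"bpp={bpp:.4f} lambda={lam:.2f} QP={lambda_to_qp(lam)}")
```

Because β is negative, a smaller bit budget (lower bpp) yields a larger λ and therefore a larger QP, which is why errors in α and β propagate directly into the QP decision.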
In conclusion, accurate estimation models in CTU-level rate control are crucial for obtaining high bitrate accuracy in any rate control model. This paper presents a new estimation model for the α and β parameters, as well as estimation models for the bit allocation, the parameter λ, the QP decision, and the boundary adjustment of both λ and QP for CTU-level rate control in the HEVC encoder. Furthermore, an adaptive frame-level QP decision [23] is used together with the proposed CTU-level rate control algorithm. The proposed algorithm is designed to improve rate control in the HM-16.20 reference software, whose performance degrades significantly when rate control is enabled. The proposed algorithm achieves a coding gain of around 20% and enhances subjective quality compared with HM-16.20 with rate control enabled. It is based on high-level features extracted from a predefined DNN model [31]. The use of DNN models in video coding is becoming increasingly attractive to the video coding community [34]-[45]; however, investigations into using DNN models specifically for CTU-level rate control of conversational videos remain rare. Therefore, this paper presents a CTU-level rate control algorithm for HEVC based on a predefined VGG network. The code of the proposed algorithm is available online: https://bit.ly/2QRxOjA.
VOLUME 8, 2020

III. PROPOSED CTU-LEVEL RATE CONTROL ALGORITHM FOR THE HEVC ENCODER
The R-λ rate control model in HEVC can be written as Eq. (1). It involves three main factors: the estimated bpp and the two model parameters α and β. In the CTU-level rate control of HM-16.20, the bpp term is calculated using α and β, which may therefore also influence the accuracy of the bpp estimation. Regardless of the block characteristics, α and β are initialized for all CTU blocks in the first frame with fixed values of +3.2003 and −1.367, respectively. Although the original study [18], [19] argues that α and β are determined by the video contents, the model parameters in the existing CTU-level rate control are still unable to represent the bpp-λ relationship accurately. Our investigation indicates that improperly estimated α and β parameters are responsible for the failure to achieve minimal distortion and accurate rate control. Figure 1 shows an example of the visual quality of the 'BQTerrace' sequence encoded by the existing CTU-level rate control model in HM-16.20 under the LDP configuration. The subjective quality of the sequence gradually decreases from the first frame to the last. This phenomenon is caused mainly by the failure of the rate control to estimate the model parameters properly. Therefore, strong α-λ and β-λ interrelationships are crucial for CTU-level rate control to improve the bpp-λ relationship, which in turn enhances the visual quality of a sequence.
The proposed algorithm explores the use of visual features extracted from a particular convolutional layer of a DNN model for CTU-level rate control. Specifically, a pre-trained DNN model is used to tackle model parameter estimation in CTU-level rate control, with the aim of improving rate control performance for conversational video services. Both spatial and temporal features are considered for estimating the α and β parameters, which also shapes the design of the other estimation processes, including the estimation of bit allocation, λ, and QP. In addition, the boundary settings of λ and QP are fine-tuned to be more perceptually friendly. Since rate control algorithms also depend on the initial QP of a frame, the proposed CTU-level rate control adopts the perceptually adaptive QP decision of our previous study to determine the frame-level initial QP; the detailed algorithm is described in [23]. Figure 2 illustrates the flow of the proposed algorithm. The proposed CTU-level rate control is applied after the frame-level QP initialization of [23]. The VGG feature is first estimated within the CTU-level rate control loop. Note that this VGG feature differs from the one in our previously published work. For the first frame of a sequence, a simpler procedure is used: the standard deviation (StD) of the original CTU block is examined to estimate the model parameters of each CTU in the intra frame, while the bit allocation, λ, and QP estimations are kept identical to the default HM-16.20 processes. After the first intra frame is encoded, a pre-trained VGG-16 model is employed to extract visual features from the original and reconstructed CTUs to estimate the α and β parameters, λ, and QP, as well as the boundary settings of λ and QP, for the subsequent frames.
The extracted visual features yield a perceptual loss value based on the Euclidean distance measure.
A detailed description is given in the following subsections.

A. ESTIMATION OF THE PROPOSED VGG FEATURE
As illustrated in Figure 3, this section builds on our previously published work [23] to further investigate the visual features extracted from the original and reconstructed CTUs using a pre-trained VGG-16 network. The VGG-16 network pre-trained on the ImageNet dataset [31] is employed directly to examine the HM-16.20 CTU-level rate control model, without a separate training phase. The reasons for using the pre-trained VGG-16 network are discussed in our previous paper [23]. Briefly, VGG-16 has a very deep stack of convolutional layers whose filters learn to capture generic visual patterns that generalize well. Therefore, many studies use the predefined VGG-16 model as a feature extractor. Note that the visual feature in our previous study [23] is the averaged VGG feature of all CTUs within a picture, used to determine the frame-level adaptive QP, which differs from the feature used for the proposed CTU-level rate control.
In the proposed CTU-level rate control, individual CTU-level VGG features are used instead of averaged VGG features to support the model estimations. The proposed algorithm exploits the correlation between the visual feature and the QP value of each CTU, as shown in Figure 4, where the 'BlowingBubbles' test sequence is coded with QPs of 22, 27, 32, and 37. The normalized StD values of the original CTU blocks coded under the 'All Intra' configuration in Figure 4(a) were observed to correlate strongly with the selected QPs. The StD of the original CTU block serves as the visual characteristic in the proposed CTU-level rate control for the first frame only; CTUs in all subsequent frames use the proposed VGG feature. Figure 4(b) shows a high correlation between the VGG feature and the per-CTU QP selection under the IPPP structure.
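The per-CTU feature selection described in this subsection can be sketched as follows: for POC = 0 the feature is the StD of the original CTU samples, and afterwards it is a perceptual loss, taken here as the Euclidean distance between deep features of the original and reconstructed CTU. This is a sketch under stated assumptions, not the authors' code: `feature_fn` is a hypothetical stand-in for a pre-trained VGG-16 layer (the text does not specify the layer or the preprocessing), and the trivial `flatten` extractor exists only so the example runs without a DNN.

```python
import math
from typing import Callable, List, Sequence

Block = List[List[float]]  # 2-D block of luma samples


def block_std(block: Block) -> float:
    """Population standard deviation of all samples in a CTU block."""
    samples = [s for row in block for s in row]
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance between two flattened feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ctu_feature(poc: int, original: Block, reconstructed: Block,
                feature_fn: Callable[[Block], Sequence[float]]) -> float:
    """Visual feature of a coded CTU.

    POC 0: StD of the original block.
    Later frames: Euclidean distance between the deep features of the
    original and reconstructed blocks. `feature_fn` is a hypothetical
    placeholder for a pre-trained VGG-16 feature extractor.
    """
    if poc == 0:
        return block_std(original)
    return euclidean_distance(feature_fn(original), feature_fn(reconstructed))


def flatten(block: Block) -> List[float]:
    """Trivial placeholder extractor, used only for the example below."""
    return [s for row in block for s in row]


if __name__ == "__main__":
    orig = [[10.0, 12.0], [14.0, 16.0]]
    recon = [[11.0, 12.0], [14.0, 15.0]]
    print(ctu_feature(0, orig, recon, flatten))  # StD of the original block
    print(ctu_feature(1, orig, recon, flatten))  # feature-space distance
```

In a real encoder, `feature_fn` would run the CTU through the chosen VGG-16 layer, and the value returned for a coded CTU would serve as VGG prevCTU for the next CTU.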

B. ESTIMATION OF BIT ALLOCATION AND MODEL PARAMETERS FOR THE PROPOSED CTU-LEVEL RATE CONTROL ALGORITHM
The R-λ relationship in Eq. (1) of the CTU-level rate control model in HM-16.20 is shown to be inadequate for optimally allocating bits to each CTU, and its model estimations are inaccurate. The estimated α, β, and bpp are the key parameters for overcoming these disadvantages. First, the estimated α, β, and λ of each CTU in the first 17 frames of the 'BlowingBubbles' test sequence are collected. Then, the correlations of the estimated α and β parameters with the λ values of all CTUs within a frame are computed using the Pearson product-moment correlation coefficient, and the results are tabulated per frame in Table 1. The table shows that the α estimated-λ and β estimated-λ correlations in the CTU-level rate control of HM-16.20 are very weak, which is detrimental to the bpp-λ relationship. In particular, when POC = 0, α estimated and β estimated are set to their default values and therefore show no correlation with the estimated λ in Table 1. Inaccurate α estimated and β estimated in the first frame lead to misallocated bits, inaccurate λ estimates, increased distortion, and decreased visual quality, and these effects may propagate to the following frames.
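The per-frame analysis behind Table 1 relies on the Pearson product-moment correlation coefficient. A minimal sketch of that computation over the CTUs of one frame follows; the sample values in the usage lines are made up for illustration and are not data from Table 1. It also shows why the POC = 0 case yields no correlation: when every CTU carries the same default α or β, the variance is zero and the coefficient is undefined, so the sketch returns NaN.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples.

    Returns NaN when either sample is constant (zero variance), e.g. when
    every CTU in a frame still carries the default alpha or beta.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    if len(set(x)) == 1 or len(set(y)) == 1:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Illustrative (made-up) per-CTU values for one frame.
    alpha_est = [3.1, 3.4, 2.9, 3.8, 3.3]
    lam_est = [40.0, 46.0, 37.0, 55.0, 44.0]
    print(f"r(alpha, lambda) = {pearson(alpha_est, lam_est):.3f}")
    print(pearson([3.2003] * 5, lam_est))  # default alpha at POC 0 -> nan
```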
To improve the α estimated-λ and β estimated-λ correlations in the CTU-level rate control, the estimation of the α and β parameters is analyzed by exploiting the relationship between the VGG feature and the QP value shown in Figure 4, before the encoding process is completed. Following the existing updating process in [18], [19], the updated parameters of a CTU, denoted α new and β new, are computed as

$$\alpha_{new} = \alpha_{estimated} + \delta_{\alpha} \times (\ln \lambda_{real} - \ln \lambda_{comp}) \times \alpha_{estimated} \quad (2)$$

$$\beta_{new} = \beta_{estimated} + \delta_{\beta} \times (\ln \lambda_{real} - \ln \lambda_{comp}) \times \ln bpp_{real} \quad (3)$$

According to [18], [19], Eq. (3) can also be reformulated as Eq. (4). Here, α estimated and β estimated denote the estimated α and β, δ α and δ β are constants, bpp real is the number of bits per pixel actually consumed during the encoding process, and λ real and λ comp are the actual and the estimated λ, respectively. Inspired by the updating process of α and β in Eq. (2) and (4), the VGG feature and QP value of each CTU were investigated experimentally to strengthen the α estimated-λ and β estimated-λ correlations. The parameters λ real, bpp real, δ α, and δ β in Eq. (2) and (4) are replaced by terms based on VGG prevCTU and QP prevCTU, yielding Eq. (5) and (6), which can be evaluated before the encoding process. Here, VGG prevCTU and QP prevCTU represent the VGG feature and QP value of the previously coded CTU. For CTUs of POC = 0, the StD value of the original CTU block is used as the VGG feature, and the QP value is set to the frame-level initial QP. As a result, the proposed estimates α new and β new show stronger α new-λ and β new-λ correlations than the existing α and β estimates in HM-16.20, as shown in Table 1. Consequently, a bit allocation formula based on α new and β new is also proposed for the CTU-level rate control, given as Eq. (7), where bpp CTU is the bits-per-pixel allocation of the current CTU, T CTU is the target bits of the current CTU, and N CTU is the total number of pixels in a CTU.
The other parameters, T pic and λ pic, are the target bits and the estimated λ from the frame-level rate control. In (7), η CTU represents the weight of the current CTU and is calculated to satisfy the T CTU constraint.
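For reference, the existing post-encoding update of Eq. (2) and (3), on which the proposed Eq. (5) and (6) are based, can be sketched as follows, together with bpp CTU = T CTU / N CTU. In this sketch, λ comp is taken as the λ that the current model predicts at the bits actually spent, α estimated × (bpp real)^β estimated, consistent with the statement in Section II that λ real generally differs from α × (bpp real)^β. The step sizes δ α = 0.1 and δ β = 0.05 are assumed values for illustration, HM's additional clipping is omitted, and the proposed Eq. (5) to (7) are not reproduced because their exact form is not available in this text.

```python
import math
from typing import Tuple


def update_alpha_beta(alpha: float, beta: float, bpp_real: float,
                      lambda_real: float,
                      delta_alpha: float = 0.1,
                      delta_beta: float = 0.05) -> Tuple[float, float]:
    """One R-lambda parameter update after encoding a CTU (Eq. (2) and (3)).

    lambda_comp is the lambda predicted by the current model at the bpp that
    was actually spent; the log-ratio ln(lambda_real) - ln(lambda_comp) drives
    both updates. Default step sizes are illustrative assumptions, and HM's
    extra clipping is omitted.
    """
    lambda_comp = alpha * bpp_real ** beta
    err = math.log(lambda_real) - math.log(lambda_comp)
    alpha_new = alpha + delta_alpha * err * alpha
    beta_new = beta + delta_beta * err * math.log(bpp_real)
    return alpha_new, beta_new


def ctu_bpp(target_bits: float, n_pixels: int) -> float:
    """bpp_CTU = T_CTU / N_CTU."""
    return target_bits / n_pixels


if __name__ == "__main__":
    a, b = update_alpha_beta(3.2003, -1.367, bpp_real=0.5, lambda_real=100.0)
    print(f"alpha_new={a:.4f} beta_new={b:.4f}")
```

When the model already predicts λ real exactly, the log-ratio is zero and α and β are left unchanged; the weaker this feedback tracks the content, the weaker the correlations reported in Table 1.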

C. ESTIMATION OF λ AND QP FOR THE PROPOSED CTU-LEVEL RATE CONTROL
Eq. (5) to (7) improve the estimation of α, β, and the bit allocation of each CTU, which also contributes to distortion minimization and better R-D performance.
However, to further improve the bpp-λ relationship in (1), a new λ estimation model, λ new, is proposed as Eq. (8). The term VGG prevCTU × 2.07052 + 16.771 in Eq. (8) is introduced to minimize distortion and to make the rate-distortion tradeoff more visually friendly. To observe how well these statistical models fit the experimental data, the correlation between bpp CTU and λ new of each CTU block in several frames of the 'BlowingBubbles' sequence is depicted in Figure 5. As shown in Figure 5, the bpp CTU-λ new relationship closely matches the models in (7) and (8) with the adaptive α new and β new parameters. Note that the α new and β new values shown in Figure 5 are averages over all CTUs in a frame. Table 2 compares, in percent, the proposed bpp CTU-λ new relationship with the existing bpp-λ relationship in HM-16.20 for the first 17 frames of the 'BlowingBubbles' sequence. The relationship models of the proposed algorithm and HM-16.20 are compared using the coefficient of determination R², illustrated in Figure 5, which takes values in the range [0, 1]; an R² closer to 1 indicates a better fit. The proposed models tend to show better bpp-λ correlations than the existing HM models.
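The R² values in Figure 5 and Table 2 quantify how well a power law of the form of Eq. (1) fits measured (bpp, λ) pairs. The text does not state the fitting procedure, so the sketch below assumes one common choice: ordinary least squares on ln λ = ln α + β ln bpp, with R² computed in that log domain. The sample points in the usage lines are synthetic, not measurements from the 'BlowingBubbles' sequence.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(bpp: Sequence[float],
                  lam: Sequence[float]) -> Tuple[float, float, float]:
    """Fit lambda = alpha * bpp**beta by least squares in the log-log domain.

    Returns (alpha, beta, r2), where r2 is the coefficient of determination
    of the linear fit ln(lambda) = ln(alpha) + beta * ln(bpp). Fitting in the
    log domain is an assumption, not necessarily the authors' procedure.
    """
    if len(bpp) != len(lam) or len(set(bpp)) < 2:
        raise ValueError("need equal-length data with at least two distinct bpp values")
    xs = [math.log(b) for b in bpp]
    ys = [math.log(v) for v in lam]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    ln_alpha = my - beta * mx
    ss_res = sum((y - (ln_alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = float("nan") if len(set(lam)) == 1 else 1.0 - ss_res / ss_tot
    return math.exp(ln_alpha), beta, r2


if __name__ == "__main__":
    bpp = [0.05, 0.1, 0.2, 0.4, 0.8]
    noise = [1.1, 0.9, 1.05, 0.95, 1.0]   # synthetic multiplicative noise
    lam = [3.2003 * b ** -1.367 * e for b, e in zip(bpp, noise)]
    print(fit_power_law(bpp, lam))
```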
Finally, based on the proposed visual feature and the newly estimated λ new, the estimated QP, denoted QP new, is computed as

$$QP_{new} = 4.2005 \times \log \lambda_{new} + 13.7122 + VGG_{prevCTU} \quad (9)$$

In addition, the boundary settings of λ and QP for every CTU are adjusted to impose an appropriate bit allocation on each CTU based on its visual feature characteristic. Specifically, the λ new boundary setting is defined in Eq. (10), and, correspondingly, the QP new boundary smoothing is modified as in Eq. (11), where λ prevCTU and QP prevCTU denote the newly estimated λ and QP of the previously coded CTU block. Through (10) and (11), the proposed CTU-level rate control allows wider variation in λ new and QP new, facilitating better visual quality for the coded CTU based on the proposed visual feature. Hence, CTUs with a higher VGG prevCTU may have more latitude for visual quality improvement.
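Eq. (9) can be sketched as below, assuming a natural logarithm as in the HM-style λ-to-QP mapping. Because Eq. (10) and (11) are not reproduced in this text, the clipping stage uses placeholder bounds that are assumptions, not the paper's equations: λ within a factor of 2^(±1/3) of λ prevCTU and QP within ±1 of QP prevCTU, then [0, 51]. The proposed bounds depend on VGG prevCTU and allow wider variation than these placeholders.

```python
import math
from typing import Tuple


def clip(lo: float, hi: float, v: float) -> float:
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def estimate_ctu_qp(lambda_new: float, vgg_prev: float,
                    lambda_prev: float, qp_prev: int) -> Tuple[float, int]:
    """Clip lambda_new around the previous CTU, then apply Eq. (9).

    Placeholder bounds (assumptions, not the paper's Eq. (10)/(11)):
      lambda in [lambda_prev * 2**(-1/3), lambda_prev * 2**(1/3)]
      QP     in [qp_prev - 1, qp_prev + 1], then [0, 51]
    """
    lam = clip(lambda_prev * 2 ** (-1 / 3), lambda_prev * 2 ** (1 / 3), lambda_new)
    qp = 4.2005 * math.log(lam) + 13.7122 + vgg_prev  # Eq. (9), natural log assumed
    qp_int = int(round(clip(qp_prev - 1, qp_prev + 1, qp)))
    return lam, int(clip(0, 51, qp_int))


if __name__ == "__main__":
    print(estimate_ctu_qp(lambda_new=50.0, vgg_prev=0.0, lambda_prev=50.0, qp_prev=30))
    print(estimate_ctu_qp(lambda_new=50.0, vgg_prev=5.0, lambda_prev=50.0, qp_prev=30))
```

With these narrow placeholder bounds, a large VGG prevCTU term is immediately capped at QP prevCTU + 1, which illustrates why the proposed method widens the bounds when the feature calls for it.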

IV. EXPERIMENTAL RESULTS
The proposed rate control algorithm was evaluated under the experimental conditions listed in Table 3. Specifically, the proposed algorithm was assessed through objective and subjective performance evaluations under the common test conditions of HEVC [46]. For objective performance, several assessments were carried out: bitrate accuracy, bitrate error, coding efficiency, and objective visual quality based on three metrics, PSNR, SSIM, and MSSSIM. The subjective evaluation was performed by conducting a mean opinion score (MOS) test and calculating difference MOS (DMOS) scores. The proposed algorithm was compared with HM-16.20, the URQ model [17], the OBA model [28], and the MPE model [29].

A. OBJECTIVE PERFORMANCE EVALUATIONS
To obtain fair comparisons, the same experimental environment, listed in Table 3, was used for both the anchor CTU-level rate control algorithm in HM-16.20 and the proposed algorithm. All objective performance measures are listed in Table 4. A total of 16 test sequences were encoded under the LDP configuration with the IPPP structure, which is a typical case in practical applications [19], using QP values of 22, 27, 32, and 37. For each QP, the results of all test sequences were then averaged. The bitrate error BE is defined in Eq. (12), where BA proposed and BA HM denote the bitrate accuracy produced by the proposed algorithm and by the CTU-level rate control model of the anchor software, respectively; a smaller BE value indicates a larger improvement. Both BA proposed and BA HM are calculated for the same target bitrate using Eq. (13), where BA is the bitrate accuracy, TB is the given target bitrate, and AB is the actual bitrate generated by the tested rate control algorithm. Hence, AB may differ depending on the rate control model being tested. The main purpose of the BA evaluation is to check how accurately each tested model meets the given target bitrate TB. For objective visual quality, the differences between the PSNR, SSIM, and MSSSIM values of the proposed algorithm (PSNR proposed, SSIM proposed, and MSSSIM proposed) and those of HM-16.20 (PSNR HM, SSIM HM, and MSSSIM HM) are computed as

$$Y_{PSNR} = PSNR_{proposed} - PSNR_{HM} \quad (14)$$

$$Y_{SSIM} = SSIM_{proposed} - SSIM_{HM} \quad (15)$$

$$Y_{MSSSIM} = MSSSIM_{proposed} - MSSSIM_{HM} \quad (16)$$

where Y PSNR, Y SSIM, and Y MSSSIM denote the differences in objective visual quality under the PSNR, SSIM, and MSSSIM metrics, respectively. A positive value in (14) to (16) indicates that the objective visual quality of the proposed algorithm is better than that of HM-16.20.
Finally, the coding efficiency performance (BD-BR) of the proposed algorithm was also measured against the anchor algorithm with the PSNR, SSIM, and MSSSIM metrics, denoted as BD-BR-PSNR, BD-BR-SSIM, and BD-BR-MSSSIM, respectively. A negative value of BD-BR indicates gains over the anchor CTU-level rate control algorithm.
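BD-BR figures such as those in Tables 4 and 5 are commonly computed with the Bjøntegaard method. The plain-Python sketch below implements one common variant: for each curve, a cubic polynomial of log10(bitrate) as a function of quality is fitted, and the average difference between the two polynomials is integrated over the overlapping quality range. It assumes exactly four rate-quality points per curve (one per QP 22, 27, 32, and 37, so the cubic interpolates them) and is not the specific tool used by the authors.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _cubic_integral(coef: Sequence[float], lo: float, hi: float) -> float:
    """Integrate c0 + c1*q + c2*q**2 + c3*q**3 from lo to hi."""
    def antider(q: float) -> float:
        return sum(c * q ** (k + 1) / (k + 1) for k, c in enumerate(coef))
    return antider(hi) - antider(lo)


def bd_rate(rate_a: Sequence[float], qual_a: Sequence[float],
            rate_b: Sequence[float], qual_b: Sequence[float]) -> float:
    """Bjontegaard delta rate (%) of curve B relative to anchor curve A.

    Negative values mean B needs fewer bits for the same quality.
    """
    lo = max(min(qual_a), min(qual_b))
    hi = min(max(qual_a), max(qual_b))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    s = (lo + hi) / 2  # centre the quality axis to keep the fit well conditioned

    def fit(rates: Sequence[float], quals: Sequence[float]) -> List[float]:
        if len(rates) != 4 or len(quals) != 4:
            raise ValueError("expected four rate-quality points per curve")
        vander = [[(q - s) ** k for k in range(4)] for q in quals]
        return _solve(vander, [math.log10(r) for r in rates])

    ia = _cubic_integral(fit(rate_a, qual_a), lo - s, hi - s)
    ib = _cubic_integral(fit(rate_b, qual_b), lo - s, hi - s)
    avg_diff = (ib - ia) / (hi - lo)
    return (10 ** avg_diff - 1) * 100


if __name__ == "__main__":
    psnr = [32.0, 34.5, 37.0, 39.5]
    anchor = [500.0, 900.0, 1700.0, 3200.0]    # kbps, synthetic
    test = [r / 2 for r in anchor]             # same quality at half the rate
    print(f"BD-BR = {bd_rate(anchor, psnr, test, psnr):.2f}%")
```

Centring the quality axis before building the 4×4 Vandermonde system is a numerical choice: it keeps the powers of PSNR small, so the cubic coefficients and their integrals are not dominated by cancellation error.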
Table 4 shows that the proposed algorithm outperforms the existing HM-16.20 algorithm in the objective performance evaluations. In particular, for test sequences with large background areas, many homogeneous regions, and slow motion, such as 'BQTerrace,' 'BQSquare,' 'FourPeople,' 'Johnny,' and 'KristenAndSara,' the proposed algorithm achieves significant coding gains in all objective quality aspects. For instance, the proposed CTU-level rate control algorithm achieves BD-BR-PSNR, BD-BR-SSIM, and BD-BR-MSSSIM gains of up to −57.08%, −76.29%, and −75.57%, respectively. Conversely, the proposed algorithm yields moderate coding improvements for 'PartyScene,' 'Kimono,' and 'RaceHorses,' which contain more moving textures and motion.
Comparisons of the objective performance of the proposed algorithm and the HM-16.20, URQ, OBA, and MPE models are presented in Tables 5-8. Table 5 compares the BD-BR performance of the proposed and other algorithms under the PSNR, SSIM, and MSSSIM metrics. As shown in Table 5, Class E contributes the highest BD-BR gain in all metrics, which significantly raises the averaged BD-BR results of the proposed algorithm. The visual characteristics of the Class E sequences, i.e., 'FourPeople,' 'Johnny,' and 'KristenAndSara,' play a prominent role in obtaining higher objective measures for the proposed algorithm.
For the PSNR, SSIM, and MSSSIM quality comparisons in Table 6, the proposed algorithm achieves substantially higher quality than the other algorithms, particularly HM-16.20, OBA, and MPE. However, the proposed algorithm does not achieve a positive PSNR difference against the URQ model, because URQ tends to consume abnormally more bits than the given target during encoding and therefore obtains higher PSNR scores. Consequently, the bitrate accuracy and bitrate error of the URQ model are also worse than those of the other algorithms, as shown in Table 7 and Table 8, respectively. For the SSIM and MSSSIM comparisons, the proposed algorithm is the most effective among all compared models, although its average gains over the URQ model are relatively small, at 0.00342 and 0.00197, respectively. These small margins are mainly caused by the URQ bit overspending described above. Accordingly, the proposed algorithm shows its largest bitrate error improvement, approximately −1.12042, against the URQ model, as shown in Table 8.

B. SUBJECTIVE PERFORMANCE EVALUATIONS
The main goal of the subjective quality evaluation is to compare the proposed CTU-level rate control algorithm with other algorithms, including HM-16.20, URQ [17], and OBA [28]. The double stimulus continuous quality scale (DSCQS) method [47] was applied to all test sequences. Sixteen reviewers participated in the test, of whom 11 worked in a related field and the remaining 5 were naïve to image processing. Simple demonstrations were conducted to introduce the evaluation process to the reviewers. The reconstructed frames from the proposed algorithm, HM-16.20, URQ, and OBA were displayed twice in random order at all QP values to each participant. The observers were instructed to give MOS values on a continuous scale ranging from 1 to 5. The processed MOS values are shown in Figure 6. As shown in Figure 6, the proposed algorithm produces higher visual quality than the conventional algorithms at all QP settings. Moreover, at QP = 37 the proposed algorithm achieves a better tradeoff between rate and distortion than the competing models while still maintaining visual quality. In Figure 6, URQ is the competing algorithm whose MOS is closest to that of the proposed algorithm. This is mainly due to the inability of URQ to control bitrate overflow: in our analysis, the URQ model tends to consume more bits and abnormally exceeds the given target bitrate, so its reconstructed frames have better visual quality than those of the other competing algorithms. In Figure 7, two example frames, POC = 77 of 'BQTerrace' coded at 2,352 Kbps and POC = 89 of 'PartyScene' coded at 1,559 Kbps, are presented to compare the visual quality of the proposed and existing algorithms. As Figure 7 shows, the proposed algorithm achieves better visual quality in both example frames than the HM-16.20, URQ, and OBA rate control models.
To verify the visual quality comparisons between the existing algorithms and the proposed algorithm, DMOS scores were calculated from the MOS values of the other models and of the proposed algorithm, denoted MOS otherModels and MOS proposed, respectively:

$$DMOS = MOS_{proposed} - MOS_{otherModels}$$

Table 9 lists the DMOS of all test sequences. The average DMOS per sequence over all QP values is listed to allow a quick comparison of the visual quality of the reconstructed frames. Positive values indicate that the video quality of the proposed algorithm is subjectively better than that of the existing algorithms. As tabulated in Table 9, the proposed algorithm moderately outperforms the HM-16.20, URQ, and OBA models in DMOS across all test sequences.
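The DMOS computation can be sketched as follows: each MOS is the mean of the collected scores, and DMOS is MOS proposed minus MOS otherModels, so positive values favor the proposed algorithm. How scores are pooled over reviewers, the two repeated presentations, and the QP settings is not detailed in the text; pooling them into one list per sequence is an assumption, and the scores in the usage lines are made up.

```python
from statistics import mean
from typing import Dict, List


def dmos(scores_proposed: List[float], scores_other: List[float]) -> float:
    """DMOS = MOS_proposed - MOS_other, each MOS being the mean of the scores."""
    return mean(scores_proposed) - mean(scores_other)


def dmos_table(proposed: Dict[str, List[float]],
               others: Dict[str, Dict[str, List[float]]]) -> Dict[str, Dict[str, float]]:
    """DMOS per competing model and per sequence (positive favours the proposed method).

    proposed[seq]      -> all scores for the proposed algorithm on `seq`
    others[model][seq] -> all scores for a competing model on `seq`
    Pooling reviewers, repetitions and QPs into one list is an assumption.
    """
    return {model: {seq: dmos(proposed[seq], scores) for seq, scores in per_seq.items()}
            for model, per_seq in others.items()}


if __name__ == "__main__":
    # Made-up scores on the 1-5 scale, for illustration only.
    proposed = {"BQTerrace": [4.2, 4.0, 4.5]}
    others = {"HM-16.20": {"BQTerrace": [3.5, 3.8, 3.6]},
              "URQ": {"BQTerrace": [4.0, 3.9, 4.1]}}
    print(dmos_table(proposed, others))
```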

C. COMPLEXITY PERFORMANCE EVALUATIONS
The proposed algorithm trades running time for significant performance gains compared with the existing rate control algorithm in HM-16.20. The additional complexity originates from computing high-level features of the original and reconstructed CTUs with the VGG-16 network. However, the CTU-by-CTU feature extraction can be parallelized, so throughput could be improved and encoding time reduced on parallel hardware such as a GPU.
In addition, the proposed CTU-level rate control with the predefined VGG-16 network requires about 24× more encoding time than the HM-16.20 reference software with rate control enabled, and about 18× more running time than the MPE rate control model.

V. CONCLUSIONS
In this paper, a deep-learning feature-based CTU-level rate control algorithm is proposed to obtain better objective and subjective coding performance for HEVC under the low-delay-P configuration. The proposed algorithm uses a predefined VGG-16 network model to extract features from both the original and reconstructed CTU blocks. It was designed by exploring a perceptual loss function based on the extracted features, combined with the QP value of each CTU, to remodel the estimation functions of the existing CTU-level rate control of HM-16.20. Compared with the anchor, the proposed algorithm achieves better rate control performance, enhancing visual quality through significant coding improvements. In future work, the algorithm proposed in [23] will be extended with further adjustments to benefit the rate control model for the random-access configuration of HEVC.