Deep Convolutional Feature-Driven Rate Control for the HEVC Encoders

This work proposes a rate control model based on deep convolutional features to improve the coding performance of HEVC encoders under the random access (RA) configuration. The proposed algorithm extracts high-level features from the original and previously coded frames using a pretrained visual geometry group (VGG-16) model while accounting for the characteristics of the different temporal layers in the RA configuration. Subsequently, the R–λ parameters (alpha and beta), bit allocation, λ estimation, and quantization parameter decision at the frame level are formulated from the extracted high-level features to maintain video quality and accurate bitrate control. In addition, bit allocation for the group-of-picture (GOP)-level rate control is proposed with perceptual-based thresholding to keep bitrates and visual quality smooth between adjacent GOPs. The results show that the proposed algorithm improves coding performance and bitrate accuracy while preserving visual quality. Compared with the existing R–λ rate model in HM-16.20, the proposed models achieve average BD-rate gains of -4.39% and -8.74% in the PSNR and MSSSIM metrics, respectively, for the RA configuration.


I. INTRODUCTION
Media services on the wired and wireless internet, such as video streaming and video communication, have gained popularity, and more people access media content and services from diverse locations and in diverse situations. Thus, delivering high-quality video content while minimizing bitrates over heterogeneous networks is a consistent requirement of content providers and consumers. In video coding, rate control is widely known to play a critical role in transmitting high-quality video under a given network bandwidth and limited buffer capacity. Hence, a rate control algorithm that can deliver high video quality under the bandwidth constraints of such video applications is essential.
Many rate control studies exist for high-efficiency video coding (HEVC) encoders [1][2][3][4][5]. During the standardization process of HEVC, the R-λ rate control model was adopted for the HEVC encoders [2,3]. This model aims to (1) control bitrates with high accuracy, (2) maintain rate-distortion performance, and (3) optimize visual quality by appropriately allocating bits at the different rate control stages, namely the group-of-picture (GOP) level, frame level, and block level, at a given bitrate. However, to this day, it is difficult to find rate control algorithms designed for these video applications that target perceptual quality optimization. In HEVC, a random-access (RA) configuration is introduced for typical video streaming and broadcasting applications. Generally, the RA configuration in HEVC employs a dyadic high-delay hierarchical B-prediction structure, in which the frames of a video sequence are arranged into different temporal layers for a higher compression rate and visual quality [6]. Such a hierarchical B-prediction structure and the advanced coding tools [7][8][9][10][11][12][13][14] make developing a rate control for the RA configuration more challenging.
The current rate control model in HEVC is more stable for the RA configuration than for the other coding configurations. However, some limitations of the existing rate control under this configuration still need to be addressed. The bits for each frame can still be misallocated, which leads to failures in bitrate accuracy control and causes an imbalance in the visual quality of the compressed video. Previous studies reveal that the existing rate control model misestimates several rate control parameters, which degrades rate-distortion (RD) performance, bit accuracy, and subjective video quality [15][16][17][18][19][20][21]. Moreover, the characteristics of the human visual system (HVS) imply that videos contain much perceptual redundancy. However, the current rate control model in the HEVC encoder reference software, e.g., HM-16.20, does not fully consider perceptual characteristics.
To this end, this work proposes a deep neural network (DNN) feature-based GOP-level and frame-level rate control for the HEVC encoder by extending a perceptual adaptive quantization parameter (QP) selection algorithm [22]. There have been many attempts to leverage the capability of DNNs for perceptual quality purposes [22][23][24][25][26][27][28]. A common view is that DNN-extracted features capture global characteristics of the input images, which makes them attractive for perceptual quality problems. At the frame level, the proposed algorithm formulates an estimation model for the α and β parameters based on high-level visual features extracted from the original and reconstructed frames using a pretrained visual geometry group (VGG-16) network model [29]. This is expected to improve the α and β decisions, which in turn affect the quality of the R-λ relationship. Then, the proposed algorithm redefines the R-λ estimation for each frame to minimize distortion and save bitrate without inducing perceptible visual degradation in the compressed video frames.
Further, the hierarchical B-prediction structure in the RA configuration is considered when extracting the visual features, together with other coding parameters, i.e., the temporal layer ID, the initial QP value at the frame level, and the base QP value. These coding parameters help maintain a λ value that can improve visual quality. Consequently, the QP estimation is also adjusted based on the proposed visual features. Moreover, a more perceptually friendly bit allocation model at the GOP level is proposed by considering the frame rate of the sequence and introducing a perceptual-based threshold to control the smoothness of bitrates between adjacent GOPs. As a result, the proposed algorithm demonstrates BD-rate gains of approximately -4.39% and -8.74% under the PSNR and MSSSIM metrics, respectively, with improved visual quality compared with HM-16.20.
The remainder of this paper is organized as follows. Section 2 briefly discusses rate control in HM and related works. Section 3 details the proposed rate control. In Section 4, an evaluation of the proposed algorithm is described. The conclusion of the work is presented in Section 5.

II. CURRENT STATE OF THE R-λ MODEL
This section discusses the existing R-λ rate control model in HM-16.20 for the RA configuration. The λ value at the frame level in the R-λ model [2,3] is estimated as
$$\lambda = \alpha \cdot bpp^{\beta} \tag{1}$$
where bpp is the estimated number of bits per pixel. At the frame-level rate control, α and β are set to +6.7542 and +1.7860 for I-frames, and to α = +3.2003 and β = −1.367 for other frames. The original studies [2,3] claimed that these values were chosen based on video content. However, studies [30][31][32] have found that the existing R-λ model in the HM software is not optimal for the coding structure of the RA configuration.
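The relationship in Equation (1) can be illustrated with a minimal Python sketch. The function names `bits_per_pixel` and `estimate_lambda` are hypothetical and do not correspond to identifiers in the HM software; the defaults are the non-I-frame values quoted above, and the sketch covers only non-intra frames.

```python
# Default frame-level parameters for non-I frames, as quoted in the text.
ALPHA_INTER = 3.2003
BETA_INTER = -1.367


def bits_per_pixel(target_bits: float, width: int, height: int) -> float:
    """Target bits of a picture divided by its number of pixels."""
    return target_bits / (width * height)


def estimate_lambda(bpp: float,
                    alpha: float = ALPHA_INTER,
                    beta: float = BETA_INTER) -> float:
    """Equation (1): lambda = alpha * bpp ** beta (non-intra frames only)."""
    if bpp <= 0:
        raise ValueError("bpp must be positive")
    return alpha * bpp ** beta


if __name__ == "__main__":
    # Illustrative numbers only: an 832x480 frame with a 20 000-bit budget.
    bpp = bits_per_pixel(20_000, 832, 480)
    print(f"bpp = {bpp:.5f}, lambda = {estimate_lambda(bpp):.2f}")
```

Because β is negative for these frames, a smaller bit budget yields a larger λ, which is why misestimated α and β translate directly into misjudged rate-distortion trade-offs.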

A. EXISTING R-λ RATE CONTROL MODEL IN THE RA CONFIGURATION OF HM ENCODER
The first limitation is that the R-λ rate control model tends to consume more bits for frames at the beginning of a sequence. Since more bits are used to encode the first few frames, fewer bits are available to compress the remaining frames in the sequence, which degrades the overall rate control performance. Figure 1 depicts the frame-by-frame bitrate, PSNR, and MSSSIM for the "BQMall" sequence generated by HM-16.20 with the rate control enabled for the RA configuration. The PSNR and MSSSIM of each frame are not well maintained when the model codes the last several frames. A lack of bitrate forces the encoder to code a frame with a high QP, which produces low quality in the compressed video. Figure 2 shows the frames reconstructed by the HEVC encoder with the rate control activated. The quality of the reconstructed frames degrades progressively as the available bitrate becomes limited. This figure shows that a frame with an adequate bit allocation can have subjectively higher visual quality than frames allocated insufficient bits. This phenomenon occurs because the R-λ rate control model fails to provide appropriate α and β, which leads to inaccurate bit allocation [16][30] and deteriorates the λ and quality of the subsequent frames [18][19][20][21]. For further investigation, Table 1 lists the actual bitrates for several test sequences generated by the current rate control model in HM-16.20 for the RA configuration. The target bitrate is provided to assess how accurately the model can control the bitrate under different QP settings. The smaller the bitrate accuracy value is, the better the rate control algorithm performs. However, the bitrate accuracy of the rate control in HM-16.20 is not well maintained, especially when the test sequences are coded with higher QP settings, such as QP 32 and 37.
In essence, the quality of α and β impacts the whole performance of the rate control model. Accordingly, it is crucial to reevaluate how Eq. (1) can provide optimum HEVC encoder performance.

B. SOME EFFORTS TO IMPROVE THE EXISTING RATE CONTROL MODEL IN THE RA CONFIGURATION
Many studies have attempted to improve the rate control in HEVC. For the GOP-level rate control, Wu et al. [33] proposed a rate control that exploits the R-Q model for HM-13.0. The algorithm is designed by considering the temporal prediction structure in HEVC. Although it aims to tackle only the GOP-level rate control, the determination of the QP value for the first frame and a coding tree unit (CTU)-level rate control are also presented. However, the algorithm achieves only a small coding gain. Moreover, as a Q-domain rate control, it still suffers from the general "chicken and egg" dilemma. Rodriguez et al. [34] proposed a buffer-constrained rate control for real-time HEVC with the hierarchical GOP structure in HM-9.0. This algorithm operates on three layers, namely the intra-period layer, picture level, and CTU level. However, it also runs under the Q-domain rate control model and might therefore suffer from similar general Q-domain rate control problems. The same limitation applies to the algorithm of Wang et al. [35], which is based on the R-Q rate control model for HM-8.0.
For the frame-level rate control, Pan et al. [36] proposed a frame-level rate control based on the visual characteristics of the input video for HEVC. This algorithm estimates the grey-level co-occurrence matrix to determine the relationship between the visual characteristics of the video and the rate. However, it is designed only for the Low-Delay-P configuration, which has a more straightforward hierarchical picture structure than the RA configuration. Gong et al. [31] proposed a temporal-layer-motivated R-λ rate control at the frame level for the RA configuration in HM-14.0. Their work argues that the R-λ rate control model in HM is not optimal for this configuration. Accordingly, this algorithm introduces a novel λ estimation by considering the frame-level motion difference.
Because sequences with slow and fast motion are distinguished, this algorithm also designs the λ estimation, including the λ and QP clipping functions, separately for each case. Zhang et al. [37] proposed a frame-level rate control algorithm by observing the quality of the GOP-level rate control model. However, the algorithm is designed only for the low delay hierarchical GOP structure. It proposed a more adaptable quality dependency model for the current GOP by analyzing the low delay reference structure. Subsequently, the current GOP quality proposed a frame-level and GOP-level rate control for a low delay structure [38]. The RD characteristics at the frame level of the encoded frames at the same temporal layer position in adjacent GOPs are analyzed to formulate the bitrate estimation for the GOP level. Then, a global RD optimization was proposed based on a recursive Taylor expansion model to cover the λ estimation, which is in turn used to benefit the bit budget of the frame-level rate control.
For the CTU-level rate control, there have also been many works attempting to improve the rate control performance of the HEVC encoders by considering the perceptual characteristics of CTU blocks. Woong et al. [39] proposed a luminance adaptation characteristic based on a pixel-domain just-noticeable-difference model to determine the bit allocation and QP value for each CTU in a frame. However, the model tends to perform similarly to the anchor R-λ rate control model. Guo et al. [40] proposed an inter-block dependency model to improve the CTU-level rate control by estimating propagation factors for each 16×16 block. However, the algorithm was designed only for a low delay coding structure and may be difficult to apply to the RA configuration. Zhou et al. [41] formulated a clipping function of λ and QP for the CTU-level rate control by proposing a visual difference predictor model for high dynamic range input video. Raufmehr et al. designed a fuzzy rate controller for scalable HEVC [42]. However, both algorithms were proposed for use cases different from that of the proposed algorithm. Bosse et al. proposed a distortion sensitivity model based on a deep neural network to estimate the bitrate at the CTU level [43]. The work shows a significant improvement compared with a constant QP setting under the 'all-intra' configuration of the HEVC encoder. However, it provides neither evaluations against rate control settings nor much information on the neural network architecture. Li et al. extracted characteristics of each CTU block to estimate the bit allocation at the CTU level for 360-degree video [44].
This work proposes novel estimation models for α, β, λ, bit allocation, and QP for the R-λ rate control model in HM-16.20. A deep convolutional feature-based GOP- and frame-level rate control is proposed for the RA configuration. DNN models for video coding have gained much attention in the video coding community [45][46][47][48][49][50][51][52]. The proposed algorithm takes advantage of high-level perceptual features constructed from the original and reconstructed frames using a pretrained VGG-16 model. It achieves a higher compression rate and improves the quality of the compressed video compared with HM-16.20 with the rate control enabled.

III. PROPOSED RATE CONTROL FOR THE RA CONFIGURATION IN THE HEVC ENCODER
As discussed in Section 2, the existing R-λ rate control in the HM software is not optimal for the coding structure of the RA configuration. From Equation (1), the R-λ rate control model is expressed with three main factors, namely the estimated bpp and the α and β parameters. In the frame-level rate control, α and β are initialized differently from those in the CTU-level rate control. Although the original work [2,3] stated that α and β were based on the characteristics of the test video sequences, α and β in the existing frame-level rate control cannot correctly shape the bpp-λ relationship. Inaccurate α and β parameters are responsible for inaccurate bitrate control and poor rate-distortion performance, as depicted in Figure 1, Figure 2, and Table 1.
This work determines the values of α and β for the frame-level rate control by considering the high-level features of a particular convolutional layer of the VGG-16 architecture. Previous work [30] observed that a strong interrelationship between α and λ and between β and λ is vital for improving the quality of the bpp-λ relationship, which in turn increases the subjective quality of the reconstructed video. Specifically, a pretrained VGG-16 model is used to extract perceptual features from the original and reconstructed frames to address the estimation problems of the existing rate control, including the estimation of the bit allocation at the GOP and frame levels, the α and β parameters, and the λ and QP values. In addition, the initial QP of a frame based on [22] is used in the proposed frame-level rate control to handle the frame-level initial QP decision. Both spatial and temporal perceptual features are adapted to these estimation problems to suit the coding structure of the RA configuration. Notably, the proposed algorithm is designed mainly for the B-frame type. Therefore, it is necessary to check whether the frame being coded is an I-frame or a B-frame when estimating α, β, and λ. I-frames are handled using the existing model in HM-16.20, whereas B-frames use the proposed models. Finally, the QP value is determined based on the proposed algorithm. The CTU-level rate control and the remaining processes in the unit encoding are conducted as in HM-16.20.

A. VISUAL FEATURE EXTRACTION BASED ON THE PRE-TRAINED VGG-16 NETWORK
This work uses a frame-level perceptual loss value, defined as the average of the perceptual loss values of all CTUs within a frame, estimated from the high-level visual features extracted using the VGG-16 network. The proposed algorithm explores the perceptual loss of a frame to address some issues of the estimation models in the GOP- and frame-level rate control scheme in HM-16.20 for the RA coding structure. In this work, the CTU-level perceptual loss is generated from collocated CTU positions in the original and reconstructed frames based on the double-simplified VGG-16 network, as depicted in Figure 3 and detailed in [22]. Notably, the perceptual loss is computed using the Euclidean distance as the loss function, with values in the range 0 to 1, to reflect the HVS attribute. Furthermore, it is formed by considering the temporal structure of the RA coding configuration in the HEVC encoder for a better consideration of the HVS attribute in the proposed algorithm, as illustrated in Figure 4.
The proposed double-simplified VGG-16 framework is designed based on the full-reference visual quality approach by excluding the 'pool5' and 'fully connected' layers from the VGG-16 network. Both layers are tailored to specific object classification tasks, which would influence the quality of the generated visual features. Consequently, the proposed framework uses only the features produced by the convolution layer 'block5conv1' of the last block in Figure 3. A VGG-16 network pretrained on the ImageNet dataset is used directly without any additional training. The VGG-16 network is well known to produce high-quality visual features for many computer vision applications. To this end, the deep convolutional layers of VGG-16 are applied to each CTU block of the original and reconstructed frames to identify universal patterns and generalize them into perceptual features. Notably, the network expects input in the RGB color format.
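The following Python sketch illustrates how a CTU-level perceptual loss in the range 0 to 1 and its frame-level average could be computed from extracted features. It is not the authors' implementation: the truncated VGG-16 network is abstracted as a hypothetical callable `extract` that maps a CTU to a flat feature vector, and the normalisation (L2-normalising both vectors and halving their Euclidean distance) is an assumption chosen only to keep the value within [0, 1].

```python
import math
from typing import Callable, List, Sequence

Features = Sequence[float]


def _unit(v: Features) -> List[float]:
    """L2-normalise a feature vector (all zeros stay all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0] * len(v)


def ctu_perceptual_loss(feat_orig: Features, feat_recon: Features) -> float:
    """Euclidean distance between normalised features, scaled to [0, 1].

    Assumption: two unit vectors are at most 2 apart, so dividing the
    distance by 2 bounds the loss by 1.
    """
    a, b = _unit(feat_orig), _unit(feat_recon)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return dist / 2.0


def frame_perceptual_loss(orig_ctus: Sequence[object],
                          recon_ctus: Sequence[object],
                          extract: Callable[[object], Features]) -> float:
    """Average the CTU losses over all collocated CTU pairs of a frame."""
    if not orig_ctus or len(orig_ctus) != len(recon_ctus):
        raise ValueError("need equally many, at least one, CTU per frame")
    losses = [ctu_perceptual_loss(extract(o), extract(r))
              for o, r in zip(orig_ctus, recon_ctus)]
    return sum(losses) / len(losses)
```

In practice, `extract` would wrap a pretrained VGG-16 truncated at the 'block5conv1' layer and flatten its output for an RGB CTU; any callable returning a sequence of floats works for experimenting with the averaging step.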

B. BIT ALLOCATION FOR THE GOP-LEVEL RATE CONTROL ALGORITHM
In the HM-16.20 software, the bitrate allocation for a GOP is governed by a fixed smooth window size parameter (SW) set to 40, which applies to all GOPs in a sequence. The SW parameter is intended to control the smoothness of bitrates between adjacent GOPs. However, in HEVC, test sequences may have different frame rates and varying target bitrates. Accordingly, a static SW parameter is not perceptually suitable for different video test conditions with distinct visual characteristics.
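To make the role of the fixed smoothing window concrete, the sketch below shows one common way such a window is used to derive a GOP budget. It is modeled on our understanding of the HM reference software rather than copied from it, so the function name, argument names, and the 200-bit floor should be read as assumptions.

```python
def gop_target_bits(total_target_bits: float,
                    total_frames: int,
                    bits_left: float,
                    frames_left: int,
                    gop_size: int,
                    smooth_window: int = 40) -> float:
    """Estimate the bit budget of the next GOP with a fixed smoothing window.

    Any surplus or deficit accumulated so far is spread over at most
    `smooth_window` upcoming frames instead of being absorbed by one GOP.
    """
    if frames_left <= 0:
        raise ValueError("no frames left to encode")
    influence = min(smooth_window, frames_left)
    avg_bits_per_pic = total_target_bits / total_frames
    # Bits per picture that return the encoder to the average budget
    # once `influence` more frames have been coded.
    cur_bits_per_pic = (
        bits_left - avg_bits_per_pic * (frames_left - influence)
    ) / influence
    # Assumed lower bound so that a GOP never receives a near-zero budget.
    return max(cur_bits_per_pic * gop_size, 200.0)
```

With a window of 40 frames, an overspend in the first GOP reduces the budget of the following GOPs gradually; the window length itself does not depend on the frame rate or the content, which is the limitation discussed above.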
The proposed bit allocation of the GOP-level rate control, expressed in Equation (2), takes a more visually oriented approach by assessing the perceptual loss value of the last coded frame in a GOP. Rather than redesigning the entire bit allocation of the existing model, the proposed algorithm extends the current model by utilizing the frame rate FR, the number of remaining frames to be encoded FL, the average frame bitrate over all frames in a sequence, the remaining bits after encoding the previous GOP, and the GOP size of each test sequence when the λ value of the last coded frame of a GOP is less than the given threshold.
The threshold defined in Equation (3) is introduced as a perceptual-based threshold with goals similar to those of SW in HM-16.20, namely to control the smoothness of bitrates between adjacent GOPs. However, this control function is designed to be more adaptive than the existing smooth window size parameter in HM-16.20. It is computed based on the perceptual loss value, the λ value of the previously coded frame, and the λ value of the current frame.
The average frame bitrate in the proposed bit allocation of the GOP-level rate control is calculated as in Equation (4), in the same way as in the existing rate control model in HM-16.20, where TB and NF refer to the given target bitrate for a sequence and the total number of frames in the sequence, respectively.

C. BIT ALLOCATION FOR FRAME-LEVEL RATE CONTROL ALGORITHM
The rate control algorithm in HM-16.20 formulates the bit allocation at the frame level by first estimating a bit ratio ω that reflects the hierarchical B-prediction of the RA configuration. Next, the estimated allocation is capped depending on whether the number of remaining frames to be encoded, FL, is less than 16. When this condition is not satisfied, the frame-level bit allocation is estimated with a static scale threshold set to 0.5, which was determined empirically. This thresholding scheme of the frame-level bitrate estimation model in HM-16.20 does not take into account the visual characteristics of the frame to be coded. Depending on the features of a frame, the threshold scale may not benefit the rate control performance. Furthermore, it lacks perceptual temporal information from the previously coded frame, particularly when the model budgets bitrate at the frame level under the dyadic high-delay hierarchical prediction structure of the RA configuration.
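The existing frame-level scheme described above can be sketched as follows. This is a simplified illustration based on our reading of the description, not the HM source code: `bit_ratios` holds the hierarchical bit ratios ω of one GOP, `planned_bits` is the budget assigned to the frame when the GOP budget was first split, and both names, as well as the exact blending form, are assumptions.

```python
from typing import Sequence


def frame_target_bits(gop_bits_left: float,
                      bit_ratios: Sequence[float],
                      position: int,
                      frames_left: int,
                      planned_bits: float,
                      scale: float = 0.5,
                      min_frames: int = 16) -> float:
    """Split the remaining GOP bits by bit ratio, then apply a static scale.

    The frame at `position` receives its ratio's share of the bits left in
    the GOP. When more than `min_frames` frames remain in the sequence, that
    share is blended with the originally planned budget using a fixed scale.
    """
    remaining_ratio = sum(bit_ratios[position:])
    share = gop_bits_left * bit_ratios[position] / remaining_ratio
    if frames_left > min_frames:
        share = scale * share + (1.0 - scale) * planned_bits
    return share
```

The scale of 0.5 is the same for every frame regardless of its content, which is the static behavior that a perceptual, content-dependent threshold is intended to replace.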
The proposed frame-level bit allocation uses the perceptual loss value to alter the thresholding scheme of the existing frame-level bit allocation and is designed for the RA coding structure. It determines the bit allocation of each frame by making the existing threshold more adaptive and broader in its range of values, yielding a more visually friendly bit allocation model. Notably, the proposed extension applies when FL is larger than 16. Otherwise, the same estimation model as in HM-16.20 is applied. The proposed bit allocation of the frame-level rate control is described in Equation (5) in terms of the remaining bits of the current GOP after allocating bits, the bit ratio of the i-th frame, and N, the total number of frames in the sequence.

D. ESTIMATION OF α AND β PARAMETERS, λ, AND QP FOR THE FRAME-LEVEL RATE CONTROL
After calculating the target bitrate of each frame, the next step is to determine the λ value for each frame based on Equation (1). In HM-16.20, the bpp term is derived as the target bits of a picture divided by the number of pixels in a frame. Notably, Equation (1) is only valid for non-Intra frame types. The λ value for an Intra frame is decided from Equation (6). This estimation model uses the total cost of an Intra frame, obtained from the Hadamard transform of the CTUs within the frame, which is then expressed as the mean absolute difference per pixel within the frame, with an exponent of 1.2517, as in Equation (7). This quantity adequately captures the frame characteristics, and the proposed algorithm handles Intra frames in the same manner as HM-16.20.
For other frame types, the proposed algorithm first defines the bpp term at the frame level based on the proposed frame-level bit allocation. Subsequently, the proposed α and β utilize the proposed perceptual loss value before the encoding process is completed to improve the initialization of the model parameters α and β of the existing rate control model. The update of the existing α and β parameters for each frame is examined based on the existing work [2,3]. The updated α and β are computed as in Equations (9) and (10).
In Equations (9) and (10), the estimated α and estimated β are updated using two constants, the bits per pixel actually required during the encoding process, and the actual and estimated λ values.
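As an illustration of this kind of update, the sketch below follows the form commonly described for the R-λ model, in which α and β are corrected in proportion to the log-domain error between the actual and the estimated λ. Whether it matches Equations (9) and (10) term by term cannot be confirmed from the text here, so it should be read as an assumption; the step sizes `delta_alpha` and `delta_beta` are illustrative defaults, and the clipping of α and β applied in practice is omitted.

```python
import math
from typing import Tuple


def update_alpha_beta(alpha: float,
                      beta: float,
                      bpp_actual: float,
                      lambda_actual: float,
                      delta_alpha: float = 0.1,
                      delta_beta: float = 0.05) -> Tuple[float, float]:
    """One post-encoding update of the R-lambda model parameters.

    lambda_est is what the current model predicts for the bits actually
    spent; the parameters move so that the prediction approaches the
    lambda that was actually used.
    """
    if bpp_actual <= 0 or lambda_actual <= 0:
        raise ValueError("bpp_actual and lambda_actual must be positive")
    lambda_est = alpha * bpp_actual ** beta
    err = math.log(lambda_actual) - math.log(lambda_est)
    new_alpha = alpha + delta_alpha * err * alpha
    new_beta = beta + delta_beta * err * math.log(bpp_actual)
    return new_alpha, new_beta
```

When the model already predicts the actual λ, the error term is zero and both parameters stay unchanged; the update only reacts after a frame has been encoded, which motivates estimating α and β before encoding.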
Inspired by the updating process of α and β in Equations (9) and (10), the perceptual loss value of each frame is investigated experimentally to improve the α-λ and β-λ relationships. Several parameters of Equations (9) and (10) are disregarded in the experiments to create new estimation models for α and β, as in Equations (11) and (12), which are applied before the encoding process.
The proposed Equations (11) and (12) aim to improve the estimation of the existing α and β parameters in HM-16.20. Both proposed parameters are expected to strengthen the α-λ and β-λ relationships based on the proposed visual feature. Improving these relationships may in turn enhance the relationship between bpp and λ. To achieve this, the estimation model for λ also needs to be adjusted based on the proposed visual feature. The proposed λ is defined with an additional term that is introduced to preserve the visual quality of the current frame according to its temporal layer characteristic. This term depends on the standard deviation of the original frame in temporal layer ID − 1, the given base QP value, and the initial QP of the current frame, which is determined based on previous work in [22]. This modification of λ changes the QP decision of the existing rate control model, which is now based on the proposed λ. In general, QP determines the quantization step applied after the transform, which governs the distortion level of each prediction mode and the residual after quantization. The proposed algorithm also uses the two constant values 4.2005 and 13.7122 in Equation (15). It was found that both existing values provide the best coding efficiency for the proposed algorithm. However, the QP value is now derived from the proposed λ. In addition, we added a term to the QP formulation to bias the default value 13.7122 in HM-16.20. To validate its performance, the proposed formula in Equation (15) can also be written without this bias term, using only the proposed λ. Figure 5 illustrates the performance of the proposed formulation in terms of BD-BR-PSNR and BD-BR-MSSSIM, which are -2.70% and -22.99%, respectively.
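The baseline mapping from λ to QP with the two constants 4.2005 and 13.7122 can be sketched as follows. The logarithmic form is the one commonly used with these constants in the R-λ model; the `bias` argument is a hypothetical stand-in for the additional term mentioned above, whose exact form is not reproduced here, and the QP range of 0 to 51 is an assumption for 8-bit video.

```python
import math


def lambda_to_qp(lam: float, bias: float = 0.0,
                 qp_min: int = 0, qp_max: int = 51) -> int:
    """Map lambda to an integer QP: QP = 4.2005 * ln(lambda) + 13.7122 + bias.

    The result is rounded to the nearest integer and clipped to the
    valid QP range.
    """
    if lam <= 0:
        raise ValueError("lambda must be positive")
    qp = 4.2005 * math.log(lam) + 13.7122 + bias
    return max(qp_min, min(qp_max, math.floor(qp + 0.5)))
```

Because the mapping is monotonic, any change in the estimated λ shifts the QP in the same direction, so a biased or perceptually adjusted λ directly steers the quantization of the frame.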
To validate the proposed , different formulations of are observed to adjust the proposed , such as = − and = + 0.5 + . The other formulas tend to worsen BD-BR-MSSSIM while strengthening the BD-BR-PSNR performance. Notably, the proposed algorithm aims to maintain or even improve the visual quality of a compressed video. Therefore, BD-BR-MSSSIM is the primary criterion for the proposed algorithm, and a positive BD-BR-MSSSIM score implies poor performance. In addition, we tried to use the original QP estimation of HM-16.20 in the proposed algorithm, which resulted in a lower BD-BR-MSSSIM score than the proposed formulation. However, the proposed algorithm still exhibited significant improvements in terms of objective and subjective quality. To further evaluate the effectiveness of the proposed formulation, Figure 6 depicts the curve of the relationship. The R² coefficients in this figure indicate a strong relationship. Notably, the "BasketballPass" test sequence, coded with different QP settings, is used to visualize this result.

E. EFFECTIVENESS OF THE PROPOSED RATE CONTROL MODEL
After proposing the estimation models for bit allocation at the GOP and frame levels, α, β, λ, and QP, the effectiveness of those models should be evaluated by comparing them with the existing estimation models in HM-16.20. To reiterate, it is crucial to inspect the relationships of α to λ and β to λ to obtain a better relationship of bpp to λ. Table 2 tabulates these relationship comparisons. The following test sequences are employed: (A) "Kimono", (B) "BasketballDrive", (C) "BQMall", (D) "PartyScene", (E) "BQSquare", and (F) "BasketballPass". First, the estimated α, β, and λ of the first 17 frames of each test sequence are collected. The correlations of the estimated α and β parameters with the λ values of the tested frames are measured using the Pearson product-moment correlation coefficient. From Table 2, the proposed algorithm tends to yield stronger α-λ and β-λ relationships than the existing model. In addition, the existing α-λ and β-λ relationships are critical to the quality of the bpp-λ relationship. In particular, when POC=0, α and β are set to their default values, and they show no correlation with the estimated λ. Inaccurate α and β on the first frame lead to miscalculated bit allocation and λ estimation, worse distortion, and lower visual quality; all of these effects may propagate to the following frames. To confirm the quality of the bpp-λ relationship of the proposed algorithm, Figure 7 compares the bpp-λ relationships of the proposed algorithm and HM-16.20. The "BQMall" and "BQSquare" test sequences are coded for this analysis. The figure shows a stronger bpp-λ relationship for the proposed algorithm than for HM-16.20. The R² coefficient is used for the determination with a value in the

IV. EXPERIMENTAL RESULTS
The proposed rate control algorithm was evaluated under the conditions defined in Table 4. Specifically, it was evaluated under the common test conditions of HEVC [53] to assess bitrate accuracy, coding efficiency, objective visual quality under the peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MSSSIM) metrics, and subjective quality. We compared the proposed algorithm with other rate control algorithms in HM-16.20, namely the original R-λ model [2] and the algorithm of Li et al. called model parameter estimation (MPE) [54], under the same experimental conditions.

A. OBJECTIVE PERFORMANCE EVALUATIONS
A total of 13 test sequences are encoded for the RA configuration with the GOP size set to 16, which is a typical case in practical applications, using all four QP settings: 22, 27, 32, and 37. Subsequently, the experiments for each QP were summarized by averaging the results over all test sequences. The bitrate accuracy BA for this evaluation is defined as
$$BA = \frac{|TB - AB|}{TB} \times 100\% \tag{17}$$
Based on Equation (17), a lower BA score indicates better performance. TB represents the given target bitrate, and AB is the actual generated bitrate; BA thus measures how accurately a rate control model meets the given TB. The coding efficiency (BD-BR) measures the bitrate reduction of the proposed algorithm relative to the conventional algorithms at the same video quality, measured by the PSNR and MSSSIM metrics and denoted as BD-BR-PSNR and BD-BR-MSSSIM. A negative BD-BR value implies an improvement. For the objective visual quality, the PSNR and MSSSIM values of the proposed algorithm are compared with those of the other algorithms, as described in Equations (18) and (19), where ∆PSNR and ∆MSSSIM are the differences in objective visual quality under the PSNR and MSSSIM metrics, respectively. A positive result from Equations (18) and (19) indicates a higher visual quality than the other algorithms.
As shown in Table 5, the proposed rate control algorithm achieves better bitrate accuracy than the existing rate control models, approximately 3.23% on average, whereas both tested conventional algorithms have bitrate accuracy values of approximately 5%. The proposed algorithm also yields higher PSNR and MSSSIM scores than the tested conventional algorithms, with 34.90 dB and 0.97644 on average for the PSNR and MSSSIM metrics, respectively. In addition, the proposed algorithm shows consistent improvements in objective quality in almost all the test sequences. Table 6 confirms the quality difference in PSNR and MSSSIM for each class of test sequences between the proposed rate control algorithm and the MPE and R-λ models.
Compared with the MPE algorithm, the proposed algorithm yields BD-BR-PSNR and BD-BR-MSSSIM of approximately -3.88% and -7.85%, respectively. Compared with the R-λ model, it yields -4.39% and -8.74%, respectively. Accordingly, the proposed rate control algorithm based on deep convolutional features performs better than the other two rate control algorithms.
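The BD-BR figures above are computed from the four rate-quality points (one per QP) of each sequence. The following is a minimal sketch, assuming the classic Bjøntegaard procedure: a cubic through the four points of log-bitrate versus quality, averaged over the overlapping quality range. The function name `bd_rate` and its arguments are hypothetical, and the text does not state which BD-rate implementation was actually used.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial interpolating (xs, ys) at x (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _integral(xs, ys, lo, hi):
    """Integrate the cubic interpolant over [lo, hi].

    Simpson's rule is exact for polynomials up to degree 3.
    """
    mid = 0.5 * (lo + hi)
    return (hi - lo) / 6.0 * (
        _lagrange(xs, ys, lo) + 4.0 * _lagrange(xs, ys, mid) + _lagrange(xs, ys, hi)
    )


def bd_rate(anchor_rates, anchor_quality, test_rates, test_quality):
    """Average bitrate difference (%) of the test curve against the anchor
    at equal quality. Negative values mean the test curve saves bitrate."""
    curves = (anchor_rates, anchor_quality, test_rates, test_quality)
    if any(len(c) != 4 for c in curves):
        raise ValueError("this sketch assumes exactly four rate-quality points")
    log_anchor = [math.log10(r) for r in anchor_rates]
    log_test = [math.log10(r) for r in test_rates]
    # Only the quality interval covered by both curves is compared.
    lo = max(min(anchor_quality), min(test_quality))
    hi = min(max(anchor_quality), max(test_quality))
    if hi <= lo:
        raise ValueError("the two curves have no overlapping quality range")
    avg_anchor = _integral(anchor_quality, log_anchor, lo, hi) / (hi - lo)
    avg_test = _integral(test_quality, log_test, lo, hi) / (hi - lo)
    return (10.0 ** (avg_test - avg_anchor) - 1.0) * 100.0
```

For instance, a test curve whose bitrates are uniformly 10% lower than the anchor at identical quality values gives a BD-BR of -10%. The same function applies to BD-BR-PSNR and BD-BR-MSSSIM; only the quality values passed in change.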

B. SUBJECTIVE PERFORMANCE EVALUATIONS
For the subjective quality comparison, Figures 8 and 9 depict several regions from different test sequences generated by the proposed algorithm and the R-λ model of HM-16.20. Figure 8 compares the visual quality of frame number 85 of the reconstructed "BQMall" test sequence. The proposed algorithm produces slightly higher subjective visual quality than the R-λ model in HM-16.20. In addition, it does so at a moderately lower bitrate, approximately 452.59 Kbps versus 456.74 Kbps for the R-λ model. The proposed algorithm thus preserves the quality of the compressed video while decreasing the generated bitrate. Accordingly, it achieves a higher coding gain than the existing R-λ model in HM-16.20.
In addition, Figure 9 compares the visual quality produced by the proposed algorithm and by the R-λ model in the HM software at frame number 98 of the "BasketballPass" test sequence. The proposed algorithm generates a slightly higher actual bitrate, approximately 211.33 Kbps versus 207.01 Kbps for the R-λ model in HM-16.20. However, this increase is accompanied by a larger improvement in visual quality, so the proposed algorithm still contributes to the overall coding gain. To confirm these improvements, Figures 10 and 11 show the fluctuations of bits, PSNR, MSSSIM, and QP for the "BQMall" and "BasketballPass" test sequences, respectively. In these figures, the bitrate generated by the proposed algorithm has a significant impact on its overall rate control performance. The proposed algorithm allocates fewer bits than the R-λ model of HM-16.20 to the beginning frames of the test sequences while maintaining a similar reconstructed video quality, which is also confirmed by the PSNR and MSSSIM fluctuations of both test sequences. The figures also compare the QP decisions of the proposed algorithm and the R-λ model, showing that the proposed algorithm yields a more stable QP from the first to the last frame of a test sequence. Notably, a stable QP decision is a key indicator of the quality of a rate control algorithm because it reflects both bitrate and visual quality.

C. COMPLEXITY PERFORMANCE EVALUATIONS
The proposed rate control algorithm uses deep-learning features to improve the overall performance of the existing R-λ rate control model of HM-16.20 for the RA configuration. As a tradeoff, extracting these visual features requires more time. The complexity comes mainly from extracting visual features from the original and reconstructed frames with the VGG-16 network. However, the complexity of the proposed algorithm remains similar to that of the existing perceptual rate control models in [27] and [28]. With the same classes of test sequences, the algorithms in [27] and [28] require 15.2× and 15.3× more encoding time than the anchor algorithm, respectively. Furthermore, the proposed algorithm produces comparably higher coding and visual quality gains than the works in [27] and [28]. A perceptually based rate control algorithm typically has higher complexity than one that does not use perceptual features, as a tradeoff for its coding performance.
The feature extraction in the proposed algorithm can still be optimized through parallelization. Hence, throughput can be improved with a parallel processor such as a GPU, which would reduce the encoding time. The proposed algorithm requires 15× more encoding time than the R-λ model of HM-16.20, or 8× more than the MPE model. Notably, all the experiments were conducted with the rate control enabled.

D. EVALUATION ON SCENE CHANGE CASES
An additional evaluation on several test sequences containing scene changes was also carried out to assess the effectiveness of the proposed algorithm. A scene change occurs at a frame whose content differs significantly from that of the preceding frames. The definition is generalized to include abrupt transitions between shots, gradual transitions between images resulting from video editing modes, and inter-shot changes induced by camera operations. Measuring the significance of a change in the content of video frames is arguably subjective. Figure 12 outlines several test sequences with their scene-changing parts. These test sequences are detailed in Table 7.
The proposed algorithm outperforms HM-16.20 in BD-BR-PSNR and BD-BR-MSSSIM by approximately -6.28% and -10.35%, respectively. This is mainly due to the ability of the proposed algorithm to maintain the generated video quality while reducing the bitrate. The proposed frame-level QP determination based on [44] contributes to this result. Therefore, the proposed algorithm performs better overall than the R-λ rate control model in HM-16.20 for the RA configuration, e.g., by achieving higher bitrate accuracy, maintaining visual quality, and improving bitrate efficiency.

E. EVALUATION UNDER THE HYPOTHETICAL REFERENCE DECODER (HRD) CONSTRAINT
Buffer occupancy analysis is essential for any rate control algorithm to prevent overflow and underflow. In the HEVC encoder, such cases can be handled by enabling the HRD constraint along with the R-λ rate control model, as discussed in [2][55][57]. The buffer size is defined as the product of the time delay and the bandwidth (Equation (20)). The buffer occupancy is determined by the current state of the coded picture buffer and the buffering rate of the current frame being coded, and is expressed as their sum (Equation (21)).
Table 9 shows objective quality comparisons between the anchor HM-16.20 and the proposed algorithm. In BD-BR-PSNR and BD-BR-MSSSIM, the proposed algorithm yields BD-rate savings of approximately -6.16% and -12.30% for "BQTerrace", -5.89% and -7.56% for "BQMall", and -5.55% and -11.84% for "BasketballPass", respectively. Figure 13 compares the frame-by-frame fluctuations of buffer occupancy, bitrate, PSNR, and MSSSIM between HM-16.20 and the proposed algorithm for the "BQMall" and "BasketballPass" test sequences. For the "BQMall" sequence, although the buffer analysis shows overflow at the last frames, the proposed algorithm yields a bitrate saving of -18.61%, keeping the bitrate mismatch much lower than that of the anchor. For the "BasketballPass" sequence, the proposed algorithm maintains the buffer occupancy without overflow or underflow while minimizing the generated bitrate, which stays lower at 414 Kbps than the 427 Kbps of HM-16.20. Furthermore, the proposed algorithm exhibits more stable PSNR and MSSSIM fluctuations than HM-16.20, particularly in the last frames of the "BasketballPass" sequence.
Subjectively, the "BasketballPass" frames reconstructed by the proposed algorithm have perceptually higher quality than those produced by HM-16.20. The visual quality comparisons between the proposed algorithm and HM-16.20 are depicted in Figure 14 at frame numbers 30, 291, and 297. The right column (frames a, c, and e) contains the frames reconstructed by HM-16.20, and the left column (frames b, d, and f) contains those of the proposed rate control algorithm. In (b), the visual quality of the proposed algorithm is only negligibly lower than that of HM-16.20 in (a). However, toward the end of the sequence, the visual quality generated by HM-16.20 degrades further because of inaccurate bit allocation at the frame level. The quality of the HM-16.20 reconstructed frames in (c) and (e) is significantly improved upon in (d) and (f) by the proposed rate control.
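The buffer analysis in this subsection can be illustrated with a simple leaky-bucket simulation. This is a minimal sketch, not the HM implementation: it assumes the buffering rate of a frame is its coded bits minus the bits drained by the channel during one frame interval, and the names `simulate_buffer` and `initial_fullness` are hypothetical.

```python
def simulate_buffer(frame_bits, bandwidth_bps, frame_rate, delay_s,
                    initial_fullness=0.5):
    """Track buffer occupancy frame by frame and flag overflow/underflow.

    frame_bits       -- coded size of each frame, in bits
    bandwidth_bps    -- channel bandwidth, in bits per second
    frame_rate       -- frames per second
    delay_s          -- buffering delay, in seconds
    initial_fullness -- assumed starting occupancy as a fraction of the size
    """
    # Buffer size is the product of time delay and bandwidth.
    buffer_size = delay_s * bandwidth_bps
    # Bits leaving the buffer during one frame interval (assumed constant).
    drain_per_frame = bandwidth_bps / frame_rate
    occupancy = initial_fullness * buffer_size
    trace = []
    for bits in frame_bits:
        # New occupancy = current buffer state + buffering rate of this frame.
        occupancy += bits - drain_per_frame
        if occupancy > buffer_size:
            state = "overflow"
        elif occupancy < 0:
            state = "underflow"
        else:
            state = "ok"
        trace.append((occupancy, state))
    return buffer_size, trace
```

Plotting the occupancy values in `trace` against the frame index gives a curve of the kind compared in Figure 13. The occupancy is deliberately left unclamped so that the size of any violation remains visible.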

V. CONCLUSIONS
This work proposes an improved rate control algorithm based on deep convolutional features for the RA configuration in the HEVC encoder. The current version of the R-λ rate control model is not optimal for the RA configuration, especially in its estimation models. Therefore, the proposed algorithm introduces a novel estimation model for the model parameters (α and β), bit allocation, λ, and QP at the frame level by considering the perceptual characteristics of a previously coded frame. The proposed algorithm follows a full-reference visual quality approach, using a pretrained VGG-16 architecture to extract visual features from the original and previously coded frames. It achieves higher bitrate accuracy while improving the coding efficiency and visual quality of the current R-λ rate model in HM-16.20. On average, it achieves BD-rate gains of -4.39% and -8.74% in BD-BR-PSNR and BD-BR-MSSSIM, respectively. The proposed algorithm is also robust in scene-change cases. Furthermore, it demonstrates significant improvements over HM-16.20 in overall quality and in preventing buffer underflow and overflow when the HRD option is activated. In future work, the proposed algorithm will be adapted to the rate control model for the RA configuration of the Versatile Video Coding encoder.