Early Termination of CU Partition Based on Boosting Neural Network for 3D-HEVC Inter-Coding

As an extension of the High Efficiency Video Coding (HEVC) standard, 3D-HEVC needs to encode multiple texture videos and their depth map components. In the 3D-HEVC inter-coding test model, a large variety of Coding Unit (CU) sizes is evaluated, and the one with the lowest Rate-Distortion (RD) cost is selected as the best CU size. This technique provides the highest achievable coding efficiency, but it brings a huge computational complexity that limits the practical application of 3D-HEVC. In this paper, an early termination of CU encoding is proposed to reduce the complexity caused by the CU size splitting process. The proposed algorithm is based on CU homogeneity and a boosting neural network classification algorithm. It contains three main steps. In the first step, various features are extracted from the original encoder, and the features that are highly correlated with the CU partition are selected using a machine learning algorithm. In the second step, a boosting neural network model is trained on the selected features to derive the threshold values for the proposed algorithm. In the final step, an efficient early termination of CU splitting is applied to texture videos and depth maps based on the thresholds extracted from the trained model. The experimental results show that the proposed algorithm significantly reduces the encoding time, while the loss in coding efficiency is negligible.


I. INTRODUCTION
With the recent rapid development of 3D multimedia and display technologies, three-Dimensional (3D) video systems have evolved thanks to a real-world visual experience that goes beyond two-Dimensional (2D) video [1]. In Multi-View plus Depth (MVD) systems, which consist of multiple texture videos and associated depth maps, a small number of captured texture videos and their corresponding depth maps are coded, and the resulting bitstream packets are multiplexed into a 3D video bitstream [2], [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Chaker Larabi.
3D-HEVC is an extension of the HEVC standard [4], [5], which was developed by the Joint Collaborative Team on 3D Video (JCT-3V). 3D-HEVC introduces many techniques and tools to further improve coding efficiency. For inter-coding, 3D-HEVC adopts a hierarchical coding structure, which is one of the most powerful tools for improving the coding efficiency of the 3D-HEVC encoder [6], [7]. This hierarchical coding structure is based on the quad-tree structure of the Coding Unit (CU). As shown in Figure 1, the largest CU is called the Coding Tree Unit (CTU), and each CTU is divided into four CUs. Each CU is in turn split recursively into four CUs of equal size. The CU sizes vary from 64 × 64 to 8 × 8, corresponding to depth levels 0 to 3, respectively [8]. The 3D-HEVC encoder evaluates all CU sizes using the Lagrange multiplier and selects the one with the lowest Rate-Distortion (RD) cost as the best CU split [9]. The RD-cost function is given by
$$\text{RD-cost} = SSE_{luma} + \omega_{chroma} \times SSE_{chroma} + \lambda \times B \qquad (1)$$
where $SSE_{luma}$ and $SSE_{chroma}$ measure the difference (sum of squared errors) between the current CU and the matching CU in the luma and chroma components, respectively, $\omega_{chroma}$ is the chroma weighting factor, $\lambda$ is the Lagrange multiplier, and $B$ is the bit cost considered for the 3D-HEVC mode decision. This flexible coding structure contributes a significant improvement in coding gain. However, it brings a dramatic increase in encoding complexity, because the encoder must explore every CU size from 64 × 64 down to 8 × 8 before the best CU partition can be decided. Therefore, an early termination algorithm for CU encoding is required to reduce the computational complexity of the 3D-HEVC encoder.
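As a minimal sketch of how Eq. (1) drives the split decision, the snippet below computes the RD cost of two candidate options and keeps the cheaper one. The `Candidate` class, the function names, and all numeric values are hypothetical illustrations and are not taken from the HTM reference software.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One evaluated CU option; every field is an illustrative input."""
    name: str
    sse_luma: float
    sse_chroma: float
    bits: float


def rd_cost(c, w_chroma, lam):
    # Eq. (1): RD-cost = SSE_luma + w_chroma * SSE_chroma + lambda * B
    return c.sse_luma + w_chroma * c.sse_chroma + lam * c.bits


def best_candidate(cands, w_chroma, lam):
    # The encoder keeps the option with the lowest RD cost.
    return min(cands, key=lambda c: rd_cost(c, w_chroma, lam))


# Made-up distortion and rate figures for a 64x64 CU and its four-way split.
candidates = [
    Candidate("no_split_64x64", sse_luma=5000.0, sse_chroma=800.0, bits=120.0),
    Candidate("split_4x32x32", sse_luma=3500.0, sse_chroma=600.0, bits=310.0),
]
best = best_candidate(candidates, w_chroma=1.0, lam=10.0)
```

With these numbers the unsplit CU costs 7000 and the split costs 7200, so the split only pays off when its distortion reduction outweighs the extra bits weighted by λ. The exhaustive encoder has to evaluate both branches to find this out, which is the cost that early termination tries to avoid.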
Several fast inter-coding approaches have been proposed to reduce the complexity of the 3D-HEVC encoder [10]-[15]. In our previous work [10], we proposed a fast CU size decision algorithm based on machine learning, in which the structure tensor is adopted as the extracted feature to build a binary classification model. This algorithm is used to extract adaptive splitting values for depth map and texture CUs. An early termination of CU size and fast merge decision algorithms are proposed in [11] for 3D-HEVC inter-coding. The inter-view correlation is used as a priori information to select the optimal prediction unit (PU) and CU sizes. A fast inter prediction algorithm using depth map segmentation is proposed in [12]. It classifies CTUs into uniform and complex CTUs by dividing the depth map into three parts (foreground, middle-ground, and background) for an early termination of depth map CU encoding. In [13], the authors proposed a fast inter mode decision algorithm to reduce the complexity of 3D-HEVC coding. The algorithm consists of two main steps: first, an early skip mode decision is built based on the texture correlation of adjacent dependent and base views; then, the symmetric and asymmetric motion partition modes are selectively skipped according to the texture features of the current CU. In [14], a fast inter and intra mode decision algorithm is proposed based on edge classification and the correlation between the depth and the texture videos. Since the texture videos and their associated depth maps represent the same scene at the same time instant, the CU sizes and prediction modes rarely used in the corresponding texture video are skipped for the associated depth map views to reduce the 3D-HEVC coding complexity.
In [15], the authors proposed a low-complexity mode decision algorithm that uses the characteristics of depth map changes to skip unnecessary depth modes for both inter and intra prediction.
Although all the existing approaches are effective in reducing the computational complexity of 3D-HEVC inter-coding, there is still room for further complexity reduction. In our previous work [10], we proposed a fast CU size decision algorithm using tensor features for homogeneity determination and a machine learning binary classification model. In this paper, we propose an improvement of the previous work based on an early termination of CU encoding that investigates various features in the training sets to reduce the complexity of coding texture videos and depth maps. The CU size decision is modeled as a data classification problem that predicts whether or not the current CU is split into smaller sizes. The classification is solved efficiently using the AdaBoosting Neural Network (AB-NN) algorithm. The AB-NN algorithm is used in this investigation because of its high accuracy for this kind of problem and because it can derive adaptive thresholds for texture and depth map CUs. The experimental results demonstrate that the proposed algorithm can significantly reduce the computational complexity of 3D-HEVC with negligible coding performance degradation compared to the original encoder.
VOLUME 10, 2022
The remainder of the paper is organized as follows. Section II provides the statistical analysis of CU partitions in 3D-HEVC encoder. Section III introduces the boosting neural network algorithm. The 3D-HEVC features analysis is presented in Section IV. The proposed algorithm is described in Section V. Performance evaluations of the proposed algorithm are shown in Section VI, and the conclusion of this work is provided in Section VII.

II. MOTIVATION AND STATISTICAL ANALYSIS
3D-HEVC adopts a hierarchical coding structure, which is one of the most powerful tools for improving the coding efficiency of the 3D-HEVC encoder [6], [7]. In the joint model of 3D-HEVC, a complex RD optimization process is performed for all possible CU sizes to find the one with the minimum RD cost and thus determine the best coding size for a CU. Small CU sizes are likely to be chosen for coding complex regions, and large CU sizes are more suitable for coding homogeneous regions. Moreover, depth maps are mainly characterized by large homogeneous regions and therefore have a higher probability of being coded using larger CU sizes. Therefore, an algorithm based on CU homogeneity could skip the time-consuming RD cost computation of the CU split process and thereby significantly reduce the computational complexity of the 3D-HEVC coding process.
To better understand the correlation between CU complexity and the CU size decision in the 3D-HEVC encoder, we encoded eight experimental video sequences using HTM-16.3 under the Random Access (RA) configuration [16]. The experimental videos are recommended by the Common Test Conditions (CTC) [17] and were encoded using four Quantization Parameters (QPs). The experiments covered the 3D video sequences ''Balloons'', ''Kendo'' and ''Newspaper'' with a resolution of 1024 × 768, and ''GT_Fly'', ''Poznan_Hall2'', ''Poznan_Street'', ''Undo_Dancer'' and ''Shark'' with a resolution of 1920 × 1088. Four pairs of quantization parameters (QP-pairs) are used to encode texture and depth maps (QP-texture, QP-depth): (25,34), (30,39), (35,42) and (40,45). Table 1 and Table 2 show the distribution of CU sizes for texture videos and depth maps, respectively, according to the four QP-pairs. For texture views, Table 1 shows that the probability of a CU being coded with a large size is very high compared to the small CU sizes for all sequences and all QP-pairs: it is more than 85% on average, while the total percentage of CUs with small CU sizes is less than 15% on average. Furthermore, the probability of choosing the 64 × 64 size depends on the complexity of the sequence. For ''Shark'', which is characterized by complex regions, the proportion of 64 × 64 CUs is approximately 64% for small QP, whereas for ''Poznan_Hall2'', which is characterized by homogeneous regions, the proportion of 64 × 64 CUs is more than 84% for small QP. For the depth maps, the probability of choosing the 64 × 64 CU size is more than 82% for all sequences and 96.16% on average. This occurs because, unlike typical texture video content, the depth map is characterized by sharp edges and large homogeneous regions. Therefore, the probability of choosing small sizes is less than 5% on average.
It can also be seen that the choice of quantization parameter (QP) affects the CU size distribution for both texture videos and depth maps, since the QP controls the compression rate and hence the image quality. High QPs generate more homogeneous areas in the coded image, which are efficiently encoded using larger CU sizes. With low QPs, however, the predicted images tend to preserve more detail, requiring smaller CU sizes to maintain the encoding efficiency. Therefore, if we can decide the CU size early and skip the CU splitting process, coding time will be saved and the computational complexity of the 3D-HEVC encoder can be reduced.

III. BOOSTING NEURAL NETWORK ALGORITHM
Boosting is an approach for improving the performance of learning algorithms, and it is one of the most powerful learning techniques introduced during the past decades. The motivation for boosting is to produce a scheme that combines many ''weak'' classifiers [18], such as decision trees and neural networks, to obtain a powerful classifier. The most promising boosting algorithm is ''Adaptive Boosting'', namely AdaBoost, which was introduced by Freund and Schapire [19]. AdaBoost has been applied with great success to many benchmark machine learning problems, mainly using decision trees as base classifiers [10], [20]. However, there is evidence that AdaBoost may overfit when several hundred thousand classifiers are combined, and its performance appears to degrade considerably in the presence of significant amounts of noise [21], [22]. To make the AdaBoost model more efficient, many works have proposed using a neural network as the weak learner instead of the decision tree of traditional AdaBoost models [23]-[25]. In [23], the authors reported that the AdaBoosting neural network is significantly better than boosted decision trees in terms of accuracy. Although the neural network is reported to be outperformed by some statistical methods in most cases, attempts to improve it have never stopped [26], [27].
A neural network (or Artificial Neural Network) [28] is a learning model inspired by the way biological nervous systems, such as the brain, process information. The key element of this model is the structure of the information processing system: it consists of a large number of highly interconnected processing elements, namely neurons, working together to solve specific problems. The most popular type of neural network is composed of three layers of units: an input layer, hidden layers, and an output layer, as shown in Figure 2. The input layer is connected to the hidden layer, which is in turn linked to the output layer. The activity of the input layer is the raw information that is fed into the network. The activity of each hidden unit is determined by the activities of the input units and the weights (W_ij) on the connections between the input and the hidden units. The transfer and activation functions translate the input signals into output signals. The threshold value of our model is extracted from the activation function, in which the output is set to one of two levels depending on whether the total input is greater than or less than some threshold value. The behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
In this paper, we use the neural network as the base classifier of the AdaBoost model, namely the AdaBoosting Neural Network (AB-NN). We are given a training data set S = {(x_1, y_1), . . . , (x_N, y_N)}, in which x_i are the input features and y_i ∈ {−1, 1} is the corresponding output. In the AB-NN model, each sample in S is initially assigned an equal weight of 1/N, which means that each sample has the same chance of being selected in the first step. Generating T neural network classifiers for the AdaBoost model needs T rounds of neural network training with T different training sample groups S_t (t = 1, 2, . . . , T). In round t, the function that determines the weight of sample i is denoted by D_t(i). In each round, after the classifier has been constructed and provides a function h_t that maps x to {−1, 1}, the values D_t(i) are adjusted according to how the samples were classified, and the training sample group S_{t+1} is then generated by sampling S with replacement according to the updated weights. The algorithm thus maintains a weight distribution D_t(i) over the data points, and the weights are updated in each iteration. The goal of the base learner is to minimize the weighted error of Eq. (2), where θ_t(x_i) is a node in the output layer that indicates the threshold value deciding whether or not a CU splits into the next depth level. To this end, the output function h(x) is defined in Eq. (3). The final decision function of the AB-NN algorithm is defined in Eq. (4), in which β_t is obtained by Eq. (5).
The details and the pseudo-code of the AB-NN model are described in Algorithm 1. Since 0 < ε_t < 0.5, it follows that 0 < β_t < 1; the weights of the correctly classified samples are therefore reduced by the factor β_t, while the weights of the misclassified samples remain unchanged.
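The boosting loop can be illustrated with a minimal sketch of the classical AdaBoost.M1 scheme, assuming the standard coefficient β_t = ε_t / (1 − ε_t) and a final vote weighted by log(1/β_t). Two simplifications are assumptions of this sketch and not the paper's method: a one-dimensional threshold classifier (`train_stump`, a hypothetical name) stands in for the neural-network weak learner, and the weighted error is minimized directly instead of resampling S according to D_t. The toy data are made up.

```python
import math


def train_stump(xs, ys, weights):
    """Weighted threshold classifier; a stand-in for the NN weak learner.
    With sign=+1 it predicts +1 (split) when x > theta, otherwise -1."""
    best = None
    for theta in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x > theta else -sign for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, theta, sign)
    return best  # (weighted error, theta, sign)


def adaboost_m1(xs, ys, rounds):
    n = len(xs)
    d = [1.0 / n] * n  # every sample starts with weight 1/N
    ensemble = []
    for _ in range(rounds):
        err, theta, sign = train_stump(xs, ys, d)
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        err = max(err, 1e-10)  # avoid log(1/0) for a perfect learner
        beta = err / (1.0 - err)  # 0 < beta < 1 because err < 0.5
        ensemble.append((math.log(1.0 / beta), theta, sign))
        # Correctly classified samples are scaled by beta,
        # misclassified samples keep their weight.
        d = [w * (beta if (sign if x > theta else -sign) == y else 1.0)
             for w, x, y in zip(d, xs, ys)]
        total = sum(d)
        d = [w / total for w in d]  # normalize to a distribution
    return ensemble


def predict(ensemble, x):
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score > 0 else -1


# Toy 1-D feature with one "out of order" sample (x = 3).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, 1, -1, 1, 1, 1, 1]
model = adaboost_m1(xs, ys, rounds=3)
```

On this toy set no single threshold classifies every sample correctly, but the weighted vote of three rounds does, which is the behavior boosting is meant to provide.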

IV. 3D-HEVC CODING FEATURES ANALYSIS
A. FIRST AND SECOND ORDER STATISTICS FEATURES EXTRACTION
The precision of the CU size decision in a classification task is highly dependent on the feature space used to train the model. In most of the machine learning algorithms adopted for the CU size decision in 3D video coding, the features extracted from a CU are statistical criteria, such as variance, structure tensor and gradient [7], [10], [29], [30]. In this paper, we train on two types of feature measures: first order and second order features. The first order features are based on central moments [31]; these texture measures are statistics calculated from individual pixels and do not consider relationships between neighboring pixels. The second order features, or Gray Level Co-occurrence Matrix (GLCM) features, take the relationship between neighbors into account [32], [33].

1) FIRST ORDER FEATURES
First order features are based on statistical characteristics calculated from each CU. For a pixel I(i, j) in the luminance domain, the mean $m_1$ and the central moments $\mu_k$ of each CU are computed as follows:
$$m_1 = \frac{1}{N} \sum_{i,j} I(i,j)$$
$$\mu_k = \frac{1}{N} \sum_{i,j} \left( I(i,j) - m_1 \right)^k$$
where k = 2, 3 and 4, and N is the number of pixels of the CU.
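A minimal sketch of these first order statistics is given below, assuming the CU is available as a 2-D list of luminance samples. The function name and the dictionary layout are hypothetical. The sketch also returns the variance, skewness and kurtosis that are derived from the central moments, using the standard normalization μ_k / σ^k, which is an assumption about the exact form used.

```python
def first_order_features(cu):
    """cu: 2-D list of luminance samples (hypothetical input layout)."""
    pixels = [p for row in cu for p in row]
    n = len(pixels)
    mean = sum(pixels) / n

    def central_moment(k):
        # k-th central moment: average of (pixel - mean)^k over the CU
        return sum((p - mean) ** k for p in pixels) / n

    variance = central_moment(2)
    if variance == 0:
        # Perfectly flat CU: the normalized moments are undefined,
        # so they are reported as 0 by convention in this sketch.
        return {"mean": mean, "variance": 0.0,
                "skewness": 0.0, "kurtosis": 0.0}
    sigma = variance ** 0.5
    return {
        "mean": mean,
        "variance": variance,
        "skewness": central_moment(3) / sigma ** 3,
        "kurtosis": central_moment(4) / sigma ** 4,
    }


features = first_order_features([[0, 0], [4, 4]])
flat = first_order_features([[5, 5], [5, 5]])
```

These statistics depend only on the histogram of the CU, so two blocks with the same pixel values in a different spatial arrangement yield identical features.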
In this study, we use the most frequently used moments, which are the variance $\sigma^2$, the skewness and the kurtosis, based on the central moments $\mu_2$, $\mu_3$, and $\mu_4$, respectively. These features are called the normalized k-central moments and they are calculated as follows:
$$\sigma^2 = \mu_2, \qquad \text{Skewness} = \frac{\mu_3}{\sigma^3}, \qquad \text{Kurtosis} = \frac{\mu_4}{\sigma^4}$$

Algorithm 1 (AB-NN model), in each round t: compute the coefficient β_t using Eq. (5); update the weight function D_t(i); normalize the weight function D_{t+1}(i).

2) SECOND ORDER FEATURES
The first order features provide information related to the gray-level distribution of the region. However, they do not give any information about the relative positions of the various gray levels within the region. These features cannot measure whether all low-value gray levels are positioned together or whether they are interleaved with the high-value gray levels, which matters especially for depth map CUs.
For an analyzed image, the texture information is specified by the matrix of joint probabilities of two pixels with gray levels i and j separated by a distance d along direction θ. Suppose the current CU is a matrix (denoted R) with N_x columns and N_y rows, and G is the corresponding GLCM. Each pixel has eight nearest neighbors: the horizontal 0°, vertical 90°, right-diagonal 45° and left-diagonal 135° directions and their four opposite directions, as illustrated in Figure 3. It can be seen from Figure 3 that pixels 2 and 6 are the 0° nearest neighbors of pixel X, pixels 1 and 5 are the 45° nearest neighbors, pixels 4 and 8 are the 90° nearest neighbors, and pixels 3 and 7 are the 135° nearest neighbors of pixel X. In this work, we calculate the GLCM for each pixel in the 0° direction only. In the co-occurrence count, # represents the number of elements in the set, (k, l) and (m, n) are coordinates in the pixel matrix R, and D = (N_x × N_y) × (N_x × N_y).
The GLCM for the 0° direction is calculated from the gray values R(m, n) and R(k, l) in the matrix R. Finally, the GLCM matrix G can be represented as in Eq. (13), where W is the number of gray levels. To reduce the computational overhead, the GLCM matrix is calculated for θ = 0° and d = 1, and W is set to 8. Since the luminance values of the current CU range from 0 to 255, each pixel value is divided by 32.
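The construction can be sketched as follows for θ = 0°, d = 1 and W = 8, assuming 8-bit luminance samples stored as a 2-D list. The function name is hypothetical. Counting each horizontal pair in both orders (so that the matrix is symmetric, as in Haralick's original definition) and normalizing the counts to probabilities are assumptions of this sketch.

```python
def glcm_0deg(cu, levels=8, distance=1, symmetric=True):
    """Gray Level Co-occurrence Matrix along the horizontal direction."""
    shift = 256 // levels  # 32 when 0..255 is reduced to 8 levels
    quantized = [[p // shift for p in row] for row in cu]
    g = [[0] * levels for _ in range(levels)]
    for row in quantized:
        # Pair every pixel with the pixel `distance` columns to its right.
        for a, b in zip(row, row[distance:]):
            g[a][b] += 1
            if symmetric:
                g[b][a] += 1  # also count the pair in the opposite order
    total = sum(map(sum, g))
    if total == 0:
        return g  # CU too narrow for this distance: no pairs counted
    return [[c / total for c in r] for r in g]


# Toy 2x4 block: a dark half next to a bright half.
block = [[0, 0, 255, 255],
         [0, 0, 255, 255]]
G = glcm_0deg(block)
```

For this block most of the probability mass sits on the diagonal entries G[0][0] and G[7][7], and only the single dark-to-bright transition per row contributes to the off-diagonal entries G[0][7] and G[7][0]. Reducing the matrix to 8 × 8 keeps the per-CU cost small compared to a full 256 × 256 matrix.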
In this study, four GLCM features were implemented: Homogeneity, Contrast, Entropy and Angular Second Moment (ASM). The mathematical description of these features is given in [32], and they are calculated by Eqs. (14)-(17), where G(i, j) is the GLCM matrix for d = 1 and θ = 0°. Homogeneity is a measure that takes high values for low-contrast images; Contrast is a measure of local level variations that takes high values for images of high contrast; Entropy is a measure of randomness that takes low values for smooth images; and ASM is a feature that measures the smoothness of the image. The less smooth the region is, the more uniformly distributed G(i, j) is and the lower the value of ASM will be. Together, these features provide high discriminative power to describe the complexity of a region.
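The four features can be sketched in their commonly used Haralick forms, given a normalized GLCM as a list of lists. The exact variants are assumptions here: Homogeneity is taken as the inverse difference moment with denominator 1 + (i − j)², and Entropy uses a base-2 logarithm. The function name is hypothetical.

```python
import math


def glcm_features(g):
    """g: normalized GLCM (entries sum to 1) as a list of lists."""
    homogeneity = contrast = entropy = asm = 0.0
    for i, row in enumerate(g):
        for j, p in enumerate(row):
            homogeneity += p / (1 + (i - j) ** 2)  # high for smooth blocks
            contrast += (i - j) ** 2 * p           # high for strong edges
            asm += p * p                           # high for uniform texture
            if p > 0:
                entropy -= p * math.log2(p)        # high for random texture
    return {"homogeneity": homogeneity, "contrast": contrast,
            "entropy": entropy, "asm": asm}


# All mass on the diagonal: neighbors always share the same gray level.
smooth = glcm_features([[0.5, 0.0], [0.0, 0.5]])
# All mass off the diagonal: neighbors always differ by one level.
edgy = glcm_features([[0.0, 0.5], [0.5, 0.0]])
```

The two toy matrices have the same Entropy and ASM but opposite Homogeneity and Contrast, which illustrates why these two features separate smooth CUs from CUs containing transitions.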

B. 3D-HEVC FEATURES SELECTION
In this work, we are interested in reducing the encoding complexity of inter-prediction frames. To train our model, we encode five training video sequences with different resolutions (non-CTC sequences). These video sequences are ''Akko&Kayo'' and ''Rena'' with a resolution of 640 × 448, and ''Pantomime'', ''Dog'' and ''Champagne_tower'' with a resolution of 1220 × 960. The training dataset is collected from the first 100 frames of each sequence. We extract the first and second order features of each CU together with the splitting decision information. The splitting decision takes the value ''0'' when the CU is coded with the current size and the value ''1'' if the CU splits into smaller sizes. We extracted these features for texture and depth maps separately for the four QP-pairs (25,34), (30,39), (35,42) and (40,45) under the RA configuration. Altogether, the training dataset is composed of seven features and the splitting flag information.
To assess how each feature contributes to the CU partitioning decision, the Information Gain (IG) [34], [35] is used. Among all evaluated features, a subset was selected according to the correlation of each feature with the CU split decision. The IG-based algorithm identifies the most relevant features for the split decision. IG refers to the difference between the entropy of the whole data set and the entropy of the subsets induced by the evaluated attribute. The Waikato Environment for Knowledge Analysis (WEKA) [36], version 3.8.5, was used to calculate the IG of the features. WEKA generated three datasets for texture and three datasets for depth maps. Each dataset corresponds to one CU size, 64 × 64, 32 × 32 or 16 × 16, and is composed of the seven training features and the splitting information flag. Table 3 shows the IG of all features with respect to the CU split decision for CU sizes 64 × 64, 32 × 32 and 16 × 16. Figure 4 shows the average IG over the three CU sizes for texture and depth maps. It can be seen from Table 3 and Figure 4 that, for texture CUs, Homogeneity is ranked first and Contrast second. In contrast, for depth map CUs, Contrast is ranked first and Homogeneity second. Therefore, the two GLCM features, Homogeneity and Contrast, are selected for the AB-NN model.
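As an illustration of the IG criterion, the sketch below computes the entropy of the split flags and subtracts the weighted entropy of the subsets induced by a feature. It assumes the feature has already been discretized into bins; WEKA performs its own discretization of numeric attributes, which is not reproduced here. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())


def information_gain(feature_bins, labels):
    """IG = H(labels) - sum over bins of P(bin) * H(labels | bin)."""
    n = len(labels)
    groups = {}
    for b, y in zip(feature_bins, labels):
        groups.setdefault(b, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data: split flag (1 = split, 0 = no split) for four CUs.
split_flags = [0, 0, 1, 1]
informative = ["low", "low", "high", "high"]  # perfectly predicts the flag
uninformative = ["a", "b", "a", "b"]          # independent of the flag

ig_good = information_gain(informative, split_flags)
ig_bad = information_gain(uninformative, split_flags)
```

A feature whose bins separate split from non-split CUs removes all uncertainty about the flag (IG of 1 bit in this toy case), while a feature independent of the flag has an IG of 0, so ranking features by IG orders them by relevance to the split decision.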

V. THE EARLY TERMINATION OF CU ENCODING ALGORITHM
In this work, we propose an early termination of CU encoding for texture videos and depth maps, which can reduce the complexity of 3D-HEVC inter-coding. The proposed method is based on the GLCM features selected in Section IV and on the AB-NN training model, which is built off-line. The training model is composed of three datasets for texture and three datasets for depth maps. Each dataset corresponds to one of the CU sizes 64 × 64, 32 × 32 and 16 × 16, and is composed of the selected features Homogeneity and Contrast, calculated by Eqs. (14) and (15), respectively, and the splitting decision information. The AB-NN algorithm is used to extract the threshold vector θ_j for each dataset from the selected features, as described in Algorithm 1. Figure 5 shows the design and evaluation flow of the proposed algorithm. The first step is the extraction of features from the 3D-HEVC encoder using the training sequences. In the next step, the extracted features are filtered using WEKA and IG to select the ones that are highly correlated with the CU partition. The selected features are then used to build a training model with AB-NN and to extract the threshold values for the proposed algorithm. Finally, in the evaluation step, the proposed algorithm is run with the extracted thresholds and the results are compared with HTM 16.3 [16].
To simplify the use of the selected features in the proposed algorithm, we compute ω_CU, the distance between the selected features, as defined in Eq. (18), where H_CU is the Homogeneity of each CU computed using Eq. (14) and C_CU is the Contrast of the CU calculated using Eq. (15). The threshold value of the proposed algorithm is calculated in the same way from the thresholds extracted by AB-NN. The early termination of CU splitting (T_enc) compares ω_CU with a threshold value Th_S^V, in which the parameter S takes the values 0, 1, and 2 for the CU sizes 64 × 64, 32 × 32, and 16 × 16, respectively, and the parameter V takes T for texture CUs and D for depth map CUs. The value of the threshold Th_S^V, which is extracted from the AB-NN model explained in Section III, changes depending on the current view (texture or depth map), the CU size, and the Quantization Parameters (QPs). If ω_CU is smaller than Th_S^V, the current CU is terminated early and coded with its current size. Otherwise, if ω_CU is larger than the selected Th_S^V, the current CU is split into smaller sizes. The overall proposed algorithm is given in Algorithm 2.
It can be seen from Algorithm 2 that the isDepth flag is used to distinguish texture from depth map CUs, and Th_0, Th_1, and Th_2 are the threshold values for the CU sizes 64 × 64, 32 × 32, and 16 × 16, respectively. First, the GLCM filter is applied to each pixel. Then, the view type is checked: for each incoming depth map CU, Th_0, Th_1, and Th_2 take the depth map thresholds Th_0^D, Th_1^D, and Th_2^D, respectively, and for each texture CU, they take the texture thresholds Th_0^T, Th_1^T, and Th_2^T, respectively. Next, the splitting process starts and ω_CU is calculated using Eq. (18) for the current CU size; ω_0, ω_1, and ω_2 denote ω_CU for the CU sizes 64 × 64, 32 × 32, and 16 × 16, respectively. Finally, the early termination test decides whether or not the current CU must be split into smaller sizes.
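The decision rule described above can be sketched as a simple threshold comparison. The sketch takes ω_CU as an input rather than computing it, and the threshold table below contains made-up placeholder values for a single QP-pair; the actual thresholds are derived off-line by the AB-NN model and also depend on the QP. All names are hypothetical.

```python
# Placeholder thresholds for ONE QP-pair, keyed by (view, CU size).
# These numbers are illustrative only and are not taken from the paper.
EXAMPLE_THRESHOLDS = {
    ("texture", 64): 0.30, ("texture", 32): 0.25, ("texture", 16): 0.20,
    ("depth", 64): 0.40, ("depth", 32): 0.35, ("depth", 16): 0.30,
}


def terminate_early(omega_cu, cu_size, is_depth,
                    thresholds=EXAMPLE_THRESHOLDS):
    """Return True when the CU is coded at its current size (no split)."""
    if cu_size <= 8:
        return True  # 8x8 is the smallest CU size, nothing left to split
    view = "depth" if is_depth else "texture"
    # Early termination when omega_CU is smaller than the threshold
    # selected for this view and CU size; otherwise the CU is split.
    return omega_cu < thresholds[(view, cu_size)]


# Example: the same omega value leads to different decisions per view.
texture_stops = terminate_early(0.35, 64, is_depth=False)
depth_stops = terminate_early(0.35, 64, is_depth=True)
```

Keeping separate thresholds per view and per CU size reflects the different CU size distributions of texture and depth maps reported in Section II.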

VI. EXPERIMENTAL RESULTS
To verify the efficiency of the early termination of the CU encoding algorithm, the proposed algorithm has been implemented on the recent 3D-HEVC reference software by JCT-3V and evaluated using eight test video sequences: ''Kendo'', ''Balloons'' and ''Newspaper'' with a resolution of 1024 × 768, and ''GT_Fly'', ''Poznan_Hall2'', ''Poznan_Street'', ''Undo_Dancer'' and ''Shark'' with a resolution of 1920 × 1088. The 3-view case of the CTC was used. In this case, for each sequence, 3 texture cameras and 3 associated depth maps were encoded using the random access configuration. Table 5 recapitulates the details of the CTC test sequences. The test platform is an Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30 GHz with 8 GB RAM and a Microsoft VS C++ 2015 compiler. Table 5 presents the experimental results of the early termination of the CU encoding algorithm for the texture and depth maps in the 3D-HEVC encoder. We evaluate the proposed algorithm in terms of the Bjontegaard delta metrics (BD-BR, BD-PSNR) [37], [38], considering the quality of the views synthesized using the View Synthesis Reference Software (VSRS) provided by JCT-3V [39]. TS represents the Time Savings of the entire encoder (texture and depth maps), defined as follows:
$$TS = \frac{ET_{Original} - ET_{Proposed}}{ET_{Original}} \times 100\%$$
where $ET_{Original}$ represents the encoding time of the original HTM-16.3 encoder, and $ET_{Proposed}$ is the encoding time of the proposed algorithm. It can be seen from Table 5 that the proposed algorithm saves considerable encoding time for texture and depth maps while providing a performance similar to the original 3D-HEVC encoder for all test sequences. The proposed algorithm decreases the encoding time from [...] Meanwhile, the average decrease of BD-PSNR is 0.01 dB for depth maps and 0.02 dB for texture videos. Moreover, the average BD-BR increase is 0.31% for texture videos and 0.22% for depth maps.
The results in Table 5 show that the proposed algorithm based on the AB-NN and GLCM features can avoid evaluating unnecessary CU sizes in 3D-HEVC inter-coding with negligible loss of coding efficiency. Figure 6 illustrates more detailed experimental results of the proposed algorithm compared to the original 3D-HEVC for four typical sequences: Balloons (1024 × 768), Newspaper (1024 × 768), GT_Fly (1920 × 1088), and Poznan_Hall2 (1920 × 1088), for texture and synthesized views. It can be seen from Figure 6 that the proposed algorithm achieves better RD performance for the four test sequences. Table 6 compares the view synthesis performance of the proposed algorithm with that of the related works [10]-[15], which also focus on the RA configuration. It can be seen from Table 6 that the proposed algorithm greatly reduces the encoding complexity compared to the related inter-coding works.
Compared with the proposed algorithm, our past work [10], which focuses on the CU size decision in 3D-HEVC inter-coding, reduces the encoding time by 37.13% on average, which is less than the 40.25% achieved by the proposed work, with a 0.34% increase of synthesis BD-BR relative to HTM-16.2.
For the work proposed in [11], the BD-BR increase is negligible. However, our proposed algorithm achieves a better gain in encoding speed, since [11] achieves only 24.10% time savings on average relative to HTM-16.2. Thus, our method saves more overall encoding time at the cost of a slightly higher BD-BR in the synthesized views.
Compared with the work in [12], our proposed algorithm greatly reduces the complexity of the 3D-HEVC encoder. The authors of [12] do not present results for all test sequences, and their work achieves only 16.42% time savings on average relative to HTM-15.1, with a 0.33% synthesis BD-BR increase.
The algorithm proposed in [13] reduces the encoding time by only 18.71% on average relative to HTM-16.0, indicating that our proposed work performs better in terms of time savings, while the BD-BR increase is approximately the same.
In comparison with the approach in [14], the proposed algorithm achieves the same encoding efficiency, with a 0.22% BD-BR increase in the synthesized views. However, the algorithm presented in [14] achieves a 19.90% reduction in encoding time relative to HTM-16.0, which is less than half of our result.
The work in [15] achieves 23.40% time savings on average and was implemented in the HTM-8.0 version, whereas the method proposed in the present work was implemented in the HTM-16.3 version, which fully implements the final 3D-HEVC standard. Furthermore, the coding efficiency loss of our proposed algorithm is smaller, since the average BD-BR increase in [15] is approximately 0.58%. Figure 7 shows the time saving of the proposed algorithm compared to the related works for texture videos, depth maps, and both, separately for two typical sequences: Balloons (1024 × 768) and Poznan_Street (1920 × 1088). It can be seen from Figure 7 that the proposed algorithm achieves better time savings than the related works for the two test sequences in texture views, depth map views and both components together. Table 7 presents the comparison of our proposed approach with the related works. The performance of our proposed algorithm in terms of time saving and coding efficiency is high because both the texture video and depth map components are fully exploited and an efficient machine learning model is used for the early termination of CU encoding in 3D-HEVC.

VII. CONCLUSION
In this paper, we proposed an early termination of CU encoding to reduce the complexity of 3D-HEVC texture and depth map inter-coding. The proposed approach is based on first order and GLCM features and a boosting neural network as the training model. After extracting the features from the original encoder, we apply the IG criterion to select the ones that are highly correlated with the CU partition. The selected features are used in the AB-NN training model to find suitable thresholds for texture and depth maps. The proposed algorithm can avoid evaluating unnecessary CU sizes in the 3D-HEVC encoder. Experimental results show that the proposed approach achieves considerable encoding time savings for 3D-HEVC inter-coding with negligible coding performance degradation.