Fast Texture Coding Based on Spatial, Temporal and Inter-View Correlations for 3D Video Coding

3D-HEVC (the 3D extension of High Efficiency Video Coding), the latest international standard for 3D video coding, supports the multi-view plus depth 3D video format to enrich multimedia applications. For texture coding, 3D-HEVC utilizes not only the information of the temporal and spatial domains but also that of the inter-view domain. However, the time consumption and complexity of 3D-HEVC also increase significantly. In this paper, a fast texture coding algorithm for 3D-HEVC is proposed. We individually calculate the Pearson correlation coefficients from the rate-distortion costs (RD-costs) of coding tree units (CTUs) in the temporal, spatial and inter-view domains to analyze the correlations for the independent view and the dependent views. The proposed coding algorithm is based on the coding information of the CTUs with higher correlations. The fast algorithm predicts and dynamically adjusts the depth range of the coding unit (CU). A prediction unit (PU) mode decision is proposed according to the complexity and partition direction of the best PU modes obtained from the highly correlated CTUs. The search range is adaptively adjusted for motion estimation. In addition, an RD-cost threshold is estimated to early terminate the CU split. Experimental results show that the proposed fast texture coding algorithm reduces the texture coding time significantly and outperforms numerous previous works.


I. INTRODUCTION
With advanced multimedia technology, video has been widely used in many aspects of daily life. Compared with a typical 2D display, people are often more interested in the immersive user experience of a 3D display. In recent years, 3D video applications related to free-viewpoint television (FTV) [1], [2], 360° panoramic video, virtual reality (VR) [3] and augmented reality (AR) [4] have become widespread. These applications are very popular in commercial advertising, education, multimedia entertainment, etc. In response to this trend, the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) established the video coding standard for 3D videos, known as the 3D extension of High Efficiency Video Coding (3D-HEVC) [5], [6].
3D-HEVC evolves from multi-view video coding (MVC) [7] and multi-view plus depth (MVD) [8] coding. Multiview-HEVC (MV-HEVC) [9] is the extension of HEVC to the multi-view structure. Multi-view videos can be shown on FTV or 3D stereo displays, and viewers are able to watch a scene from different angles or enjoy the 3D experience. MV-HEVC effectively encodes videos from multiple viewpoints by removing the inter-view redundancy. However, the more views there are, the more data are required. As a result, the 3D video format evolved to MVD. A depth map, a gray-level image, uses greyscale to represent the relative distance between the camera and the objects in a scene.

FIGURE 1. System diagram for 3D-HEVC [6].

Through the depth map and the depth-image-based rendering (DIBR) [10] technique, numerous virtual views can be synthesized, which greatly reduces the number of views that must be captured. Figure 1 depicts the system diagram of 3D-HEVC. At the encoder side, the input data are multi-view color textures with the associated depth maps. After compression by the 3D-HEVC encoder, the encoded bitstream can be extracted according to the back-end application at the decoder side. For instance, we can project multiple virtual views on FTV via DIBR if the bitstream is completely decoded. Alternatively, the bitstream can be partially decoded to drive a 3D stereo display or a typical 2D display. Compared to an HEVC encoder, 3D-HEVC efficiently encodes MVD videos by not only reducing the spatial and temporal redundancy but also thoroughly exploiting the inter-view (between views) and inter-component (between color texture and depth map) relationships. The previously encoded components or views can all serve as reference sources for the current coding component.
The coding unit (CU) is the basic element in HEVC and 3D-HEVC. Before the coding process starts, an input frame is divided into multiple non-overlapping coding tree units (CTUs) of size 64 × 64. A CTU can be further split into CUs of various sizes, from Depth 0 (64 × 64) to Depth 3 (8 × 8). The CU split is based on a quadtree structure, which performs a top-down strategy to split the CTU into every possible CU size and find the combination of CU partitions with the minimum rate-distortion cost (RD-cost). Besides, each CU contains at least one prediction unit (PU) and one transform unit (TU). PU partition modes deal with different texture variations. A PU only allows square partitions (2N × 2N and N × N) for intra prediction. For inter-frame prediction, in addition to 2N × 2N and N × N, the PU also supports symmetric motion partitions (SMP; 2N × N and N × 2N) and asymmetric motion partitions (AMP; 2N × nU, 2N × nD, nL × 2N and nR × 2N). Similar to HEVC, 3D-HEVC reduces the inter-frame redundancy by performing motion compensated prediction (MCP) on temporal reference frames within a given search range (SR) to explore the best matching blocks. In addition, when coding the dependent views, the independent view has already been encoded and can serve as a reference source. As a result, 3D-HEVC can also apply disparity compensated prediction (DCP) to find the best matching block in the independent view. In summary, MCP refers to inter-frame prediction using the already encoded temporal frames of the same view, while DCP indicates inter-view prediction using the already encoded frames of the reference views in the same access unit.
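As a quick illustration of the quadtree depths described above, the CU edge length at each depth can be computed as follows (a minimal sketch; the function name is ours, not part of the reference software):

```python
def cu_size(depth: int) -> int:
    """Edge length of a CU at a given quadtree depth.

    A 64x64 CTU (Depth 0) halves its edge length with each split,
    down to 8x8 at Depth 3.
    """
    if not 0 <= depth <= 3:
        raise ValueError("3D-HEVC texture CU depth ranges from 0 to 3")
    return 64 >> depth

# Depth 0 -> 64x64, Depth 1 -> 32x32, Depth 2 -> 16x16, Depth 3 -> 8x8
```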
The fast algorithms to accelerate the coding of 3D-HEVC have been investigated extensively [11]-[14]. Several mechanisms have been proposed for fast color texture coding. Heo et al. [15] early terminate the construction of the Merge list by checking the most probable candidates and the motion data of the temporally collocated PU. Ding et al. [16] determine the range of the CU depth level by analyzing the weighted motion homogeneity of spatial, temporal and inter-view coded treeblocks. Song and Jia [17] propose an early Merge mode decision for the dependent view by considering the encoded PU modes of the inter-view reference CTU and the parent CU. Zhang et al. [18] acquire the RD-costs of the spatial, temporal and inter-view neighboring CTUs; an RD-cost threshold for the early termination of the mode decision process is then computed from the weighted RD-costs of these neighboring CTUs. Pan et al. [19] adaptively reduce the spatial and temporal MV candidates for PUs with the Merge mode; for inter-mode PUs, the combination of spatial and temporal candidates is utilized to early determine the final MV. Zhang et al. [20] use a motion homogeneity model to propose a fast texture video coding method including fast depth level range determination, early SKIP/Merge mode decision and adaptive motion search range adjustment. Bakkouri and Elyousfi [21] use features based on CU homogeneity as the input of a Fuzzy C-Means clustering algorithm to decide the quadtree splitting of the CU and to choose the thresholds for early termination of the CU partition. Zhang et al. [22] propose a fast depth level decision, an early SKIP/Merge mode decision and an adaptive early termination based on the inter-view, spatio-temporal and texture-depth correlations to skip some treeblocks of the texture and depth map at an early stage; this algorithm applies to both the texture and the depth map and significantly reduces the encoding time. Avila et al. [23] take advantage of the depth map information to calculate the sample average (SA) and sample gradient (SG) for the decision of background and foreground ROI; the CU split is then early terminated if the current CU is in a non-ROI region. Bakkouri et al. [24] characterize the homogeneity of a CU by calculating the gradient vectors and the structure tensor; a fast algorithm combining early termination of the CU split and a mode prediction process is thereby accomplished for both the color texture and the depth map. Chen et al. [25] analyze the inter-view correlation by observing the best mode distributions of the independent view and the dependent view; their early Skip mode decision prejudges whether Skip is the best mode by referring to the inter-view neighboring CTUs, and the inter SMP and AMP modes are selectively disabled by checking the differences of adjacent pixels along the main diagonal of the CU. Li et al. [26] jointly exploit the prior and posterior probability models to verify early whether the Merge mode is the best for the dependent view.
Although the fast algorithms for HEVC are mature, the time saving and coding performance remain limited if we simply transplant strategies from HEVC to 3D-HEVC, because the properties of the multi-view structure are not fully considered. Moreover, when designing fast algorithms for the color texture coding of 3D-HEVC, the coding efficiency must be strictly maintained because the coding results of each view and each component interact with each other; otherwise, the prediction error may propagate to the depth map and even to the synthesized view. Some works take all the spatial, temporal and inter-view correlations into consideration using weighting factors [16], [18], [20], [22]. However, the appropriate priority of the information is unknown: the weighting factors vary across situations, and sometimes the critical information may be ignored. Based on the above discussion, this paper first examines the spatial, temporal and inter-view relationships by calculating Pearson correlations. Second, we extract the essential reference information with the higher correlations. Then, CU depth range prediction and adjustment, fast inter PU mode decision, search range adjustment and CU early termination based on RD-cost estimation are proposed.

II. PROPOSED METHOD
A. ANALYSIS OF SPATIAL, TEMPORAL AND INTER-VIEW CORRELATIONS

1) PEARSON CORRELATION COEFFICIENT
Figure 2 demonstrates the RD-cost distributions for the dependent view at the 6th frame of the Kendo sequence, including the current coding frame, forward-reference frame, backward-reference frame and inter-view frame. According to Figures 2(a)-(d), the RD-cost distributions are quite similar. Moreover, the overlapped RD-cost distributions in Figure 2(e) verify the RD-cost dependencies between the reference frames. As a result, we utilize the RD-costs of the neighboring CTUs to calculate the Pearson correlation coefficients and analyze the spatial, temporal and inter-view relationships.
Eq. (1) formulates the Pearson correlation coefficient (ρ_X,Y), which is commonly used in statistics to investigate the linear relationship between two sets of variables (X and Y). The variables x_i and y_i represent the i-th elements of sets X and Y, respectively, and n indicates the number of elements in each set. The value of ρ_X,Y ranges from −1 to 1; the larger ρ_X,Y is, the more correlated the two sets are.
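For reference, Eq. (1) is the standard sample Pearson correlation coefficient; a plain-Python sketch over two RD-cost sample sets might look like this (function name ours):

```python
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length sets."""
    n = len(xs)
    assert n == len(ys) and n > 1
    mx = sum(xs) / n                     # mean of X
    my = sum(ys) / n                     # mean of Y
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Perfectly linearly related sets give ρ = 1 (or −1 for an inverse relation), matching the range described in the text.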
When encoding the independent view, the reference sources are only the spatial and temporal domains; when encoding a dependent view, the inter-view domain is also available. Thus, the analyses of the independent view and the dependent view are performed individually.

2) INDEPENDENT VIEW
We investigate the spatial and temporal relationships for the analysis of independent-view coding. In terms of the spatial relationship, we analyze the correlations between the current coding CTU and the left (L), upper (U), upper-left (UL) and upper-right (UR) CTUs. In terms of the temporal relationship, the correlations between the current coding CTU and the forward-reference collocated (C0) CTU as well as the backward-reference collocated (C1) CTU are inspected. Figure 3 specifies all the reference CTUs for the spatial and temporal correlation analyses of the independent view. We collect the RD-costs (RDs) of these reference CTUs and calculate the Pearson correlation coefficient by Eq. (1) for each reference neighboring location. Eq. (2) clusters the sets for the calculation of the spatial correlation (X_I_S and Y_I_Sα), and Eq. (3) groups the sets for the calculation of the temporal correlation (X_I_T and Y_I_Tβ). Besides, since the RD-cost of the current coding CTU is unknown, RD_I_C_avg is added into Eqs. (2) and (3) to augment the sample sets. RD_I_C_avg is obtained by averaging the RD-costs of the forward-reference (RD_I_C0) and backward-reference (RD_I_C1) collocated CTUs. Finally, there are four spatial correlation coefficients and two temporal correlation coefficients in the coefficient set (ρ_I_ST), as formulated in Eq. (4).

3) DEPENDENT VIEW
At the same access unit, the independent view is already coded before the dependent view, so the inter-view relationship can also be taken into consideration. For the coding of the dependent view, the correlation analyses of the spatial and temporal relationships are inherited from the independent view. The correlations between the current coding CTU and the left (L), upper (U), upper-left (UL), upper-right (UR), forward-reference collocated (C0) and backward-reference collocated (C1) CTUs are also explored by acquiring the RD-costs (RDs) of these reference CTUs and calculating the Pearson correlation coefficients by Eq. (1). The sample sets for the calculation of the spatial correlation (X_D_S and Y_D_Sα) are grouped in Eq. (5), while the sample sets for the calculation of the temporal correlation (X_D_T and Y_D_Tβ) are clustered in Eq. (6).
There are still slight differences from the independent view. In addition to the averaged RD-cost (RD_D_C_avg) from the temporal collocated CTUs (RD_D_C0 and RD_D_C1), we shift the current coding CTU from the dependent view to the independent view by using the depth-oriented neighboring block based disparity vector (DoNBDV) to acquire the inter-view corresponding CTU (IV_C) and its spatial neighboring CTUs. Then, the RD-costs of these inter-view corresponding CTUs are added into the sets for the calculation of the spatial correlation, as listed in Eq. (5). All the reference CTUs for the spatial and temporal correlation analyses of the dependent view are specified in Figure 4.
Regarding the inter-view relationship, as shown in Figure 5, we acquire the collocated (C), left (L), upper (U), right (R) and bottom (B) reference CTUs in the forward (list_0) and backward (list_1) directions. In addition, we shift these temporal neighboring CTUs from the dependent view to the independent view by DoNBDV to obtain the inter-view corresponding CTUs. Finally, the RD-costs of these inter-view corresponding reference CTUs are collected, and the inter-view correlation is also calculated by Eq. (1), whose sample sets (X_D_IV and Y_D_IV) are enumerated in Eq. (7). In summary, there are four spatial correlation coefficients, two temporal correlation coefficients and one inter-view correlation coefficient in the coefficient set (ρ_D_STIV), as formulated in Eq. (8).

4) EXTRACTION OF THE REFERENCE INFORMATION
In this paper, we utilize the encoded information from the left, upper, upper-left, upper-right, forward collocated, backward collocated and inter-view corresponding CTUs to accelerate the entire coding process at both the CU and PU levels. Instead of coarsely adopting all the information from these reference CTUs, we only choose the critical information from the CTUs with the higher correlations based on the correlation analysis. Inaccurate or unrelated reference information can thereby be dropped to avoid decision errors when performing the proposed fast algorithm.
After the calculation of the Pearson correlation coefficients, we sort the elements in the coefficient set (ρ_I_ST for the independent view and ρ_D_STIV for the dependent view). The encoded information is extracted from the top M CTUs with the highest correlations as the foundation of the following proposed algorithms. To decide the appropriate number of reference CTUs, we investigate the correlation coefficients by encoding the test sequences (Kendo, Balloons, Newspaper, Poznan_Street, Poznan_Hall2, Shark, Undo_Dancer, GT_Fly). Table 1 and Table 2 tabulate the correlation rankings of the independent view and the dependent view, respectively. We can notice that the third highest correlations of both the independent view and the dependent view have already dropped to around 0.6. In addition, there is an obvious gap between the third and the fourth highest correlations, and the correlations after the third are relatively lower. Thus, M is set to 3 in the experiments to balance the time-saving performance and the coding efficiency.
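The ranking-and-selection step above can be sketched as follows (a hedged illustration; the coefficient values and location labels below are hypothetical, not taken from Tables 1 and 2):

```python
def top_m_references(coeffs: dict, m: int = 3):
    """Pick the M reference CTU locations with the highest Pearson coefficients."""
    ranked = sorted(coeffs.items(), key=lambda kv: kv[1], reverse=True)
    return [loc for loc, _ in ranked[:m]]

# Hypothetical coefficient set for a dependent-view CTU
# (L/U/UL/UR: spatial, C0/C1: temporal, IV: inter-view)
rho = {"L": 0.82, "U": 0.79, "UL": 0.55, "UR": 0.48,
       "C0": 0.73, "C1": 0.61, "IV": 0.68}
```

With M = 3, only the three most correlated locations would supply reference information to the later stages.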

B. CU DEPTH RANGE PREDICTION AND DYNAMIC ADJUSTMENT
In this section, the proposed CU depth range prediction and dynamic adjustment are described in detail. To explain clearly, the related parameters are defined in Table 3.

1) CU DEPTH RANGE PREDICTION
Based on the correlation analysis, we predict the CU depth range from the top M reference CTUs with the highest correlations, as shown in Eq. (9). The lower bound (Depth_min) and the upper bound (Depth_max) of the predicted CU depth range are determined by the minimum and the maximum CU depths among these reference CTUs, respectively. Compared to the ordinary criterion, which directly predicts the CU depth from all the reference CTUs in the spatial, temporal and inter-view domains, the proposed CU depth range prediction has a selection mechanism founded on the correlation analysis.
In addition, the Merge/Skip or 2N × 2N mode is often selected as the best PU mode. If we immediately bypass the CU depths that are out of the predicted range, the coding efficiency may be significantly impacted. Therefore, if the current CU depth (Depth_cur) is not within the predicted CU depth range, we retain the prediction of the Merge/Skip and 2N × 2N modes and omit the other PU modes to reduce the coding time. Intra prediction is left unchanged.
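A minimal sketch of the depth-range prediction and the out-of-range fallback described above, assuming the CU depths of the top-M reference CTUs are available as a flat list (function names ours):

```python
def predict_depth_range(ref_depths):
    """Eq. (9)-style bounds: min/max CU depth among the top-M reference CTUs."""
    return min(ref_depths), max(ref_depths)

def modes_to_test(depth_cur, depth_min, depth_max):
    """Outside the predicted range, keep only Merge/Skip and 2Nx2N
    (intra prediction is left unchanged, as the text states)."""
    if depth_min <= depth_cur <= depth_max:
        return "all inter PU modes"
    return "Merge/Skip and 2Nx2N only"
```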

2) DYNAMIC ADJUSTMENT FOR CU DEPTH RANGE
Although the proposed CU depth range prediction thoroughly exploits the spatial, temporal and inter-view correlations, the video content, parameters and frame types of the current coding CTU and the reference CTUs still differ slightly, so the possibility of misjudgment cannot be completely avoided. To counter such prediction errors, we make use of the coding characteristics of the current coding CTU and dynamically adjust the predicted CU depth range. Three strategies are conceived to fine-tune the CU depth range. The normalized RD-cost is calculated by Eq. (10):

RD_normalized = 4^Depth_cur × RD_cur    (10)

Depth_min is reduced by 1 to additionally include the larger CU size when the predicted depth range (Depth_range) is small (the first condition) or the CU content is possibly smoother (the second condition). Depth_max is increased by 1 to further check the smaller CU size for possibly complex CU partitions. The judgements of the latter two conditions mainly come from the information of the reference PUs and RD-costs.
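Eq. (10) simply scales a CU's RD-cost up to CTU scale, since a CU at depth d covers 1/4^d of the CTU area; a one-line sketch (function name ours):

```python
def rd_normalized(depth_cur: int, rd_cur: float) -> float:
    """Eq. (10): scale a CU's RD-cost by 4^depth so that CUs of different
    sizes become comparable at CTU scale (a depth-d CU covers 1/4^d of the CTU)."""
    return (4 ** depth_cur) * rd_cur
```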

C. FAST INTER PU MODE DECISION
The best PU mode at the current CU depth is usually similar to the PU modes in the reference CTUs. As a result, the proposed fast inter PU mode decision anticipates the potential PU partitions in advance to accelerate the prediction process. A large proportion of the prediction time can be saved if we directly restrict the PU partitions to the PU modes already chosen in the reference CTUs; however, doing so naively could dramatically degrade the coding efficiency.
During the coding of the independent view, less reference information is accessible (there is no inter-view information). Moreover, prediction errors will propagate to the dependent views through the inter-view dependency. In the hierarchical coding order of the random-access prediction structure, the frame distances to the forward- and backward-reference frames are larger at the lower hierarchical layers, so the prediction errors there are accordingly more evident. Also, if many prediction errors are generated in the lower hierarchical layers, these errors are very likely to drift into the higher hierarchical layers. Hence, a strict mode decision strategy is necessary for the coding of the independent view or the lower hierarchical layers. In contrast, a more aggressive mode decision strategy is feasible during the coding of the dependent view or the higher hierarchical layers.
Several parameters used in this section are also defined in Table 3. There are two decision strategies in the proposed fast inter PU mode decision. The first is based on mode complexity and is applied to independent-view coding and to dependent-view coding at the lower hierarchical layers. The set of PU modes already chosen in the reference CTUs is denoted PU_ref.
If PU_ref does not include any complicated asymmetric inter partition (Inter_AMP), the prediction of the Inter_AMP modes at the current CU depth is disabled. If neither the asymmetric (Inter_AMP) nor the symmetric inter partitions (Inter_SMP) belong to PU_ref, the predictions of the Inter_AMP and Inter_SMP modes at the current CU depth are both disabled.
The second strategy is based on partition directions and is applied to dependent-view coding at the higher hierarchical layers. If PU_ref does not contain any horizontal inter partition mode (Inter_Hor), the prediction of Inter_Hor at the current CU depth is deactivated. Likewise, the vertical inter partition modes (Inter_Ver) are bypassed at the current CU depth if no element of PU_ref belongs to Inter_Ver. This method restricts the direction of the PU partitions in a straightforward manner, which can save much coding time when the reference information is sufficiently reliable.
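The two filtering strategies can be sketched as set operations over PU_ref (a hedged illustration; the mode-name strings and groupings below are ours, with the SMP/AMP partitions split by direction as the text describes):

```python
AMP = {"2NxnU", "2NxnD", "nLx2N", "nRx2N"}   # asymmetric inter partitions
SMP = {"2NxN", "Nx2N"}                        # symmetric inter partitions
HOR = {"2NxN", "2NxnU", "2NxnD"}              # horizontal partitions
VER = {"Nx2N", "nLx2N", "nRx2N"}              # vertical partitions

def disabled_by_complexity(pu_ref: set) -> set:
    """Strict strategy (independent view / lower layers): drop AMP if no
    reference PU used it; drop AMP and SMP if neither appears in PU_ref."""
    disabled = set()
    if not (pu_ref & AMP):
        disabled |= AMP
        if not (pu_ref & SMP):
            disabled |= SMP
    return disabled

def disabled_by_direction(pu_ref: set) -> set:
    """Aggressive strategy (dependent view / higher layers): drop a whole
    partition direction that no reference PU used."""
    disabled = set()
    if not (pu_ref & HOR):
        disabled |= HOR
    if not (pu_ref & VER):
        disabled |= VER
    return disabled
```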
If any neighboring PU selects an intra mode as its best mode, not all the reference information reflects the properties of inter prediction, which might cause decision errors in the proposed criteria for inter partitions. To avoid this special case, we execute an intra-boundary check on the border pixels of the M reference CTUs. The checked pixels are illustrated in Figure 6. For the left and upper reference CTUs, the checked pixels are the border pixels of the adjacent encoded PUs (blue-highlighted). For the upper-left and upper-right reference CTUs, the checked pixel is the connecting corner pixel (yellow-highlighted). For the forward collocated, backward collocated and inter-view corresponding CTUs, the checked pixels are the four corner pixels (pink-highlighted). Intra_check_PU counts how many checked pixels are encoded in intra modes. The proposed fast inter PU mode decision is operated only if Intra_check_PU is zero (no intra mode is selected as the best mode among the neighboring encoded PUs). The complete flowchart of the proposed fast inter PU mode decision is shown in Figure 7.

D. SEARCH RANGE ADJUSTMENT
The magnitude of the search range (SR) affects not only the precision but also the time-consumption when executing motion estimation. The most common way is to reduce the search range by a pre-defined scaling factor according to some parameters, such as block complexity. A more flexible method is conceived in the proposed search range adjustment.
We collect the motion vectors (MVs) of all PUs from the reference CTUs. Then, the search range is modified to the maximum MV length among the MVs from the reference CTUs. Some special situations are worth discussing. If the AMVP vector of the current PU is (0, 0), there are two possible meanings. The first case is that the best MV candidate of AMVP is exactly (0, 0). The second case is that the reference location of AMVP is encoded in intra mode, so no MV is available. Especially in the second case, the start point of the motion estimation might be inaccurate, so the search range should be kept large enough. In addition, if the length of the AMVP vector is larger than
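A hedged sketch of the basic adjustment described above (the default range, the clamping constants and the handling of the intra-coded AMVP case are simplifications of the paper's rules; all names are ours):

```python
def adjust_search_range(ref_mvs, default_sr=64, amvp_valid=True):
    """Shrink the motion-estimation search range to the largest MV magnitude
    observed among the reference CTUs' PUs. When the AMVP predictor is
    unavailable (e.g. its reference location was intra-coded), the start
    point may be unreliable, so the full default range is kept.

    ref_mvs: iterable of (mv_x, mv_y) integer pairs."""
    if not amvp_valid or not ref_mvs:
        return default_sr
    longest = max(max(abs(mx), abs(my)) for mx, my in ref_mvs)
    return min(default_sr, max(1, longest))
```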

E. CU EARLY TERMINATION BASED ON RD-COST ESTIMATION
The proposed method estimates a threshold for the best RD-cost of the current coding CU to early terminate the CU split. Several related parameters are defined in Table 4. The threshold for CU early termination (TH_ET) is determined and adjusted based on the RD-cost distributions of the reference CTUs. In this paper, the RD-costs are utilized to evaluate the spatial, temporal and inter-view correlations, and the RD-cost distributions are similar when the CTUs are highly correlated. We exploit this property by estimating the best RD-cost of the CU at the current CU depth (RD_m,Depth_cur,predict) from the m-th reference CTU for CU early termination. The procedures are illustrated in Figure 8 and described below.
Step 2: Obtain the maximum CU depth (Depth_ref) and the RD-cost (RD_m) of the m-th reference CTU.
Step 3: Predict the RD-cost of the CTU at the current CU depth (RD_m,Depth_cur,predict) from the RD-cost of the m-th reference CTU by Eq.

1) DETERMINATION OF THE LARGER EXTREME FOR RD m
If the RD-cost of the m-th reference CTU (RD_m) is an inordinately large value, we cannot tell whether RD_m results from a good or a bad prediction. If it is the latter, the best RD-cost estimated by our method may be worthless. In probability theory, for any probability distribution, the Chebyshev inequality [27] bounds the fraction of random variables that deviate from the mean by a given multiple of the standard deviation. To avoid erroneous decisions, we adopt the one-sided Chebyshev inequality [28] as a discriminator to distinguish inordinate RD-costs, as expressed in Eq. (14). In Eq. (14), the one-sided Chebyshev inequality [28] guarantees that the probability of RD_m being larger than or equal to the mean RD-cost (µ_m,Depth_ref) plus k₁ times the standard deviation (σ_m,Depth_ref) is at most 1/(1+k₁²). As shown in the blue-painted region in Figure 9, if RD_m is larger than or equal to µ_m,Depth_ref + k₁ × σ_m,Depth_ref, we recognize RD_m as a larger extreme. In our design, we set the probability 1/(1+k₁²) to 30%, so k₁ is approximately 1.528. If RD_m is a larger extreme, the CU early termination based on RD-cost estimation is disabled.
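The value of k₁ follows directly from setting the one-sided Chebyshev tail bound 1/(1 + k₁²) to the chosen probability; a small sketch (function name ours):

```python
from math import sqrt

def chebyshev_k(p: float) -> float:
    """Solve 1/(1 + k^2) = p for k, i.e. the one-sided Chebyshev scaling
    whose upper-tail probability bound equals p."""
    return sqrt(1.0 / p - 1.0)

k1 = chebyshev_k(0.30)  # ~1.528, as used for the larger-extreme test
```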
2) DISCRIMINATION OF THE SMALLER EXTREME FOR RD_m,Depth_cur,predict
If the RD-cost is comparably low, it usually indicates that the prediction result is quite accurate, so the predicted RD-cost does not need adjustment. After the predicted best RD-cost (RD_m,Depth_cur,predict) is obtained, we again apply the one-sided Chebyshev inequality to determine whether the RD-cost is a smaller extreme, as expressed in Eq. (15). As shown in the green-painted region in Figure 10, if RD_m,Depth_cur,predict is smaller than or equal to µ_m,Depth_cur − k₂ × σ_m,Depth_cur, the RD-costs located in this region correspond to relatively accurate prediction results, and we recognize RD_m,Depth_cur,predict as a smaller extreme. In this case, no additional adjustment is required for RD_m,Depth_cur,predict. The predicted RD-cost from the m-th reference CTU for CU early termination is then given by RD_m,ET, which is formulated in Eq. (16). In our experiment, the probability 1/(1+k₂²) is set analogously.

To make the predicted best RD-cost more rigorous, an amendment is applied after obtaining RD_m,Depth_cur,predict. Instead of directly setting an adjustment parameter, we perform an adaptive adjustment according to the distribution of RD_m,Depth_cur,predict and calculate the RD-cost (RD_m,ET) for CU early termination. If RD_m is not a larger extreme and RD_m,Depth_cur,predict is not a smaller extreme, we adaptively adjust RD_m,Depth_cur,predict to make the threshold for CU early termination stricter. For a general normal distribution, if RD_m,Depth_cur,predict is close to the mean value (µ_m,Depth_cur), it is probably not an outlier and is more reliable, so only a slight adjustment is required. On the contrary, a substantial adjustment is necessary if RD_m,Depth_cur,predict is far away from the mean value (µ_m,Depth_cur).
As shown in the yellow-painted region in Figure 11, when RD_m,Depth_cur,predict is larger than µ_m,Depth_cur, the estimated RD-cost for CU early termination (RD_m,ET) is obtained by adjusting RD_m,Depth_cur,predict with Eq. (17). Likewise, as shown in the yellow-painted region in Figure 12, RD_m,ET is calculated by adjusting RD_m,Depth_cur,predict with Eq. (18) when RD_m,Depth_cur,predict is smaller than µ_m,Depth_cur. Eqs. (17) and (18) linearly scale RD_m,Depth_cur,predict by a fraction relative to its distance from the mean value (µ_m,Depth_cur); if RD_m,Depth_cur,predict is equal to µ_m,Depth_cur, no scaling is applied. Lastly, we choose the minimum RD_m,ET as the final threshold for CU early termination (TH_ET), as in Eq. (19). After all predictions at the current CU depth are finished, if the RD-cost of the best mode is smaller than the estimated threshold (TH_ET), the prediction result at the current CU depth is considered very precise. In this situation, we early terminate the following CU split to reduce the coding time.
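The final thresholding step can be sketched as follows (a simplified illustration of the Eq. (19) selection and the termination test; the per-reference estimates RD_m,ET are assumed to be already computed, and the function name is ours):

```python
def early_terminate(rd_best: float, rd_et_candidates) -> bool:
    """Eq. (19)-style check: TH_ET is the minimum of the per-reference
    estimates RD_m,ET; the CU split stops early when the best RD-cost at
    the current depth is already below that threshold."""
    if not rd_et_candidates:
        # estimation disabled, e.g. every RD_m was flagged as a larger extreme
        return False
    th_et = min(rd_et_candidates)
    return rd_best < th_et
```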
Besides, if all modes in PU_ref are Merge/Skip, the current best PU mode (PU_best) is Merge/Skip, and RD_normalized is smaller than the average RD-cost of the reference CTUs (RD_ref,avg), then the CU split is also early terminated.

F. OVERALL ALGORITHM
The flowchart of the overall fast algorithm is demonstrated in Figure 13, which includes the proposed CU depth range prediction and dynamic adjustment, fast inter PU mode decision, search range adjustment and CU early termination based on RD-cost estimation.

III. EXPERIMENTAL RESULTS
The proposed fast texture coding algorithm is implemented in the 3D-HEVC reference software version 16.0 (HTM-16.0) [29]. The three-view case with associated depth maps is encoded under the random-access configuration. The settings of the experimental environment completely follow the Common Test Conditions (CTC) [30] from JCT-3V, as listed in Table 5.
The test benchmark sequences are tabulated in Table 6. There are two resolutions (1024 × 768 and 1920 × 1088). These test sequences can be classified into two classes: the natural-scene videos (Kendo, Balloons, Newspaper, Poznan_Street, Poznan_Hall2) and the artificial animations (Shark, Undo_Dancer, GT_Fly). The evaluation of the time-saving performance for the color texture coding time (TS_texture) is formulated in Eq. (20):

TS_texture = (Time_HTM-16.0 − Time_proposed) / Time_HTM-16.0 × 100%    (20)

Moreover, the coding efficiency is assessed by the Bjøntegaard Delta Bitrate (BD-BR) [31] and the Bjøntegaard Delta Peak Signal-to-Noise Ratio (BD-PSNR) [32]. In addition, since the main purpose of the multi-view plus depth format is synthesizing virtual views, the coding efficiency of the synthesized view also needs to be evaluated. The quality of the synthesized view is evaluated by averaging the Peak Signal-to-Noise Ratio (PSNR) of the six virtual views; the BD-BR of the synthesized view is then calculated from the total bitrate and the PSNR of the synthesized view.

Table 7 shows the experimental results of the proposed algorithm. V/V, V/T and S/T represent video texture PSNR/video texture bitrate, video texture PSNR/total bitrate, and synthesized-view PSNR/total bitrate, respectively. The coding efficiency is well preserved: the average BD-BR and BD-PSNR are only 0.68% and −0.021 dB, with a 40.75% time saving for texture coding. The proposed approach provides more time saving on the sequences with the larger resolution (Poznan_Street, Poznan_Hall2, Undo_Dancer, GT_Fly). The reason is that the video content of the larger-resolution sequences is usually smoother than that of the smaller-resolution sequences, and CTUs with smooth content tend to stay at Depth 0. Therefore, the CTUs of the larger-resolution sequences are more likely to be early terminated, which contributes to more coding-time reduction.
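Eq. (20) is the usual relative time-saving measure; a one-line sketch (symbol names ours):

```python
def ts_texture(t_anchor: float, t_proposed: float) -> float:
    """Eq. (20): percentage texture coding-time saving of the proposed
    encoder relative to the anchor (HTM-16.0) coding time."""
    return (t_anchor - t_proposed) / t_anchor * 100.0
```

For example, a proposed run taking 59.25% of the anchor time corresponds to a 40.75% saving, matching the average reported in Table 7.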
Studies [19], [20] provide good time-saving performance for 3D-HEVC texture coding. Table 8 compares the BD-BR(V/V) and the texture coding time of our method with references [19], [20]. The average time saving of the proposed algorithm in texture coding reaches 40.75%, which is clearly better than the 33.03% of [19]. Although the texture-coding time saving of [20] is larger than that of the proposed method, regarding the coding efficiency, the average BD-BR(V/V) of the proposed fast texture coding is superior to [20] by 0.51%. Our method provides a much better BD-BR than [20] for the Newspaper, Shark, Undo_Dancer and GT_Fly sequences. The performance comparison of BD-BR(V/T) and TS_texture between the proposed algorithm and [21] is tabulated in Table 9. The proposed algorithm speeds up the texture coding time by 40.75%, which clearly outperforms the 29.12% (dependent view) of [21] by 11.63 percentage points, a significant improvement. Furthermore, the average BD-BR(V/T) of the proposed approach is only 0.74%.
In addition, Figure 14 compares the subjective quality of the synthesized view of the Newspaper sequence for the proposed algorithm and HTM-16.0. The visual quality looks almost the same. Even in the partial enlargements of the complex region and the object boundary, the perceptual quality is almost lossless, while the time saving of the proposed fast algorithm achieves 34.296%. Figure 15 depicts the RD performance for the synthesized view of the Newspaper sequence compared to HTM-16.0. The two RD-curves are very close, which indicates that the coding efficiency is well maintained and similar to that of the original encoder.

IV. CONCLUSION
This paper proposes a fast texture coding algorithm for 3D-HEVC. We thoroughly investigate the spatial, temporal and inter-view relationships by correlation analysis. The essential encoded information from the neighboring CTUs with the highest correlations is utilized by the proposed strategies. The proposed CU depth range prediction estimates the probable CU depth interval for the current CTU and bypasses most of the encoding procedures at the unlikely CU depths. Two kinds of fast inter PU mode decision are conceived, which quickly determine the PU partition types from the complexity and partition direction of the encoded reference CTUs. The search range of the motion estimation is carefully adjusted, assisted by the proposed intra-boundary check. A CU early termination criterion is also proposed to avoid the exhaustive CU split process. The RD-cost of the current coding CTU is predicted by linearly scaling the RD-costs of the reference CTUs. We analyze the RD-cost distributions of the reference frames and consider various situations based on the one-sided Chebyshev inequality. Experimental results show that the proposed method significantly reduces the texture inter-coding time and outperforms previous works with a competitive compromise between computational complexity and RD performance. The subjective comparison verifies that the perceptual quality is well maintained.