Region-Based Template Matching Prediction for Intra Coding

Copy prediction is a renowned category of prediction techniques in video coding where the current block is predicted by copying the samples from a similar block that is present somewhere in the already decoded stream of samples. Motion-compensated prediction, intra block copy, template matching prediction etc. are examples. While the displacement information of the similar block is transmitted to the decoder in the bit-stream in the first two approaches, it is derived at the decoder in the last one by repeating the same search algorithm which was carried out at the encoder. Region-based template matching is a recently developed prediction algorithm that is an advanced form of standard template matching. In this method, the reference area is partitioned into multiple regions and the region to be searched for the similar block(s) is conveyed to the decoder in the bit-stream. Further, its final prediction signal is a linear combination of already decoded similar blocks from the given region. It was demonstrated in previous publications that region-based template matching is capable of achieving coding efficiency improvements for intra as well as inter-picture coding with considerably less decoder complexity than conventional template matching. In this paper, a theoretical justification for region-based template matching prediction subject to experimental data is presented. Additionally, the test results of the aforementioned method on the latest H.266/Versatile Video Coding (VVC) test model (version VTM-14.0) yield an average Bjøntegaard-Delta (BD) bit-rate savings of $-0.75\%$ using all intra (AI) configuration with 130% encoder run-time and 104% decoder run-time for a particular parameter selection.


I. INTRODUCTION
In a block-based hybrid video coding standard like H.264/Advanced Video Coding (AVC), H.265/High Efficiency Video Coding (HEVC) or the recent H.266/Versatile Video Coding (VVC), the pictures of an input video sequence are partitioned into blocks. Then predictive coding (intra or inter) is applied together with transform coding and quantization, followed by the entropy coding of the prediction data and quantization indexes [1], [2], [3], [4]. While intra-picture prediction is used to take advantage of the spatial redundancy in the pictures for video compression, inter-picture prediction is employed for exploiting the temporal redundancy between the pictures. Typically, the former is achieved by extrapolating the boundary samples of the current block in a predefined manner. The DC, PLANAR, and ANGULAR modes in H.264/AVC and its descendants are instances of this type of prediction. On the other hand, inter-picture prediction traditionally relies on motion-compensated prediction (MCP), which is a popular example of copy prediction methods.
A copy prediction approach assumes that a block similar to the current block (termed the predictor block) is present in the already decoded stream of samples. This block is found by running a search algorithm in the reference area, and its samples are then copied to the current block as the prediction signal. MCP, intra block copy (IBC) and template matching prediction (TMP) are examples of this type of prediction. In MCP, the predictor block is identified at the encoder-side through a block-matching (BM)-based search mechanism where the original block is compared against the potential reference blocks in the reference area. Error metrics like the sum of absolute differences (SAD) or sum of squared differences (SSD) are typically used for measuring the similarity between the blocks, and the reference block that leads to the least distortion is treated as the predictor block. In a more efficient encoder implementation, the bits for transmitting the motion data (i.e., the motion vector (MV) and index to the reference picture) related to the reference block are also taken into account, and the selection that results in the least rate-distortion (RD) cost is chosen as the predictor block. Finally, the motion information of the predictor block is transmitted to the decoder in the bit-stream for the final reconstruction of the current block. The second example, IBC, is an intra-picture prediction method that is analogous to MCP, with the difference that the reference area is the current partially reconstructed picture [3], [4], [5], [6], [7], [8], [9], [10], [11]. In the case of TMP, the motion information of the predictor block is not transmitted explicitly to the decoder as in MCP and IBC. Instead, it is obtained by running an identical template matching (TM) search at the encoder and decoder [12].
In detail, a match for the template of the current block (conventionally, the neighbouring samples to the top, left and top-left corner of a block are regarded as its template) is found in the reference area by minimizing an error metric (such as SAD or SSD), and the block corresponding to the best template match is then considered as the predictor block of the current block. Finally, the samples of the predictor block are copied to the current block as the prediction. Additionally, if multiple template matches are allowed, the average of the samples of the corresponding blocks is typically used as the final prediction signal. The major drawback of TMP is that the entire search process for the predictor block(s) has to be repeated at the decoder-side as well, resulting in a large number of computations there.
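The TM search described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the paper: the picture is a plain list of rows, the template is one sample thick, the candidate positions are supplied by the caller, and the names `template_samples`, `tm_search` and `tmp_predict` are hypothetical.

```python
def template_samples(pic, x, y, w, h, t=1):
    """Samples of the L-shaped template (top, top-left, left) of the w x h block at (x, y)."""
    top = [pic[y - j][i] for j in range(1, t + 1) for i in range(x - t, x + w)]
    left = [pic[j][x - i] for j in range(y, y + h) for i in range(1, t + 1)]
    return top + left


def ssd(a, b):
    """Sum of squared differences between two equally long sample lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def tm_search(pic, cur, w, h, candidates, t=1):
    """Return the candidate position whose template best matches the current template."""
    cur_tpl = template_samples(pic, cur[0], cur[1], w, h, t)
    return min(
        candidates,
        key=lambda p: ssd(cur_tpl, template_samples(pic, p[0], p[1], w, h, t)),
    )


def tmp_predict(pic, cur, w, h, candidates, t=1):
    """Copy the block that belongs to the best template match (single-predictor TMP)."""
    bx, by = tm_search(pic, cur, w, h, candidates, t)
    return [row[bx:bx + w] for row in pic[by:by + h]]


# Toy picture with a horizontal period of 4 samples (illustrative data only).
pic = [[(c % 4) * 10 + r for c in range(8)] for r in range(6)]
best = tm_search(pic, (5, 2), 2, 2, [(2, 2), (1, 2)])
pred = tmp_predict(pic, (5, 2), 2, 2, [(2, 2), (1, 2)])
```

Because only already reconstructed samples enter the search, a decoder can run the same routine and reach the same predictor without any transmitted displacement.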
Region-based template matching prediction (RTMP), which is the subject of this paper, is an advanced form of standard TMP. Previous publications demonstrated that RTMP has higher coding efficiency and lower decoder complexity than its predecessor [36], [37], [38], [39]. In RTMP, the reference area is partitioned into multiple regions and the final prediction signal is a linear combination of the predictor blocks obtained through TM search from the given region. At the encoder, an independent prediction signal from each region is obtained and the best among them is identified using the standard rate-distortion optimization (RDO, see footnote 2) or a similar algorithm. Then, the index of the region that gives the best prediction is conveyed to the decoder in the bit-stream. Later, at the decoder, the TM search routine is carried out for the predictors only in the region corresponding to the parsed index. Thus, the prediction efficiency and decoder computational efficiency of RTMP are improved when compared to TMP.
While previous publications [36], [37], [38], [39] targeted algorithm description and optimization of RTMP for practical applications, this paper focuses on the theoretical background of RTMP. Accordingly, a theoretical justification for the compression efficiency of RTMP in the context of intra coding (it also holds for inter coding) is detailed in the next section. After that, the realization of RTMP for a block-based hybrid video codec is explained in section III. It is then followed by the experimental results of RTMP against H.266/VVC in section IV. Finally, the conclusions from this publication are given in section V. Note that hereafter in this paper, RTMP for intra coding is abbreviated as intra-RTM. Similarly, TMP for intra coding is shortened to intra-TM.

Footnote 2: In a standard encoder, RDO is utilized to identify the best coding parameters among the varied selection of coding tools like coding modes, prediction parameters, quantization parameters etc. for a given coding efficiency. Here, the rate and distortion resulting from a given method are determined. Finally, the coding parameters that minimize the RD cost over a set of choices are considered as the final selection and their side information is transmitted to the decoder [2].

II. ANALYSIS OF REGION-BASED TEMPLATE MATCHING PREDICTION
This section is divided into two parts. Firstly, the prediction efficiency of RTMP is examined in subsection II-A. Later, the RD performance of RTMP is evaluated in subsection II-B. Note that it is assumed throughout this section that TMP, RTMP and IBC (or MCP, in case of inter coding) have the same reference area, and they use the same error metric in their corresponding search algorithms. Additionally, by distortion, we refer to prediction distortion, i.e., the mean squared error (MSE) between the original block and the prediction block generated by RTMP or TMP or any other method.

A. Prediction Efficiency of RTMP
In traditional TMP, the block corresponding to the best template match in the reference reconstructed area is regarded as the predictor block of the current block, and the distortion is measured using this block. Hence, TMP cannot be expected to obtain the best (most similar) predictor in all cases, unlike IBC, which uses the original block to find its predictor block. This can be explained through statistical dependencies: the current block is consistently correlated with its original block, but not necessarily with its template in the same way. Thus, of TMP and IBC, the predictor from IBC is the optimal one. This leads to the further inference that the displacement error associated with TMP is at least as large as that from IBC. Note that the displacement error from a copy prediction method is defined as the difference between the displacement of the predictor obtained through that method and the true displacement. Therefore, if $\Delta_{\mathrm{IBC}}$ and $\Delta_{\mathrm{TMP}}$ are the displacement errors from IBC and TMP respectively, then

$$\Delta_{\mathrm{IBC}} \le \Delta_{\mathrm{TMP}} \tag{1}$$

where each error is measured between the corresponding block vector (BV) and the true BV. Here, $m^{\mathrm{IBC}} = (m^{\mathrm{IBC}}_x, m^{\mathrm{IBC}}_y)$ and $m^{\mathrm{TMP}} = (m^{\mathrm{TMP}}_x, m^{\mathrm{TMP}}_y)$ are the BVs of the current block $B_c$ from IBC and TMP respectively, and $m^t = (m^t_x, m^t_y)$ is the true BV associated with $B_c$. Note that a BV gives the displacement of the predictor block from the current block and it is analogous to the MV of MCP. It is clear from the relationship in Eq. (1) that the prediction efficiency of TMP is not higher than the corresponding efficiency of IBC. Therefore, this subsection describes how the prediction efficiency of TMP can be improved through the transition to RTMP.
In RTMP, the given reference area (or search window) is partitioned into multiple regions such that an independent prediction can be obtained from each region. Further, the region with the best prediction is identified through RDO (or any other similar mechanism) at the encoder-side, and the index of the chosen region is transmitted to the decoder in the bit-stream. Now, at the decoder-side, the TM search process for the predictor is carried out only in the region corresponding to the parsed index. Finally, the prediction signal is generated using the predictor from that region.
An example illustration of intra-TM and intra-RTM (with the number of regions $n_r = 5$) is given in Fig. 1. The predictor block and best template from intra-TM are represented by $B_p$ and $\tau_b$ respectively. Similarly, the predictor block and best template from the region with index $\nu$ of intra-RTM are represented by $B^{\nu}_p$ and $\tau^{\nu}_b$ respectively. The partitioning scheme for intra-RTM is based on a preliminary analysis of the position of intra-TM predictors relative to the current block. It was observed that most of the predictors originate from the immediate neighbourhood of the current block. Accordingly, in intra-RTM, the first region is treated as the most probable region and it is not partitioned. Moreover, a separate flag is assigned to this region in the region index coding (see subsection III-E). The rest of the regions are partitioned diagonally. Besides, detailed tests on varied region shapes have indicated that using only vertical or horizontal regions degrades the performance of intra-RTM, since in most cases such a partitioning scheme does not accommodate the immediate neighbourhood of the current block well within one most probable region [40].
As shown in Fig. 1, only one prediction from the entire reference area is obtained in TMP. No side information needs to be transmitted to the decoder, but the predictor estimation over the entire reference area has to be repeated there. On the other hand, in RTMP, the region that gives the best prediction in the reference area is identified at the encoder and its index is transmitted to the decoder. In general, RTMP operates as a two-phase method at the encoder-side. In the first phase, the predictor estimation process is carried out in each region, one by one, using the TM search routine. In the second phase, the prediction block from each region is generated using the estimated predictor and it is evaluated through the standard RDO or a similar approach. Finally, the region that gives the least distortion (for RDO, the least RD cost) is considered as the best region and its index is transmitted to the decoder for the final reconstruction of the block.
Additionally, the second phase is comparable to a BM-based displacement (or motion) estimation stage as in IBC (or MCP), however, with a fewer number of reference blocks that were obtained from the first phase. Regarding the encoder complexity, the standard RDO approach in the second phase of RTMP would increase its computations considerably due to the inclusion of the quantization and transform coding steps. However, it is not mandatory to use the standard RDO algorithm here. A simplified RD cost estimation (for example, using Hadamard transform) or a distortion-only cost estimation can give a better gain-complexity trade-off [38], [39].
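The two encoder phases can be illustrated with a small sketch. It assumes a distortion-only selection in the second phase (one of the simplifications mentioned above, not full RDO), represents every candidate as a hypothetical `(template, block)` pair of flat sample lists, and uses 0-based region indices. The function names are illustrative only.

```python
def sse(a, b):
    """Sum of squared errors between two equally long sample lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def rtmp_encoder_select(original, cur_template, regions):
    """regions[v] is the candidate list of region v; returns (region index, predictor)."""
    # Phase 1: template matching inside each region (repeatable at the decoder).
    local_best = [
        min(cands, key=lambda c: sse(cur_template, c[0]))[1] for cands in regions
    ]
    # Phase 2: compare the region-wise predictors with the original block (encoder only).
    best = min(range(len(local_best)), key=lambda v: sse(original, local_best[v]))
    return best, local_best[best]


def tmp_select(cur_template, regions):
    """Plain TMP: one template search over the union of all regions."""
    all_cands = [c for cands in regions for c in cands]
    return min(all_cands, key=lambda c: sse(cur_template, c[0]))[1]


# Illustrative data: the best template match sits in region 0,
# but the predictor found in region 1 is closer to the original block.
original = [10, 10]
cur_template = [5, 5]
regions = [[([5, 5], [20, 20])], [([6, 5], [10, 11])]]

region, rtmp_pred = rtmp_encoder_select(original, cur_template, regions)
tmp_pred = tmp_select(cur_template, regions)
```

In this toy case TMP is locked to the predictor of the global best template, while the region-wise test lets the encoder signal region 1 and obtain a much smaller distortion.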
The distortion from RTMP is never greater than that from TMP. First, the global best template match lies in one of the regions, so RTMP can always find it as TMP does, and thus the distortion from RTMP is not greater than the one from TMP. On the other hand, assume for simplicity that RTMP is operated with two regions, region 1 and region 2, so that, for the comparison, TMP operates on the single region formed by their disjoint union. Then it may well happen that the global best template match is located in region 1, while the distortion is smaller for the TM search restricted to region 2 than for the TM search restricted to region 1. In this case, the distortion from RTMP is strictly smaller than that from TMP. In other words, by testing the predictors related to the region-wise local best template matches in RTMP, the process of BM is mimicked and a predictor closer to, or the same as, the global best block match is achieved. Hence, the displacement error from RTMP is not larger than the corresponding error from TMP. The following experimental analysis provides evidence for this. First, the prediction methods IBC, TMP, and RTMP (for different numbers of regions $n_r$) are implemented at the encoder of the VVC test model (version VTM-2.0.1) [41]. Next, the corresponding predictor blocks from each of the approaches are obtained for the same reference area. Then, the relative displacement of the predictor of TMP against IBC (assuming IBC has the true displacement) is calculated as

$$(\Delta\mathrm{TMP}_x, \Delta\mathrm{TMP}_y) = (\mathrm{TMP}_x - \mathrm{IBC}_x,\ \mathrm{TMP}_y - \mathrm{IBC}_y) \tag{4}$$

where the predictors from TMP and IBC are assumed to be located at $(\mathrm{TMP}_x, \mathrm{TMP}_y)$ and $(\mathrm{IBC}_x, \mathrm{IBC}_y)$ respectively. Similarly, if the predictor from RTMP is located at $(\mathrm{RTMP}_x, \mathrm{RTMP}_y)$, then the relative displacement of the predictor of RTMP against IBC is

$$(\Delta\mathrm{RTMP}_x, \Delta\mathrm{RTMP}_y) = (\mathrm{RTMP}_x - \mathrm{IBC}_x,\ \mathrm{RTMP}_y - \mathrm{IBC}_y) \tag{5}$$
The number of occurrences of TMP predictor blocks with respect to the IBC predictor blocks inside a 70 × 70 window is measured using Eq. (4) for one frame (using the all intra (AI) configuration of the common test conditions (CTC) defined for standard dynamic range (SDR) content by the Joint Video Experts Team (JVET) [42]) with the quantization parameter (QP) equal to 22. For simplicity, all experiments in this section are restricted to 4 × 4 blocks. Then, the percentage of occurrences at zero relative displacement ($\Delta\mathrm{TMP}_x = 0$, $\Delta\mathrm{TMP}_y = 0$) is obtained. This indicates how often the predictor from TMP is the same as the predictor from IBC. Next, the number of occurrences of RTMP predictor blocks with respect to the IBC predictor blocks is measured for $n_r \in \{3, 5, 9, 17\}$ using Eq. (5), and the corresponding percentage of occurrences at $\Delta\mathrm{RTMP}_x = 0$, $\Delta\mathrm{RTMP}_y = 0$ is gathered. The related average distortion (the average MSE) in each case is also obtained. Finally, the average distortion related to IBC is measured, which corresponds to $n_r = n_s$, where $n_s$ is the number of samples in the reference area. In the experiment, $n_s = 1140$, since $n_s = (\zeta + M)(\zeta + N) - (M \times N)$ for the search window shown in Fig. 1, with $\zeta = 30$, $M = 4$, $N = 4$. (See section III for more explanations on $\zeta$.) When $n_r = n_s$, every sample in the reference area is a region, and the block corresponding to the sample is the predictor. In that instance, the TM search in the first phase of RTMP does not apply and only the BM search in the second phase is carried out. Thus, RTMP becomes equivalent to IBC, assuming the error metrics in IBC and in the second phase of RTMP are the same. Moreover, if RDO is used in the second phase of RTMP, then RTMP would be able to achieve higher compression efficiency than IBC, since a full-RD test would be applied for each BV. For the same reason, the encoder of RTMP would in such a case be much more complex than the encoder of IBC.
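As a quick check of the reference-area size quoted above, the count for ζ = 30 and M = N = 4 works out as follows (a worked instance of the stated formula only):

```latex
n_s = (\zeta + M)(\zeta + N) - M N
    = (30 + 4)(30 + 4) - 4 \cdot 4
    = 1156 - 16
    = 1140
```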
The above analysis is carried out for various sequences and the corresponding test results are summarized in Table I. Note that when $n_r = 1$, RTMP is identical to TMP. The general observation from the experimental results is that, for any sequence, the percentage of occurrences at $\Delta\mathrm{RTMP}_x = 0$, $\Delta\mathrm{RTMP}_y = 0$ increases as the value of $n_r$ increases, i.e., the number of instances where the RTMP predictor is identical to the IBC predictor increases as the number of regions increases. Additionally, the distortion from RTMP decreases as the number of regions increases. Furthermore, in an ideal case, the RTMP distortion approaches the optimal (copy prediction) distortion when $n_r = n_s$. Hence, altogether, it is concluded that the displacement error and the distortion from RTMP are not larger than those from TMP, i.e.,

$$\Delta_{\mathrm{RTMP}} \le \Delta_{\mathrm{TMP}} \tag{6}$$

and

$$D_{\mathrm{RTMP}} \le D_{\mathrm{TMP}} \tag{7}$$

where $\Delta_{\mathrm{RTMP}}$ and $\Delta_{\mathrm{TMP}}$ are the displacement errors, and $D_{\mathrm{RTMP}}$ and $D_{\mathrm{TMP}}$ are the distortions from the RTMP and TMP methods respectively. In other words, the prediction efficiency of RTMP is at least as high as the corresponding efficiency of TMP. Lastly, Table I also demonstrates the adaptability of RTMP between TMP and IBC by varying its number of regions $n_r$.
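The coincidence statistic used in this analysis can be sketched as follows, assuming the relative displacement is the predictor position of the tested method minus the IBC predictor position, collected once per block. The function names and the position lists are hypothetical and only illustrate the counting.

```python
from collections import Counter


def relative_displacements(method_pos, ibc_pos):
    """Histogram of (dx, dy) = method position - IBC position, one entry per block."""
    return Counter(
        (mx - ix, my - iy) for (mx, my), (ix, iy) in zip(method_pos, ibc_pos)
    )


def coincidence_percentage(method_pos, ibc_pos):
    """Percentage of blocks whose predictor coincides with the IBC predictor."""
    hist = relative_displacements(method_pos, ibc_pos)
    return 100.0 * hist[(0, 0)] / len(ibc_pos)


# Illustrative predictor positions for four blocks (not measured data).
method_pos = [(1, 1), (2, 3), (5, 5), (0, 0)]
ibc_pos = [(1, 1), (2, 2), (5, 5), (1, 0)]
hist = relative_displacements(method_pos, ibc_pos)
share = coincidence_percentage(method_pos, ibc_pos)
```

Running the same routine with the RTMP predictors for each region count gives the kind of per-sequence percentages that the table summarizes.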

B. Rate-Distortion Performance of RTMP
In the previous subsection we showed that the prediction efficiency of RTMP is higher when compared to that of TMP. In this subsection, an expression for a simplified RD cost of RTMP is formulated and evaluated against the simplified RD cost of TMP.
Let $D_m$ be the distortion between the original block and the block predicted through method $m$, and $R_m$ the rate associated with the side information from method $m$. Then the simplified RD cost related to method $m$ is given by

$$J_m = D_m + \lambda R_m \tag{8}$$

Here, $\lambda$ is the Lagrange parameter that determines the trade-off between $D_m$ and $R_m$. It is set as a function of the quantization step-size $Q$ and a constant $c$ [2], [43], [44], [45]. Further, $Q$ is determined by the QP $Q_p$ and the bit-depth $\kappa$ in bits per sample [46]. Now, for TMP, the associated rate is

$$R_{\mathrm{TMP}} = 0 \tag{11}$$

as there is no side information to be conveyed to the decoder. Hence, if $D_{\mathrm{TMP}}$ is the distortion related to TMP, then the simplified RD cost of TMP according to Eq. (8) is

$$J_{\mathrm{TMP}} = D_{\mathrm{TMP}} \tag{12}$$

The above expression indicates that the RD cost of TMP depends only on its distortion and remains constant over varying values of $Q_p$, since the Lagrange parameter $\lambda$ does not affect $J_{\mathrm{TMP}}$. This is particularly advantageous for compression at low rates (i.e., at high $Q_p$ values).
Similarly, for the case of RTMP, suppose the reference area is partitioned into $n_r$ regions and the region index is binary coded. Then the approximated rate for RTMP is

$$R_{\mathrm{RTMP}} \approx \log_2(n_r) \tag{13}$$

Therefore, if $D_{\mathrm{RTMP}}$ is the distortion related to RTMP, then the simplified RD cost of RTMP according to Eq. (8) is

$$J_{\mathrm{RTMP}} = D_{\mathrm{RTMP}} + \lambda \log_2(n_r) \tag{14}$$

Next, an expression for $D_{\mathrm{RTMP}}$ is required to evaluate the RD cost $J_{\mathrm{RTMP}}$ associated with RTMP. From the RTMP distortion values collected for different values of $n_r$ in the experiments in subsection II-A, it is observed that the distortion of RTMP decreases non-linearly as the value of $n_r$ increases. This behaviour can be compared to an exponential decay function, as given below:

$$A(t) = A_0 e^{-\gamma t} + C \tag{15}$$

where $A(t)$ is the final amount at time $t$, $A_0$ is the initial amount, $\gamma$ is the decay constant (i.e., the constant that determines the rate of decay) and $C$ is an offset.
In the case of RTMP, as shown in the experimental results in subsection II-A, the distortion of RTMP decreases as the number of regions $n_r$ in the reference area increases. Thus, the distortion of RTMP for a given reference area is influenced by the number of contained regions, and therefore for RTMP, $t$ in Eq. (15) can be identified with $n_r$. Further, when $n_r = n_s$, RTMP is comparable to IBC, i.e., the lower bound of the distortion from RTMP depends on the distortion from IBC (assuming IBC gives the optimal copy prediction distortion). This implies that the offset $C$ can be given as

$$C = D_{\mathrm{off}} = D_{\mathrm{IBC}} + \omega \tag{16}$$

where $\omega$ is an additional distortion that depends on the content and search algorithm. Ideally, $\omega = 0$. Additionally, when $n_r = 1$, $D_{\mathrm{RTMP}} = D_{\mathrm{TMP}}$ since RTMP is equivalent to TMP in that case. This can be interpreted using Eq. (15) and (16) for $n_r = 1$ as

$$D_{\mathrm{TMP}} = A_0 e^{-\gamma} + D_{\mathrm{off}} \tag{17}$$

Hence,

$$A_0 = (D_{\mathrm{TMP}} - D_{\mathrm{off}})\, e^{\gamma} \tag{18}$$

Finally, substituting the above terms for $A_0$ and $C$ in Eq. (15), the expression for the RTMP distortion can be approximated as

$$D_{\mathrm{RTMP}} \approx (D_{\mathrm{TMP}} - D_{\mathrm{off}})\, e^{-\gamma (n_r - 1)} + D_{\mathrm{off}} \tag{19}$$

Thus, the distortion $D_{\mathrm{RTMP}}$ of RTMP depends on the number of regions $n_r$ and the decay constant $\gamma$ for given values of the distortions $D_{\mathrm{TMP}}$ and $D_{\mathrm{off}}$. The experimental and theoretical distortions of RTMP against the number of regions $n_r \in \{1, 3, 5, 9, 17\}$ for the same sequences used in the experiments in subsection II-A are plotted in Fig. 2. The theoretical values are calculated using the constants (see Table II) that are obtained through curve fitting of the corresponding experimental values. It can be observed that the theoretical values consistently approximate the experimentally obtained values.
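The decay model can be evaluated numerically with a short sketch. It assumes the form D_RTMP = (D_TMP − D_off)·e^(−γ(n_r − 1)) + D_off with D_off = D_IBC + ω, which follows from an exponential decay with offset under the boundary condition D_RTMP = D_TMP at n_r = 1. The function name is hypothetical; the distortion values are the ones the text quotes for the sequence Johnny, and γ = 0.5 is only an illustrative choice.

```python
import math


def rtmp_distortion(n_r, d_tmp, d_ibc, gamma, omega=0.0):
    """Modelled RTMP distortion for n_r regions (assumed exponential-decay form)."""
    d_off = d_ibc + omega  # lower bound of the distortion
    return (d_tmp - d_off) * math.exp(-gamma * (n_r - 1)) + d_off


# Distortion for the region counts used in the experiments.
curve = [rtmp_distortion(n, 3923.0, 1190.0, 0.5) for n in (1, 3, 5, 9, 17)]
at_ibc_limit = rtmp_distortion(1140, 3923.0, 1190.0, 0.5)
```

The curve starts at the TMP distortion for one region, falls monotonically, and is numerically indistinguishable from the IBC distortion once every sample forms its own region.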
The dependency of the distortion $D_{\mathrm{RTMP}}$ on the number of regions $n_r$ for varying values of the decay constant $\gamma$ is illustrated in Fig. 3. Since the experimentally obtained values of $\gamma$ are between 0 and 1, the same range of values is considered for the theoretical analysis. The distortions of TMP and IBC are assumed to be $D_{\mathrm{TMP}} = 3923$ and $D_{\mathrm{IBC}} = 1190$ respectively, and the number of samples in the reference area $n_s$ to be 1140 (from the experiments related to the sequence Johnny in subsection II-A). It can be observed that $D_{\mathrm{RTMP}}$ decreases exponentially from the initial value $D_{\mathrm{TMP}}$ with increasing $n_r$ and eventually equals $D_{\mathrm{IBC}}$, provided $\omega = 0$. In other words, $D_{\mathrm{RTMP}}$ can be varied between $D_{\mathrm{TMP}}$ and $D_{\mathrm{IBC}}$ by simply modifying the number of regions $n_r$ in the reference area. Besides, Fig. 3 shows that $D_{\mathrm{RTMP}}$ decreases with increasing $\gamma$, as expected from the general behaviour of an exponential decay function.
Finally, applying Eq. (19) to Eq. (14), we obtain

$$J_{\mathrm{RTMP}} \approx (D_{\mathrm{TMP}} - D_{\mathrm{off}})\, e^{-\gamma (n_r - 1)} + D_{\mathrm{off}} + \lambda \log_2(n_r) \tag{20}$$

The relationship between the RD cost $J_{\mathrm{RTMP}}$ and the number of regions $n_r$ for a given QP and reference area (i.e., for given values of $\gamma$, $n_s$, $\omega$, $D_{\mathrm{TMP}}$ and $D_{\mathrm{IBC}}$) using Eq. (20) is plotted in Fig. 4. The constant $c$ and bit-depth $\kappa$ for calculating the Lagrange parameter $\lambda$ (see Eq. (9) and (10)) are assumed to be 0.12 and 8 respectively [45]. It is clear that the RD cost of RTMP is smaller than the related cost of TMP for the chosen QP and reference area. Further, as the value of $n_r$ increases, the RD cost of RTMP rapidly decreases to a minimum value (say at $n_r = n_r^{\mathrm{thres}}$), and after that, slowly increases. This increase in the RD cost beyond $n_r^{\mathrm{thres}}$ is explained by the fact that the rate for signalling the region index becomes substantially large in such instances, even though the corresponding distortion is rather small. Consequently, the value of $n_r$ for the RTMP algorithm should be chosen such that $n_r \le n_r^{\mathrm{thres}}$. Besides, as $n_r^{\mathrm{thres}} \ll n_s$, the value of $n_r$ should also be selected such that $n_r \ll n_s$.
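The trade-off between the falling distortion and the growing index rate can be explored with a small sketch. It assumes the exponential-decay distortion model, a rate of log2(n_r) bits for the binary-coded region index, and takes λ as a plain input because its dependence on the QP is not reproduced here. The value λ = 50 and both function names are hypothetical.

```python
import math


def rtmp_rd_cost(n_r, lam, d_tmp, d_ibc, gamma, omega=0.0):
    """Simplified RD cost: modelled distortion plus lambda times log2(n_r) bits."""
    d_off = d_ibc + omega
    distortion = (d_tmp - d_off) * math.exp(-gamma * (n_r - 1)) + d_off
    return distortion + lam * math.log2(n_r)


def best_region_count(lam, d_tmp, d_ibc, gamma, n_s):
    """Region count in 1..n_s that minimizes the simplified RD cost."""
    return min(
        range(1, n_s + 1),
        key=lambda n: rtmp_rd_cost(n, lam, d_tmp, d_ibc, gamma),
    )


lam = 50.0  # illustrative value only
n_best = best_region_count(lam, 3923.0, 1190.0, 0.5, 1140)
cost_tmp = rtmp_rd_cost(1, lam, 3923.0, 1190.0, 0.5)   # n_r = 1 is TMP
cost_best = rtmp_rd_cost(n_best, lam, 3923.0, 1190.0, 0.5)
cost_ibc_like = rtmp_rd_cost(1140, lam, 3923.0, 1190.0, 0.5)
```

With these inputs the minimum lies at a region count far below the number of samples in the reference area, and the cost rises again towards n_r = n_s as the index rate dominates.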
In order to further evaluate the behaviour of the RD cost of TMP and RTMP, their corresponding theoretical values are calculated using Eq. (12) and (20) respectively for $D_{\mathrm{TMP}} = 3923$, $D_{\mathrm{IBC}} = 1190$, $\omega = 0$, $n_s = 1140$, $c = 0.12$ and $\kappa = 8$, and plotted against the quantization parameter $Q_p \in \{1, 2, 3, \ldots, 42\}$ as shown in Fig. 5. For RTMP, the numbers of regions considered are $n_r \in \{3, 5, 9, 17\}$, and the decay constant $\gamma$ of $D_{\mathrm{RTMP}}$ is assumed to be 0.5. As expected, the RD cost of TMP remains constant over the varying values of $Q_p$. On the other hand, the RD cost of RTMP for any value of $n_r$ starts from a smaller value than that of TMP, remains nearly constant up to a particular value of $Q_p$ (until $Q_p = Q_p^{\mathrm{thres}}$), and then increases steadily. It is also observed that the RD cost of RTMP increases more slowly for a smaller value of $n_r$ than for a higher value of $n_r$. Further, the RD cost of RTMP remains smaller than the corresponding cost of TMP for a wide range of $Q_p$ values. More precisely, TMP wins over RTMP only at very high $Q_p$ values, where its lack of any data overhead is particularly beneficial. Furthermore, the RD cost of RTMP is expected to be smaller in practice than in the given model, since entropy coding of the region index is not taken into account in the current analysis.
Apart from the findings mentioned above, it should be pointed out that the decoder complexity of RTMP is significantly smaller compared to the corresponding complexity of TMP. This is because the search routine for the predictor(s) in RTMP is executed only in the given region of the reference area instead of the complete reference area as in TMP. Further, this leads to reduced memory accesses in RTMP with respect to TMP at the decoder-side. Altogether, it can be concluded that higher coding and computational efficiencies than TMP can be achieved with the RTMP approach. In other words, RTMP outperforms TMP.

III. IMPLEMENTATION OF REGION-BASED TEMPLATE MATCHING PREDICTION IN BLOCK-BASED VIDEO CODECS
In this section, the application of the RTMP method for intra coding on a standard block-based video codec is explained.

A. Algorithm
Let $B_c$ be the $M \times N$ block to be predicted and $X$ be the current partially reconstructed picture. Then, the search window in the immediate neighbourhood of $B_c$ is partitioned into $n_r$ regions according to the constraints (a) to (e), as illustrated in Fig. 6. Each region is identified by its index $\nu \in \{1, 2, 3, \ldots, n_r\}$.
(a) The borders of the regions are clearly defined so that the TM search for the predictor block at the decoder is synchronized with that at the encoder. Additionally, the regions are non-overlapping in order to prevent rechecking of the reference templates.
(b) The first region (region 1) covers the area to the top and left of $B_c$. As mentioned before, the immediate neighbourhood of the current block is the most probable region and hence it is not partitioned.
(c) $n_r$ is an odd number so that all the areas other than region 1 are partitioned with diagonal symmetry.
(d) $n_r - 1$ is a power of 2. This is for the efficient signalling of the region index $\nu$ (see subsection III-E).
(e) As deduced from subsection II-B, the value of $n_r$ is very small compared to the number of samples in the search window, i.e., $n_r \ll n_s$.

Assuming that the conditions from (a) to (e) are met, the relationship between the parameter $\zeta$ that determines the size of the search window and the parameter $\delta$ that determines the size of the regions is generalized by a relation involving the floor function $\lfloor \cdot \rfloor$. Now in intra-RTM, given a region index $\nu$, the TM search process is carried out in region $\nu$ for finding the predictor block of $B_c$. In detail, if $\tau_c$ is the template associated with $B_c$ and $\tau^{\nu}_r$ is any possible template in region $\nu$ of $X$, then the SSD error between them is

$$\epsilon^{\nu}_r = \sum_{j=1}^{n_\tau} \left( \tau_c(j) - \tau^{\nu}_r(j) \right)^2$$

where $n_\tau$ is the number of samples in the template that are used for the error calculation and $j$ indexes the individual samples in the template. Then, the reference template $\tau^{\nu}_r$ that gives the minimum error against $\tau_c$ is termed the best template $\tau^{\nu}_b$ from region $\nu$, i.e.,

$$\tau^{\nu}_b = \arg\min_{r} \left( \epsilon^{\nu}_r \right)$$

Finally, the block corresponding to $\tau^{\nu}_b$ is regarded as the predictor $B^{\nu}_p$ of the current block $B_c$ from region $\nu$. Later, the prediction signal of RTMP from the given region is generated using the adaptive weighted averaging (AWA) technique, which is explained in the next subsection.
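The numerical constraints on the region count listed above can be checked with a few lines. This is only a sketch: the function name is hypothetical, and the requirement that the region count be much smaller than the number of samples is replaced by a plain n_r < n_s test, since "much smaller" has no single numeric definition.

```python
def is_valid_region_count(n_r, n_s):
    """Check constraints (c) to (e) for a candidate number of regions n_r."""
    is_odd = n_r % 2 == 1                                          # constraint (c)
    rest = n_r - 1
    rest_is_power_of_two = rest > 0 and (rest & (rest - 1)) == 0   # constraint (d)
    small_enough = n_r < n_s                                       # loose stand-in for (e)
    return is_odd and rest_is_power_of_two and small_enough


valid_counts = [n for n in range(1, 20) if is_valid_region_count(n, 1140)]
```

For a reference area of 1140 samples, the admissible counts below 20 are exactly the values 3, 5, 9 and 17 used in the analysis; the single-region case, which corresponds to plain TMP, is excluded by constraint (d).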

B. Adaptive Weighted Averaged Prediction
The prediction efficiency of a copy prediction method can be improved by using more than one predictor block. In that case, conventionally, the average of the corresponding samples of the predictors is used as the final prediction signal. The averaging process causes a smoothing effect (provided the multiple predictors have a similar distortion) and the distortion of the predicted block against the original block decreases, resulting in an improved outcome. Various publications on TMP have demonstrated that enabling averaging can enhance the coding efficiency of TMP (for examples, see [21], [22], [23], [35]). Nevertheless, our studies (see the Appendix) indicated that using a fixed number of predictors is not favourable in all cases. This observation has led to the development of AWA. In AWA, the final number of predictor blocks and their weights in the prediction signal generation of RTMP are decided based on the TM error between the best predictor and the other predictors, as described in the following.
Let $n_p$ be the number of predictors collected from the given region through TM search. Based on a detailed investigation of the value of $n_p$ for intra-RTM on H.266/VVC, it is found that $n_p = 3$ is a suitable choice. Now, let $P_1$, $P_2$, $P_3$ be the predictor blocks from the given region and $\epsilon_{p1}$, $\epsilon_{p2}$, $\epsilon_{p3}$ be the SSD errors related to them respectively, such that $\epsilon_{p1} \le \epsilon_{p2} \le \epsilon_{p3}$. Then, the final prediction of intra-RTM is formed from these predictors, with the number of predictors used and their weights determined by their TM errors relative to the threshold $\epsilon_{\mathrm{thres}} = 2\epsilon_{p1}$. In this way, the final number of predictor blocks and their weights are not predetermined in RTMP. Instead, they are decided at the end of the TM search algorithm based on the TM error between the best predictor ($P_1$) and the other predictors. Note that separate TM search routines are not initiated for finding the additional predictors when averaging is enabled, i.e., the given region is searched only once and all the $n_p$ predictors are gathered in a single iteration. The major difference for the case of $n_p > 1$ is that a sorting step is carried out at the end of every reference template check, in order to keep track of the best $n_p$ predictors in increasing order of their associated TM error. This may increase the complexity, especially if the value of $n_p$ is large. Additionally, the multiple predictors have to be recorded and maintained during the complete TM search process.
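The single-pass bookkeeping of the best predictors can be sketched as below. The sketch covers only the tracking of the n_p smallest TM errors in increasing order; the threshold-based weighting itself is not modelled. The function name `update_best` and the use of `bisect` are illustrative choices, not taken from the paper.

```python
import bisect


def update_best(best, error, block, n_p=3):
    """Insert (error, block) into the ascending list `best`, keeping at most n_p entries."""
    errors = [e for e, _ in best]
    # bisect_right keeps an earlier candidate ahead of a later one with equal error.
    pos = bisect.bisect_right(errors, error)
    if pos < n_p:
        best.insert(pos, (error, block))
        del best[n_p:]
    return best


# One pass over the reference templates of a region (illustrative errors and labels).
best = []
for error, block in [(9, "a"), (4, "b"), (7, "c"), (1, "d"), (8, "e")]:
    update_best(best, error, block)
```

Resolving ties by search order keeps the list deterministic, which matters because the decoder has to arrive at the same predictors as the encoder.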

C. Encoder Search
At the encoder side, the RDO algorithm is used to identify the region that gives the minimum RD cost and to determine whether RTMP is to be applied to the current block. In detail, the prediction signal from each region is obtained one by one and its RD cost is calculated. Then, the region with the least cost among the $n_r$ regions is taken as the best region. The cost of the best region is then compared with the costs of the other tools. Finally, if RTMP has the least cost among all, it is selected as the prediction mode for the current block $B_c$. In that case, the index ν of the best region is transmitted to the decoder in the bit-stream, along with the RTMP mode flag, for the final reconstruction of $B_c$.
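The region selection described above can be sketched in Python as follows. The function names and the tie-breaking rule (RTMP is chosen only if its cost is strictly lower) are assumptions for illustration. The Lagrangian cost $J = D + \lambda R$ is the usual RD cost form and is not specific to this method.

```python
def rd_cost(distortion, rate_bits, lam):
    """Standard Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def select_rtmp_region(region_costs, other_tool_costs):
    """Pick the cheapest region, then compare it with the other tools.

    region_costs[i] is the RD cost of region i + 1 (regions are numbered from 1).
    Returns (use_rtmp, region_index), with region_index None if RTMP loses.
    """
    best_pos = min(range(len(region_costs)), key=lambda i: region_costs[i])
    best_cost = region_costs[best_pos]
    # Assumption: RTMP must be strictly cheaper than every other tool.
    use_rtmp = best_cost < min(other_tool_costs)
    return use_rtmp, (best_pos + 1 if use_rtmp else None)
```

Each entry in `region_costs` corresponds to one full-RD test, which is why the encoder run-time grows with the number of regions.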

D. Intra-RTM for Chroma
The intra-RTM method can be applied to chroma components in the same manner as luma. From the implementation perspective of H.266/VVC, there are two ways of doing this.
1) As an additional prediction mode: Here, intra-RTM is introduced as a separate chroma prediction mode. Thus, the chroma blocks always have the option of being coded through intra-RTM, like the luma blocks. Consequently, supplementary full-RD tests (as many as the value of $n_r$), as in a luma block, are required at the encoder, resulting in increased encoder complexity. Further, extra bins are necessary for transmitting the region index of the chroma blocks to the decoder. Note that always having the RTM tool available in this option does not mean that the chroma block is always coded with RTMP. It only implies that intra-RTM is always added for full-RD testing and that the final choice is made based on the results, as in a luma block.
2) As an intra derived mode (DM_CHROMA): In this option, intra-RTM is applied to the chroma components only if the current chroma mode is DM_CHROMA and the corresponding luma mode is intra-RTM. Hence, no extra full-RD tests are required. Further, the value of the region index ν for the chroma components is copied from the corresponding luma component. Thus, there are no extra syntax elements for the chroma components.
Since the second option offers a better gain-complexity trade-off, it is recommended for extending intra-RTM to the chroma components, and it is the option adopted for the chroma components in the experiments in section IV. Additionally, the region size parameter for the chroma predictors is derived from the luma region size parameter δ so that it is comparable to the typically used 4:2:0 chroma sub-sampling format.
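The derivation rule of the second option is small enough to be written down directly. The following Python sketch uses hypothetical string labels for the modes and a hypothetical function name. It only illustrates that the chroma block inherits both the decision and the region index from luma, so that nothing extra is signalled.

```python
DM_CHROMA = "DM_CHROMA"   # hypothetical label for the intra derived chroma mode
INTRA_RTM = "INTRA_RTM"   # hypothetical label for the luma intra-RTM mode


def chroma_rtm_decision(chroma_mode, luma_mode, luma_region_index):
    """Return (use_rtm, region_index) for a chroma block under the derived-mode option."""
    if chroma_mode == DM_CHROMA and luma_mode == INTRA_RTM:
        # The region index is copied from the co-located luma block.
        return True, luma_region_index
    return False, None
```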

E. Signalling
In order to identify whether the RTMP method is to be applied to the current block or not, a flag (intra_rtm_flag) is added to every coding unit (CU). When it is true, the index ν of the region is decoded from the bit-stream.
The coding scheme for the region index transmission is shown in Fig. 7. Since region 1 is the most probable region, a separate flag termed the mpr_flag is used for this region. Therefore, in the region index coding of RTMP, the mpr_flag is decoded first. If it is true, the value of ν is set to 1. If it is false, the region index is parsed using fixed-length coding with $p$ bins. All the bins are coded using the context-adaptive binary arithmetic coding (CABAC) engine.
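The binarization described above can be sketched in Python as follows. The sketch works on raw bins and leaves out the CABAC engine entirely. The mapping of the fixed-length value to the region index (ν equal to the value plus 2, most significant bin first) and the function names are assumptions, since the exact scheme is given only in Fig. 7.

```python
def encode_region_index(nu, p):
    """Binarize region index nu (1-based) into an mpr_flag plus p fixed-length bins."""
    if not 1 <= nu <= (1 << p) + 1:
        raise ValueError("region index out of range for p bins")
    if nu == 1:
        return [1]  # mpr_flag = 1: most probable region, no further bins
    value = nu - 2  # assumed mapping of the remaining regions to 0 .. 2**p - 1
    return [0] + [(value >> (p - 1 - i)) & 1 for i in range(p)]


def decode_region_index(bins, p):
    """Parse the mpr_flag first, then p fixed-length bins if the flag is false."""
    it = iter(bins)
    if next(it) == 1:
        return 1
    value = 0
    for _ in range(p):
        value = (value << 1) | next(it)
    return value + 2
```

Under this assumed mapping, $p$ bins address $2^p$ regions besides region 1, so nine regions would need $p = 3$.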

IV. EXPERIMENTAL RESULTS
The experimental results of the intra-RTM technique on top of the VVC Test Model (version VTM-14.0) are presented and discussed in this section. The JVET CTC defined for SDR content are used for evaluating the test results [3], [42]. However, the tests are restricted to the all intra (AI) and random access (RA) configurations. Further, as recommended by the JVET CTC, the coding efficiency of a test is measured in Bjøntegaard-Delta (BD) bit-rate savings, which represent the average bit-rate savings for the same video quality (here PSNR), expressed as a percentage of the reference bit-rate. Hence, a negative BD-rate value implies a coding gain from the proposed method, whereas a positive value signifies a coding loss [47]. Additionally, as per the JVET CTC, the tests are run for the QP values $Q_p \in \{22, 27, 32, 37\}$ and the resulting BD-rate is considered as the final coding gain value of a test.

Table caption: EncT and DecT are the encoder and decoder run-time, respectively. Avg. is the average BD-rate savings of classes A1, A2, B, C and E. Test 1: coding tools configured as in the JVET CTC. Test 2: Test 1 with ISP, MIP and MRL disabled. Test 3: Test 1 with IBC enabled.

Three test results of intra-RTM are examined in this section. In the first test, all the VVC coding tools are configured as recommended by the JVET CTC. This helps to understand the performance of intra-RTM when all the VVC coding tools are enabled. However, the complete potential of intra-RTM would not be visible in this test due to the presence of other intra tools. Furthermore, not all the coding tools in the standard are typically enabled in a practical application. Hence, in the second test, only the basic intra tools are enabled (i.e., the intra tools ISP, MIP and MRL are disabled). Lastly, in the third test, IBC is enabled in addition to the CTC. This test is intended to understand the performance of intra-RTM against IBC, which is an established copy prediction method used for intra-picture prediction in screen content coding (SCC). The three tests are summarized below. Note that both the anchor and the test are configured as given below; the only difference is that intra-RTM is enabled in the test.
• Test 1 - Coding tools configured as in the JVET CTC.
• Test 2 - Coding tools configured as in the JVET CTC, however, ISP, MIP and MRL are disabled.
• Test 3 - Coding tools configured as in the JVET CTC, however, IBC is enabled.
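For reference, the BD-rate metric used in all three tests can be sketched in Python as follows. This is the classic cubic-polynomial variant of the Bjøntegaard computation, written under the assumption of exactly four rate points per curve (one per QP). It is not necessarily identical to the tooling used to produce the reported numbers, and the function names are hypothetical. With four points the cubic fit is an interpolation, and Simpson's rule integrates a cubic exactly, so no numerical library is needed.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _integral(xs, ys, lo, hi):
    """Simpson's rule on [lo, hi]; exact for the cubic through four points."""
    mid = (lo + hi) / 2.0
    return (hi - lo) / 6.0 * (
        _lagrange(xs, ys, lo) + 4.0 * _lagrange(xs, ys, mid) + _lagrange(xs, ys, hi)
    )


def bd_rate(anchor, test):
    """BD-rate in percent from four (bitrate, psnr) points per curve; negative means a gain."""
    psnr_a = [q for _, q in anchor]
    psnr_t = [q for _, q in test]
    log_a = [math.log10(r) for r, _ in anchor]
    log_t = [math.log10(r) for r, _ in test]
    # Integrate log-rate over the overlapping PSNR interval of both curves.
    lo = max(min(psnr_a), min(psnr_t))
    hi = min(max(psnr_a), max(psnr_t))
    avg_diff = (_integral(psnr_t, log_t, lo, hi) - _integral(psnr_a, log_a, lo, hi)) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0
```

As a sanity check, a test curve that needs 10% fewer bits than the anchor at every PSNR value yields a BD-rate of about −10%.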
The corresponding results of the aforementioned tests are presented in Table III. The values of the search window size parameter ζ, the region size parameter δ and the number of regions $n_r$ related to intra-RTM are 60, 12 and 9, respectively (for the SCC sequences in classes F and TGM, they are 60, 30 and 3, respectively). Further, the template width η in samples is 1. Note that the values of ζ, δ, $n_r$, η and $n_p$ are chosen experimentally such that they offer a reasonable trade-off between coding gain and computational complexity (refer to the Appendix). Additionally, for the purpose of comparison, the corresponding test results of intra-TM (ζ = 60, η = 1) are included in Table III. The AWA option is also applied to intra-TM, although normal or weighted averaging is typically used in the standard TMP approach. Besides, similar to intra-RTM, intra-TM is enabled for the chroma components through the intra derived mode and the search window size parameter is modified as in Eq. (25).
An average BD-rate saving of −0.75% is obtained by intra-RTM in Test 1 with only 104% decoder run-time for the AI configuration. For the RA configuration, it is −0.23% with 99% decoder run-time. This test demonstrates the coding gain from intra-RTM when all the coding tools of H.266/VVC are enabled as per the JVET CTC. The noticeable coding gains in all the classes indicate that intra-RTM is applicable to a wide variety of sequences. In Test 2, the average BD-rate saving is −1.07% with 105% decoder run-time. This test shows the potential of intra-RTM in a comparatively less competitive environment. In Test 3, the average BD-rate saving from intra-RTM is −0.43% with 103% decoder run-time. This test indicates that intra-RTM can achieve a justifiable coding gain even in the presence of IBC. The comparatively lower overhead than IBC has benefited intra-RTM in this test. Note that, according to the JVET CTC, IBC is always enabled in classes F and TGM. That is why these classes have identical results in Tests 1 and 3.
Comparing the test results of intra-RTM and intra-TM, it is clear that intra-RTM has consistently higher BD-rate savings than intra-TM. This confirms the findings from section II that RTMP has higher coding efficiency than TMP. Besides, intra-RTM has a negligible decoder run-time increase, being almost identical to the reference. This is due to the fact that in RTMP the search routine for the predictors of the current block is carried out only in the given region of the search window, whereas it is carried out in the entire search window in TMP. The experimental results further indicate that the bit-rate savings of intra-TM and intra-RTM are higher in the AI configuration than in the RA configuration. This is because intra-TM and intra-RTM are intra tools and are used predominantly in intra-pictures.
It is also noticed from the experimental results in Table III that the encoder run-time of intra-RTM is higher than that of intra-TM. This is because the standard RDO mechanism is used to identify the best region in these tests. However, optimizations such as those proposed in [38] and [39] can be adopted to considerably reduce the encoder complexity of RTMP. For example, when the optimizations were incorporated into a similar implementation of the given intra-RTM version on VTM-2.0.1, the encoder run-time reduced from 203% to 133% while the BD-rate savings dropped from −1.14% to −1.08%.

V. CONCLUSION
The RTMP technique is an advanced form of traditional TMP. In RTMP, the reference area is partitioned into multiple regions and the final prediction signal is obtained through a linear combination of the predictor blocks from the given region. The index of the chosen region is conveyed to the decoder in the bit-stream.

APPENDIX
Test results on various RTMP-parameter analyses are summarized here. All tests are with the number of regions $n_r = 5$.
1) Predictor averaging: The different predictor options considered for this analysis are given in Table V, where $\epsilon_{thres} = 2\epsilon_{p1}$ (refer to subsection III-B for more details on AWA and $\epsilon_{thres}$). The value of each predictor weight is a power of 2 and the sum of all the weights is also a power of 2, such that multiplications and divisions can be efficiently implemented as bit shifts.
The corresponding test results are in Table IV.