Modeling Perceived Quality on 8K VVC Video Under Various Screen Sizes and Viewing Distances

Perceptual video quality considerably affects the quality of experience (QoE) of watching television (TV) broadcasts. Viewing conditions, such as the screen size and viewing distance, impact the perceived quality. We performed subjective evaluation experiments on 8K (7,680×4,320) ultra-high definition (UHD) compressed videos under seven viewing conditions (combinations of 31.5-, 55-, and 85-inch displays and viewing distances of 0.75, 1.5, and 3.0 H, where H is the screen height). Distorted videos compressed with versatile video coding (VVC)/H.266 were used at four encoding resolutions, from 2K (1,920×1,080) to 8K, over a wide bitrate range of 3–80 Mbps. We derived a simple regression equation that predicts the mean opinion score (MOS) using the hierarchical linear model (HLM) and investigated the factors influencing subjective video quality. In this equation, MOS is expressed as a linear combination of terms including an intercept and a bitrate slope that depend on the sequence and encoding resolution, together with screen size and viewing distance terms; it indicates that the smaller the screen, or the farther the viewing distance, the fewer artifacts are perceived, in line with empirical rules. Furthermore, we confirmed that the derived model is accurate, as the Pearson linear and Spearman rank order correlation coefficients between predicted and actual MOS values were more than 0.97.

compressed at 85 Mbps using the high efficiency video coding (HEVC)/H.265 [3]. Meanwhile, the versatile video coding (VVC)/H.266 [4] was standardized in 2020 as the successor to HEVC. Bonnineau et al. [5] conducted subjective assessments on both 8K HEVC- and VVC-encoded videos and reported that VVC achieves an average bitrate reduction of approximately 41% over HEVC for the same visual quality. Thus, VVC can become a dominant technique for delivering high-quality UHD videos over channels with considerably lower bandwidth, such as terrestrial transmission.

The associate editor coordinating the review of this manuscript and approving it for publication was Diego Bellan.

When watching television (TV) broadcasts, the perceived video quality significantly impacts the quality of experience (QoE) [6]. For video quality assessment in practical broadcasting, it is necessary to consider the level of video degradation caused by compression because video coding is inevitably applied, and the target bitrates differ depending on the transmission path (e.g., satellite, terrestrial, the Internet). Several models have been proposed to predict the perceptual quality of compressed videos, designed to correlate well with subjective evaluation results. ITU-T Rec.

We compressed the 8K sequences using VVC encoder software in a broadcasting set. To generate distorted videos, we down-converted the 8K videos to 2K (1,920×1,080), 4K, and 6K (5,760×3,240) spatial resolutions at the same 60 Hz frame rate. For all down- and up-conversion processes, the Lanczos-3 filter [14] in FFmpeg was applied, following previous studies [5], [15]. Next, we loaded the down-converted and original 8K videos into the encoder. The encoding conditions are presented in Table 1.

First, each subject signed a consent form after receiving the experimental overview information.
A verbal instruction based on a sample instruction for ACR described in Appendix II of P.913 was provided. Subjects were encouraged to (1) evaluate the part of the screen in front of them, (2) carefully observe the entire clip before judging, (3) rate the overall quality of the video rather than its content, and (4) frankly answer the question of how they would judge the video quality if they saw the clip on a TV screen.

Subsequently, a training session was conducted, including the highest and lowest quality 8K compressed videos. Subjects evaluated five test items generated from three sequences that differed from those introduced in Section II-A, namely, the SteelPlant, Festival, and Water polo (scrolling text) sequences from the UHD/WCG test sequences A and B.

The performance of the objective metrics was evaluated similarly to previous related studies [5], [24]. The consistency between the metric values and the subjective evaluation results was investigated by logistic curve fitting based on the least-squares method as follows:

items $i$.

We assessed the performance in terms of PLCC, SRCC, and RMSE with respect to the correspondence between $y_i$ and $\hat{y}_i$. Furthermore, we calculated the

VOLUME 10, 2022

where $x$ and $\hat{y}_X$ denote a MOS value and a predicted proportion of scores $X$ or greater, respectively. The actual proportion $y_X$ corresponding to $x$ is plotted as a circle in the graph. The variables $a_X$ and $b_X$ are selected to minimize $\sum_i (y_{Xi} - \hat{y}_{Xi})^2$ over all conditions $i$: $a_X$ determines the distribution width of the scores, whereas $b_X$ indicates the MOS value that results in $\hat{y}_X = 0.5$. Table 8 shows the specific values of the variables $a_X$ and $b_X$ for $X = 3$ to $5$. For comparison, we also listed those of the DSIS case from our previous study [27] in the table. We did not show the values for $X = 2$ because of the lack of MOS values less than 2 in the DSIS case, and more than half of the evaluators overlapped in the two experiments. The values in Table 8 revealed that the distributions of scores 3-5 are similar to one another, which is contrary to our prediction.

$S$ that comes closest to producing $\pi = 95\%$. Investigations over various datasets, mostly evaluated by non-experts, gave $S_{CI} = 0.5$ for 24 subjects and $S_{CI} = 0.7$ for 15 subjects when the 5-level ACR scale was used. As our previous studies indicated that expert results differ from those of non-experts [27], [28], we calculated $\pi$ for each 0.1 step of $S$ using our results obtained from 18 video experts: $\pi = 88\%$ for $S = 0.6$ and $\pi = 98\%$ for $S = 0.7$.
For comparison, we randomly selected 15 subjects from the 18 subjects and calculated the mean $\pi$ of 100 trials: $\pi = 92\%$ for $S = 0.7$ and $\pi = 98\%$ for $S = 0.8$. We confirmed that our experimental results follow the existing $S_{CI}$ rule.
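The relation between a MOS value and the proportion of scores $X$ or greater, described earlier in this section, can be illustrated with a short sketch. The two-parameter logistic form below is an assumption chosen only because it matches the stated roles of $a_X$ (width) and $b_X$ (MOS value at which the predicted proportion is 0.5); the paper's own equation is not reproduced here. The function names and the coarse grid search are hypothetical and stand in for whatever least-squares routine was actually used.

```python
import math

def predicted_proportion(x, a, b):
    """Assumed logistic form: predicted proportion of scores >= X at MOS x.

    a controls the width of the transition, b is the MOS value at which
    the predicted proportion equals 0.5.
    """
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

def sum_squared_error(mos, proportions, a, b):
    """Sum over all conditions of (actual - predicted)^2."""
    return sum((y - predicted_proportion(x, a, b)) ** 2
               for x, y in zip(mos, proportions))

def fit_by_grid_search(mos, proportions, a_grid, b_grid):
    """Return the (a, b) pair on the grid with the smallest squared error."""
    best = min((sum_squared_error(mos, proportions, a, b), a, b)
               for a in a_grid for b in b_grid)
    return best[1], best[2]

# Synthetic data generated from a = 2.0, b = 3.0 (illustrative values only).
mos_values = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
actual = [predicted_proportion(x, 2.0, 3.0) for x in mos_values]
a_grid = [0.5 * k for k in range(1, 11)]        # 0.5, 1.0, ..., 5.0
b_grid = [1.0 + 0.25 * k for k in range(17)]    # 1.0, 1.25, ..., 5.0
a_hat, b_hat = fit_by_grid_search(mos_values, actual, a_grid, b_grid)
```

A grid search keeps the sketch dependency-free; a gradient-based least-squares solver would be the more usual choice for real data.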

As presented in Table 7, the MOS values of 85-3.0H exhibited the best correlations with the four objective quality metrics among the seven viewing conditions, and the results in the viewing conditions with 0.75 H were inferior to the others. This phenomenon could be attributed to the following reasons: (1) we applied the VMAF 2K model, which was trained on subjective evaluation results obtained at a viewing distance of 3 H [23], and (2) the viewing distance of the dataset used to determine the parameters of MS-SSIM corresponded to 32 pixels per degree of visual angle [22], which should be more than 3 H considering that the test patches were 64 × 64 pixels.

the intra-class correlation coefficient (ICC) that measures the similarity within a group for the rest of the considered variables in Table 9. Equation (4) is the definition of ICC, where $\sigma_b^2$ and $\sigma_w^2$ are the between-group and within-group variances, respectively:

$$\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2} \tag{4}$$
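As a rough illustration of the ICC, the sketch below computes it from between-group and within-group variances using the standard ratio of between-group variance to total variance. The variance components are estimated here with a one-way ANOVA estimator for balanced groups, which is an assumption made for the example and not necessarily how the values in Table 10 were obtained. All function names are hypothetical.

```python
def intraclass_correlation(var_between, var_within):
    """ICC = var_between / (var_between + var_within)."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total

def variance_components(groups):
    """One-way ANOVA estimates for balanced groups (equal group sizes).

    Returns (var_between, var_within). The between-group estimate is
    clipped at zero, since the raw estimator can go negative.
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups) or k < 2 or n < 2:
        raise ValueError("need at least two equally sized groups of size >= 2")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    return max((ms_between - ms_within) / n, 0.0), ms_within

# Scores that differ only between groups give ICC = 1;
# scores that differ only within groups give ICC = 0.
icc_high = intraclass_correlation(*variance_components([[1, 1], [3, 3]]))
icc_low = intraclass_correlation(*variance_components([[1, 3], [1, 3]]))
```

A large ICC for a grouping variable means that observations within the same group resemble each other, which is the usual argument for giving that variable group-level (random) effects.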
As shown in Table 10, the ICCs of the encoding conditions (sequence and encoding resolution) were larger than those of the viewing conditions (screen size and viewing distance). Thus, we applied HLM to the encoding conditions but not to the viewing conditions.

We studied several candidate models using R ver. 4.1.3 (March 2022) and selected a simple model with sufficient performance. The performance was measured by goodness of fit in terms of Akaike's information criterion (AIC), Bayesian information criterion (BIC), and log-likelihood (logLik). Appendix A details the candidate models and their performance.
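For readers unfamiliar with these criteria, the following sketch shows how AIC and BIC follow from a model's log-likelihood using their standard definitions. The candidate names and numbers in the example are made-up placeholders, not the values reported in this study.

```python
import math

def aic(log_lik, n_params):
    """Akaike's information criterion: 2k - 2 * logLik."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * logLik."""
    return n_params * math.log(n_obs) - 2 * log_lik

# Hypothetical candidates: (name, logLik, number of parameters).
candidates = [("simple", -120.0, 4), ("complex", -118.5, 9)]
n_obs = 448  # placeholder number of observations

# Smaller AIC/BIC indicates a better trade-off between fit and complexity.
best_by_aic = min(candidates, key=lambda c: aic(c[1], c[2]))[0]
best_by_bic = min(candidates, key=lambda c: bic(c[1], c[2], n_obs))[0]
```

Both criteria penalize extra parameters, with BIC penalizing them more heavily for large samples, so a richer model is preferred only when its log-likelihood gain outweighs the penalty.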
Here, we explain the terms on the right-hand side of (6) from left to right. Table 11 details the specific values of $\beta_0$ and $\beta_1$, which are the intercept and the slope of bitrate $i$, respectively. In this model, $\beta_0$ and $\beta_1$ take distinct values depending on the sequence (seq) and encoding resolution (res) because each was derived from a fixed effect and two types of random effects that vary with seq and with res within seq (denoted as res:seq). We provide specific values for the fixed and random effects in Appendix B, and the

TABLE 11. Specific values of $\beta_0$ and $\beta_1$ in (6).

similarities were calculated without the curve fitting of (1). The evaluation results revealed that the proposed model can predict MOS with sufficient accuracy.

We confirmed that the regression formula in (6)

and MOS values at actual bitrates (in circles). The lines and circles in blue, red, green, and purple correspond to the results of spatial resolutions at 2K, 4K, 6K, and 8K, respectively, and the dotted lines for both 6K and 8K are at 5.0. In the graph, the slope of the 2K regression line in blue is considerably steeper than the others (also see $\beta_1$ of a08 in Table 11). However, the MOS values at the 2K encoding resolution could saturate at the bitrate where the regression line crosses the blue dotted line, approximately 20 Mbps. A knee point would be observed at approximately 20 Mbps if we conducted subjective assessments at higher bitrates for the 2K encoding resolution. In the proposed model, a simple linear combination works well if the bitrate range is limited.

We encoded four 8K sequences in 2K, 4K, 6K, and 8K encoding resolutions at four bitrates for each resolution using VVC and conducted subjective evaluation experiments under seven viewing conditions with distinct screen sizes and viewing

and encoding resolution, screen size, and viewing distance).
We evaluated the derived model's performance regarding the similarities between the predicted and actual MOS values and confirmed its high accuracy, as both PLCC and SRCC are more than 0.97 and RMSE is less than 0.30.
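The three similarity measures can be computed directly from paired lists of predicted and actual MOS values. The sketch below is a dependency-free illustration with hypothetical function names; the RMSE here divides by the number of items, which is an assumption, since some evaluation procedures use a degrees-of-freedom correction instead.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank order correlation coefficient (SRCC)."""
    return pearson(average_ranks(x), average_ranks(y))

def rmse(x, y):
    """Root mean squared error between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Made-up example values, not data from this study.
actual_mos = [1.8, 2.6, 3.4, 4.1, 4.7]
predicted_mos = [2.0, 2.5, 3.5, 4.0, 4.6]
plcc = pearson(actual_mos, predicted_mos)
srcc = spearman(actual_mos, predicted_mos)
error = rmse(actual_mos, predicted_mos)
```

PLCC measures linear agreement, SRCC measures agreement in ordering only, and RMSE measures absolute deviation on the MOS scale, so the three are usually reported together.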

From this study, we reconfirmed that subjects perceive less deterioration on the 31.5-inch 8K display than on the larger 8K displays. However, with the smaller screen, observers may feel less "sense of being there," which is a feature of 8K [34].

In this Appendix, the goodness of fit for the seven candidate models M0-M6 is detailed. First, the models were denoted

Table 12 describes the goodness of fit for each model in terms of AIC, BIC, and logLik. The smaller the AIC or BIC, or the larger the logLik, the higher the goodness of fit.

Among them, we selected M4 in Section V, though the goodness of fit for M5 and M6 was superior to that of M4. The reason was that M4 is simple, with easily understood regression coefficients and adequate accuracy, as displayed in Fig. 7.

TABLE 13. Fixed and random effects of $\beta_0$ in (6).

In this Appendix, we present the fixed and random effects of $\beta_0$ and $\beta_1$ in (6) in Tables 13 and 14, respectively.
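To show how such fixed and random effects are typically combined into a condition-specific coefficient, the sketch below assumes the usual convention for nested random effects: the coefficient for a given sequence and encoding resolution is the fixed effect plus the random effect of the sequence plus the random effect of the resolution within that sequence. This additive composition is an assumption about how the tables are meant to be read. The sequence labels and all numbers are hypothetical placeholders, not the values in Tables 13 and 14.

```python
# Hypothetical placeholder values for illustration only; they are NOT the
# values reported in Tables 13 and 14.
FIXED_EFFECT = 2.0
RANDOM_SEQ = {"seqA": 0.3, "seqB": -0.3}
RANDOM_RES_IN_SEQ = {
    ("seqA", "2K"): -0.5,
    ("seqA", "8K"): 0.5,
    ("seqB", "2K"): -0.25,
    ("seqB", "8K"): 0.25,
}

def combined_coefficient(fixed, random_seq, random_res_in_seq, seq, res):
    """Assumed composition: fixed + random(seq) + random(res within seq)."""
    return fixed + random_seq[seq] + random_res_in_seq[(seq, res)]

# Coefficient for the hypothetical sequence "seqA" encoded at 2K.
beta_seqA_2k = combined_coefficient(
    FIXED_EFFECT, RANDOM_SEQ, RANDOM_RES_IN_SEQ, "seqA", "2K"
)
```

The same function applies to both the intercept and the bitrate slope, each with its own fixed effect and its own pair of random-effect tables.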