SJTU-TMQA: A Quality Assessment Database for Static Mesh with Texture Map

In recent years, static meshes with texture maps have become one of the most prevalent digital representations of 3D shapes in various applications, such as animation, gaming, medical imaging, and cultural heritage. However, little research has been done on the quality assessment of textured meshes, which hinders the development of quality-oriented applications such as mesh compression and enhancement. In this paper, we create a large-scale textured mesh quality assessment database, namely SJTU-TMQA, which includes 21 reference meshes and 945 distorted samples. The meshes are rendered into processed video sequences, and subjective experiments are conducted to obtain mean opinion scores (MOS). The diversity of content and the accuracy of the MOS are shown to validate the heterogeneity and reliability of the database. The impact of various types of distortion on human perception is demonstrated. 13 state-of-the-art objective metrics are evaluated on SJTU-TMQA. The results report a highest correlation of around 0.6, indicating the need for more effective objective metrics. SJTU-TMQA is available at https://ccccby.github.io


INTRODUCTION
With the technological advancement of computer graphics and the development of rendering technologies, 3D static meshes with texture maps are widely applied in many areas due to their effectiveness in representing 3D objects or scenes. A typical 3D textured mesh contains a number of faces with 3D points as vertices; each face is textured with a texture map indicated by texture coordinates. For brevity, we use textured mesh to indicate a static mesh with a texture map. The quality of textured meshes is important for human perception-oriented applications, such as immersive gaming, animation, and digital museums. However, 3D textured meshes have a large volume of data. They require effective compression and transmission algorithms before practical utilization, in which different types of distortion might be introduced and degrade subjectively perceived quality. To optimize textured mesh processing algorithms with respect to quality of experience, mesh quality assessment (MQA) has become a hotspot in recent studies [1][2][3].
MQA includes two aspects: subjective and objective quality assessment. Subjective quality assessment is the most reliable method, which requires inviting subjects to evaluate the perceptual quality of distorted meshes in strictly controlled testing environments. Objective quality assessment aims to design objective metrics that correlate highly with human perceptual quality, replacing subjective experiments in practical and real-time applications to reduce the cost of time, human resources, and money. Therefore, to design effective objective quality metrics and facilitate the application of textured meshes, subjective MQA needs to be fully studied, and a database containing diverse mesh contents, rich distortion types, and reliable mean opinion scores (MOS) is needed.
Over the past years, some researchers have conducted studies on subjective MQA and established several databases. For example, [4,5] focus on colorless meshes and mainly consider single distortion types, such as noise addition and lossy compression. [3] studies meshes with vertex color and releases a database with 480 distorted meshes under compression and simplification distortion. [1,2] investigate textured meshes and propose superimposed distortion types, including mesh simplification/decimation, texture map downsampling, and coordinate quantization.
However, the aforementioned public databases have weaknesses that limit their utilization in current studies. First, [3][4][5] target colorless or vertex-color meshes, while meshes with texture maps are the star of emerging immersive multimedia applications. Second, they are limited by their small scale [4,5] or restricted range of distortion types [1][2][3], making them insufficient for a comprehensive MQA study.
To mitigate the above problems, in this paper we create a large-scale textured mesh database containing rich contents and multiple types of distortion, called SJTU-TMQA. 21 reference meshes are selected from different categories, including human figures, inanimate objects, animals, and plants. Eight types of distortion (six single distortion types and two superimposed distortion types) are injected into each reference mesh at different distortion levels, leading to 945 distorted meshes. The distorted meshes are rendered into processed video sequences (PVS) with a predefined camera path, and 73 viewers aged 18 to 30 are recruited to perform subjective experiments in a lab environment. The diversity of source content, the accuracy of the MOS, and the influence of different types of distortion are demonstrated. 13 state-of-the-art (SOTA) objective metrics are tested on SJTU-TMQA. The best results report correlations of around 0.60, indicating that the proposed SJTU-TMQA is a challenging database and can serve as a catalyst for more effective objective metric studies.

DATABASE CONSTRUCTION
In this section, we detail the construction of SJTU-TMQA, including source mesh selection, distortion generation, PVS generation, training and rating session, and outlier removal.

Source mesh selection and preprocessing
To better study the perceived subjective quality of textured meshes, 21 high-quality source meshes are carefully selected from Sketchfab. These meshes encompass a diverse array of categories, including human figures, inanimate objects, animals, and plants. Fig. 1 illustrates snapshots of the source content. The PyMeshLab library is used to remove redundant and invalid information (e.g., unreferenced vertices and null faces) from the reference meshes, as proposed in [6].

Distortion generation
To simulate various types of distortion resulting from acquisition noise, resampling, compression, and other factors, 8 different distortion types are introduced and detailed as follows:
• Downsampling (DS): DS is applied to the texture map of the textured mesh. The "Image.LANCZOS" low-pass filter offered by the PIL library is used to resize the texture map to 45%, 35%, 25%, 15%, and 5% of the original resolution.
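As a concrete sketch, the texture resizing step can be reproduced as follows. The database itself uses PIL's `Image.LANCZOS` low-pass filter; this minimal stand-in (the helper name `downsample_texture` is ours, for illustration only) simply picks the nearest source pixel, which shows the resolution reduction without the anti-aliasing quality of Lanczos:

```python
import numpy as np

def downsample_texture(tex, scale):
    """Resize an HxWxC texture to `scale` of its resolution by picking the
    nearest source pixel (a crude stand-in for PIL's Lanczos filter)."""
    h, w = tex.shape[:2]
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    return tex[rows][:, cols]
```

For example, `downsample_texture(tex, 0.45)` keeps 45% of the original resolution on each axis, matching the strongest-to-mildest DS levels in the list above.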
• Gaussian noise (GN): GN is applied to the vertex coordinates of the textured mesh. All vertices of the reference meshes are shifted by random Gaussian-distributed geometry noise whose magnitudes are 0.5%, 1.0%, 1.5%, 2.0%, and 2.5% of the minimum dimension of the bounding box.
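The GN injection described above can be sketched directly in NumPy; the helper name and the seeding convention are ours, but the scaling rule (noise magnitude as a fraction of the minimum bounding-box dimension) follows the text:

```python
import numpy as np

def add_gaussian_noise(vertices, severity, rng=None):
    """Shift every vertex by zero-mean Gaussian noise whose standard
    deviation is `severity` (e.g. 0.005 for 0.5%) times the minimum
    dimension of the mesh bounding box."""
    rng = np.random.default_rng(0) if rng is None else rng
    extent = vertices.max(axis=0) - vertices.min(axis=0)  # bounding-box size
    sigma = severity * extent.min()                       # noise magnitude
    return vertices + rng.normal(0.0, sigma, vertices.shape)
```

Scaling by the bounding box makes the five severity levels comparable across meshes of very different physical sizes.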
• Texture map compression (TMC): TMC is applied to the texture map of the textured mesh. We use the "imwrite('jpg', 'Quality')" compression function offered by Matlab, which is based on the libjpeg library, with the following quality parameters: 24, 20, 16, 12, 8, and 4.
• Quantization position (QP): QP is applied to the vertex coordinates of the textured mesh. Draco is used to perform uniform quantization with bit depths set to 7, 8, 9, 10, and 11.
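Draco performs this quantization internally; to make the effect of the bit depth concrete, here is an illustrative uniform position quantizer (not Draco's exact algorithm, and the helper name is ours):

```python
import numpy as np

def quantize_positions(vertices, bits):
    """Uniformly quantize vertex coordinates to `bits` bits per axis over
    the bounding box, then dequantize (illustrative, not Draco itself)."""
    vmin = vertices.min(axis=0)
    extent = (vertices - vmin).max()           # largest axis range
    levels = (1 << bits) - 1                   # number of quantization steps
    q = np.round((vertices - vmin) / extent * levels)  # snap to integer grid
    return q / levels * extent + vmin          # map back to original range
```

Each extra bit halves the worst-case coordinate error (at most `extent / (2 * levels)`), which is why the perceptual impact shrinks quickly from 7 to 11 bits.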
• Simplification without texture (SOT): SOT is applied to the faces of the mesh sample, in which the number of vertices is reduced, consequently leading to larger face sizes. Iterative edge collapse with a quadric error metric (QEM) [7] is used to perform simplification and reduce the number of faces by 10%, 25%, 40%, and 55% compared to the source meshes.
• Simplification with texture (SWT): SWT is also applied to the faces of the mesh sample, but the texture information is injected to guide the QEM simplification. We uniformly reduce the number of faces by 20%, 35%, 50%, 65%, and 80% compared to the source meshes.
In addition to the six single distortion types above, two superimposed distortion types (MQ and GTC, whose results are analyzed in the objective metrics section) are generated by combining the single distortions, yielding 945 distorted meshes in total.

PVS generation
To perform the subjective experiments, each distorted mesh is rendered into a PVS with 1920x1080 resolution at 30 fps, using a predefined camera path: the camera rotates around the z axis with a rotation step of 0.75° per frame, and the rotation radius is equal to the maximum dimension of the mesh bounding box. A complete rotation (360°) around the mesh results in 495 frame images captured by OpenGL. Then, we group the images into PVSs using FFMPEG with libx265, and the constant rate factor is set to 10 to ensure visually lossless encoding [8]. Each PVS has a duration of 16 seconds.

Training and rating session
To ensure the reliability of the collected subjective scores, we use the "bench" content shown in Fig. 1 to generate a training session with the same method as [1]. In the rating session, a double stimulus impairment scale method is used, and the 11-level impairment scale proposed in ITU-T P.910 [9] is used as the voting method. The subjective experiment is conducted on a 27-inch AOC Q2790PQ monitor with a resolution of 2560×1440 in an indoor lab environment under normal lighting conditions. The display resolution is adjusted to 1920×1080 to ensure consistency with the PVSs. To avoid visual fatigue caused by an overly long experiment, we randomly divide the 945 PVSs into 21 subgroups.

Outlier removal
Two consecutive steps are adopted to remove outliers from the raw subjective scores. First, each rating session additionally contains an extremely low-quality PVS and a duplicated PVS, known as "trapping samples". After collecting the subjective scores, we first remove outliers according to the trapping results. Second, ITU-R BT.500 [10] is used to detect and remove outliers again. In total, three outliers are identified and removed from the raw subjective scores.

DATABASE ANALYSIS
In this section, the diversity of content in SJTU-TMQA is first demonstrated, then the subjective experiment results are analyzed to validate the reliability of the MOS.

Diversity of SJTU-TMQA content
Geometry and color complexities are used to validate the diversity of content; they are quantified by spatial perceptual information (SI) [9] and the color metric (CM) [11], respectively. We use the depth and color images obtained by projection onto the six views of the bounding box [12] to calculate the SI and CM of each reference mesh. The maximum SI and CM values are selected to illustrate the scatter plot of geometry complexity vs. color complexity in Fig. 2.

To prove the accuracy of the MOS and analyze the impact of different distortions on subjective perception, MOS vs. distortion parameter plots of four meshes belonging to different types of content (i.e., deadRose, elena, fruitSet, and hawk) are shown in Fig. 3. Except for QP, most of the curves of DS, GN, TMC, SOT, and SWT showcase perfect monotonicity, which proves the accuracy of the MOS. For QP, except for "elena", the other three meshes present limited MOS variations. We think the reasons are: first, the influence of QP can be masked by mesh texture; and second, "elena" belongs to the human figure category, and human observers are particularly sensitive to facial features, which are known as salient areas [8]. Minor distortion in these areas can easily be detected and reflected in MOS variation.
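The SI measure used above follows ITU-T P.910: each image is Sobel-filtered and the spatial standard deviation of the filtered result is taken (the per-mesh value then being the maximum over projected views). A minimal single-image sketch in plain NumPy (helper name ours):

```python
import numpy as np

def spatial_information(gray):
    """SI of one grayscale image per ITU-T P.910: standard deviation of
    the Sobel gradient magnitude, computed on the valid interior."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # Sobel x
    ky = kx.T                                                   # Sobel y

    def conv(img, k):
        # 3x3 correlation on the interior, leaving a 1-pixel border at zero.
        out = np.zeros_like(img)
        h, w = img.shape
        for i in range(3):
            for j in range(3):
                out[1:-1, 1:-1] += k[i, j] * img[i:h - 2 + i, j:w - 2 + j]
        return out

    mag = np.hypot(conv(gray, kx), conv(gray, ky))  # gradient magnitude
    return float(mag[1:-1, 1:-1].std())
```

A flat image has SI 0, while textured or geometrically complex projections produce large gradient variation and hence high SI.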

OBJECTIVE METRICS TESTING
Four types of objective metrics are tested on SJTU-TMQA: image-based, point-based, video-based, and model-based metrics. Image-based metrics, proposed by [13], use 16 projected images of the mesh to quantify quality. Two image-based quality metrics (Geo PSNR and RGB PSNR) are tested. Point-based metrics first use sampling to convert the mesh into a point cloud, and then measure quality using point cloud objective metrics. Four point-based metrics (D1 [14], D2 [15], YUV PSNR, and PCQM PSNR [16]) are tested. Grid sampling with a grid resolution of 1024 is used to sample meshes into point clouds, as proposed in [13]. Video-based metrics use the PVSs viewed in the subjective experiment as input; image/video quality metrics are then applied to predict mesh quality. Three video-based metrics (PSNR, SSIM [17], and VMAF [18]) are calculated. Model-based metrics directly use the raw mesh data to assess quality. Four model-based metrics (Hausdorff distance (HD) [19], GL2 [20], MSDM2 [21], and TPDM [22]) are tested.
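The mesh-to-point-cloud step can be sketched as follows. The paper uses grid sampling at resolution 1024 as in [13]; this illustrative alternative (helper name ours) uses area-weighted random surface sampling, which likewise produces a point set whose density is uniform over the surface:

```python
import numpy as np

def sample_mesh_to_points(vertices, faces, n_points, rng=None):
    """Convert a triangle mesh to a point cloud by area-weighted random
    surface sampling (illustrative; the paper uses grid sampling)."""
    rng = np.random.default_rng(0) if rng is None else rng
    tri = vertices[faces]                                     # (F, 3, 3)
    # Triangle areas from the cross product of two edge vectors.
    areas = 0.5 * np.linalg.norm(
        np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    idx = rng.choice(len(faces), n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates inside each chosen triangle.
    u, v = rng.random((2, n_points))
    flip = u + v > 1
    u[flip], v[flip] = 1 - u[flip], 1 - v[flip]
    t = tri[idx]
    return (t[:, 0] + u[:, None] * (t[:, 1] - t[:, 0])
                    + v[:, None] * (t[:, 2] - t[:, 0]))
```

As noted later in the paper, the choice of sampling method and resolution directly affects point-based metric stability.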

Performance of metrics
To ensure consistency between the objective scores of the various metrics and the MOS, a five-parameter logistic fitting function proposed by the Video Quality Experts Group [23] is used to map the dynamic range of the scores from each objective metric to a common scale. Two indicators commonly used in the quality assessment community are reported to quantify the efficiency of the various metrics: the Pearson linear correlation coefficient (PLCC) for prediction accuracy, and the Spearman rank-order correlation coefficient (SRCC) for prediction monotonicity.
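This evaluation pipeline can be sketched as follows. The logistic form shown is the commonly used VQEG five-parameter variant (an assumption, since the paper does not print the formula), and SRCC is computed here as Pearson correlation on ranks, which matches Spearman's definition when there are no ties:

```python
import numpy as np

def logistic_5(x, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping from raw objective scores to the
    MOS scale (common VQEG formulation; parameters fitted per metric)."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5

def plcc(a, b):
    """Pearson linear correlation coefficient (prediction accuracy)."""
    return float(np.corrcoef(a, b)[0, 1])

def srcc(a, b):
    """Spearman rank-order correlation (prediction monotonicity):
    Pearson correlation of the rank positions, assuming no tied scores."""
    rank = lambda v: np.argsort(np.argsort(v))
    return plcc(rank(np.asarray(a)), rank(np.asarray(b)))
```

Because SRCC only looks at ranks, a metric can reach SRCC = 1 on a perfectly monotonic but nonlinear relationship where PLCC (before logistic fitting) stays below 1, which is exactly why the fitting step precedes the PLCC computation.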

Correlation of metric
The results of the metrics on the entire database are shown in the "All" columns of Table 1. YUV PSNR reports the best performance, followed by RGB PSNR, PCQM PSNR, and VMAF. Fig. 4 shows the scatter plots of two metrics, in which the yellow lines represent the best-fitted curves. We observe that the scatter plot of YUV PSNR is obviously better than that of VMAF, in that its scatter points are closer to the best-fit line. YUV PSNR tends to give low scores to GN samples. VMAF leans towards reporting high scores for QP and TMC. The best overall correlations are below 0.6, far from the expectation that a robust metric should present a correlation of at least 0.80, indicating that SJTU-TMQA is a challenging database. Geo PSNR, D1, D2, and all model-based metrics show extremely low performance. The reason is that they only consider geometric features, while some samples in SJTU-TMQA are lossless with regard to geometry information, such as DS and TMC.

Analysis by type of distortion
For an in-depth analysis, the SRCC results for different types of distortion are illustrated in the "Distortion" columns of Table 1. '-' means that the results of the metric for the samples with that kind of distortion are meaningless. VMAF presents good performance on DS distortion, reporting a correlation of around 0.85. TPDM shows the best performance on GN and SWT, with SRCC = 0.77 and 0.80. VMAF again exhibits the best performance on TMC, but the correlation is only 0.65. D1 and D2 showcase the best results on QP and MQ, with SRCC around 0.75 and 0.80, indicating that D1 and D2 might be good at predicting quantization distortion. PCQM PSNR reports a correlation of around 0.70 on SOT, which is obviously better than most metrics. GTC is the most challenging type of distortion, on which no metric reports a correlation higher than 0.6.

Fig. 4. Scatter plot of objective metrics vs. MOS.

Weakness of SOTA metrics
The highest correlation of the SOTA metrics is only around 0.6, revealing weaknesses that are summarized as follows. For image- and video-based metrics, one weakness is that projection might cause information loss [12] and mask original mesh distortion. Furthermore, their performance is influenced by background information, which causes unstable score magnitudes for different types of content [1]. For point-based metrics, performance is closely related to the mesh sampling method. For the same mesh, different sampling methods and sampling resolutions can generate point clouds with obviously different perceptual quality, and consequently incur unstable metric performance [24]. For model-based metrics, most do not consider color attributes and cannot deal with geometry-lossless distortion. Besides, they have strict requirements on tested meshes, such as identical connectivity or vertex density between reference and distorted meshes [13].

CONCLUSION
In this paper, we create a large-scale textured mesh database called SJTU-TMQA, which consists of 21 static textured meshes with diverse contents, rich distortion types, and accurate MOS. The relationship between MOS and distortion is analyzed, and four types of SOTA objective metrics are evaluated on SJTU-TMQA. The results demonstrate that human perception is influenced by content characteristics and distortion types, and the best metric only achieves a correlation of around 0.60. This database can serve as a benchmark for objective metric testing, providing opportunities for further metric research.

Fig. 1 .
Fig. 1. The 3D graphic source models of our database.