IMD-Net: A Deep Learning-Based Icosahedral Mesh Denoising Network

In this work, we propose a novel denoising technique, the icosahedral mesh denoising network (IMD-Net), for closed genus-0 meshes. IMD-Net is a deep neural network that produces a denoised mesh in a single end-to-end pass, preserving and emphasizing natural object features in the process. A preprocessing step, exploiting the homeomorphism between a genus-0 mesh and the sphere, remeshes an irregular mesh using the regular structure of a frequency-subdivided icosahedron. Enabled by gauge equivariant convolutional layers arranged in a residual U-net, IMD-Net denoises the remeshing invariantly to global mesh transformations as well as to local feature constellations and orientations, doing so with the computational complexity of a traditional conv2D kernel. The network is equipped with a carefully crafted loss function that leverages differences between the positional, normal and curvature fields of the target and noisy meshes in a numerically stable fashion. In a first, two large shape datasets commonly used in related fields, ABC and ShapeNetCore, are introduced to evaluate mesh denoising. IMD-Net's competitiveness with existing state-of-the-art techniques is established using both metric evaluations and visual inspection of denoised models.


I. INTRODUCTION
The demand for fidelity and quality of 3D surface mesh models is steadily on the rise. Mesh models are becoming omnipresent in a variety of application domains, from archaeological preservation and reconstruction, through retail and reverse engineering, to various biomedical fields like neurology and orthodontics. 3D surface meshes can either be crafted by hand, which allows fine-grained control of their quality at the cost of significant time and resources, or they can be acquired using 3D scanning technologies, whose intrinsic physical imperfections inevitably introduce noise to the reconstruction process. The noise originates in the scanning process and indirectly affects the recovery of individual 3D points on the model surface, which are then connected to generate a 3D mesh. The mesh denoising process is inherently constrained by the quality of the preceding mesh generation stage, but it can take advantage of the geometric, topological and connectivity information of the mesh structure to produce high-fidelity mesh models.
A mesh vertex (or a face normal) can be denoised in a local fashion, using primarily information from a local neighborhood around a vertex (or face). As research progressed, it was found that increasing the field of view by including larger neighborhoods around a vertex or face of interest can have a beneficial impact on denoising results. However, there seems to be little work aiming to include information from beyond 2- or 3-ring neighborhoods in the denoising process. This is surprising, as the consideration of wider neighborhoods holds promise for denoising larger features more accurately. Also, an integration of increasingly global information about object shape, symmetries and surface structure in the denoising process could provide vital information to avoid introducing self-intersections and other anomalies that harm object fidelity and visual appeal.
Existing mesh denoising approaches generally have one thing in common: they come with a set of model-specific parameters, which require specific knowledge and careful tuning by a user [e.g., 1,2,3]. Often the authors supply well-working parameter defaults, but the fact remains that user interaction might be required for each individual model. This is a remnant from the recent past, when denoising 3D mesh models was still a rare task performed by specialists. But in today's digital landscape, where every mobile phone can be a 3D scanner and large datasets of 3D models need denoising, requiring user interaction is a deployment inhibitor and ought to be avoided. Methods based on machine learning could provide parameter-free denoising at inference time. With an increasing number of noisy mesh models readily available, it has also become possible to incorporate information from numerous mesh models in the denoising process. Yet, the application of machine learning methods to mesh denoising has been lagging behind other fields such as image and natural language processing. This is primarily due to the irregular structure of mesh data. Approaches have been devised to overcome this irregular mesh structure, e.g., by voxelizing meshes into 3D grids and applying 3D convolutions [4]. This, however, is undesirable: surface meshes represent 2D-manifolds in 3D space, hence an approach that operates in 3D space and not exclusively on the manifold lacks efficiency.
Summarizing the above, a parameter-free, deep learning-based method that consumes a noisy input mesh and produces a denoised output mesh in one pass is sorely missing from the field. Ideally, such a method would integrate suitable amounts of local and global mesh information in the denoising process. Optimally, the method would also feature a computational complexity akin to that of neural networks used on other 2D-manifolds like images.
The contributions of this work include (i) exploiting the homeomorphism between genus-0 meshes and spheres to regularize the surface on an icosahedral grid, (ii) the icosahedral mesh denoising network (IMD-Net), a novel denoising technique based on equivariant conv2D layers allowing efficient and high-quality denoising of closed genus-0 meshes, (iii) introducing two large CAD model datasets as benchmarks for mesh denoising, the ABC [5] and the ShapeNetCore [6] dataset, and (iv) experiments that demonstrate the competitiveness of IMD-Net with state-of-the-art methods at various noise levels. Figure 1 shows denoising of three different meshes using the proposed IMD-Net, illustrating how well our network preserves sharp features and fine details.

II. RELATED WORK
Mesh denoising has a history of more than three decades, and numerous approaches have been put forward. Early attempts transfer ideas from related fields, most notably iterative Laplacian smoothing from mesh smoothing [10,11] and various low-, band- and high-pass filters from signal processing [12,13,14,15]. These isotropic methods lack the means to preserve features and were quickly superseded by anisotropic techniques such as diffusion process-based methods [16,17,18] and the bilateral filter [19,20]. Another successful line of anisotropic methods splits mesh denoising into face normal filtering and vertex position updating by numerically integrating the denoised face normal field.
Considerable work has gone into finding suitable filters, yielding Laplacian smoothing [21], mean, median and alpha-trimming filters [22,23,24] as well as fuzzy median [25,26] and random walk filters [27]. These filters, however, do not consider the regularity of a mesh. Zheng et al. [1] devised a joint bilateral normal filter (BNF) which uses a Gaussian-weighted average of distances and orientation differences to surrounding faces in order to denoise face normals. Zhang et al. [3] used patches around each face to compute a guidance normal and incorporated deviations from the guidance normal in the joint bilateral normal filter (GNF). This produces impressive results if the patch is chosen well, triggering suggestions to select patches that adapt to corners and edges [28] or minimize the angular difference within each patch [29]. Throughout normal filtering methods, the patches tend to be small and strictly local. This prevents the denoising process from integrating potentially beneficial information from non-local features and the global shape. Further, most methods require selecting parameters for individual models, inhibiting their application to larger datasets.
Several methods reframe mesh denoising as a sparse optimization problem [30,31,32,33]. He and Schaefer [2] proposed an $L_0$-minimization combined with an edge-based feature-preserving shape operator to yield piecewise flat surface regions that intersect at the preserved features. In [34,35] the total variation of face normals is minimized to smooth the surface while minimizing distortion of volume and shape. Others devise two-stage algorithms that first compute a base mesh using a smoothing scheme [36] or a regularizer [37,38] and then recover features from the residual between the noisy and the base mesh. The success of optimization-based methods often depends on prior assumptions about the noise distribution, and they require adjusting (optimization) parameters for individual models, limiting their application to larger datasets.
FIGURE 2: (b) A conformal parametrization produced with [7]: extrusive features like the wings and the tail are all mapped to one dense cluster. (c) An authalic parametrization produced with [8]: clusters are less dense, but thin, long-stretched triangles introduce large distortions. (d) A quasi-isometric parametrization produced with [9]: few and sparse clusters with no thin, long-stretched triangles.

As the focus on feature preservation increased, multiple schemes were proposed that classify vertices into features and non-features using tensor voting in combination with k-means clustering [39], eigenanalysis [40] or feature descriptors [41,42] before applying a filtering technique. Arvanitis et al. [43] proposed a coarse-to-fine mesh denoising approach that uses graph spectral processing to preserve feature normals in the denoising process. Other techniques add feature detection to existing normal filtering methods: Yadav et al. [44] replaced the Gaussian similarity function of the bilateral normal filter [1] with Tukey's bi-weight function, which reduces the diffusion of sharp features; Zhao et al. [45] deployed a graph-based feature detection to select optimal patches for computing the guidance normals of Zhang et al. [3]; and in [46] a base mesh and a feature-detecting saliency measure are employed to the same end. The results of feature-detection based methods are sensitive to their ability to correctly classify features, which sometimes leads to misclassified features or to the introduction of artificial ones.
A recent development in mesh denoising is the use of machine learning. Denoising autoencoders have been used in various applications of 1D noise filtering [47] as well as 2D image filtering. Wang et al. [48] transfer the bilateral filter from image filtering and apply it repeatedly to each face to compose geometric descriptors which are subsequently clustered. In their cascaded normal regression (CNR), a regression function is fitted to each cluster using a radial basis function network. In [49] this method is complemented by a reverse descriptor that aims to recover geometric features which were previously lost in the regression. Nousias et al. [50] pursued a geometric deep learning approach that employs a conditional variational autoencoder consisting of a Gaussian encoder and a Bernoulli decoder, followed by one step of bilateral filtering to remove small artifacts. In a work extending GNF, Zhao et al. [4] proposed NormalNet, a cascaded deep 3D-CNN that processes voxelized patches to estimate a guidance normal. The reported results look promising but are dimmed by the high computational complexity of a 3D-CNN. Using a graph convolutional network, [51] proposed an elegant and well-performing two-stage approach for mesh denoising. The deep normal filtering network (DNF-Net) [52] denoises a mesh split into patches by extracting local geometric features. It employs a multi-scale feature embedding unit that extracts features representing local geometric context and two residual learning units that aim to progressively attenuate noise. DNF-Net reports state-of-the-art denoising performance, but its patch creation and denoising are time-consuming, which limits the number of patches (and meshes) that can reasonably be used for training, restricting the generalization potential of the network.

METHOD COMPONENTS
The proposed IMD-Net is designed to efficiently and accurately denoise closed genus-0 meshes. It exploits the genus-0 property, which guarantees the existence of a bijective mapping between a mesh surface and the unit sphere. A spherical parametrization algorithm is employed to construct such a mapping. Parametrization algorithms can be distinguished by how well they preserve, or minimize the distortion of, intrinsic geometric quantities: face angles in conformal mappings (Fig. 2b) [53,7], face areas in authalic mappings (Fig. 2c) [54,8], or both in isometric mappings. Unfortunately, isometric mappings generally do not exist between arbitrary genus-0 surfaces and spheres. In the context of this work, an algorithm is desired which is authalic but also does not create mapped triangles that are unnaturally stretched over large parts of the sphere surface. Recently, Hu et al. [9] developed a quasi-isometric parametrization method based on progressive meshes which exhibits such characteristics (Fig. 2d). This advanced hierarchical spherical parametrization (AHSP) [9] has also recently been used as a remeshing tool in generative learning of icosahedral meshes [55]. The method has a low computational complexity and achieves state-of-the-art performance and is therefore our choice of parametrization.

FIGURE 3: The high-level pipeline of IMD-Net: A noisy genus-0 mesh is spherically parametrized using the AHSP [9]. Employing the parametrization, the noisy and ground truth (GT) meshes are remeshed as regular meshes. The noisy remeshing is denoised with IMD-Net. For network training, the ground truth remeshing is used in the loss function.
IMD-Net denoises data mapped onto the unit sphere. In the context of mesh denoising, the most desirable properties of deep neural networks are translation, scale and rotation equivariance. In [56] and [57], networks were suggested that are translation and scale equivariant on spherical data. Spherical CNNs [58] define a spherical cross-correlation that is rotation-equivariant. The correlation is designed to satisfy the generalized Fourier theorem and can be efficiently computed using a generalized Fast Fourier Transform (FFT) algorithm. Signal and filters, both defined on a spherical-polar grid, are Fourier transformed, cross-correlated, inherently rotated to achieve equivariance, and finally inverse transformed. The authors report improved results on spherical images and for 3D-object detection. However, the number and angle of filter rotations is coupled to the grid size, and the spherical-polar grid causes oversampling near the sphere poles. More importantly, a spherical CNN layer has a minimal complexity of $O(B^3)$, with B being the bandwidth of the grid, making the overall network computationally expensive. Cohen et al. [59] presented IcosahedralCNN, a network operating in the spherical domain by discretizing it as a subdivided icosahedron. The discretized surface is planarized into five padded, rectangular maps arranged in an atlas. Hexagonal convolution filters are defined, and kernel expansion in combination with weight sharing is applied to make the convolution gauge (and hence translation, scale and rotation) equivariant. Two layers are distinguished: a layer which takes non-gauge-equivariant input features (singular) and outputs gauge-equivariant features (regular) is referred to as an S2R-layer; and a layer which has regular input and output features is referred to as an R2R-layer. The low computational complexity as well as the guarantee of gauge equivariance convinced us to use the S2R and R2R building blocks from IcosahedralCNN to construct IMD-Net.

III. METHOD
The promise of this work is a novel deep learning-based denoising technique that efficiently processes and denoises closed genus-0 meshes. This task is made particularly challenging by the inherent irregularity of the mesh data structure, which is overcome in two stages: by spherically parametrizing the meshes and by remeshing them as regular, subdivided icosahedrons. The preprocessed meshes are fed to the denoising network. In the training phase, the network also receives a preprocessed ground truth mesh, which is used in the loss function. The high-level pipeline of the IMD-Net framework is presented in Fig. 3. The details of each aspect of the framework are explained in this section.

A. REMESHING
A closed genus-0 mesh B (e.g., Fig. 4a) possesses the special property that its irregular mesh data can be mapped onto the regular domain of a unit sphere. This spherical parametrization M is used to create a remeshing of B as a subdivided icosahedron $I_r = \{v, f\}$, where the number of vertices $N_v$ depends on a selectable subdivision level r with $N_v = 5 \times 2^{2r+1} + 2$. To create a remeshing, first $I_r$ is superimposed on M and the vertices of $I_r$ are radially projected onto the sphere (Fig. 4b). M's surface is divided into non-overlapping mapping cells of near-equal size, one around each vertex of the superimposed icosahedron (Fig. 4c). The mapping cells are computed using the faces of $I_r$'s dual, i.e., the subdivided dodecahedron. Subsequently, the mapping cells are used to sample corresponding surface positions on B (Fig. 4d); a cell may cover more than one vertex of the original parametrized mesh. In this case, relevant information might be lost, and therefore multi-vertex cells ought to be avoided, e.g., by using a sufficiently high subdivision level. The output of the remeshing stage is the icosahedron $I_r$ of which each vertex was repositioned to a sample location on B (Fig. 4e). As shown in Fig. 3, a separate remeshing is created for the noisy mesh $\hat{B}$ and the ground truth mesh B, but both depend on the spherical parametrization of the noisy mesh, $\hat{M}$. Using a shared parametrization ensures that the vertices and faces in both remeshings represent exactly corresponding surface positions and patches in the original meshes.
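As a quick sanity check on the remeshing resolution, the minimal sketch below (ours, not part of the original implementation) evaluates $N_v = 5 \times 2^{2r+1} + 2$ and derives the face and edge counts from Euler's formula for a closed genus-0 triangle mesh:

```python
# Vertex/edge/face counts of a subdivided icosahedron at level r.
# N_v follows the formula in the text; N_f and N_e follow from
# V - E + F = 2 and 2E = 3F for a closed triangle mesh.
def icosahedron_counts(r: int):
    n_v = 5 * 2 ** (2 * r + 1) + 2
    n_f = 2 * (n_v - 2)        # F = 2(V - 2)
    n_e = 3 * n_f // 2         # every face contributes 3 half-edges
    return n_v, n_e, n_f

for r in (5, 6, 7):
    print(r, icosahedron_counts(r))
# 5 (10242, 30720, 20480)
# 6 (40962, 122880, 81920)   <- the I_6 resolution used in the experiments
# 7 (163842, 491520, 327680)
```

Each increment of r roughly quadruples all three counts, which is the growth rate referred to in the subdivision-level ablation (Section V-A).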

B. ICOSAHEDRAL MESH DENOISING NETWORK
IMD-Net is assembled from the gauge equivariant S2R and R2R layers of IcosahedralCNN. The network consumes a remeshed noisy mesh $\hat{I}_r = \{\hat{v}, f\}$ and outputs a denoised (or estimated) mesh $\tilde{I}_r = \{\tilde{v}, f\}$. Specifically, the vertex positions of a subdivided icosahedron arranged as planarized atlases are the input ($\hat{v}$) and output ($\tilde{v}$) of the network. A modified U-net, referred to as compressed U-net (cU-net) and shown in Fig. 5, is chosen as the architecture for IMD-Net. The architecture is based on the original U-net [60], but the concatenation operations in the decoder, which concatenate feature maps coming from the upsampling and skip-connection paths, are removed. Instead, feature maps from the encoder are directly added to the respective decoder level via skip connections. The modification follows the mathematically sound and empirically successful concept of residual neural networks [61]. In the context of mesh denoising, the number of learnable parameters is reduced by about 33% while the denoising performance is maintained.
The three-channel input is transformed to regular equivariant feature maps by an S2R-block, composed of a 3×3 S2R-layer, a batch normalization layer [62] and a ReLU. The batch normalization averages over groups of six feature maps to preserve equivariance [59]. The ReLU function operates pointwise and is therefore equivariant. In the encoder, the S2R-block is followed by multiple levels of R2R-blocks and pooling layers. In the decoder, the flow is reversed with corresponding R2R-blocks and unpooling layers. A standard 1×1 conv2D (bottleneck) layer produces the three channels of the displacement vectors, which are added pointwise to the input vertices to yield the denoised vertices.
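The following sketch illustrates the cU-net wiring with additive skip connections. Plain `Conv2d` blocks stand in for the gauge equivariant S2R/R2R layers, which additionally require the atlas padding, hexagonal kernel expansion and grouped batch normalization of [59]; the channel widths (28, 44, 71, 114) follow the implementation details in Section C, while the class and helper names are our own:

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # Stand-in for an S2R/R2R-block: conv + batch norm + ReLU.
    # The real layers are gauge equivariant hexagonal convolutions [59].
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class CUNet(nn.Module):
    """Compressed U-net: additive skip connections instead of concatenation."""
    def __init__(self, widths=(28, 44, 71, 114)):
        super().__init__()
        self.stem = block(3, widths[0])                       # S2R stand-in
        self.down = nn.ModuleList(block(a, b) for a, b in zip(widths, widths[1:]))
        self.up   = nn.ModuleList(block(b, a) for a, b in zip(widths, widths[1:]))
        self.pool = nn.MaxPool2d(2)
        self.head = nn.Conv2d(widths[0], 3, 1)                # 1x1 bottleneck

    def forward(self, v_noisy):                               # (B, 3, H, W) atlas
        x, skips = self.stem(v_noisy), []
        for d in self.down:                                   # encoder
            skips.append(x)
            x = d(self.pool(x))
        for u, s in zip(reversed(self.up), reversed(skips)):  # decoder
            x = u(nn.functional.interpolate(x, scale_factor=2)) + s  # add, not concat
        d_hat = self.head(x)                                  # displacement field
        return v_noisy + d_hat                                # denoised vertex positions
```

Adding the encoder feature maps instead of concatenating them keeps the decoder channel counts identical to the encoder's, which is where the roughly one-third parameter reduction over a vanilla U-net comes from.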

C. LOSS FUNCTION
In the training phase, the network is provided a ground truth (or target) mesh $I_r$ alongside the noisy mesh $\hat{I}_r$. The faces f are identical and the vertices have the same ordering in a one-to-one correspondence. For any $i \in [0, N_v)$, $\hat{v}_i$ is the noisy counterpart of the target vertex $v_i$, and $\tilde{v}_i$ is the network's estimate of the respective denoised vertex. As shown in Fig. 5, the network is trained to learn displacement vectors $d_i$ and computes denoised vertices as $\tilde{v}_i = \hat{v}_i + d_i$, which is faster than directly learning the denoised vertices $\tilde{v}_i$.
The loss function penalizes errors in the positions of denoised vertices ($L_{pos}$) as well as errors in first and second order properties of the surface ($L_{cur}$). $L_{pos}$ guides the network to move vertices as close as possible to their target positions, denoising them in the process:

$$L_{pos}(\tilde{v}, v) = \frac{1}{N_v} \sum_{i=0}^{N_v - 1} \left\| \tilde{v}_i - v_i \right\|^2 \qquad (1)$$

Squaring the differences places greater significance on vertices that are far away from their target.
$L_{cur}$ is designed to minimize deviations of the surface's curvature (and the surface normals), thereby avoiding self-intersections. Normally, the second order discrete mean curvature is approximated at vertices using the Laplace-Beltrami cotangent operator. If, however, any triangle in the 1-ring around a vertex is close to degeneration and has corner angles approaching zero, the derivative of the cotangent heads towards minus infinity, derailing the learning process. Fortunately, the mean curvature can be reformulated as an edge-based operator, as outlined in Hildebrandt and Polthier [17]. The mean curvature is computed for an edge $e_i$ as $K(e_i) = 2 |e_i| \cos(\theta_{e_i}/2) \, n_{e_i}$, where $|e_i|$ is the edge length, $\theta_{e_i}$ is the dihedral angle between the two faces k and l that form the edge, and $n_{e_i}$ is the edge normal computed as $n_{e_i} = (n_k + n_l) / \|n_k + n_l\|$. Using the edge-based mean curvature (which also depends on the face normals), a mean square error can be constructed as

$$L_{cur}(\tilde{e}, e) = \frac{1}{N_e} \sum_{i=0}^{N_e - 1} \sum_{j=1}^{3} \left( \tilde{K}_j(e_i) - K_j(e_i) \right)^2 \qquad (2)$$

where $N_e$ is the number of edges and $K_j$ is the j-th component of the edge-based mean curvature vector. This loss guides the network to produce denoised surfaces that approximate the second-order properties, namely the edge-based mean curvatures, of the target surfaces.
$L_{cur}$ accelerates the network's learning process and helps to create denoised surfaces that approximate well the local smoothness and global shape of the target surfaces. The combined loss function can then be formulated as $L_{tot}(\tilde{I}_r, I_r) = L_{pos}(\tilde{v}, v) + \alpha \cdot L_{cur}(\tilde{e}, e)$, where $\alpha$ allows adjusting the influence of the curvature loss.
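A sketch of the combined loss follows. One numerically stable reading, which we adopt as an assumption: if $\theta_{e_i}$ denotes the angle between the two face normals, then $2\cos(\theta_{e_i}/2) = \|n_k + n_l\|$, so the edge curvature collapses to $K(e_i) = |e_i| (n_k + n_l)$ with no division that could blow up for near-degenerate triangles, matching the stability argument above. The edge and face-adjacency arrays and all function names are ours:

```python
import torch
import torch.nn.functional as F

def face_normals(v, f):
    # v: (Nv, 3) vertex positions; f: (Nf, 3) triangle vertex indices
    e1 = v[f[:, 1]] - v[f[:, 0]]
    e2 = v[f[:, 2]] - v[f[:, 0]]
    return F.normalize(torch.cross(e1, e2, dim=1), dim=1)

def edge_mean_curvature(v, f, edges, adj):
    # edges: (Ne, 2) vertex indices per edge; adj: (Ne, 2) the two faces k, l
    n = face_normals(v, f)
    nk, nl = n[adj[:, 0]], n[adj[:, 1]]
    length = (v[edges[:, 1]] - v[edges[:, 0]]).norm(dim=1, keepdim=True)
    # 2|e| cos(theta/2) n_e  ==  |e| (n_k + n_l)  under the half-angle identity
    return length * (nk + nl)

def loss_total(v_denoised, v_gt, f, edges, adj, alpha=10.0):
    l_pos = ((v_denoised - v_gt) ** 2).sum(dim=1).mean()       # Eq. (1)
    k_d = edge_mean_curvature(v_denoised, f, edges, adj)
    k_g = edge_mean_curvature(v_gt, f, edges, adj)
    l_cur = ((k_d - k_g) ** 2).sum(dim=1).mean()               # Eq. (2)
    return l_pos + alpha * l_cur
```

The value alpha=10.0 mirrors the $\alpha = 10$ used in the implementation details (Section C).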

IV. EXPERIMENTS AND RESULTS

A. DATASETS
Traditionally, a small selection of individual models has been used to evaluate and compare algorithms for mesh denoising [50,51,52]. To the best of the authors' knowledge, there is no established dataset to evaluate mesh denoising algorithms. In this work, two independent benchmark datasets are selected for the experiments, the ABC [5] and the ShapeNetCore [6] dataset. While both datasets are composed of computer-aided design (CAD) models, they focus on vastly different classes and shapes. The ABC dataset is a collection of one million CAD models featuring mostly basic geometric shapes used in manufacturing, including screws, plates, rods and blocks. For this work, 10.5k models are selected at random, which contain on average 14.6k vertices and 29.2k faces with a standard deviation of 7.7k vertices and 15.3k faces. ShapeNetCore is a subset of the ShapeNet dataset and includes about 51.3k models in 55 common object categories such as airplanes, cars and tables. For the experiments, 25k models are selected, with 9.7k vertices and 19.5k faces on average and a standard deviation of 1.4k vertices and 2.8k faces.
Models that are not genus-0 are transformed into genus-0 meshes using the technique described in [63]. As is common practice, the model vertices are subjected to artificial Gaussian white noise along their normals $n_{v_i}$. The amplitude of the perturbation is chosen as a fraction of a model's mean edge length $l_e$, yielding $\hat{v}_i = v_i + x \cdot n_{v_i}$ with $x \sim \mathcal{N}(0, b \cdot l_e)$, where the factor b determines the level of noise and is used to generate low, medium and high noise by selecting b as 0.1, 0.2 and 0.5, respectively. Figure 6 shows the influence of the different noise levels on an example model. After noise application, the noisy models and their respective ground truths are remeshed as subdivided icosahedrons with 6 subdivisions ($I_6$) using the preprocessing scheme discussed in Section III-A.
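A minimal sketch of this corruption model, under the assumption that the second parameter of $\mathcal{N}(0, b \cdot l_e)$ denotes the standard deviation (the text does not spell out whether it is the standard deviation or the variance):

```python
import numpy as np

def add_normal_noise(v, vertex_normals, edges, b, rng=None):
    """Perturb vertices along their unit normals with Gaussian noise of
    standard deviation b * mean edge length; b = 0.1 / 0.2 / 0.5 yields
    low / medium / high noise. `edges` is an (Ne, 2) vertex index array."""
    rng = rng or np.random.default_rng()
    l_e = np.linalg.norm(v[edges[:, 0]] - v[edges[:, 1]], axis=1).mean()
    x = rng.normal(0.0, b * l_e, size=(len(v), 1))   # one offset per vertex
    return v + x * vertex_normals
```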

B. EVALUATION METRICS
The quality of the denoising results is evaluated using two error metrics commonly deployed in mesh denoising. The mean angular difference $E_\Theta$ measures the average angle between denoised and ground truth face normals:

$$E_\Theta = \frac{1}{N_f} \sum_{i=0}^{N_f - 1} \arccos(\tilde{n}_i \cdot n_i) \qquad (3)$$

where $N_f$ denotes the number of faces in the mesh, and $\tilde{n}_i$ and $n_i$ are the unit length normals of the denoised and ground truth faces, respectively. A low error indicates a good recovery of the ground truth's shape and first-order surface properties.
The mean distance error $E_D$ measures the average distance between two sets of vertices:

$$E_D = \frac{1}{N_v} \sum_{i=0}^{N_v - 1} \left\| \tilde{v}_i - v_i \right\| \qquad (4)$$

where $N_v$ denotes the number of vertices in the mesh, and $\tilde{v}_i$ and $v_i$ refer to the denoised and ground truth vertices, respectively. A low $E_D$ indicates that volume and scale are well preserved in the denoising process.
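Both metrics are straightforward to compute here because the shared remeshing gives the denoised and ground truth meshes identical connectivity and vertex ordering. A sketch with our own function names:

```python
import numpy as np

def mean_angular_difference(n_denoised, n_gt):
    # E_Theta, Eq. (3): average angle (degrees) between unit face normals.
    cos = np.clip((n_denoised * n_gt).sum(axis=1), -1.0, 1.0)  # guard arccos domain
    return np.degrees(np.arccos(cos).mean())

def mean_distance_error(v_denoised, v_gt):
    # E_D, Eq. (4): average Euclidean distance between corresponding vertices.
    return np.linalg.norm(v_denoised - v_gt, axis=1).mean()
```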

C. IMPLEMENTATION DETAILS
The icosahedral mesh denoising network, IMD-Net, shown in Fig. 5, is implemented in Python using PyTorch [64]. The implementation follows the outline of [59] to achieve convolution layers that produce equivariant feature maps on the surface of subdivided icosahedrons. The encoder path consists of one initial S2R-block producing 28 regular features, followed by three downsampling R2R-blocks outputting (44, 71, 114) regular features. The decoder is arranged in reverse, using three upsampling R2R-blocks and a final bottleneck layer that produces the output. The datasets are randomly split into training and test sets in an 80/20 ratio, and IMD-Net is trained on the training set for 48 epochs with $\alpha = 10$ in $L_{tot}$, a learning rate of $4.5 \times 10^{-4}$, the Adam optimizer and a batch size of 8.
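A hypothetical training step wiring the earlier sketches together; the data loader, the fixed icosahedral connectivity tensors (f, edges, adj) and the `atlas_to_vertices` helper (which would scatter the planarized atlas back to the $N_v$ icosahedron vertices) are assumptions, not the authors' code:

```python
import torch

model = CUNet().cuda()                       # sketch from Section III-B
opt = torch.optim.Adam(model.parameters(), lr=4.5e-4)

for epoch in range(48):
    for atlas_noisy, v_gt in train_loader:   # batches of 8 (noisy atlas, GT vertices)
        atlas_out = model(atlas_noisy.cuda())        # (B, 3, H, W) denoised atlas
        v_den = atlas_to_vertices(atlas_out)         # hypothetical helper: (B, Nv, 3)
        loss = torch.stack([loss_total(vd, vg, f, edges, adj, alpha=10.0)
                            for vd, vg in zip(v_den, v_gt.cuda())]).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```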

D. COMPARISON TO STATE-OF-THE-ART METHODS

1) Non-Learning-Based Methods
IMD-Net is compared against several state-of-the-art mesh denoising methods, namely bilateral normal filtering (BNF) [1], guided normal filtering (GNF) [3], cascaded normal regression (CNR) [48] and non-local low-rank normal filtering (NLLR) [65]. The error metrics are computed for the test sets of the ABC and ShapeNetCore datasets and are shown in Table 1. At medium and high noise levels, IMD-Net shows an even larger improvement of 37.8% and 46.6%, respectively. In addition, the network also achieves significantly lower values of $E_D$, e.g., a reduction of 41.2% and 29.5% at high noise levels. This implies that vertex positions denoised by IMD-Net are closer to the ground truth positions than those of the other algorithms and yield a more accurate volume recovery. The improved error metrics of IMD-Net indicate that the signal-to-noise ratio (SNR) at which it fails to distinguish true and noise-induced features is significantly higher than that of the competing algorithms. This can be credited to IMD-Net's design as a cU-net, which allows it to derive denoising filters combining various amounts of local and global information.
Most modern works, including the algorithms compared here [1,3,48,65], first denoise the face normal field and then reposition the vertices to best fit the denoised normal field. When computed in sequence, a particular denoised normal field constrains the positions of the denoised vertices and defines a lower bound for $E_D$. This can lead to an algorithm producing a lower $E_\Theta$ but a higher $E_D$ than another one. IMD-Net, in contrast, is trained to denoise the vertex positions directly, taking derived quantities such as the face normal field and curvature into account in the loss function. This allows the network to optimize the output with regard to both error metrics, explaining the consistently low deviations from the ground truth.
With IMD-Net's quantitative progress and improved ability to generalize over large datasets evident, it is revealing to look at individual models in order to assess its impact on the visual appearance, quality and fidelity of denoised models. Fig. 7 shows a qualitative comparison of denoising results at different noise levels. The magnified part of the object helps to understand how the algorithms under investigation handle flat areas, smooth transitions and sharp edges within a model. Most methods produce visually appealing results on parts of the surfaces. But some fail to accurately denoise smooth transitions and introduce artificial sharp edges of varying quality (Fig. 7c-d, top model). Others capture the transition well, but keep noisy oscillations in flat parts of the model (Fig. 7e-f, top model; Fig. 7c and f, bottom model). Again, others cause self-intersections in fine details of the mesh (Fig. 7c, middle model) or distort the volume noticeably (compare the size of the chicken's comb in Fig. 7, middle model). IMD-Net manages to recover flat surfaces, smooth transitions as well as sharp edges with high fidelity. The proposed network avoids self-intersections, which is attributed to the use of the second-order term $L_{cur}$ in the loss function. Figure 8 shows some example models from ShapeNetCore [6] that were denoised by the competing denoising algorithms and IMD-Net at three different noise levels.

2) Learning-Based Methods
Additionally, IMD-Net is compared to several learning-based methods, specifically to data-driven filtering using autoencoders (DFA) [50], denoising with facet graph convolutions (FGC) [51] and the deep normal filtering network (DNF) [52]. FGC and DNF are pre-trained on 21 models from the synthetic dataset of [48]. These methods require enormous computational resources and include time-consuming pre- and postprocessing stages, which prevents us from training them on large datasets (containing thousands of models) like the ones used in this work. Unlike other learning-based methods [50,51,52], which generate multiple patches from a shape and train the network on these patches, our proposed IMD-Net treats the complete mesh as its input. For all competing works the pre-trained networks published by the respective authors are used, and for IMD-Net the network trained on the ABC dataset is used. Fig. 10 presents the denoising output of the learning-based methods on two shapes from the ABC dataset and two shapes from the test set of the synthetic dataset of [48]. DFA is pre-trained on eight CAD models and fails to accurately denoise the shapes from both datasets. FGC and DNF perform quite well on the test set of [48] (Fig. 10d-e, bottom two models) but fail to generalize to the ABC dataset shapes (Fig. 10d-e, top two models), leaving traces of noisy oscillations in flat regions. IMD-Net, despite being trained on CAD models, performs well and rivals the performance of DNF and FGC on the test set of [48] (Fig. 10f, bottom two models). We attribute this to the fact that IMD-Net is trained on large datasets of meshes (and not patches) and is hence generalizable to a wide variety of flat, curved or sharp features within a shape. Also, IMD-Net produces the result for a single model in a few minutes (including preprocessing), whereas the other methods need between many minutes and hours. For example, DNF required more than 100GB of intermediate storage, more than 80GB of RAM and a couple of hours for preprocessing, inference and postprocessing of the Armadillo model (Fig. 10, last row).

[Figure caption fragment: (c) BNF [1] 5.7°, 9.6 (d) GNF [3] 5.7°, 9.2 (e) CNR [48] 5°, 9.1 (f) NLLR [65] 5.4°, 10 (g) IMD-Net]

V. ABLATION STUDIES
The base configuration of IMD-Net, particularly the choices of the subdivision level, parametrization method, network architecture and network depth, requires both validation and justification. Therefore, four ablation studies are conducted, which explore the impact of these parameters on the performance.

A. SUBDIVISION LEVEL
To observe the influence of the subdivision level r on shape detail, models from the ABC dataset [5] are subjected to medium noise and preprocessed using icosahedral meshes with subdivision levels 5 through 7 (S5 to S7); the remeshings are compared to the ground truth via the Hausdorff distance.

[FIGURE 10 caption fragment: (c) DFA [50] 17.2°, 19.7 (d) FGC [51] 10.6°, 9.2 (e) DNF [52] 9.6°, 7.9 (f) IMD-Net]

In absolute values, an increase from S5 to S6 reduces the Hausdorff distance by $7.01 \times 10^{-5}$ and a further increase from S6 to S7 by $1.56 \times 10^{-5}$. This renders the increase from S5 to S6 about 4.5 times more efficient than the subsequent increase to S7. At the same time, the vertex and face counts of a remeshed model grow exponentially, by a factor of about 4 with each increase of the subdivision level (Fig. 9b). This has severe speed and memory implications for any algorithm denoising the model, as the size of the input data grows by the same factor. The implication for IMD-Net is that the required model size and memory resources grow exponentially with the subdivision level. Together, the exponentially diminishing reduction in the Hausdorff distance and the exponentially growing number of vertices and faces impose strict limits on feasible subdivisions. If the subdivision is too small, the accuracy of the remeshed model might be insufficient; if it is too big, the mesh size might make processing infeasible. In the given context, S6 is selected for the denoising experiments, as it provides a good tradeoff between a small Hausdorff distance and a reasonable model size.

B. PARAMETRIZATION ALGORITHM
In computer graphics, different parametrization methods are compared with respect to the distortion the algorithms introduce in either the areas or the angles of faces. However, in this work, the parametrization is used as a preprocessing tool to remesh an input mesh onto a semi-regular grid. This approach, like any other remeshing technique, may result in the loss of shape details. Therefore, the different parametrization algorithms are compared with respect to the loss of 3D shape information caused by the parametrization and remeshing of an input mesh.
To observe the impact of different parametrization algorithms, models from the ABC dataset [5] were parametrized using three different parametrization schemes (conformal, authalic, quasi-isometric) and remeshed at subdivision level r = 6. The remeshed output shapes were then quantitatively compared to the input ground truth in terms of the normalized Hausdorff distance, shown in Fig. 11.
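The normalized Hausdorff distance can be approximated on the mesh vertices, a common simplification; the paper does not state its exact sampling or normalization convention, so the bounding-box-diagonal normalization below is our assumption:

```python
import numpy as np
from scipy.spatial import cKDTree

def hausdorff(a, b):
    # Symmetric Hausdorff distance between vertex sets a, b of shape (N, 3).
    d_ab = cKDTree(b).query(a)[0].max()    # farthest a-vertex from b
    d_ba = cKDTree(a).query(b)[0].max()    # farthest b-vertex from a
    return max(d_ab, d_ba)

def normalized_hausdorff(v_remeshed, v_gt):
    # Normalize by the ground truth bounding-box diagonal (our convention).
    diag = np.linalg.norm(v_gt.max(axis=0) - v_gt.min(axis=0))
    return hausdorff(v_remeshed, v_gt) / diag
```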
FIGURE 11: Mean and standard deviation of the normalized Hausdorff distance between the remeshed ABC dataset models at r = 6 and the ground truth for three different spherical parametrization methods: conformal [7], authalic [8] and quasi-isometric [9].

The AHSP algorithm [9] applied in this work yields the smallest Hausdorff distance after remeshing. The other two methods, exemplifying conformal and authalic approaches, have approximately 16.2 and 5.0 times higher Hausdorff distances than the quasi-isometric AHSP approach. Further, a qualitative visualization of some models parametrized and remeshed using the three different methods is shown in Fig. 12. The clustering of extrusive shape features in conformal parametrizations results in incomplete shape preservation after remeshing. The authalic parametrization fares better, but it suffers from stretched triangles in the parametrization and, as a consequence, the remeshed output misses shape details in regions of high Gaussian curvature. In contrast to both, AHSP consistently outputs detail- and shape-preserving remeshings and is therefore our choice of parametrization.

C. NETWORK ARCHITECTURE
In this ablation study, the cU-net is compared against the original U-net and two other conceivable network architectures derived from the base configuration: a ConvNet, which keeps the feature map size constant throughout the network (no pooling and unpooling) and drops the skip connections; and an autoencoder (AE), which shares the cU-net architecture in all but the missing skip connections. The four architectures are trained and tested in an identical setup using the ABC dataset at medium noise level. Fig. 13 compares, from left to right, the denoising performance measured by $E_\Theta$, the average GPU memory footprint and the model size in terms of trainable parameters. The comparison of $E_\Theta$ in the leftmost chart reveals that the AE fails to learn to denoise meshes, reducing the error to only 18.36°, where the other architectures achieve around 4°. This can be explained by the missing skip connections, which are essential for the given task. The other three architectures make local information directly available for vertex denoising, either through skip connections or by avoiding down- and up-sampling. These three architectures perform well, with the cU-net slightly outperforming the other two. However, the ConvNet turns out not to be a suitable choice when looking at the GPU memory usage, presented in Fig. 13b. With a batch size of only 4 it occupies a staggering 78.2% of the GPU memory, whereas the cU-net only requires about 50%. Since there is no pooling and unpooling in the network, the deeper blocks with many feature layers consume an enormous amount of memory. As it does not perform better than the cU-net or the U-net, it can be argued that the ConvNet internally produces many features that do not carry relevant information for denoising, rendering the other architectures more efficient.
Finally, the size of the different models, presented in Fig. 13c, explains why the cU-net is to be preferred as the architecture over a vanilla U-net. By replacing the concatenation with an addition operation, the model size is reduced by about one third (927K vs. 1.4M parameters) while performance is kept steady, clearly making the cU-net the architecture of choice.

FIGURE 12: Models from the ABC dataset parametrized and remeshed at r = 6 using conformal [7], authalic [8] and quasi-isometric [9]. Remeshings generated using AHSP best preserve details and shapes. The normalized Hausdorff distance (Norm. HD) is shown below each remeshed model for comparison.

D. NETWORK DEPTH
Another ablation study is conducted to observe the influence of network depth on IMD-Net's denoising performance. The depth refers to the number of R2R-blocks in the encoder and decoder. The network depth is a crucial hyperparameter, determining the maximum receptive field of features and thereby influencing the amount of global information available for denoising. In order to allow a fair comparison, the four networks are normalized with respect to model size, so that each model has about one million trainable parameters. The performance over 48 training epochs, measured by $E_\Theta$ and $E_D$, is plotted in Fig. 14. At the end of the training, $E_\Theta$ is lowest for the depth-3 network. With respect to $E_D$, the depth-3 and depth-4 networks have the lowest positional distance. The depth-4 network's slight decrease in performance indicates that additional blocks add model complexity but little useful information. This encourages using a depth-3 network, as it yields the best tradeoff between denoising performance and model complexity.

FIGURE 14: The validation error metrics of four cU-nets ranging from depth-1 to depth-4. Depth-3 performs consistently better than the others on the two metrics.

VI. CONCLUSION
This paper proposed a novel mesh denoising technique, the icosahedral mesh denoising network (IMD-Net), which is a deep neural network especially suited to denoise closed genus-0 meshes with high quality and fidelity in a single pass.
IMD-Net consumes a noisy mesh, computes a remeshing and predicts denoised vertex positions while preserving and enhancing features of the original mesh. Beyond the vertices and faces of a noisy mesh, no other explicit information about the noise distribution or surface characteristics is needed. Training and inference of IMD-Net are exceptionally fast, as it is based on standard conv2D layers. The nature of the proposed deep learning approach ensures that no parameters need tuning at inference time. IMD-Net was trained and tested on two large-scale model datasets, ABC and ShapeNetCore, and compared to state-of-the-art learning and non-learning based algorithms. To the best of our knowledge, this is the first time mesh denoising algorithms have been evaluated on such large datasets. The experiments showed that IMD-Net consistently outperforms the other algorithms, both in terms of objective evaluation metrics and subjective visual inspection. Training on large datasets had the merit of generalizing to shapes from a different dataset, unlike other learning-based methods. IMD-Net also illustrates that using the complete mesh as network input benefits the denoising process by utilizing global shape information. IMD-Net's performance advantage grows with increasing noise levels, testifying to the positive impact that integrating local, non-local and global mesh information into the denoising process can have. Future work includes exploring how to mark a consistent seam on non-genus-0 surfaces in order to parametrize them. This way, the remeshing approach proposed in this work can be utilized to remesh such surfaces, and IMD-Net can be used to denoise meshes of higher genus.