Relief Extraction From a Rough Stele Surface Using SVM-Based Relief Segment Selection

Archaeological steles having a rough surface due to long periods of weathering make recognizing the inscription difficult. In this paper, we propose a machine learning-based method to extract reliefs for the inscription from a rough stele surface. Relief candidate segments are initially obtained by using a curvature-based method, which include not only actual reliefs but also noises such as dents and scratches. Then, relief segments are selected using a support vector machine classifier that is trained with various features extracted from relief candidate segments. While conventional methods using a single geometric feature easily fail to detect reliefs from the rough surface, the proposed method utilizes 79-dimensional features consisting of appearance-based, cross section-based, and local extrema-based characteristics of each candidate segment to determine whether the segment is relief or not. Using the proposed method, the inscription of the stele Musul-ojakbi made during the Silla Dynasty AD578 were completely recognized. The experimental results demonstrate that the proposed method accurately extracts reliefs and achieves the highest performance on the rough stele data. The performance of the proposed method is about 8.95% and 10.4% higher than the best of the conventional methods in terms of the F1-score and the SIRI, respectively.


I. INTRODUCTION
Among ancient cultural heritage relics, stone steles are mostly regarded as valuable historical materials because of their political and cultural records of the people that lived in the era they were created. The inscriptions on the steles are presented in the language or symbols of ancient people. However, the stone steles that have been exposed outdoors for a long time wore out due to weathering, and might be discolorized with impurities.
A typical method to identify the contents of the stele, such as characters and drawings, is the use of the stone rubbing technique. Since the resultant image of the rubbing process depends largely on the skills and experience of the persons applying the technique, several results are repeatedly produced and examined by comparing one another for accurate analysis. However, when applying the rubbing method, the surface of the stele may be damaged further by physical contacts and contaminated by ink.
The associate editor coordinating the review of this manuscript and approving it for publication was Claudio Cusano .
To solve these problems, computer-aided methods have been developed [1]- [18]. The stele will not be damaged nor contaminated since any physical contact is not required. In addition, contrast to the rubbing technique, the quality of 3D scan data acquired does not depend much on the proficiency to handle the 3D scanner.
The computer-aided methods are categorized into two approaches; one that helps archaeologists to read the engravings by means of effective visualization, and the other that extracts reliefs from the relics automatically.
In image-based visualization methods [1], [2], the target is photographed iteratively by adjusting the direction and position of the light, and shade variation is used to identify reliefs that were not easily visible. But there is a hassle to adjust the lighting in a controlled acquisition environment for accurate shading visualization. Rendering enhancement of surface concavities and convexities by dynamically adjusting the effective light position for different areas of the surface has been widely utilized [11], [12].
For recognizing the inscriptions or extracting illustrative sketches automatically, ridge-valley edges of engravings were detected [14]- [17]. In [10], the relief segments are approximated using 3-order polynomial fitting to emphasize the surface reliefs for illustrative shading. This method explores the convex, concave and inflection properties surrounding a point and applies shading techniques using polynomial approximation, thereafter, the intensity for each point is updated. In [18], eroded inscriptions were visually emphasized by morphological filtering algorithms. By applying Laplacian smoothing and morphological subtraction to multiple different scale meshes, morphological residuals representing different levels of carvings were detected and visualized. In [13], to maximize visualization effects for a physical artifact, an interactive 3D visualization technique that highlights its features on the artifact was presented.
For automatic relief extraction, a depth-based method was developed where depth values of vertices within the 3D data were estimated globally optimizing local depth increments [4]. Assuming that the depth values of the base surface and of the reliefs are respectively Gaussian-distributed, regions deeper than the threshold determined using an EM algorithm are regarded as reliefs. In [6], a curvature-based method was presented to extract reliefs. For each vertex, two principal FIGURE 2. Two 3-D scanned characters from Musul-ojakbi. The left character is easily recognizable with a few dents, while it is difficult to recognize the right character as the surface has many dents and cracks.
curvatures are obtained. Canal-shaped regions where vertices have both large curvature magnitude and large ratio of the two curvatures are determined as relief regions.
In this paper, we aim to automatically extract reliefs for the characters from the stele Musul-ojakbi, a Korean treasure No. 516 that was produced to commemorate the construction of a reservoir in year ''Musul'' (578 AD) during the Silla dynasty. It recorded the construction information that 312 workers built a reservoir in Yeong-dong village for 13 days in year ''Musul''. The content includes the names of the supervisors and sculptors. It is noteworthy that it contains the Idu script language, the writing system with borrowed Chinese characters. Musul-ojakbi is regarded as an important material for research on irrigation facilities, cultural history and the Idu script language.
The characters of Musul-ojakbi had been inscribed in Chinese characters on a very rough surface as shown in Fig. 1. As the surface had not been grinded flat before engraving as well as has been weathered for a long time, it is very challenging to recognize the inscriptions by the visual and tactile perception of the observer or by the rubbing technique. Fig. 2 shows two character examples cropped from different locations on the stele's surface. The meshes are colorized according to the pseudo-depth, the z-coordinate of vertex, where the deepest and the highest points are represented in blue and yellow, respectively. It is noteworthy that the average psuedo-depth of the deepest valley of the reliefs is just 0.27 mm, while the pseudo-depth range of each mesh is larger than 2.4 mm.
Since this stele has a lot of dents and cracks on this very uneven surface, the conventional automatic methods using a single geometric feature, such as depth or curvatures, easily fail to extract reliefs also. The rough surface causes inaccurate depth estimation and makes the Gaussian distribution assumption for the depths invalid. Since the curvature calculation involves a second-order derivative, noisy curvatures result from the rough and weathered surface. Consequently, the curvature-based method extracts not only relief segments but also false segments of holes and cracks with noisy segment boundaries.
In this paper, we propose a machine learning-based method that extracts relief regions from the 3D scan data of the rough and weathered stele. Fig. 3 shows the overview of the proposed relief extraction method. First, candidate segments potentially to be relief are extracted using the curvature-based method. The extracted relief candidate segments contain not only actual reliefs but also noises such as dents and scratches. In the proposed method, various geometrical features are gathered for each relief candidate segment, and trained using a kernel support vector machine (SVM). Then, the trained SVM model is exploited to remove the noise segments out.
The composition of this paper is organized as follows. Section 2 briefly introduces relative works of relief extraction. Section 3 explains the proposed method. Section 4 organizes experimental results and finally Section 5 organizes the conclusions.

II. RELATED WORK
In this section, conventional automatic relief extraction methods are reviewed briefly. Considering 3D geometric shapes of reliefs, the use of depth information is the most intuitive approach. These methods estimate the depth at each point position, and the relief and background are identified based on a threshold [3], [4]. Engraving inscription on a planar surface would allow easy depth estimation. However, the surfaces are not usually flat enough.
In [3], [18], a base surface representing a virtual surface before engraving is created by smoothing the 3D data, and the distance of each vertex to the virtual base surface is used for the depth value of the vertex. However, it is difficult to search for optimal parameters for smoothing, so this does not provide an appropriate base surface.
In the depth estimation-based relief extraction method (DRE) [4], a relative depth of each point is estimated without obtaining the base surface. Instead, the normals of the virtual base surface are estimated by smoothing normal vectors. The relative depth values of a point with respect to its adjacent points are obtained using the smoothed normals. Then, the depth values of vertices are determined globally by optimizing local relative depths in the least squares sense. Assuming that the surface consists of the background and the relief regions and the depths for the two classes are modeled as a mixture of Gaussians, the relief region is segmented based on the threshold determined by an EM algorithm.
For a severely rough surface, however, the normals of the virtual base surface are not obtained accurately and the Gaussian assumption on the depth distribution is easily invalidated. Consequently, the reliefs are not well distinguished from the background.
Another approach to automatic relief extraction exploits the curvatures of the 3D data [6], [9]. Lawonn et al. [6] presented the curvature-based relief extraction method (CRE) by adapting the Frangi filter for extracting reliefs from the 3D data. The Frangi filter has been widely used in medical field to find blood vessels using the curvatures that are prominent features describing the local surface [19]- [21].
Since our method utilizes a modified version of CRE for potential relief candidate segmentation, we explicate CRE and the modified variant.
Lawonn et al. modified the vesselness measure used for the medical images [5] to represent how much the local surface of a point in a mesh is close to the canal shape [6]. The vesselness V for carving extraction from a mesh is given as where k 1 and k 2 (|k 1 | ≥ |k 2 |) are the maximum and minimum principal curvatures at a point, respectively. and represent the ratio of the principal curvatures k 2 k 1 and the magnitude k 2 1 + k 2 2 , respectively. The region of connected points with nonnegative vesselness values is extracted. In the CRE method, the extracted regions are ranked to select several carvings in terms of saliency defined as a total vesselness sum of the carving region.
From (1), when k 1 ≥ 0, the first term referred to as V shape represents how much a local surface is similar to a canal using the ratio of the curvatures and the second term referred to as V depth represents the depth of a local surface using a magnitude of the curvatures.
Specifically, the parameter α in V shape controls the width of the detected region. Only the shape close to the canal is extracted for a large α. The small noise regions disappear, but the relief corresponding to a single stroke of a character becomes thin and even tends to be separated. In contrast, the smaller the value of α, the larger the detection of the extracted area with a high presence of noise. Comparatively, the parameter, β is used to control the second order structureness in the canal.
However, the CRE method that was originally developed to process the drawing carved in plasters is not effective to extract inscriptions on steles with rough surfaces. This is because fluctuates severely due to noisy curvatures of the rough surfaces, which results in the very messy boundaries of the extracted areas and it is very difficult to search for an appropriate α. In addition, since the characters of the inscription have not only long strokes, but also short ones, contrast to the carvings in plasters, the saliency that gives a higher rank to a long connected region becomes meaningless.
In the modified curvature-based relief extraction method (MCRE) [9], such problems are alleviated by nullifying the vesselness values smaller than a threshold T V to zero and deactivating the saliency-based selection. We utilize MCRE for initial relief candidate segment extraction. Fig. 3 illustrates the proposed relief extraction method. Since the whole 3D scan data of the stele are too large to process efficiently, we partition the data manually into smaller rectangular meshes each of which contains a single character.  Initially, relief candidates are obtained by MCRE. Then, we gather various geometric features of each candidate segment. Finally, a kernel SVM binary classifier trained using the geometric features is utilized for determining whether each segment is relief or not.

A. RELIEF CANDIDATE EXTRACTION
For surface alignment as preprocessing, the coordinate frame of the rectangular mesh M is re-originated to the center position. The partitioned mesh can be approximated by a plane using the principal component analysis (PCA). Any slant of the mesh is got rid of by aligning the z-axis of its coordinate frame with the normal of the plane.
The aligned mesh is filtered with Gaussian smoothing to remove the noise that was captured during the 3D scanning. Gaussian smoothing effectively reduces the amplification of noise in the subsequent curvature computation. The relief candidate segments S and curvatures of all the vertices are obtained using the MCRE.
The obtained relief candidate segments include not only actual reliefs but also noises. Furthermore, the relief segments may have holes hindering the feature extraction used later. Therefore, we eliminate the holes and small noises by applying the morphological closing operation to the mesh in the segment refinement step. Fig. 4 shows the process of extracting the relief candidate segments. An input mesh is visualized using z values of vertices in Fig. 4(a). In Fig. 4(b), initial segments detected by MCRE are represented in yellow, while background regions in blue. There are holes on the relief candidate segments and noises. After the refinement, small noises are removed and holes are filled as shown in Fig. 4(c).

B. FEATURE EXTRACTION
For each relief candidate segment S i , we gather 79dimensional features representing various geometric characteristics of the segment. The proposed features consist of a 31-dimensional appearance feature, a 25-dimensional one related to the cross section of the segment, and a 23dimensional one describing local extreme depth and curvatures.
For a certain geometric information, we exploit five statistics including the mean, standard deviation, maximum, median, and minimum. For simple notation, the five statistical values of information l is denoted by S l ∈ R 5×1 . The following subsections describe the features used for relief classification in detail.

1) APPEARANCE SUBFEATURES
Considering that the relief represents a stroke for a character, appearance information including the size and position of relief candidate segment can be useful for relief classification. The first feature represents the area of the segment by the number of vertices to that S i belongs, denoted by N S i . We gather the statistical values of vertex position (x, y, z) within S i , i.e. S x , S y , and S z . For irregular-sized characters, the normalized position (x n , y n , z n ) obtained normalizing each coordinate of the vertex position to [0, 1] may be more appropriate, and thus, the statistics of the normalized vertex position, S x n , S y n , and S z n , are also gathered. As a result, a 31dimensional feature is obtained for capturing the appearance characteristics of each relief candidate segment.

2) CROSS SECTION SUBFEATURES
For shape-related features, we first extract a 2D skeleton from S i , i.e. the medial curve of S i , that represents the structure of the segment. Initially, all the vertices of S i are converted into 2D points by simply removing their z-coordinates and projecting them on a finer 2D grid to maintain accuracy. In our experiment, the grid spacing is set to one third of the average length of edges within the input mesh.
Then, a skeleton is obtained by thinning the region as in [22]. For estimating reliable features, the junction part where several strokes join is removed out from the obtained skeleton. Consequently, the skeleton can be partitioned into several skeleton curve segments without junction.
The cross sections of a relief resemble the shape of an artificial canal, as the relief was made by chipping the stone with a chisel. The next features indicate how much the cross sections of S i are close to the canal shape by approximating to quadratic functions. Fig. 5 depicts how the cross section is reconstructed. From the skeleton curve segments in Fig. 5(a), we uniformly sample points where cross sections are to be reconstructed. In our experiment, one fifth points were sampled from the skeleton curve segments. At each 2D sample point p, the normal direction is determined by using PCA with the skeleton points around the sample point as shown in Fig. 5(b). Letting n denote a 3D augmented vector of the normal direction by appending zero to the normal for the z-coordinate, a 3D plane P containing p can be generated so that P is spanned by two orthogonal vectors, [0, 0, 1] T and n. Here, p is a 3D point within S i which is the closest to p except the z-coordinate.
The 2D cross section of S i at p is reconstructed with the intersection of S i and P, as shown in Fig. 6. Specifically, triangle edges within S i crossing P are obtained and the intersection points of the edges with P are accurately found.   The 2D cross section for p is yielded by projecting the intersection points onto P with n and [0, 0, 1] T as u-and vdirections, respectively.
The width w and depth d of the cross section and the curvature magnitude at p can be regarded as the basic characteristics of the cross section of p . After reconstructing the cross sections of all the sampled skeleton points, we determine a 15-dimensional feature consisting of S w , S d , and S . For analyzing the cross sectional shape further, the cross section is approximated to a quadratic function of the form v = au 2 + bu + c. To deal with various relief cases, we normalize u values to [0, 1] before the quadratic regression. The coefficient a implies the approximate shape of the cross section by indicating convexity with the ratio of the depth and width of the segment cross section. Typically, the relief segments have a larger a than the noise segments.
In addition, an approximation error of the quadratic regression can measure how similar the cross section is to canals, VOLUME 9, 2021 FIGURE 9. Subjective comparison of the relief extraction methods. (a) Input meshes. DRE [4] in (b) yielded unsatisfactory results due to undetected strokes or many thick noise detection. CRE [6] in (c) and MCRE [9] in (d) are better than DRE, but there are still a lot of falsely detected noises. The proposed method in (e) produced very pleasing results similar to the ground truth in (f) by selecting and removing noise segments successfully.
because the relief segment artificially made tends to have a near-symmetrical cross section. The mean squared error (MSE) between the cross section and the approximated function is used for the approximation error, and the relief segment usually has a lower MSE value than the noise segment. Based on the observations, S a and S MSE are extracted for composing a 10-dimensional cross section feature. Fig. 7 shows the quadratic approximation for cross sections of relief and noise segments. The cross section of the relief segment is more convex and symmetric than that of the noise segment.

3) LOCAL EXTREMA SUBFEATURES
Although there is no record of the process of making the stele of the Silla dynasty era, it is believed that the relief of the stele was carved out by chiseling the stone. Considering the stele making process, locally deepest points in the surface can be seen as the trace of the end point of a chisel.
In the proposed method, the depth d is approximated by the z value of vertex. A point p is the local depth maximum (LDM) point when p is the deepest point within the 1-ring neighborhood of p.
This is an important factor in classifying the relief segments. In the relief segment, the local depth maximum points are mainly located along the skeleton of the relief segment. Since, however, the noise segments are different from the shape of the canal, the LDM points are distributed randomly and tend to deviate from the segment skeleton. That is, letting d L and d L denote the depth value of the LDM point and the distance of the point to the skeleton, respectively, d L varies largely for the noise segments. Based on this observation, we extract a 11-dimensional feature consisting of the number of the LDM points N d L , S d L , and S d L .
The curvatures, widely used descriptors for the local surface, can be utilized in a similar way. Since the curvature magnitude and the curvature ratio have been already estimated for all the vertices in MCRE, local curvature magnitude maximum (LCMM) points as well as local curvature ratio minimum (LCRM) points can be easily searched as interesting points. The final 12-dimensional feature consists of the numbers of LCMM points and LCRM points, and the statistics with respect to the values of LCMM and LCRM that are denoted by N L , N L , S L , and S L , respectively. Fig. 8 shows the positions of the local extrema points.

C. SVM-BASED BINARY CLASSIFICATION
Each relief candidate segment S i is classified as relief or not by the SVM classifier based on the extracted features of S i TABLE 1. Quantitative F1 score results of the different relief extraction methods for the nine characters in Fig. 9.

TABLE 2.
Quantitative SIRI [26] results of the different relief extraction methods for the nine characters in Fig. 9. [23]- [25]. The inscription of the stele we are interested in contains only several dozens of characters each of which consists of several strokes. Consequently, the number of relief candidate segments extracted is only a few hundred. This is why the SVM framework that performs well on small data sets of high dimension is employed to select the relief segments [24], [25].
In order to prepare the data for training an SVM classifier, relief candidate segments are extracted from the meshes in the training dataset. After the proposed features are obtained from the relief candidate segments, they are standardized for effective learning. Each segment is binary labeled ('relief' or 'noise') manually to generate the ground truth segment label L S . Then, the SVM classifier is trained using the 3-order polynomial kernel with the features and labels.

IV. EXPERIMENTAL RESULTS
The hyperparameters in this experiment are as follows: σ of Gaussian smoothing for mesh preprocessing was set to 0.9. For MCRE, α and β used in the vesselness measure were set to 0.6 and 0.1, respectively. T V for noisy boundary reduction was set to 0.4.
For segment refinement in relief candidate extraction, the morphological closing operation was performed twice to eliminate holes and small noises. The sampling rate of skeleton curve segment for the cross section reconstruction was fixed to 0.2. The hyperparameters of the SVM model in the proposed method were determined using the grid search. A third-degree polynomial kernel was utilized with the scale of 7, and the box contraint was set to 1. The standard feature scaling was adapted.
We used the 3D scan mesh data obtained from the stele Musul-ojakbi with a size of 672.9 × 987.1 × 32.4 mm (w × h × d). The scanned data contains approximately 23 million vertices, with an average resolution of 250 µm and an accuracy of 30 µm.
For the experiment, 169 character regions were partitioned from the 3D scan data. The 3D scan data were cropped so that each mesh had a margin of 1 mm from the outer edge of a single character. Apart from L S having labels per segment, we manually made another ground truth vertex label L V that indicates whether each vertex belongs to the relief. 70 characters were used for evaluation.

A. EVALUATION
To evaluate the performance of the proposed method, we compared it with DRE [4], CRE [6], and MCRE [9]. We separated the 70 characters into the training set of 61 characters and the test set of 9 characters. In the training datasets, there are 525 relief candidate segments consisting of 222 reliefs (true) and 303 noise segments (false). Fig. 9 illustrates the results of the various relief extraction methods for subjective comparison. Fig. 9(a) shows that the inscription had been engraved on the very uneven surfaces.  yielded unsatisfactory results by missing main strokes of the character or extracting thick noise segments. The rough surface caused inaccurate depth estimation, and even made the Gaussian distribution assumption for the depths invalid. As a result, the EM algorithm failed to determine a proper depth threshold. If the depth threshold is determined low, the reliefs become thin and main strokes are undetected as in the character mesh 1. For a high depth threshold, the reliefs become thick, but a lot of noise segments also appear as in the other meshes. CRE in Fig. 9(c) also produced very unpleasing segmentation results. As mentioned earlier, CRE uses the curvatures for the vesselness measure without the consideration of noisy curvatures resulting from rough surfaces. Consequently, the boundaries of the extracted segments were very jagged. MCRE performs better than CRE because the noise curvatures are partially suppressed. However, thin noise segments remain and the segments have holes and jagged boundaries also.
In contrast, the proposed method in Fig. 9(e) produced very pleasing results similar to the ground truth presented in Fig. 9(f). The results show a considerable reduction of the noise segments. This means that the trained kernel SVM model successfully selects and removes noise segments.
For the nine characters in Fig. 9, detail quantitative results of the different methods are summarized in Table 1 where the F1 score was measured based on the vertex label L V . The highest score is indicated in bold. The results obtained applying only each subfeature are also provided. The proposed method have the best performance compared to the conventional methods. Even using a single subfeature among the three subfeatures is superior to the conventional methods. The appearance information about the size and position of the segment is most useful among the three subfeatures for relief classification.
We also measured segmented inscription recognition index (SIRI), which assesses the guality of segmented inscriptions that are much closer to the subjective assessment of persons than the F1 score [26]. The GT is divided into four areas: inside of strokes, boundaries of strokes, background adjacent to strokes, and remaining background. Considering that each area has different importance to recognizing inscriptions, each area contributes differently to TP, FN, and FP calculations. Then, SIRI is obtained using the weighted TP, FN, and FP as follows: Due to the uneven class distribution, that is, vertices for relief are much less than the background vertices, the SIRI score is more useful than the others. The proposed method outperformed the conventional methods for all the characters in the SIRI scores in Table 2. The best performance was achieved when using the three subfeatures together, although sometimes it was better to use a single subfeature. Fig. 10 illustrates an objective comparison of the relief extraction methods for the 70 characters in terms of precision, recall, accuracy, F1 score, and SIRI. To evaluate the performance measures of the proposed method, we separated the 70 characters into five groups of 14 characters, and 5-fold cross validation was performed. The 70 characters contained 617 relief candidate segments consisting of 250 reliefs (true), and 367 noise segments (false).
The proposed method had a slightly lower recall value than MCRE. Considering that recall is lowered when the relief segment is undetected or thinner than the ground truth relief region, the small reduction in recall by the slightly thinned relief is acceptable. Since, however, low precision indicates that a lot of noise segments are detected from the background regions, precision is the more important factor than recall in recognizing characters.
The proposed method was assessed 8.95% higher than MCRE in terms of the F1-score. The proposed method showed the highest SIRI score with a large improvement of 29.43, 40.6, and 10.4 % compared to DRE, CRE, and MCRE, respectively. That indicated that the proposed method successfully extracted relief regions for the easiest to recognize by epigraphists.
The relief extraction results for the entire stele are shown in Fig. 11. For easy comparison, the intensity values of the rubbing result are inverted. The most classical method, the rubbing method, is difficult to recognize characters because all bumps appear. The result of CRE is similar to the rubbing, and all the bumps are extracted and connected between adjacent bumps. Therefore, it makes the characters more difficult to recognize. MCRE produces better segmentation results than CRE, because the noise curvatures are effectively decreased. But a lot of small noises still remain in MCRE. The proposed method shows the best performance compared to the other methods.

V. CONCLUSION
In this paper, we presented the method extracting reliefs from very rough surfaced steles. The proposed method extracts potential relief candidate segments, and then, selects and removes noise segments using the kernel SVM classifier. For classifying the segment accurately, we proposed the features of segments that consist of the appearance-based, the cross section-based, and the local extrema-based features. Through experiments, it is confirmed that each subfeature of the proposed features is effective and the proposed method outperforms the conventional methods objectively and subjectively. The performance of the proposed method is about 8.95% and 10.4% higher than the second best relief extraction method in terms of the F1-score and the SIRI, respectively.