Chinese Character Components Segmentation Method Based on Faster RCNN

To solve the component segmentation problem caused by the sticking and overlapping of parts in incoherent handwritten calligraphy characters, we propose a Chinese character component segmentation method based on Faster RCNN. The method exploits the strengths of Faster RCNN on multi-scale and small targets to address the difficult cases in component segmentation. It uses the hierarchical features of the components to identify each layer of the Chinese character structure and thereby obtain the components. The segmentation effect is tested both qualitatively and quantitatively. The experimental results show that the method accurately segments adhering and overlapping components. In addition, the segmented components can be retrieved accurately in a retrieval system: the mean Average Precision of the top 30 retrieval results reaches 95.7%. Higher retrieval accuracy indirectly reflects better segmentation, which supports the effectiveness of the proposed method.

[...] components contain more information than strokes and are simpler than whole Chinese characters, and the number of Chinese character components is much smaller than the number of Chinese characters. Therefore, the accurate segmentation of Chinese character components is significant for research on the Chinese character image word stock [2], Chinese character recognition [3], and Chinese character font conversion [4]. The current research on Chinese character images focuses on [...]

The associate editor coordinating the review of this manuscript and approving it for publication was Larbi Boubchir.

[...], and medicine [30]. In component segmentation, one problem is that some Chinese character components occupy only a small proportion of the whole character image, as in Figure 2(b). Another problem is that the position and size of the components are uncertain. These characteristics are similar to those of the problems described above. Therefore, this paper proposes a Chinese character component segmentation method based on Faster RCNN. We design the Chinese character structure types and select the characters according to the proportion [31], and then use Faster RCNN to recognize and segment the components, achieving component segmentation of canonical handwritten calligraphic characters. In addition, we use a retrieval system to check the segmentation effect of the components.

In Section 1, we discuss the importance of components, possible problems during segmentation, and some related research. The rest of our paper is organized as follows. In Section 2, we introduce the research method of this paper in detail. Then in Section 3, we describe the experimental data and results in detail. Finally, Section 4 provides a [...]

[...] and finally synthesized into a proposal. The proposal and feature map are then passed into ROIPooling for final category judgment, and the target category and category probability contained in the picture are obtained.
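As a rough illustration of what the ROIPooling step does, the sketch below max-pools one rectangular region of a feature map into a fixed-size grid. It is a simplified NumPy version written for this explanation, not the implementation used in the paper: the function name `roi_max_pool`, the 2 × 2 output size, and the single-channel feature map are all assumptions.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of a 2-D feature map to a fixed output size.

    feature_map: array of shape (H, W) (single channel, for simplicity)
    roi: (x1, y1, x2, y2) in feature-map coordinates, end-exclusive,
         assumed to cover at least one cell
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    out_h, out_w = output_size
    # Split the region into an out_h x out_w grid of roughly equal bins.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Guarantee every bin spans at least one row and one column.
            r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            pooled[i, j] = region[r0:r1, c0:c1].max()
    return pooled

feature_map = np.arange(36).reshape(6, 6)
square = roi_max_pool(feature_map, (0, 0, 4, 4))  # 4 x 4 region
wide = roi_max_pool(feature_map, (1, 2, 6, 5))    # 3 x 5 region
```

Both regions come out as 2 × 2 arrays even though their input sizes differ, which is the property that lets proposals of any size feed a fixed-size classifier.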

In this paper, we refer to the thirteen Chinese character structures used in the literature [31] to label each layer of the Chinese character structure. As shown in Table 1, the labels include left_right, up_down, up_right, up_left, left_down, up_three, down_three, left_three, surrounded, frame, left_center_right, up_center_down, and single_font. Among the thirteen structure types, single_font and frame have only one layer of structural information, so only the first layer of the structure is labeled. Sample characters are shown in Table 1 as the "Example Word" corresponding to single_font and frame. The remaining structure types have more than two layers of annotation information. The specific annotation process is shown in Figure 4.

VOLUME 10, 2022

[...]

2) Pass the feature map into the RPN to generate anchors. The anchors are then subjected to binary classification and bounding box regression. The former determines whether an anchor contains a target, and the latter generates bounding box regression parameters that adjust the anchor coordinates. These operations are the prerequisite for generating proposals.
3) The proposals are mapped onto the feature maps and fed into the ROI pooling layer. Proposals of different sizes are uniformly scaled to a fixed size and then pass through a fully connected layer to obtain the classification results.
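The anchor handling in step 2) can be sketched in a few lines. The example below generates the anchors at one position, measures overlap with a ground-truth box by intersection over union (IoU), and applies four regression parameters to an anchor using the parameterisation of the original Faster R-CNN formulation. The scales (128, 256, 512) and aspect ratios (1:2, 1:1, 2:1) are the defaults of that formulation and are only assumed here; the paper states that 9 anchors are used per position but the extracted text does not give their sizes. All function names are hypothetical.

```python
import numpy as np

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchors (x1, y1, x2, y2) centred at (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:  # r = height / width; the area stays s * s
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def apply_deltas(anchor, deltas):
    """Shift and rescale an anchor with regression parameters (tx, ty, tw, th)."""
    wa, ha = anchor[2] - anchor[0], anchor[3] - anchor[1]
    xa, ya = anchor[0] + wa / 2, anchor[1] + ha / 2
    tx, ty, tw, th = deltas
    x, y = xa + tx * wa, ya + ty * ha      # move the centre
    w, h = wa * np.exp(tw), ha * np.exp(th)  # rescale width and height
    return np.array([x - w / 2, y - h / 2, x + w / 2, y + h / 2])

anchors = make_anchors(300, 300)
overlap = iou([0, 0, 2, 2], [1, 1, 3, 3])
shifted = apply_deltas(np.array([0.0, 0.0, 10.0, 10.0]), (0.1, 0.0, 0.0, 0.0))
```

In training, an anchor would typically be labelled foreground or background by thresholding its IoU with the ground-truth boxes; the thresholds are not given in the extracted text, so none are hard-coded here.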

The RPN in Faster RCNN accepts input images of arbitrary size and outputs a set of proposals with scores. In the shared feature map, k anchors of different sizes are generated at each position as an n × n sliding window scans the feature map. As shown in Figure 6, each anchor undergoes two operations, classification and bounding box regression, and thus receives two classification scores and four bounding box regression parameters. The two classification scores are the foreground probability and the background probability of the anchor, which are obtained by comparing the overlap of the anchor with the ground truth. At the same time, the anchor receives a foreground or background label. The four bounding box regression parameters are the adjustments made by the RPN to the predicted proposal to bring it closer to the ground truth. In this paper, the sliding window is 3 × 3, and 9 anchors are collected at each pixel position.

[...]

Different from traditional retrieval methods, this study uses Milvus [32], a new vector database, to build an image search system. The system uses the neural network model VGG as the feature extractor to obtain the features of the component images, which transforms unstructured data into high-dimensional vectors and better preserves the image features. An approximate nearest neighbor (ANN) algorithm is used to calculate similarity and improve retrieval speed. The system has high retrieval performance, with a retrieval time of about one second. The structure of the image search system is shown in Figure 8.

To verify the effectiveness of the proposed method on the image segmentation task for Chinese characters, this paper uses the YOLOv3spp network as a comparative experiment. The proposed method and YOLOv3spp are trained in the same experimental environment on the same data set, and the results are compared to demonstrate the effectiveness of the proposed method.

In Table 2, the first column shows the names of the thir[...]

[...] contained in the obtained segmented images are incomplete, and there are also a large number of useless images with wrong predictions. Therefore, this paper adopts a qualitative method for the comparison with the proposed method.
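To make the retrieval step of the image search system more concrete, the sketch below ranks a gallery of feature vectors by similarity to a query vector. It is only an illustration under stated assumptions: it performs an exact brute-force search in NumPy rather than the approximate nearest neighbor search that Milvus provides, it uses cosine similarity (the metric used in the paper is not stated in the extracted text), and the tiny 2-D vectors stand in for VGG features. The function name `top_k_similar` is hypothetical and no Milvus API is shown.

```python
import numpy as np

def top_k_similar(query, gallery, k=5):
    """Return indices and cosine similarities of the k gallery vectors closest to query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                    # cosine similarity to every gallery vector
    order = np.argsort(-sims)[:k]   # highest similarity first
    return order, sims[order]

# Toy stand-ins for component feature vectors.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([1.0, 0.1])
indices, scores = top_k_similar(query, gallery, k=3)
```

An approximate index trades a small loss in recall for much lower query time on large collections, which is why it matters once the component dataset grows beyond what a full scan can handle quickly.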

The proposed method is evaluated in terms of both visual effect and retrieval accuracy. The visual effect is assessed manually, and the retrieval accuracy is assessed using two evaluation metrics: Average Precision (AP) and mean Average Precision (mAP).
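A minimal sketch of the two metrics follows. It assumes that the precision of the top i results is the fraction of those i results that are similar to the query, that AP is the mean of these precisions for i = 1, ..., k, and that mAP is the mean of AP over all queries. This matches the symbol descriptions accompanying Equations (5) to (7), but the equations themselves are not reproduced in the extracted text, so the exact form is an assumption (other common AP definitions average precision only at the ranks of relevant results). The function names are hypothetical.

```python
def precision_at(relevant, i):
    """Fraction of the top-i results that are similar to the query.

    relevant: list of booleans in ranked order, True if the result is similar.
    """
    return sum(relevant[:i]) / i

def average_precision(relevant, k):
    """Mean of the top-i precisions for i = 1..k (assumed form of AP)."""
    return sum(precision_at(relevant, i) for i in range(1, k + 1)) / k

def mean_average_precision(queries, k):
    """Mean of AP over all queries (c = len(queries))."""
    return sum(average_precision(r, k) for r in queries) / len(queries)

# One query whose third result is wrong, and one query with all results correct.
ranked = [True, True, False, True]
ap = average_precision(ranked, k=4)
m_ap = mean_average_precision([[True] * 4, ranked], k=4)
```

Under this definition an error near the top of the ranking lowers AP more than the same error near the bottom, because it affects every later top-i precision.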

Equation (5) is the calculation formula for image retrieval accuracy, where k is the number of top images returned when querying an image, precision_i is the precision of the top i images in the returned results, and i is the index of the i-th image. Equation (6) is the precision calculation formula, where I_s is the actual number of similar images among the top k returned images. Equation (7) is the mean Average Precision calculation formula, where c is the number of queries. A higher mAP indicates better retrieval results. Meanwhile, the side [...]

According to the literature [34], we select 20 components that compose a relatively large number of Chinese characters and then submit these component images sequentially to the retrieval system. The retrieval system retrieves images similar to these components. Figure 12 shows partial retrieval results of the image search system. In the figure, (a) is the image of the query component, and (b) to (f) are the first five search results. Figure 13 shows the first five search results for '' '' from (b) to (f) and their corresponding images of the original Chinese character. In Figure 12 [...]

In this paper, we retrieve similar components in the component dataset by using an image search system to check the [...]

[...]formed into the process of finding the target in the image. We take advantage of Faster RCNN in target detection to identify the Chinese character structures and implement the component segmentation. In testing the segmentation effect, this paper takes both qualitative and quantitative approaches. First, we randomly select 100 Chinese characters, segment them using the proposed method, and judge the segmentation results in terms of visual effect. Then we refer to the "Specification of Common Modern Chinese Character Components and Component Names" to select 20 components that appear more frequently, and use Milvus to build an image search system to verify the segmentation accuracy. The more accurate the segmentation, the higher the retrieval accuracy. The experimental results show that the average retrieval accuracy exceeds 95%. This shows that the proposed method still achieves good results when the Chinese character structure is complex, the positions and sizes of the components vary, and adhesions and overlaps exist. The analysis of the comparison experiment shows that YOLOv3spp is not suitable for segmenting Chinese characters: although it is faster than the proposed method, its recognition accuracy does not reach the desired level and is lower than that of the proposed method.

The method in this paper uses rectangular boxes as component boundaries in segmentation, so over-segmentation and under-segmentation may occur. Next, we will consider how to process the inaccurately segmented components to achieve higher-quality and more accurate segmentation.