Method for the Automatic Generation and Application of Landmark Control Point Library

Ground control points play an important role in improving the positioning accuracy of satellite images. At present, most control points must be obtained through manual deployment (calibration fields) or feature extraction. Control points obtained by manual deployment are fixed to certain areas and are costly to deploy. Current feature-matching methods are limited by the acquisition of reference images and by matching accuracy, which results in poor flexibility and does little to improve worldwide satellite positioning precision. To solve this problem, this study proposes a new algorithm, based on deep learning, for the automatic generation and application of landmark control points across the globe. Using this method, landmarks can be selected and a deep-learning-based target-detection method can be applied to realize the automatic generation of control points. Even for satellite images with relatively low positioning precision, landmark control points can be accurately obtained with a precision reaching the sub-pixel level, which provides a sufficient foundation for the geometric correction of non-mapping satellite images. In this study, a remote sensing image dataset of road intersections, which treats road intersections as landmarks, was also constructed. Experiments were carried out with this dataset, and the CenterNet network was trained; the results show that the detection precision of the network reaches 96.27%. Finally, we designed an application strategy for the landmark control points and improved the image-matching method, such that the matching precision between the landmark images and the images to be processed reaches the sub-pixel level and meets the requirements of geometric correction for non-mapping satellite images.


I. INTRODUCTION
Landmark control points refer to high-precision ground control point data generated by feature extraction from satellite images. This extraction takes typical artificial ground features with distinct characteristics, easy identifiability, and diverse forms, such as road intersections and track fields, as landmarks and stores them in a library according to a metadata system containing image blocks, positions, attributes, and other information on the ground features. Figure 1 shows a schematic of a landmark control point.
High positioning accuracy plays a vital role in exploiting the full performance of high-resolution satellite remote sensing images and directly affects the quality of subsequent satellite image products, such as DSMs, DEMs, and 3D scenes [1]-[3]. Currently, there are three main approaches to satellite image positioning. The first uses on-board equipment to directly measure the orbit, attitude, and other information while the satellite is imaging and directly locates the ground target; this is ''uncontrolled positioning'' [3]-[5]. The second uses a certain number of ground three-dimensional control points within the coverage of the stereo image to inversely calculate the imaging sensor's orbit and attitude parameters; this is the ''ground control'' method [6]. The third combines the previous two, namely ''hybrid positioning'', whose positioning accuracy and reliability are better guaranteed [7]. Normally, the first approach is the preferred method of positioning control. However, while the satellite is in orbit, it is susceptible to various internal and external factors, resulting in measurement errors in the orbit and attitude data that affect positioning accuracy. The second approach requires many ground control points, resulting in a heavy workload. The third approach is the technical scheme commonly used in practical engineering applications. Therefore, in general, ground control points play a very important role in improving the geopositioning accuracy of satellite images [8], [9].

(The associate editor coordinating the review of this manuscript and approving it for publication was Abdel-Hamid Soliman.)
At present, most control points are deployed manually or obtained through feature extraction using reference images [10]. The manual deployment method mainly involves the construction of large-scale calibration fields, such as the Stennis Satellite Remote Sensing Calibration Field in the U.S.A. and the Songshan Remote Sensing Calibration Field in China [11]. The control point area obtained by this method is fixed, so a satellite can only perform geometric correction when passing over that area; for areas without control points, precision cannot be effectively guaranteed. There are two ways to obtain control points through feature extraction. One is to directly match a reference image with the image to be processed. This approach is limited by the acquisition and selection of reference images [12], and in practical applications the matching precision among images also restricts the application of the resulting control points. The other is to build GCP Chips from images of different sources [13]: feature extraction algorithms are applied to the images, and the image block and coordinate point, centered on the feature point, are stored together. In specific applications, GCP Chips are highly sensitive to the image source, and mismatches are likely to occur between images with large differences.
To solve these problems, this study proposes an automatic landmark control point generation algorithm based on natural ground feature recognition and detection. First, considering practical application requirements, such as a high-efficiency index of data and a high control point positioning precision, a landmark control point metadata system is designed, followed by an explanation of the meaning of each element. Second, road intersections are selected as landmarks, corresponding datasets are constructed, and the CenterNet network is used for training, verification, and detection. Through the trained network, we obtain a deep learning model, which can be used for the high-precision detection of road intersections. Satellite remote sensing images with high positioning precision are then selected as the benchmark. The trained CenterNet network is used to test satellite remote sensing images, whose results are stored according to the landmark control point metadata system. Finally, to verify the reliability of this method, structure information for the landmark control points is adopted, followed by the design of an application strategy for the landmark control points.
Landmark control points differ from GCP Chips in the following ways: 1) Compared with GCP Chips, the landmark control point library has more attribute information, including landmark types, quality levels, vector information, spatial resolution, etc., which can effectively assist the application of landmark control points. 2) The density of the landmark control points is higher, and coarse matching of the landmark control points can be achieved via the structural information between the landmarks. Compared with the artificial design feature matching of GCP Chips, the matching accuracy of the landmark control points is higher. 3) A quality grade is designed in the metadata system of the landmark control points, where different grades can be determined for landmark control points according to their positioning accuracy, image quality, etc. GCP Chips can be used to supplement landmark control points, and the matching results of landmark control points can be evaluated according to the quality grade of the landmark control points in practical applications.
The main contributions of this paper are as follows: 1. A new automatic generation method for landmark control points based on deep learning is proposed, which can achieve the automatic generation and extension of landmark control points throughout the world and provide a sufficient foundation for the geometric correction of non-mapping satellite images.
2. A remote sensing image dataset of road intersections, which we term the XD Crossing dataset, is constructed, which treats road intersections as landmarks, and the CenterNet network is trained on it. Experiments show that the detection precision of the network reaches 96.27%.
3. An application strategy for the landmark control points and an improved image matching method are designed, such that the matching precision between the landmark images and the images to be processed reaches the sub-pixel level and conforms to the requirements of geometric correction for non-mapping satellite images.
The automation mentioned in this paper mainly lies in the intelligent detection of landmarks; based on the detection, landmarks are stored according to the metadata system. This method first requires building a landmark dataset; however, since there is a growing number of open-source landmark datasets, many datasets can be used directly. Moreover, a dataset only has to be constructed once; once built, it enables the automatic generation of landmark control points on a large, even global, scale.
The method proposed in this study comprehensively utilizes deep learning to overcome the difficulty of acquiring control points and to realize the deployment of landmark control points throughout the world. Based on the generation of landmark control points, their application strategy is also studied. To verify the effectiveness of the method, a landmark control point generation experiment was carried out with road intersections considered as landmarks. This experiment utilized satellite images from different sources and with different positioning accuracies, and the experimental results show that the presented method has the advantages of automaticity and high precision.
The remainder of this paper is organized as follows: Section II focuses on the research status of control point generation and highlights the problems of existing methods. Section III describes the method of this paper in detail, including the overall design ideas and specific details. In section IV, three experiments are described to verify the effectiveness of the proposed method in the automatic generation and application of control points. Section V gives a critical discussion on the proposed method. The concluding remarks are presented in Section VI.

II. RELATED WORK
The deployment and use of control points are essential for improving the positioning precision of satellites. Currently, there are two commonly used methods. The first establishes a ground calibration field and manually and evenly distributes fixed ground control points over the field according to certain distribution rules. The construction of a calibration field plays a very important role in improving positioning accuracy and testing the performance of satellite images. Many high-resolution earth observation systems have adopted this method to construct ground control points, such as SPOT [14], IKONOS [15], [16], ALOS [17]-[20], GeoEye [21], and IRS-P6 [22]. These systems use ground control points arranged in a calibration field to carry out periodic or irregular on-orbit geometric correction. China has also built the ''Songshan'' calibration field, with independent intellectual property rights, in Dengfeng, Henan, through which a series of satellites such as Ziyuan and Gaofen carry out geometric correction and improve the positioning accuracy of their images [23]-[25]. Middle- and low-orbit satellites often use this method, carrying out geometric correction through the control points calibrated in the field. However, the control points laid out in this way are fixed to the area of the field, and correction is only possible when the satellite passes over it. In addition, control points must be manually selected on the image, which entails a large amount of work [26].
The second method uses processed satellite images with high positioning precision as references: feature points are obtained via feature extraction and built into GCP Chips, which are used to establish associations with the images to be processed [13], or control information is transmitted directly via feature matching between the image to be processed and the reference image [10]. This method first requires the manual selection or processing of an existing satellite image with high positioning precision, which is then matched with the image to be processed to achieve geometric correction. The spatial resolutions of the reference image and the image to be processed should be as close as possible, and the positioning precision of the reference image must be better than that of the image to be processed. Large areas of water and forest coverage should be avoided in the imaging scope. The interval between imaging times should not be excessively long, so as to avoid large-area changes in ground features that may affect accurate matching between images. In addition, the seasons of the image to be processed and of the reference image should remain consistent, to avoid unfavorable factors such as changes in vegetation [27]-[30]. GCP Chips can effectively realize the automatic generation and application of control points. However, because this method still adopts traditional manually designed features, the density of control points is relatively sparse, and attribute information such as feature categories is lacking. When applied to heterogeneous images, image matching becomes difficult [31], [32].
Considering the above two methods, we observe that the manual deployment method is costly and applicable only to a fixed area: control points for geometric correction can only be obtained when the satellite passes over that area, and a large amount of manual intervention is required to select the corresponding control points on the image to be processed. Obtaining control points from a reference image requires manually selecting a reference image that meets the conditions and processing it in advance, and matching the image to be processed with the reference image remains difficult.

III. METHODOLOGY

A. LANDMARK CONTROL POINT GENERATION DESIGN
Given the existing problems in control point generation, this study aims to propose a method that is suitable for non-mapping satellites and can generate control points worldwide. The main goal is to use high-precision satellite images of selected landmarks that are widely distributed around the globe and to intelligently discover and detect these ground features via artificial intelligence (AI) methods. At the same time, image blocks, positions, attributes, and other information on the ground features are stored in a library as control points according to a unified standard (the landmark metadata system) and used for the geometric correction of other satellites. Landmark control points in this study refer to the control point information generated and stored by this method. Landmark detection is first carried out on the image to be processed, followed by establishing the correlation between the landmark block of the image to be processed and the image block in the landmark library. Finally, registration between the two images is performed to obtain control points with high positioning precision from the landmark library for the satellite image to be processed.
There are two significant advantages to the proposed method. First, as long as the landmarks are appropriately selected and the satellite image positioning precision is sufficient, control points with high positioning precision can be set up globally, providing a wider scope of geometric correction processing for the satellite images to be processed. Second, compared with the cross-calibration method, the proposed method only has to match specific landmark image blocks, which reduces image complexity and significantly reduces matching difficulty. Moreover, this method does not require manual image selection and can significantly improve processing efficiency. Figure 2 shows the overall idea and implementation flow of the proposed method.

B. LANDMARK CONTROL POINT METADATA SYSTEM
The design of the landmark control point metadata system is the basis for the generation, organization, expansion, and application of landmark control point data. The design centers on the application of landmark control points: only the information necessary for applying them is stored, and unnecessary information is omitted to reduce the burden of database organization. The purpose of this study is to explore a method for the generation and application of global inland standard control points. To store control point information globally, this study uses a global discrete grid system to organize the landmark control point library: the globe is divided into regular grids, and a correspondence between control points and grids is established via coding. The location coding information is mainly used to realize the fast indexing of landmarks [33], [34].
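As an illustration, the grid-based location coding described above can be sketched as follows. The 0.25° cell size and the row/column code format are illustrative assumptions, not the paper's actual scheme:

```python
def grid_code(lat, lon, cell_deg=0.25):
    """Map a WGS-84 coordinate to a regular-grid cell code.

    The 0.25-degree cell size and the R####C#### encoding are
    illustrative assumptions for this sketch.
    """
    row = int((lat + 90.0) / cell_deg)   # row 0 at the south pole
    col = int((lon + 180.0) / cell_deg)  # column 0 at the antimeridian
    return f"R{row:04d}C{col:04d}"

# Landmarks stored under their cell code can be retrieved with a
# single dictionary lookup instead of a global search.
library = {}
library.setdefault(grid_code(34.27, 113.05), []).append("songshan_crossing_001")
```

With such a code as the primary key, the rough coordinates of a detected landmark immediately identify the small set of library entries that need to be compared.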
When a satellite with poor positioning precision has to use control point data, landmark detection is first performed on the satellite image. Based on the detection, a rough extraction of the landmark data in the landmark library is performed according to the rough coordinates of the landmarks. Finally, the landmark image blocks are accurately matched, and the high-precision coordinates of the landmarks in the landmark library are provided to the non-mapping satellite images for use. Therefore, the landmark metadata system must contain certain information, such as the high-precision control point coordinates corresponding to landmark blocks, the landmark image blocks themselves, and the spatial resolutions of the corresponding images. The control point coordinates can be obtained through stereo positioning of satellite images with high positioning accuracy, or through positioning based on a high-precision DEM and satellite imaging parameters [2], [6]. When image deformation is so large that matching landmark blocks is challenging, landmark three-dimensional (3-D) model information is also needed to achieve high-precision matching with non-mapping satellite images based on projections of the 3-D models. Therefore, data such as 3-D models are also important when using landmark control points. In addition, image sources, landmark attributes, vector data, and quality grade, among others, can provide further support for the use of control points. Table 1 lists the specific definitions and functions of these data.
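The metadata elements discussed above (and defined in Table 1) might be organized as a record like the following sketch; the field names and types are assumptions for illustration, not the paper's exact schema:

```python
from dataclasses import dataclass

@dataclass
class LandmarkRecord:
    """One entry of the landmark control point library (illustrative).

    Field names are assumptions; the paper defines the exact metadata
    elements in Table 1.
    """
    grid_code: str          # location code for fast indexing
    lat: float              # high-precision control point latitude
    lon: float              # high-precision control point longitude
    height: float           # control point height
    image_block: str        # path to the stored landmark image chip
    resolution_m: float     # spatial resolution of the source image
    landmark_type: str      # landmark attribute, e.g. "road_intersection"
    quality_grade: int      # 1 = best; larger values = lower confidence
    image_source: str = ""  # originating satellite/sensor
    vector_data: str = ""   # optional vector outline of the feature
    model_3d: str = ""      # optional 3-D model for large-deformation cases
```

The quality grade lets an application weight or reject matches, while the optional 3-D model field supports the large-deformation case described above.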

C. CONSTRUCTION OF THE LANDMARK DATASET
To globally build landmark control points, the selection of landmarks should meet three conditions. First, the landmarks must be widely distributed, which is more convenient for the generation of global control points. Second, the features of the landmark should be distinct and easily identifiable, which is mainly to enable the landmark control points to be better associated and matched when applied. Third, the landmark must be relatively fixed, and not prone to frequent changes. Landmarks that change frequently, such as rivers and airplanes, cannot be selected. This is more conducive to automatically detecting landmarks on the image and extracting unique control point information. Road intersections, track fields, buildings, and other ground features all meet the above requirements. In practical applications, road intersections [35] are often chosen when manually selecting control points. Therefore, this study takes road intersections as an example. This section mainly investigates the construction of road intersection datasets.
At present, with the continuously expanding application of deep learning methods in remote sensing and other fields, there are numerous corresponding remote sensing image datasets, such as the DIOR dataset [36] from Northwestern Polytechnical University and the DOTA dataset [37] from Wuhan University. These datasets contain various targets, such as airplanes and athletic fields. In these datasets, the positions of ground features such as airplanes change frequently, so they cannot be used as landmarks to generate control points. Moreover, a network trained on a multi-target dataset typically achieves lower precision on a single target than one trained on a single-target dataset. Therefore, the existing datasets cannot be directly used for network training, and a single-target dataset must be constructed manually for network training and detection.
To improve the generalization ability of the network, this study takes road intersections as an example. According to the principles of multiple scenes, multiple scales, and high resolution, we constructed the XD Crossing dataset, which has a total of 2,736 images and 22,488 road intersections and contains targets with different scales, hues, and scenes. Figure 3 shows a schematic of the XD Crossing dataset.
The original image sources of the dataset are various, including WorldView satellite, Ziyuan satellite, and Google Earth images, with image resolutions varying from 0.15 to 5 m. Road intersections were taken from various regions across the globe. To our knowledge, the XD Crossing dataset is currently the only existing dataset dedicated to road intersections.

D. INTELLIGENT LANDMARK DETECTION AND CONTROL POINT SELECTION
Having constructed the landmark dataset, to realize the automatic generation of landmark control points we must select an appropriate method to intelligently detect landmarks on satellite images with high positioning precision. At present, target detection methods based on deep learning are superior to traditional target detection methods [38]. Deep-learning-based target detection methods can be divided into methods based on candidate regions and methods based on regression. Representative candidate-region-based methods include Faster RCNN [39] and R-FCN [40]. Regression-based methods mainly include the YOLO series [41]-[43] and the center-based network CenterNet [44]. Among these methods, CenterNet has a relatively high detection precision for small targets.
CenterNet detects a target via the target's center point. Based on the detection of this center point, attribute information for the target, such as its size and dimensions, is obtained via regression at the center point position, as shown in Figure 4. The network converts the target detection problem into a key point estimation problem: the image is passed through a fully convolutional network to obtain a heatmap, the peaks of which are the center points, while the features at each peak position predict the width and height of the target [44]. Compared with other methods, such as Faster RCNN and YOLO, CenterNet is simpler, more efficient, and more accurate. Therefore, this study uses CenterNet to detect road intersections.
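The heatmap peak extraction step can be sketched as follows. The real CenterNet implementation performs the local-maximum test with a 3×3 max-pooling layer on GPU; this NumPy version is only illustrative, and the 0.3 score threshold is an assumed value:

```python
import numpy as np

def heatmap_peaks(heat, thresh=0.3):
    """Return (row, col, score) for local maxima of a CenterNet-style
    heatmap: a peak must equal the maximum of its 3x3 neighbourhood
    and exceed the score threshold. Illustrative sketch only.
    """
    h, w = heat.shape
    padded = np.pad(heat, 1, constant_values=-np.inf)
    # 3x3 neighbourhood maximum for every pixel (includes the pixel itself)
    neigh = np.max([padded[dy:dy + h, dx:dx + w]
                    for dy in range(3) for dx in range(3)], axis=0)
    peaks = (heat == neigh) & (heat >= thresh)
    rows, cols = np.nonzero(peaks)
    return [(r, c, float(heat[r, c])) for r, c in zip(rows, cols)]
```

Each returned peak is a candidate center point; the width/height regressed at that position then completes the detection.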
When training the key point prediction network, its objective function is the penalty-reduced pixel-wise focal loss:

$$L_k = \frac{-1}{N}\sum_{xyc}\begin{cases}\left(1-\hat{Y}_{xyc}\right)^{\alpha}\log\left(\hat{Y}_{xyc}\right) & \text{if } Y_{xyc}=1,\\ \left(1-Y_{xyc}\right)^{\beta}\left(\hat{Y}_{xyc}\right)^{\alpha}\log\left(1-\hat{Y}_{xyc}\right) & \text{otherwise,}\end{cases}$$

where $\hat{Y}$ represents the predicted key point heatmap, $\alpha$ and $\beta$ are hyper-parameters of the focal loss, and $N$ represents the number of key points.

To account for the offset and scale of the key points, corresponding loss functions are also designed. The loss function for the key point offset is

$$L_{off} = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right)\right|,$$

where $\hat{O}$ represents the predicted local offset of the key points, $\tilde{p}$ represents the low-resolution position of a key point, and $R$ is the output stride (i.e., the scaling of dimensions). The loss function for the target scale is

$$L_{size} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{S}_{p_k} - s_k\right|,$$

where $\hat{S}_{p_k}$ represents the predicted size, $s_k$ represents the object size, and $N$ represents the number of objects. Based on the above equations, when training the CenterNet network, the target loss function of the entire network is

$$L_{det} = L_k + \lambda_{size}L_{size} + \lambda_{off}L_{off},$$

where $\lambda_{size}$ and $\lambda_{off}$ are the coefficients of the scale and offset terms, respectively. The CenterNet network considers the scale and offset of a landmark rather than its rotation, for two main reasons: 1) when building the dataset, road intersections of many kinds, including those with multiple orientations, were included, so the generalization ability of the network is already accounted for; 2) the apparent orientation of a road intersection depends on the satellite imaging angle, which does not affect the final detection.
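The combined training loss described above can be sketched in pure NumPy as follows; the hyper-parameter defaults ($\alpha = 2$, $\beta = 4$, $\lambda_{size} = 0.1$, $\lambda_{off} = 1$) follow the CenterNet paper [44], and the flat (N × 2) layout of the offset and size arrays is an assumption of this sketch:

```python
import numpy as np

def centernet_loss(Y, Y_hat, off_hat, off_gt, size_hat, size_gt,
                   alpha=2.0, beta=4.0, lam_size=0.1, lam_off=1.0):
    """Total loss L_det = L_k + lam_size * L_size + lam_off * L_off.

    Y, Y_hat     : ground-truth and predicted keypoint heatmaps.
    off_*, size_*: per-keypoint offset and size arrays, shape (N, 2).
    NumPy sketch for illustration; a real implementation uses the
    framework's autodiff tensors.
    """
    eps = 1e-12
    n = max(1, int(np.sum(Y == 1)))          # number of keypoints
    pos = Y == 1
    # penalty-reduced focal loss on the heatmap
    l_pos = ((1 - Y_hat[pos]) ** alpha) * np.log(Y_hat[pos] + eps)
    l_neg = ((1 - Y[~pos]) ** beta) * (Y_hat[~pos] ** alpha) \
            * np.log(1 - Y_hat[~pos] + eps)
    l_k = -(np.sum(l_pos) + np.sum(l_neg)) / n
    # L1 losses for the sub-pixel offset and the object size
    l_off = np.sum(np.abs(off_hat - off_gt)) / n
    l_size = np.sum(np.abs(size_hat - size_gt)) / n
    return l_k + lam_size * l_size + lam_off * l_off
```

A near-perfect prediction drives all three terms toward zero, which is the behavior the training loop optimizes for.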
The target detection capability of CenterNet can be tested with the COCO dataset, which has 80 categories, all of which are natural scene images. This study mainly examines target detection for remote sensing images, where the detection targets belong to the single category of road intersections, which are difficult to detect with networks trained directly on the COCO dataset. Therefore, we modified the number of ground feature categories and used transfer learning for training, taking the network parameters trained on the COCO dataset as initial values, to finally obtain a network suitable for road intersection detection.

E. APPLICATION OF LANDMARK CONTROL POINTS
Based on the construction of the landmark control network, when the satellite positioning precision drops and geometric correction is required, the application of landmark control points can be realized by establishing the correlation between the satellite image to be corrected and the image in the landmark library. The existing control points, with high positioning precision, can then be provided to correct the satellite image. Figure 5 shows the application process.
The key to the application of the landmark control network is to establish the connection between the image to be processed and the landmarks with high positioning precision in the landmark template library. This study proposes a method to establish this connection using structural information. The landmark correlation flow chart is shown in Figure 6, and the specific procedure is as follows: 1. For the image I to be processed, road intersection detection is performed via the trained CenterNet network, and the detection set Q = {Q_i, i = 0, 1, 2, 3, ..., N} is obtained. With element Q_0(Xq_0, Yq_0, Zq_0) in the set as the reference, the two points nearest to it, Q_1(Xq_1, Yq_1, Zq_1) and Q_2(Xq_2, Yq_2, Zq_2), are obtained via a nearest-neighbor search.
2. Probabilistic localization is carried out by combining the imaging parameters (RPC parameters) of the satellite image with a digital elevation model (DEM). The search is performed in the landmark library using the range searching method to obtain the nearest target, P 0 (Xp 0 , Yp 0 , Zp 0 ). Taking P 0 as the center, the method searches for the nearest two points, which are denoted by P 1 (Xp 1 , Yp 1 , Zp 1 ) and P 2 (Xp 2 , Yp 2 , Zp 2 ).
3. The points are reordered clockwise around P_0 and Q_0 to obtain two triangles, P_0P_1P_2 and Q_0Q_1Q_2, with the same vertex order. The internal angles of P_0P_1P_2 can then be calculated via the law of cosines, e.g.,

$$\angle P_0 = \arccos\frac{\left|P_0P_1\right|^2 + \left|P_0P_2\right|^2 - \left|P_1P_2\right|^2}{2\left|P_0P_1\right|\left|P_0P_2\right|},$$

and analogously for $\angle P_1$ and $\angle P_2$. Similarly, we obtain the three internal angles, $\angle Q_0$, $\angle Q_1$, and $\angle Q_2$, of Q_0Q_1Q_2. The differences between corresponding internal angles of the two triangles are calculated and compared against a threshold N. When the minimum difference is less than the threshold N, the two landmarks are considered to satisfy a one-to-one correspondence.
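The triangle comparison in steps 1-3 can be sketched as follows. The law-of-cosines angle computation is standard; the 3° threshold is an assumed value (the paper leaves the threshold to be tuned), and, as a slightly stricter variant of the text's minimum-difference test, this sketch requires all three angle differences to fall below the threshold:

```python
import math

def internal_angles(p0, p1, p2):
    """Internal angles (degrees) of triangle p0-p1-p2 at each vertex,
    computed via the law of cosines; points are (x, y) tuples."""
    def d(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    a, b, c = d(p1, p2), d(p0, p2), d(p0, p1)  # sides opposite p0, p1, p2
    ang0 = math.degrees(math.acos((b * b + c * c - a * a) / (2 * b * c)))
    ang1 = math.degrees(math.acos((a * a + c * c - b * b) / (2 * a * c)))
    return ang0, ang1, 180.0 - ang0 - ang1

def triangles_match(P, Q, thresh_deg=3.0):
    """Accept the landmark correspondence when corresponding internal
    angles of the two triangles agree within the threshold. Scale does
    not matter, so absolute positioning error cancels out."""
    return all(abs(a - b) < thresh_deg
               for a, b in zip(internal_angles(*P), internal_angles(*Q)))
```

Because internal angles are invariant to translation and uniform scale, the test tolerates the coarse positioning error of the image to be processed while still rejecting structurally different landmark triples.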
Based on the correspondence of the landmark blocks, the ORB feature extraction operator [45]-[47] and the RANSAC [48] gross-error elimination method are adopted to match the two image blocks. This provides control points with high positioning precision from the landmark library for the satellite image to be corrected. During the road intersection matching process, because the ORB corner detection operator is used and the feature similarity among the four corners of a road intersection is relatively high, the process is easily affected by image rotation, feature similarity, and other factors, which results in mismatching, as shown in Figure 7.
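The RANSAC gross-error elimination step can be illustrated with the following sketch. For simplicity it uses a pure-translation model on already-matched point pairs, whereas the paper matches ORB features [45]-[47] first and typically a richer transform would be estimated; the iteration count, tolerance, and seed are assumed values:

```python
import numpy as np

def ransac_translation(src, dst, n_iter=200, tol=1.5, seed=0):
    """RANSAC-style gross-error elimination for point matches under a
    pure-translation model (illustrative simplification).

    src, dst : (N, 2) arrays of matched point coordinates.
    Returns a boolean inlier mask over the N matches.
    """
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                       # hypothesised shift
        err = np.linalg.norm(dst - (src + t), axis=1)
        mask = err < tol                          # consensus set
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask
```

Matches flagged as outliers (e.g., a corner matched to the wrong arm of the intersection) are discarded before the final control point is taken.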
To solve the problem of mismatching, this study introduces a position-based vector constraint to achieve accurate target registration. The constraint condition is

$$\left|\sqrt{(X_{l1}-X_{l0})^2+(Y_{l1}-Y_{l0})^2} - \sqrt{(X_{r1}-X_{r0})^2+(Y_{r1}-Y_{r0})^2}\right| < K,$$

where $(X_{l0}, Y_{l0})$ represents the plane geodetic coordinate of the center point of the reference image (i.e., the image block in the landmark library), $(X_{l1}, Y_{l1})$ represents the plane geodetic coordinate of a feature point extracted from the reference image, $(X_{r0}, Y_{r0})$ represents the plane geodetic coordinate of the center point of the image to be registered, $(X_{r1}, Y_{r1})$ represents the plane geodetic coordinate of the preliminarily registered feature point on the image to be registered, and $K$ represents a threshold whose value is set according to the positioning precision of the satellite image. When using control points for the geometric correction or adjustment of an image, the control points must often be distributed throughout the entire area; dense control points in a small range can lead to non-convergence of the solution, and only one control point is usually needed per road intersection. Therefore, on the basis of the above matching, the maximum gradient information can be used as the matching measure, such that a unique matching result is finally obtained. The gradient constraint is

$$(p_l, p_r) = \arg\max_{(i,j)\in D}\left(G_l(i) + G_r(j)\right),$$

where the set $D$ contains the initial matching point pairs and $G_l$, $G_r$ denote the gradients of the corresponding points on the two images.

FIGURE 6. Landmark correlation flow chart. This method makes full use of the structural information of landmark control points. First, the trained CenterNet network detects the images to be processed, and the detection results are processed to obtain the three nearest landmarks. Then, probabilistic localization is carried out by combining the imaging parameters (RPC parameters) of the satellite image with a digital elevation model (DEM), and the nearest landmark control point is obtained by searching the landmark control point library. Finally, the structural information is compared to determine whether the association is correct.
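The two constraints can be combined into a single filtering step, sketched below; the match-tuple layout and the threshold value K = 2.0 are illustrative assumptions:

```python
import math

def filter_matches(matches, center_ref, center_tgt, K=2.0):
    """Apply the vector constraint, then the gradient constraint.

    matches    : list of ((xl, yl), (xr, yr), grad_sum) tuples, where
                 grad_sum is the summed gradient of the pair (assumed
                 layout for this sketch).
    center_ref : center coordinate of the landmark library chip.
    center_tgt : center coordinate of the image to be registered.
    Keeps pairs whose distance-to-center agrees within K, then returns
    the single pair with the largest gradient, or None if none survive.
    """
    def dist(p, c):
        return math.hypot(p[0] - c[0], p[1] - c[1])
    kept = [m for m in matches
            if abs(dist(m[0], center_ref) - dist(m[1], center_tgt)) < K]
    return max(kept, key=lambda m: m[2]) if kept else None
```

Returning a single maximum-gradient pair reflects the requirement above that each road intersection contributes only one control point.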

IV. EXPERIMENTS AND RESULTS
This study mainly examines the automatic generation and application of landmark control points. Three sets of experiments were set up to illustrate the effectiveness of this scheme.

Experiment 1: In the automatic generation method for control point data based on landmarks, the key step is the intelligent detection of landmarks and the generation of control points. Therefore, we compared different networks to determine the best one based on landmark detection precision. The selected evaluation indices were the average precision (AP), the precision-recall curve, and the training and detection times. A correct detection is defined as a detection result whose IoU (Intersection over Union) with the true value is higher than 0.5. If multiple detection results have an IoU greater than 0.5, the detection result with the largest IoU is taken as the correct detection, whereas the other detection results are classified as erroneous detections.

Experiment 2: The categories and the quality of the image to be processed may differ from those of the dataset in the practical application process, so this experiment uses different images to test the trained CenterNet network and verify its generalization.

Experiment 3: Based on the generation of landmark control points, we also constructed an application test for the landmark control points. The feasibility of the scheme was verified by object detection on the images to be processed, i.e., by association and matching with landmark control points in the landmark library.
A description of the experimental environment and results follows.

A. EXPERIMENTAL ENVIRONMENT
The hardware used in this experiment was an Intel i7-7800X CPU with 32 GB of memory, a 500 GB hard disk, and two NVIDIA GTX 1080Ti 11 GB graphics cards. The operating system was 64-bit Ubuntu 16.04, with PyCharm as the development environment, Python as the development language, and the tensorflow-gpu framework.

B. ROAD INTERSECTION DETECTION
To verify the effectiveness of the proposed method, the XD crossing dataset was randomly divided into a training set of 1,750 images, a test set of 548 images, and a validation set of 438 images. Different types of target detection methods were selected for the experiments, mainly the Faster RCNN (VGG16) [39], [50], Faster RCNN (Res101) [39], [51], R-FCN [40], YOLO-v3 [43], and CenterNet [44]. Each network was trained, and the training effect was optimized through parameter adjustment. To compare the detection performance of the above networks on road intersections, the time for each network to train to the optimum was counted, and 100 images were randomly selected for testing. Table 2 lists the average time required for target detection in a single image and the detection precision of each network for road intersections.
Precision (P) and recall (R) curves and the average precision (AP) metric are standard techniques for comparing object detection networks [49]. To compare the detection effects of the different networks for road intersections more intuitively, their precision-recall curves were drawn and the precision-recall values counted, as shown in Figure 8. Precision and recall are calculated as follows:

P = \frac{TP}{TP + FP} \quad \text{and} \quad R = \frac{TP}{TP + FN}

where TP (true positive) is a hit sample, i.e., a positive sample of the target category correctly recognized by the model; FP (false positive), commonly referred to as a false alarm or false detection, is a negative sample incorrectly identified by the model as belonging to the target category; and FN (false negative) is a missed sample, i.e., a positive sample not recognized by the model as belonging to the target category. A comprehensive analysis of the above results shows the following.
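Restated as code for clarity, the two measures follow directly from the TP/FP/FN counts (a minimal sketch with illustrative function names):

```python
def precision(tp, fp):
    """Fraction of all detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of all true targets that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)
```

Sweeping the detection confidence threshold and recomputing both values at each setting traces out the precision-recall curves plotted in Figure 8; AP summarizes such a curve as a single number.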
1. The analysis of the Faster RCNN networks in Figure 8 and Table 2 shows that the detection precision of the Faster RCNN (Res101) is 0.8926, higher than those of the Faster RCNN (VGG16) and R-FCN, whose detection precisions are 0.8898 and 0.8883, respectively. The choice of backbone has only a limited influence on the detection results of the Faster RCNN: because the recognition precision of the Res101 convolutional neural network is better than that of VGG16, the detection precision of the Faster RCNN (Res101) is slightly higher than that of the Faster RCNN (VGG16). The R-FCN network improves on the Faster RCNN and uses a deeper residual network model to extract image features; however, for targets with large-scale changes, such as road intersections, the effect does not vary significantly. A comprehensive comparison of the three region-proposal-based object detection and recognition algorithms shows that the Faster RCNN (Res101) has the best detection effect for the scenes described in this study.
2. From the PR curves and Table 2, we observe that the detection precisions of YOLO-v3 and the Faster RCNN (Res101) are 0.9608 and 0.8926, respectively, and that the average times consumed for single-image detection are 0.068 s and 0.185 s, respectively. Therefore, YOLO-v3 is superior to the Faster RCNN in both detection precision and speed for road intersections. To analyze the detection effects of the Faster RCNN and YOLO-v3 networks, several images at different scales were randomly selected for comparison, the results of which are shown in Figure 9.
Based on Figure 9, the Faster RCNN and YOLO-v3 have the same detection effect for relatively large targets, but YOLO-v3 has a better detection effect for small targets. The main reason is that the Faster RCNN must first extract candidate regions, such that its detection results are significantly influenced by the candidate region extraction, whose ability is weak for small targets. YOLO-v3 instead divides an image into grids and then directly performs detection and regression on each grid, which improves small target detection.
3. CenterNet detects a target through the target's center point. Compared with the previously described networks, this network is anchor-free and takes into account the structure and other characteristics of the target. A comprehensive comparison of the above networks shows that the training times of CenterNet and YOLO-v3 are 1.83 h and 8.93 h, respectively: the training time to the optimal state for CenterNet is the shortest, approximately one-fifth that of YOLO-v3. The average detection times for a single image by CenterNet and YOLO-v3 are 0.029 s and 0.068 s, respectively, so CenterNet requires less than half the detection time of YOLO-v3. The detection accuracy of CenterNet is 0.9627, slightly higher than that of YOLO-v3. The higher precision and efficiency of CenterNet are of great significance for the rapid generation of control point data across large areas.
According to this comprehensive analysis, the CenterNet network is more suitable for the scenes in this study. This network is both precise and efficient in detection and achieves relatively good results.

C. CONTROL POINT DATA GENERATION FROM SATELLITE IMAGES FROM DIFFERENT SOURCES
To verify the reliability of the control point images generated by the proposed method, Worldview3 image data from different time phases and image data from a satellite developed by China were selected for the Zhengzhou area. The spatial resolutions of the Worldview3 and Chinese satellite images are 0.6 m and 2.1 m, respectively. The CenterNet model obtained through the above experimental training was used to detect the images to verify the generalization ability of the network and the reliability of control point generation.
A comprehensive comparison of the above detection results shows that the CenterNet network trained in this study has a good generalization ability: it not only performs well on the landmark dataset but can also effectively detect road intersections in various images. The comparison also shows that the number of road intersections detected in Figure 10(c) is lower than in Figures 10(a) and (b), indicating that the detection precision is related to the image's spectral information: richer spectral information yields a better detection effect. Therefore, images with rich spectral information and high positioning precision should be used to generate the control point library.

D. APPLICATION OF LANDMARK CONTROL POINTS
To verify the effectiveness of this method, Worldview3 image data and image data from a satellite developed by China over the Zhengzhou area at different time phases were selected for the experiments. The positioning accuracies of the Worldview3 and Chinese satellite images are 2.0 m and 12 m, respectively. The Worldview3 image is an orthoimage, so the control point generation and application experiments were performed based on this image. The experiment was carried out using cross-calibration (i.e., the first method in section II) and the method described in this study. In cross-calibration, the reference image was directly matched with the image to be processed through feature extraction to obtain the control points, which were used for the satellite image to be processed. To verify the reliability of this scheme, experiments were carried out using the SIFT and ORB feature extraction algorithms. Figure 11 shows the SIFT+RANSAC and ORB+RANSAC feature matching results.
Based on the results in Figure 11, direct feature matching does not produce ideal results. The main reason is that the background of the satellite images is complex: when the image scene is too large, features between images are likely to be similar, which leads to mismatching. Therefore, it is difficult to obtain control points through direct matching between images from different sources.
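The role of RANSAC in rejecting such mismatches can be illustrated with a minimal pure-numpy sketch; for simplicity it assumes a translation-only model between the matched point sets, rather than the full transformation typically estimated in practice, and the function name is illustrative:

```python
import numpy as np

def ransac_translation(src, dst, thresh=2.0, iters=200, seed=0):
    """Estimate a translation between matched points while rejecting
    mismatches: repeatedly hypothesize the shift implied by one random
    pair and keep the hypothesis supported by the most inliers."""
    rng = np.random.default_rng(seed)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best_inliers = np.zeros(len(src), bool)
    for _ in range(iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                              # hypothesized shift
        resid = np.linalg.norm(dst - (src + t), axis=1)  # residual per pair
        inliers = resid < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refine the shift from the consensus set only
    t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t, best_inliers
```

A hypothesis drawn from a mismatched pair is supported by almost no other pairs, so consensus voting discards such matches, which is why RANSAC is paired with SIFT and ORB in Figure 11.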
The trained CenterNet network was used for target detection in the reference image and image to be processed. Figure 12 shows the detection results. Based on the WorldView image, the detection results of the reference image are coded and stored in the library according to the landmark control point metadata system, i.e., the metadata system designed in section III B.
After coding and storage in the library, to provide landmark image blocks for the images to be processed according to the method described in section III E, the landmark images must first be correlated based on structural information. Figure 13 shows the correlation between the results obtained from the detection and those in the target library.
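The structure comparison itself is not reproduced here; as an illustration only, the following sketch associates a triple of detected landmark centers with library entries by comparing scale- and rotation-invariant side-length ratios of the triangle they form (a hypothetical simplification of the structural descriptor):

```python
import numpy as np

def triangle_signature(pts):
    """Sorted side-length ratios of the triangle formed by three landmark
    centers -- invariant to rotation, translation, and uniform scale."""
    p = np.asarray(pts, float)
    sides = np.sort([np.linalg.norm(p[0] - p[1]),
                     np.linalg.norm(p[1] - p[2]),
                     np.linalg.norm(p[2] - p[0])])
    return sides / sides[-1]

def associate(detected, library, tol=0.05):
    """Return the index of the library entry whose structure best matches
    the detected landmark triple, or None if nothing is close enough."""
    sig = triangle_signature(detected)
    diffs = [np.abs(triangle_signature(entry) - sig).max() for entry in library]
    best = int(np.argmin(diffs))
    return best if diffs[best] < tol else None
```

Because the signature is invariant to the image's orientation and resolution, the triple detected on the image to be processed can be compared directly against candidate triples retrieved from the landmark library near the RPC-predicted location.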
From the results in Figure 13, we observe that the correlation among image blocks can be realized using structural information: all correlated image blocks are associated correctly, and there are no erroneous correlations. However, there are also cases of correlation failure, mainly because the different image sources and large gray-level differences among the images lead to missed detections, and differences in the detection results directly affect the correlation results. This situation is rare, however, due to the high precision of landmark detection. In addition, the image to be processed is a non-orthoimage with a certain amount of deformation, which can also cause landmark correlation failures; this effect can be avoided by adjusting parameters.
Based on the correlations, each correlated image block was matched. The position vector and gradient constraints were adopted to select the optimal point pair as the matching result. Figure 14 shows the matching results for three landmark control points. Statistics indicate that the matching results can reach the sub-pixel level.
Compared with direct matching between the reference image and the image to be processed, the proposed method has a simpler image background, more distinct characteristics, and a lower probability of mismatching, thus achieving a better matching effect. Due to the similarity of the features at the corner points, mismatches are prone to occur when a road intersection is matched, as shown in Figure 14(a). After adding the position-based vector constraint, coarse errors are essentially eliminated, as shown in Figure 14(b). Based on correct matching, the point pair with the highest matching accuracy was further determined according to the similarity of the gradient, as shown in Figure 14(c). According to manual inspection, the matching accuracy can reach the sub-pixel level.
This study uses only the traditional ORB + RANSAC + constraint method to match landmark image blocks, yet still achieves good results, which verifies the reliability of the landmark control point generation and application method proposed in this study. Deep-learning-based image matching methods continue to expand and improve in precision; introducing the most advanced methods into this scheme can achieve even better results.
Based on the above experimental results, compared with the method that directly matches the reference image and image to be processed to obtain the control points, the method proposed in this study has higher precision and increased reliability. This method can support the correlation and matching between the image to be processed and the landmark library. Furthermore, the matching precision can reach the sub-pixel level, conform to the use requirements of the control points, and provide control point information for the image to be processed.

V. DISCUSSION
The above experiments and comparisons demonstrate the effectiveness of the proposed method for the automatic generation and application of landmark control points. Compared with the control point acquisition method based on the reference image, the proposed method has higher stability and precision. For the proposed method, this study adopted a deep-learning-based object detection algorithm to achieve the automatic generation of landmark control points, which differs from manual deployment or reference image matching. Taking road intersections as an example for experimental verification, the corresponding dataset was constructed and classic object detection networks, such as Faster RCNN, R-FCN, YOLO-v3, and CenterNet, were compared and analyzed. The experimental results show that the CenterNet network has the shortest training time to the optimal state and the shortest detection time for a single image, as well as the highest detection accuracy, up to 0.9627, as listed in Table 2 and shown in Figure 8. Therefore, this method can detect most road intersections on an image and automatically generate landmark control points. CenterNet performs best because of the simple structure of its network model and its anchor-free design, which adopts the idea of center point detection. Furthermore, CenterNet does not require the extraction of a large number of candidate regions, as is the case for the Faster RCNN and R-FCN, and does not need to perform a large number of regression operations, as YOLO-v3 does.
To verify the generalization of the CenterNet network, we used images with different imaging times, spectra, and resolutions to test the trained network. The experimental results show that the network generalizes well and effectively detects different image types. For an image with abundant spectral information, the detection effect is better, so images with high positioning accuracy and rich spectral information should be selected when building the landmark control point library.
This study also investigated the application of the automatically generated landmark control points, adopting images with different positioning accuracies and sources for the experiments. The experimental results show the following: 1) The method that acquires control points via direct matching with the reference image is significantly influenced by the quality of the image itself. In direct matching, mismatching is prone to occur in complex scenes, as shown in Figure 11. Compared with direct matching, the advantage of landmark matching is that the features are more concentrated and less prone to mismatching. 2) The landmark control point application strategy adopted in this study is to first perform target detection on the image to be processed to obtain the landmarks. Then, according to the approximate location and structure information of the detected landmarks, the connection between the landmarks and the landmark control point library is established, as shown in Figure 13. Finally, based on the correlation, the ORB + RANSAC + position vector constraint + gradient constraint method is adopted to achieve exact matching between landmarks, as shown in Figure 14. From the experimental results, we conclude that the strategy used in this study can effectively solve the problem of mismatching caused by the similarity between feature points at road crossings, so that accurate matching of the control points is finally obtained. According to manual inspection, the accuracy can reach the sub-pixel level and conforms to the application requirements for the control points.

VI. CONCLUSION
High positioning accuracy plays a vital role in fully exploiting the performance of high-resolution satellite remote sensing images and directly affects the quality of subsequent satellite image products, such as DSMs, DEMs, and 3D scenes. This study addresses the difficulties associated with obtaining high-precision ground control points and performing the geometric calibration of satellite images. Combined with the deep learning method, the proposed approach comprises a set of automatic generation and application methods for control points based on landmark detection. The method builds landmark datasets, trains a model on them via deep learning, and establishes a landmark metadata system, such that landmarks are detected on images with high positioning precision by the trained model, finally generating the control points.
The experimental results show that: 1) A set of automatic generation methods for landmark control points based on deep learning can achieve the automatic generation and application of landmark control points throughout the world and provide a sufficient foundation for the geometric correction of non-mapping satellite images. 2) Compared with other target detection networks, such as YOLO-v3 and the Faster RCNN, CenterNet has higher detection efficiency and accuracy and is more suitable for the scenario used in this study. 3) The CenterNet network trained on the XD crossing dataset has good generalization ability, and an image with rich spectral information can generate landmark control points with higher accuracy. 4) The association and high-precision matching between the image to be processed and the landmark control points can be realized by adopting the landmark application strategy used in this study, resulting in accuracies that can reach the sub-pixel level and conform to the application requirements for the control points.
This study proposes a method for the automatic generation and application of landmark control points and takes road intersections as an example to demonstrate its feasibility. The method is not only applicable to satellite remote sensing images; landmark control points can also be generated from multi-source remote sensing image data. In practice, we can generate landmark control points on aerial images. In general, as long as the spatial resolution of the landmark control point is similar to that of the image to be processed, the proposed method can be applied with high precision. Furthermore, according to specific user requirements, bridges, athletic fields, overpasses, and other landmarks can be selected to enrich landmark control point generation. This method is limited by the target detection precision and the image matching methods. With the further development of deep learning technology, the precision of target detection and image matching methods will improve, such that the automatic generation precision of the control points can also be improved.
GUANGLING LAI received the bachelor's degree in remote sensing science and technology in 2014 and the master's degree in photogrammetry and remote sensing in 2017. He is currently pursuing the Ph.D. degree with Information Engineering University.
His main research interests include remote sensing image processing based on deep learning method and spatial data structure and management.
YONGSHENG ZHANG received the Ph.D. degree from Information Engineering University, Zhengzhou, China, in 1990. He is currently a Professor with PLA Information Engineering University. His research interests include remote sensing, geographic information systems, photogrammetry, and artificial intelligence.
XIAOCHONG TONG received the Ph.D. degree from Information Engineering University, Zhengzhou, China, in 2010. He held a postdoctoral position at Beijing Normal University, Beijing, China. He is currently an Assistant Professor with PLA Information Engineering University. His research interests include discrete spatiotemporal grid systems, remote sensing, geographic information systems, and photogrammetry. He is a member of OGC.
YANG WU received the B.S. degree in remote sensing science and technology, in 2012, the M.S. degree in photogrammetry and remote sensing, in 2015, and the Ph.D. degree in science and technology of surveying and mapping, in 2019. He is currently a Lecturer with the National University of Defense Technology. His main research interest includes geometry processing of high-resolution optical satellite imagery.