Ship Detection Based on Compressive Sensing Measurements of Optical Remote Sensing Scenes

The feasibility of the compressive sensing (CS)-based optical remote sensing (ORS) imaging system has been verified through numerical simulation experiments. The CS-based ORS imaging system can reduce the demand for sampling equipment, the volume of sampled data, the required storage space, and the transmission cost. However, when it faces a ship detection task, the original scene must first be reconstructed, and the scene reconstruction process of CS is computationally expensive, memory-demanding, and time-consuming. In response to this problem, this article proposes an innovative pipeline for ship detection tasks, i.e., directly performing ship detection on the CS measurements obtained by the imaging system, which avoids the scene reconstruction process. To achieve ship detection on CS measurements in this pipeline, we design a convolutional neural network-based algorithm, CS-CenterNet, which jointly optimizes the scene compression sampling phase and the measurements' ship detection phase. CS-CenterNet consists of a convolution measurement layer (CML), an optimized hourglass network (OHgN), and an optimized three-branch head network (OTBHN). First, the CML, without bias or activation function, simulates the block compression sampling process in the CS-based ORS imaging system, performing convolutional coding on the scene to obtain the measurements. Second, OHgN extracts high-resolution feature information from the measurements. Finally, OTBHN performs heat-map prediction, center-point offset prediction, and width–height prediction. We test the performance of CS-CenterNet using the HRSC2016 and LEVIR datasets. The experimental results show that the algorithm achieves high-accuracy ship detection based on CS measurements of ORS scenes.

of scene data acquired also increases dramatically. Therefore, to relieve the huge pressure of data storage and real-time transmission, the traditional ORS imaging system does not directly store and transmit the original scene information collected by the detector but compresses the data before transmission to save time and space. However, the theoretical basis for data acquisition in this method is the Nyquist sampling theorem, which states that the underlying analog signal must be uniformly sampled at a rate no less than twice the signal bandwidth to preserve the signal information [4]. As a result, redundant information can only be discarded in the compression stage, which wastes the sampling resources acquired by the front end using high-cost detectors.
Compressive sensing (CS) technology states that if the signal is sparse in a certain transform domain, the high-dimensional signal can be projected to a low-dimensional space through a measurement matrix irrelevant to the transform basis and can be accurately recovered with a sampling rate much lower than that required by the Nyquist sampling theorem [5]. Therefore, CS technology breaks through the bottleneck of the Nyquist sampling theorem and can collect scene data at a low sampling rate (much lower than the Nyquist sampling rate). And it can complete data compression at the same time as data collection. In addition, the CS reconstruction algorithm can ideally reconstruct the original data according to the collected sampling data under the premise that the original data is sparse [6], which relieves enormous pressure on data storage and real-time transmission.
The research works on imaging systems [7], [8] have verified the feasibility of the CS-based ORS imaging system through numerical simulation experiments. The imaging system performs sampling and compression simultaneously in hardware at the sensing stage via CS technology. Therefore, it can reduce the demand for sampling equipment, effectively reduce the sampled data, save storage space, and reduce transmission costs. When the CS-based ORS imaging system faces the task of ship detection, the information of interest is the location of the ship. Fig. 1(a) shows the routine pipeline for a CS-based ORS imaging system to perform ship detection tasks. First, the optical system compresses and samples the ORS scene to obtain CS measurements. Then, the original scene is reconstructed using an image reconstruction algorithm [9], [10]. Finally, an image-based ship detection algorithm [11], [12], [13], [14] is applied to the reconstructed scene to obtain the ship detection result. However, reconstructing the original scene from the measurements is computationally costly, memory-demanding, and time-consuming. Therefore, avoiding the scene CS reconstruction process, i.e., directly performing ship detection on the measurements, can effectively solve the above problems.
In this article, when the CS-based ORS imaging system performs ship detection tasks, we innovatively propose a pipeline, as shown in Fig. 1(b). First, the same as step one in Fig. 1(a), the optical system compresses and samples the ORS scene to obtain the measurements. Then, the measurements-based ship detection algorithm is directly used for the measurements to obtain the ship detection result. This avoids the process of scene CS reconstruction.
Recently, there has been considerable research [15], [16], [17] on convolutional neural network (CNN)-based image CS. These methods use a convolutional measurement layer (CML) to obtain CS measurements of the scene, and the weight values of the trained CML convolution kernels form the learned measurement matrix (LMM). Thanks to the powerful self-learning ability of CNNs, the LMM can better retain the feature information of the image, thereby improving the quality of image reconstruction. Inspired by these studies, we use an LMM in the CS-based ORS imaging system to obtain measurements of the scene instead of a predefined measurement matrix (PMM). As shown in Fig. 1(b), the LMM compresses and samples the scene, and the measurements-based ship detection algorithm then performs ship detection on the measurements. In this way, joint training optimization of the scene compression sampling phase and the measurements' ship detection phase can be realized through end-to-end training. The LMM retains better scene features for the subsequent ship detection on the measurements, thereby realizing ship detection on CS measurements.
To directly carry out ship detection on CS measurements, we adopt the pipeline shown in Fig. 1(b) and design a CNN-based algorithm, CS-CenterNet, which achieves high-precision ship detection on CS measurements by jointly training the scene compression sampling phase and the measurements' ship detection phase. The overall framework of CS-CenterNet is shown in Fig. 2. To simulate the scene compression sampling phase, we use a CML without bias or activation function to measure the scene. The CML not only adaptively generates the LMM from training scenes but can also be jointly trained with the measurements' ship detection network. Besides, since the physical features of ships are extremely compressed in the measurements, an optimized hourglass network (OHgN) is introduced to extract high-resolution feature information from the measurements. Compared with the previous hourglass network (HgN) [18], the squeeze-and-excitation network (SENet) is added to OHgN, which enables OHgN to focus on the salient areas that contain ships. Moreover, since predicting ship features from measurements is extremely difficult, an optimized three-branch head network (OTBHN) is introduced to perform heat-map prediction, center-point offset prediction, and width-height prediction. Compared with the previous three-branch head network (TBHN) [19], the feature refinement network (FRNet) is added to OTBHN, which improves ship detection accuracy.
Our main contributions are as follows. 1) When the CS-based ORS imaging system faces the task of ship detection, we innovatively propose the pipeline shown in Fig. 1(b), which avoids the scene CS reconstruction process, and we design CS-CenterNet to realize this pipeline, implementing ship detection on CS measurements of ORS scenes. 2) We convolutionally encode the scene using a CML without bias or activation function to obtain CS measurements, which simulates the compression sampling process in CS-based ORS imaging systems. 3) Considering that the physical features of ships and backgrounds are extremely compressed in CS measurements, a novel OHgN is designed to extract high-resolution feature information from CS measurements. 4) For feature prediction on the high-resolution feature information, a novel OTBHN is designed, which refines ship features and improves detection accuracy. The experimental results on the HRSC2016 dataset [20] and the LEVIR dataset [21] demonstrate that CS-CenterNet achieves excellent ship detection performance on CS measurements of ORS scenes, proving the feasibility of our model.
The rest of this article is organized as follows. In Section II, we review the related works on CNN-based compression sampling in CS, CNN-based ORS image ship detection, and compressed learning, i.e., image processing in the measurement domain. Section III introduces the structure of CS-CenterNet in detail. Section IV presents the experimental results and their discussion. Finally, Section V concludes this article.

A. CNN-Based Compression Sampling in CS
The compression sampling network can be connected to a reconstruction network for end-to-end training, and the kernel weight values of the trained CML constitute the LMM. The CS measurements collected with the LMM retain more image information, which is more beneficial for image reconstruction. Xiao et al. [17] proposed the fused features and perceptual loss encoder-decoder residual network (FFPL-EDRNet), which connects a CML and a reconstruction network for end-to-end training; the LMM in this model improves the image reconstruction quality in the CS-based ORS imaging system. Zhao et al. [22] proposed a region of interest (ROI)-aware compressive sensing network (ROI-CSNet), which achieves higher reconstruction quality in the ROI while maintaining acceptable quality in the rest of the image; the measurement matrix in this model is also an LMM. Shi et al. [15] proposed an image CS model using CNN (CSNet), which contains a sampling network and a reconstruction network; its measurement matrix is also an LMM. Shi et al. [16] proposed a multiscale model for image CS (SCSNet), which uses a sampling network to learn the sampling operator and implement the compression sampling process. Shi et al. [23] proposed a novel video CS model based on CNN (VCSNet) that explores both intraframe and interframe correlations; it likewise uses a sampling network to learn the sampling operator and implement the compression sampling process. All these CNN-based methods implement the CS process of images or videos, i.e., signal compression sampling and measurement reconstruction, rather than ship detection on CS measurements.

B. CNN-Based Ship Detection on Images
The CNN-based ship detection on images can effectively learn complex features and achieve high-accuracy ship detection. Guo et al. [11] proposed a rotational Libra R-CNN, which adds the balanced feature pyramid module and the intersection over union-balanced sampling module to overcome the limitation of dense distribution and different scales. Wang et al. [12] proposed SDGH-Net, which avoids the overfitting problem through Gaussian heatmap regression. Wang et al. [13] proposed fused features and rebuilt (FFR) YOLOv3, which improves the speed and accuracy of ship detection in ORS images. Fu et al. [14] proposed a feature balancing and refinement network (FBR-Net), which achieves an excellent ship detection effect in the case of the wide diversity of scales and the strong interference of the nearshore background. Shan et al. [24] proposed the SiamFPN, which can realize visual object tracking in various maritime applications. All these CNN-based methods are used to improve the accuracy of image ship detection, rather than for ship detection on CS measurements.

C. Compressed Learning: Image Processing in the Measurement Domain
Compressed learning (CL) is a joint signal processing and machine learning framework that can infer signals from a small number of CS measurements. Calderbank et al. [25] provided a theoretical basis for inference directly in the compressed domain. Moreover, their CL method uses a support vector machine (SVM) classifier to perform image classification on the measurements, which, with high probability, achieves accuracy close to that of the best linear threshold classifier in the data domain. Building on this theory, Lohit et al. [26] proved that a CNN can extract nonlinear features from CS measurements for image recognition. Zisselman et al. [27] proposed an end-to-end CL method composed of fully connected layers and convolutional layers; in the training stage, the sensing matrix of the fully connected layers and the nonlinear inference of the convolutional layers are jointly optimized. Although these CL methods do process images from CS measurements, their back-end processing only performs image classification with machine learning methods, not more complex target detection.

III. METHODOLOGY
CS-CenterNet is designed to construct a high-accuracy ship detection framework on CS measurements. The overall framework of CS-CenterNet is shown in Fig. 2 and includes three key components: 1) the scene compression sampling part, CML; 2) the CS measurements feature extraction part, OHgN; and 3) the CS measurements feature prediction part, OTBHN. Sections III-A–III-E start with an overview, then detail the implementations of the three key components, and finally introduce the joint training optimization of CS-CenterNet.

A. Overview of CS-CenterNet
To begin with, we describe the problem as follows. Given the ORS scene X, the CML is used to compressively sample X to obtain the CS measurements Y; this simulates the compression sampling process of the CS-based ORS imaging system and can be expressed as

Y = CML(X)  (1)

where CML(·) denotes the compression sampling process. Then, given the CS measurements Y, a backbone network is used to extract high-resolution convolutional features, and a feature prediction network is used to predict the category and location information of the ship. We therefore design a backbone network, OHgN, to extract the high-resolution feature information F_OHgN of the CS measurements:

F_OHgN = OHgN(Y)  (2)

where OHgN(·) denotes the feature extraction process. We also design a feature prediction network, OTBHN, to refine the feature information F_OHgN and predict the ship information:

R_ship = OTBHN(F_OHgN)  (3)

where OTBHN(·) denotes the feature prediction process and R_ship denotes the predicted ship detection result. Finally, we jointly train CML(·), OHgN(·), and OTBHN(·) by learning all parameters in CS-CenterNet. Specifically, the overall network is trained using the loss function L_det, and all parameters Θ are updated with

Θ ← Θ − η ∂L_det/∂Θ  (4)

where η denotes the learning rate.
Moreover, after the overall framework is jointly trained, the weight value in CML is the measurement matrix in CS-based ORS imaging system. The OHgN and OTBHN constitute the ship detection model on CS measurements of ORS scenes. In Algorithm 1, we present more details of CS-CenterNet.
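To make the three-stage composition described above concrete, the following PyTorch sketch wires a CML into a stand-in backbone and head. The block size B, measurement rate MRs, channel widths, and the stub backbone/heads are illustrative assumptions rather than the authors' exact configuration; only the CML, a bias-free, activation-free strided convolution, follows the description in the text.

```python
# Minimal PyTorch sketch of the CS-CenterNet-style forward pipeline.
# Sizes and the stub backbone/heads are illustrative assumptions.
import torch
import torch.nn as nn

B, Dp, MRs = 2, 3, 0.25                      # block size, scene depth, measurement rate
m = int(MRs * B * B * Dp)                    # measurements per block (here 3)

# CML: B x B kernels, stride B, no bias, no activation (as in the text).
cml = nn.Conv2d(Dp, m, kernel_size=B, stride=B, bias=False)

# Stand-in backbone and three-branch head (the real model uses OHgN/OTBHN).
backbone = nn.Sequential(nn.Conv2d(m, 64, 3, padding=1), nn.ReLU())
heads = nn.ModuleDict({
    "heatmap": nn.Conv2d(64, 1, 1),          # Cls = 1 (ship)
    "offset":  nn.Conv2d(64, 2, 1),
    "size":    nn.Conv2d(64, 2, 1),
})

x = torch.randn(1, Dp, 512, 512)             # ORS scene
y = cml(x)                                   # CS measurements: 1 x 3 x 256 x 256
f = backbone(y)
out = {k: h(f) for k, h in heads.items()}
print(y.shape, out["heatmap"].shape)         # torch.Size([1, 3, 256, 256]) torch.Size([1, 1, 256, 256])
```

Because all three stages are ordinary differentiable modules, a single optimizer over their combined parameters realizes the joint training described in the text.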

B. Scene Compression Sampling Part
In the traditional compression sampling problem in CS theory, the scene first needs to satisfy a sparsity condition, and the sampling matrix then needs to satisfy the restricted isometry property (RIP). Existing sampling matrices are all signal-independent and do not consider the characteristics of the sampled signal, so less information is retained in the measurements. CNN-based methods can solve the compression sampling problem in CS more effectively.
The prerequisite for ship detection on CS measurements is the acquisition of the measurements. The CS-based ORS imaging system compresses and samples the scene with the measurement matrix to obtain CS measurements. Therefore, the key point when simulating the compression sampling process of the imaging system is the design of the measurement matrix. In designing the measurement matrix in this article, we refer to the compression sampling process in the related work [15] on CS reconstruction, i.e., an LMM is adopted. It is worth noting that the weight values of the CML convolution kernels after training constitute the LMM. Therefore, we adopt a CML without bias or activation function to measure the scene, simulating the compression sampling process of the imaging system. MRs denotes the measurement rate in CS, i.e., the ratio of the compressed measurement data obtained by the CS-based ORS imaging system to the original scene data.
The compression sampling process is shown in Fig. 3. As shown in Fig. 3(a), the measurements of each image block X_i are acquired using a measurement matrix Φ_CML of size (MRs·B²·Dp) × (B²·Dp). Since the number of columns in Φ_CML is B²·Dp = B × B × Dp, the size of each convolution kernel in CML is also B × B × Dp, so that each convolution kernel outputs one measurement per block. Since the number of rows in Φ_CML is MRs·B²·Dp, CML contains MRs·B²·Dp convolution kernels. It should be noted that the stride of CML is B × B for nonoverlapping sampling. Furthermore, there is no bias or activation function in CML. As shown in Fig. 3(b), the output of CML for each image block is composed of MRs·B²·Dp feature-map values, one per kernel. This process can be expressed as

Y = W_CML ∗ X  (5)
where ∗ denotes the convolution operation, X denotes the scene, W_CML denotes the weight values of CML, i.e., the LMM in CS, and Y denotes the CS measurements of the scene. Since the number of convolution kernels must satisfy MRs × B × B × Dp ≥ 1, MRs can be any value larger than 1/(B²Dp); for B = 2 and Dp = 3, this lower bound is 1/12 (≈ 8.33%). To avoid the contingency of scene compression sampling at a single MR, MRs is directly taken as 25%, 10%, 4%, and 1%, following the CS research works [10], [28]. Accordingly, the correspondence between the B × B strides and the MRs used in this article is shown in Table I. Fig. 4 provides the frequency-domain visualization results of the PMM (a Gaussian random matrix) and the LMM on the HRSC2016 dataset at MRs = 25%. Since the data dimension of an image block is 12 (2 × 2 × 3) and MRs is 25%, the size of the PMM is 3 × 12; its expression is shown in (7) at the bottom of this page. Similarly, the LMM has size 3 × 3 × 2 × 2 (the first 3 is obtained from 12 × 25%, the second 3 is the depth of each convolution kernel, and 2 × 2 is the kernel size). We select all three rows of both the LMM and the PMM for visualization; to obtain a better visual effect, each row of the measurement matrix is visualized after a Fourier transform. It can be seen from Fig. 4 that the frequency content of each PMM row (PMM[0], PMM[1], and PMM[2]) is randomly distributed, i.e., the PMM samples scene information at random frequencies, whereas the frequency content of each LMM row (Conv Kernel 0[nc], Conv Kernel 1[nc], and Conv Kernel 2[nc] with nc = 0, 1, 2) is regularly distributed, i.e., the LMM samples specific frequency components of the scene. Sampling specific frequencies of the scene preserves scene feature information better than sampling random frequencies.
Therefore, by jointly training the scene compression sampling phase with the ship detection phase on CS measurements, the LMM captures scene feature information more efficiently than the PMM.
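As a sanity check on the block sampling described above, the following NumPy sketch verifies that a bias-free, stride-B convolution with MRs·B²·Dp kernels is equivalent to multiplying each flattened B × B × Dp block by a measurement matrix Φ_CML. The scene size and the random matrix are illustrative assumptions.

```python
# The stride-B convolution view of CML versus the per-block matrix view:
# both produce the same CS measurements. Sizes are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)
B, Dp, MRs = 2, 3, 0.25
m = int(MRs * B * B * Dp)                     # 3 measurements per block
H = W = 8                                     # tiny "scene" for the demo

scene = rng.standard_normal((H, W, Dp))
phi = rng.standard_normal((m, B * B * Dp))    # stand-in learned measurement matrix

# Convolution view: slide a B x B x Dp kernel with stride B (nonoverlapping).
meas_conv = np.zeros((H // B, W // B, m))
for i in range(0, H, B):
    for j in range(0, W, B):
        block = scene[i:i + B, j:j + B, :].reshape(-1)   # flatten one block
        meas_conv[i // B, j // B] = phi @ block          # one value per kernel

# Matrix view: Y_i = Phi_CML @ x_i for every flattened block x_i.
blocks = scene.reshape(H // B, B, W // B, B, Dp).transpose(0, 2, 1, 3, 4)
blocks = blocks.reshape(-1, B * B * Dp)
meas_mat = (blocks @ phi.T).reshape(H // B, W // B, m)

print(np.allclose(meas_conv, meas_mat))       # True
```

This equivalence is why training the convolution kernels of CML amounts to learning the measurement matrix itself.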

C. CS Measurements Feature Extraction Part
Since the data volume of CS measurements is much lower than that of the corresponding original scenes, the feature extraction network for CS measurements needs to aggregate global information and multiscale local information to obtain high-quality, high-resolution features. Inspired by the context refinement module (CRM) in [29], we adopt the existing unified hourglass structure, HgN, which captures and integrates information across all scales of the measurements. Therefore, to extract high-resolution feature information from the measurements, we design the backbone network, OHgN, based on HgN.

1) HgN:
The structure of the HgN used in this article is shown in Fig. 6, where ResB1 denotes a residual block with 1/2 downsampling, ResB2 denotes a residual block without downsampling, and Conv denotes a convolutional layer.
The CS measurement Y from CML is denoted as Y ∈ R^(w_m × h_m × MRs·B²·Dp), where w_m × h_m is the spatial size of the measurements. First, a ResB1 is applied for feature extraction to obtain a 128 × 128 × 256 feature map C1, and ResB1 is then applied to perform five consecutive feature extractions. To aggregate the feature information of adjacent scales, upsampling and cross-scale feature combination are used. C6a and C5a, which have the same size, are added elementwise, and nearest-neighbor upsampling is then performed to obtain C5b. C5b and C4a are added elementwise and nearest-neighbor upsampling is again performed to obtain C4b, and so on, yielding the upsampled results C5b, C4b, C3b, C2b, and C1b. Their feature-map sizes increase to 1/16, 1/8, 1/4, 1/2, and 1 of the output resolution, while their feature dimensions decrease to 512, 384, 384, 384, and 256 in turn. After the output resolution of 128 × 128 is reached, a 3 × 3 Conv is applied to generate the final high-resolution feature map. In HgN, low-level, weakly semantic features carry rich location information, which is very useful for object positioning, while high-level, strongly semantic features carry rich semantic information, which is very useful for object classification; HgN fuses these two kinds of features. The advantage of HgN is that it captures global and local features in a single unified structure, so the final feature map contains almost all the critical points of the detected objects.
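The cross-scale aggregation step above, elementwise addition followed by nearest-neighbor upsampling, can be sketched in a few lines of NumPy. The feature sizes are toy assumptions, and the channel-reducing convolutions of HgN are omitted for brevity.

```python
# Cross-scale aggregation as in HgN: add two same-size feature maps
# elementwise, then upsample by nearest neighbor. Toy shapes only;
# HgN's channel-reducing convolutions are omitted here.
import numpy as np

def nn_upsample2x(f):
    """Nearest-neighbor 2x upsampling of an (H, W, C) feature map."""
    return np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)

rng = np.random.default_rng(0)
c6a = rng.standard_normal((4, 4, 512))   # deepest feature map
c5a = rng.standard_normal((4, 4, 512))   # same-size lateral feature map

c5b = nn_upsample2x(c6a + c5a)           # elementwise add, then upsample
print(c5b.shape)                         # (8, 8, 512)
```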
Although HgN can extract high-resolution feature information from CS measurements, it cannot select the information most critical to ships from the measurements. SENet, in contrast, can devote more attention to ship areas to obtain more detailed information about the ships while suppressing useless information. Therefore, we add SENet to HgN to refine the features and focus on the salient areas that contain ships.
2) SENet: The position where SENet is added to HgN is shown in Fig. 2; specifically, SENet is added to the BottleneckLayer part of HgN. The feature processing of SENet is shown in Fig. 7. Its input feature is denoted as F_int ∈ R^((W/s_se) × (H/s_se) × C_se), where C_se is the channel dimension of F_int and s_se is the downsampling ratio relative to the input scene (s_se = 32). Maxpool denotes the max-pooling layer, Avgpool denotes the average-pooling layer, and Sig denotes the sigmoid function. First, Maxpool and Avgpool are applied along the channel axis to generate the features F_max, F_avg ∈ R^((W/s_se) × (H/s_se) × 1). Then, a 3 × 3 Conv and a Sig are applied to obtain the attention feature A_SENet ∈ R^((W/s_se) × (H/s_se) × 1):

A_SENet = Sig(Conv([F_max, F_avg]))  (8)

Finally, the feature A_SENet is multiplied by the initial feature F_int:

F_SENet = A_SENet × F_int  (9)
where × denotes elementwise multiplication. In particular, the tensor dimension of F_SENet ∈ R^((W/s_se) × (H/s_se) × C_se) is the same as that of the input feature F_int. We select channels 1, 32, 64, 96, 128, 160, 192, 224, and 256 for visualization from the 128 × 128 × 256 high-resolution feature maps predicted by HgN and OHgN on two ORS scenes at MRs = 25%; the visualization results are shown in Fig. 8. It can be seen from Fig. 8 that the high-resolution feature map predicted by OHgN contains more ship target-area information than that of HgN. This is because the SENet in OHgN refines the features and focuses more on the target area of the ship.
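A minimal NumPy sketch of the attention computation described above: channelwise max- and average-pooling, a 3 × 3 convolution, a sigmoid, and elementwise reweighting. The kernel weights here are random stand-ins for the learned parameters, and the feature size is a toy assumption.

```python
# Spatial reweighting as described for the SENet module: pool along the
# channel axis, run a 3x3 conv + sigmoid, multiply back into the feature.
import numpy as np

def conv2d_same_3x3(x, w):
    """3x3 'same' convolution of a 2-channel (H, W, 2) map to one channel."""
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W))
    for di in range(3):                      # sum over the 9 kernel taps
        for dj in range(3):
            out += np.tensordot(xp[di:di + H, dj:dj + W, :], w[di, dj], axes=([2], [0]))
    return out

rng = np.random.default_rng(0)
f_int = rng.standard_normal((16, 16, 256))          # input feature F_int

f_max = f_int.max(axis=2, keepdims=True)            # pooling along channel axis
f_avg = f_int.mean(axis=2, keepdims=True)
pooled = np.concatenate([f_max, f_avg], axis=2)     # (16, 16, 2)

w = rng.standard_normal((3, 3, 2))                  # stand-in 3x3 conv kernel
a = 1.0 / (1.0 + np.exp(-conv2d_same_3x3(pooled, w)))   # sigmoid attention map
f_senet = a[..., None] * f_int                      # reweighted feature F_SENet
print(f_senet.shape)                                # (16, 16, 256)
```

Note that because the pooling runs along the channel axis, the attention map is spatial (one weight per location), which is what lets the module emphasize the image regions containing ships.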

D. CS Measurements Feature Prediction Part
The traditional feature prediction network adopts the anchor-box method to predict the category and location information of targets. However, anchor boxes introduce many hyperparameters and design choices, which make network tuning difficult and increase both network complexity and computational cost. Recent research on anchor-free detection [30] has shown that the anchor-free method can eliminate these anchor-related problems while maintaining detection accuracy. Therefore, we also predict the category and location information of ships with an anchor-free method.
To adopt an anchor-free method for heat-map prediction, center-point offset prediction, and width-height prediction of high-resolution feature information, we design the feature prediction network, OTBHN, based on the TBHN.

1) TBHN:
The structure of the TBHN used in this article is shown in Fig. 9. The feature F_OHgN from OHgN is denoted as F_OHgN ∈ R^((W/s_tb) × (H/s_tb) × C_tb), where C_tb is the channel dimension of F_OHgN (C_tb = 256) and s_tb is the downsampling ratio relative to the input scene (s_tb = 4) [19]. Each branch of the TBHN has a 3 × 3 × 256 Conv followed by a 1 × 1 × 256 × T_a Conv with a = 1, 2, 3 (T_1 = Cls, T_2 = 2, T_3 = 2, where Cls is the number of categories). In particular, a 3 × 3 Maxpool is used as an equivalent nonmaximum suppression in the peak key-point extraction branch.
The illustration of center-point-based detection for TBHN is shown in Fig. 10. First, we independently extract the peak points on the heatmap of each of the Cls categories. Then, we use ĈP to denote the set of n detected center-points, where (x̂_i, ŷ_i) denotes the position of the ith predicted key point. Finally, we obtain the coordinates of the upper-left and lower-right corners of the prediction box and generate a horizontal box at this position:

(x̂_i + ox̂_i − ŵ_i/2, ŷ_i + oŷ_i − ĥ_i/2)  (upper-left corner)

(x̂_i + ox̂_i + ŵ_i/2, ŷ_i + oŷ_i + ĥ_i/2)  (lower-right corner)

where (ox̂_i, oŷ_i) denotes the predicted position offset and (ŵ_i, ĥ_i) denotes the predicted size. Because the Conv in TBHN has a fixed geometric structure, its ability to model geometric deformation is limited. To strengthen the deformation-modeling ability of the feature prediction network, we refer to the deformable convolution in [31]. By learning an additional offset, deformable convolution shifts the features to focus on the target area of interest, which helps to resolve structural information between similar objects, thereby improving ship detection accuracy. Therefore, we design FRNet, which mainly uses deformable convolution to refine features. FRNet is added to TBHN to refine the ship features and improve detection accuracy.
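The center-point decoding step above can be illustrated with a short NumPy example; the peak position, offset, and size values are fabricated for illustration.

```python
# Decode a detected heatmap peak into a horizontal box: shift the peak by
# its predicted sub-pixel offset, then expand by half the predicted size.
import numpy as np

def decode_boxes(centers, offsets, sizes):
    """centers, offsets, sizes: (n, 2) arrays -> (n, 4) boxes (x1, y1, x2, y2)."""
    c = centers + offsets                     # refined center-points
    half = sizes / 2.0
    return np.hstack([c - half, c + half])    # upper-left and lower-right corners

centers = np.array([[100.0, 60.0]])           # one detected peak key point
offsets = np.array([[0.5, -0.25]])            # predicted center-point offset
sizes = np.array([[40.0, 16.0]])              # predicted width and height

box = decode_boxes(centers, offsets, sizes)
print(box)                                    # x1=80.5, y1=51.75, x2=120.5, y2=67.75
```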
2) FRNet: The position where FRNet is added to TBHN is shown in Fig. 2, and the structure of FRNet is shown in Fig. 11. First, the input feature conf is adjusted to obtain the feature conf1. Then, the adjusted feature conf1 is used, together with the features off and size, to generate the features off1 and size1. Afterwards, a 3 × 3 Conv is applied to off1, size1, and conf1 to generate the features off11, size11, and conf11. Finally, deformable convolution [32] is adopted in FRNet: the kernel offset fields (offset_off11, offset_conf11, and offset_size11) of the three features (off11, conf11, and size11) are first generated, respectively, by a 1 × 1 Conv, and a 3 × 3 deformable Conv is then applied with these offsets to obtain the refined features. The overall process can be expressed as

(conf2, off2, size2) = FRNet(conf, off, size)  (15)

where FRNet(·) denotes the operation of the FRNet module, and conf2, off2, and size2 denote the output features after the refinement of FRNet. We visualize the heat maps predicted by TBHN and OTBHN on two ORS scenes at MRs = 25%; the results are shown in Fig. 12. It can be seen from Fig. 12 that OTBHN locates ship positions more accurately than TBHN. This is because FRNet is added to OTBHN and the deformable convolution in FRNet refines the features.

E. CS-CenterNet Joint Training Optimization

1) Loss Function:
Our training loss function consists of three parts:

L_det = L_k + λ_size L_size + λ_off L_off  (16)

where L_k, L_size, and L_off denote the heat-map loss, the width–height loss, and the center-point offset loss, respectively. λ_size and λ_off are hyperparameters; inspired by Zhou et al. [19], we set their values to 0.1 and 1, respectively. Considering the imbalance between negative and positive samples, the focal loss [33] is adopted for L_k:

L_k = −(1/N) Σ_{xyz} (1 − Ŷ_xyz)^α log(Ŷ_xyz),  if Y_xyz = 1
L_k = −(1/N) Σ_{xyz} (1 − Y_xyz)^β (Ŷ_xyz)^α log(1 − Ŷ_xyz),  otherwise  (17)

where Y_xyz denotes the key-point heatmap of the target, Ŷ_xyz denotes the key-point heatmap output by the network (Ŷ_xyz ∈ [0, 1]), N denotes the number of key points in the scene, and α and β are hyperparameters; inspired by Zhou et al. [19], we set their values to 2 and 4, respectively. L_size and L_off adopt the L1 loss and can be formulated as

L_size = (1/N) Σ_{k=1}^{N} |Ŝ_pk − s_k|  (18)

L_off = (1/N) Σ_p |Ô_p̃ − (p/R − p̃)|  (19)

where s_k denotes the true width and height of the target, Ŝ_pk denotes the predicted width and height, p denotes the center-point of the target box, p̃ denotes the center-point of the predicted box, Ô_p̃ denotes the offset output by the network, and R denotes the downsampling multiple (R = 4 [19]).
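A NumPy sketch of the penalty-reduced focal loss used for the heat-map term, with α = 2 and β = 4 as above; the tiny heatmaps are fabricated for the demo.

```python
# Point-wise focal loss over a key-point heatmap (CenterNet style):
# positives are down-weighted when already confident, negatives are
# down-weighted near Gaussian-splatted target peaks.
import numpy as np

def focal_loss(y_true, y_pred, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss; y_true == 1 marks the key-point locations."""
    pos = (y_true == 1.0)
    n = max(pos.sum(), 1)                            # number of key points N
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    pos_loss = ((1 - y_pred) ** alpha) * np.log(y_pred) * pos
    neg_loss = ((1 - y_true) ** beta) * (y_pred ** alpha) * np.log(1 - y_pred) * (~pos)
    return -(pos_loss.sum() + neg_loss.sum()) / n

y_true = np.array([[1.0, 0.6], [0.2, 0.0]])          # Gaussian-splatted target
y_pred = np.array([[0.9, 0.5], [0.1, 0.05]])         # network output
print(focal_loss(y_true, y_pred))                    # small positive value
```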

2) Joint Training Optimization:
Since joint training optimization plays a vital role in ship detection performance, we train the CML and the measurements' ship detection network by learning all parameters in the model. The set of all parameters can be expressed as Θ = {W_CML, W_OHgN, W_OTBHN}, where W_OHgN denotes the network parameters of OHgN and W_OTBHN denotes the network parameters of OTBHN. The joint training process obtains the optimal network parameters Θ. During training, the input and output of CS-CenterNet are the scene information and the ships' location information (box_xmin, box_xmax, box_ymin, box_ymax), respectively, i.e., the training samples are represented as {scene, ship location information}. After the training optimization, the optimized CML simulates the compression sampling process of the CS-based ORS imaging system. Moreover, OHgN and OTBHN constitute the ship detection model on CS measurements of ORS scenes.
As shown in Fig. 13, the black arrows denote the joint training process of the scene compression sampling part and the measurements' ship detection part, and the red arrows denote the test process of the measurements' ship detection. First, the trained CML compresses and samples the ORS scene, and then the measurements' ship detection network detects ship information on the CS measurements.

A. Dataset
We evaluate our model on two public ORS scene datasets: the HRSC2016 dataset [20] and the LEVIR dataset [21]. The HRSC2016 dataset contains 1680 images. In the experiment, the numbers of samples in the training, validation, and test sets are divided as shown in Table II: 1176, 168, and 336 images, respectively. Some ORS scenes from this dataset are shown in Fig. 14(a). The LEVIR dataset contains 1482 images with ship objects. Its training, validation, and test sets, also shown in Table II, contain 1037, 149, and 296 images, respectively. Some ORS scenes from this dataset are shown in Fig. 14(b).

B. Implementation Details and Parameters
Training and testing CS-CenterNet require relatively powerful hardware, so we use the experimental environment listed in Table III to train the model. Since the image sizes in the experiment differ, all images are uniformly resized to 512 × 512 before being input to the model. Table IV shows the parameter settings used during CS-CenterNet training. In particular, considering the 8-GB limit of GPU memory, we set the batch size to 2. The initial learning rate of 0.001 is halved every 10 epochs.

C. Evaluation Metrics
As in the target detection models based on Intersection over Union (IoU) [35], [36], [37], precision and recall are used as the evaluation criteria for target detection; their calculation methods are shown in (20) and (21).
However, because precision and recall are numerically contradictory, we also adopt the F1 score and AP as evaluation indicators. F1 is a comprehensive indicator that balances precision and recall. AP reflects the overall quality of the network; it is defined as the average precision over a set of equidistant recall rates S = {0, 0.01, ..., 1}. In this article, we calculate AP at an IoU threshold of 0.5. The calculation methods of F1 and AP are as follows:
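Since equations (20)-(23) are not reproduced in this excerpt, the standard definitions behind these indicators can be sketched in Python as follows. The AP interpolation over S = {0, 0.01, ..., 1} uses the common max-precision-at-recall convention, which is an assumption about the paper's exact formula:

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

def average_precision(pr_points):
    """AP as the mean of the maximum precision achievable at each
    recall threshold in S = {0, 0.01, ..., 1}.
    `pr_points` is a list of (recall, precision) pairs from the PR curve."""
    s = [i / 100 for i in range(101)]
    ap = 0.0
    for r_thr in s:
        candidates = [p for r, p in pr_points if r >= r_thr]
        ap += max(candidates) if candidates else 0.0
    return ap / len(s)
```

As a sanity check, `f1_score(0.8435, 0.9219)` evaluates to approximately 0.8810, consistent with the HRSC2016 results reported below.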

D. Comparison With Ship Detection Based on ORS Images
To test the effect of CS-CenterNet on the ship detection of CS measurements, we need to compare it with a ship detection model based on ORS images. Since CS-CenterNet is derived from CenterNet [19], whose backbone network is ResNet50, we compare the ship detection performance of CS-CenterNet with that of CenterNet. In particular, CS-CenterNet is a ship detection model based on CS measurements of ORS scenes, while CenterNet is a ship detection model based on ORS images.
Tables V and VI show the ship detection results on the HRSC2016 and LEVIR datasets, respectively. Figs. 15 and 16 show the detection results of the models on some images from the HRSC2016 and LEVIR test sets, respectively. The blue box denotes the ground truth, the green box denotes TP, the red box denotes FP, and the pink box denotes FN. According to Table V, CS-CenterNet at MRs = 25% scores 84.35% in detection precision and 92.19% in recall, yielding an F1 of 0.8810. In terms of AP, CS-CenterNet scores 90.76%. The P, R, F1, and AP of CS-CenterNet at MRs = 25% are all higher than those of CenterNet. From Fig. 15, we can see that CS-CenterNet also has better ship detection performance in terms of visual quality.
According to Table VI, CS-CenterNet at MRs = 25% scores 70.60% in detection precision and 78.20% in recall, yielding an F1 of 0.7421. In terms of AP, CS-CenterNet scores 75.44%. The R, F1, and AP of CS-CenterNet at MRs = 25% are higher than those of CenterNet, and its P is basically the same as that of CenterNet. From Fig. 16, we can see that CS-CenterNet has better ship detection performance in terms of visual quality.
It is worth noting that the quantitative indicators of the ship detection results on LEVIR are lower than those on HRSC2016 because the ship targets in LEVIR are smaller and denser.
Although the data volume of the CS measurements is only 25% of that of the original scenes, the detection P, R, F1, and AP of CS-CenterNet do not decrease but slightly increase compared with CenterNet. This is because the backbone network HgN can fully extract the feature information in the CS measurements, and the SENet added to it improves the accuracy of ship detection. In addition, FRNet in OTBHN can refine ship features, which further improves the accuracy of ship detection. From Figs. 15 and 16, we can see that CS-CenterNet has better ship detection performance in terms of both quantitative indicators and visual quality.
In short, in the proposed pipeline for completing the ship detection task in the CS-based ORS imaging system, CS-CenterNet can directly detect ships on CS measurements while ensuring the quality of the detection.
In addition, we also test the parameter sizes of CS-CenterNet and CenterNet, which are shown in Table VII.
The parameter quantity of CS-CenterNet is much higher than that of CenterNet because CS-CenterNet detects on CS measurements while CenterNet detects on scenes: the feature extraction network OHgN and feature prediction network OTBHN of the former are more complex than the ResNet50 backbone and feature prediction network TBHN of the latter. In addition, the parameter quantity of CS-CenterNet at MRs = 25% is lower than that of CS-CenterNet at MRs = 10%, which is caused by the different data dimensions of the CS measurements: the input of CS-CenterNet at MRs = 25% is measurements of dimension 256 × 256 × 3, whereas the input at MRs = 10% is measurements of dimension 256 × 256 × 1.
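One way to see where these measurement dimensions come from is to count the measurements kept per block. The sketch below assumes the channel count of the measurements is round(MRs · B² · C) for an H × W × C scene sampled in B × B blocks; this derivation is an assumption made only to reproduce the dimensions quoted above, not a formula stated in the paper:

```python
def measurement_shape(h, w, c, block, mr):
    """Shape of CS measurements for an h x w x c scene sampled in
    block x block patches at measurement rate mr. Channel count is the
    (rounded) number of measurements kept per block: mr * block**2 * c."""
    n_meas = max(1, round(mr * block * block * c))
    return (h // block, w // block, n_meas)

print(measurement_shape(512, 512, 3, 2, 0.25))  # (256, 256, 3)
print(measurement_shape(512, 512, 3, 2, 0.10))  # (256, 256, 1)
```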

E. Ablation Studies
To verify the effects of the HgN, SENet, and FRNet modules in CS-CenterNet, we conduct ablation studies on these three modules. The MRs of the experiments in this subsection is set to 25%.

1) Effect of HgN: To evaluate the performance of HgN, we conduct ablation experiments, with the results shown in the first and second rows of Table VIII and in Fig. 17. Comparing "ResNet50 [38]+TBHN" and "HgN [18]+TBHN" in the first and second rows of Table VIII isolates the contribution of HgN: P increases by 4.47%, R by 9.66%, F1 by 0.0671, and AP by 11.16%. From Fig. 17, we can see that using HgN as the backbone also yields better ship detection performance in terms of visual quality; in particular, the correct ship positions can be detected effectively.
Therefore, using HgN as the backbone can achieve better detection accuracy. This is because HgN can capture global and local features from CS measurements.
2) Effect of SENet: To evaluate the performance of SENet, we conduct ablation experiments, with the results shown in the second and third rows of Table VIII and in Fig. 18. "HgN+TBHN" and "OHgN+TBHN" in the second and third rows of Table VIII differ only in the backbone network, isolating the contribution of SENet: the comprehensive indicators F1 and AP are both improved. From Fig. 18, we can see that using OHgN as the backbone also yields better ship detection performance in terms of visual quality; in particular, false detections of ships are effectively reduced.
Therefore, using OHgN as the backbone can achieve better detection accuracy. This is because SENet in OHgN can focus on the salient areas that contain ships in the compressive measurements.

3) Effect of FRNet: To evaluate the performance of FRNet, we conduct ablation experiments, with the results shown in the third and fourth rows of Table VIII and in Fig. 19. Comparing "OHgN+TBHN" and "OHgN+OTBHN" in the third and fourth rows of Table VIII isolates the contribution of FRNet: the P and F1 of the network are greatly improved. From Fig. 19, we can see that using OTBHN as the prediction network also yields better ship detection performance in terms of visual quality; in particular, more small ships can be detected effectively.
Therefore, using OTBHN as the prediction network can achieve better detection accuracy. This is because FRNet can refine the ship features.

F. Discussion

1) Ship Detection Performance of CS-CenterNet Under Different MRs: The results are shown in Table IX. From Table IX, we can find that the ship detection performance of CS-CenterNet at MRs = 10% is worse than that at MRs = 25%. This is because as the amount of acquired scene data decreases, the ship features in the CS measurements decrease, which leads to a decrease in ship detection performance.
2) Ship Detection Performance of CS-CenterNet Under Different B × B: In CS-CenterNet, we adopt CML to perform block compression sampling on the ORS scene. As explained in Section III-B, the resolution W/B × H/B of the CS measurements is determined by the block size B × B used in the block compression sampling. Here, we test the ship detection performance of CS-CenterNet under different B × B. As seen from Table X, a block size of 2 × 2 obtains the best ship detection results. This is because the resolution of the CS measurements obtained with a block size of 2 × 2 is W/2 × H/2, which is higher than the resolution W/4 × H/4 obtained with a block size of 4 × 4. High-resolution CS measurements are more conducive to the backbone network's extraction of ship feature information, improving the accuracy of ship detection.
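The block compression sampling performed by CML can be sketched with NumPy: a learned stride-B convolution with B × B kernels, no bias, and no activation is equivalent to projecting each flattened B × B block with a measurement matrix Φ. The single-channel scene and random Φ below are illustrative assumptions, not the trained CML weights:

```python
import numpy as np

def block_compress(scene, phi, b):
    """Block compression sampling: split the scene into b x b blocks and
    project each flattened block with measurement matrix phi (n_meas x b*b).
    Mirrors a stride-b convolution with b x b kernels, no bias, no activation."""
    h, w = scene.shape
    blocks = (scene.reshape(h // b, b, w // b, b)
                   .transpose(0, 2, 1, 3)       # gather each b x b block
                   .reshape(h // b, w // b, b * b))
    return blocks @ phi.T                       # (h/b, w/b, n_meas)

rng = np.random.default_rng(0)
scene = rng.random((512, 512))            # single-channel scene for simplicity
phi = rng.standard_normal((1, 4))         # 1 measurement per 2 x 2 block
measurements = block_compress(scene, phi, 2)
print(measurements.shape)                 # (256, 256, 1)
```

With B = 2 the measurements keep a 256 × 256 spatial grid, while B = 4 would reduce it to 128 × 128, which illustrates why the smaller block size preserves more spatial detail for the backbone.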
3) Failure Cases of Our CS-CenterNet: Fig. 20(a)-(d) shows some failure cases of our CS-CenterNet at MRs = 25%. CS-CenterNet is not sensitive to small ships: failure cases (a) and (b) show that CS-CenterNet fails to predict boxes for some small ships. Moreover, CS-CenterNet is sensitive to ship-like objects: failure cases (c) and (d) show that CS-CenterNet incorrectly detects long objects and small bases on the shore as ships.
In future work, it will be important to set appropriate hyperparameters for the ship object and to improve the discriminative ability of the model.

4) Limitations: To simulate the compression sampling process of the CS-based ORS imaging system, we use CML to compress the scene and obtain CS measurements. However, this acquisition method is idealized. In future work, we will obtain CS measurements on the physical platform of a CS-based ORS imaging system.

V. CONCLUSION
This article proposes an efficient model, CS-CenterNet, for ship detection on CS measurements of ORS scenes. Specifically, our model uses CML to perform convolutional coding on the scene to obtain the CS measurements, simulating the block compression sampling process of the CS-based ORS imaging system. An OHgN is designed to effectively extract high-resolution feature information from the measurements, and an OTBHN is designed to refine the ship features and perform feature prediction with high accuracy. Experiments on the HRSC2016 dataset show that for ship detection on measurements of ORS scenes, our model achieves a precision of 84.35%, a recall of 92.19%, an F1 of 0.8810, and an AP of 90.76%. Therefore, it can achieve high-accuracy ship detection on CS measurements of ORS scenes. In the future, we will try to perform measurement-domain ship detection experiments on the physical platform of a CS-based ORS imaging system (such as a CS-based ORS camera). Moreover, we will continue to study the basic theories of deep learning (DL) to better design the network structure and improve the detection accuracy of small ships.