Feature-Compression-Based Detection of Sea-Surface Small Targets

This paper develops a feature-based detector that uses seven existing salient features of radar returns to improve the ability of high-resolution maritime ubiquitous radars to detect sea-surface small targets. Maritime ubiquitous radars form simultaneously dwelling beams at multiple azimuths with a digital array receiver and thus allow a long observation time for detection. Because training samples of radar returns with various types of sea-surface small targets are absent or incomplete, the detection boils down to designing a one-class classifier in the seven-dimensional (7D) feature space, mainly using training samples of sea clutter. A feature compression method that maximizes the interclass Bhattacharyya distance is proposed to compress the 7D feature vector into a 3D feature vector with the help of simulated radar returns of typical targets. In the compressed 3D feature space, a modified convexhull learning algorithm determines a convex polyhedron decision region of sea clutter at a given false alarm rate. In this way, a feature-compression-based detector is constructed that can exploit more features of radar returns to improve detection performance. The detector is verified on the open and widely recognized IPIX and CSIR radar databases for sea-surface small target detection, and the results show an obvious performance improvement.

INDEX TERMS High-resolution maritime ubiquitous radars, sea-surface small targets, feature compression, convexhull learning, one-class classifier, feature-compression-based detector.

The associate editor coordinating the review of this manuscript and approving it for publication was Chengpeng Hao.

I. INTRODUCTION
As one of the important tasks of sea battlefield perception, finding sea-surface small targets, such as small boats, icebergs, frogmen, debris, and submarine periscopes, has always been a difficult problem for maritime surveillance radars. Owing to the low power level of sea clutter, high-resolution radars are used to improve the signal-to-clutter ratios (SCRs) of small target returns. High-resolution maritime radars often adopt a dwelling mode or a fast scan mode to obtain integration gain of target returns in radar slow time [1]-[4]. In fast scan mode, interscan noncoherent or binary integration is used for detection, as in anti-submarine radars [1], [2]. Experimental radars for sea-surface small target detection often work in dwelling mode, i.e., the radar beam stares at one azimuth angle to collect data [3], [4]. In conventional surveillance radar systems, a long observation time for small target detection conflicts with search efficiency in azimuth, which is one reason why long-time integration methods are limited in practical applications. Recently, this conflict has been completely removed in ubiquitous radars [5], [6]. Ubiquitous radars using a MIMO digital array can implement all-time observation in all directions by digital beam forming (DBF) of multiple channels and thus allow a long observation time for sea-surface small target detection. Therefore, it becomes imperative to develop an effective method to detect sea-surface small targets with long observation times.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
In high-resolution ubiquitous radars, sea-surface small target detection faces three difficulties. First, high spatial and Doppler resolution can lift the SCR of small targets above the critical value for detectability, but some small targets must still be detected at low SCR. Second, high resolution complicates sea clutter characteristics and aggravates the effect of sea spikes [7], [8]. Third, interactions between sea-surface small targets and waves give the target returns complex amplitude and Doppler modulations, which are difficult to parameterize with simple models. A series of attempts have been made, but none attains satisfactory results in all cases, even on a specific database such as the IPIX database [3] or the CSIR database [4].
Traditional moving target indication/detection (MTI/MTD) techniques [7]-[9] fail to detect sea-surface small targets, because the weak target returns fall inside the main clutter region of sea clutter in the Doppler domain and have complex amplitude and Doppler modulations. Nonlinear time series analysis, once a promising approach, is the common foundation of the fractal-based methods [10]-[14] and the neural network (NN) learning methods [15]-[17]. Fractal features do not replace other statistics; they serve only as additional indexes of sea clutter characteristics. The fractal-based methods require observation times of up to several seconds and have low detectability in some cases, a defect common to most detection methods that use a single feature. The NN learning methods originate from the understanding that a sea clutter time series is the output of an unknown nonlinear chaotic system [10]. Several structural NNs [15]-[17] are trained on sea clutter data to predict the short-term behavior of sea clutter, and the prediction error is naturally used as a statistic to find anomalies in data. The NN learning methods have two flaws. First, radar returns with targets do not participate in the learning process at all, so a detection problem that calls for a two-class classifier is forcibly reduced to a one-class classifier [18]. Second, the NN-based detectors can be interpreted as one-class classifiers that use the full set of implicit features of sea clutter. To minimize the prediction error, the NN must capture as many implicit features as possible. Most of them serve to describe sea clutter rather than to discriminate target returns from sea clutter, so they degrade the generalization ability of the NN learning methods.
Due to the diversity of sea-surface small targets and their complicated interactions with waves, it is impossible to collect radar returns of all sea-surface targets of interest in all cases. We refer to this phenomenon as the incompleteness of the training data of returns with targets. However, it is important for radar returns with targets to participate in the learning process. For instance, for a specific kind of small target such as growlers [19], [20], two-class classifiers trained with radar returns with targets attain satisfactory detection results. Besides, radar returns of simple targets can be simulated to train a detector, and an experimental radar system using an NN as a tool has been established [21]. In designing a detector, training samples of radar returns with targets are as important as those of sea clutter. One-class classifiers under the anomaly detection framework are unavoidable but are not the preferred choice in sea-surface small target detection.
Through qualitative analyses of sea clutter and radar returns with targets and the introduction of three salient features of radar returns, sea-surface small target detection was reduced to designing a one-class classifier in a 3D feature space using only training samples of sea clutter [22], which is realized by means of a convexhull learning algorithm [23], [24]. This tri-feature-based detector behaves well on the IPIX database [3]. Note that the characteristics of returns with targets take effect at the feature selection stage rather than the learning stage. In comparison with the ν-SVM algorithm for the design of one-class classifiers [18], the convexhull learning algorithm can precisely control the false alarm rate and gives visualized decision regions in the 3D feature space. Later, three time-frequency (TF) features were introduced to build a TF-tri-feature-based detector [25], which attains better overall performance on the IPIX database than the tri-feature-based detector [22]. However, the tri-feature-based detector behaves better than the TF-tri-feature-based detector on some datasets, which shows that the three TF features [25] do not replace the amplitude and Doppler features in [22]. Exploiting more complementary features is therefore an approach to further improve performance. However, the dimension limitation of the convexhull computation [23], [24] impedes the joint use of more than three features.
In this paper, seven existing salient features are used to construct a feature-compression-based detector that realizes effective and robust detection of sea-surface small targets. Due to the dimension limitation of the convexhull learning algorithm, the 7D feature vector must be mapped into a feature space whose dimension is no more than three. Feature compression requires quantitative statistics of the feature vectors of both sea clutter and radar returns with targets [26], [27]. The incompleteness of the training data on radar returns with targets is a major obstacle to feature compression. The detection problem is a semisupervised two-class classification problem with unbalanced requirements on the error probabilities and unbalanced training data for the two classes: a large amount of sea clutter data and only a small quantity of radar returns with specific test targets are available for learning. Referring to the characteristics of sea-surface small targets in measured data, a generator is constructed to yield radar returns of typical targets. Simulated target returns are added to sea clutter to yield radar returns with targets, which serve as training samples for feature compression. Using the training samples of the two classes, a feature compression method is proposed that compresses the 7D feature vector into a 3D feature vector by maximizing the interclass Bhattacharyya distance [28], [29]. In the compressed 3D feature space, a modified convexhull learning algorithm guided by the training samples of radar returns with targets is presented to determine the decision region of sea clutter at a given false alarm rate. In this way, a feature-compression-based detector is constructed. The proposed detector is verified on data from the IPIX and CSIR databases and is compared with the early fractal-based detector, our previous detectors, and two more recent detectors [30], [31].
This paper is organized as follows. Section II reviews the detection problem of sea-surface small targets, introduces seven existing salient features, and analyzes their complementarity. Section III presents a generator of target returns, proposes a feature compression method, and constructs a feature-compression-based detector. Section IV gives experimental results and a performance comparison. Finally, Section V concludes the paper.

A. DETECTION PROBLEM OF SEA-SURFACE SMALL TARGETS IN HIGH-RESOLUTION UBIQUITOUS RADAR
A radar attains high range resolution by transmitting wideband pulses and high Doppler resolution by dwelling for a long time at a beam position. When the dwell time is as long as a few tenths of a second to several seconds, radar returns of a sea-surface small target have to be modeled as a nonlinear frequency modulated (FM) signal with complex amplitude fluctuation, because the radial velocity and RCS of the target are severely affected by wind waves and swells [25], [32]. Due to the long observation time and high range resolution, the temporally nonstationary sea clutter time series is characterized by the compound-Gaussian model (CGM) with a time-varying texture or by a piecewise spherically invariant random vector model [7], [32], [33]. Sea clutter time series at adjacent spatial resolution cells can be regarded as having the same or similar characteristics. As in adaptive detection [9], radar returns received at spatial resolution cells around the cell under test (CUT) are used to predict the characteristics of sea clutter at the CUT. Detection of sea-surface small targets boils down to a binary hypothesis test [9]-[14], [22], [32], in which z(n) is the received complex time series at the CUT, z_p(n) is the sea clutter time series at the reference cells around the CUT, H_0 is the null hypothesis, H_1 is the alternative hypothesis, and the target returns are specified by an amplitude series a(n), a Doppler frequency series φ(n), and an initial phase ϕ_0. For simplicity, the pulse repetition interval (PRI), i.e., the sampling interval of the time series, is omitted. Detection strategies depend on the models of the target returns and sea clutter.
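For reference, the binary hypothesis test described above can be written in the following form. This is a reconstruction from the symbol definitions in the text rather than a verbatim formula: the clutter symbols c(n) and c_p(n), the target symbol s(n), the index ranges N and P, and the cumulative-sum phase convention are all assumptions.

```latex
\begin{equation*}
\begin{cases}
H_0:\; z(n) = c(n), & z_p(n) = c_p(n),\\
H_1:\; z(n) = s(n) + c(n), & z_p(n) = c_p(n),
\end{cases}
\qquad n = 1,\dots,N,\quad p = 1,\dots,P,
\end{equation*}
\begin{equation*}
s(n) = a(n)\exp\!\Big( j 2\pi \sum_{k=1}^{n} \varphi(k) + j\phi_0 \Big).
\end{equation*}
```

Here the PRI is absorbed into the Doppler frequency series, consistent with the statement that the sampling interval is omitted.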
In the CGM, a high-resolution sea clutter time series is expressed as the product of two independent stochastic sequences: a slowly varying non-negative texture sequence τ(n) and a fast-varying speckle sequence u(n). τ(n) degenerates to a random constant when the observation time is shorter than the texture coherent length (TCL) of sea clutter. The TCL is about several hundred milliseconds in X-band high-resolution radars and relates to the time scale of tilt modulation from long waves [33]-[35]. u(n) follows a complex Gaussian distribution of unit variance and has a decorrelation time of tens of milliseconds [7]. Hence, when the observation time is on the order of seconds, τ(n) must be modeled as a time-varying function [32]. Radar returns of sea-surface small targets have more complex characteristics. They have distinct amplitude fluctuation and Doppler modulation that depend on target type, movement state, and sea state, and they have to be modeled in nonparametric forms, such as smooth functions of time. Under nonparametric models, detection often relies on ad hoc test statistics, referred to as features, instead of the optimum or suboptimal test statistics that depend on rigorous parametric models of the target returns and clutter [36], [37].
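The product structure of the CGM can be illustrated with a minimal simulation sketch. The gamma-distributed texture, the sinusoidal texture drift, and the sample-to-sample independence of the speckle are illustrative assumptions only; the function name `cgm_clutter` is hypothetical.

```python
import math
import random

def cgm_clutter(texture, rng):
    """Compound-Gaussian clutter: z(n) = sqrt(tau(n)) * u(n).

    texture: sequence of non-negative texture values tau(n).
    u(n) is unit-variance circular complex Gaussian speckle, drawn
    independently for each n here (an assumption: real speckle is
    correlated over tens of milliseconds).
    """
    sigma = math.sqrt(0.5)  # per-component std so that E|u|^2 = 1
    out = []
    for tau in texture:
        u = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        out.append(math.sqrt(tau) * u)
    return out

rng = random.Random(0)
# Observation shorter than the TCL: the texture is a single random constant.
tau0 = rng.gammavariate(2.0, 0.5)  # hypothetical gamma texture with mean 1
short_dwell = cgm_clutter([tau0] * 512, rng)
# Observation on the order of seconds: slowly time-varying texture
# (hypothetical sinusoidal drift standing in for long-wave modulation).
N = 4096
slow_tau = [1.0 + 0.8 * math.sin(2.0 * math.pi * n / N) for n in range(N)]
long_dwell = cgm_clutter(slow_tau, rng)
```

The two calls mirror the two regimes in the text: a constant texture for short dwells and a time-varying texture for long dwells.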

B. SEVEN SALIENT FEATURES IN SEA-SURFACE SMALL TARGET DETECTION
This paper focuses on the exploitation of existing salient features rather than on feature extraction. This subsection gives a brief review of seven existing salient features in sea-surface small target detection; for more details, see [12], [14], [22], [25]. The two amplitude features are the normalized Hurst exponent (NHE) [12], [14] and the relative average amplitude (RAA) [22]. The two Doppler features are the relative Doppler peak height (RDPH) and the relative vector entropy (RVE) [22]. The three TF features are extracted from the normalized TF distribution (NTFD) and the thresholded NTFD [25]: the ridge integration (RI) of the NTFD, and the number of connected regions (NR) and the maximal size of connected regions (MS) in the thresholded NTFD. The prefixes 'relative' and 'normalized' in the names of these features reflect the idea of constant false alarm rate (CFAR). The features are computed from the time series at the CUT and the reference cells so as to adapt to the spatially and temporally varying characteristics of sea clutter.
In what follows, we briefly introduce the mechanism behind each feature. Amplitude time series of sea clutter exhibit multifractal characteristics on time scales from 0.01 s to several seconds [12], [13]. The Hurst exponent, which is related to the fractal dimension of the sea surface, is a salient feature for distinguishing radar returns with targets from sea clutter. The Hurst exponent depends on the sea state and the viewing geometry of the radar and therefore varies in space and time. The NHE feature is the Hurst exponent at the CUT normalized by the average value and standard deviation of the Hurst exponents at the reference cells, so as to adapt to the spatially and temporally varying characteristics of the Hurst exponent [14]. The RAA is a commonly used test statistic in non-coherent CFAR detection. It is the average amplitude at the CUT divided by the average of the average amplitudes at the reference cells [22]. The NHE and RAA features are dimensionless and take large positive values for returns with targets and small positive values for sea clutter.
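The two amplitude features can be sketched as follows. The RAA follows the ratio described above. For the NHE, the z-score form and the use of the sample standard deviation are assumptions (the exact normalization is defined in [14]), and the Hurst exponents are taken as given inputs because their estimation is outside the scope of this sketch.

```python
import statistics

def relative_average_amplitude(cut, refs):
    """RAA: mean amplitude at the CUT over the mean of the
    reference-cell mean amplitudes."""
    cut_mean = statistics.fmean(abs(v) for v in cut)
    ref_means = [statistics.fmean(abs(v) for v in r) for r in refs]
    return cut_mean / statistics.fmean(ref_means)

def normalized_hurst_exponent(h_cut, h_refs):
    """NHE: Hurst exponent at the CUT centered and scaled by the
    reference-cell statistics (z-score form, an assumption)."""
    mu = statistics.fmean(h_refs)
    sd = statistics.stdev(h_refs)  # sample standard deviation (assumed)
    return (h_cut - mu) / sd

# Example: the CUT is twice as strong as the two reference cells.
raa = relative_average_amplitude([2 + 0j] * 4, [[1 + 0j] * 4, [1j] * 4])
nhe = normalized_hurst_exponent(0.9, [0.5, 0.6, 0.7])
```

Both features are ratios or standardized differences against the reference cells, which is what makes them insensitive to the absolute clutter level.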
Sea clutter and radar returns with targets differ in their Doppler amplitude spectra (DAS) [22]. The DAS of sea clutter has a blunt peak because the power of sea clutter is distributed over the wide main clutter region in the Doppler domain, whereas the DAS of radar returns with targets has a sharp peak because the power of target returns concentrates in several Doppler bins. The difference is measured by the Doppler peak height (DPH): the ratio of the peak to the average amplitude on its two sides. Because the DPH of sea clutter changes with the sea state and the viewing geometry of the radar, the DPH at the CUT is divided by the average of the DPHs at the reference cells to give the relative DPH (RDPH) [22]. It takes large positive values for radar returns with targets and positive values around one for sea clutter. The vector entropy (VE) of the DAS measures its complexity, and the relative vector entropy (RVE) [22] is the VE at the CUT divided by the average of the VEs at the reference cells. The RDPH reflects a local difference between radar returns with targets and sea clutter in the DAS, and the RVE reflects a global difference in the DAS.
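A minimal sketch of the two Doppler features is given below. The number of side bins used for the DPH (`half_width`), the circular treatment of the spectrum edges, and the unit-sum normalization of the DAS before taking the Shannon entropy are assumptions; the precise definitions are in [22]. A naive DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math
import random

def doppler_amplitude_spectrum(x):
    """Amplitude of the DFT of x (naive O(N^2) DFT, fine for a short sketch)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def doppler_peak_height(das, half_width=4):
    """DPH: spectral peak over the mean amplitude of its neighbours on both
    sides. half_width (bins per side, wrapped circularly) is hypothetical."""
    n = len(das)
    k0 = max(range(n), key=das.__getitem__)
    side = [das[(k0 + d) % n] for d in range(-half_width, half_width + 1) if d != 0]
    return das[k0] / (sum(side) / len(side))

def vector_entropy(das):
    """Shannon entropy of the DAS normalized to unit sum (assumed definition)."""
    total = sum(das)
    return -sum((a / total) * math.log(a / total) for a in das if a > 0)

def rdph_rve(cut, refs):
    """RDPH and RVE: CUT values divided by the reference-cell averages."""
    das_cut = doppler_amplitude_spectrum(cut)
    das_refs = [doppler_amplitude_spectrum(r) for r in refs]
    mean_dph = sum(map(doppler_peak_height, das_refs)) / len(das_refs)
    mean_ve = sum(map(vector_entropy, das_refs)) / len(das_refs)
    return doppler_peak_height(das_cut) / mean_dph, vector_entropy(das_cut) / mean_ve

# Example: white complex noise in the reference cells, noise plus a tone at the CUT.
rng = random.Random(0)
N = 64
def noise():
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
refs = [noise() for _ in range(6)]
cut = [v + 5 * cmath.exp(2j * math.pi * 10 * t / N) for t, v in enumerate(noise())]
rdph, rve = rdph_rve(cut, refs)
```

In this toy case the tone sharpens the spectral peak (RDPH well above one) and lowers the spectral complexity (RVE below one), matching the qualitative behavior described above.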
As nonstationary time series, sea clutter and target returns exhibit salient differences in the time-frequency distribution (TFD), which transforms a 1D complex time series into a 2D image. The TFD of sea clutter is modeled as a stochastic process on the 2D TF plane with a mean function and a standard deviation function, which are estimated from the TFDs at the reference cells [25]. The normalized TFD (NTFD) at the CUT is the TFD at the CUT normalized by the estimated mean and standard deviation functions. The NTFD reflects the differences in TF characteristics between the radar returns at the CUT and sea clutter. The three TF features are extracted from the NTFD. The ridge integration (RI) is the sum of the grayscale values of the pixels along the ridge of the NTFD. Significant pixels in each time slice of the NTFD form multiple connected regions in the TF grid. The number of connected regions (NR) and their maximal size (MS) are the other two important TF features. For radar returns with targets, the RI and MS take large values while the NR takes small values, because the power of target returns concentrates on a small number of TF pixels along the instantaneous Doppler curve of the target. The situation is the opposite for sea clutter. The first four features can be computed from the received time series at the CUT and reference cells at low computational cost by the fast Fourier transform (FFT). The computation of the TF features involves the TFD of the received time series, and the computational complexity of each TFD is O(NK log N), where N is the length of the time series and K is the number of frequency samples.
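The NR and MS features amount to connected-component labeling on a binary TF image. The sketch below assumes a single global threshold and 4-connectivity, both of which are illustrative choices; the selection of significant pixels per time slice in [25] may differ.

```python
def connected_regions(mask):
    """Sizes of the 4-connected regions of True pixels in a 2D boolean grid."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:  # iterative flood fill from the seed pixel
                i, j = stack.pop()
                size += 1
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if 0 <= a < rows and 0 <= b < cols and mask[a][b] and not seen[a][b]:
                        seen[a][b] = True
                        stack.append((a, b))
            sizes.append(size)
    return sizes

def nr_ms(ntfd, threshold):
    """NR and MS of the thresholded NTFD (global threshold is an assumption)."""
    mask = [[v > threshold for v in row] for row in ntfd]
    sizes = connected_regions(mask)
    return len(sizes), max(sizes, default=0)

# Toy NTFD: one 3-pixel region and one isolated pixel.
toy = [[0, 5, 5, 0],
       [0, 0, 5, 0],
       [9, 0, 0, 0]]
nr, ms = nr_ms(toy, threshold=1)
```

A target whose energy follows a single instantaneous Doppler curve would yield few, large regions, whereas clutter tends to produce many small ones.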

C. DEFICIENCIES AND COMPLEMENTARITY
As in [25], twenty datasets from the IPIX database [3] are used to test the detection ability of each feature and to analyze the main factors that affect it. As shown in Fig. 6 in [25], the average signal-to-clutter ratios (ASCRs) and the Doppler offsets and bandwidths of sea clutter vary over wide ranges. Besides ASCR, the extent to which sea clutter and target returns overlap in the Doppler domain or TF domain is another factor. The test targets, an anchored spherical block of Styrofoam wrapped with wire mesh in the first ten datasets and small boats at anchor in the last ten datasets, have Doppler offsets that vary pseudo-periodically around zero owing to the joint actions of the anchor and waves. When sea clutter has a large Doppler offset from zero and a small Doppler bandwidth, target returns and sea clutter overlap little in the Doppler domain and on the TF plane [25]. The overlapping extent can therefore be measured by the ratio of the Doppler offset to the Doppler bandwidth of sea clutter, which is referred to as the overlapping index.
Each of the seven salient features can be used as a test statistic to yield a single-feature-based detector like the fractal-based detector. At the same observation time and false alarm rate, the detection ability of a feature on a dataset is evaluated by the detection probability for the test target. In the experiments, the observation time is 0.512 s, the false alarm rate is 10^-3, and the number of reference cells is P = 24. The detection probabilities of the individual features on the twenty datasets are illustrated in Fig. 1. To examine the dependence of the detection ability on ASCR, the eighty datasets at the four polarizations are arranged in ascending order of ASCR in Fig. 1(a). From Fig. 1(b-c), the detection abilities of the two amplitude features and the RVE show an obvious ascending trend with ASCR, so their abilities depend highly on ASCR. From Fig. 1(c-d), the detection abilities of the RDPH and the three TF features fluctuate sharply with ASCR, and large detection probabilities are still achieved on some datasets of low ASCR, showing that ASCR is not the major factor affecting their detection abilities. In fact, SCR is important for all seven features. The ability of the RDPH mainly depends on the local SCR in the Doppler interval that the target returns occupy in the Doppler domain. The sharp fluctuation in Fig. 1(c) arises because a large ASCR does not always correspond to a large local SCR. Similarly, the abilities of the three TF features mainly depend on the local SCR in the TF region that the target returns occupy on the TF plane. Table 1 lists the average, minimal, and maximal detection probabilities of the seven salient features on the twenty datasets for quantitative analysis. The average reflects overall ability, and the order of the seven features from strong to weak is NR, MS, RI, RDPH, RVE, NHE, and RAA. The two amplitude features behave worst, which is easy to explain.
Sharp amplitude fluctuations of both target returns and sea clutter lower the ability of amplitude characteristics to discriminate between them. In other words, non-coherent detection methods are not very effective for sea-surface small targets because high-resolution sea clutter is spiky and target returns fluctuate sharply in amplitude. The fact that the average detection probabilities of all seven features are under 70% means that no single feature can realize effective detection of sea-surface small targets. For all features except the NHE, the maximal detection probabilities are over 95% and the minimal ones are under 15%, which shows that a single feature fails to provide stable detection results. For the seven salient features, the best and worst results often occur on different datasets, which shows their complementarity in detection ability. Finally, we analyze the correlation of the detection probability with ASCR and with the overlapping index for each feature. The correlation coefficients of the seven features are listed in the last two columns of Table 1. All seven features correlate positively with ASCR, so an increase in ASCR lifts the detection ability of every feature. This is precisely why spatially high-resolution radars are used in sea-surface small target detection: they increase ASCR. From the correlation coefficients, we draw the following conclusions. The ASCR is the major factor affecting the detection abilities of the NHE, RAA, and RVE, and the overlapping index hardly affects them. The ASCR is still the main factor affecting the detection abilities of the RDPH and RI, but the overlapping index also has a considerable effect. For the NR and MS, the correlation between the detection probability and the overlapping index is higher than that between the detection probability and ASCR.
Thus, the overlapping index is the major factor affecting the detection abilities of the NR and MS, although their abilities are also affected by ASCR. In conclusion, the seven existing features are complementary and their detection abilities depend on several factors. Therefore, to realize stable and effective detection of sea-surface small targets, all seven features need to be jointly exploited.
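The single-feature evaluation used in this subsection can be sketched as follows: a threshold is set from clutter-only feature values so that the empirical false alarm rate does not exceed the target, and the detection probability is the fraction of target-present feature values above it. The function names and the order-statistic rule are illustrative assumptions, not the procedure of the cited works.

```python
def threshold_at_pfa(clutter_values, pfa):
    """Order-statistic threshold such that at most a fraction pfa of the
    clutter samples exceed it."""
    ordered = sorted(clutter_values)
    n = len(ordered)
    k = int(pfa * n + 1e-9)  # number of clutter samples allowed above the threshold
    return ordered[n - k - 1] if k < n else float("-inf")

def detection_probability(target_values, threshold):
    """Fraction of target-present feature values that exceed the threshold."""
    return sum(v > threshold for v in target_values) / len(target_values)

# Toy example: 1000 clutter feature values and a false alarm rate of 0.01.
clutter = list(range(1000))
thr = threshold_at_pfa(clutter, 0.01)
pd = detection_probability([1000, 500, 995, 100], thr)
```

For a feature such as the NR, which takes small values for returns with targets, the feature can be negated before applying the same rule.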

III. TRAINING SAMPLES GENERATOR AND FEATURE-COMPRESSION-BASED DETECTOR
Detection of sea-surface small targets encounters two imbalances. One is sufficient and complete sea clutter data versus insufficient and incomplete radar returns with targets. The other is a required false alarm rate below 10^-3 versus an allowable missed-detection probability of a few tenths. Due to the first imbalance, detector design mainly relies on semisupervised learning based on the characteristics of sea clutter under the anomaly detection framework. The problem boils down to the design of one-class classifiers without anomaly instances or with insufficient anomaly instances, as in [15]-[18], [22], [25]. Due to the second imbalance, existing methods for designing two-class classifiers in pattern recognition, which often require balanced error probabilities for the two classes, cannot be directly applied to sea-surface small target detection. The convexhull learning algorithm in a 3D feature space is an effective way to design feature-based detectors with a controllable false alarm rate [22], [25], but its dimension limitation impedes the exploitation of more features. Thus, it is necessary to compress the 7D feature vector into a 3D feature vector. The compression requires training samples or probability models of both classes rather than only those of sea clutter. In this section, a generator of typical radar returns of sea-surface small targets is given. With the generator and measured sea clutter, training samples of the H_1 hypothesis in the 7D feature space are generated. Using the training samples of the two hypotheses, a feature compression method that maximizes the interclass Bhattacharyya distance [28], [29] is proposed to transform the 7D feature vector into a 3D feature vector. A modified convexhull learning algorithm in the compressed 3D feature space is then presented to design a detector.

A. GENERATION OF RADAR RETURNS OF SEA-SURFACE SMALL TARGETS
Because of the complex interactions between sea-surface small targets and waves, radar returns of various types of sea-surface small targets are difficult to characterize with a universal model. Sea-surface small target detection over a long observation time focuses on the amplitude fluctuation and Doppler modulation of radar returns. Amplitude fluctuation originates from the change of the RCS during observation, and Doppler modulation originates from the change of the target's radial velocity. Collecting return data for as many types of sea-surface small targets as possible through field tests is a long-term and time-consuming task. Simulating typical returns of sea-surface small targets is an alternative way to assist the design of a detector.
The RCS of a sea-surface small target fluctuates because its posture is severely affected by waves. The rate of posture change relates to the rate of change of swells on the sea surface, which is much slower than the time scale of the radar's PRI of several milliseconds. The RCS or amplitude of the target therefore varies slowly from pulse to pulse over a wide dynamic range. On this basis, the amplitude series of typical target returns is simulated by a highly correlated positive stochastic sequence. Over an observation time on the order of seconds, the radial velocity of a sea-surface small target changes in a complex way because of the complex interactions between the target and waves. Here the radial velocity is simulated by a simple linear model. In other words, the target is assumed to have constant radial acceleration. Typical target returns are simulated by model (2), in which P_c is the power of sea clutter, Ā is a positive factor that adjusts the SCR, a(n) is a highly correlated amplitude sequence, λ is the radar wavelength, ϑ_0 and ϑ_1 are the initial and final radial velocities, Δt is the PRI of the radar, and ϕ_0 is a random initial phase following the uniform distribution. Below we discuss the parameter selection of model (2). The amplitude sequence a(n) is a unit-power positive stochastic sequence. When simulated target returns are added to sea clutter of power P_c, the SCR of the radar returns is 20 log10(Ā). The parameter Ā is drawn from the uniform distribution on the interval [10^-1, 10^(1/2)], corresponding to an SCR range from −20 dB to 10 dB. In fact, targets can always be detected when the SCR is too high and cannot be detected when the SCR is too low. What matters is simulating target movement and amplitude fluctuation. Sea-surface small targets generally have small velocity and acceleration. The simple model of constant acceleration is adequate for sea-surface small targets when the observation time of a test is within several seconds.
Target velocity is assumed to follow the uniform distribution on [−η, η], target acceleration is restricted to [−ζ, ζ] over the observation time interval of length NΔt, and the angle between the moving direction and the line of sight of the radar follows the uniform distribution on [−π, π]. Under these assumptions, the initial and final radial velocities are generated by (3), where the random numbers x, y, and z are mutually independent. Note that the random numbers x and y are regenerated if the constraint on acceleration is not satisfied.
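The velocity draw described above can be sketched with rejection sampling. The roles assigned to x, y, and z (initial speed, final speed, and heading angle) and the projection by cos(z) are assumptions inferred from the stated distributions, since the generating formula is not reproduced here; the function name is hypothetical.

```python
import math
import random

def draw_radial_velocities(eta, zeta, n_pulses, pri, rng):
    """Draw initial and final radial velocities (m/s) under the stated
    uniform assumptions, redrawing until the acceleration bound holds."""
    max_delta = zeta * n_pulses * pri  # largest speed change allowed by |accel| <= zeta
    while True:
        x = rng.uniform(-eta, eta)     # assumed: initial speed along the heading
        y = rng.uniform(-eta, eta)     # assumed: final speed along the heading
        if abs(y - x) <= max_delta:    # otherwise x and y are generated again
            break
    z = rng.uniform(-math.pi, math.pi)  # angle between heading and radar line of sight
    return x * math.cos(z), y * math.cos(z)

# Example with eta = 5 m/s, zeta = 2 m/s^2, 1024 pulses, and a 1 ms PRI.
rng = random.Random(0)
v0, v1 = draw_radial_velocities(5.0, 2.0, 1024, 1e-3, rng)
```

Because the projection only shrinks magnitudes, the radial velocities and the radial acceleration stay within the same bounds as the along-heading quantities.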
In simulation, η is taken as 5 m/s and ζ as 2 m/s^2, corresponding to the fact that the velocity of small targets is within 10 knots and their maximal acceleration is 2 m/s^2 [7]. a(n) is modeled as a non-negative, highly correlated, unit-power stochastic sequence with adjustable dynamic range and decorrelation time, and it is generated in four steps. First, an independent and identically distributed (IID) sequence u(n) is generated that follows the uniform distribution on [−1, 1] and has zero mean and a variance of 1/3. Second, it is input into a first-order auto-regressive system to generate a highly correlated sequence v(n): v(1) = u(1), v(n + 1) = ρ v(n) + u(n + 1), n = 1, 2, ...; ρ ∈ (0, 1). (4) It can be proved that v(n) ∈ [−1/(1−ρ), 1/(1−ρ)]. Third, the sequence in (4) is shifted to generate a highly correlated non-negative stochastic sequence (5). Sea-surface small targets are occasionally fully shadowed by swells when the radar works at a low grazing angle, and the amplitudes then reduce to zero. Shadowing occurs easily at high sea states. For example, in dataset #17 at the third sea state in the IPIX database [3], the test target is fully shadowed for about 36 seconds out of the observation time of 131 seconds [22].
The sequence (5) is not of unit power. When n is large enough, that is, after the transition effect from the initial value vanishes, v(n) is zero-mean and its power equals 1/(3(1−ρ^2)). The power of the sequence in (5) is this power plus the square of the shift. Fourth, a(n) is obtained by normalizing the sequence in (5) to unit power. A sufficiently large integer M is taken to avoid the transition effect from the initial value in the auto-regressive system. The sequence a(n) is correlated and unit-power, but its mean is close to one only when the one-lag correlation coefficient ρ approaches one. Its coefficient of variation (CV), the ratio of the standard deviation to the mean, approaches zero and little amplitude fluctuation occurs when ρ is close to one. Besides the fluctuation of a(n), ρ also controls the decorrelation time of a(n) through its k-lag correlation coefficient. For a time series, the quantities of interest in applications are the coherent time and the decorrelation time [34]. The change of the time series within the coherent time is ignored, so the series is processed as a random constant there. Two samples separated by more than the decorrelation time are regarded as uncorrelated. Both the coherent time and the decorrelation time are determined by the k-lag correlation coefficient, which decays as the lag k increases. The largest lag at which the correlation coefficient is over 0.5 is referred to as the coherent length, and the smallest lag at which the correlation coefficient is under 0.1 is referred to as the decorrelation length. When ρ = 0.95, the coherent length is 14 and the decorrelation length is 45. When ρ = 0.99, the coherent length is 70 and the decorrelation length is 229. When the PRI of radar is one millisecond and the ρ ∈
The simulation of the target returns aims at generating training samples of the H_1 hypothesis in the 7D feature space. A large amount of sea clutter data can be collected within several minutes once the radar starts to work. Thus, sufficient training samples of the H_0 hypothesis are easy to obtain.
By adding the simulated target returns to sea clutter, sufficient training samples of the $H_1$ hypothesis are generated following the flowchart in Fig. 2. Training sample collection finishes shortly after the radar starts to work and thus hardly affects the normal operation of the radar. In the generation, multiple random parameters are used so that the generated training samples cover as many characteristics of sea-surface small targets as possible.
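The four-step generation of the amplitude sequence a(n) can be sketched in a few lines of Python. Equations (5)-(7) are not reproduced in the text, so the sketch makes explicit assumptions: the shift in the third step is taken as 1/(1−ρ), which is the smallest shift that makes the bounded sequence v(n) non-negative; the stationary power of the shifted sequence is then 1/(3(1−ρ²)) + 1/(1−ρ)²; and the first M samples are discarded as burn-in. The function name `simulate_amplitude` and its parameters are hypothetical.

```python
import math
import random


def simulate_amplitude(n_samples, rho, burn_in=2000, seed=None):
    """Sketch of the four-step generation of a(n).

    The shift and the normalisation are assumptions made for this
    example; they are not quoted from the paper.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    rng = random.Random(seed)
    shift = 1.0 / (1.0 - rho)           # assumed shift: v(n) + shift >= 0
    # Stationary power of the shifted sequence: var(v) + shift^2.
    power = 1.0 / (3.0 * (1.0 - rho ** 2)) + shift ** 2
    scale = 1.0 / math.sqrt(power)

    v = rng.uniform(-1.0, 1.0)          # step 1, with v(1) = u(1)
    out = []
    for k in range(burn_in + n_samples):
        if k >= burn_in:                # skip the first `burn_in` (M) samples
            out.append((v + shift) * scale)    # steps 3 and 4
        v = rho * v + rng.uniform(-1.0, 1.0)   # step 2: AR(1) recursion
    return out


a = simulate_amplitude(20000, rho=0.95, seed=1)
mean_power = sum(x * x for x in a) / len(a)
print(min(a) >= 0.0, round(mean_power, 2))
```

Under these assumptions the sample power should stay close to one for a long enough sequence, and a larger ρ gives a smoother amplitude with a smaller coefficient of variation.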

B. FEATURE COMPRESSION METHOD BASED ON BHATTACHARYYA DISTANCE
When training samples of the two classes are available, the detection problem corresponds to the design of a two-class classifier with highly unbalanced error probabilities for the two classes. Most existing methods for training two-class classifiers cannot meet this demand. More importantly, although the generation of target returns provides sufficient training samples of the $H_1$ hypothesis, these training samples remain incomplete. In other words, they fail to cover the characteristics of all sea-surface small targets of interest. Therefore, the detector is still designed in the anomaly detection framework to assure its generalizability to various small targets. In order to utilize the convexhull learning algorithm [22], [25], the 7D feature vector must be compressed into a 3D feature vector with as little performance loss from compression as possible.
In what follows, we deal with the feature compression problem. From the training sample sets of the two classes, the means and covariance matrices are computed by (9), where #S denotes the cardinality of the finite set S. Since $\#S_0$ and $\#S_1$ are very large, the means and covariance matrices can be regarded as precise. It is impossible to build theoretical probability models of the two classes in the 7D feature space because of the complex characteristics of target returns and sea clutter and the nonlinear operations in feature extraction.
Herein, the feature compression is based on the first- and second-order statistics of the training samples. We consider a linear map. The feature compression is to find a 3×7 projective matrix A that maps the 7D feature vector to a 3D vector so that the compressed feature vector retains as much classification ability as possible. Let the linear map be $y = Ax$, $\mathbb{R}^7 \to \mathbb{R}^3$. Then, the compressed feature vector y has the mean $A\mu_0$ and covariance matrix $A\Sigma_0A^T$ under the $H_0$ hypothesis and the mean $A\mu_1$ and covariance matrix $A\Sigma_1A^T$ under the $H_1$ hypothesis. A quantitative measure is needed to evaluate the separability of the two classes in the compressed 3D feature space so as to select a 'good' projective matrix A. The interclass Bhattacharyya distance (B-distance) is the most effective measure in texture discrimination and other applications [38], [39]. It is applied to our problem. The interclass B-distance is a function of the projective matrix A and is given by (10), where $\Sigma_0$, $\Sigma_1$, and $\Sigma$ are all positive-definite matrices and |·| denotes the determinant of a matrix. The B-distance exists only when the matrix A has full row rank, i.e., rank(A) = 3. A full-row-rank A assures that the compressed features are at least linearly uncorrelated if the seven salient features are. However, the condition rank(A) = 3 specifies an open manifold in the matrix space $\mathbb{R}^{3\times 7}$. In some cases, the values of the determinants in (10) are very close to zero; as a result, the matrix $A\Sigma A^T$ has a large condition number and the interclass B-distance is numerically unstable. The full-row-rank condition is therefore strengthened to row-orthonormality, i.e., $AA^T = I_3$. This assures that the computation of the interclass B-distance is stable provided that the feature extraction makes the matrices $\Sigma_0$ and $\Sigma_1$ well conditioned. In this way, the feature compression boils down to the optimization in (11), in which an optimal compression matrix is sought to maximize the interclass B-distance.
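To make the objective concrete, the sketch below evaluates the interclass B-distance of the compressed features for a given projective matrix. Equation (10) is not reproduced in the text, so the code uses the standard Bhattacharyya distance between two Gaussian models with means $A\mu_i$ and covariances $A\Sigma_iA^T$, and it assumes that Σ denotes the average $(\Sigma_0+\Sigma_1)/2$. The function name `b_distance` and the toy statistics are illustrative only.

```python
import numpy as np


def b_distance(A, mu0, cov0, mu1, cov1):
    """Gaussian-form Bhattacharyya distance of the features compressed by A.

    Assumes the standard two-term formula, with Sigma taken as the average
    of the two class covariances (an assumption, not quoted from the paper).
    """
    A = np.asarray(A, dtype=float)
    c0 = A @ cov0 @ A.T                  # compressed H0 covariance
    c1 = A @ cov1 @ A.T                  # compressed H1 covariance
    c = 0.5 * (c0 + c1)                  # compressed average covariance
    d = A @ (np.asarray(mu1) - np.asarray(mu0))
    # First term: Mahalanobis-like distance between the class means.
    term_mean = 0.125 * d @ np.linalg.solve(c, d)
    # Second term: log-determinant ratio; slogdet avoids under/overflow.
    ld = np.linalg.slogdet(c)[1]
    ld0 = np.linalg.slogdet(c0)[1]
    ld1 = np.linalg.slogdet(c1)[1]
    term_cov = 0.5 * (ld - 0.5 * (ld0 + ld1))
    return term_mean + term_cov


# Toy 7D statistics standing in for the sample means and covariances.
rng = np.random.default_rng(0)
mu0, mu1 = np.zeros(7), rng.normal(size=7)
B0, B1 = rng.normal(size=(7, 7)), rng.normal(size=(7, 7))
cov0 = B0 @ B0.T + 7 * np.eye(7)
cov1 = B1 @ B1.T + 7 * np.eye(7)

A = np.eye(7)[[0, 1, 2]]                 # keep the first three features
print(b_distance(A, mu0, cov0, mu1, cov1))
```

In this Gaussian form the distance is unchanged when A is left-multiplied by an invertible 3×3 matrix, so only the row space of A matters. This is consistent with restricting the search to row-orthonormal matrices without losing attainable values of the objective.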
The interclass B-distance degenerates to the Mahalanobis distance (M-distance), i.e., the second term in (10) vanishes [38], [39], if the two covariance matrices in (10) are equal. The M-distance is used in Fisher linear discriminant analysis (LDA) to find a linear fusion of multiple features that characterizes or separates the two classes. Equivalently, the feature vector is compressed into a scalar test statistic by a linear combination. In some sense, the feature compression based on (10) is a generalization of LDA. Because one linear combination is replaced by three, the feature compression suffers less loss from dimension reduction than feature fusion does. For our problem, the interclass B-distance cannot be replaced by the M-distance for two reasons. On the one hand, for the sea clutter data and the radar returns with test targets in the IPIX database [3], the covariance matrices of the two classes are quite different for each dataset, so the condition under which the interclass B-distance degenerates to the M-distance does not hold. On the other hand, we tried feature compression using the M-distance instead of the B-distance and found that it brings a significant performance loss.
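The degeneration can be checked directly on the standard Gaussian form of the B-distance, which is assumed here to be what (10) instantiates. The symbols $C_i = A\Sigma_iA^T$, $C = (C_0 + C_1)/2$, and $d = A(\mu_1 - \mu_0)$ are shorthand introduced only for this check:

```latex
J(A) = \tfrac{1}{8}\, d^{T} C^{-1} d
     + \tfrac{1}{2} \ln \frac{|C|}{\sqrt{|C_0|\,|C_1|}} .
% If C_0 = C_1, then C = C_0 and the log term equals ln(1) = 0, leaving
J(A)\big|_{C_0 = C_1} = \tfrac{1}{8}\, d^{T} C_0^{-1} d ,
% i.e. one eighth of the squared Mahalanobis distance between the compressed means.
```

When the two covariance matrices differ, the logarithmic term is strictly positive and carries separability information that the M-distance ignores.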
Owing to the highly nonlinear objective function and the quadratic constraints, a good initial projective matrix is combined with a gradient ascent algorithm to solve the optimization (11). The initial projective matrix is obtained by a global search over all 3D combinations of the seven features and all 3D combinations of the seven eigenvectors of the positive-definite matrix $\Sigma$. Let $e_k$ and $v_k$, $k = 1, 2, \ldots, 7$, be the coordinate vectors of the 7D feature space and the seven row eigenvectors of the matrix $\Sigma$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_7$, respectively. The initial projective matrix is selected from the finite set of projective matrices in (12). This set consists of seventy row-orthonormal projective matrices, i.e., 35 triples of coordinate vectors plus 35 triples of eigenvectors. The initial projective matrix $A_0$ is given by (13). Starting from $A_0$, the penalty function method [40] is used to find a solution of (11). The row-orthonormality constraint is not a rigid demand: any matrix A that has full row rank and for which the condition number of $AA^T$ is not too large is a feasible solution of the problem. Thus, the penalty function in (14) is introduced, where ω > 0 is the penalty factor. Using $A_0$ as the initial point, the gradient ascent algorithm [40] is used to maximize (14) to attain a solution $A_*$. Generally speaking, a larger penalty factor ω corresponds to a smaller value of $\|A_*A_*^T - I_3\|_F^2$. Therefore, the penalty factor can be tuned so that $\|A_*A_*^T - I_3\|_F^2 \approx \varepsilon$ (= 0.1), which assures approximate row-orthonormality of the projective matrix. By the matrix $A_*$, the two sets of training samples are transferred to the 3D feature space as in (15). Based on them, a detector is designed in the compressed 3D feature space.
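A compact sketch of the initialization and of a penalized objective is given below. It rests on several assumptions that the text does not confirm: the B-distance is the standard Gaussian form, Σ is the average of the two class covariances, the initial matrix is the candidate with the largest B-distance, and the penalty enters as a subtracted term ω‖AAᵀ − I₃‖²_F. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def _logdet(m):
    """Log-determinant of a positive-definite matrix."""
    return np.linalg.slogdet(m)[1]


def b_distance(A, mu0, cov0, mu1, cov1):
    """Standard Gaussian-form Bhattacharyya distance after y = A x."""
    c0, c1 = A @ cov0 @ A.T, A @ cov1 @ A.T
    c = 0.5 * (c0 + c1)
    d = A @ (mu1 - mu0)
    return (0.125 * d @ np.linalg.solve(c, d)
            + 0.5 * (_logdet(c) - 0.5 * (_logdet(c0) + _logdet(c1))))


def candidate_matrices(cov0, cov1):
    """35 coordinate triples plus 35 eigenvector triples: 70 candidates."""
    sigma = 0.5 * (cov0 + cov1)           # assumed definition of Sigma
    _, vecs = np.linalg.eigh(sigma)       # orthonormal eigenvectors (columns)
    bases = (np.eye(7), vecs.T)           # rows: e_k, then v_k
    return [basis[list(idx)] for basis in bases
            for idx in combinations(range(7), 3)]


def initial_matrix(mu0, cov0, mu1, cov1):
    """Assumed selection rule: the candidate maximizing the B-distance."""
    return max(candidate_matrices(cov0, cov1),
               key=lambda A: b_distance(A, mu0, cov0, mu1, cov1))


def penalized_objective(A, mu0, cov0, mu1, cov1, omega):
    """B-distance minus omega times the row-orthonormality violation."""
    violation = np.linalg.norm(A @ A.T - np.eye(3), "fro") ** 2
    return b_distance(A, mu0, cov0, mu1, cov1) - omega * violation


# Toy statistics in place of the sample means and covariances.
rng = np.random.default_rng(1)
mu0, mu1 = np.zeros(7), rng.normal(size=7)
B0, B1 = rng.normal(size=(7, 7)), rng.normal(size=(7, 7))
cov0, cov1 = B0 @ B0.T + np.eye(7), B1 @ B1.T + np.eye(7)

A0 = initial_matrix(mu0, cov0, mu1, cov1)
print(len(candidate_matrices(cov0, cov1)), A0.shape)  # prints: 70 (3, 7)
```

A gradient-based optimizer could then be started from `A0` on `penalized_objective`; that step is omitted here because the update rule and step size are not specified in the text.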

C. MODIFIED CONVEXHULL LEARNING ALGORITHM
In [22], [25], the two tri-feature-based detectors are designed in the anomaly detection framework by using only training samples of sea clutter. From the set $S_0$ of the training samples of the $H_0$ hypothesis, designing a one-class classifier with a given false alarm rate $P_F$ boils down to the optimization

$$\min_{\Omega \subset \mathbb{R}^3 \text{ is a bounded convex set}} \mathrm{volume}(\Omega) \quad \text{s.t.} \quad \frac{\#(S_0 \cap \Omega)}{\#S_0} = 1 - P_F \tag{16}$$

where volume(Ω) denotes the volume of a 3D convex set Ω and #S denotes the cardinality of a finite point set S. The solution of (16), the convexhull of minimal volume that contains $(1-P_F)\#S_0$ out of the $\#S_0$ training samples of sea clutter, is the decision region of the $H_0$ hypothesis in the tri-feature-based detector with a desired false alarm rate $P_F$. In (16), the convexity of the decision region is a regularity constraint that assures the generalizability of the one-class classifier and makes the search for a decision region computable. Solving (16) is a combinatorial explosion problem, and a greedy convexhull learning algorithm is given to attain a solution [22], [25]. Here, by contrast, sufficient but incomplete training samples of the $H_1$ hypothesis are available. Thus, the volume of the decision region in (16) is replaced by a criterion on the probability that the compressed training samples of the $H_1$ hypothesis fall outside the region Ω. The convexity is kept as a regularity constraint to assure the generalizability of the detector. In the compressed 3D feature space, designing a detector boils down to the optimization in (17). It minimizes the missed probability on the training sample set of the $H_1$ hypothesis subject to the false alarm rate on the training sample set of the $H_0$ hypothesis, which accords with the Neyman-Pearson criterion [7] except for the convexity constraint imposed on the decision region for generalizability and computability. For the combinatorial explosion problem (17), a greedy convexhull learning algorithm, which is a modified version of the algorithm in [22], [25], is given to attain a sub-optimal solution. The steps of the algorithm are listed in Table 2.
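Since (17) is not reproduced in the text, one way to write the problem that is consistent with the description is given below. Here $\tilde{S}_0$ and $\tilde{S}_1$ are assumed names for the compressed training sets obtained with the optimized projective matrix, and Ω denotes the decision region of sea clutter:

```latex
\min_{\Omega \subset \mathbb{R}^{3}\ \text{bounded, convex}}
  \frac{\#(\tilde{S}_1 \cap \Omega)}{\#\tilde{S}_1}
\quad \text{s.t.} \quad
  \frac{\#(\tilde{S}_0 \cap \Omega)}{\#\tilde{S}_0} = 1 - P_F .
% Objective: fraction of H1 training samples inside the clutter region (missed detections).
% Constraint: a fraction P_F of the H0 training samples lies outside the region (false alarms).
```

Minimizing the fraction of $H_1$ samples inside Ω is the same as maximizing the fraction that falls outside it, so either phrasing describes the same decision region.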
Once the decision region of the $H_0$ hypothesis is obtained, a decision is made in three steps: extraction of the 7D feature vector, feature compression, and binary decision. When the compressed feature vector falls into the decision region, the $H_0$ hypothesis is accepted at the CUT. Otherwise, the $H_1$ hypothesis is accepted and a target is declared in the CUT. The decision is made by computing multiple 3×3 determinants [22], [25] and thus can be implemented fast. Compared with other learning algorithms, the convexhull learning algorithm can accurately control the false alarm rate. It is worth noting that the learning process is designed under the anomaly detection framework by mainly using the training samples of sea clutter. This lowers the effect of the incompleteness of the training samples of the $H_1$ hypothesis and assures that the proposed feature-compression-based detector possesses strong generalizability to other types of sea-surface small targets.
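The determinant-based decision can be illustrated with a small pure-Python sketch. It assumes that the learned convex polyhedron is stored as a list of triangular faces and that one point strictly inside it is known. A feature vector lies in the region when it is on the same side of every face plane as that interior point, and each side test is the sign of one 3×3 determinant. The data layout and the names `det3`, `side`, and `in_convex_region` are illustrative and are not taken from [22], [25].

```python
def det3(r0, r1, r2):
    """Determinant of the 3x3 matrix with rows r0, r1, r2."""
    return (r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
            - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
            + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]))


def _sub(u, v):
    """Component-wise difference u - v of two 3D points."""
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def side(face, p):
    """Signed-volume test: which side of the face plane the point p lies on."""
    a, b, c = face
    return det3(_sub(b, a), _sub(c, a), _sub(p, a))


def in_convex_region(faces, interior, y):
    """True (decide H0) if y is on the interior side of every face plane."""
    for face in faces:
        # Opposite signs mean the face separates y from the interior point.
        if side(face, y) * side(face, interior) < 0:
            return False
    return True


# Toy decision region: the tetrahedron with vertices O, X, Y, Z.
O, X, Y, Z = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
faces = [(O, X, Y), (O, X, Z), (O, Y, Z), (X, Y, Z)]
interior = (0.1, 0.1, 0.1)

print(in_convex_region(faces, interior, (0.2, 0.2, 0.2)))  # True: clutter only
print(in_convex_region(faces, interior, (0.6, 0.6, 0.6)))  # False: target declared
```

The cost per decision is one determinant per face, which is why the online stage is cheap once the polyhedron has been learned.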
As a detector that learns the characteristics of sea clutter from data, it needs to adapt to changes in those characteristics. Below, we discuss its use in practical radar systems. For an island-based or shipborne high-resolution radar with fixed operating parameters, the characteristics of sea clutter are mainly affected by the sea state and the viewing geometry of the radar. Generally, the sea state varies on a time scale of tens of minutes and on a spatial scale of hundreds of square kilometers. The viewing geometry of the radar includes the grazing angle and the angle between the wave direction and the line of sight of the radar. The former varies with radial distance, and the latter varies with azimuth. To detect sea-surface small targets, the surveillance scene of the radar is partitioned into multiple sections along radial distance and azimuth. On each section, one feature-compression-based detector is trained for detection. Moreover, the decision region is relearned aperiodically whenever the sea state in the section changes noticeably. In this way, the detector based on online training can adapt to the spatially and temporally varying oceanic environment. Each training process takes a short time and has little influence on the normal operation of the radar. In addition, while the radar is at work, the feature vectors at the resolution cells that are declared to contain a target can be collected to enrich the set of training samples of the $H_1$ hypothesis and improve the detection performance.
Finally, a flowchart of the proposed feature-compression-based detector is illustrated in Fig. 3. It consists of a training branch and a working (detection) branch. The pink boxes denote the common blocks, the green boxes denote the blocks in the training stage, and the baby blue boxes denote the blocks in the detection stage. The training branch includes more blocks, which may be carried out offline. The detection branch includes only three blocks: extraction of the 7D feature vector, feature compression via the linear transform, and the decision whether the compressed vector falls into the convexhull. Thus, the detection can be performed fast. It is worth noting that, as an open framework, the feature-compression-based method allows more salient features to be utilized to improve detection performance.

IV. EXPERIMENTAL RESULTS AND PERFORMANCE COMPARISON
In this section, experimental results of the proposed detector on the IPIX database [3] and two datasets of the CSIR database [4] are reported. The performance comparisons with existing detectors are given. Moreover, some limitations are discussed.

A. EXPERIMENTAL RESULTS ON TWENTY DATASETS OF IPIX DATABASE
Twenty datasets collected in the dwelling mode from the IPIX database are available for performance evaluation. The first ten datasets were collected at Dartmouth, Nova Scotia, Canada, in 1993. The radar was mounted on a cliff 100 feet above sea level, facing the Atlantic Ocean at a low grazing angle (about 0.33°). Each dataset contains radar returns synchronously collected at the HH, HV, VH, and VV polarizations. At each polarization, the data consist of complex return time series of length $2^{17}$ (about 131 seconds) at 14 adjacent range cells. The test target is an anchored spherical block of Styrofoam, about 1 m in diameter, wrapped with wire mesh. The range cell occupied by the target is referred to as the primary cell. Around it, two or three cells whose returns are affected by the target are called the secondary cells. The last ten datasets were collected by the IPIX radar with an improved quantizer at Grimsby, Ontario, Canada, in 1998. The test targets are small boats at anchor, and each dataset consists of complex returns of length $6\times10^4$ (one minute) at 28 contiguous range cells at the four polarizations. The range resolution of the first eighteen datasets is 30 m, that of the 19th dataset is 15 m, and that of the 20th dataset is 9 m. These datasets were collected at different sea states and thus allow high-confidence assessments. In the experiments, for each dataset at each polarization, the sets of training samples of the 7D feature vector under the $H_0$ and $H_1$ hypotheses are computed from the sea clutter time series and the simulated target returns. The two sets are then used to train the detector. The set of test samples of the 7D feature vector is computed from the time series at the primary cells and is used to compute detection probabilities. In the experiments, N = 512 and 1024 are taken, i.e., the observation time for one decision is 0.512 s and 1.024 s, respectively, and the false alarm rate is 0.001.
On the twenty datasets, the proposed feature-compression-based detector is compared with the early fractal-based detector [12] and our three previous detectors [22], [25], [32]. Fig. 4 and Fig. 5 illustrate the detection probabilities of the five detectors on the twenty IPIX datasets, where the four polarizations are plotted in separate subfigures. The feature-compression-based detector attains the largest detection probabilities in 75 out of the eighty cases (twenty datasets at four polarizations), and its detection probabilities are quite close to the largest ones in the five exceptions. The early fractal-based detector has the smallest detection probabilities due to the limited ability of a single feature. Table 3 lists the average detection probabilities of the five detectors at each polarization when N = 512 and 1024. Doubling the observation time brings some benefit for all five detectors. The proposed detector gains the least because of the ceiling effect: when the observation time is 0.512 s, its detection probabilities are already quite close to one (the so-called ceiling), so doubling the observation time leaves little room for further improvement. In terms of overall performance, the proposed detector is the best, followed by the TF-tri-feature-based detector [25], the adaptive composite GLRT detector [32], the tri-feature-based detector [22], and the fractal-based detector [12]. In comparison with the second best one, the feature-compression-based detector improves the overall detection probability by 6.1% at the observation time of 0.512 s and by 3.7% at the observation time of 1.024 s. Owing to the use of more salient features and their complementarity, the proposed detector provides better and more stable detection performance.
For a full comparison, the feature-compression-based detector is also compared with two recent detectors evaluated on the IPIX database: the decision-tree-based detector [30] and the detector based on a graph description of amplitude series [31]. The former uses three features, namely the Hurst exponents in the time domain and the frequency domain and the RDPH, and is designed by learning a two-class decision tree in the 3D feature space from radar data. The latter is based on subtle graph modelling of the amplitude series of sea clutter and is essentially a single-feature-based detector. Table 4 lists the detection probabilities of the three detectors on the ten 1993 datasets of the IPIX database, where the false alarm rate is 0.001. According to the analysis in [22], in the third dataset (#30), the test target is fully shadowed by swells during about 60 seconds out of 131 seconds. This dataset is excluded from the computation of the average detection probabilities. The proposed detector has a detection probability loss of 0.02 in comparison with the decision-tree-based detector for the ten datasets of the IPIX database at the HH polarization when the observation time is 0.512 s and 1.024 s. As the observation time becomes longer, the proposed detector attains some improvement. Besides, the proposed detector provides an average detection probability improvement of 0.22 in comparison with the graph-based detector for six datasets of the IPIX database. Because it uses a single feature and ignores the phase information of radar returns, the detector using the graph description has poor performance, though it provides an obvious improvement relative to other single-feature-based detectors.
In what follows, we discuss the effectiveness of the interclass B-distance in the feature compression. The two tri-feature-based detectors [22], [25] involve two sample sets in the 3D feature spaces: the training sample set of the $H_0$ hypothesis from sea clutter and the test sample set from radar returns with test targets. Besides these two sets, the feature-compression-based detector also involves the training sample set of the $H_1$ hypothesis from the simulated target returns plus sea clutter. For the three detectors, the average B-distances between the first two sets in the 3D feature spaces at each polarization are listed in Table 5. Comparing the average detection probabilities in Table 3 with the average interclass B-distances in Table 5, it is found that larger average B-distances always correspond to larger average detection probabilities. This shows that the interclass B-distance is a good measure for feature selection and compression. The average B-distances under the initial projective matrices $A_0$ and the optimized projective matrices $A_*$ are also listed in Table 5. It is seen that the optimization increases the average interclass B-distance between the training sample set of the $H_0$ hypothesis and the test sample set of the $H_1$ hypothesis, even though the optimization is based on the statistics of the training sample sets of the $H_0$ and $H_1$ hypotheses. This shows that the generation method of target returns is effective at least for the test targets in the twenty datasets of the IPIX database.
To further verify the effectiveness of the generation method of target returns, we use the training sample set of the $H_0$ hypothesis and the test sample set of the $H_1$ hypothesis for the feature compression and convexhull learning. For each dataset, the resulting feature-compression-based detector should give an upper bound on the detection performance because the test samples used for assessment are also used for learning. How close the detection probability of the feature-compression-based detector comes to this upper bound indicates the effectiveness of the generation method of target returns. Fig. 6 shows the detection probabilities of the feature-compression-based detector and the upper bounds when the observation time is 0.512 s and the false alarm rate is 0.001. The detection probabilities are almost equal to the upper bounds in 71 out of the 80 cases. This shows that the generation method of target returns is effective.

B. EXPERIMENTAL RESULTS ON TWO DATASETS OF CSIR DATABASE
To verify the adaptability of the proposed detector to other types of sea-surface small targets, two datasets (TFA17-005 and TFC15-007) from the CSIR database [4] are used, where the test targets are a moving wooden fishing boat about 5.5 m long and a moving wave-rider rigid inflatable boat 5.7 m long, respectively. The details of the two datasets are listed in Table 6. Each dataset consists of radar returns of 25 seconds at 96 contiguous range cells in the tracking mode. In the experiment, N = 512 corresponds to the observation time of 0.124 s and the false alarm rate is 0.001. The detection probabilities of the five detectors on the two datasets are listed in Table 7. The proposed detector attains the largest detection probabilities for the two test targets. The composite GLRT detector [32], which focuses on the nonstationary movement of targets, also obtains very good performance. Fig. 7 illustrates the power map of the second dataset and the detection results, where each subplot has eighteen false alarm points. The fractal-based detector hardly recovers the track of the fishing boat because the phase information of the radar returns is not used. The proposed detector obtains a more continuous track of the boat. The proposed detector possesses strong generalizability to other types of sea-surface small targets, owing to the fact that it is designed in the anomaly detection framework and the radar returns of specific test targets do not participate in the design of the detector.

FIGURE 7. Detection performance comparison on the second dataset of the CSIR database [4]. (a) Power map of the data, (b) fractal-based detector [12], (c) tri-feature-based detector [22], (d) TF-tri-feature-based detector [25], (e) composite GLRT detector [32], and (f) feature-compression-based detector.

V. CONCLUSION
In this paper, a feature-compression-based detector was developed for effective and robust detection of sea-surface small targets. By means of the generation of target returns, training samples of radar returns with targets are generated and used together with the training samples of sea clutter for feature compression. A feature compression method based on the interclass B-distance was proposed to compress the 7D feature vector into a 3D feature vector. In the compressed feature space, a modified convexhull learning algorithm was given to determine the decision region of sea clutter at a given false alarm rate. Experimental results on the IPIX radar database and two datasets of the CSIR database show that the feature-compression-based detector attains better and more robust detection performance. Besides better detection performance, its other merit is that it allows more salient features to be exploited.
Improving the ability of maritime surveillance radars to detect sea-surface small targets is a long-term task. As the detector is an open framework, future research will be devoted to exploiting more features to further improve detection performance and to constructing alternatives to the interclass B-distance. It is known that the acquisition of training samples of the $H_1$ hypothesis is the key to relaxing the limitation of one-class classifiers. Though the simulation of target returns provides sufficient training samples of the $H_1$ hypothesis, it cannot substantially overcome their incompleteness in the feature space with respect to all types of sea-surface small targets at all sea states. A potential way to achieve completeness is to collect data of the $H_1$ hypothesis over the long term while the radar is working.
ZIXUN GUO was born in Xi'an, Shaanxi, China, in 1994. She received the B.S. degree in electrical engineering from Xidian University, Xi'an, in 2016, where she is currently pursuing the Ph.D. degree in signal and information processing with the National Laboratory of Radar Signal Processing.
Her research interests include radar signal processing, weak target detection, and their applications.
SAINAN SHI was born in Nantong, Jiangsu, China, in 1990. She received the B.S. degree in electrical engineering from Xidian University, Xi'an, China, in 2013, and the Ph.D. degree from the National Laboratory of Radar Signal Processing, Xidian University, in 2018.
She is currently a Lecturer with the College of Electronic and Information Engineering, Nanjing University of Information Science and Technology. Her research interests include radar signal processing and weak target detection in sea clutter.