Radio Frequency Interference Detection in Passive Microwave Remote Sensing Using One-Class Support Vector Machines

Radio frequency interference (RFI) is a serious threat to the accurate estimation of critical geophysical parameters via passive microwave remote sensing and the presence of RFI in microwave radiometer measurements is increasing over time. On the other hand, the nature and the occurrence of RFI captured by radiometers are usually unknown making their detection and mitigation difficult. To overcome this challenge, this article presents a novel RFI detection algorithm that relies only on the information extracted from the RFI-free radiometer measurements which can be collected over oceans and rural areas with limited human activity, i.e., a one-class algorithm, to be implemented in future remote sensing radiometers. The algorithm transforms raw time-series radiometer measurements into a heterogeneous feature-based representation. Then, a feature selection algorithm identifies the most discriminant features to detect interference based on the probabilities of misdetections and false alarms. Finally, the optimal decision boundaries that discriminate the RFI-contaminated radiometer measurements from the RFI-free ones are computed via support vector machines (SVM) using only the RFI-free radiometer measurements. Regardless of the characteristics of RFI contamination, the algorithm, therefore, outputs a generalized decision boundary for RFI-free measurements. A performance evaluation of the proposed algorithm against the traditional RFI detection algorithms has been performed using simulated radiometer data, and the results have shown that the novel algorithm, unlike the traditional methods, can successfully detect RFI, even when the interference-to-noise ratio (INR) of the radiometer measurements is as low as $-18$ dB.


I. INTRODUCTION
The passive microwave remote sensing measurements of the earth's surface and atmosphere have growing relevance in modern society as they greatly impact everyday life. Data measured by space-borne microwave radiometers are primary indicators to estimate critical variables of earth systems [1]. These measurements are usually performed across various so-called "protected" frequencies adjacent to the bands used by active users such as radars and wireless communication systems [2], [3], and radio frequency interference (RFI) has been reported to exist in them due to leakage from neighboring frequency bands and potential illegal emissions present at these protected frequencies [4], [5], [6], [7]. Furthermore, the presence of RFI is increasing over time due to the exponential growth in communication and other active systems. If not detected and mitigated properly, RFI may cause biases in radiometric measurements, which may translate into erroneous scientific measurements. Therefore, effective RFI detection and mitigation techniques need to be implemented in space-borne radiometer systems, especially against low-level interference, which is challenging to identify. Many single-domain (time, frequency, statistical, etc.) algorithms and methods have been proposed and applied in microwave radiometry to cope with the RFI problem, with little success against low-level as well as wideband, long-duration, noiselike interference [8]. Recently, more comprehensive techniques have also been developed and implemented by combining the outputs of several such single-domain techniques for maximum likelihood of detection [9]. For instance, NASA's Soil Moisture Active Passive (SMAP) radiometer implements a multidomain RFI detection procedure by combining the detection outputs of several single-domain algorithms with a logical OR operator [10].
However, SMAP brightness temperature products have also been reported to be susceptible to RFI, especially when the contamination is wideband and continuous [11]. On the other hand, it has been suggested that better detection performances are achievable when radiometer measurements are analyzed simultaneously in multiple domains [12], [13], [14]. This is understandable considering the fact that the RFI environment includes interference signals with various properties such as bandwidth, duration, and amplitude; thus, the assumption that they are differentiable from natural emissions in a single domain is not always true [15], [16], [17], [18].
Machine learning and deep learning algorithms have also been tested for RFI detection and mitigation in microwave radiometry in recent studies. For instance, a convolutional neural network (CNN) architecture trained with the spectrogram images generated by the SMAP measurements has revealed that the detection performance of the deep learning algorithm is primarily dependent on the quality of the training images [19]. Other studies utilizing simulated high-resolution time series of radiometer measurements, on the other hand, have demonstrated that multidomain machine learning algorithms can provide better interference detection performances compared to the state-of-the-art implementations, especially in cases of low-level interference contamination [20], [21], [22], [23].
This article, expanding on the work presented in the 2022 International Geoscience and Remote Sensing Symposium (IGARSS) [24], presents a feature-based, multidomain machine-learning algorithm developed using support vector machines (SVMs) to detect RFI in microwave radiometer measurements. In contrast to the previous single and multidomain methods, this algorithm is trained using only RFI-free measurements that can be easily obtained over regions with low human activity and analyzes an extensive list of radiometer measurement features in a multidimensional feature space to identify interference contamination. Hence, the time and effort required for building a reliable training dataset for this one-class classification algorithm are drastically reduced [25]. Furthermore, as the algorithm is not trained based on the properties of the RFI signals, it is robust against possible changes in the characteristics of the RFI environment over space and time. A statistical method to select the significant signal features to be used in the algorithm has also been proposed to be included in the detection process to maximize efficiency. The effectiveness of the novel algorithm has been tested against single and multiple pulsed sinusoidal interference signals with various frequencies, interference-to-noise ratio (INR) levels, and duty cycles (DC) representing a variety of RFI conditions. Furthermore, a performance comparison against traditional RFI detection methods has been provided. The rest of this article is organized as follows. First, in Section II, the simulated radiometer dataset used to develop and test the novel RFI detection algorithm is described. In Section III, the RFI detection problem is mathematically formulated. Sections IV and V explain the novel detection algorithm with the feature selection procedure. Then, in Sections VI and VII, simulation-based experiments where the algorithm was implemented and the resulting RFI detection performance are discussed. Finally, Section VIII concludes this article.

II. SIMULATED RADIOMETER DATA
The RFI-free thermal noise within a specific narrow frequency band measured by radiometers follows a Gaussian distribution with a uniform power spectrum [26]. Thus, in this work, RFI-free radiometric measurements, i.e., radiometer voltage counts, are modeled as white Gaussian noise with mean $\mu$ and standard deviation $\sigma$, which can be expressed as $x_N(t) \sim \mathcal{N}(\mu, \sigma^2)$. The interference is additive to the naturally occurring thermal noise; therefore, the RFI-contaminated measurements have been simulated by adding the interference signal to white Gaussian noise. In particular, pulsed sinusoidal interference signals have been used because of their ability to create short pulses (low DC) as well as continuous contamination (high DC, continuous wave) [27]. Multiple such signals with different amplitudes, duty cycles, and frequencies combined can resemble the realistic RFI environments, with possibly distinct features, in which microwave radiometers operate. The following mathematically describes the RFI-contaminated radiometer data used in this study:

$$x(t) = x_N(t) + \sum_{i=1}^{K} A_i \sin(2\pi f_i t + \phi_i)\,\mathrm{rect}\!\left(\frac{t - t_0}{\omega_i}\right) \tag{1}$$

The first term in (1) denotes the RFI-free radiometer measurements as mentioned previously and the second term denotes the pulsed sinusoidal interference. Note that the equation can accommodate $K \geq 1$ interfering signals, and the interference parameters $A_i$, $f_i$, and $\phi_i$ denote the amplitude, frequency, and phase shift of each sinusoidal interference signal, respectively. The rect() function provides a rectangular pulse envelope indicating the duty cycle, and $t_0$ and $\omega_i$ denote the time delay and width of this envelope. The width $\omega_i$ is defined in terms of the DC of the interference signal and the radiometer integration period ($T$) as $\mathrm{DC} = \omega_i / T$.
To mimic a realistic radiometric scenario, the values of RFI-free thermal noise and RFI signal parameters used in this study were extracted, empirically, from the Soil Moisture Active Passive Validation Experiment 2012 (SMAPVEX12) airborne data measured by the Passive Active L-Band System (PALS). Furthermore, the radiometer sampling rate and the integration period were assumed to be 75 MSPS and T = 350 μs, equal to those of the PALS radiometer, which also allows enough samples in each integration period for computing data features [28].
The parameters that determine the RFI-free radiometer measurements are the mean ($\mu$) and the standard deviation ($\sigma$) of the normal distribution. The empirical estimate of a parameter depends primarily on the number of samples used to compute it. Therefore, the SMAPVEX12 radiometer voltage readings were divided into temporal windows, where each window consists of $N$ samples. The value of $N$ varies from 100 to 200 000. Fig. 1 shows the variation of window standard deviation as a function of the number of samples. The figure shows that the standard deviation of the voltages converges to 8.035 V as the number of samples in each window increases. Therefore, the value of $\sigma$ has been assumed to be 8 V. The mean of the voltage readings, on the other hand, was 0; thus, $\mu$ has been accepted as 0 V. The variables of the RFI signals, in turn, are the amplitude ($A_i$), the frequency ($f_i$), and the duty cycle ($d_i = \omega_i / T$). For the sake of generality, the distribution of these parameters was considered uniform in their respective ranges, again extracted from the SMAPVEX12 data. For instance, during the SMAPVEX12 campaign, amplitudes of the RFI signals, mostly shorter than 100 ms, were observed to be up to 100 K in 250 K RFI-free thermal noise when the radiometer measurements were averaged over 2 s [28]. For 350 μs integration periods, this amplitude range would translate into INR values up to 10 dB; thus, INR values were varied randomly between −45 and 10 dB. Similarly, RFI frequencies were varied uniformly within the bandwidth of the intermediate frequency (IF) signals measured by the PALS radiometer during the SMAPVEX12 campaign, which ranges from 15 to 35 MHz. The phase of the interference signals was also taken as a uniformly distributed random variable between 0 and $2\pi$ radians.
Finally, the width of the pulse envelopes was varied in such a way that the DC of the interference pulses was uniformly distributed between 1% and 100%, and the time delay of the pulses at each integration window was random. Table I summarizes the simulated radiometer data parameters used in this study. Note that the RFI detection algorithms discussed in this article have been evaluated for each INR and DC case separately; thus, the result of this study is independent of the abundance of RFI sources with particular INR and DC values in an RFI environment.
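The simulation described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulator: the function name `simulate_window` is hypothetical, only a single source ($K = 1$) is generated, the INR is assumed to be the ratio of the sinusoid power $A^2/2$ (while the pulse is on) to the noise variance $\sigma^2$, and the delay $t_0$ is taken as the start of the pulse.

```python
import numpy as np

FS = 75e6        # sampling rate from Section II (75 MSPS)
T_INT = 350e-6   # integration period from Section II
SIGMA = 8.0      # noise standard deviation from Section II (volts)


def simulate_window(rng, inr_db=None, duty_cycle=1.0, freq_hz=25e6):
    """Return one integration window; inr_db=None gives an RFI-free window."""
    n = int(round(FS * T_INT))                 # samples per window
    t = np.arange(n) / FS
    x = rng.normal(0.0, SIGMA, n)              # RFI-free thermal noise, mu = 0
    if inr_db is None:
        return x
    # Assumption: INR = 10*log10((A^2 / 2) / sigma^2) while the pulse is on.
    amp = SIGMA * np.sqrt(2.0 * 10.0 ** (inr_db / 10.0))
    phase = rng.uniform(0.0, 2.0 * np.pi)      # uniform random phase
    width = duty_cycle * T_INT                 # pulse width, omega = DC * T
    t0 = rng.uniform(0.0, T_INT - width)       # random delay (assumed pulse start)
    envelope = (t >= t0) & (t < t0 + width)    # rectangular pulse envelope
    return x + amp * np.sin(2.0 * np.pi * freq_hz * t + phase) * envelope


rng = np.random.default_rng(0)
clean = simulate_window(rng)
dirty = simulate_window(rng, inr_db=10.0, duty_cycle=0.5)
```

Multiple sources could be modeled by summing several such pulsed sinusoids with independently drawn amplitudes, frequencies, phases, and duty cycles before adding them to the noise.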
The simulated RFI-free and the RFI-contaminated measurements were sequential in time, which implies that the complexity of the data was high. In order to use the simulated measurements as the input for machine learning algorithms, they first needed to be transformed into a well-defined feature space in a way that the features adequately describe the measurements. In this work, each radiometer integration window has been described using 31 commonly used features in time, statistical, and spectral domains, which are summarized in Table II.

III. RFI DETECTION PROBLEM
Consider a set $R$ of $N$ radiometer integration windows where each window $r_i$, $i \in \{1, 2, \ldots, N\}$, is described by $d$ features, namely $F_1, F_2, \ldots, F_d$. The corresponding feature values for window $r_i$ are denoted as $f_{in}$, $n \in \{1, 2, \ldots, d\}$. Further, each integration period $r_i$ contains $M$ samples. A one-class classification problem can be formulated in which $r_i$ either belongs to the RFI-free class (class $N$) or does not. In one-class classification, only the RFI-free measurements are used for training. The detection technique creates a (representational) model of this training data. If a newly encountered radiometer integration window is too different from this model, it is labeled as RFI-contaminated (class $C$). In this study, one-class support vector machines (OCSVM) have been used as the data modeling approach, where the decision boundary is computed as a separating hyperplane [36]. The final class label for a radiometer integration window is estimated using this decision boundary.
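To make the notation concrete, the set $R$ can be held as an $N \times d$ matrix whose row $i$ contains the feature values $f_{in}$ of window $r_i$. The sketch below computes only five illustrative features; the feature definitions are common textbook forms chosen here as assumptions, since Table II is not reproduced in this text, and the function names are hypothetical.

```python
import numpy as np


def extract_features(window):
    """Map one integration window to a small feature vector (d = 5 here)."""
    x = np.asarray(window, dtype=float)
    centered = x - x.mean()
    variance = centered.var()
    power = np.mean(x ** 2)
    kurtosis = np.mean(centered ** 4) / variance ** 2        # fourth standardized moment
    mean_abs_diff = np.mean(np.abs(np.diff(x)))              # mean absolute first difference
    psd = np.abs(np.fft.rfft(centered)) ** 2 + 1e-12         # periodogram; offset avoids log(0)
    flatness = np.exp(np.mean(np.log(psd))) / np.mean(psd)   # geometric mean / arithmetic mean
    return np.array([variance, power, kurtosis, mean_abs_diff, flatness])


def feature_matrix(windows):
    """Stack feature vectors into an (N, d) matrix, one row per window r_i."""
    return np.vstack([extract_features(w) for w in windows])
```

Each row of the returned matrix is then one point in the $d$-dimensional feature space in which the one-class decision boundary is learned.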

IV. ONE-CLASS SUPPORT VECTOR MACHINES
For a given set $R_N$ of RFI-free radiometer integration windows, the OCSVM finds the hyperplane that separates them from the origin of the training data in a higher dimensional feature space. It should be noted that the training data consist only of the RFI-free radiometer integration windows. During the training process, the OCSVM learns a hyperplane that maximizes the margin between the origin and the data from the RFI-free radiometer integration windows. Its decision function projects the test data onto the normal vector $w$ to produce the SVM scores based on the distance from the hyperplane. The primal problem for the one-class SVM is defined as follows [37]:

$$\min_{w,\,\xi,\,b}\ \frac{1}{2}\lVert w \rVert^2 + \frac{1}{\nu N_T}\sum_{i=1}^{N_T}\xi_i - b \quad \text{s.t.} \quad w^{T}\Phi(r_i) \geq b - \xi_i,\quad \xi_i \geq 0,\quad i = 1, \ldots, N_T \tag{2}$$

The column vector $\xi = [\xi_1, \xi_2, \ldots, \xi_{N_T}]$ consists of $\xi_i$, the slack variable corresponding to the $i$th training radiometer window. $\Phi(\cdot)$ is the mapping function that maps $r_i$ into the higher dimensional space. $b$ is the bias term, $w$ is the normal vector to the hyperplane, and $\nu$ denotes the tradeoff parameter between maximizing the distance of the hyperplane from the origin and the number of data points that are allowed to cross the hyperplane (the false positives). $N_T$ denotes the number of training windows. Schölkopf et al. [37] proposed to solve the problem formulation in (2) via its dual form as follows:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{N_T}\sum_{j=1}^{N_T}\alpha_i \alpha_j\, k(r_i, r_j) \quad \text{s.t.} \quad 0 \leq \alpha_i \leq \frac{1}{\nu N_T},\quad \sum_{i=1}^{N_T}\alpha_i = 1 \tag{3}$$

where $k(r_i, r_j)$ denotes the kernel function, i.e., the inner product of $\Phi(r_i)$ and $\Phi(r_j)$ in the high-dimensional feature space, and $\alpha_i$, $i \in \{1, 2, \ldots, N_T\}$, denotes the dual variable. This optimization problem can be solved for $\alpha_i$ and $b$ with one global minimum point. For a new test integration window $r_{\text{test}}$, the class that this window belongs to is determined by evaluating which side of the hyperplane it falls on in the feature space. The final decision function of the dual problem is given by

$$\hat{y}_{\text{test}} = \operatorname{sgn}\!\left(\sum_{i=1}^{N_T}\alpha_i\, k(r_i, r_{\text{test}}) - b\right)$$

If $\hat{y}_{\text{test}}$ is positive, the test radiometer integration period $r_{\text{test}}$ falls in the region of highly dense training measurements. Therefore, it is classified as RFI-free as it demonstrates similar characteristics with most of the training data. If $\hat{y}_{\text{test}}$ is negative, then $r_{\text{test}}$ is different from the training measurements; therefore, its class label is declared as RFI-contaminated.
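The training and decision steps described above can be sketched with scikit-learn's `OneClassSVM`, which is used here only as a readily available stand-in for the solver used in this work. The feature vectors are synthetic placeholders, the values of `nu` and `sigma` are illustrative, and the mapping `gamma = 1 / (2 * sigma**2)` assumes the Gaussian kernel is written as $\exp(-\lVert r_i - r_j \rVert^2 / (2\sigma^2))$.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Stand-in feature vectors (hypothetical): RFI-free rows cluster around zero,
# contaminated rows are shifted away from that cluster.
train_free = rng.normal(0.0, 1.0, size=(500, 3))
test_free = rng.normal(0.0, 1.0, size=(200, 3))
test_rfi = rng.normal(6.0, 1.0, size=(200, 3))

scaler = StandardScaler().fit(train_free)      # scaling learned from RFI-free data only
sigma, nu = 1.0, 0.1                           # illustrative values, not tuned
model = OneClassSVM(kernel="rbf", nu=nu, gamma=1.0 / (2.0 * sigma ** 2))
model.fit(scaler.transform(train_free))        # one-class training: no RFI examples

# decision_function is positive inside the learned region and negative outside.
scores_free = model.decision_function(scaler.transform(test_free))
scores_rfi = model.decision_function(scaler.transform(test_rfi))
flag_free = scores_free < 0                    # False -> classified as RFI-free
flag_rfi = scores_rfi < 0                      # True  -> classified as RFI-contaminated
```

Because no contaminated example enters `fit`, the same trained model can be applied unchanged to interference with different characteristics; only the test scores change.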

V. FEATURE SELECTION
The goal of the RFI detection algorithm is to decide if a given set of radiometer integration windows is RFI-contaminated or not from the heterogeneous feature representation of the data. Even though the feature-based dataset tends to explain the overall dynamics of the data in multiple domains, various features need to be analyzed and ranked to determine which ones are more useful. In other words, the most consistent and relevant features should be prioritized to improve the efficiency and performance of the machine learning models. Two error types, namely type I and type II errors, have been defined and used for this purpose. A type I error, also known as a false alarm, occurs when an RFI-free window is identified as RFI-contaminated. A type II error, on the other hand, occurs when an RFI-contaminated window is identified as RFI-free, which is a misdetection. To calculate these errors, the likelihood distribution of each feature for the given class is analyzed. The best discriminant features are the ones that show the maximum margin between the conditional probability distributions associated with RFI-free and RFI-contaminated classes. Fig. 2 graphically illustrates the distribution of (a) nondiscriminant and (b) discriminant features when their values ($f_n$) are conditioned on the class label, i.e., RFI-free or RFI-contaminated. Each feature has been analyzed using this approach, and the sum of its type I and type II errors has been computed. Fig. 2(a) visualizes this sum, called the error static, as the highlighted regions under the likelihood curves. Features have been ranked based on their error static such that a lower error value indicates that the feature is more discriminant than a feature with a higher error value.
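The ranking idea can be sketched as follows. This is one possible reading, under explicit assumptions: the class-conditional likelihoods are replaced by empirical samples, and each feature is judged by the best single-threshold rule, i.e., the smallest achievable sum of false alarm and misdetection rates. The function names are hypothetical, and the exact procedure used in this work may differ.

```python
import numpy as np


def error_static(free_vals, rfi_vals):
    """Smallest (type I + type II) error sum of a one-threshold rule on one feature."""
    free_vals = np.asarray(free_vals, dtype=float)
    rfi_vals = np.asarray(rfi_vals, dtype=float)
    thresholds = np.unique(np.concatenate([free_vals, rfi_vals]))
    best = 1.0   # flagging everything (or nothing) already gives a sum of 1.0
    for thr in thresholds:
        # Rule A: flag as RFI when the feature exceeds the threshold.
        type1 = np.mean(free_vals > thr)     # false alarms
        type2 = np.mean(rfi_vals <= thr)     # misdetections
        total = type1 + type2
        # Rule B is the mirrored rule; its two errors are the complements,
        # so its error sum equals 2 - total.
        best = min(best, total, 2.0 - total)
    return best


def rank_features(free_matrix, rfi_matrix):
    """Return feature indices ordered from most to least discriminant, and the errors."""
    errors = np.array([error_static(free_matrix[:, j], rfi_matrix[:, j])
                       for j in range(free_matrix.shape[1])])
    return np.argsort(errors), errors
```

A value near 0 indicates well-separated class-conditional distributions, and a value near 1 indicates almost complete overlap. Features whose contaminated values spread to both sides of the RFI-free distribution would need a two-sided rule, which this sketch does not cover.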

A. Experimental Setup
The experimental setup for the novel RFI detection approach is demonstrated in Fig. 3. RFI-free and RFI-contaminated radiometer integration windows for 3100 pairs of INR and DC values have been generated as described in Section II. The features in time, frequency, statistical, and spectral domains shown in Table II have been extracted as a 31-D feature vector to describe the generated radiometer dataset. The relevant subset of features was identified and selected as described in Section V, and the dataset has been divided into two parts for training and testing. In order to train the RFI detection model, i.e., to compute the hyperplane between the RFI-free and RFI-contaminated windows for the SVM algorithm, the RFI-free training data have been used. The value of the parameter $\nu$ and the selection of the kernel function are the hyperparameters of the OCSVM algorithm described in Section IV that need to be decided. In this study, the Gaussian kernel was used with the width parameter $\sigma$ to compute the distance between the integration time windows $r_i$ and $r_j$. The kernel width $\sigma$ determines the distance between the integration time windows in the high-dimensional feature space. Small values of $\sigma$ lead to increased complexity and overfitting since all the training data will be considered as support vectors. Large values of $\sigma$ will provide better separation in the high-dimensional feature space [38], [39]. The parameter $\nu \in (0, 1]$ is an upper bound on the fraction of false alarms and a lower bound on the fraction of support vectors, i.e., model complexity. To determine the best possible pair of $\sigma$ and $\nu$, a grid search was run in the experiments by varying $\sigma$ from $2^{-6}$ to $2^{2}$ in steps of 0.25. Fig. 4 shows the fraction of the training data considered as the support vectors as a function of $\nu$ and $\sigma$.
In this work, an algorithm with less model complexity (i.e., fewer support vectors) was preferred to perform accurate detection of the RFI contamination. Therefore, the value of $\sigma$ was selected as 3 considering that, for a given value of $\nu$, the changes in the fraction of the support vectors are considerably small. To determine the value of $\nu$, the accuracy was analyzed by setting the kernel parameter $\sigma$ to 3. Fig. 5 shows the accuracy obtained for multisinusoidal RFI for INR ranging from 0 to 10 dB. It should be noted that the change in accuracy with respect to $\nu$ shows similar characteristics for other INR and RFI scenarios. From the figure, it can be seen that the accuracy changes only slightly for $\nu > 0.5$. Therefore, $\nu$ was set to 0.5. The values of $\alpha_i$ in (3) were computed using MATLAB's sequential minimal optimization (SMO) solver [40]. Then, the trained RFI detection model has been evaluated using the test dataset containing both RFI-free and RFI-contaminated radiometer integration windows. By separating the training and testing datasets, the performance of the trained detection model can be evaluated on unseen data. This has been done using the metrics introduced in the following section. The reported values of the performance metrics have been fivefold cross-validated, which means the data matrix has been divided into five folds of approximately equal size, and each fold has been treated as a validation set for the model trained on the remaining four folds. The performance metric values have been averaged over these validation sets for evaluation to prevent the model from overfitting to the training data.
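The fivefold procedure can be sketched as below, again with scikit-learn standing in for the MATLAB solver. The function name `cross_validated_accuracy` is hypothetical, the default `nu` and `sigma` mirror the values quoted above, and the relation `gamma = 1 / (2 * sigma**2)` is an assumption about how the Gaussian kernel width is parameterized.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM


def cross_validated_accuracy(features, labels, nu=0.5, sigma=3.0, n_splits=5, seed=0):
    """labels: 0 for RFI-free rows, 1 for RFI-contaminated rows."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accuracies = []
    for train_idx, test_idx in folds.split(features):
        # Train on the RFI-free rows of the four training folds only.
        free_rows = train_idx[labels[train_idx] == 0]
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=1.0 / (2.0 * sigma ** 2))
        model.fit(features[free_rows])
        # Negative scores fall outside the learned region and are flagged as RFI.
        predicted = (model.decision_function(features[test_idx]) < 0).astype(int)
        accuracies.append(np.mean(predicted == labels[test_idx]))
    return float(np.mean(accuracies))
```

Calling this function over a grid of `nu` and `sigma` values would give an accuracy surface of the kind used above to choose the hyperparameters.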

B. Evaluation Metrics
The performance of the RFI detection algorithm has been mathematically quantified using four evaluation metrics, namely, the accuracy, precision, recall, and the area under the curve (AUC). The accuracy, precision, and recall metrics are mathematically defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$TP$, $TN$, $FP$, and $FN$ indicate the true positive, true negative, false positive, and false negative counts, respectively. Thus, the accuracy denotes the total number of correct classifications (RFI-contaminated classified as RFI-contaminated, and RFI-free classified as RFI-free) out of the total number of cases. The precision is the fraction of the truly RFI-contaminated cases out of the total number of integration windows predicted as RFI-contaminated, which may include RFI-free data as well. Finally, the recall indicates the fraction of the truly RFI-contaminated data correctly classified as RFI-contaminated by the detection algorithm. The values of these performance metrics range from zero to one, where values closer to one indicate better performance in differentiating RFI-free and RFI-contaminated data. The AUC metric, on the other hand, is the integral of the receiver operating characteristics (RoC) curve, which gives the detection performance of the model versus the false alarm rate. Similar to the other metrics, the higher the AUC value is, or the closer it is to one, the better the performance of the detection algorithm is.
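As a worked illustration of these metrics, the sketch below computes them with NumPy, treating RFI-contaminated as the positive class. The AUC is obtained through the rank (Mann-Whitney) identity rather than by explicitly integrating the RoC curve; the two are equivalent, and the function names are hypothetical.

```python
import numpy as np


def detection_metrics(y_true, y_pred):
    """y_true, y_pred: 1 = RFI-contaminated (positive), 0 = RFI-free."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return accuracy, precision, recall


def roc_auc(y_true, scores):
    """AUC via pairwise ranking; a higher score means more RFI-like."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)


y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4]
acc, prec, rec = detection_metrics(y_true, y_pred)   # 5/7, 2/3, 2/3
auc = roc_auc(y_true, scores)                        # 10/12
```

For a one-class SVM, a natural RFI-likeness score is the negative of the decision function value, since negative decision values indicate contamination.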

A. Performance of the OCSVM Algorithm: Detection of a Single RFI Source
The OCSVM RFI detection algorithm has been first tested against a single RFI source with various DC and INR levels contaminating the radiometer measurements. In order to select the most discriminant subset of relevant features listed in Table II for RFI detection, the sums of type I and type II errors, i.e., the combined error static values have been computed for all features for the simulated radiometer dataset and ranked. The feature with the lowest error static value is the best for RFI detection as it indicates that the overlap between the likelihood probabilities for RFI-free and RFI-contamination cases is less likely. For example, Fig. 6 demonstrates the ranking of the "mean of the auto-correlation coefficient" feature as a function of INR and DC of the RFI-contaminated radiometer measurements. Rank 1 corresponding to a specific INR-DC pair implies that the feature has output the lowest error static for RFI-contaminated measurements with those INR and DC values, whereas rank 31 means the highest error static. As seen in the figure, the auto-correlation coefficient, with low error static values, has performed well against most of the RFI cases except when the INR and DC values are both very low. Similar ranking analyses have been performed for all other features as well and the following features have been selected to be included as the better half of the features in the novel OCSVM RFI detection algorithm as their average rankings over all possible INR and DC cases were below 15: variance, power, average over absolute value of the first differences, mean of the auto-correlation coefficient, power spectral maximum, spectral entropy, spectral skewness, spectral kurtosis, spectral crest, spectral flatness, and spectral flux.
The OCSVM model has been trained using the RFI-free radiometer integration periods and the optimal decision boundary around the origin of the RFI-free measurements has been identified in the feature space. The resulting RFI detection performance metrics, demonstrated as functions of the INR and DC of the RFI-contaminated radiometer measurements in Fig. 7, have been fivefold cross-validated. The dataset has been randomly divided into five parts. Then the proposed one-class SVM algorithm has been trained on the RFI-free integration windows from four parts of the dataset (i.e., approximately 80% of the data), and the algorithm has been tested on the remaining part of the data (i.e., around 20% of the data). The performance metric values have been computed on the test set. This process has been repeated five times, and each time one fold is treated as the testing set for the model trained on the remaining four folds. Finally, the average performance has been reported over the number of folds. It can be seen from the figure that nearly perfect accuracy, precision, recall, and AUC have been achieved for INR levels as low as −10 dB. This performance can also be extended to even lower INR cases if the DC is high enough.

B. Performance of the OCSVM Algorithm: Detection of Multiple RFI Sources
The novel RFI detection algorithm has also been tested against multiple RFI signals contaminating the simulated radiometer measurements. Specifically, five RFI sources have been generated for each RFI-contaminated radiometer integration period (the number was kept low for computational simplicity) by varying the amplitude and DC of the RFI signals as described in Section II, resulting in INR values from −45 to 10 dB and DC levels from low (DC ranges from 0% to 25%) to high (DC ranges from 75% to 100%). In total, five hundred RFI-free and RFI-contaminated radiometer integration periods have been generated.
In order to identify the most discriminating features for RFI detection in this multiple RFI sources dataset, a feature selection analysis has been performed. The error statics for each feature have been computed as described in Section V and shown in Fig. 8 for RFI contamination with low and high DC levels. The features with lower error statics (i.e., better features for RFI detection) for both low and high DC interference sources are the variance, power, peak-to-peak distance, the average over absolute value of first differences, mean of the auto-correlation coefficient, distance, inter quantile range, centroid shift, spectral spread, power spectral maximum, spectral entropy, spectral skewness, spectral kurtosis, spectral crest, spectral flatness, spectral flux, and Ljung-Box test. It should also be noted that the error static values for the peak-to-peak distance, centroid shift, spectral spread, spectral skewness, spectral kurtosis, spectral crest, spectral flatness, and spectral flux increase with the DC of the RFI contamination. On the other hand, the standardized moments, i.e., skewness, kurtosis, and $m_5$ to $m_{10}$, and the normality tests including the Jarque-Bera, Lilliefors, and Anderson-Darling tests perform poorly in discriminating between RFI-free and RFI-contaminated measurements. This is expected, as an increased number of interference sources leads to a convergence to a normal distribution similar to that of the RFI-free measurements. Considering these observations, and for the sake of consistency with the single RFI source cases, the variance, power, mean of the absolute value of first differences, mean of the auto-correlation coefficient, power spectral maximum, spectral entropy, spectral skewness, spectral kurtosis, spectral crest, spectral flatness, and spectral flux were selected for the OCSVM algorithm against RFI contamination with multiple sources. However, one should note that the best features to detect RFI may change as a function of the number and type of RFI sources, specifically as the number of sources increases and the total RFI contamination becomes noiselike.
The OCSVM algorithm, trained using these eleven features of the RFI-free radiometer measurements, has been implemented on the dataset, and the accuracy, precision, and recall performance metrics have been calculated. The rows identified as OCSVM 11 in Table III show the values of these metrics for various INR levels. From the table, it can be observed that the RFI detection algorithm is capable of efficiently identifying RFI contamination with INR levels as low as −15 dB.

C. Performance of the State-of-The-Art Algorithms
It is imperative to compare the performance of the novel RFI detection method introduced in this article with that of the traditional state-of-the-art algorithms such as the kurtosis detection and pulse blanking techniques [41], as well as the combination of those with a logical OR operator as implemented in the SMAP mission, hereinafter referred to as the "OR method." Thus, the state-of-the-art algorithms have been implemented on the simulated data where the RFI-contamination included a single RFI source with varying INR and DC levels.
The kurtosis is the fourth standardized moment of the radiometer measurements, which estimates the total tailedness of the integration window. For zero-mean white Gaussian noise representing RFI-free measurements, the kurtosis estimate itself is a Gaussian random variable with a mean value of three. A kurtosis detection algorithm has been implemented on the simulated dataset in a way that radiometer integration periods with kurtosis values more than three standard deviations away from the mean kurtosis value are considered as RFI-contaminated. This threshold would allow only a 0.3% false alarm rate [42]. Fig. 9 shows the accuracy, precision, and recall values of the kurtosis detection as functions of the INR and DC of the RFI-contaminated radiometer measurements. In addition, the threshold has been varied to calculate the empirical AUC values, which are also shown. As seen in the figure, the kurtosis algorithm performs well in detecting the RFI-contaminated cases with INR ≥ −5 dB except for the RFI cases with DC values around 50%. The blind spot of the kurtosis detection against pulsed sinusoidal signals with 50% DC is a well-known fact; thus, this is expected.
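A minimal sketch of such a kurtosis detector is given below. As an assumption made here for brevity, the mean of the kurtosis estimate is fixed at 3 and its standard deviation is taken from the large-sample approximation $\sqrt{24/M}$ for Gaussian data, rather than estimated empirically from RFI-free windows; the function names are hypothetical.

```python
import numpy as np


def sample_kurtosis(x):
    """Fourth standardized moment of one integration window."""
    centered = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(centered ** 4) / np.mean(centered ** 2) ** 2


def kurtosis_detector(windows, n_sigma=3.0):
    """Flag windows whose kurtosis is more than n_sigma deviations away from 3."""
    windows = np.asarray(windows, dtype=float)
    m = windows.shape[1]                      # samples per window (M)
    # Large-sample approximation for Gaussian data: std of the estimate ~ sqrt(24 / M).
    kurt_std = np.sqrt(24.0 / m)
    kurt = np.array([sample_kurtosis(w) for w in windows])
    return np.abs(kurt - 3.0) > n_sigma * kurt_std
```

The test is two-sided because pulsed sinusoidal interference can push the kurtosis either above or below three depending on its duty cycle.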
The pulse blanking method is applied on the power of the radiometer measurements assuming that the RFI is localized in time and large instantaneous amplitudes imply RFI contamination. The detection threshold in this study has been defined in terms of the mean and standard deviation of the RFI-free power measurements. Specifically, power values that deviate by more than three standard deviations from the mean have been flagged as RFI-contaminated, resulting in 0.3% false alarms similar to the kurtosis detection [42]. The performance of the pulse blanking algorithm in terms of accuracy, precision, and recall is shown in Fig. 10. Again, the detection threshold has been varied to calculate the AUC values as well. It can be observed from the figure that the pulse blanking method achieves high accuracy, precision, and recall for RFI cases with INR ≥ −10 dB, depending on the DC value. Compared with the kurtosis detection, pulse blanking eliminates the blind spot for the 50% DC RFI cases.
The OR method combines the detection outputs of the kurtosis detection and pulse blanking algorithms for the maximum likelihood of detection. The method flags a measurement as RFI-contaminated if RFI contamination is detected by either of the two algorithms. Fig. 11 depicts accuracy, precision, and recall values of the OR method as functions of the INR and DC of the RFI-contaminated radiometer measurements. The performance of the OR method has been found to be similar to that of the pulse blanking algorithm.
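The power threshold and the OR combination can be sketched as follows. This is one reading of the rule described above and not necessarily the implementation used in this study: the power is assumed to be averaged over each integration window before thresholding (a sample-level blanker would threshold instantaneous power instead), and the function names are hypothetical.

```python
import numpy as np


def power_detector(windows, free_windows, n_sigma=3.0):
    """Flag windows whose mean power is far from the RFI-free power statistics."""
    free_power = np.mean(np.asarray(free_windows, dtype=float) ** 2, axis=1)
    mu, sd = free_power.mean(), free_power.std()     # RFI-free reference statistics
    power = np.mean(np.asarray(windows, dtype=float) ** 2, axis=1)
    return np.abs(power - mu) > n_sigma * sd


def or_method(*flag_arrays):
    """Logical OR of any number of per-window detector outputs."""
    return np.logical_or.reduce(flag_arrays)


# Example of combining two detector outputs for three windows.
combined = or_method(np.array([True, False, False]), np.array([False, False, True]))
```

Because a window is flagged when either detector fires, the combined false alarm rate can only be equal to or higher than that of each individual detector, which is the price paid for the higher probability of detection.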

D. Comparisons of RFI Detection Performances
The performance of the OCSVM RFI detection algorithm has been compared with that of the state-of-the-art OR algorithm. Fig. 12 shows the performance differences between the two algorithms against single-source RFI contamination. The differences have been calculated by subtracting the accuracy, precision, and recall values of the OR algorithm from those of the OCSVM algorithm as functions of the INR and DC of the RFI-contaminated radiometer measurements. The figure highlights the improvements in RFI detection capabilities, especially for lower INR cases, due to the multidimensional nature of the OCSVM approach with additional features in the time, spectral, and statistical domains of the radiometer measurements.
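The difference metrics can be computed as in the following minimal sketch for a single (INR, DC) cell. The labels and detector outputs are made-up toy values used only to show the arithmetic; they are not results from this study, and the function names are hypothetical.

```python
def detection_metrics(truth, predicted):
    """Accuracy, precision, and recall with RFI-contaminated as the positive class."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    return {
        "accuracy": (tp + tn) / len(truth),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def metric_differences(truth, pred_a, pred_b):
    """Metrics of detector A minus metrics of detector B."""
    a = detection_metrics(truth, pred_a)
    b = detection_metrics(truth, pred_b)
    return {name: a[name] - b[name] for name in a}


# Toy example: four contaminated and four RFI-free windows.
truth = [True, True, True, True, False, False, False, False]
ocsvm_pred = [True, True, True, False, False, False, False, False]
or_pred = [True, False, False, False, False, False, False, True]

print(metric_differences(truth, ocsvm_pred, or_pred))
# {'accuracy': 0.375, 'precision': 0.5, 'recall': 0.5}
```

Positive values indicate that the first detector outperforms the second for that metric, which matches the sign convention of subtracting the OR metrics from the OCSVM metrics.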

VIII. CONCLUSION
In this article, a novel feature-based, multidimensional, one-class support vector machine algorithm for detecting RFI in microwave radiometer measurements has been described and analyzed. RFI-free and RFI-contaminated radiometer measurements have been simulated and described by 31 heterogeneous features that characterize them in the time, frequency, and statistical domains. The novel algorithm then selects the most relevant set of features for RFI detection and computes the hypersurface separating RFI-free and RFI-contaminated measurements based on training with RFI-free measurements only. Radiometer measurements are classified as RFI-free or RFI-contaminated based on their location in the feature space with respect to this hypersurface. Note that characterizing the RFI sources contaminating radiometer measurements is difficult in many remote sensing applications due to the large footprint sizes of space-borne antennas; thus, an algorithm trained only with RFI-free measurements, which can be obtained over regions with low human activity, is highly desirable.
The RFI detection performance of the novel OCSVM algorithm has been compared with that of state-of-the-art methods such as the pulse blanking and kurtosis detection techniques, as well as their combination, i.e., the OR algorithm. It has been demonstrated that the OCSVM algorithm performs better in low-INR RFI contamination cases owing to its multidomain nature, which analyzes various properties of the measurements in the time, frequency, and statistical domains. To highlight the RFI detection performance of the multidomain OCSVM algorithm itself, rather than the number of features utilized in it, a separate analysis has been conducted in which only the power and kurtosis features were used in the OCSVM algorithm, and the resulting RFI detection performance has been compared with that of the OR algorithm against a single RFI source. Fig. 12 demonstrates the differences in RFI detection performance between this 2-feature OCSVM and the OR algorithm in terms of the accuracy, recall, and precision metrics. As shown in the figure, the OCSVM algorithm with only two features still provides RFI detection performance similar to that of the state-of-the-art OR method. Including more relevant features, however, enables the algorithm to detect low-level, i.e., low-INR, RFI contamination. Fig. 14 demonstrates this fact by comparing the performance of the OCSVM algorithm against RFI contamination due to multiple sources when it utilizes only the kurtosis and power features versus all the features reported in Section VII-B. In the figure, the ROC curves demonstrate that the full-scale OCSVM algorithm significantly improves the detection performance against RFI contamination with INR levels between −16 and −25 dB. Similar information can be seen in Table III by comparing the performance metrics of the OR algorithm (OR), as well as the 2-feature (OCSVM 2) and full-scale (OCSVM 11) OCSVM algorithms, against RFI contamination with various INR levels due to multiple sources.
This improvement is important because low-level RFI, which is difficult to distinguish from natural variations in radiometer measurements, is the most challenging problem to overcome in microwave radiometry.
Future research will include investigations on detecting RFI other than pulsed sinusoids, such as chirps and wideband continuous noiselike signals, so that new features that better represent the dynamic RFI environment can be identified and included in the novel algorithm. In addition, a much higher number of RFI sources will be included in the analyses to represent large radiometer footprints observing regions with high human activity. As mentioned in Section VII-B, such cases may weaken the statistical features used in this article for RFI detection; thus, additional features may need to be incorporated in the OCSVM process. Also, the algorithm will be implemented on real radiometer data. The high-resolution SMAP validation experiment 2012 (SMAPVEX12) data measured by the PALS instrument [43] will be utilized, and the resulting RFI detection performance will be compared with that of SMAP's state-of-the-art procedure. Finally, the feasibility of implementing the novel algorithm in real hardware will be tested, as space instruments have varying data sampling, integration, processing, and power limitations, and multidimensional RFI detection procedures like the OCSVM algorithm may require fine temporal and spectral resolution and high precision and can be computationally expensive. In a real remote sensing scenario, the training parameters for the algorithm can be computed offline prior to launch, and the algorithm can be updated during the mission's lifetime by recomputing the hypersurfaces using the most recent RFI-free data observed online.