A Metacognitive Approach to Adaptive Radar Detection

Detecting objects of interest is one of the core functions of radar systems, and doing so in the presence of interference is an ongoing challenge in the domain. Clutter is an especially problematic form of interference that can result in a large number of false alarms. In general, the goal of radar detection systems is to maximize the likelihood of detecting targets while maintaining a constant false alarm rate (CFAR). Adaptive detectors like the generalized likelihood ratio test (GLRT) have been developed to achieve this. However, they are derived assuming that the clutter can be modeled according to a consistent probability distribution. This assumption does not hold in many real-world applications, particularly on airborne or naval systems, which degrades detection performance and eliminates the desired CFAR behavior. In this work, a metacognitive approach to adaptive detection is proposed to achieve CFAR-like behavior over a range of clutter distributions. It is demonstrated that this metacognitive detector maintains CFAR-like behavior when presented with data randomly selected from a range of clutter distribution models (Gaussian, K, and Pareto) and that it matches the performance of the traditional GLRT in Gaussian interference.


I. INTRODUCTION
A fundamental challenge for all sensor systems is that of detection: is an object of interest present in a measurement, or is it absent? Traditionally, the signal processing approach for detection is the venerable statistical hypothesis test [1]. If the signal-present and signal-absent hypotheses are perfectly known, the Neyman-Pearson criterion may be applied to maximize the probability of detection while maintaining a desired false alarm rate. However, in practice the noise and interference statistics must be estimated. As a first step, noise and interference statistics may be assumed to be Gaussian due to the Central Limit Theorem [1]. In this case, adaptive detection techniques such as that of Reed, Mallett, and Brennan (RMB) [2] provide an improvement in detection performance by estimating a threshold for the rejection of correlated interference through a sample matrix inversion (SMI) technique. The RMB approach utilizes a set of training data that is assumed to contain only Gaussian noise to estimate the covariance matrix of the noise and create a matched filter to apply to the signal being evaluated for detection. The output of the matched filter is then compared against a threshold value based on the probability density function (PDF) of the signal-to-interference ratio (SIR). The use of sample data that is assumed to contain only interference to estimate the interference covariance matrix has since served as the basis for adaptive radar detection algorithms.
As an example, radar detection in a ground clutter background may classically be treated as a problem of Gaussian-distributed noise and interference, where the multiplicative clutter term is treated as interference. The family of space-time adaptive processing (STAP) techniques has leveraged and expanded on the SMI technique to overcome the challenges presented by clutter [3], [4]. STAP dynamically changes filtering properties in both space and time [5] to improve the differentiation between target returns and clutter.
Much of the work done in developing adaptive detection algorithms since has focused on defining the test statistic used to evaluate whether the return signal under evaluation is caused purely by noise/interference or by a combination of interference with a target of interest. Early algorithms focused on the case of homogeneous Gaussian noise with unknown covariance, while more recent work has extended these implementations for use in the partially homogeneous and non-Gaussian cases [6], [7], [8], [9], [10], [11], [12], [13], [14]. Non-Gaussian interference has also been observed in the case of additive interference from communications users in congested spectrum [15], [16], [17].
A key issue with existing adaptive radar detection algorithms is their sensitivity to the distribution of the interference they encounter. If the interference deviates from the assumed model, detector properties (e.g., the constant false alarm rate) may not be robust to the actual distribution. While some algorithms, like the adaptive coherence estimator (ACE), are more resilient, they still experience this degradation [6], [18], [19], [20], [21], [22], [23], [24], [25], [26]. That resilience also usually comes at a general performance cost. In real-world settings, it is not realistic to expect mobile radar systems to encounter consistent interference distributions. This issue is well documented in the literature, and much of modern adaptive detection research is focused on finding ways to overcome this limitation.
An area of research with significant potential is the use of machine learning (ML) techniques to augment adaptive detection algorithms (i.e., the development of cognitive detection algorithms) [27], [28]. ML approaches excel at identifying patterns within complex data and classifying the data based on those patterns. Previous work examined techniques for robust threshold selection across a library of potential interference distributions [27], [28]. A number of other deep learning (DL) approaches to adaptive radar detection have been proposed in recent years. A review of recent DL-based detection algorithms is provided in [29].
Some use artificial neural networks (ANNs) to attempt to replicate or replace existing detection approaches. In [30], Nuhoglu et al. proposed using an autoencoder with a bidirectional long short-term memory (LSTM) network to denoise return signals and perform detection. Coluccia et al. [31] presented a method using a K-nearest neighbors (KNN) approach to combine the adaptive matched filter (AMF) and GLRT test statistics for detection in Gaussian noise. Coluccia et al. [32] developed a KNN-based detection statistic for use in K-distributed clutter. The authors of [33] and [34] performed detection using a deep ANN trained to replicate the behavior of a cell-averaging CFAR (CA-CFAR) detector on range-processed data with Gaussian interference, and Wunsch et al. [35] used an ANN trained to perform peak detection in the frequency domain. Mata-Moya et al. [36] used a deep neural network (DNN)-based approach for adaptive thresholding in range-Doppler processed data. Both random decision forests and recurrent neural networks (RNNs) for detection based on I/Q data are explored in [37].
Others have used DL to augment existing detection algorithms. In [38], a DNN is prepended to a GLRT detector to identify and remove nonlinearities in received signals. Kerbaa et al. [39] presented a multiheaded DL system to estimate the shape and scale parameters of Pareto-distributed clutter. Xiang et al. [40] proposed a method of clutter classification based on the combination of kernel density estimation and the batch orthogonal matching pursuit algorithm. An adaptive detection framework for changing clutter backgrounds is proposed in [41] that uses a cumulative sum algorithm to identify areas where the clutter distribution changes, kernel density estimation to classify the new distribution, and a predefined look-up table of detection thresholds.
Each of the approaches above demonstrates the potential of using DL for detection. However, each also faces a number of challenges. Those that utilize range-Doppler (RD) processing struggle with the detection of slower moving objects that fall near or within the clutter ridge. Synthetic aperture radar (SAR) processing is a specialized technique with high computational cost that is not broadly used for real-time radar detection. Larger DL models such as convolutional neural networks (CNNs), RNNs, and autoencoders are computationally costly and can be difficult to train on relatively simple datasets (such as I/Q data) without overfitting. Deep ANNs are interesting for use in detection due to their speed and ability to generalize well. However, the relative simplicity of these models makes it challenging to train them to perform detection over a broad range of clutter distributions without a high level of model confusion.
Existing techniques using deep ANNs display the same core limitations present in non-DL detectors: either an interference distribution is assumed and used with a single threshold-setting ANN, or an ANN is used purely for clutter classification and then a look-up table of detectors/thresholds is applied. Setting thresholds over a wide range of distributions is too complex a problem for a single ANN to learn, particularly given the overlap in certain distribution regions. But if a distribution is assumed, then the model will not generalize well when presented with other clutter distributions. Look-up tables of detectors/thresholds limit coverage to the distributions and parameters contained within them, and grow rapidly as coverage or resolution increases.
From a high level, the pattern recognition capability of ML can be used to categorize the clutter distributions present in training samples and select an appropriate adaptive detection algorithm for that scenario. Additionally, we consider the use of ML regression techniques to set thresholds for an adaptive detector. The system presented in this article is a metacognitive detector that combines these applications. Specifically, the metacognitive detector consists of a set of ML agents specialized to select detector thresholds for particular clutter models and a higher level agent capable of classifying the clutter model most appropriate for the observed data and selecting the corresponding threshold-setting agent. This approach overcomes the limitations of existing DL detection techniques by intelligently identifying statistically unique distribution regions and using a set of specialized agents to set thresholds for each of them. Pairing the specialized threshold agents with a higher level classification agent trained to identify the distribution region allows us to reduce the complexity of the problem space without loss of distribution coverage, enabling our system to maintain a desired rate of false alarms over any set of interference distribution regions. Each specialized agent can operate over the continuous distribution region it was trained for, making the approach much more extensible and streamlined than look-up tables. Additionally, the networks are trained to operate using a downsampled set of ordered statistics found from support samples, an approach that is computationally efficient and which we demonstrate to contain sufficient information for a deep ANN to identify the underlying distribution. It also makes our approach robust to variations in the number of available support samples, with the covariance estimate for the GLRT serving as the driver for sample support requirements. While this metacognitive detector is considered in the context of detecting radar targets in a clutter background, it is also applicable to the more general problem of hypothesis testing with correlated, non-Gaussian statistics, for example, radar detection in the presence of distributed interference sources from telecommunications devices [15], [16].
The rest of this article is organized as follows. Section II-A presents the model used to define the detection problem. The derivation of the GLRT is given in Section II-B. The clutter distribution models used in this work are discussed in Section II-C. Metacognition and its applications to radar are discussed in Section II-D. The architecture of the proposed metacognitive detector is presented in Section III-A, and Section III-B describes the training of the various ML agents. Results of full system testing are provided in Section IV. Finally, Section V includes our conclusions and proposed future work for the metacognitive detector. The contributions of this article are as follows:
1) An approach to downsampling order statistics that improves the stability of deep learning (DL) techniques, allows for a static input layer to a DL system with dynamic input data size, and reduces network training times.
2) A new DL-based technique to determine the threshold for the generalized likelihood ratio test (GLRT) for a variety of non-Gaussian distributions.
3) An innovative DL-based clutter distribution classifier.
4) A metacognitive approach to adaptive radar detection with constant false alarm rate (CFAR)-like behavior using an ensemble of the aforementioned clutter identification and threshold selection models.

II. BACKGROUND
Radar detection algorithms conduct a hypothesis test to determine whether a sample-under-test contains a target, or whether it is simply distributed according to the null hypothesis (noise and potentially interference). Usually a test statistic is formed and a threshold determined, where signals with a return above the threshold are declared detections. Fundamentally, the threshold is determined by the assumed distributions for each hypothesis and the desired tradeoff between the probability of detection (i.e., sensitivity) and the probability of false alarm. Determining the threshold is challenging when there are other reflectors in the environment that produce radar returns similar to the target. These additional reflectors are known as clutter and can be comprised of foliage, buildings, roads, and many other items in the environment. Ground clutter (roads, foliage, ground returns) produces distributed radar returns that can exceed the detection threshold. The detection of clutter is undesired in most applications, as it produces ambiguity in the target detection performance of the radar and generates false alarms, or false detections, of targets in the environment. To mitigate the impact of clutter and noise altering the assumed null distribution statistics, the class of adaptive detectors has been developed. Adaptive detectors use test statistics that incorporate training data to dynamically adjust the threshold or mitigate the impact of changing noise variance. The most desirable property of adaptive detectors is the constant false alarm rate (CFAR) property [53], where the detector provides a desired false alarm rate regardless of the input noise variance.

A. Mathematical Model
In order to frame the rest of this article, it is important to first establish the problem space. Consider the scenario in which a pulsed-Doppler radar transmits m pulses over the course of a coherent processing interval (CPI). The return signal is received, processed, and sampled appropriately. For a given range bin, this results in the complex signal

z = [z_1, z_2, ..., z_m]^T ∈ C^m. (1)

The function of a detection system is to determine whether or not z contains a signal return from a target of interest. If it does, then z is comprised of an additive combination of the signal of interest and interference; otherwise, z consists entirely of interference. What has just been described is a binary hypothesis test. The two hypotheses can be represented as

H_0: z = n
H_1: z = s + n. (2)

In (2), n is the m-dimensional complex interference vector and s is the m-dimensional target return vector. Throughout the literature, the interference is modeled as n = sqrt(τ) x, where τ is referred to as the texture and x as the speckle. Here, τ is positive and represents the localized clutter power, while x represents local backscattering and is typically modeled as an m-dimensional Gaussian random vector with zero mean and normalized Hermitian covariance matrix M. For convenience in simulation, here we adopt the model M_{i,j} = ρ^|i−j| (1 ≤ i, j ≤ m). The correlation coefficient ρ is positive in the range 0 ≤ ρ ≤ 1, with higher values modeling more correlated interference and lower values modeling less correlated interference. The results presented in this article are general and applicable to any scenario where the interference term is correlated. In practice, for a single-channel radar a Doppler power spectral density (PSD) model would be adopted [54], and for a STAP application a more advanced covariance model would be introduced [3].

IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, VOL. 60, NO. 1, FEBRUARY 2024
The target return signal is modeled as s = αp, where α is an unknown positive random value representing the amplitude of the signal return, meant to capture the effects of channel propagation, and p is the m-dimensional known steering vector

p = [e^{−j2π f_D T_s}, ..., e^{−j2π m f_D T_s}]^T,

where f_D is the Doppler frequency and T_s is the pulse repetition interval. We can now rewrite the previous hypotheses as

H_0: z = n
H_1: z = αp + n.

In addition to the sample under test, it is assumed that there are K additional samples containing only interference. These are typically referred to as training or secondary data in the literature, and here they are represented as Z_K = [z_1, z_2, ..., z_K].
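To make the interference model concrete, the correlated speckle and texture above can be simulated directly. The following is a minimal sketch in Python/NumPy; the function names are our own, and the speckle is drawn through a Cholesky factor of M:

```python
import numpy as np

def speckle_covariance(m, rho):
    # Hermitian covariance with one-lag correlation: M[i, j] = rho**|i - j|
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def sample_interference(m, rho, tau, rng):
    # n = sqrt(tau) * x, with speckle x ~ CN(0, M) drawn via a Cholesky factor
    L = np.linalg.cholesky(speckle_covariance(m, rho))
    x = L @ (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
    return np.sqrt(tau) * x

rng = np.random.default_rng(0)
n = sample_interference(16, 0.9, 1.0, rng)  # one Gaussian (tau = 1) snapshot
```

Because M becomes singular at ρ = 1, this sketch assumes ρ strictly below 1.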

B. Generalized Likelihood Ratio Test
In what would prove to be a pivotal work [55], Kelly proposed the generalized likelihood ratio test (GLRT) as a new adaptive radar detector for use in the presence of unknown homogeneous Gaussian noise. This detector has been thoroughly investigated and expanded upon in subsequent works [56], [57], [58], [59], [60], [61]. As its name suggests, it is an extension of a likelihood ratio test (LRT). In order to set up this likelihood test, two joint probability density functions (PDFs) are first defined for the input signals, one for H_0 and one for H_1 [55]. For the null hypothesis, the signal consists only of interference that is assumed to be zero mean Gaussian and identically distributed to the training signals, so the joint PDF becomes

f_0(z, z_1, ..., z_K) = [1 / (π^N |M|)]^{K+1} exp(−tr{M^{−1} [z z^H + Σ_{k=1}^K z_k z_k^H]}) (5)

where N is the number of pulses in each sample, equivalent to m in (1). In this case, only the interference covariance matrix M is unknown.

For H_1, where z contains both a target return and interference, the PDF becomes

f_1(z, z_1, ..., z_K) = [1 / (π^N |M|)]^{K+1} exp(−tr{M^{−1} [(z − αp)(z − αp)^H + Σ_{k=1}^K z_k z_k^H]}) (7)

Here, both the interference covariance matrix M and the target return signal amplitude α are unknown. Maximum likelihood estimation is used to independently maximize each of these PDFs over all unknown parameters, and then the ratio of the two, referred to as the test statistic, is compared with a threshold to determine target detection. This takes the form

λ = [max_{α,M} f_1] / [max_M f_0] ≷ λ_0. (9)

Plugging (5) and (7) into (9) and performing the maximization yields the test statistic

λ = (1 + z^H S^{−1} z) / (1 + z^H S^{−1} z − |p^H S^{−1} z|^2 / (p^H S^{−1} p)) (10)

The true interference covariance matrix is generally not known and is estimated as

S = Σ_{k=1}^K z_k z_k^H. (11)

Substituting (11) and η = (λ − 1)/λ into (10) yields the more well-known form of the GLRT test statistic

|p^H S^{−1} z|^2 / [(p^H S^{−1} p)(1 + z^H S^{−1} z)] ≷ η. (12)

Thresholds are selected in order to maintain some desired probability of false alarm (P_fa), which is defined as

P_fa = Pr(λ > λ_0 | H_0). (13)

In [56], P_fa for the GLRT detector is derived as

P_fa = λ_0^{−(K−N+1)}. (14)

Note that this is only a function of K, N, and λ_0, which proves that the GLRT detector is CFAR with respect to the clutter covariance.
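The well-known form of the GLRT statistic and its CFAR threshold selection can be sketched directly. This is a hedged illustration (function names are ours): the statistic is Kelly's normalized form, which lies in [0, 1), and the threshold follows from Kelly's closed-form relation P_fa = (1 − η_0)^{K−N+1} for that normalized statistic:

```python
import numpy as np

def kelly_glrt_stat(z, p, Z_train):
    # Kelly's normalized GLRT statistic; Z_train holds K interference-only
    # snapshots as columns, and S is the unnormalized sample covariance
    S = Z_train @ Z_train.conj().T
    Si = np.linalg.inv(S)
    num = np.abs(p.conj() @ Si @ z) ** 2
    den = (p.conj() @ Si @ p).real * (1.0 + (z.conj() @ Si @ z).real)
    return num / den

def kelly_threshold(pfa, K, N):
    # invert P_fa = (1 - eta0)**(K - N + 1) for the CFAR threshold eta0
    return 1.0 - pfa ** (1.0 / (K - N + 1))

rng = np.random.default_rng(1)
N, K = 16, 32
p = np.exp(-2j * np.pi * 0.1 * np.arange(N))          # steering vector
Z_train = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
z = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
t = kelly_glrt_stat(z, p, Z_train)
eta0 = kelly_threshold(1e-2, K, N)
```

With K = 32 and N = 16, a target P_fa of 10^−2 gives a threshold of about 0.237, independent of the clutter covariance, which is the CFAR property in action.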
The GLRT detector broadly serves as the benchmark against which other adaptive detectors are judged. For the case of matched target signal returns in homogeneous interference, it typically displays the best probability of detection with a relatively low false alarm rate. On the other hand, it is more complex and computationally costly than algorithms like the AMF [62]. The GLRT detector is also designed assuming that the interference is zero mean Gaussian with an unknown covariance matrix, providing it with CFAR behavior with respect to the clutter covariance, but not to its interference power level [18]. So long as the interference is homogeneous, this is not an issue, but it degrades clutter rejection in more complex clutter scenarios.
The GLRT was constructed for the detection of a signal with a constant but unknown amplitude. It has been shown to have better detection performance and CFAR properties compared with existing statistics [63]. The GLRT has been applied across a multitude of radar applications. For example, Kakoolvand et al. [64] utilized the GLRT method to efficiently generate a difference image (DI) for change detection (CD) using synthetic aperture radar (SAR) images. They utilized the GLRT due to its important applications within multichannel radar detection involving both white Gaussian noise and spherically invariant random process clutter. Their system showed superior performance, with the best Kappa coefficient and percentage correct classification (PCC) on four datasets: Ottawa, San Francisco, Farmland C, and Inland Water [64]. Luong et al. [65] extended the GLRT to quantum radar and explored the question of what test statistic or detector function a radar should utilize when conducting the hypothesis test. They built upon two previous works [66], [67] to include explicit expressions for the GLR detector and the associated receiver operating characteristic (ROC) curve, including a small-N analysis. Due to its demonstrated robustness and potential to generalize across applications, the GLRT will serve as the core adaptive algorithm within the machine learning framework developed in Section III.

C. Clutter Distributions
As previously noted, clutter is one of the primary sources of false alarms in radar detection algorithms. Most adaptive detection algorithms are developed assuming that the clutter can be modeled as a particular probability distribution. The test statistics for these detectors are derived based on this assumption, and the performance characteristics of these detectors only hold true in scenarios where the clutter matches the model used to derive the algorithm's test statistic. While Kelly's GLRT was derived using the assumption of Gaussian clutter, test statistics can be derived for adaptive detection algorithms based on other distribution models [7], [10], [11], [13], [68].
The Gaussian distribution is the most broadly used to model clutter in radar detection. It works well in many scenarios and is simple to generate and analyze. However, it fails to accurately represent clutter returns from more complex environments like the ocean or urban settings. Compound Gaussian distributions have been broadly applied in the literature to better model these more challenging clutter scenarios. In this work, three of the more commonly used clutter distributions are used in the development and testing of the proposed system: Gaussian, Pareto, and the K distribution.
As was described in Section II-A, the interference (n) is modeled as a multiplicative combination of speckle (x) and texture (τ). Throughout this work, speckle is modeled as a complex zero-mean Gaussian vector with Hermitian covariance matrix (M) defined by the one-lag correlation coefficient (ρ). The clutter texture is used to represent the local power level of the clutter and is modeled as a positive real value that remains constant over the course of a signal but can vary between signals. Modeling the clutter in this way as n = sqrt(τ) x means that the clutter texture can be used to easily adjust the distribution of the generated data so that it fits either a Gaussian or compound Gaussian distribution [13], [69]. As described in [70], this results in the PDF of a clutter-only signal taking the form

f(z) = ∫_0^∞ [1 / (π^N τ^N |M|)] exp(−z^H M^{−1} z / τ) f_τ(τ) dτ. (15)

This allows for the generation of different clutter distributions based on the selection of τ. When the texture parameter is set to τ = 1, (15) simplifies to

f(z) = [1 / (π^N |M|)] exp(−z^H M^{−1} z) (16)

and the clutter is modeled as a complex Gaussian distribution, as in (5).
In order to generate the K-distributed clutter model, τ is set to be Gamma distributed with shape parameter a and scale parameter b

f_τ(τ) = [τ^{a−1} / (b^a Γ(a))] exp(−τ/b), (17)

where Γ(·) is the Gamma function. The amplitude PDF of the clutter can then be represented as a K distribution [70]

f(r) = [4 r^a / (Γ(a) b^{(a+1)/2})] K_{a−1}(2r / sqrt(b)), (18)

where K_{a−1}(·) is the modified Bessel function of the second kind.
Pareto distributed clutter can be modeled by using an inverse gamma texture with shape parameter a and scale parameter b [13], [71]

f_τ(τ) = [b^a / Γ(a)] τ^{−a−1} exp(−b/τ). (19)

In each of these cases, adjusting the shape and scale parameters for the clutter texture τ also modifies the shape and scale (respectively) of the K and Pareto distributions, as one would expect. As such, these will from here on be referred to in the broader context as the shape and scale parameters for the clutter distribution. All interference samples are assumed to be independent. Since this work explores a novel approach to detection, the scale parameter is held constant throughout testing to reduce the number of parameters to be explored. The shape parameters for both the K and Pareto distributions have a substantial impact on the underlying distribution characteristics, and testing is performed over a broad range of shape parameters (values ranging from 0.1 to 10 for K and 0.5 to 10.5 for Pareto). This produces a vast range of distribution variation and demonstrates that the proposed approach can generalize well and learn to operate over a range of varying distributions. Given that this approach focuses on identifying statistically unique distribution regions and using a combination of a discriminator model to identify those regions and a set of specialized threshold setting ML models trained for each region, it can reasonably be assumed that the proposed approach could trivially be extended to cover a range of scale parameters. At most, simultaneously sweeping both the scale and shape parameters would simply add additional distinct distribution regions. This would only require the training of additional specialized threshold agents and increase the size of the label set for the discriminator model.
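The three clutter families thus differ only in how the texture τ is drawn. A short sketch of that selection (the function name and interface are our own; the inverse-gamma draw is realized as the reciprocal of a Gamma draw):

```python
import numpy as np

def sample_texture(dist, a, b, rng):
    # draw one texture value tau: a Gamma texture yields K-distributed clutter
    # amplitudes, an inverse-gamma texture yields Pareto-distributed clutter
    if dist == "gaussian":
        return 1.0                                      # tau = 1: Gaussian clutter
    if dist == "k":
        return rng.gamma(shape=a, scale=b)              # Gamma(a, b) texture
    if dist == "pareto":
        return 1.0 / rng.gamma(shape=a, scale=1.0 / b)  # inverse-Gamma(a, b)
    raise ValueError(f"unknown distribution: {dist}")

rng = np.random.default_rng(0)
tau = sample_texture("k", 2.0, 1.0, rng)  # one K-clutter texture draw
```

Multiplying a correlated Gaussian speckle vector by sqrt(τ) then produces one interference snapshot from the chosen family.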
Distributions under the class of compound Gaussian models, also known as spherically invariant random processes (SIRPs), have been widely shown to fit a variety of clutter models and communications interference models [15], [72], [73], [74]. In particular, for clutter the compound Gaussian model can be shown to arise from an extension of the central limit theorem [70]. Further, due to the closure properties of the sampled vectors from a SIRP, denoted spherically invariant random vectors (SIRVs), any linear transformation of a SIRV results in a SIRV with the same instantiation of the characteristic random variable τ [74], [75], [76]. In other words, the statistical distribution is unchanged under linear transformations of the data; only the mean and covariance matrix may change. Hence, these models are robust to the operations we investigate and are fundamentally applicable across a variety of radar scenarios beyond what is examined in this work. A wide range of common clutter models can be fit to SIRV distributions, including homogeneous clutter, compound clutter, and heavy-tailed clutter. However, they do not account for heterogeneous clutter or clutter contaminated with target interference. These types of clutter models will be explored in future work. To further demonstrate the broad applicability of the proposed detection method, the system has also been tested on clutter with a lognormal distribution that it has not been trained to recognize. This is a challenging heavy-tailed clutter distribution that is not part of the SIRP class.

D. Metacognition
Before exploring metacognition, it is necessary to first understand the meaning of cognition in radar applications. Cognitive radar (CR) is achieved by coupling the inherent qualities of radar with ML techniques. It is largely considered to be made up of three key parts: intelligent signal processing based on the sensed environment, feedback from the receiver to the transmitter, and preservation of information contained in the radar returns [77], [78]. In this manner, a CR system can classify, or "learn," things about its environment to adaptively determine an appropriate system response.
Metacognition is a relatively new technique that adds a second layer of learning to a CR system. This technique allows the overall system to monitor how well a selected CR system response achieves the desired outcome and adjust/modify the CR system as needed. This is an extension of the traditional knowledge-aided approach. For example, Capraro et al. described an approach for metacognition based on an Airborne Intelligent Radar System (AIRS) [79]. The goal of this system is to optimize performance while maintaining P_fa and maximizing P_d. The proposed system inputs environmental data to intelligently assign weights and other parameter values to filter/CFAR algorithm pairs. The weighted and summed results are then analyzed by a performance processor to select the most appropriate filter/CFAR algorithm pair for the current environment [79].
Gadhiok et al. [80] described a metacognitive approach for wireless railway communications. The authors present a case study and initial results using a simulation of a metacognitive radio for train control. Here, the goal is to maintain communication performance in a high-noise environment. The cognitive engine relies on environmental data to adapt the radio's transmit power to achieve the required performance. Two cognitive strategies, and the performance tradeoffs associated with each, are discussed. A metacognitive engine is employed to select when to use each of these strategies in an effort to optimize overall performance [80].
More recently, Martone et al. [81] investigated a metacognitive radar (MCR) model for dynamic spectrum access in congested environments. This MCR model monitors the congestion and complexity level of the environment and selects among different spectrum sharing CR strategies. It then samples the CR strategies to evaluate performance before selecting the one that yields the best performance given the current spectral conditions. The authors implemented a simulation that demonstrated the capability of the MCR model to improve performance by accurately identifying and adapting to a changing spectral environment [81].

III. APPROACH

A. Model Architecture
The metacognitive architecture is designed to classify the clutter distribution (Gaussian, K, or Pareto) into large shape parameter regions, select the threshold for the adaptive GLRT detector accordingly, and combine the classification with the GLRT to generate a dynamic detection. This mitigates the issues with assuming Gaussian-distributed clutter and allows the detector to be aware of the clutter distribution observed in the signal. As a result, the detector is able to more robustly improve the detection performance and better maintain P_fa for differently distributed clutter. This metacognitive detector architecture is shown in Fig. 1.
This metacognitive structure therefore fuses the concepts of adaptive radar detection (through use of the GLRT), knowledge-aided radar (through the selection of the interference statistics), and metacognition. The following sections discuss 1) the data preprocessing, 2) the discriminator design, 3) the threshold selector design, and 4) the ensemble evaluator.
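Conceptually, the ensemble evaluation reduces to a small dispatch step: classify the clutter region, look up the matching threshold agent, and compare the GLRT statistic with the selected threshold. A schematic sketch, where all component names are placeholders standing in for the trained models and the GLRT routine:

```python
import numpy as np

def metacognitive_detect(z_ml, z_sut, S, p, discriminator, selectors, glrt_stat):
    # 1) higher-level agent: classify the clutter-distribution region
    probs = discriminator(z_ml)            # probability over the region classes
    region = int(np.argmax(probs))
    # 2) specialized agent: select the GLRT threshold for that region
    eta = selectors[region](z_ml)
    # 3) adaptive detector: compare the GLRT statistic against the threshold
    return glrt_stat(z_sut, p, S) > eta, region

# stand-in components for illustration only (not the trained models)
disc = lambda z: np.array([0.1, 0.6, 0.05, 0.05, 0.1, 0.05, 0.05])
selectors = [lambda z: 0.5] * 7
hit, region = metacognitive_detect(None, None, None, None, disc, selectors,
                                   lambda z, p, S: 0.7)
```

The design keeps the detector itself fixed; only the threshold (and implicitly the clutter model it was tuned for) changes with the classified region.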
1) Data Preprocessing: Section II-A discusses the format of the data generated to represent the radar domain problem space. In this architecture, the data block represents the sample under test and K = 32 support samples, all of length N = 16. The architecture performs some preprocessing of the data before it is input to the different sections of the metacognitive detector.
To generate the input to the discriminator and the threshold selectors, the (16 × 32) matrix of support samples is vectorized into 512 complex data points and then ordered. These ordered statistics are then downsampled by a factor of 8 to reduce the dimensionality that the machine learning must address. As a result, a downsampled set of uniformly spaced ordered statistics with n = 64 samples is available for the discriminator to learn, as described in the following:

z_ML = ↓_8(Or(vec(Z_K))) = [z_ds,(1), z_ds,(2), ..., z_ds,(n)]^T (20)

where Or(·) is the order function, vec(·) refers to vectorization, and ↓_8(·) denotes downsampling by a factor of 8. Using downsampled ordered statistics as the inputs to the models has two key benefits. First, it provides additional distribution information to the ML models. There is a high degree of variability in the values contained within raw signal returns, and training an ML agent on such data will typically cause it to either overfit or fail to converge to a solution. Using a set of ordered statistics requires very little preprocessing, reduces the variability of the inputs to the ML models, and provides crucial information related to the underlying distribution that the models were demonstrated to learn. This is in part what allows the proposed system to generalize effectively. In addition, the use of a constant number of downsampled ordered statistics makes the proposed system largely agnostic to the amount of sample support available, as long as at least 64 (in this case) values are available. At worst, some models in the system may require retraining, but no structural/architectural changes would be needed. Testing has not been performed in this work to identify the minimal number of ordered statistics required for model training, but future work will examine this optimization problem.
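The preprocessing in (20) amounts to a vectorize-sort-downsample pipeline. A minimal sketch, with one assumption worth flagging: the complex support samples are ordered here by magnitude, which is one reasonable convention for ordering complex values:

```python
import numpy as np

def downsampled_order_stats(Z_K, factor=8):
    # vec -> order -> downsample, as in (20); ordering the complex support
    # samples by magnitude is an assumed convention in this sketch
    v = Z_K.flatten()                  # vec(Z_K): N*K complex data points
    ordered = np.sort(np.abs(v))       # order statistics of the amplitudes
    return ordered[::factor]           # keep every 8th value: n = N*K/8

rng = np.random.default_rng(2)
Z_K = rng.standard_normal((16, 32)) + 1j * rng.standard_normal((16, 32))
z_ml = downsampled_order_stats(Z_K)    # 512 samples -> n = 64 model inputs
```

Because the output length depends only on the downsampling factor and the support size, the same network input layer can serve any support set of at least 64 samples after adjusting the factor.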
The GLRT (as discussed in Section II-B) requires a clutter covariance estimate and the sample under test in order to generate a test statistic. The clutter covariance estimate is calculated using the training samples Z_K as shown in (11). Finally, the GLRT needs the sample under test z_SUT, which will include the clutter distribution and may or may not include a target.
2) Discriminator: The discriminator is a classification DNN, which takes z_ML as its input. This classifier trains to identify the patterns in seven classes of clutter and outputs a probabilistic classification of which classes the input aligns to most. The seven classes chosen for this metacognition system are Gaussian, three regions of the K-distribution, and three regions of the Pareto distribution. The K- and Pareto distributions represent regions in the shape parameter a, but the DNN truth labels are classification labels ranging between 0 and 6. The individual samples were generated with a random shape parameter in the labeled region. Regions and labels are shown in Table I.
The model design is shown in Fig. 2 and is a two-hidden-layer DNN using the exponential linear unit (ELU) activation function. The output is a probabilistic classification and uses sparse categorical cross entropy (SCCE) as the loss function. Given the seven output classes, the output takes the form [P(z_ML ∼ C_G), P(z_ML ∼ C_K1), . . ., P(z_ML ∼ C_P3)] with a summed probability of 1.
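A Keras sketch of such a classifier is given below. The hidden-layer width (64) and the Adam optimizer are assumptions for illustration; the exact architecture is the one given in Fig. 2, and the learning rate of 10^−4 is taken from the training description later in this section.

```python
import numpy as np
import tensorflow as tf

def build_discriminator(n_inputs=64, n_classes=7, hidden=64):
    """Two-hidden-layer ELU classifier with a softmax output trained
    under SCCE. Hidden width and optimizer are illustrative assumptions."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(hidden, activation="elu"),
        tf.keras.layers.Dense(hidden, activation="elu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# The output is a length-7 probability vector over the clutter classes.
probs = build_discriminator()(np.zeros((1, 64), dtype="float32")).numpy()
```

The softmax output directly provides the probabilistic classification [P(z_ML ∼ C_G), . . ., P(z_ML ∼ C_P3)] that the ensemble evaluator consumes.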
3) Threshold Selection: This metacognitive system has a single Gaussian GLRT detector and six DNNs trained to select the continuous threshold value within a region for the detectors. Each of the K- and Pareto distribution regions has a threshold selector DNN; these have identical model structures but are each uniquely trained. The input to the DNNs is the set of n = 64 downsampled order statistics z_ML, and the inputs to the GLRT are the estimated covariance matrix S and the sample under test z_SUT. The model for each DNN is shown in Fig. 3.
In order to complete supervised training, the DNNs needed labels for the optimal threshold at discrete points within their region. A halving line search algorithm was used to identify an optimal threshold within an error range ε. The relationship between the shape parameter and the optimal threshold for a given P_fa was monotonically decreasing for both the K-distribution and Pareto. Therefore, the optimal threshold was found using Algorithm 1.
Algorithm 1 uses Monte Carlo analysis to identify optimal thresholds for training the regression models. Specifically, this algorithm iterates toward the optimal threshold for the target P_fa and then halves the threshold delta until it converges on the optimal threshold within some margin (ε). Monte Carlo methods are commonly used to identify thresholds for covariance inversion detectors like the GLRT due to a lack of closed-form expressions to calculate thresholds in non-Gaussian clutter. It is, of course, infeasible to use Monte Carlo methods in real-time systems, so many classical approaches to dynamic thresholding in non-Gaussian scenarios use them to populate a look-up table of threshold values. Such approaches are limited by their size and resolution, limiting their utility in real-world applications. Here, we use Algorithm 1 purely for generating training labels. Each region had a spread of threshold labels determined for the target P_fa = 10^−4 presented in Table II. In Algorithm 1, C_n represents the filter classes and a_i are the shape parameters in Table II. λ_0 is the threshold and Δλ is the amount the threshold changes each loop iteration. P_fa,target is the desired probability of false alarm and P_fa is the current value calculated given the current threshold. The optimal GLRT thresholds for the K- and Pareto distributions are shown in Figs. 4 and 5. Each of the threshold selector DNNs used mean absolute error (MAE) as its error function during training.
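The halving line search can be sketched as below. This is a sketch of the stopping logic only, not the full Algorithm 1: the Monte Carlo H_0 test statistics are assumed to be precomputed, the initial threshold and delta values are illustrative, and the monotonically decreasing Pfa-versus-threshold relationship noted above drives the step direction.

```python
import numpy as np

def halving_line_search(h0_stats, pfa_target=1e-4, lam=0.5, dlam=0.25, eps=1e-6):
    """Iterate toward the threshold achieving the target Pfa over Monte
    Carlo clutter-only (H0) statistics, halving the threshold delta each
    iteration until the empirical Pfa is within eps of the target."""
    pfa = np.mean(h0_stats > lam)
    while abs(pfa - pfa_target) > eps and dlam > 1e-12:
        lam += dlam if pfa > pfa_target else -dlam  # raise threshold if Pfa too high
        dlam /= 2.0                                 # halve the threshold delta
        pfa = np.mean(h0_stats > lam)
    return lam
```

Because the delta halves every iteration, the search terminates in a bounded number of steps; the achievable precision is ultimately limited by the Monte Carlo sample size.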
4) Ensemble Evaluation: The final ensemble evaluation system used the probabilistic classification from the discriminator to decide which of the seven threshold-selector/GLRT pairs processes the current sample and outputs its detection decision for the system. This process was completed automatically for each sample (z_SUT and Z_K), with the evaluator choosing the maximum-probability class from the discriminator evaluation.
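The ensemble step reduces to an argmax over the discriminator output followed by a threshold comparison, as in this sketch; `glrt_stat` and `thresholds` are placeholders standing in for the GLRT test statistic and the seven per-class threshold sources.

```python
import numpy as np

def ensemble_decision(class_probs, glrt_stat, thresholds):
    """Route the sample to the detector pair for the maximum-probability
    clutter class and compare the GLRT statistic to that class's
    selected threshold. Returns (detection decision, chosen class)."""
    c = int(np.argmax(class_probs))        # discriminator's chosen class
    return glrt_stat > thresholds[c], c

# Example: a sample the discriminator calls Gaussian (class 0)
probs = [0.90, 0.04, 0.02, 0.01, 0.01, 0.01, 0.01]
detected, chosen = ensemble_decision(probs, glrt_stat=0.7, thresholds=[0.5] * 7)
```

Only the selected pair's threshold matters for the final decision, so the remaining six threshold selectors need not be evaluated for a given sample.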

1) Discriminator:
Training and evaluation of the discriminator included two parts. First, the clutter distributions with shape parameters for the chosen classes were evaluated regarding their similarity. To accomplish this similarity comparison, the Jensen-Shannon divergence [82] between every pair of classes was calculated. The Jensen-Shannon divergence is a symmetric divergence to the average of two distributions based on the Kullback-Leibler divergence [83]. The Kullback-Leibler divergence provides a measure of statistical distance by calculating the relative entropy between two probability distributions (P and Q). For discrete distributions within a space X, the Kullback-Leibler divergence is

D_KL(P||Q) = Σ_{x∈X} P(x) log(P(x)/Q(x)).

The Kullback-Leibler divergence is not symmetric (D_KL(P||Q) ≠ D_KL(Q||P)), but the Jensen-Shannon divergence is made symmetric by calculating the Kullback-Leibler divergence with respect to a new distribution M = (1/2)(P + Q), which is the average of the original distributions.
Fig. 6. JS-divergence for the system's seven classes.
The Jensen-Shannon divergence is

D_JS(P||Q) = (1/2)D_KL(P||M) + (1/2)D_KL(Q||M).

The Jensen-Shannon divergence matrix is shown in Fig. 6 and was used to evaluate how similar the selected classes were, with 0 being identical and 1 being completely dissimilar. To ensure consistency, a total of 20 000 samples of Z_K (length-512 complex values) for each class were generated, and the average Jensen-Shannon divergence over those samples is presented in Fig. 6.
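The two divergences above can be computed directly for discrete distributions, as in this minimal sketch (base-2 logarithms are assumed, which bounds the Jensen-Shannon divergence in [0, 1], matching the 0-identical/1-dissimilar scale used for Fig. 6):

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D_KL(P||Q) in bits."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                       # terms with P(x) = 0 contribute 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric KL against the mixture
    M = (P + Q)/2."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Identical distributions yield 0, and distributions with disjoint support yield exactly 1 in this base-2 convention.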
The second component in training of the discriminator was to generate data labels and train the discriminator. The discriminator model was written in Python and trained with TensorFlow. For training, the data were generated as 20 000 length-512 complex vectors (Z_K) for each of the seven classes in C_dist with the average shape parameter a from Table I. The samples were preprocessed into z_ML from (20) and labeled with a class target [0 to 6]. The training set was 75% of the total data, with a training validation set of 12.5% and an evaluation testing set of the final 12.5%. The discriminator DNN was trained with a learning rate of 10^−4 and had a stopping criterion using a patience factor of 30. The patience factor represents a stopping criterion for training through evaluation of the validation dataset. Specifically, training ends after a set number of successive epochs in which the validation metric (loss/accuracy) is no better than the stored best metric. This method is used to have statistical confidence that the DNN has trained and is either stable or beginning to overfit. Following the patience stopping condition, the model weights with the best metric are returned.
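The patience stopping rule described above can be sketched as a simple loop over per-epoch validation losses; this is an illustration of the rule, not the paper's training code, and monitoring validation loss (rather than accuracy) is an assumption.

```python
def best_epoch_with_patience(val_losses, patience=30):
    """Stop after `patience` successive epochs with no improvement in
    the validation metric; return the epoch whose weights would be
    restored as the best."""
    best_loss, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:               # improvement: reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:           # patience exhausted: stop
                break
    return best_epoch
```

In TensorFlow this behavior corresponds to `tf.keras.callbacks.EarlyStopping(patience=30, restore_best_weights=True)`.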
The discriminator training and validation accuracy are presented in Fig. 7 and show stable training for the DNN model. These training results are very promising for the success of the metacognitive system due to the very high classification accuracy (∼98%) and near-zero confusion in classification. Compared to the Jensen-Shannon results, it is evident that the distributions had enough uniqueness in their order statistics for the discriminator to identify the clutter class with high accuracy. The classification confusion matrix for the discriminator is shown in Fig. 8.
2) Threshold Selection: In order to develop a system that could detect targets in diverse clutter distributions while also maintaining CFAR-like performance, the distribution space was subdivided into threshold regions. Seven regions were identified, and each had a threshold selector implemented and trained to perform in that region with the Gaussian GLRT detector. The first region was the simple Gaussian case for which the GLRT was designed. In this region, the threshold was selected using the closed-form expression found in (14). The other six regions were drawn from the K- and Pareto distribution models: 1) low shape parameter K; 2) medium shape parameter K; 3) large shape parameter K; 4) low shape parameter Pareto; 5) medium shape parameter Pareto; 6) large shape parameter Pareto.
The shape parameter values for these regions can be found in Table II, and plots of the ideal threshold values can be found in Figs. 4 and 5. The vertical dashed lines in these figures depict the transition boundaries between each of these regions, and it can be readily observed that the threshold curves are approximately linear within each of these regions. Regression DNNs were used to set the thresholds used by the six GLRTs operating in these regions. In this work, the regression DNNs trained to select thresholds are referred to as the threshold selectors. Each of the six threshold selectors used for the full system was trained on data generated using the shape parameters listed in Table II. 20 000 test samples z_SUT and associated support samples Z_K were generated for each. All samples generated for training of the threshold setting agents were clutter-only signals. The support samples were used to generate a set of order statistics z_ML, and training labels were generated according to Table II. 75% of the data was used for training, 12.5% for validation, and 12.5% for testing. Training data and labels were jointly shuffled prior to training the models. Each model was created and trained in Python with TensorFlow using the MAE loss function. This was selected due to the threshold output being bounded between 0 and 1. The training loss curves for the threshold selectors can be found in Figs. 9 and 10.
After training, each of the threshold selectors was integrated with the GLRT detector and performance was verified experimentally. Performance verification was done by generating 100 000 test samples and associated support samples at each of the distributions/shape parameters listed in Table II. H_0 and H_1 samples were generated and used to verify the P_fa and P_d performance of the detectors. Note that the P_d performance was not the focus of this work and was only verified with an SIR of 35 dB. P_fa performance is depicted in Figs. 11 and 12, while the P_d performance is shown in Figs. 13 and 14. In these figures, the vertical dashed black lines represent the transition points between threshold selectors. The vertical dashed red line in Figs. 11 and 12 shows the transition into an extremely heavy-tailed clutter region that adaptive detectors traditionally struggle with. Special attention is paid to this low shape parameter region of the K-distribution in the analysis of the full ensemble system in Section IV.

IV. RESULTS
Evaluation of the ensemble system was conducted in two stages. In order to test the fidelity of the proposed detection approach in each distribution region, the clutter data used for system training and characterization were directly created with the desired distributions, rather than being simulated as environmental returns. All clutter data used for training and testing of the proposed system were generated with a one-lag correlation coefficient of ρ = 0.9. First, the system was tested using 500 000 test samples (with associated sample support) spread across the full region of distributions and shape parameters of interest. 200 000 of these samples were generated using the K-distribution clutter model, 200 000 were generated using the Pareto model, and the remaining 100 000 were generated with the Gaussian model. For each of the K- and Pareto-distributed clutter data, 100 shape parameters were randomly selected (between 0.5 and 10 for K and between 0.5 and 10.5 for Pareto) and each shape parameter was used to generate 2000 samples. These 500 000 samples were first generated as clutter-only (H_0) samples and were used to evaluate the false alarm rate of the detection algorithm. Those same samples were then used to generate H_1 samples by adding a target as in (3) to determine P_d for the detector over different distributions and shape parameters. For this work, a static steering vector was used for all H_1 test samples with p_i = exp(−j2π(i−1)0.2). α was set to achieve the desired SIR for the SUT using the relation in [84]. The same data were also passed through the standard Gaussian GLRT, the Gaussian adaptive matched filter (AMF), the DL-based detector described in both [33] and [34], and the Gaussian GLRT with the threshold selector trained on low shape parameter (0.1−1.0) K-distributed clutter (K_1 filter) for performance comparison. The Gaussian GLRT and AMF serve as a comparison with the state of the art in non-ML-based adaptive detection. The alternative DL detector is used as a comparison
with modern DL detection approaches. This approach uses range-processed (i.e., matched filtered) return signals as the input to a single neural network trained to perform detection directly. This approach was selected for comparison due to the similarity in the data model and level of processing, allowing for a more direct performance comparison. This also provides an interesting comparison against a purely ML-based detection system. The implementation and training of this alternative DL approach were performed exactly as detailed in [34]. The low shape parameter K-distribution is the most non-Gaussian distribution region considered here, and the threshold selector trained in this region consistently has the highest threshold values of the detectors evaluated here, making it a natural choice for comparison with the ensemble system. The Gaussian GLRT paired with the heavy-tailed K threshold setting ML model is referred to as the K_1 GLRT for the remainder of this article.
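The steering vector and target injection used above can be sketched as follows. The SIR scaling rule here is an assumption standing in for the expression of [84] (SIR = |α|² p^H R⁻¹ p, with R the true clutter covariance is one common convention); the steering vector itself follows the stated p_i = exp(−j2π(i−1)0.2).

```python
import numpy as np

def steering_vector(N=16, fd=0.2):
    """Static steering vector p_i = exp(-j 2*pi (i-1) fd), fd = 0.2."""
    return np.exp(-2j * np.pi * np.arange(N) * fd)

def add_target(clutter, p, sir_db, R):
    """Inject a target at the desired SIR. The scaling below assumes
    SIR = |alpha|^2 p^H R^{-1} p, which is one common convention; the
    paper's exact expression is given in [84]."""
    sir = 10.0 ** (sir_db / 10.0)
    alpha = np.sqrt(sir / np.real(p.conj() @ np.linalg.solve(R, p)))
    return clutter + alpha * p
```

H_1 samples are then generated by applying `add_target` to the same clutter realizations used for the H_0 evaluation, so P_fa and P_d are measured on matched data.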
It is worth noting here that while the shape parameters generated for this testing fell within the regions that the threshold selectors were trained for, virtually none matched the training values exactly. This is important as it helped to demonstrate how effectively the system could generalize to the target distribution regions. It should also be noted that the region of K-distribution shape parameters used for this testing was limited slightly from what the threshold selectors were trained on. Specifically, both the Gaussian GLRT and the ensemble system struggled with the K-distributed clutter samples generated with extremely low shape parameter values (a = 0.1−0.5). This clutter region is known to be extremely challenging for detection algorithms, and so was ultimately evaluated separately.
In the second stage of ensemble testing, the system was tested in the most heavy-tailed clutter region considered here: the aforementioned low shape parameter K-distribution. 500 000 samples were used in this evaluation, 100 000 generated from each of the following five shape parameters: 0.1, 0.2, 0.3, 0.4, and 0.5. Once again, the same clutter data were used to generate both H_0 and H_1 test samples for determining the P_fa and P_d (respectively) of the algorithm. Performance comparison was conducted using both the Gaussian GLRT and the K_1 GLRT.
The results for the metacognitive detector, the Gaussian GLRT, and the K_1 GLRT over the full region of clutter distributions are presented in Table III. These full-range data results showed several impressive outcomes. First, the proposed metacognitive detector correctly identified and matched the performance of the Gaussian GLRT for the Gaussian interference. Second, across the sweep of K- and Pareto-distributed data, it maintained the trained target P_fa of 10^−4 and a very high probability of detection. Across all of the distributions, the most restrictive single filter (K_1) maintained a low P_fa but had much worse P_d. The full dataset results show the optimal performance of the metacognitive detector, as it maintains P_fa ≈ P_fa,target and had an overall P_d ≈ 94.5%. By comparison, the Gaussian GLRT had a much worse P_fa ≈ 6 × 10^−3, and at the other extreme, the restrictive K_1 GLRT was below the target P_fa but had a worse P_d ≈ 72.2%.
The results for the heavy-tailed clutter region of the K-distribution with extremely low shape parameter values (a = 0.1−0.5) are shown in Table IV. As the shape parameters approach 0, the ability of the metacognitive detector to maintain the target P_fa degrades, but it still outperforms the Gaussian GLRT. The targeted filter K_1 is able to maintain the target P_fa. Considering the positive performance of the K_1 filter but the degradation of the metacognitive detector, it is likely that the discriminator has issues correctly classifying these low shape parameters, considering it was trained to identify the average shape parameter (a = 0.55) in the region of K_1. Given that this region has high volatility, training the discriminator with an additional class in this low region could result in better selection and full metacognitive results that more closely match the K_1 GLRT results.
Additional testing was performed to generate ROC curves for each of the detectors. The Gaussian-only and Gaussian/K/Pareto sweep datasets were each run through the detectors over a range of target P_FA values from 10^−4 to 10^−1. This test was conducted with constant SIR values of 35 and 10 dB. The P_D and P_FA for each detector were recorded and plotted against the target P_FA in Figs. 15-17. The proposed detector is labeled as CD, and the alternative detector of [33], [34] as Alt. This testing again demonstrates the superior performance of the proposed metacognitive detection algorithm. In Fig. 15, it is clear that this metacognitive detector maintains the ideal P_FA when only Gaussian data are present [see plot 15(a)] and when the full range of clutter distributions is tested [plot 15(b)]. In Fig. 15(a), as expected, all approaches other than the K_1 GLRT produce the desired P_FA. However, in the case where the full range of clutter distributions and shape parameters is tested, only the metacognitive approach (the dark blue trace) matches the desired (i.e., target) P_FA. The GLRT, AMF, and alternative DL approach all have a higher false alarm rate due to the presence of heavy-tailed clutter. In Fig. 16, the receiver operating characteristic (ROC) curve is shown for the Gaussian-clutter-only case [see Fig. 16(a)] and the case where all distributions are included [see Fig. 16(b)] for an SIR of 10 dB. In this case, the alternative DL approach, GLRT, and AMF have a small detection loss. However, when the SIR is increased to 35 dB, as shown in Fig. 17, all approaches other than the K_1 GLRT have excellent detection performance (as would be expected).
As the K_1 GLRT expects extremely heavy-tailed clutter, it naturally sets a high threshold and therefore has a lower P_FA than desired. The loss of sensitivity causes a correspondingly lower probability of detection, as shown in Fig. 16. Even more interesting is the observation that, for both the low SIR and high SIR cases, the P_D for the proposed detector demonstrates no variation between the Gaussian-only and full distribution sweep cases. Therefore, the metacognition allows the proposed system to maintain desired performance across the variety of clutter types it was trained with. Further, if Gaussian clutter is present, there is no penalty for using the proposed approach relative to an ideal detector.
Finally, testing was performed on a set of simulated returns containing clutter with a lognormal distribution. The metacognitive detector was not trained on lognormal data prior to running these tests, making this a good test of how well the approach generalizes to unknown distributions. The clutter was generated as a set of correlated, complex lognormal random variables. As with the other clutter distributions used in this work, a correlation coefficient of ρ = 0.9 was used. Fig. 18 shows the histogram of the magnitude of the generated lognormal clutter coupled with the PDF of the distribution. This distribution does not fall within the category of SIRVs, but closely resembles a heavy-tailed distribution similar to the low shape parameter K-distribution.
A test set of 100 000 samples was generated and run through each detector twice: once with a target signal added to each sample and once with no added target. This process was repeated over a range of target P_FA values from 10^−4 to 10^−1. The ROC curves for this testing are shown in Figs. 19 and 20. It is worth noting that the proposed approach has degraded P_FA performance in this scenario. However, from examination of Fig. 19, it is much closer to the ideal than the Gaussian GLRT, AMF, and the alternative DL approach. Only the GLRT with the ML-based dynamic thresholding agent trained for low shape parameter K-distributions (here called the K_1 GLRT) has better performance. This makes sense, given the previously noted similarity. Additionally, given that the K_1 GLRT is one of the specialized agents in the metacognitive detector and the previously noted confusion in the discriminator agent for the very low shape parameter K-distribution, these results give strong evidence that the proposed metacognitive approach to detection generalizes to new distributions more effectively than the comparison detectors. Note that in Fig. 20 the performance is plotted against the target P_FA, not the actual P_FA. Therefore, while the AMF, GLRT, and alternative DL method will have better sensitivity for a desired probability of false alarm, their actual probability of false alarm will be the much higher value shown in Fig. 19.
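The two-pass measurement described above reduces to counting threshold exceedances over the H_0 and H_1 runs; a minimal sketch:

```python
import numpy as np

def roc_point(h0_stats, h1_stats, threshold):
    """Empirical (Pfa, Pd) at one threshold, as used to trace the ROC
    curves: the fraction of clutter-only (H0) and target-present (H1)
    test statistics exceeding the threshold."""
    pfa = float(np.mean(h0_stats > threshold))
    pd = float(np.mean(h1_stats > threshold))
    return pfa, pd
```

Sweeping the threshold (or, equivalently, the target P_FA fed to the threshold selectors) over the same paired H_0/H_1 statistics yields the measured-versus-target curves in Figs. 19 and 20.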

V. CONCLUSION
In this article, a metacognitive approach to adaptive radar detection has been presented for clutter with diverse distribution models: Gaussian, K with shape parameters ranging from 0.5 to 10.0, and Pareto with shape parameters ranging from 0.5 to 10.5. The proposed method utilizes a combination of multiple cognitive detectors and a metacognitive agent to select between them. The cognitive detectors each consist of Kelly's GLRT adaptive detection algorithm paired with a DNN trained to set the detection threshold for distinct clutter distribution regions based on order statistics observations of the support samples. It was experimentally demonstrated that these detectors were able to maintain CFAR-like behavior for the Gaussian case and across a wide range of shape parameter values for both the K- and Pareto distribution models at a target P_fa = 10^−4. The metacognitive agent was a DNN trained to identify the distribution and shape parameter region that best fit the observed clutter, again based on order statistics observations of the support samples, and to select the best cognitive detector for that scenario. This agent was trained to correctly identify the clutter distribution type (Gaussian, K, or Pareto) and the shape parameter region with greater than 98% accuracy.
Performance of the full metacognitive detection system was determined using 500 000 simulations: 100 000 generated with Gaussian clutter, 200 000 with K-distributed clutter, and 200 000 with Pareto clutter. When generating the samples for the K- and Pareto distributions, 100 shape parameters were randomly selected for each from a uniform distribution spanning the regions used to train the threshold selection DNNs. It was demonstrated that the metacognitive detector drastically outperformed the standard Gaussian GLRT, AMF, and the DL-based detector described in [33] and [34]. It was able to maintain CFAR-like behavior while maintaining a high and nearly constant P_d across the full range of clutter distributions. In addition, its performance perfectly matched that of the Gaussian GLRT in Gaussian interference, indicating that there is no observable performance degradation to using this approach in the ideal case. When presented with a new clutter distribution that it was not trained on, the proposed detector was able to achieve a P_FA much closer to the desired value than the GLRT, AMF, or the alternative DL-based detector. Additional testing was performed to characterize the performance of the proposed approach in extremely heavy-tailed clutter. For these tests, an additional 500 000 simulations were run in which the clutter was modeled as K-distributed with shape parameters below 0.5. While the performance of the proposed system did degrade in this region, it did so more slowly than that of the Gaussian GLRT. These results provide strong evidence that a metacognitive approach to adaptive detection can largely eliminate the distribution dependence of traditional adaptive detection algorithms with no performance cost when the interference is Gaussian. Substantial work can be done to expand on the approach presented here. Additional testing will be performed to more fully characterize the detection performance, computational overhead, and sample support requirements of the proposed
system. Additional testing will be used to characterize the impact of changing the correlation coefficient or using alternative covariance matrix structures on the performance of the proposed detection method. There are plans to apply the proposed detector to real-world data and to simulated data with sample support containing mixed clutter distributions. In addition, the metacognitive approach could be extended to include reinforcement learning and evolutionary computing approaches to enable the system to rapidly adapt to new clutter distribution regions.

Fig. 15. ROC curves showing the measured P_FA versus the target P_FA. (a) ROC curve: P_FA versus target P_FA, Gaussian clutter. (b) ROC curve: P_FA versus target P_FA, clutter sweep.

Fig. 16. ROC curves showing the measured P_D versus the target P_FA with 10 dB SIR. (a) ROC curve: P_D versus target P_FA, Gaussian clutter. (b) ROC curve: P_D versus target P_FA, clutter sweep.

Fig. 17. ROC curves showing the measured P_D versus the target P_FA with 35 dB SIR. (a) ROC curve: P_D versus target P_FA, Gaussian clutter. (b) ROC curve: P_D versus target P_FA, clutter sweep.

Fig. 20. ROC curves showing the measured P_D versus the target P_FA for lognormal clutter. (a) ROC curve: P_D versus target P_FA for lognormal clutter with 10 dB SIR. (b) ROC curve: P_D versus target P_FA for lognormal clutter with 35 dB SIR.

TABLE III. Verification Test Results: Full Range of Clutter Distributions and Additional Lognormal Case

TABLE IV. Verification Test Results: Extremely Heavy-Tailed K-Distributed Clutter