Model-Based Data Augmentation Applied to Deep Learning Networks for Classification of Micro-Doppler Signatures Using FMCW Radar

Deep neural networks (DNNs) have become a relevant subject in the classification of radio frequency signals and remote sensing data. A primary challenge is a tradeoff between obtaining data that are suitable for DNN training and the effort that making experimental measurements requires. Hence, the quality and quantity of data used for the training and testing of models are crucial for effective classifier development. The training dataset should cover a wide range of cases that synthesize the actual scenarios being classified. This work proposes a novel data augmentation method based on a deterministic model to generate a simulated dataset of radar micro-Doppler signatures suitable for unmanned aerial vehicle (UAV) target classification, without requiring measurement data. It is shown that the DNN trained using the properly generated model-based data offers improved classification accuracy performance. Results are presented for a two-class classification of the number of UAV motors using a 77-GHz frequency-modulated continuous-wave (FMCW) automotive radar system. The effectiveness of the proposed methodology is proven: a classification accuracy of 78.68% is achieved using a convolutional neural network (CNN) trained using the synthetic dataset, while an accuracy of 66.18% is achieved by using a typical signal processing data augmentation method on a limited measured dataset.

agricultural [2], industrial [3], and defense applications [4], they are undoubtedly a serious threat to the public and to flight safety [5]. Hence, it is necessary to identify drones in order to decrease such risks [6], [7]. However, due to their small size [8], low flight speed, and low flight altitude, UAVs are easily hidden by buildings or misclassified as birds [9]; thus, the identification and classification of drones is a challenging task [10].
One of the foremost features of a UAV's radar signal is the micro-Doppler signature [11]. Contrary to a stationary object, objects in motion generate modulated Doppler components referred to as the micro-Doppler signature, which is provided as additional components of the Doppler signature of the drone's fuselage. Furthermore, micro-Doppler signatures rely on the number of motors, motor speed, and orientation of the drone; therefore, by analyzing the micro-Doppler signatures, information pertaining to the drone can be obtained [12], [13].
In the present literature about radar target classification using micro-Doppler signatures, it has been demonstrated that these techniques are capable of providing a high classification accuracy. The most relevant are based on empirical mode decomposition [14], feature extraction [15], and log-Gabor filters [16], as well as singular value decomposition [17]. Nevertheless, the major drawback of the standard approaches is their low scalability.
Recently, deep neural networks (DNNs) have gained much attention in the radar community as a means of target classification [18], [19], [20], target detection [21], static object recognition [22], and automatic target recognition [23]. Furthermore, DNNs have also been utilized in micro-Doppler-based systems for the classification of human activities [24], hand gestures [25], and drones [26]. It should be mentioned that although DNNs, such as convolutional neural networks (CNNs), have been shown to offer satisfactory scalability and high classification accuracy for target recognition or classification, they require large databases, and one of their fundamental challenges is that a limited available dataset may lead to quite limited results.
In this study, we introduce a model-based data augmentation method to produce a simulated dataset of radar micro-Doppler signatures that is a suitable training database for CNNs.
Data augmentation is one of the most efficient techniques for preventing model overfitting during the training of a network [27]. In data augmentation, new synthetic data samples are produced from existing data by combining different data vectors and nonlinear operations [28]. Despite the effectiveness of random data augmentation [28], the training of the network requires a large number of epochs. Furthermore, the final training relies on totally hidden features, which could be related to measurement nonidealities rather than to effective class characteristics. An example of this issue is the deep learning (DL) network aging phenomenon [29], in which DL network-based classifiers trained and continuously retrained on augmented datasets show a decrease in classification accuracy as the training time increases. Such an effect is due to the variation of hidden features in the measurement scenario that are wrongly considered as actual class features during training. This negates the training memory, leaving the DL network unable to make accurate predictions. By replacing the physical information source with a model dependent only on actual target-class properties, it is possible to build extensive datasets that express effective features, removing the dependence on real measurement chain properties.
This work aims to prove the effectiveness of data augmentation based on a UAV micro-Doppler model, relying only on ideal parameters derived from the deterministic description of the physical scenario under consideration. We demonstrate that this dataset augmentation method, combined with CNN classification, is particularly suited for UAV classification. The classification considers radar images acquired by well-established methods (i.e., noise reduction, focusing, and clutter removal); because these do not represent a critical aspect of this study, they are not treated in this work.
According to the literature [26], the research on machine learning (ML)-based UAV classification by radar considers several objectives: UAV presence, UAV versus bird recognition [30], multi-UAV presence [31], and UAV characterization in terms of number of rotors classification [32]. In this work, we selected this latter classification objective as a significant case study, although it could in principle be extended to other classes and other scenarios.
This article is organized as follows. In Section II, we present a comprehensive treatment of the proposed model-augmenting method, while a description of the CNN is provided in Section III. Finally, in Section IV, we provide a detailed independent validation of the approach by considering two different classes of UAVs in a controllable and repeatable environment.

II. MODEL-BASED AUGMENTATION METHODOLOGY
This section discusses the data augmentation method, which is based on a mechanical model of the micro-Doppler signatures that can be applied to the classification of UAVs.

A. Preliminary Concepts
It is well known that in automatic classification, a dataset for object classification should include observations of a large number of samples for each targeted class. Automatic classification relies on a classifier, which is defined as an unknown nonlinear function that takes as an argument a vector of sample points related to the features and outputs the predicted class; such a function is determined through an optimization process in the classifier training phase. On this basis, the DNN learns about meaningful features during training by identifying the similarities between measurements related to the same class. In order to avoid an ill-posed numerical problem that would lead to overfitting issues, a comprehensive set of linearly independent equations is needed [33]. However, when one is dealing with a high-dimensional feature space, such a set can only be obtained from an exhaustive training set, so a huge training dataset must be provided.
A measurement-based training dataset can benefit from data augmentation, leading to exhaustive classifier training while reducing the amount of experimental effort. Dataset augmentation can be performed using two different approaches. Augmentation in the data space is achieved by applying random transformations to input datasets to produce new samples. It is a straightforward process but might expose one to the risk of erasing meaningful features or enhancing features that are not related to the targeted classes. Augmentation in the feature space is achieved by performing raw feature identification and then combining only measurements that appear to be similar, i.e., near each other in the feature space [34].
Data augmentation with analytical models can produce a meaningful training dataset while significantly reducing the effort required for experimental dataset acquisition. In a typical scenario, as proposed here or in [35], this approach produces a multitude of synthetic data vectors using the mathematical generator function

$$f : \mathbb{C}^{1 \times N_{\text{params}}} \rightarrow \mathbb{C}^{1 \times N_{\text{pts}}}. \tag{1}$$

The generator function in (1) is defined as a nonlinear transformation between the $\mathbb{C}^{1 \times N_{\text{params}}}$ space of scenario parameter vectors, i.e., the vectors comprising all the parameters that describe the physical scenario according to the model definition, and the $\mathbb{C}^{1 \times N_{\text{pts}}}$ space of measurement data vectors.
As in [35], the augmentation in the data space for the generic class "A" resembles the generation of synthetic data vectors $s^{(i)}_{\text{"A"}} \in \mathbb{C}^{1 \times N_{\text{pts}}}$ with $N_{\text{pts}}$ samples according to (2), in which $n_{i,j}$ are the perturbation vectors, $x_{\text{"A"}}$ is a set of scenario parameters for class "A", and $d_i^{f}$ is the nonlinear random transformation function applied to the data vector. The perturbation vectors introduce the effects of nonidealities into the scenario, $n_{i,S}$, and into the measurement chain, $n_{i,M}$, and both are written as in (3).

Following a feature-space augmentation approach, synthetic data vector generation for the class "A" data subset is obtained as follows:

$$s^{(i)}_{\text{"A"}} = f\left(x_{\text{"A"}} + n_{i,P}\right). \tag{4}$$

In (4), as opposed to (2), the dataset variability is given only by the perturbation vector $n_{i,P}$ applied to the vector of scenario parameters. Random transformations or the addition of noise to generated data vectors are expressly avoided, as they could introduce new features that are totally unrelated to the meaningful characteristics of the target. Note that noise added to generated data vectors does not show an impulsive autocorrelation because of the limited number of samples. As a result, different synthesized vectors could show unexpected correlation due to the added noise term.

In (4), the parameter perturbation vector is defined as in (5), and it models the variability of the parameter vector $x$ within scenarios related to the same class. In the case under investigation, this refers to the slight differences in the parameters between different UAVs in the same class. The noise standard deviation, $\sigma_P$, applied to the vector of scenario parameters must be carefully chosen. It must be large enough to effectively model the expected physical scenario variability within a given target class. However, it must also be limited to avoid spreading dataset samples along the feature space, which can lead to misleading class specification.
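The feature-space generation in (4) can be illustrated with a minimal sketch. The generator `toy_generator` below is a hypothetical stand-in (a sum of complex exponentials), not the micro-Doppler model of [12], and the additive Gaussian perturbation and all numeric values are assumptions chosen only for illustration.

```python
import numpy as np

def toy_generator(x, n_pts=400):
    """Hypothetical stand-in for f(x): parameters -> complex data vector.

    x is laid out as repeated (amplitude, frequency, phase) triplets.
    """
    t = np.arange(n_pts) / n_pts
    triplets = np.asarray(x, dtype=float).reshape(-1, 3)
    s = np.zeros(n_pts, dtype=complex)
    for amp, freq, phase in triplets:
        s += amp * np.exp(1j * (2 * np.pi * freq * t + phase))
    return s

def augment_feature_space(x_class, sigma_p, n_samples, rng, n_pts=400):
    """Generate s_i = f(x_class + n_i); noise touches the parameters only."""
    x_class = np.asarray(x_class, dtype=float)
    out = np.empty((n_samples, n_pts), dtype=complex)
    for i in range(n_samples):
        # Assumed zero-mean Gaussian perturbation of the scenario parameters.
        n_i = rng.normal(0.0, sigma_p, size=x_class.shape)
        out[i] = toy_generator(x_class + n_i, n_pts)
    return out

rng = np.random.default_rng(0)
x_a = [1.0, 5.0, 0.0, 0.5, 12.0, 0.3]  # illustrative class "A" parameters
dataset = augment_feature_space(x_a, sigma_p=0.05, n_samples=8, rng=rng)
```

No noise is added to the generated vectors themselves, which mirrors the choice made in (4) of confining variability to the parameter space.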
The concept described above is schematized in Fig. 1 for a simple 2-D space of the scenario parameters x i and x j ; it relies on the theory of classifiers based on support vector machines (SVMs) [36]. In this example, target classes are related to specific relations between parameter vector terms. This leads to the definition of straight lines in 2-D spaces (or hyperplanes when dealing with higher dimensional spaces inside the feature space). SVM classification is achieved by assigning to each sample the class described by the hyperplane with the minimum distance to the target. Assuming that each provided sample is assigned to the correct class, during the training phase, the network coefficients are optimized to achieve a feature-space clustering with a clear separation between classes. Therefore, the training dataset should provide samples with clear distinctions between classes, thus imposing an upper limit on the standard deviation of the parameters σ P .

B. Model-Based Dataset Augmenter
Passafiume et al. [12] introduced a fully deterministic model for the micro-Doppler signatures of flying UAVs in frequency-modulated continuous-wave (FMCW) radar echo signals (6). Such a model considers the effects of mechanical vibrations on the range distance and thus introduces a varying term due to vibrations into the range argument alongside the Doppler effects on the radar signal. With this model, the micro-Doppler effect is included in the single IF signal acquisition, under the mandatory condition that the period of the range fluctuations is much shorter than the time window of the single radar observation [12]. The model depends on the target's physical parameters, such as the expected radar cross section (RCS) and the mechanical and vibration frequencies of each UAV engine. In this work, we adopt such a model as a data augmenter function within a feature-space model-based dataset, as discussed in Section II-A. When the analytical model is applied to a four-rotor UAV (i.e., a quadcopter), the FMCW radar signal $s_{IF}(t)$ facing the quadcopter on boresight is given by (6), where $R_0$ is the distance to the quadcopter and, for each $k$th engine, the model includes the expected RCS and the vibrational parameters $A_k$, $\omega_k$, and $\varphi_k$. In this model, we assume that the UAV body's RCS is negligible with respect to those associated with the rotors. From a radar signal point of view, each engine is defined by an ensemble of 13 parameters, i.e., its RCS plus four sets of amplitude, frequency, and phase values. Each set is related to a different vibrational component carried by the engine according to the mechanical model applied in [12].
The model-based dataset augmenter should completely reproduce the data-gathering process, as shown in Fig. 2; thus, the physical model output, as defined by the s IF (t) function of (6), is subjected to the same vector transformations that are expected to be caused by the FMCW radar signal processing.
Therefore, defining $x$ as the scenario parameter vector containing all the parametric terms of the model function $s_{IF}(t)$, the final augmenter (generator) function $f(x)$ related to this specific data-gathering chain is defined by applying these transformations to $s_{IF}(t)$.

According to [12], placing the reference system as shown in Fig. 3 and considering that the FMCW radar is placed on the top or bottom of the UAV, for each engine, only the microrotations on the pitch and roll axes can be considered.
The projections of these movements on the radar line of sight lead to four different vibrational micro-Doppler terms: $\omega_\alpha$, which is related to roll axis vibrations; $\omega_\beta$, which is related to pitch axis vibrations; and $\omega_\alpha + \omega_\beta$ and $\omega_\alpha - \omega_\beta$, which mix components of both types of vibrations. Thus, the parameter subvector for each engine collects the engine RCS together with the amplitude, frequency, and phase of each of these four terms, retaining the information related to all the UAV vibrational components. The overall vector $x$ gathers the subvectors of all engines, comprising $N_{\text{eng}} \times 13$ scalar values, and $x$ can be considered a short representation of each class within a first-approximation feature space.
Although the mixing terms can be associated with the roll and pitch components, they can be optimized to account for higher order phenomena that are not considered by the basic mechanical model [12]. The parameter vector for each engine is therefore rewritten to introduce, as a target class parameter, the skew of the mixing terms with respect to the expected mixing term frequencies.
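As an illustration of the parameter layout discussed above, the following sketch assembles a 13-value subvector per engine and stacks the subvectors of all engines. The element ordering, the helper names `engine_subvector` and `scenario_vector`, the additive form of the skew, and the numeric values are all assumptions; the source does not specify them here.

```python
import numpy as np

def engine_subvector(rcs, roll, pitch, mix_sum, mix_diff,
                     skew_sum=0.0, skew_diff=0.0):
    """Return the 13 scalars describing one engine (hypothetical ordering).

    roll, pitch: (amplitude, frequency, phase) of the roll and pitch terms.
    mix_sum, mix_diff: (amplitude, phase) of the two mixing terms. Their
    frequencies follow from the roll and pitch frequencies, plus an
    optional skew (assumed additive here).
    """
    a_r, w_r, p_r = roll
    a_p, w_p, p_p = pitch
    a_s, p_s = mix_sum
    a_d, p_d = mix_diff
    return np.array([
        rcs,
        a_r, w_r, p_r,                      # roll term
        a_p, w_p, p_p,                      # pitch term
        a_s, w_r + w_p + skew_sum, p_s,     # sum mixing term
        a_d, w_r - w_p + skew_diff, p_d,    # difference mixing term
    ])

def scenario_vector(engines):
    """Concatenate per-engine subvectors into one N_eng x 13 vector."""
    return np.concatenate(engines)

one = engine_subvector(1.0, (0.2, 90.0, 0.0), (0.1, 60.0, 0.5),
                       (0.05, 0.0), (0.05, 0.0))
quad = scenario_vector([one] * 4)   # four identical engines, for brevity
```

With a zero skew, the mixing frequencies reduce to the plain sum and difference of the roll and pitch frequencies, as in the basic mechanical model.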
According to (4), data augmentation for each class "A", "B", ... is provided by spreading datasets around their related model parameter vectors $x_{\text{"A"}}$, $x_{\text{"B"}}$, ... Such vectors could be initialized using an initial fit of the scenario model to one sample measurement per class. Note that the parameter vectors defined in (8) and (9) are highly heterogeneous; thus, to introduce a balanced augmentation over the entire set of parameters, the standard deviation of the noise added to each vector term needs to be proportional to the value of the term itself. Hence, for each $i$th synthetic vector of the class "A" data subset, the related parameter perturbation vector defined in (5) is redefined in (12) so that the noise added to the $n$th term has a standard deviation of $\sigma_P \, x_{\text{"A"},n}$, where $x_{\text{"A"},n}$ is the $n$th term of the class "A" model parameter vector, and $\sigma_P$ is a single value representing the relative amount of spreading introduced to every parameter. Fig. 4 shows the expected probability density function for every scenario parameter.
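A minimal sketch of this proportional perturbation is given below. It assumes zero-mean Gaussian noise, which is not stated explicitly in the text, and the function name `perturb_relative` and the reference values are illustrative.

```python
import numpy as np

def perturb_relative(x_ref, sigma_p, n_samples, rng):
    """Draw perturbed parameter vectors around x_ref.

    The noise on each term has a standard deviation of sigma_p * |x_ref[n]|,
    so every parameter is spread by the same relative amount.
    """
    x_ref = np.asarray(x_ref, dtype=float)
    std = sigma_p * np.abs(x_ref)
    # Unit-variance Gaussian draws scaled per parameter (assumed distribution).
    noise = rng.normal(size=(n_samples, x_ref.size)) * std
    return x_ref + noise

rng = np.random.default_rng(0)
x_ref = np.array([0.01, 40.0, 1000.0])   # deliberately heterogeneous values
samples = perturb_relative(x_ref, sigma_p=0.10, n_samples=1000, rng=rng)
```

Scaling by the magnitude of each term keeps small and large parameters equally dispersed in relative terms, which is the balance the text asks for.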

C. Training Method
Following the discussion in Section II-B, several augmented datasets are produced for different values of σ P , generating N different augmented vectors for each class. Each subset of data is produced by applying the generator function in (4), where the class-related reference vector is described in (10) and the perturbation vector is defined in (12).
The optimum σ P parameter should be estimated to avoid overfitting issues due to the possible lack of augmentation effects and to prevent the risk of class misspecification due to excess dispersion into the feature space, as shown in Fig. 1. Identifying the optimum σ P parameter is a step related to the definition of the training dataset; thus, an appropriate approach consists of introducing the optimization of this parameter during the testing phase of the neural network.
The first fit for the reference vectors [cf. (9)] related to each class, namely, "QUAD" for the quadcopter (reference vector x "QUAD" ) and "HELI" for the single-engine UAV (reference vector x "HELI" ), is achieved over one generic measured sample for both classes. The risk of overfitting due to the choice of only one measurement for the definition of the model parameters is expected to be overcome by applying the proposed augmentation method based on the perturbation of the parameters.
A flowchart showing the training process for performing UAV classification is shown in Fig. 5. The proposed method performs the neural network training using all the different augmented datasets, which are generated with different σ P values. After the training phase, each network is tested on the limited measured dataset and the accuracy on the test set is evaluated for each σ P . The entire training and testing process is completed for three different ML algorithms: the SVM, gradient boosting (XGBoost), and naive Bayes (NB). The dispersion parameter that achieves the highest classification accuracy is considered the best dispersion parameter. At the completion of the procedure, the best value of σ P is given and can be used to produce wider augmented datasets, which are then used for the training of the more complex CNN.
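The selection loop described above can be sketched as follows. A nearest-centroid classifier is swapped in as a lightweight stand-in for the SVM, XGBoost, and NB classifiers, and the reference vectors, the toy "measured" test points, and the helper names are assumptions made only for illustration.

```python
import numpy as np

def fit_centroids(X, y):
    """Stand-in classifier: store one mean vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(model, X):
    classes, centroids = model
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

def select_sigma(make_dataset, sigmas, X_test, y_test):
    """Train on each augmented dataset, test on measured data, keep the best."""
    scores = {}
    for sigma_p in sigmas:
        X_train, y_train = make_dataset(sigma_p)
        model = fit_centroids(X_train, y_train)
        scores[sigma_p] = float(np.mean(predict(model, X_test) == y_test))
    best = max(scores, key=scores.get)
    return best, scores

rng = np.random.default_rng(0)
refs = {0: np.array([1.0, 4.0, 9.0]), 1: np.array([2.0, 3.0, 7.0])}

def make_dataset(sigma_p, n_per_class=200):
    """Toy augmenter: relative Gaussian spread around each class reference."""
    X, y = [], []
    for label, ref in refs.items():
        noise = rng.normal(size=(n_per_class, ref.size)) * sigma_p * np.abs(ref)
        X.append(ref + noise)
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

# Toy "measured" samples from slightly different targets of the same classes.
X_test = np.vstack([refs[0] * 1.05, refs[1] * 0.95])
y_test = np.array([0, 1])
sigmas = [0.0, 0.05, 0.10, 0.20]
best_sigma, scores = select_sigma(make_dataset, sigmas, X_test, y_test)
```

In the procedure of Fig. 5, the selected value would then be reused to generate the larger dataset on which the CNN is trained.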

D. Radar Data Acquisition
The entire processing flow discussed in Section II-C works on measurement data generated by the radar data acquisition block described in detail in Fig. 6. As expressed by (6), the micro-Doppler signature model introduces the effect of mechanical vibrations only with respect to the range distance, ignoring the spreading over different scattering angles. As a result, each synthetic measurement consists of a vector representing a range profile.
Dealing with a multiple-input-multiple-output (MIMO) radar system means that each radar acquisition provides a 2-D radar image $I(r, \vartheta)$ defined in the range and direction-of-signal-arrival domains. Considering an FMCW radar system with $n$ receiving antennas, each data acquisition leads to a set of $n$ different baseband time readings $s_{IF_n}(t)$. The range-angle image is then calculated by applying two nested FFT{.} operations to these readings. The inner FFT{.} is calculated with respect to time for each antenna trace, and the frequency domain is mapped to range through a transform involving $c_0$, the speed of light, $\mu$, the FMCW radar ramp coefficient, and $f$, the spectrum frequency. The outer FFT{.} is calculated on the orthogonal domain across the different antenna traces, thus achieving the angular domain, as described in [37]. The micro-Doppler behavior is then extracted by applying a moving-target indicator (MTI) filter [38]; thus, the MTI-filtered radar image is calculated from a stack of consecutive frames,
where $N_{\text{MTI}}$ is the MTI frame depth and $I_{-i}(r, \vartheta)$ is the $i$th frame in the stack of the last $N_{\text{MTI}}$ frames. Due to the current limitations of the micro-Doppler model, in order to obtain a measurement trace that is suitable for target classification, a single-dimension range profile must be extracted from each radar image after the application of the MTI filter. Considering $\hat{\theta}$ as the estimated angular direction of the UAV target, the data trace is extracted from the MTI-filtered image around $\hat{\theta}$, where $\Delta\vartheta$ is an approximation of the expected spreading of peaks around the signal peak direction. For the experiments provided hereinafter, we consider $\Delta\vartheta = 7^\circ$. In order to produce data suitable as input for the UAV classification network, the range profile, after the application of the MTI algorithm, is passed through a target signature extraction block. The latter extracts, from the overall MTI-filtered range profile data vector, all the vector slices that could be related to a UAV-like target. The extraction works by taking a slice of 400 samples around each different peak throughout the range profile vector. As a result of the MTI application, the peaks are located at targets showing high-frequency vibrations, as introduced in [12]. Each slice corresponds to a possible signature for a related UAV target, and the length of 400 samples depends on the radar hardware parameters shown in Table I and [12]. These slices are the actual input for the DL classification network, and the latter is assumed to be able to identify the UAV target class based on them.
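The range-angle image formation described in this subsection can be sketched as two nested FFTs. The sketch assumes the standard FMCW beat-frequency relation $r = c_0 f / (2\mu)$ with $\mu$ expressed in Hz/s, which may differ from the exact transform used by the authors; the function name, the array layout (antennas along the first axis), and the simulated target are also assumptions.

```python
import numpy as np

C0 = 299_792_458.0  # speed of light in m/s

def range_angle_image(s_if, fs, mu):
    """Form an (angle bin, range bin) image from baseband readings.

    s_if: array of shape (n_rx, n_samples), one time trace per antenna.
    fs:   sampling frequency in Hz.
    mu:   chirp slope in Hz/s (assumed unit of the ramp coefficient).
    """
    s_if = np.asarray(s_if)
    n_samples = s_if.shape[1]
    spectrum = np.fft.fft(s_if, axis=1)          # inner FFT over time
    image = np.fft.fft(spectrum, axis=0)         # outer FFT over antennas
    image = np.fft.fftshift(image, axes=0)       # put boresight in the middle
    freqs = np.fft.fftfreq(n_samples, d=1.0 / fs)
    keep = freqs >= 0                            # real input: keep one side
    ranges = C0 * freqs[keep] / (2.0 * mu)       # assumed frequency-to-range map
    return image[:, keep], ranges

# Simulated point target at 5 m on boresight (identical trace on all antennas).
fs, mu, n_rx, n_samples = 10e6, 4e13, 8, 1024
t = np.arange(n_samples) / fs
beat = 2.0 * mu * 5.0 / C0
s_if = np.tile(np.cos(2 * np.pi * beat * t), (n_rx, 1))
image, ranges = range_angle_image(s_if, fs, mu)
angle_bin, range_bin = np.unravel_index(np.argmax(np.abs(image)), image.shape)
```

Here `ranges[range_bin]` gives the estimated target distance and `angle_bin` the direction bin; mapping angle bins to degrees depends on the array geometry and is left out.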
The application of MTI together with the extraction block makes it possible to extract, from the radar image, only the signatures of interest for the actual classification application, overcoming the presence of artifacts such as clutter or other scenario nonidealities. In this work, the signatures not related to the UAV targets are filtered out by the MTI algorithm, which removes targets with low vibrational frequencies [38], [39].
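The MTI filtering and signature extraction steps can be sketched as follows. The exact MTI expression is not reproduced here, so the sketch assumes a simple form in which the mean of the previous frames is subtracted from the current one; for brevity it extracts a slice around the strongest peak only, whereas the text extracts one slice per peak. Function names and the synthetic data are illustrative.

```python
import numpy as np

def mti_filter(frames):
    """Suppress static returns in the newest range profile.

    frames: array of shape (N_MTI + 1, n_range); the last row is the
    current profile and the others are the preceding frames.
    Assumed form: current frame minus the mean of the previous frames.
    """
    frames = np.asarray(frames)
    return frames[-1] - frames[:-1].mean(axis=0)

def extract_signature(profile, width=400):
    """Return a slice of `width` samples centered on the strongest peak."""
    profile = np.asarray(profile)
    peak = int(np.argmax(np.abs(profile)))
    # Clamp the window so that it stays inside the profile.
    start = min(max(peak - width // 2, 0), profile.size - width)
    return profile[start:start + width], peak

# Synthetic example: static clutter plus one return present only in the
# newest frame, standing in for a vibrating target.
clutter = np.full(4096, 5.0)
frames = np.tile(clutter, (11, 1))      # 10 previous frames + current frame
frames[-1, 1000] += 3.0
filtered = mti_filter(frames)
signature, peak = extract_signature(filtered)
```

The 400-sample window and the 10-frame depth follow the values quoted in the text; everything else is a simplification.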
Note that the present work is about UAV target classification, and thus, the proposed classification networks are supposed to deal with an input dataset that comprises only the signatures that must be classified. Considering the generality of radar systems and application scenarios, the classification is supposed to be applied at the highest level of postprocessing after the entire dataset acquisition chain shown in Fig. 6.

III. DEEP NEURAL NETWORKS
The architecture of a DNN depends significantly upon the complexity of the problem under consideration; it is related to how each layer is implemented, as well as the computational method used in each layer. The most widely used DL approaches are CNNs [40], recurrent neural networks (RNNs) [41], deep belief networks (DBNs) [42], and the recently introduced approach of generative adversarial networks (GANs) [43]. However, the most popular supervised DNNs for image processing, object recognition, image formation, and classification are based on CNN architectures [44]. The layers that make up a CNN and proposed CNN architectures will be discussed in detail in the following.

A. CNN Architecture
Generally, CNNs are made up of several essential layers for feature extraction. A convolutional layer, an activation function, a pooling layer, a fully connected (FC) layer, and a classification module are the main components of each CNN. Each component is briefly described as follows.
1) Convolutional Layer: A convolutional layer is the primary component of a CNN. It is like a filter bank that can be repeated numerous times. Suppose that the input array of the model is a tensor with a size of (C1 × W1 × H1), in which W1 is the width, H1 is the height, and C1 is the number of input channels. In the first convolutional layer, a small square filter with a size of (C1 × K × K), in which the number of channels is equal to the number of channels of the input tensor, is applied to the input tensor, and using elementwise multiplication and summation, an output tensor with a size of (1 × W2 × H2) is created. In each convolutional layer, there are multiple filters (M) and corresponding output tensors (1 × W2 × H2), which are stacked together to create the final feature map (M × W2 × H2) of that layer. Compared to FC layers, convolutional layers have fewer parameters to train. In these kinds of architectures, there are some hyperparameters that remain fixed during the training process, namely, the size of each filter (K), the number of filters in each layer (M), and the stride [44].
2) Activation Function: An activation function is a critical part of designing a neural network. It is a mathematical function applied elementwise to the outputs of convolutional layers to provide nonlinearity to the network. Note that the nonlinearity property enables the network to be made deeper. Otherwise, the network would be equivalent to a single layer, regardless of the complexity of the architecture. The rectified linear unit (ReLU), defined as y = max(0, x), is the most common activation function used in CNNs for hidden layers. In contrast, the activation function for the output layer depends on the type of prediction problem being considered [44], [45], [46].
3) Pooling Layer: A pooling layer, commonly used after convolutional and nonlinear layers, is applied to reduce the dimensions of each feature map. Pooling is accomplished by sliding a window (typically a square window) across each feature map and extracting only one value from each window. The number of channels will not be modified after pooling since downsampling is only performed on the width and height dimensions. Average pooling and max pooling are the two standard pooling methods that are most used in CNNs. Furthermore, pooling helps in making the representation nearly invariant to slight changes in the input.

4) FC Layer:
The last pooling/convolutional layer's output is passed into an FC layer. An FC layer is responsible for learning nonlinear combinations of prior high-level information, and one or more FC layers can be utilized sequentially. Furthermore, all the inputs of this layer are attached to its output elements, as in a normal multilayer perceptron neural network.

5) Classification Module:
The last unit in the CNN is the classification layer, which is applied to transform the output of the last FC layer into the probability that the input belongs to a given class. For multiclass classification, the softmax classifier [47] with cross-entropy loss [48] is extensively used.
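The convolution, activation, and pooling components described above can be sketched with plain NumPy. This is a didactic sketch under stated assumptions (no padding, non-overlapping pooling windows, hypothetical function names), not an efficient or framework-specific implementation.

```python
import numpy as np

def conv2d(x, filters, stride=1):
    """Apply M filters of size (C1, K, K) to an input of size (C1, W1, H1).

    Returns a feature map of size (M, W2, H2), with no padding (assumption).
    """
    c1, w1, h1 = x.shape
    m, c1_f, k, _ = filters.shape
    assert c1 == c1_f, "filter channels must match input channels"
    w2 = (w1 - k) // stride + 1
    h2 = (h1 - k) // stride + 1
    out = np.zeros((m, w2, h2))
    for f in range(m):
        for i in range(w2):
            for j in range(h2):
                patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
                out[f, i, j] = np.sum(patch * filters[f])  # multiply, then sum
    return out

def relu(x):
    """Elementwise y = max(0, x)."""
    return np.maximum(0.0, x)

def pool2d(fmap, size, mode="max"):
    """Pool each channel over non-overlapping (size x size) windows."""
    c, w, h = fmap.shape
    w2, h2 = w // size, h // size
    blocks = fmap[:, :w2 * size, :h2 * size].reshape(c, w2, size, h2, size)
    if mode == "max":
        return blocks.max(axis=(2, 4))
    return blocks.mean(axis=(2, 4))

x = np.ones((1, 20, 20))
features = relu(conv2d(x, np.ones((32, 1, 3, 3))))   # M = 32, K = 3
pooled = pool2d(features, 2, mode="max")
```

The channel count is unchanged by pooling, while the width and height shrink by the window size, as stated in the pooling description.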

B. Proposed CNN Architecture
The proposed CNN architecture is shown in Fig. 7. The model consists of the input data, two convolutional layers, two ReLU functions, two pooling layers, and two FC layers used as a classifier. The input data generated as described in Section II-B consist of a (400 × 1) frequency-domain vector. It represents 400 samples of range profile from the output of the dataset acquisition chain shown in Fig. 6.
As shown in Fig. 7, before the convolution operation, the input is reshaped to (1 × 20 × 20); then, it is introduced into the network. Feature extraction is performed by using two convolutional layers, where the first convolutional layer was configured with M = 32 filters and the next layer was configured with M = 64 filters. Furthermore, to reduce the computational complexity, a max-pooling layer and an average-pooling layer with pooling sizes of (2 × 2) and (7 × 7), respectively, were applied after the convolutional layers.
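To make the tensor sizes concrete, the following sketch traces the shapes through the layers listed above. The kernel size (3 × 3), the absence of padding, the unit stride, and the layer ordering (convolution, ReLU, pooling, repeated twice) are assumptions that are not given in the text; they are chosen because they reduce the 20 × 20 input to a 1 × 1 map after the (7 × 7) average pooling.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_shapes(kernel=3):
    """List (layer name, tensor shape) pairs for the assumed layer order."""
    shapes = [("input", (1, 20, 20))]
    side = conv_out(20, kernel)
    shapes.append(("conv1 + ReLU", (32, side, side)))
    side //= 2
    shapes.append(("max pool 2x2", (32, side, side)))
    side = conv_out(side, kernel)
    shapes.append(("conv2 + ReLU", (64, side, side)))
    side //= 7
    shapes.append(("avg pool 7x7", (64, side, side)))
    shapes.append(("flatten", (64 * side * side,)))
    return shapes

for name, shape in trace_shapes():
    print(f"{name:>14}: {shape}")
```

Under these assumptions the flattened vector passed to the FC layers has 64 elements; a different kernel size or padding would change this number.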
To formulate a nonlinear model, the ReLU was chosen as the activation function

$$y = \max(0, x)$$

where $x$ is the convolutional layer output. Eventually, after the pooling layer, a flattening layer [49] is deployed to transform the pooled feature maps into a vector. Then, all feature vectors $x$ are given to the FC layers, which perform the following operation:

$$c_f = w_f \, x + b_f$$

where $c_f$ is the output vector of the FC layer, which is calculated by multiplying the input feature vector $x$ by the FC layer weight matrix $w_f$ and adding the bias vector $b_f$. Finally, a softmax activation function and a classification module are applied to finalize the network. The softmax function is applied to the last FC layer, thereby normalizing the output of the FC layer between 0 and 1. The softmax transfer function is

$$\hat{y}_i = \frac{e^{c_{f,i}}}{\sum_{j=1}^{m} e^{c_{f,j}}}$$

where $\hat{y}$ is the result of the softmax function. Finally, the predicted class is obtained as the maximum output of the softmax function. Furthermore, the proposed model employs an Adam optimizer [50] to make use of an adaptive learning rate (LR) for each parameter. In addition, we chose a cross-entropy loss function to reduce the loss ($L$) of the CNN

$$L = -\sum_{i=1}^{m} y_i \log \hat{y}_i$$

where $\hat{y}_i$ is the CNN output for class $i$, $y_i$ is the label of the input data, and $m$ denotes the number of classes. The parameter $m$ is equal to 2 because there are two different classes in the UAV classification: "QUAD" and "HELI."
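The classifier stage described above (FC layer, softmax, and cross-entropy loss) can be sketched numerically as follows. The weights, the three-element feature vector, and the function names are illustrative assumptions; a max-subtraction is added inside the softmax for numerical stability and does not change its result.

```python
import numpy as np

def fc_layer(x, w, b):
    """FC layer output: weight matrix times feature vector, plus bias."""
    return w @ x + b

def softmax(c):
    """Normalize FC outputs to probabilities that sum to one."""
    e = np.exp(c - np.max(c))   # shift for numerical stability
    return e / e.sum()

def cross_entropy(y_onehot, y_hat, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return float(-np.sum(y_onehot * np.log(y_hat + eps)))

classes = ["QUAD", "HELI"]                  # m = 2
x = np.array([0.2, -1.0, 0.5])              # illustrative feature vector
w = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, -1.0]])            # illustrative weights
b = np.zeros(2)

y_hat = softmax(fc_layer(x, w, b))
predicted = classes[int(np.argmax(y_hat))]
loss = cross_entropy(np.array([1.0, 0.0]), y_hat)
```

During training, an optimizer such as Adam would adjust `w` and `b` to lower this loss over the whole dataset.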

IV. EVALUATION AND RESULTS
This section discusses the experimental validation of the method described in Section III using a commercial millimeter-wave radar open platform and two classes of commercial UAVs.

A. Dataset Generation
1) Experiment-Based Dataset:
The measurement campaign was carried out in a semi-anechoic scenario (Fig. 8). The adopted radar is a compact commercial open platform [51] implementing a 77-GHz FMCW MIMO radar system with eight receiving and four transmitting antennas. The system is based on a commercial chipset, allowing a certain degree of flexibility in the most relevant radar parameters (e.g., chirp parameters, carrier, transmitter power, and analog IF path parameters). The receive antennas are serial-fed patch antennas with eight elements, while the transmitting ones use two serial-fed patch antennas that are fed from the differential output signal of the power amplifier, the latter being capable of providing 10 dBm of output power at 77 GHz. The reference oscillator is capable of generating a chirp signal with a maximum frequency range of 4 GHz, from 75 to 79 GHz. This carrier frequency was selected because of the imaging capability enabled by the short wavelength and the availability of commercial platforms normally adopted for transport and traffic telematics and radio-determination applications in vehicular communications. The radar platform is controlled at a higher level by a field-programmable gate array (FPGA), which controls the switching between transmitters and the chirp and is capable of carrying out basic signal processing operations (e.g., sample rate reduction). The highest level of platform control is actuated by a microcontroller unit, which allows a fast connection to a remote processor unit through which the baseband RX channel signals can be acquired coherently, while the postprocessing of the radar image was carried out using MATLAB code. Each receiver path is equipped with a single mixer device, as is common in FMCW radar architectures [37], [52]. The mixer is followed by the six-channel Analog Devices AD8283 integrated radar receive path, which is equipped with a corresponding number of analog front ends and a 12-bit analog-to-digital converter capable of a maximum sampling frequency of 72 MSPS [53]. The FMCW MIMO radar parameters adopted in this study are presented in Table I.
Two different kinds of UAV targets were used for measurements; they are both shown in Fig. 9. The drone targets used for the new measurement dataset belong to the same classes as those of the reference dataset presented in [12], which led to the parameters in Table II, but they have different mechanical specifications from the ones previously involved. This aspect is of fundamental importance because the objective of the present work is to provide, by augmentation, a training dataset able to instruct a DNN to recognize targets belonging to the same classes as the reference ones despite having totally different specifications.
For the single-engine UAV (class "HELI"), we selected a small plastic helicopter with body dimensions of 10 × 30 × 20 cm, while for the quadcopter UAV (class "QUAD"), a DJI Mavic Mini 2 drone [54] with body dimensions of 24.5 × 29 × 5.5 cm was adopted.
As shown in [12] and briefly discussed in Section II-D, an MTI algorithm [55] with a depth of 10 frames was used to extract the micro-Doppler signatures and remove the clutter from the acquired measurements. For each measurement, the range domain was sampled over 4096 points, but to lighten the dataset, only 400 points (or rather, features) centered on the actual target range were utilized. For each target class, 45 measurements were taken for various yaw orientations (Fig. 3). Fig. 10 shows two example radar acquisitions for each target class before and after the application of the MTI filtering algorithm and after the extraction of the range profiles, the latter being evaluated as described in Section II-D.
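The clutter removal and feature cropping described above can be sketched as follows. This is a minimal illustration, not the MTI algorithm of [55]: it assumes a simple canceller that subtracts the mean of the previous 10 frames, and the function names, the synthetic frames, and the target bin are all hypothetical.

```python
import numpy as np

def mti_filter(frames, depth=10):
    """Subtract from each frame the mean of the previous `depth` frames.

    frames: array of shape (n_frames, n_range_bins).
    Returns an array of shape (n_frames - depth, n_range_bins).
    This is an assumed, simplified canceller, not the one in [55].
    """
    frames = np.asarray(frames)
    out = []
    for k in range(depth, frames.shape[0]):
        background = frames[k - depth:k].mean(axis=0)  # static clutter estimate
        out.append(frames[k] - background)
    return np.array(out)

def crop_around_target(profile, target_bin, width=400):
    """Keep `width` range bins centred on `target_bin`, clipped to the array."""
    start = int(np.clip(target_bin - width // 2, 0, profile.shape[-1] - width))
    return profile[..., start:start + width]

# Synthetic demo: static clutter plus one fluctuating target return.
rng = np.random.default_rng(0)
n_frames, n_bins, target_bin = 30, 4096, 1500      # target_bin is made up
clutter = rng.normal(size=n_bins)                  # identical in every frame
frames = np.tile(clutter, (n_frames, 1))
frames[:, target_bin] += np.sin(np.arange(n_frames))

filtered = mti_filter(frames, depth=10)
features = crop_around_target(np.abs(filtered), target_bin, width=400)
print(filtered.shape, features.shape)
```

In this toy case the static clutter cancels and only the time-varying return survives, which is the property that makes the micro-Doppler components stand out after filtering.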
2) Synthetic Dataset: The model introduced in (6) does not consider the angular information provided by the MIMO array. UAV classification is done by evaluating only the range profiles on the line of sight between the radar system and the target. Several datasets comprising 32 000 synthetic measurements of 400 features each, equal to the number of points for each real measurement, are produced by considering various standard deviation parameters (i.e., $\sigma_P$ = [0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60]) for the terms $n^{\text{QUAD}}_{i,P}$ and $n^{\text{HELI}}_{i,P}$ in (12) in the data generator function shown in (4). For each class, the model parameter vectors $x^{\text{HELI}}$ and $x^{\text{QUAD}}$, as shown in (4), are those proposed in [12] and shown in Table II. These vectors were obtained by fitting the model parameters to one sample measurement for each target class [12].
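The idea of generating many samples from one fitted parameter vector can be sketched as below. Everything here is an assumption made for illustration: `toy_range_profile` is a hypothetical stand-in for the model in (6), the perturbation is taken to be zero-mean Gaussian and relative to each parameter, and the parameter values are made up rather than taken from Table II.

```python
import numpy as np

def toy_range_profile(x, n_points=400):
    """Hypothetical stand-in model: a sum of Gaussian peaks.

    x holds (amplitude, centre, width) triples. It is NOT the model in (6).
    """
    r = np.arange(n_points)
    params = np.asarray(x, dtype=float).reshape(-1, 3)
    profile = np.zeros(n_points)
    for amp, centre, width in params:
        profile += amp * np.exp(-0.5 * ((r - centre) / width) ** 2)
    return profile

def generate_dataset(x_ref, sigma_p, n_samples, rng, n_points=400):
    """Draw n_samples profiles from perturbed copies of x_ref."""
    x_ref = np.asarray(x_ref, dtype=float)
    # Assumed perturbation: zero-mean Gaussian, relative to each parameter.
    noise = rng.normal(0.0, sigma_p, size=(n_samples, x_ref.size))
    x_perturbed = x_ref * (1.0 + noise)
    return np.stack([toy_range_profile(x, n_points) for x in x_perturbed])

rng = np.random.default_rng(42)
x_heli = [1.0, 200.0, 5.0, 0.3, 230.0, 8.0]   # made-up parameter values
heli_set = generate_dataset(x_heli, sigma_p=0.20, n_samples=100, rng=rng)
print(heli_set.shape)
```

With `sigma_p=0` every generated sample collapses onto the reference profile, which is why a nonzero spread is needed for the dataset to cover targets other than the fitted one.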
It is noteworthy that the UAV models considered in previous measurements are different from those involved in this work. The use of the model coefficients related to different UAVs, similar only within a given class, further demonstrates the methodology's effectiveness, showing that a model-based augmenter can easily cover measurement scenarios that were not considered initially. Fig. 11 shows several measurements related to each class of the actual measured dataset; they are compared with one measurement from the other class aligned to the target range distance and with the synthetic data vector produced using the model parameter vectors of Table II [12]. It is clear that the synthetic model is able to reproduce features related to different classes that are expressed through different trace behaviors found on both sides of the maximum peak identifying the main body of the drone target.

B. Performance Metrics for Classification Problems
Following the model implementation, the feature extraction, and the class estimation, the next step is to determine the effectiveness of the model-based dataset generation method using statistical indicators. The metrics employed to assess the DL model are important since they quantify how the DL network performs. Different performance metrics are used to evaluate different learning algorithms; in this study, we focus on those relevant to classification problems. Among these, we employ the confusion matrix (CM), accuracy, and precision [56].

1) Confusion Matrix:
A CM is composed of predicted and actual classification information that is presented in a specific table layout. During the predictive analysis, a square CM with positive and negative rates is constructed (including both true and false outcomes). Fig. 12 shows an example of a CM for a binary classification, where TP stands for true positive and indicates that the model correctly identified the class to which a sample belongs. FP stands for false positive and indicates that the model predicted that a sample belongs to a class that it does not belong to. FN stands for false negative and indicates that the model predicted that a sample does not belong to the class that it actually does belong to. Finally, TN stands for true negative and indicates that the model correctly predicted that a sample does not belong to a class that it actually does not belong to. An appropriate model will have high values for the principal diagonal elements, TP and TN, and low values for the off-diagonal elements.
2) Accuracy: The accuracy of the classification problem is the ratio of the number of correct predictions made by the model to the total number of predictions made by the model. The accuracy (acc) is given by

$$\mathrm{acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{21}$$

3) Precision: The precision describes how many samples were correctly classified in each class, and it is calculated as

$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{22}$$
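The metrics above can be computed from a list of labels as in the following sketch. The label vectors are made up for illustration and are not measurement results; the formulas used are the standard accuracy and precision definitions, with "QUAD" arbitrarily taken as the positive class.

```python
import numpy as np

def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN) for a binary problem."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    return tp, fp, fn, tn

# Made-up labels, for illustration only.
y_true = ["QUAD", "QUAD", "QUAD", "HELI", "HELI", "HELI"]
y_pred = ["QUAD", "QUAD", "HELI", "HELI", "HELI", "QUAD"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="QUAD")
acc = (tp + tn) / (tp + tn + fp + fn)   # correct predictions / all predictions
prec = tp / (tp + fp)                   # correct positives / predicted positives
print(tp, fp, fn, tn, round(acc, 4), round(prec, 4))
```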

C. Experimental Results
In this section, we prove the effectiveness of the proposed methodology by presenting the classification accuracy obtained by the data augmentation method using the proposed model-based augmenter versus the classification accuracy of a typical random signal processing augmentation method.
First, the best $\sigma_P$ value, i.e., the one leading to the highest classification accuracy, should be chosen in order to obtain the optimal model-based dataset. Since a large number of tests involving network training and testing must be carried out over several $\sigma_P$ values, ML algorithms are preferred over CNNs for this step because of their lower computational cost and computation time. Then, the final classification accuracy is further improved by applying the proposed CNN architecture. The trained network was assessed on three distinct datasets, referred to as follows. "Old" refers to the dataset obtained in [12], which includes 30 measured signatures for the UAVs mentioned in [12]; "New" refers to the measurement dataset described in Section IV-A1, which consists of 136 measured signatures for the UAVs considered in this article; and "Mix" refers to the dataset generated by merging the "Old" and "New" datasets. Table III shows the class distribution between measurements for each dataset.
With augmentation, trained CNNs are expected to be capable of correctly classifying new measurements not related to those used to extract the model parameters. Therefore, the proposed approach should provide the highest level of accuracy for the "New" dataset when higher values are chosen for the $\sigma_P$ parameter, since the objective is to augment the training dataset so that it is reliable and not tied to the "Old" measurements, thereby reducing the risk of overfitting. Table IV provides the classification accuracy results for the three test datasets, and Fig. 13 shows the accuracy trends with respect to the $\sigma_P$ parameter values. Due to space limitations, only a limited set of $\sigma_P$ values is reported in Table IV. Because the "Old" and "New" datasets contain different numbers of measurements, as shown in Table III, the accuracy achieved for the "Mix" dataset is equal to the weighted average of the accuracies for the individual datasets. The relative weights for each dataset are listed in Table III. Table IV and Fig. 13 reveal that $\sigma_P$ = 0.20 leads to the highest accuracy on the dataset of new measurements compared to the other $\sigma_P$ values. Therefore, we chose $\sigma_P$ = 0.20 to create our best possible input dataset of 32 000 samples, each of which has 400 features (i.e., range profile sample points). In Fig. 13, it is noteworthy that the maximum accuracy is achieved within a specific interval of $\sigma_P$, i.e., between 0.1 and 0.2, and that in this interval, the accuracy is maximized for all the measurement datasets. Such behavior is expected, as discussed in Section II, because the spreading of the model parameters by the perturbation vector provides a reliable augmentation as long as the relative ratios between features for each sample are preserved. Furthermore, the results prove the effectiveness of the proposed methodology for obtaining training datasets, extended by augmentation, that are reliable enough to cover measurement scenarios not initially considered.
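The weighted average mentioned for the "Mix" dataset can be written out explicitly. Using the dataset sizes stated earlier (30 "Old" and 136 "New" signatures), and with the symbols $N$ and $\mathrm{acc}$ subscripted by dataset name introduced here only for illustration, a sketch of the relation is:

```latex
\mathrm{acc}_{\mathrm{Mix}}
  = \frac{N_{\mathrm{Old}}\,\mathrm{acc}_{\mathrm{Old}} + N_{\mathrm{New}}\,\mathrm{acc}_{\mathrm{New}}}{N_{\mathrm{Old}} + N_{\mathrm{New}}}
  = \frac{30\,\mathrm{acc}_{\mathrm{Old}} + 136\,\mathrm{acc}_{\mathrm{New}}}{166}
  \approx 0.18\,\mathrm{acc}_{\mathrm{Old}} + 0.82\,\mathrm{acc}_{\mathrm{New}}
```

Under these sizes, the "Mix" accuracy is dominated by the "New" measurements.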
2) Proposed CNN Performance Results: Next, we trained the proposed CNN (cf. Section III-B) using the optimal input dataset produced in the previous step. In the training procedure, we first split the 32 000-sample dataset into three parts: 1) training dataset; 2) validation dataset; and 3) test dataset, which contain 80%, 10%, and 10% of the samples, respectively. The model is then trained with a backpropagation algorithm and a learning rate (LR) of 0.001 for 50 epochs. Finally, the best model, chosen using the validation dataset, is tested on the real measurement dataset from Section IV-A1. Fig. 14 shows the predictive performance of the proposed CNN model for UAV classification using the real measurement dataset. The columns in the matrix presented in Fig. 14 represent the predicted classes, while the rows represent the actual classes. Considering Fig. 14 and based on the precision, defined in (22), it can be seen that with the proposed network, 78% of the samples taken from the quadcopter are correctly classified as "QUAD"-class samples, while 80% of the samples taken from the helicopter are correctly classified as "HELI"-class samples. Accordingly, the total network classification accuracy, as defined in (21), is equal to 78.68%.
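The 80%/10%/10% split of the 32 000-sample dataset can be sketched with a shuffled index array, as below. The function name and the fixed seed are hypothetical choices for illustration; the source does not state how the samples were shuffled or partitioned.

```python
import numpy as np

def split_indices(n_samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]      # remainder goes to the test part
    return train, val, test

train_idx, val_idx, test_idx = split_indices(32000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting indices rather than the data array keeps the three parts disjoint by construction and lets the same partition be reused for features and labels.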
Next, to assess the effectiveness of the proposed augmentation method, the CNN model is retrained for 100 epochs, following the procedure described previously, using the experimental dataset adopted in [12] augmented by a random signal processing augmenter. Data augmentation of this kind transforms existing data to increase the number of samples without collecting new data. To accomplish this, we used the "tsaug" signal processing augmentation [57]. "tsaug" is a Python package that augments signal data and offers a variety of augmentation methods; in this study, drift, pool, and reversing are used to augment the signal, as they maintain the input sample size [57]. Fig. 15 shows the predictive performance of the CNN model for UAV classification on the real measurement dataset described in Section IV-A1. According to Fig. 15 and (22), 90% of the samples taken from the quadcopter are correctly classified as "QUAD"-class samples, while 15% of the samples taken from the helicopter are correctly classified as "HELI"-class samples. Accordingly, the total network acc is equal to 66.18%; however, the classifier is highly biased toward the "QUAD" class. Table V shows the classification precision achieved by the different networks: one trained with the conventional augmented dataset and one trained with the dataset built through the proposed deterministic augmenter for two different $\sigma_P$ values. It is clear that despite achieving a similar accuracy, the network trained with the deterministic augmented dataset exhibits a far lower classification bias. We can thus conclude that the conventional augmenter leads to an unreliable training dataset, even for the basic case of the two-class UAV classification considered in this work.
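To illustrate what length-preserving signal augmentations of this kind do, the sketch below implements plain NumPy stand-ins for reversing, pooling, and drift. These are simplified approximations written for this example, not the tsaug implementations, and the parameter values and the input signal are made up.

```python
import numpy as np

def reverse(x):
    """Flip the sample order along the feature axis."""
    return x[::-1].copy()

def pool(x, size=4):
    """Average over blocks of `size` samples, keeping the original length."""
    out = np.empty(len(x), dtype=float)
    for start in range(0, len(x), size):
        out[start:start + size] = x[start:start + size].mean()
    return out

def drift(x, max_drift=0.1, n_points=3, rng=None):
    """Add a smooth random offset interpolated between a few anchor points."""
    rng = np.random.default_rng() if rng is None else rng
    span = x.max() - x.min()
    anchors_x = np.linspace(0, len(x) - 1, n_points + 2)
    anchors_y = rng.uniform(-max_drift, max_drift, size=n_points + 2) * span
    return x + np.interp(np.arange(len(x)), anchors_x, anchors_y)

rng = np.random.default_rng(1)
signal = rng.normal(size=400)           # made-up 400-feature sample
augmented = [reverse(signal), pool(signal), drift(signal, rng=rng)]
print([len(a) for a in augmented])
```

Each transform returns a vector of the same length as its input, which is the property the text cites for choosing these three methods; none of them uses any knowledge of the underlying scattering physics.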

V. CONCLUSION
This article introduced a new data augmentation methodology based on a deterministic model of a physical scenario. This work specifically addressed the classification of UAV targets according to the number of motors using the micro-Doppler radar signatures collected by a millimeter-wave FMCW radar system.
The proposed method successfully trained a CNN for UAV classification without requiring a measurement session to generate a training dataset. The initial model parameters were obtained through a fitting procedure based on measurement data collected in a different scenario, with UAV targets different from those considered in this work; however, these UAV targets did belong to the same classes as the targets considered in this work. A perturbation vector is applied to the vector of model parameters obtained by fitting in order to reduce the dependence of the augmented dataset on the reference measurements, allowing the augmented dataset to cover the new targets involved in this study.
In contrast to typical random signal processing augmentation methods, the augmented dataset produced with the deterministic augmenter provides the variability required to perform successful training while avoiding the risk of overfitting and preserving physically significant features.
With the proposed methodology, a CNN trained on a synthetic dataset produced by the deterministic augmenter introduced in this work achieved a classification accuracy of 78.68% with a minimum classification precision of 78.02%. In comparison, the same network trained on a dataset built using standard augmentation methods obtained a classification accuracy of 66.18% with a far lower minimum precision of 15.55%, thus showing a dramatic classification bias that makes such a network unusable even for the basic classification objective considered in this work.
The proposed deterministic augmenter is based on a simple and scalable physical model of the backscattering for the expected radar targets. The results have shown that despite its simplicity, the model was able to provide a meaningful set of informative features for training, as demonstrated by the lower classification bias of the CNN trained with the deterministic augmenter.
Therefore, it can be expected that by increasing the model complexity and introducing more parameters, e.g., by extending the model to provide range profiles collected from different receiving antennas, it will be possible to produce large training datasets covering different kinds of UAVs by specifying details of their geometries. Future DL networks could then be trained to provide more detailed information about the identified UAV targets, without the need for large measurement campaigns involving several UAV specimens.
The same approach could in principle be extended to other application scenarios, such as human recognition [58] or medical imaging [59], where the collection of a large training dataset would require a significant amount of time and resources.
Neda Rojhani (Member, IEEE) received the Ph.D. degree in electronics and electromagnetism engineering from the University of Florence, Florence, Italy, in 2019.
She currently works with the Department of Information Engineering, University of Calgary, Calgary, AB, Canada. She is particularly interested in artificial intelligence, machine learning, deep learning, and optimization algorithms, with the objective of enabling everyone to take advantage of the potential of new emerging technologies. Her research interests include ground-based synthetic aperture radar (GB-SAR), multi-input-multioutput radar (MIMO), synthetic aperture radar interferometry (InSAR), ground-penetrating radar (GPR), compressive sensing, and Cramer-Rao bound, as well as antenna design and antenna theory.
He is currently working at the Microelectronics and HF Research Group, University of Florence. His research activities cover developing embedded systems and digital systems for communication (also working with private institutes, such as Autostrade Tech., Florence, for development of new infomobility systems), highly integrated System on Chip and System in Package (SoC/SiP), and human interface devices. Currently, he is studying in depth new development technologies based on field-programmable gate arrays (FPGAs), and he covers various aspects related to computer science evolution (artificial intelligence, and data and signal processing) and digital communications/software-defined radios. His scientific interests also cover the area of data elaboration methodologies with particular attention to algorithmic development. He is particularly interested in human-interface device (HID) development and artificial intelligence applied to biomedics, with the goal of enabling everyone to take advantage of the potential of new emerging embedded and computational technologies.
Dr. Passafiume was a recipient of the IEEE Microwave Magazine 2016 Best Paper Award.
Mehdi Sadeghibakhi received the master's degree in computer engineering from the Ferdowsi University of Mashhad, Mashhad, Iran, in 2021.
His research interests include computer vision, image processing, deep learning, machine learning, and medical image processing. He is also particularly interested in explainable artificial intelligence (XAI) and data analysis in the field of healthcare.