Automatic Digital Modulation Recognition Based on Genetic-Algorithm-Optimized Machine Learning Models

Recognition of the modulation scheme is the intermediate step between signal detection and demodulation of the received signal in communication networks. Automatic modulation recognition (AMR) plays a central role in many applications, especially in the military and security sectors. In general, several properties of the received signal are extracted and employed for AMR. Selecting the appropriate features has a significant impact on increasing the efficiency of AMR. In this paper, we implement and compare digital modulation recognition via multi-layer perceptrons (MLP), radial basis function (RBF), adaptive neuro-fuzzy inference system (ANFIS), decision tree (DT), and naïve Bayes (NB) algorithms. In addition, the optimal parameters of each model are obtained by utilizing a genetic algorithm (GA). A series of studies are conducted in this work in order to determine the efficiency of different algorithms in identifying modulated signals with commonly used digital modulations. Numerous computer simulations are performed in the presence of additive white Gaussian noise (AWGN) with a signal-to-noise ratio (SNR) ranging from -10 dB to 30 dB. The simulation results and comparisons with previous studies demonstrate that applying the proposed algorithms along with the selected features leads to a significant enhancement in the accuracy and speed of the automatic determination of the digital modulation types at low SNRs. In addition, the convergence rates of the models are enhanced.


I. INTRODUCTION
Automatic modulation recognition (AMR) is crucial for detecting and demodulating a telecommunication signal [1], [2]. AMR has attracted considerable attention owing to its wide range of applications. AMR is employed in electronic warfare (EW) systems as a source of information for detecting and disrupting threats [3]. Civilian applications include software-defined radio, frequency management, transmitter monitoring, and network traffic administration [4]-[7].
Two primary phases are considered in AMR: initial preprocessing of the input signal and selection of the classifier system. There are two approaches to performing AMR. One approach is decision-theoretic, and the other is statistical pattern recognition (i.e., feature-based pattern recognition) [8]. The former entails high computational complexity and suffers from the ambiguity of the parameters. In the second approach, certain features of the signal are extracted first. Then decisions are made based on the extracted characteristics. The latter is less complex compared to the first approach. Therefore, it is more convenient to implement the second approach in practical systems. In our work, we focus on the pattern recognition approach.
Numerous studies have proposed investigating different features to identify the types of modulation [9]. Hsue et al. [10] propose to utilize frequency and phase histograms in addition to zero-crossing features and the zero-crossing variance of the received signal for modulation recognition purposes. The authors in [11] leverage frequency and amplitude variances and the phase-difference histogram to distinguish various digital modulation types. The works in [12], [13] use the wavelet transform and wavelet features empowered by neural networks to recognize several modulation schemes. The study in [14] uses the fourth power of the signal spectrum together with the mean and variance of the signal envelope to classify various digitally modulated signals. Moreover, other spectral characteristics (i.e., frequency-based features) and statistical attributes, e.g., higher-order statistics [15]-[17], together with the instantaneous phase, frequency, and amplitude [16], [18], are further examined for AMR. We utilize six attributes, including spectral, temporal, and wavelet-based features, to distinguish seven different digital modulation techniques.
Machine learning and optimization algorithms are continuously employed to provide accurate and reliable AMR. The feature-based study in [19] exploits the support vector machine (SVM) algorithm to recognize four different digital modulation schemes, including binary phase-shift keying (BPSK), 8-PSK, 4-ary amplitude-shift keying (4-ASK), and 16-ary quadrature amplitude modulation (16-QAM). The work achieves an accuracy of 100% by investigating two-dimensional asynchronous sampled in-phase-quadrature histograms (ASIQ) in the presence of additive white Gaussian noise (AWGN) with signal-to-noise ratios (SNRs) ranging from 0 dB to 35 dB. The genetic algorithm (GA) is one of the most well-known evolutionary algorithms used together with other machine learning models to further enhance AMR efficiency. Artificial neural networks (ANNs) and GA are used in [20] to distinguish various digital modulation techniques. An augmented genetic programming (GP) and the k-nearest neighbor (KNN) algorithm are employed in [21] to classify digital modulations. Zhang et al., in [22], propose GP to generate features utilized by the KNN algorithm for multi-class modulation classification. Most implementations are based on neural network (NN) methods and GAs [21]-[24]. Almohamad et al., in [25], utilize the SVM model to classify nine modulation types, including BPSK, QPSK, 8-PSK, BASK, 4-ASK, 4-QAM, 16-QAM, 32-QAM, and 64-QAM, over AWGN and Rayleigh fading channels within a wide range of SNR values, i.e., 0 dB through 35 dB. Their proposed model simultaneously performs AMR and estimates SNR values by exploring two-dimensional asynchronously sampled in-phase-quadrature amplitude histograms (2D-ASIQHs).
On the other hand, we see many efforts in recent work to employ deep learning methods in AMR [2]. In particular, the study in [26] proposes a novel technique based on the InceptionResNetV2 network with transfer adaptation to distinguish between three types of phase-shift keying (PSK) modulation, including BPSK, quadrature phase-shift-keying (QPSK), and 8-PSK. An average accuracy of 75.99% is achieved at SNR = 1 dB. The work in [27] achieves 99.00% accuracy at low SNRs by employing a deep neural network (DNN) to extract different features of each modulation type by learning different cumulant combinations of ASK, frequency-shift keying (FSK), and PSK modulation schemes.
A long short-term memory network (LSTM) and a deep convolutional neural network (DCNN) are utilized in [28] to form an AMR system. The authors substitute the in-phase/quadrature (I/Q) information by exploiting high-order statistics (HOS), i.e., I/Q and fourth-order cumulants (FOC), which results in an average accuracy of roughly 80.00% at SNR = 0 dB. Daldal et al. [29] designed an AMR system by employing a CNN and the short-time Fourier transform (STFT) to recognize six distinct digital modulation schemes automatically. The system achieves an average accuracy of 99.19% for SNRs above 0 dB. A generalized CNN method is proposed in [30] to identify FSK, PSK, and QAM schemes robustly. The model is trained and utilized for both AWGN and Rayleigh fading channels.
Some of the existing AMR frameworks based on machine learning have been able to perform well in terms of accuracy. However, it has been observed that these models suffer from high computational complexity. The existing methods necessitate additional preprocessing procedures and possess more tuning parameters and classifiers, resulting in slower and larger hardware. In addition, their accuracy drops significantly at low SNRs. Consequently, their ability to generalize is limited. Therefore, our goal is to provide machine learning-based AMR frameworks with lower time complexity (i.e., the running time of the algorithms) and higher accuracy compared to existing methods, especially at low SNRs. In this paper, various digital modulations are identified and classified by employing multi-layer perceptrons (MLP), radial basis function (RBF), adaptive neuro-fuzzy inference system (ANFIS), decision tree (DT), and naïve Bayes (NB) algorithms. Additionally, a GA is employed to optimally select the tuning parameters to further enhance the proposed system. Our selected models result in significantly faster and more efficient AMR without compromising accuracy.
In light of the above, the contributions of this paper are as follows.
• We leverage various machine learning algorithms to automatically recognize and classify different digital modulation schemes.
• We propose a heuristic optimization method, i.e., GA, to optimize the tunable parameters of the presented machine learning algorithms that are employed to classify various digital modulation techniques.
• We investigate different features to classify different digital modulation techniques employing feature-extraction-based approaches.
• We examine the accuracy of the employed models and perform exact and comprehensive comparisons based on different criteria, including accuracy, complexity, and time. Our model achieves an accuracy of 100% at an SNR of -2 dB with significantly lower complexity (i.e., the running time of the algorithm) compared to existing techniques. The obtained classification accuracy is validated by 10-fold cross-validation.
The remainder of the paper is organized as follows. Section II introduces the system model and features exploited in this study. The machine learning algorithms employed for the AMR are presented in Section III. Simulation analyses are performed in Section IV. Finally, Section V provides a summary, conclusion, and avenues for future work.

II. SYSTEM MODEL AND SELECTED FEATURES
Without loss of generality, this study performs communication and signal transmission over the AWGN channel. The transmitter modulates the desired signals, which are then transmitted. After passing through the channel, the signal is mixed with white Gaussian noise. The transmitted signal enters the receiver block while no information is available from the sender. The input signal is modeled as follows [31]:

x(t) = Re{s(t) e^{j(2π f_c t + φ_c)}} + n(t),   (1a)
s(t) = a(t) e^{j(2π f(t) t + φ(t))},   (1b)

where f_c indicates the carrier frequency, φ_c is the carrier phase, n(t) represents the AWGN, and s(t) designates the envelope of the baseband signal. In (1b), a(t), f(t), and φ(t) represent the instantaneous amplitude, frequency, and phase of the signal, respectively. Fig. 1 illustrates the modulation-scheme classification system based on the pattern classification approach, consisting of three subsystems. The preprocessing sub-block, upon the arrival of the intercepted modulated signal, prepares the received signal for the succeeding sub-block. The preprocessing operations include filtering to diminish the noise level, median filtering, estimating the symbol length, signal power (i.e., SNR), and carrier frequency, and balancing the received modulated signal. Besides, extraction of the instantaneous amplitude, frequency, and phase forms another part of the preprocessing framework. These operations enhance AMR performance. The selection of each task involved in the preprocessing stage depends on the classification process.
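As a minimal numpy sketch of this signal model, the snippet below generates a passband BPSK signal and adds white Gaussian noise at a target SNR. All parameters (sample rate, carrier, samples per symbol) are hypothetical values chosen for illustration, not the paper's settings.

```python
import numpy as np

def awgn(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a target SNR (in dB) to a real signal."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(p_noise), signal.shape)

# Hypothetical parameters for illustration only.
fs, fc, sps = 8000, 1000, 80            # sample rate, carrier frequency, samples/symbol
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, 64)            # 64 random information bits
a = np.repeat(2 * bits - 1, sps)         # BPSK: phase flips encoded as +/-1 amplitude
t = np.arange(a.size) / fs
s = a * np.cos(2 * np.pi * fc * t)       # transmitted passband signal s(t)
x = awgn(s, snr_db=0, rng=rng)           # received x(t) = s(t) + n(t) at SNR = 0 dB
```

The same `awgn` helper can regenerate the receiver input at any SNR in the paper's -10 dB to 30 dB sweep.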
One of the most critical tasks and perhaps the challenges when employing machine learning for AMR is selecting appropriate features. Equipping the system with suitable features allows us to identify and separate various modulation schemes accurately. In addition, given the instantaneous operation of AMR, it is essential to utilize features that significantly improve the operating speed of the model. It is crucial to use features that are robust against different signal and channel conditions such as SNR, frequency, etc. We employ the features of [32] and [9] to reinforce the presented machine learning algorithms for AMR to achieve superior robustness against low SNRs. Previous studies demonstrate that the selected features are among the most authentic attributes employed in the existing robust modulation recognition systems [8], [9]. In the following, we briefly examine the selected features and provide the mathematical expression of each.
The second-order moment of the non-linear component of the instantaneous phase is the first feature that we utilize in our AMR framework. The exact mathematical formula of this attribute is stated by [8]

M_{φNL} = (1/N_s) Σ_{i=1}^{N_s} φ_NL(i)²,  with  φ_NL(i) = φ(i) − φ̄,

where N_s is the number of symbols, φ_NL(i) indicates the normalized-centered non-linear component of the instantaneous phase, φ(i) denotes the instantaneous phase, and φ̄ represents the mean phase. The instantaneous phase of ASK-modulated signals contains no information, which causes the M_{φNL} values computed for ASK-modulated signals to be the lowest compared to those of other modulation schemes. Consequently, this trait accurately differentiates binary amplitude-shift keying (BASK) and 4-ASK modulations from QAM, PSK, and FSK modulations. The spectrum-based feature σ_z² is designated as the second characteristic employed in this work; it is expressed in [8] in terms of Z(i), the discrete-time Fourier transform (DTFT) of the received signal. This feature can accurately distinguish ASK and QAM modulations from other digital modulation schemes, including PSK and FSK, that possess no amplitude-related information.
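The phase-based feature can be sketched in numpy alone. The FFT-based Hilbert transform and linear detrending below stand in for the paper's exact phase preprocessing, and the ASK/BPSK test signals use hypothetical parameters; the sketch only illustrates why ASK yields the smallest values.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT method (a numpy-only Hilbert transform)."""
    n = x.size
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0          # double the positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0              # keep Nyquist bin for even lengths
    return np.fft.ifft(X * h)

def m_phi_nl(x):
    """Second-order moment of the centred non-linear instantaneous phase."""
    phi = np.unwrap(np.angle(analytic_signal(x)))
    i = np.arange(phi.size)
    trend = np.polyval(np.polyfit(i, phi, 1), i)   # remove the linear carrier term
    phi_nl = phi - trend                           # non-linear, mean-removed component
    return np.mean(phi_nl ** 2)

# Hypothetical test signals: 2-ASK carries no phase information, BPSK does.
fs, fc, sps = 8000, 1000, 80
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, 32)
t = np.arange(bits.size * sps) / fs
ask = np.repeat(bits + 1, sps) * np.cos(2 * np.pi * fc * t)       # amplitude levels {1, 2}
psk = np.cos(2 * np.pi * fc * t + np.pi * np.repeat(bits, sps))   # phase jumps of pi
```

As the feature intends, the ASK signal's non-linear phase moment comes out far smaller than the BPSK signal's.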

VOLUME 1, 2022
The third attribute used in the proposed AMR system is the mean value of the power spectral density of the normalized-centered instantaneous amplitude of the intercepted signal segment, which is defined as [8]

γ = (1/N_s) Σ_{i=1}^{N_s} |A_cn(i)|²,  with  a_cn(i) = a(i)/m_a − 1,

where A_cn denotes the DTFT of the normalized-centered instantaneous amplitude, a_cn(i) denotes the normalized-centered instantaneous amplitude, and m_a represents the mean value of the instantaneous amplitude. The proposed AMR models of this study utilize this trait to separate 4-ASK modulation from BASK. As our fourth feature, we select the standard deviation of the normalized-centered non-linear component of the direct instantaneous phase, which contains phase-related information. The exact mathematical expression exploited to extract the attribute is [8]

σ_dp = sqrt( (1/N_s) Σ_{i=1}^{N_s} φ_NL(i)² − ( (1/N_s) Σ_{i=1}^{N_s} φ_NL(i) )² ).

This characteristic is employed to distinguish BPSK modulation from QPSK in hierarchical-based classifiers. The continuous wavelet transform (CWT) forms our fifth and sixth attributes. The CWT is based on time-frequency analysis and is given by [32]

CWT(s, τ) = (1/√|s|) ∫ x(t) ψ*((t − τ)/s) dt,

where τ is the translation parameter, s indicates the scale parameter, and ψ*(t) represents the complex conjugate of ψ(t). Accordingly, the correlation between the received signal's wavelet transform and the patterns stored in the system is computed. The wavelet-transform-related simulations exploit the Haar function. By comparing the computed CWT to BASK and BFSK templates, our system can distinguish between BPSK, QPSK, BFSK, and 4-FSK modulations. Table 1 summarizes the roles of the six presented features. The effectiveness and roles of the selected features are illustrated in the simulation results in Section IV. In the following, the machine learning models and algorithms used in this work are examined.
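The amplitude feature γ admits a short numpy sketch. It implements the "mean spectral power of the normalized-centered envelope" reading described above (by Parseval this equals the envelope variance); the deterministic BASK and 4-ASK envelopes are hypothetical illustrations, not the paper's data.

```python
import numpy as np

def gamma_feature(a):
    """Mean spectral power of a_cn(i) = a(i)/m_a - 1, i.e. (1/N) * mean(|A_cn|^2)."""
    a_cn = a / a.mean() - 1.0
    return np.mean(np.abs(np.fft.fft(a_cn)) ** 2) / a_cn.size

sps = 50
# Hypothetical envelopes: BASK alternates over 2 levels, 4-ASK cycles over 4 levels.
bask = np.repeat(np.tile([1.0, 2.0], 32), sps)
ask4 = np.repeat(np.tile([1.0, 2.0, 3.0, 4.0], 16), sps)
```

Because the 4-ASK envelope varies more around its mean, its γ value exceeds that of BASK, which is what lets the classifier separate the two.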

III. MACHINE LEARNING ALGORITHMS
As mentioned in previous sections, the main step in AMR is using machine learning-based classifier methods to accurately identify various modulation techniques. In our work, GA is utilized to acquire the optimal values of the tuning parameters of each method. The flowchart of the proposed model is illustrated in Fig. 2. Different artificial intelligence (AI) and machine learning [33] approaches based on ANNs and fuzzy logic (FL) systems are utilized for AMR. Tuning their parameters by trial and error consumes a vast amount of time and does not guarantee reaching the optimal values. The proposed models require different and unique chromosome codings according to their structures and parameters. We define the tuning parameters of each algorithm to be the genes forming the chromosomes. Our presented network architecture is flexible and can vary, in contrast to previous studies in which the same topology is considered for all machine learning methods. The MLP, RBF, ANFIS, DT, and NB models are examined in the following subsections.

A. MULTI-LAYER PERCEPTRON (MLP) ALGORITHM
MLP is one of the well-known and most widely used feedforward subclasses of ANNs, in which the learning process is accomplished by employing multiple layers of neurons. MLP models utilize the error back-propagation (EBP) technique [34], one of the well-known training schemes, which iteratively minimizes the network error to achieve a superior outcome. The exact mathematical expression for calculating the error is

E = (1/p) Σ_{k=1}^{p} Σ_{i=1}^{M} (Z_i − Y_i)²,

where p denotes the number of training samples, M represents the number of output neurons, Z_i is the actual (desired) output, and Y_i indicates the model output. Fig. 3 depicts the overall structure of the MLP network. It is worth noting that the hidden layers are not constrained to a specific number.
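The EBP training error can be sketched directly from its definition. This is a minimal numpy illustration with hypothetical target/output matrices, not the paper's network.

```python
import numpy as np

def ebp_error(Z, Y):
    """E = (1/p) * sum_k sum_i (Z_i - Y_i)^2 over p samples and M output neurons."""
    Z, Y = np.asarray(Z, float), np.asarray(Y, float)
    return np.mean(np.sum((Z - Y) ** 2, axis=1))   # sum over outputs, average over samples

# Hypothetical example: p = 2 samples, M = 2 output neurons (one-hot targets).
targets = np.array([[1.0, 0.0], [0.0, 1.0]])   # Z: desired outputs
outputs = np.array([[0.9, 0.1], [0.2, 0.8]])   # Y: network outputs
```

EBP then propagates the gradient of this quantity backwards through the layers to update the weights.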

B. RADIAL BASIS FUNCTION (RBF) NETWORK
The RBF algorithms are another subclass of ANNs, possessing a structure with three layers joined by feedforward connections. The neurons in the hidden layer of an RBF network employ radial basis functions, such as Gaussian functions, as the activation function [35]. The multilayer structure of the RBF models is illustrated in Fig. 4. The mapping function that the RBF networks utilize is as follows [36]:

F(x) = Σ_{j=1}^{N} w_j H_j(x),  with  H_j(x) = exp(−‖x − c_j‖² / (2σ_j²)),

where N represents the total number of hidden neurons, w_j denotes the weight allocated to the j-th node, H_j(x) is the activation function of node j, x = [x_1, x_2, . . . , x_n] represents the input given to the algorithm, c_j denotes the center of the j-th activation function, and σ_j indicates the smoothing parameter.
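The RBF forward pass above is a weighted sum of Gaussian bumps, sketched below with hypothetical centers, smoothing parameters, and weights (a trained network would learn these).

```python
import numpy as np

def rbf_forward(x, centers, sigmas, weights):
    """F(x) = sum_j w_j * exp(-||x - c_j||^2 / (2 * sigma_j^2)) (Gaussian RBF)."""
    d2 = np.sum((centers - x) ** 2, axis=1)              # squared distance to each centre
    return float(np.dot(weights, np.exp(-d2 / (2.0 * sigmas ** 2))))

# Hypothetical two-neuron network in a 2-D feature space.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # c_j
sigmas = np.array([0.5, 0.5])                  # smoothing parameters sigma_j
weights = np.array([1.0, -1.0])                # output weights w_j
```

Evaluating `rbf_forward` near a centre is dominated by that centre's weight, which is what gives RBF networks their local-response behaviour.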

C. ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM (ANFIS)
ANFIS is well recognized as an outstanding neuro-fuzzy model that concurrently utilizes FL and NN methods [37].
In fuzzy systems, the main features of the inference system and the tuning parameters have to be adjusted, which usually results in high computational complexity and is very time-consuming. Therefore, NNs are employed to overcome these complications by tuning the adjustable parameters. This cooperation of NNs and FL forms the neuro-fuzzy systems. Fig. 5 shows the five-layer structure of the Takagi-Sugeno-based ANFIS algorithm. The ANFIS models acquire their actual outputs using the following formula:

f = Σ_{i=1}^{n} w̄_i f_i,  with  w̄_i = w_i / Σ_{j=1}^{n} w_j,

where n denotes the total number of nodes, w_i represents the firing strength of the rule layer, f_i indicates the first-order polynomial with consequent parameter set {p_i, q_i, r_i}, and w̄_i is the output of the normalization layer.
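The last three ANFIS layers (normalization, consequent, and summation) reduce to a short numpy sketch for a two-input, two-rule system; the firing strengths and consequent parameters below are hypothetical placeholders for values a real ANFIS would learn.

```python
import numpy as np

def anfis_output(x, y, w, p, q, r):
    """Sugeno-type output f = sum_i wbar_i * f_i, where wbar_i = w_i / sum_j w_j
    and f_i = p_i * x + q_i * y + r_i are first-order consequent polynomials."""
    wbar = w / w.sum()        # normalization layer
    f = p * x + q * y + r     # consequent layer (one polynomial per rule)
    return float(np.dot(wbar, f))

# Hypothetical two-rule system with inputs x = 2, y = 4.
out = anfis_output(2.0, 4.0,
                   np.array([1.0, 3.0]),    # rule firing strengths w_i
                   np.array([1.0, 0.0]),    # p_i
                   np.array([0.0, 1.0]),    # q_i
                   np.array([0.0, 0.0]))    # r_i
```

With these numbers, rule 1 contributes x and rule 2 contributes y, blended by the normalized firing strengths 0.25 and 0.75.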

D. DECISION TREE (DT) ALGORITHM
DT is one of the most widely used data mining algorithms [38]. The DT is used in problems that can be posed such that they provide a single answer in the form of a group or class name. There are many algorithms for constructing DTs; one of the most popular is Iterative Dichotomiser 3 (ID3) [33]. In this algorithm, the tree is constructed from top to bottom. The entropy (E) and information gain (IG) criteria (the latter computed from the entropy) are used to find the best feature. The entropy, or uncertainty, of a system is calculated as follows:

E(S) = −p_+ log₂(p_+) − p_− log₂(p_−),

where S is the set of instances, p_+ represents the proportion of positive samples in S, and p_− indicates the proportion of negative samples in S. The IG, or the expected reduction in entropy obtained by splitting the data on one attribute, is defined as

IG(S, A) = E(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) E(S_v),

where the set Values(A) contains all possible values of attribute A, and S_v is the subset of S whose attribute A equals v. The general structure of the DT for a hypothetical data set containing two properties, A and B, is depicted in Fig. 6.
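Both ID3 criteria fit in a few lines of standard-library Python; the (9, 5) parent split below is a hypothetical toy dataset used only to exercise the formulas.

```python
from math import log2

def entropy(pos, neg):
    """E(S) for a binary-labelled set given its positive/negative counts."""
    total = pos + neg
    e = 0.0
    for c in (pos, neg):
        if c:                       # 0*log2(0) is taken as 0
            p = c / total
            e -= p * log2(p)
    return e

def info_gain(parent, splits):
    """IG = E(S) - sum_v (|S_v|/|S|) * E(S_v); each entry is a (pos, neg) pair."""
    n = sum(p + q for p, q in splits)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q) for p, q in splits)
```

ID3 evaluates `info_gain` for every candidate attribute at a node and branches on the one with the largest gain.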

E. NAÏVE BAYES (NB) ALGORITHM
In machine learning, the NB algorithm is a group of simple classifiers based on probabilities. A simple Bayesian classifier can be considered a model based on conditional probability. Suppose X = (x_1, x_2, . . . , x_n) is a vector expressing n properties treated as independent variables. The probability that the sample belongs to class C_k, for distinct k, is then given by Bayes' rule:

p(C_k | x_1, . . . , x_n) = p(C_k) p(x_1, . . . , x_n | C_k) / p(x_1, . . . , x_n).

Now, if each variable is assumed to be independent of the others conditioned on the class C_k, the following is obtained:

p(C_k | x_1, . . . , x_n) ∝ p(C_k) Π_{i=1}^{n} p(x_i | C_k).

An example of a two-dimensional data set is illustrated in Fig. 7. The presented methods possess various tuning parameters that significantly impact the convergence, accuracy, and speed of the AMR system. Different approaches exist to determine the proper values of the tuning parameters. It is worth mentioning that some of the utilized algorithms have a single parameter, and therefore there is no need to use optimization algorithms for them. Trial and error is the most primitive technique for adjusting a tuning parameter. Nevertheless, employing optimization methods such as GA [39] is more effective for obtaining appropriate parameter values. In the following subsection, various GA coding schemes are proposed and examined to determine the optimal parameters of each model.
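The naïve Bayes posterior is just the class prior multiplied by per-feature likelihoods and renormalized, as this standard-library sketch shows; the priors and likelihood values are hypothetical toy numbers.

```python
from math import prod

def nb_posteriors(priors, likelihoods):
    """Posteriors p(C_k | x) from p(C_k) * prod_i p(x_i | C_k), normalised over k."""
    scores = [pr * prod(lk) for pr, lk in zip(priors, likelihoods)]
    z = sum(scores)                      # evidence p(x), the normaliser
    return [s / z for s in scores]

# Hypothetical example: two classes, two conditionally independent features.
post = nb_posteriors([0.5, 0.5],                  # class priors p(C_k)
                     [[0.8, 0.9], [0.2, 0.1]])    # p(x_i | C_k) per class
```

The classifier simply reports the class with the largest posterior (here, the first class).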

F. GENETIC ALGORITHM (GA)
Optimization techniques consist of the two general categories of classical and modern approaches [40]. The classical optimization methods benefit from the damped least-squares (DLS) algorithm, while modern ones employ natural evolution processes. In addition, the former attains the local optimum, whereas the latter always aims to acquire the global optimum. As stated earlier, we employ GA in this study to optimize the tunable parameters of the presented machine learning algorithms. GA is a well-known modern method of optimization which Holland introduced in the 1970s [41]. Inspired by the principles of natural and gradual evolution (Darwin's theory), GA tries to discover an optimal solution in a vast searching space.
This algorithm conveys inherited traits through genes; however, the transmission of genes from one generation to another always involves some variation. The crossover and mutation operations are the two chief modifications that the algorithm applies to the genes and chromosomes. The presented GA leverages the crossover and mutation operators to merge and produce new chromosomes in order to reach the global optimum [42], [43]. Additionally, a fitness (merit) function is defined to identify the best chromosome according to the requirements and demands. Fig. 8 illustrates the process flow and general structure of the GA. Providing proper coding to define chromosomes is a critical step in GA. In addition to the weight coefficients, the presented algorithms possess other adjustable parameters: for the ANFIS algorithm, the type of fuzzy inference system (FIS), the number of membership functions, and the number of epochs; for the MLP model, the learning rate, the number of epochs (iterations), and the network topology; and for the RBF network, the number of neurons in the hidden layer, the spread value, and the mean squared error (MSE) factor. Improper designation of the tunable parameters of these models directly affects their convergence rate and computational load; therefore, proper values must be identified and assigned to them.
In light of the above, we first design appropriate codings for each of the presented models separately. The chromosomes and their constituent genes are determined according to the adjustable parameters of the machine learning models. Next, the model evaluates each chromosome by assessing the accuracy obtained for the algorithm employed in the AMR system. Ultimately, the fittest chromosomes, i.e., the optimal parameters of the algorithms, are acquired. It is worth noting that the length of the chromosomes is constant for all the models; however, the types of genes forming a chromosome differ from one model to another. Fig. 9 illustrates the genes and chromosomes defined for each of the presented models, i.e., the parameters that the GA intends to optimize. Once the chromosome type is defined, the algorithm proceeds to select the initial population. If a small initial population is assigned, the GA will be unable to investigate the entire search space; in contrast, a large initial population decelerates the model. Consequently, this quantity must be selected attentively. Given the restrictions associated with each component, the initial generation of the model is formed by randomly generating 30 chromosomes [39]. The chromosomes of our GA have a structure similar to Fig. 9, the values of which for the first generation can be selected based on prior knowledge. As mentioned earlier, we randomly choose the initial values of the genes in this work. Random selection of the initial population empowers the algorithm to avoid getting trapped at a local optimum. Afterward, the GA reproduces the next generation by executing the crossover and mutation operations. Once the crossover and mutation processes are over, the model evaluates and picks out the best chromosomes among the offspring and parents in the selection phase to produce the next generation.
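The crossover and mutation operators can be sketched generically; the chromosomes, gene bounds, and mutation rate below are hypothetical stand-ins for the model-specific codings of Fig. 9.

```python
import random

def one_point_crossover(a, b, rng):
    """Swap the gene tails of two parent chromosomes at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, rate, low, high, rng):
    """Re-draw each gene with probability `rate`, uniformly within its bounds."""
    return [rng.uniform(low, high) if rng.random() < rate else g for g in chrom]

# Hypothetical 4-gene chromosomes (e.g., four tunable parameters of one model).
rng = random.Random(0)
c1, c2 = one_point_crossover([1, 1, 1, 1], [2, 2, 2, 2], rng)
```

One-point crossover only exchanges material between the parents, while mutation injects fresh gene values, which is what keeps the search from stagnating.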
Our proposed algorithm sets the number of new descendants greater than the number of parents. The elitism strategy is employed in the selection stage to elect the fittest children and parents. This work utilizes the roulette wheel selection operator [44], which first calculates the fitness function value of each chromosome and then normalizes the computed results. The fitness function considered for each algorithm is the accuracy value that the algorithm calculates for each chromosome. The algorithm assigns a random number between zero and one to each chromosome and calculates the probability of selecting that specific chromosome (P_i). Eventually, the most appropriate chromosome is chosen. Applying this technique favors the selection of chromosomes with a greater fitness level. The exact mathematical formula for the corresponding chromosome's P_i value is

P_i = F_i / Σ_{j=1}^{n} F_j,

where F_i denotes the fitness value of each chromosome, and n indicates the population size. We assign appropriate values to two further parameters: the occurrence probabilities of the crossover and mutation operators (p_c and p_m). The performance of the presented GA and the relative efficiency of the MLP, RBF, ANFIS, DT, and NB algorithms in AMR are investigated.
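Fitness-proportionate (roulette wheel) selection can be sketched as a cumulative-sum draw; the three-chromosome fitness vector is a hypothetical example, not a value from the paper's runs.

```python
import random

def roulette_select(fitness, rng):
    """Pick index i with probability P_i = F_i / sum_j F_j."""
    r = rng.random() * sum(fitness)   # spin the wheel
    acc = 0.0
    for i, f in enumerate(fitness):
        acc += f
        if r <= acc:
            return i
    return len(fitness) - 1           # guard against floating-point rounding

# Hypothetical fitness values: chromosome 2 holds 60% of the total fitness.
rng = random.Random(0)
picks = [roulette_select([1.0, 3.0, 6.0], rng) for _ in range(10000)]
```

Over many spins, each chromosome is chosen roughly in proportion to its share of the total fitness, so fitter chromosomes dominate the next generation without excluding weaker ones.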

IV. SIMULATION RESULTS
We perform several experimental studies in this section to evaluate the performance of the proposed methods. The simulations are executed in MATLAB 2020a equipped with NN toolboxes. The AWGN channel is considered for the communication system. Therefore, once the transmitter modulates and then conveys the signal through the channel, the transmitted signal is mixed with white Gaussian noise. The transmitted signal then arrives at the receiver block, where no prior information from the transmitter exists. Information regarding the proposed algorithms and utilized signal parameters is summarized in Table 2. The efficiency and performance of the proposed models are investigated by considering different criteria. We first examine the selected features and their role by performing various simulations in the following subsection.

A. SELECTED FEATURES
The employed features are briefly introduced in Section II. As illustrated in Fig. 10, the ASK-modulated signals obtain the lowest values of the second-order moment of the non-linear component of the instantaneous phase (M_{φNL}), i.e., our first feature, which helps to separate ASK modulations from other schemes. Fig. 11 depicts that the spectrum-based feature (σ_z²) can accurately distinguish ASK and QAM modulations from other digital modulation schemes, including PSK and FSK modulations that possess no amplitude-related information. The proposed AMR models of this study utilize the mean value of the power spectral density of the normalized-centered instantaneous amplitude of the intercepted signal segment (γ) to separate 4-ASK modulations from BASK, as demonstrated in Fig. 12. As shown in Fig. 13, the standard deviation of the normalized-centered non-linear component of the direct instantaneous phase (σ_dp) is employed to distinguish BPSK modulation from QPSK in hierarchical-based classifiers. Figs. 14 and 15 illustrate how the CWT-based features empower our model to recognize the BPSK, QPSK, BFSK, and 4-FSK modulation schemes.

B. EVALUATION AND COMPARISON METRICS 1) Accuracy
The accuracies of the presented models are calculated as

Acc_i = (1/m) Σ_{k=1}^{m} (n_k / n_i) × 100%,

where i is the SNR value, m indicates the number of iterations per SNR, n_i represents the total number of testing instances, and n_k denotes the number of correctly classified instances in iteration k.

2) Speed Comparison and Evaluation
One of the essential evaluation criteria is the algorithm's speed in real-time applications, i.e., time complexity. Therefore, the computation times of the selected machine learning models are reported to provide a detailed comparison.

3) Parameters Evaluation and Confusion Matrix
Most of the studied algorithms have an adjustable parameter. Imprecise selection of the tuning parameters, i.e., extremely small or large values, diminishes the accuracy of the model. Therefore, the values of these parameters must be chosen appropriately. GAs are utilized for algorithms with a large number of parameters. For algorithms that have only one parameter, two approaches can be used: the first is the trial-and-error method, while in the second, the 10-fold cross-validation technique is employed to find values for the parameters.
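The fold bookkeeping behind 10-fold cross-validation is easy to sketch with the standard library; `n` and `k` below are arbitrary illustrative sizes (the paper uses k = 10).

```python
def k_fold_indices(n, k):
    """Split sample indices 0..n-1 into k near-equal folds;
    yield a (train, test) index pair for each fold."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(10, 5))   # toy example: 10 samples, 5 folds
```

Each sample appears in exactly one test fold, so averaging the per-fold errors gives the cross-validation estimate used to score each candidate parameter value.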
In the M-fold cross-validation approach, the dataset samples are distributed into M different subsamples of the same size. In each iteration, the technique utilizes one of the M sets for evaluation purposes, while the rest are employed in the training process. Accordingly, the average error over p iterations is calculated as [45]

E = (1/p) Σ_{i=1}^{p} e_i,

where e_i denotes the error of the i-th iteration.

We compare the results of our proposed models with other studies, in particular [32], [9], and [46]. The works in [32] and [9] propose a hierarchy-based support vector machine (SVM) classifier and investigate two approaches, namely one-against-one (OAO) and one-against-all (OAA). The study in [46] employs probabilistic neural network (PNN) and KNN algorithms for AMR. In these works, a total of six features are exploited to recognize seven different digital modulation schemes. Fig. 16 compares the performance of the proposed methods versus different SNRs. Our proposed models provide robust modulation recognition with high accuracy at very low SNRs, i.e., in extremely noisy environments. In particular, the MLP and DT methods achieve an accuracy above 95% at an SNR of -6 dB. Moreover, all the utilized algorithms provide accuracies greater than 90% at SNR = 0 dB.
In addition, to better compare the performance of the models, Fig. 17 shows the related accuracies of all the algorithms at different SNRs ranging from -10 dB up to 10 dB. Most of our employed algorithms, including MLP, DT, and NB, achieve an accuracy of 100% at the SNR of 0 dB.
Each reported accuracy is averaged over 30 simulation realizations per SNR. Table 3 provides a comparison at a specific SNR of 0 dB. Our proposed models result in significantly faster and more efficient AMR without compromising accuracy. In particular, the presented DT offers a 96.75% speedup over SVM-OAO and 89.13% over SVM-OAA. The other employed models of this study provide average speedups of 89% and 65% over SVM-OAO and SVM-OAA, respectively. As observed, the DT algorithm consumes the least time to provide a suitable output. Providing the correct result in the shortest possible time facilitates the use of the proposed algorithms in real-time applications. Table 4 lists the associated accuracies for various SNRs, together with the results of related studies that employ various machine learning and AI techniques. As can be observed, the presented techniques successfully offer very high accuracy even at low SNRs. In particular, the proposed DT, MLP, and NB algorithms provide 100%, 99.66%, and 96.66% accuracy at SNR = -2 dB, respectively.
One of the most significant achievements of the presented study is the use of single-parameter methods, for which trial and error suffices to obtain the optimal parameter value. Using algorithms with a minimum number of parameters increases the speed of the machine learning algorithms in the training and testing processes. For the multi-parameter models, the specific GA presented in this work, combined with the 10-fold cross-validation technique, is utilized to find the optimal values of the adjustable parameters. Figs. 18 and 19 examine the accuracies of the MLP and RBF models with respect to changes in the values of their tunable parameters, respectively. We observe that the appropriate selection of the adjustable parameter values significantly improves the performance of the proposed models (by up to 70%).
Furthermore, it is observed that the DT classifier has the least computation time. Accordingly, to further study the model, Table 5 lists the results obtained for this algorithm at SNR = -6 dB in the form of a confusion matrix. The confusion matrix reports the prediction results of our model. The number of correct and incorrect predictions are listed for each modulation scheme. As can be concluded from the figures and tables above, the proposed models provide excellent accuracy.

V. CONCLUSION
MLP, RBF, ANFIS, DT, and NB models are leveraged in this study to intelligently detect and classify numerous digital modulation schemes based on the presented key features. Additionally, we exploit a heuristic optimization method, namely GA, to determine the optimal values of the tunable parameters of the machine learning algorithms. The simulation results illustrate that the proposed algorithms have superior accuracy while employing significantly fewer classifiers, parameters, and computations. It is observed that the selected models successfully identify and classify signals modulated in different ways, in a short time and with extreme accuracy. Furthermore, the presented approaches substantially elevate the speed of the AMR system, with up to a 96.75% improvement compared to previous studies. These classical methods can compete reasonably well with deep learning algorithms. The algorithms can also accurately and robustly identify the modulation schemes of signals received at extremely low SNRs. Various channel types, including Rayleigh and Rician fading, can be investigated in the future to extend this work. In addition, the hardware aspects of the methods should be evaluated as future work.