ML Algorithm Performance to Classify MCS Schemes During UACN Link Adaptation

This research classifies the modulation and coding rate for link adaptation in Underwater Acoustic Communications Networks (UACNs). Recently, the UACN has become a promising technology for military, commercial, and civilian applications, as well as scientific research. However, we should minimize the dataset dimension for real-time implementation due to the sensor nodes' energy limitations in the underwater environment. We used an Incheon sea trial's measured dataset of 18 features, applying Principal Component Analysis (PCA) to select the dominant eigenvalue components in order to reduce the curse of dimensionality, and then selected 11 parameters. After that, we applied Machine Learning (ML) algorithms with different combinations of the parameters to separately classify the modulation and the coding rate, and measured both individual and overall classification accuracy. The findings are compared with two Taean sea trial datasets with 11 features to finalize the selected parameters for link adaptation. For modulation classification, we observed 96.83% accuracy with the K-nearest Neighbors (KNN) algorithm in the three-parameter and two-parameter cases. In coding rate classification, we found 100% accuracy with the KNN algorithm using the same three-parameter case. However, for the best fit among the three datasets, we finalized another three parameters at the expense of accuracy. To find the optimum threshold values for all modulation and coding rate labels, we used Rule-based (RB) 2D and 3D analysis. However, with a hard limit on non-overlapping data, at best, 35.51% classification accuracy was found for a 1/3 coding rate (Turbo code) with QPSK modulation, which shows that RB analysis is much less reliable in a UACN and therefore not useful in this regard. Besides, our analysis shows data independence in the Doppler Spread (DS) and the Frequency Shift (FS), mitigating the challenge of the time-varying channel.
We use Gaussian distribution plots, a confusion matrix, multi-dimensional scatter plots, and interpolated plots to analyze the data.


I. INTRODUCTION
Nowadays, the underwater sensor network (USN) is a promising technology among all sorts of next-generation wireless networks. A wireless link can be established with an acoustic signal or an optical signal, or by using OFDM MIMO systems. With minimum energy consumption, the USN supports applications like communications between submarines; navy surveillance; disaster management after, for example, a tsunami, a cyclone, or a rip tide; research on marine life; finding undersea minerals; offshore oil exploration by autonomous underwater vehicles; data exchange via undersea sensor networks for environmental monitoring; water level measurement in flooded areas or for rivers at risk; observation of water flow during bridge construction; nuclear power generation on rivers or the sea; deep seaport construction and maintenance; and data exchange among sensor nodes [1]. The conventional AWGN channel is critical for dealing with an underwater acoustic channel's challenges, such as long delays and a larger Doppler spread due to the low propagation speed of sound (resulting in frequency- and time-selective fading), temperature, viscosity, and streamflow direction. Channel modeling is still an ongoing art due to such complexities. Parnish et al. showed that the year-long time-varying nature of sound and speed, and the different weather conditions, cause a ''bending'' of acoustic pressure waves. In underwater acoustic communications (UAC), both attenuation and noise strongly depend on frequency, even though a transmitter-receiver (Tx-Rx) pair has a fixed distance [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Arun Prakash.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Due to the inhomogeneous nature of UAC, modeling is usually either deterministic and physics-based or stochastic (or a combination of both). Socheleau et al. [3] studied the validity of stochastic modeling and inferred some statistical properties, such as the distribution of scattered random components, as well as time variation in shallow-water UACs, with a set of measured data in a coastal environment. Due to the extreme limitations on the available bandwidth in a UAC network, frequency reuse and cellular concepts are more appealing when trying to enhance a UAC network's coverage and capacity [4]-[6]. A simulation-based link adaptation (LA) approach used a 12-path Rician fading channel for strategic LA [7]. In [8], the authors proposed a modem equipped with direct sequence spread spectrum signals of various coding rates and modulation orders. Adaptive selection of signals is achieved based on BER prediction via boosted trees. It speeds up communication by 10 to 20 times, compared to fixed-rate transmission. However, this proposed receiver showed vulnerability to highly time-variable (high Doppler Spread) channels. A software-defined OFDM-based underwater acoustic (UWA) communications system demonstrates the superiority of adaptive transmission, where both the modulation order/type and the power on each subcarrier are selected based on channel conditions to maximize throughput. However, it is experimentally limited to a short-distance transmission set-up [9]. New methods such as sparse adaptive convolution cores, time-domain turbo equalization, and frequency-domain turbo equalization still cannot overcome the high computational complexity, and they have a low success rate for modulation classification. By applying Deep Learning, classification accuracy was found to be around 99% [10]. Link adaptation optimizes the reliability of communication networks through AMC.
Previously, maximum likelihood techniques were used to classify MCS levels, but they involve more computational and circuit complexity, mostly owing to the need to calculate natural logarithms in the likelihood function and to the increased demand for additional signal samples. If the data size is not too large, a Machine Learning (ML) algorithm is a better option as a classifier [11].
In our previous work, we used all 11 parameters and 18 parameters of the Taean and Incheon datasets, respectively [12]. In that case, with ML analysis, the Boosted Regression Tree (BRT) algorithm showed the best performance at 99.9%. However, we cannot verify all 11 or 18 parameters for AMC classification in a real-time environment due to the energy constraints of buoy and underwater sensor nodes. This research's main objective is to minimize the required number of implementation parameters while maximizing classification accuracy. We should choose, at best, two-dimensional or three-dimensional solutions with a maximum of four or five parameters for analysis. Due to the lower accuracy obtained using the Taean dataset in previous work, here, we first take into consideration the Incheon dataset [13] with a 2 km Rx-Tx distance. Along with three types of modulation classification (QPSK, 16QAM, 64QAM), we also consider two types of coding rate classification (1/2, 1/3) of the Turbo code. At first, with Principal Component Analysis (PCA), we chose 11 to 15 parameters with dominant eigenvalues. These parameters mostly correspond to the parameters available at the Rx. Then, we analyzed those parameters multidimensionally (2D, 3D, and more) with ML. Among the five algorithms used in [12] (also applied here), both the Boosted Regression Tree and K-nearest Neighbors (KNN) algorithms performed outstanding classification in terms of accuracy for both modulation and coding rate. Due to the fixed values of the Doppler Spread (DS) and the Frequency Shift (FS) during the Incheon sea trial, these parameters showed a misleading accuracy percentage, even as single parameters. Analysis found that, after the training and validation sessions, any new input of these parameters would yield zero accuracy. After further analysis, we found more relevant and variant parameters for AMC, which showed the system's independence from these two parameters.
The rule-based (RB) strategy has also been applied to classify MCS levels. Using a [−1.2σ, 1.2σ] interval of the Gaussian data distribution (although the Incheon data do not precisely follow this distribution, we used the similarities), 2D and 3D RB showed acceptable overall accuracy. The drawback is that much of the data overlaps among all modulation and coding rate levels. We then tried to find the thresholds for both cases: the three types of modulation and the two types of coding rate. However, with 2D and 3D rule-based analysis, we do not get an acceptable classification accuracy percentage with the defined thresholds. Figure 1 shows an example UAC network with such an arrangement of 2D, 3D, or higher-dimensional ML and RB classifiers. The analysis of 4D (and higher-dimensional) data achieved a better accuracy percentage in some cases, but we preferred to finalize the 2D and 3D results. After that, the findings were compared with the 1 km and 3 km Taean sea trial datasets. Finally, the results are optimized as suitable for a real-time UACN with lower-energy sensor nodes.
The paper is organized into eight sections. Section II describes the Underwater Acoustic Communications Network architecture and offers an Incheon dataset overview; PCA analysis is described in Section III, and the significance of ML algorithms is described in Section IV. Modulation classification with Machine Learning analysis and 2D rule-based analysis is discussed in Section V, with Section VI presenting the analysis of ML and the threshold findings from 2D and 3D rule-based strategies for coding rate classification. Section VII shows the comparative results from the Taean dataset. Finally, in Section VIII, we conclude our findings.

II. UAC NETWORK ARCHITECTURE AND DATASET OVERVIEW
Different kinds of network architecture concepts are available for underwater acoustic signal communications, such as wireless networks, sensor-based networks, optical wireless networks, MIMO-OFDM-based networks, green wireless networks, and 3D wireless networks [14]-[17]. The UAC network is still in the development stages, which is why much attention and meaningful research is ongoing in this area. Because of bandwidth limitations, a cellular-based network structure for a UAC network is also a promising way to implement the frequency reuse concept. Many studies have been based on this concept, as used in the femtocell network, PS-LTE, LTE-R, 3GPP LTE-A networks, and so on [18]-[20]. We also consider this kind of frequency reuse scheme in a cellular UAC network for our research. Figure 1 shows the UAC network architecture. Our primary concern is the underwater base station controller (UBSC), which is connected to three underwater base stations (UBSs) via interface 2, and these three UBSs are further connected to underwater equipment (UE) through interface 1. Interfaces 1 and 2 are wireless links where the 100 Hz acoustic wave is used as a carrier. We use the OFDM baseband signal for the transmission. The UBSC is connected to the core network (CN) via interface 3, which is LTE. Table 1 in [12] showed a list of system operating frequencies and bandwidths, where the A-DL to A-UL1 and A-UL2 connections are allocated to the UBSC and the UBS, respectively, and the B-DL to B-UL3 connections to the UBS and UE, respectively. The UAC network layout is based on the basic cellular concept of spatial frequency reuse. Figure 1 from [7] showed the architecture considered for a one-tier UAC network layout and its details. The UAC network channel model has severe effects on system performance due to its challenging characteristics.
So, it is vital to consider transmission loss, ambient noise, and multipath fading effects when analyzing the LA in order to increase bandwidth efficiency in a UAC network [21]. Indeed, the main concern with considering the cellular type of architecture under a UAC network channel model is to maximize the coverage and capacity within limited bandwidth resources. The OFDM frame used in the UAC network system is seven seconds in length with a 5 kHz bandwidth (BW), and it contains nine physical resource blocks (PRB) with two preambles. Each PRB contains six OFDM symbols, and each symbol is 0.1024 seconds in length and has a total of 512 subcarriers. In our experimental dataset, uplink data are measured (A-UL0); i.e., the transmitters are UBSs, the receiver is the UBSC, and FDD is used. The uplink OFDM symbol details are in Table 1.
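The uplink numerology quoted above can be cross-checked quickly. The sketch below (Python, for illustration only) assumes the usual OFDM relations, subcarrier spacing = BW / number of subcarriers and symbol length = 1 / spacing, and ignores any cyclic prefix, which the paper does not specify:

```python
# Sanity-check of the uplink OFDM numerology: 5 kHz bandwidth and 512
# subcarriers imply a subcarrier spacing of ~9.77 Hz and a 0.1024 s
# symbol, matching the symbol length stated in the text.
BW_HZ = 5_000          # OFDM baseband bandwidth
N_SUBCARRIERS = 512    # subcarriers per OFDM symbol

spacing_hz = BW_HZ / N_SUBCARRIERS   # 9.765625 Hz
symbol_len_s = 1.0 / spacing_hz      # 0.1024 s

print(spacing_hz, symbol_len_s)
```

The stated 0.1024-second symbol length is thus exactly the reciprocal of the subcarrier spacing, confirming the internal consistency of Table 1's parameters.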
The analysis has primarily been done based on two types of measured datasets; details are described in [12]. Here, we extend the analysis further with only the Incheon dataset because it showed better performance in the previous analysis. The Incheon dataset was measured during a 24-hour experiment conducted from July 5, 2017 [Starting Time: 1438 (HM)], to July 6, 2017 [Ending Time: 1458 (HM)]. The experiment was carried out over a 2 km distance between the Tx and Rx in the Yellow Sea (Deokjeokdo, Incheon). The detailed parameters at the transmitting end are shown in Table 2.
The dataset was organized into 10,105 rows comprising the number of experiments, and 23 columns, where 18 columns held numeric values (considered features). One column is for modulation level, two columns are for channel coding level and coding rate description, one column is for uplink information, one is for the time of the observation, and finally, one is for the BW of the OFDM baseband Tx signal in the experiment. Each experiment was observed for four to six minutes, changing the combination of parameters, including the Frequency Shift (FS). For any experimental duration, we have 36 sets of observations and a total of 282 experiments. The symbols F and T in PS and RP mean Frequency (Space) and Time, respectively. The threshold values for the measurements during the sea trial for different parameters were −10 dB, −15 dB, correlation (corr) 5, correlation 9, etc. In [12], Figure 2 showed the statistically varying nature of parameters like RMS Delay Spread and Coherence BW in the Taean sea trial dataset, which reflect the channel quality. The same experience is also valid for the Incheon dataset and for another parameter, such as MED [3]. Table 3 shows the available estimated parameters at the receiving end, by which (after finding threshold values using rule-based analysis) we can carry out link adaptation for underwater acoustic communication networks. Using these parameters as features, we can classify modulations and coding rates at the receiving end with Machine Learning algorithms.

III. UNSUPERVISED ML: PRINCIPAL COMPONENT ANALYSIS
Here, our main objective is to find the essential parameters with the highest variances among all features that carry most of the information regarding modulation and coding rate classification. PCA is an unsupervised, non-parametric statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The primary application of PCA is dimensionality reduction. We start with 18 parameters and determine the principal components (PCs) using the power method, the covariance method, and the Singular Value Decomposition (SVD) method.
Power Method: Power iteration, or the power method, is an eigenvalue algorithm that, given a diagonalizable matrix A, will produce a number, λ, that is the greatest (in absolute value) eigenvalue of A, and a nonzero vector, v, which is a corresponding eigenvector of λ; that is, Av = λv. The algorithm is also known as the Von Mises iteration. Despite its simplicity, it converges slowly. The most time-consuming operation of the algorithm is the multiplication of matrix A by a vector, so it is effective for a massive sparse matrix with an appropriate implementation. The algorithm details were described in [22].
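A minimal sketch of the iteration just described (in Python, with a small synthetic symmetric matrix standing in for the 18 × 18 feature covariance matrix of the dataset):

```python
import numpy as np

# Power iteration: repeatedly multiply by A and re-normalise; the vector
# converges to the dominant eigenvector, and the Rayleigh quotient gives
# the corresponding (largest-magnitude) eigenvalue, i.e. A v = lambda v.
def power_iteration(A, iters=1000):
    v = np.ones(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)   # keep the iterate at unit length
    lam = v @ A @ v              # Rayleigh quotient -> eigenvalue estimate
    return lam, v

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T                      # symmetric PSD, like a covariance matrix
lam, v = power_iteration(A)
print(lam)
```

Only matrix-vector products are needed, which is why the method suits large sparse matrices, although convergence slows when the top two eigenvalues are close.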
Observations: In the power method, the parameters are arranged according to the corresponding eigenvalues. According to our analysis, the parameter sequence, from top eigenvalues to lower ones, is Data Rate (bps), CB, MED, SNR, RMS, RP, etc. It is notable that, here, Coded BER has a very low eigenvalue.
Covariance Method: The dataset's covariance matrix is always a symmetric matrix with the variances on its diagonal and the covariances off-diagonal. It defines both the spread (variance) and the orientation (covariance) of the data. The most significant eigenvector points in the direction of the highest variance of the data, and the corresponding eigenvalue represents its magnitude, which equals that variance. The second-most significant eigenvector is always orthogonal to the largest eigenvector, and points in the direction of the second-largest variance of the data [23].
Observations: In PCA with the covariance method, we linearly transform the Incheon dataset features into another set of principal components. The component with the most significant eigenvalue is PC1, the component with the second-highest eigenvalue is PC2, and so forth. We use the built-in pca function of MATLAB 2019a to do this. This function returns five parameters as outcomes. One of the important outcomes is the latent parameter, presenting the eigenvalues of the covariance matrix of the Incheon dataset, which represent the variances of the principal components. Another is the explained parameter. It shows the rational variance of each principal component compared to the total variance. The term ''total variance'' means the summation of all the features' variances, which is also the same as the total variance of all the PCs. A higher percentage for the rational variance of one component means it carries more information. Generally, the summation of the rational variances of the first two or three PCs in the sequenced list covers 90% of the total variance, as shown in the explained percentage row values, or as visualized in a scree plot. Figure 2 shows the scree plot of the rational variances of each PC. In this simulation, only about 35% of the variances are covered by PC1 and PC2; too little information is carried by these two components, so dataset mapping by this method is not highly reliable. We need to take 8 to 11 PCs into consideration to get more classification accuracy on the modulation and the coding rate. Table 4 summarizes the values of the explained and latent parameters.
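The latent and explained outputs described above can be reproduced directly from the eigendecomposition of the covariance matrix. The following is a Python sketch, not the MATLAB code used in the study, with a small synthetic matrix standing in for the 10,104 × 18 Incheon feature matrix:

```python
import numpy as np

# "latent": eigenvalues of the covariance matrix (variances of the PCs).
# "explained": each PC's variance as a percentage of the total variance.
rng = np.random.default_rng(1)
# Five synthetic features with deliberately unequal variances
X = rng.standard_normal((200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])

C = np.cov(X, rowvar=False)                    # 5x5 covariance matrix
latent = np.sort(np.linalg.eigvalsh(C))[::-1]  # eigenvalues, descending
explained = 100.0 * latent / latent.sum()      # rational variance per PC

print(explained)
```

The explained percentages always sum to 100, so the "90% of total variance" criterion in the text amounts to truncating this list once the cumulative sum crosses 90.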
Using the biplot function, we can plot a two-dimensional orthonormal covariance graph aligned with the x-y axes of either PC1-PC2 or PC3-PC4, and so on, in order to present the original features from the first dataset. It shows the direction of each eigenvector, and the lengths of the vectors indicate the corresponding eigenvalues. Figure 3 presents such a plot for PC1 and PC2, where Data Rate (bps), Coded BER, and SNR have positive covariances (proportional relationships for PC1 and PC2), and CB, MED, and RMS have negative covariances (inverse relationships for PC1 and PC2). Table 5 summarizes the significant findings of all the PCA methods. Some of them have positive covariances and others have negative covariances, but both types of covariance impact classification accuracy. In both methods, Doppler Spread and Frequency Shift have very low eigenvalues until we find the highest values in PC8 and PC9 [Table 5, columns 2 and 4]. We count nine PCs in total because these components contribute greater than 90% of the total variance. According to the eigenvalues, we finalize a few parameters as consistently key parameters: RMS, MED, Data Rate, Coded BER, CB, Pilot, SNR, RP, etc. Another notable point is that the PCs are not strictly in descending order of variance in our analysis [Table 5, column 8], so reliability is not 100%.
We followed the same procedure for the 1 km and 3 km Taean datasets. The parameters found in both datasets are summarized in Table 1 in the Appendix. For the Taean datasets, we only implemented the covariance method; the other two methods were not used for these second datasets. We sequenced the parameters from the highest value of the variances in PC1 up to PC9. The key parameters are almost the same as in the Incheon dataset; for example, Coded BER, RMS Delay Spread, MED, Pilot, EVM, CB, RP, SNR, Data Rate, etc.
SVD Method: The PCA process entails computational complexity. The real-time data matrix contains thousands of pieces of data, so it is hard to compute the covariance matrix's eigenvalues, which may sometimes cause errors, such as rounding errors [24]. According to the mathematical definition of the Singular Value Decomposition method, let the Incheon dataset be X, denoted as an m x n (10,104 x 18) matrix of real-valued data with rank r, where (without loss of generality) m ≥ n, and therefore, r ≤ n; x_ij is the element of the i-th observation row and j-th feature column. The elements of the i-th row of X form the n-dimensional vector g_i, and the elements of the j-th column of X form the m-dimensional vector a_j. The singular value decomposition of X is X = U S V^T, where U is an m x n matrix, S is an n x n diagonal matrix, and V^T is also an n x n matrix. The columns of U are called the left singular vectors, {u_k}, and form an orthonormal basis for the feature columns, so that u_i · u_j = 1 for i = j, and u_i · u_j = 0 otherwise. The rows of V^T contain the elements of the right singular vectors, {v_k}, and form an orthonormal basis for all the received parameters in the rows. The elements of S are nonzero only on the diagonal and are called the singular values; thus, S = diag(s_1, ..., s_n). We can express the SVD of X as X = Σ_{k=1}^{r} s_k·u_k·v_k^T, and truncating this sum at l < r terms gives X_l = Σ_{k=1}^{l} s_k·u_k·v_k^T, which is the closest rank-l matrix to X. There is a direct relationship between PCA and SVD in cases where principal components are calculated from the covariance matrix.
If each row of X is centered, XX^T = Σ_j a_j·a_j^T is proportional to the covariance matrix of the variables a_j. In this case, the left singular vectors {u_k} are the same as the principal components of {a_j}. So, s_k^2 is again proportional to the variance of the k-th principal component. The matrix SV^T contains the principal component scores, which are the coordinates of the features in the space of principal components [25].
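The PCA-SVD relationship can be checked numerically. The sketch below (Python, synthetic data standing in for the dataset) uses the common column-centred convention, under which the squared singular values of X divided by m − 1 equal the variances of the principal components; conventions for rows versus columns vary across references:

```python
import numpy as np

# Verify: for a column-centred data matrix X (m observations x n features),
# s_k^2 / (m - 1) equals the k-th eigenvalue of the covariance matrix,
# i.e. the variance of the k-th principal component.
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 6))
X = X - X.mean(axis=0)                     # centre each feature column

U, s, Vt = np.linalg.svd(X, full_matrices=False)
var_from_svd = s**2 / (X.shape[0] - 1)     # PC variances via SVD

cov_eigs = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(var_from_svd, cov_eigs))
```

This is why SVD-based PCA is preferred for large matrices: it avoids explicitly forming the covariance matrix and the associated rounding errors.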
Observations: In the SVD method, we have a separate covariance matrix and set of eigenvalues. Based on their eigenvalues, we can roughly select these parameters from the covariance matrix: Pilot in both space and time, RP in both space and time, Data Rate (bps), MED, RMS, CB, etc. In the simulation, we choose the number of top eigenvalues, topeig = 12, and count the first 10 PCs (k = 10), which then covers 95% of the variances.
In real-time applications, 2D or 3D data dimensions are feasible to implement due to sensor node (UE) energy limitations in the undersea environment. Due to the power constraints in a UAC on the sensor nodes (or UE), on the UBSs, and on the UBSC, we finally selected two or three parameters. A lot of research has been done on reducing energy consumption by sensor nodes in UAC networks with either MIMO or OFDM systems [26]-[29]. In this study, via PCA analysis, we sort out, from among all the received parameters, the major parameters that have the most significant roles in identifying the modulation and coding rate levels. Now, we are going to measure classification accuracy with combinations of the above-mentioned selected parameters.

IV. MACHINE LEARNING ALGORITHMS FOR LINK ADAPTATION IN UAC NETWORKS
Feature parameters have too many overlapping values and are random due to the fast-fading channel. Limited research has already been done on the stochastic model, statistical characterization of channel modeling, the eigenpath UAC channel model, the statistical time-varying model, etc. [12], [3], [30]-[34]. Again, considering sea environments such as heavy tides and windy conditions, some analysis has been done [35], [36]. A few works have used optical or magneto-inductive wireless channel links instead of low-frequency acoustic links [37], [38]. However, accurate formulation of the UAC channel model remains challenging. The challenge of transmission-CSI feedback timing mismatch was already discussed [12]. On the other hand, due to the fixed distances of Tx-Rx pairs, the Doppler Spread and Frequency Shift were fixed for every observation period of four, six, or 10 minutes, but were changed to different values for every trial. With multivariate data features, we need to make a classification decision based on a multistage decision tree, where each stage utilizes a different feature. This increases the multi-dimensional complexity. Using a rule-based strategy to find appropriate thresholds at every stage for modulations, along with coding rates, is very difficult. Besides, obtaining LA from measured data may be possible, but the reverse is impossible, as is the case in LTE or on any other known wireless channel. We observed that the relationship between SNR and Data Rate shows inverse random behavior. As a result, the statistical ML approach is a good option for testing and validating a huge amount of measured training data and for classifying the modulation and coding rate more precisely.
Here, we follow the same set of algorithms, without loss of generality [12]. Those algorithms showed comparatively better performance in that work. Nevertheless, after a lot of trial and error, we finalized three of them: the K-Nearest Neighbors, Boosted Regression Tree, and Support Vector Machine (SVM) algorithms. Other algorithms did not perform well in this research. Zhu et al. [11] described algorithm details of the KNN and SVM classifiers. In the KNN approach, both the training signal and the testing signal share the same signal source for the generation of reference feature values; thus, the reference feature space is an accurate representation of the feature distribution of the testing signal. The classifier requires evaluating distances between the testing signals and reference signals, so the Euclidean distance is used for this purpose. Although KNN suffers from the curse of dimensionality, after applying PCA and with a reduced selection of 11 features, the performance was better than in previous research [12]. Although KNN is capable of multi-class classification, it suffers from the complexity of the distance calculation. Another disadvantage of the KNN classifier is that the classification decision-making features are not weighted. Therefore, one feature with a relatively sparse distribution between different modulations may dominate the distance evaluation.
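The Euclidean-distance KNN classifier described above can be sketched in a few lines. This is an illustrative Python implementation on synthetic two-class data, not the study's classifier; the two features and labels are invented stand-ins for measured parameters (e.g. SNR and CB) and modulation labels:

```python
import numpy as np

# Minimal KNN: for each test point, find the k training points with the
# smallest Euclidean distance and take a majority vote over their labels.
def knn_predict(X_train, y_train, X_test, k=5):
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
        nearest = y_train[np.argsort(d)[:k]]      # labels of k closest
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])     # majority vote
    return np.array(preds)

rng = np.random.default_rng(3)
# Two well-separated synthetic classes (say, class 0 = QPSK, 1 = 16QAM)
X = np.vstack([rng.standard_normal((50, 2)),
               rng.standard_normal((50, 2)) + 6.0])
y = np.array([0] * 50 + [1] * 50)

y_hat = knn_predict(X, y, X, k=5)
accuracy = (y_hat == y).mean()
print(accuracy)
```

The unweighted distance in `knn_predict` is exactly the drawback noted above: a feature with a large numeric range dominates the sum unless features are scaled first.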
BRT uses the boosting method, in which the input data are weighted in subsequent trees. The weights are applied in such a manner that data poorly modeled by the previous trees have a higher probability of being selected for the new tree. That means that after the first tree is fitted, the model measures the prediction error of that tree to fit the next tree, and so on. By measuring the fit of the previously built trees, the model continuously tries to improve its accuracy. This sequential approach is unique to boosting [39], [40]. Boosting is a numerical optimization technique for minimizing the loss function by adding, at each step, a new tree that best reduces (steps down the gradient of) the loss function. For our data analysis, BRT uses LogitBoost in the two-class case; in the three-class case, AdaBoostM2 is used as the booster. This algorithm is also robust to missing values and outliers. SVM achieves classification by finding the hyperplane that separates data from different classes. The hyperplane is optimized by maximizing its distance to the signal samples on each side of the hyperplane. Depending on the nature of the signal being classified, the SVM classifier uses either the linear kernel (linear programming) or the non-linear RBF kernel (quadratic programming). Nevertheless, AdaBoost only corresponds to linear programming. Gaussian kernels are universal kernels; i.e., their use with appropriate regularization guarantees a globally optimal predictor, minimizing both the estimation and approximation errors of a classifier. Here, approximation error refers to the error incurred by limiting the space of the classification models over which the search is performed, and estimation error refers to the error in estimating the model parameters. Linear SVM classifiers have linear kernels. The reason for the effectiveness of SVM and AdaBoost is that they find linear classifiers for very high-dimensional spaces.
When overfitting is addressed by maximizing the margin, the computational problem associated with operating in a high-dimensional space is dealt with by the SVM through the kernel method. Kernels allow the algorithms to perform low-dimensional calculations that are mathematically equivalent to inner products in a high-dimensional ''virtual'' space. In this case, the boosting approach employs a greedy search; the weak learner is an oracle for finding coordinates with a non-negligible correlation to the label [41].
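The equivalence between a low-dimensional kernel evaluation and an inner product in a higher-dimensional ''virtual'' space can be shown concretely with the degree-2 polynomial kernel, whose feature space is small enough to write out (the Gaussian kernel's feature space is infinite-dimensional, so it cannot be):

```python
import numpy as np

# Kernel trick illustration: k(x, y) = (x . y)^2 in 2-D equals the inner
# product of explicit 3-D feature maps phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
def phi(v):
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

rng = np.random.default_rng(4)
x, y = rng.standard_normal(2), rng.standard_normal(2)

k_direct = (x @ y) ** 2        # kernel evaluated in the input space
k_mapped = phi(x) @ phi(y)     # inner product in the "virtual" space
print(np.isclose(k_direct, k_mapped))
```

The kernel side costs one 2-D dot product regardless of the virtual dimension, which is the computational saving the paragraph above refers to.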
Compared with the KNN classifier, the SVM only needs to use the training signal when establishing the separating hyperplane. Once the hyperplane is optimized, there is no need to involve the training signal in any further calculations. The benefit is that the computation needed in the testing stage is relatively inexpensive, compared with KNN. The Gaussian kernel is a better choice than the other two available kernel types, the linear and polynomial kernels. With an increase in the order of the polynomial kernel, the size of the function class increases. The Gaussian kernel is non-parametric, whereas a polynomial kernel is parametric. In a way, with a non-parametric kernel, we can keep putting more and more data into the model at a complexity cost. In contrast, a parametric model's size is fixed, so the model saturates after a certain point. So, asymptotically, for extensive-dimensional data with no prior information about the function, invariances, data distributions, and so on, a non-parametric method is always better [42].

V. MODULATION CLASSIFICATION
Any classification can be done by applying either Machine Learning or rule-based algorithms. Here, we apply both, each for a specific purpose. A new set of rules is generated by the ML process by learning the given training data with specific learning algorithms. The process predicts the classes in the given Incheon or Taean dataset. It makes a model for every new dataset. The classes are often referred to as targets, labels, or categories; in this case, the modulation techniques are QPSK, 16QAM, and 64QAM. The ultimate objective is for the machine to identify the target/label of newly given data (the test data). On the other hand, a rule-based strategy can predict the classes by using threshold values sequentially. Here, making an analogy with the Gaussian distribution curve, applying the sigma-interval concept to the parameters, and after several trials of shrinking and expanding each parameter's interval, we set the parameters' threshold values for every modulation label. The same procedure will also be repeated for coding rate classifications, as explained separately in Section VI.

A. MACHINE LEARNING APPROACH TO MODULATION CLASSIFICATION
Now, we apply ML algorithms to different combinations of both PCA-selected parameters and PCA-excluded parameters. We trained the algorithms with 70% of the Incheon dataset of 10,104 observations, with 30% of the data used as test data. This arrangement was used for both modulation and coding rate classifications. For the three types of modulation label, the Boosted Regression Tree used AdaBoostM2, and for the two types of coding rate classification, we used LogitBoost. All algorithms showed good performance, but among them, the K-Nearest Neighbors algorithm performed best.
We observed that the combination of these parameters showed extremely good overall accuracy for modulation classification. One point should be made: we only discuss the overall accuracy percentage here. The individual modulation and coding rate classifications are described later, with the confusion matrix for ML and numerically for rule-based analysis. However, we never measured the false alarm rate in the rule-based approach; in the ML approach, it is measured automatically by the confusion matrix.

1) 1D ANALYSIS
To check each individual parameter's impact on classification accuracy, we conducted a simulation with one parameter. Here also, FS and DS showed a 100% contribution with the Boosted Regression Tree algorithm. The reason is that when we train with 70% of the data covering 280 different values for FS and DS, the BRT algorithm correctly detects the modulation labels of the 30% test data 100% of the time. However, any newly measured data would take other, previously unseen constant values for these parameters, and classification accuracy would then drop to 0%. For that reason, and because DS and FS showed minimal eigenvalues in the PCA analysis, we can exclude them from further consideration; both approaches proved them irrelevant for classification. With only one parameter, KNN showed satisfactory performance with any delay parameter, such as RMS Delay, MED, CB, and SNR. Among them, MED showed the highest contribution to classification accuracy at 96.14%, and the second highest was SNR at 95.94%. The SVM and Naïve Bayes algorithms could not provide acceptable performance with a single parameter. To obtain more reasonable accuracy with the ML approach, we combined parameters in different ways: two parameters, three parameters, four or five parameters, and so on. Because SNR performs better singly, we tried the maximum number of combinations with SNR. We observed that several combinations showed an up-to-the-mark percentage in overall accuracy for modulation classification. However, Data Rate and Coded BER performed the worst in every combination, even with SNR, except for DR with CB, which produced outstanding performance from all ML algorithms: about 96.83% with the KNN algorithm and 86.3% with BRT. This result confirms the credibility of Coherence BW (CB). Figure 4 is an example of a confusion matrix showing individual modulation classification accuracy, false alarm rate, and overall accuracy for the combination of SNR and CB with KNN.
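The FS/DS pitfall described above can be reproduced with a small synthetic experiment: when a feature's value essentially memorizes the label (as a session-specific constant does), a classifier scores near 100% on a random split of the same recordings, yet an unseen value gives a meaningless prediction. The data below is invented purely to illustrate this leakage effect.

```python
# Sketch of the leakage effect: a feature whose value is determined by the
# label is memorized by a tree, so a random 70/30 split scores ~100%, but the
# model has learned nothing transferable to new measurements.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=300)            # 3 synthetic modulation classes
feature = labels * 10.0 + rng.random(300)        # value essentially encodes the label
X = feature[:, None]

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=1)
tree = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
print("random-split accuracy:", tree.score(X_te, y_te))   # near 1.0
print("prediction for unseen value:", tree.predict([[55.0]]))
```

The prediction for a value never seen in training (here 55.0) is whatever leaf the value happens to fall into, which is why the paper discards FS and DS despite their apparent 100% contribution.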
There are more false detections as 16QAM; for example, 64QAM was falsely detected as 16QAM 0.7% of the time, and QPSK was also falsely detected as 16QAM 0.7% of the time. This result indicates that the 16QAM data overlaps too much with both QPSK and 64QAM. Here, QPSK achieved the highest classification accuracy.
However, in many cases under all the experiments, 16QAM was falsely detected the most, and it had a much lower percentage in classification accuracy due to its narrow non-overlapped range. This will be clear when we examine the Gaussian curve under rule-based analysis in the next section.

2) MULTI-DIMENSIONAL ANALYSIS
The combination of SNR and CB performed best due to the significant contribution by both parameters. The performance of the combination of DR and Coded BER jumped to the highest in three cases when CB was added to these parameters. A four-parameter combination including RP Space and SNR generated 91.52% accuracy in modulation classification, but increased the dimensional complexity; the same was true for five parameters. We therefore tried to confine our findings to those from two or three parameters. Now, we compare the primary findings with the parameters in Table 4, which are estimated at the receiver; we can omit parameters that are not on that list. In addition, because we use the Turbo code signal, the Uncoded BER (Linear) parameter is irrelevant. We summarize the acceptable parameters, beginning with CB (−10 dB), among others.

B. THE RULE-BASED STRATEGY FOR MODULATION CLASSIFICATION
Besides the ML approach, we also tried to find the threshold values for the contributing parameters whose combinations had good overall accuracy percentages in the ML analysis. We start this journey with a one-dimensional rule-based strategy. We conducted a Gaussian distribution analysis to find the mean, variance, and plot of the probability density function for a normal distribution, as illustrated in Figure 5. From the plot, following the six-sigma graph [43], we calculate the percentage of data coverage within the ±1σ, ±1.5σ, and ±2σ intervals, and so on; the definition of the percentage of data coverage is given in Section A of the Appendix. Not all plots resemble an exact Gaussian distribution. First, each parameter is summarized with its individual data size, minimum and maximum values, mean, variance, standard deviation, ±σ truncated data size, and the lower- and upper-bound values for every modulation scheme: QPSK, 16QAM, and 64QAM. The MATLAB pdf function with a normal distribution was used to plot the six-sigma graphs. Finally, the percentage of data coverage within the ±σ interval was counted.
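The "percentage of data coverage" computation just described can be sketched in a few lines: fit a normal distribution to one feature of one modulation class and count how much of the data falls inside the ±kσ interval. The SNR values below are synthetic placeholders, not measured sea-trial data.

```python
# Minimal sketch of the per-feature, per-class data-coverage count used in the
# rule-based Gaussian analysis above.
import numpy as np

def coverage_percent(values, k=1.0):
    """Share of samples inside [mean - k*std, mean + k*std], in percent."""
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std(ddof=0)
    inside = (values >= mu - k * sigma) & (values <= mu + k * sigma)
    return 100.0 * inside.mean()

rng = np.random.default_rng(2)
snr_qpsk = rng.normal(7.6, 1.2, size=2000)   # hypothetical SNR samples for QPSK
print(round(coverage_percent(snr_qpsk, k=1.0), 1))   # ~68 for near-Gaussian data
print(round(coverage_percent(snr_qpsk, k=2.0), 1))   # ~95
```

For a feature that is not exactly Gaussian, the counted percentage deviates from the theoretical 68.3/95.4 values, which is exactly what the per-parameter summaries in the text record.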
Figure 5 shows the overlapping points of the three plots for QPSK, 16QAM, and 64QAM, from which the threshold of each modulation for every parameter was found. The threshold is an essential measure for LA. For Coherence BW (−15 dB), RMS Delay Spread (−15 dB), and MED (−15 dB), we have only one threshold each, so we cannot differentiate between QPSK and 16QAM using those parameters. In general, there are two threshold values because there are three modulation classification labels. The thresholds are defined as follows.
Threshold1: determine the upper truncation value for QPSK level detection and the lower bound for 16QAM.
Threshold2: determine the upper truncation value for 16QAM level detection and the lower bound for 64QAM.
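The two-threshold rule defined above can be written as a simple decision function: values below Threshold1 map to the first label, values between the thresholds to the middle label, and values above Threshold2 to the third. The numeric thresholds in the example calls are placeholders, not the paper's final values.

```python
# Hedged sketch of the Threshold1/Threshold2 rule for three modulation labels.
def classify_by_thresholds(x, t1, t2, labels=("QPSK", "16QAM", "64QAM")):
    """Map a scalar feature value to a label using two ordered thresholds."""
    lo, hi = sorted((t1, t2))
    if x < lo:
        return labels[0]
    if x <= hi:
        return labels[1]
    return labels[2]

print(classify_by_thresholds(3.0, t1=5.7, t2=7.6))   # QPSK
print(classify_by_thresholds(6.5, t1=5.7, t2=7.6))   # 16QAM
print(classify_by_thresholds(9.0, t1=5.7, t2=7.6))   # 64QAM
```

For a parameter like SNR, where (as noted later in the text) lower values go with higher-order modulation, the same function can be called with the label order reversed, e.g. `labels=("64QAM", "16QAM", "QPSK")`.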

Threshold Value for Data Rate:
We drew the pdf plots of the normally distributed original data for the [−σ, σ], [−1.5σ, 1.5σ], and [−2σ, 2σ] intervals and found the threshold (crossing) points for every modulation. The percentage of data coverage within the ±σ interval seems an outstanding result at above 90%, but because of data overlap, no real-time implementation of LA is possible with this result; it shows the same real-time ambiguity we found in the LTE case. So, we further tried to find the optimum threshold values by observing all Gaussian plots (the possible crossing points of QPSK-16QAM and 16QAM-64QAM) for the other two uniform sigma intervals. We then defined the optimum threshold value based on equation (3), using the minimum value as the lower bound for QPSK and the maximum value as the upper bound for 64QAM, and calculated the individual and overall modulation classification accuracies. The thresholds that gave higher accuracy in all cases were selected as the final threshold pair. For the above three intervals, the percentage of data coverage for 16QAM and 64QAM went down to much lower values: 13.82% and 19.24%, respectively, with similar overall accuracy in all three cases. That means most of the data for these two modulations overlap. However, for QPSK detection, performance remained the same as in the [−σ, σ] interval: 90.15%. To increase the data coverage percentage and the overall accuracy, we adjusted the sigma interval further, trying the non-uniform intervals [0, σ] and [0, 0.7σ] and getting similar results. For the latter, detection of 16QAM and 64QAM increased slightly to 15.29% and 23.43% at the expense of decreased QPSK detection performance, but overall accuracy went down. So, we finalized the ±σ interval result as the threshold pair for Data Rate; in the 2D analysis, we used these thresholds for DR.
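The threshold (crossing) points used above can be computed in closed form: equating two normal pdfs N(μ1, σ1) and N(μ2, σ2) yields a quadratic in x, and the root between the two class means serves as the threshold. The parameter values below are illustrative, not the paper's fitted means and variances.

```python
# Sketch of locating the crossing point(s) of two fitted Gaussian pdfs.
# Equating the log-pdfs gives a*x^2 + b*x + c = 0 with the coefficients below.
import numpy as np

def gaussian_crossing(mu1, s1, mu2, s2):
    """Return the crossing point(s) of two normal pdfs, sorted ascending."""
    a = 1.0 / (2 * s2**2) - 1.0 / (2 * s1**2)
    b = mu1 / s1**2 - mu2 / s2**2
    c = mu2**2 / (2 * s2**2) - mu1**2 / (2 * s1**2) + np.log(s2 / s1)
    if abs(a) < 1e-12:                      # equal variances: single crossing
        return np.array([-c / b])
    return np.sort(np.real_if_close(np.roots([a, b, c])))

xs = gaussian_crossing(5.0, 1.0, 8.0, 1.0)
print(xs)   # equal variances: single crossing at the midpoint 6.5
```

With unequal variances there are two real crossings in general; the one lying between the class means is the usable threshold.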

Threshold for Coherence BW (−15 dB):
The QPSK and 16QAM pdf plots overlap too much, so we cannot distinguish these two modulation labels using CB. Furthermore, the overlap point of 100 with 64QAM yields little accuracy. So, we tried to set the threshold point manually and, after several trials, set it to 30, where the overall accuracy with CB was 56.12%. Although it has only one threshold point, this parameter showed competence in modulation accuracy through both the ML and RB strategies.

Threshold for SNR:
In the ML analysis, SNR with the KNN algorithm showed extraordinary performance at 95.96%. By trying different sigma interval ranges, we fixed its threshold values: Threshold1 and Threshold2 at 7.6 and 5.7, respectively, showed the highest overall accuracy (49.44%) among all trials. Notably, the threshold values are reversed here, i.e., Threshold1 for QPSK is the upper truncation value of 16QAM, and Threshold2 for 64QAM is the lower truncation value of 16QAM. Within this interval, again, 16QAM had lower accuracy.

Threshold Value for MED:
Like RMS Delay Spread, MED has one threshold value, and it is a parameter very similar to RMS Delay Spread, so MED has the same impact on modulation classification accuracy. Its individual classification accuracy is the same as the overall classification percentage. The threshold value is 28.9.

Threshold Value for RMS Delay Spread:
There is only one threshold value for RMS Delay Spread, so we cannot differentiate QPSK from 16QAM with it; we could only differentiate 64QAM. We found the optimum threshold values from different sigma intervals, along with the classification accuracy for each modulation and the overall accuracy. The [−σ, σ] interval provided a better detection percentage for QPSK and 16QAM, but a much lower percentage for 64QAM, and the overall accuracies were also poor here. Finally, we chose a threshold value of 7.35 for the RMS Delay Spread parameter.

Threshold Value for Coded BER (Linear):
QPSK detection showed better performance in all of the proposed sigma intervals and was maximized in the trial with the half-sigma interval. For [−1.5σ, 1.5σ] and [−2σ, 2σ], the detection percentage for 16QAM went down to 8.5%, whereas it went up to 18.62% with the half-sigma interval. This behavior is opposite to that of DR and RMS Delay Spread detection, i.e., the non-overlapped interval for 16QAM is too narrow. Detection of 64QAM increased well enough with a larger sigma interval, but the total accuracy was still inadequate. The threshold values for Coded BER are 0.2 and 0.31. All threshold values are summarized in Table 6.
One parameter is not sufficient to classify the modulation accurately, because the accuracy percentage for every single parameter was far below the system efficiency level. We therefore try a 2D or 3D combined analysis to check performance further. Our main objective is real-time implementation of the underwater communications system, so we need the two or three most reliable parameters that decrease the data volume while increasing the accuracy, giving more reliable real-time link adaptation. From Figure 5, we can also see that the truncated interval cuts most of the 16QAM data. Another three-parameter combination considered later is [Data Rate (bps), CB (−15 dB) (corr 5), Coded BER (Linear)], selected by the ML algorithm analysis. For the 2D analysis, we followed this algorithm to cover a higher percentage of the data and improve accuracy within the threshold range.
In theory, there is no direct relationship between Data Rate, SNR, and Coherence Bandwidth; Coherence Bandwidth has an inverse relationship with RMS Delay Spread. Besides, in our Incheon dataset, SNR showed a random inverse relationship with Data Rate; the 2D scatter plot in Figure 6(a) shows this abnormal behavior. Now, the combinations that had better overall accuracy in the 2D ML analysis were also verified with the rule-based strategy, for both the per-class and overall accuracy percentages. Here, we primarily used the [−σ, σ] interval data, where every modulation has an overlap. Using the algorithm in Table 7, the first pair had 90.63% overall classification accuracy; 64QAM had a maximum classification accuracy of 99.23% with the same combination of parameters. One noticeable point is that the CB data distribution has an almost similar overlap for the three modulations, but in the 2D sigma-interval analysis, combined with Data Rate, we can classify the modulation at a high rate of 90.63%. The second pair, [SNR, CB(−15 dB) (corr 5)], achieved an accuracy of 82.24%. Two other combinations performed below the acceptable level for system efficiency. Figures 6(b) and 6(c) show the 2D scatter plots of these two combinations. We see some separation and some overlap of the data in the two-dimensional rule-based scatter plots. Data Rate (bps) and SNR separate the data well enough for modulation classification, but CB shows too much overlap. Another important point is that SNR has a low value for higher-order modulation, and vice versa, which is impractical; still, SNR is valuable for modulation classification.
Later, we will apply fine-tuning with exact threshold values to find the real accuracy percentages. Before that, we need to sort out which parameters have more significance for the coding rate; we will then finalize the parameters and apply threshold values to them for the final output. The algorithm of the 2D rule-based strategy is presented in Table 7. Table 8 accumulates the classification accuracy percentages for every modulation after the 2D RB analysis, and all overall accuracies found from the ML and rule-based analyses are summarized in Table 9. Figure 6 shows the 2D scatter plots for all combinations; Figure 6(a) shows the ambiguity of SNR with the Data Rate. In communication theory, there is no relationship between Data Rate and CB. Because of SNR, every plot has an inverse adaptation: for lower SNR values, we have 64QAM, and for higher SNR values, QPSK. These plots are drawn with the [−σ, σ] interval. For a more separable plot, we analyze further after coding rate classification.

VI. CODING RATE CLASSIFICATION
In the Incheon dataset, there are two types of coding, Convolutional code and Turbo code, with coding rates of 1/2 and 1/3. In this research, we only consider the Turbo code, so the total data size was reduced to 1728. Taking 30% as test data, the test data size for coding rate was 518. For coding rate classification, we again used the same two techniques: the ML approach and the RB approach.
Finally, after 1D RB analysis, we applied multidimensional analyses for both modulation and coding rate classification.

A. MACHINE LEARNING APPROACH TO CODING RATE CLASSIFICATION
At first, we tried the eleven parameters available at the receiver (see Table 4) for every single modulation case. We noticed that Data Rate (bps) is not available at the receiver, even though this parameter, with CB, had the greatest impact on modulation classification in the previous section's analysis. We obtained 100% coding rate classification accuracy with the Boosted Regression Tree algorithm under QPSK modulation. KNN also classified accurately at a good percentage (97.1%) for the same modulation, and SVM, Gaussian, and Naïve Bayes were also able to classify at good percentages. Figure 7(a) shows the confusion matrix for the eleven parameters in the QPSK case. However, considering the realities of energy constraints, we should minimize the number of parameters required to classify the coding rate with maximum accuracy. We had already found a significant impact from two or three parameters in modulation classification, so we decided to start with three parameters here, fixing QPSK first. Figures 7(b) and 7(c) display the confusion matrices for four parameters and three parameters. All combinations behave the same, with the same percentages: the 1/3 coding rate was classified at 100%, but the 1/2 coding rate had some false alarms (1.7%).
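The per-modulation protocol above (fix one modulation, then classify 1/2 vs. 1/3 on its rows alone) can be sketched as follows. The feature columns and class separation are synthetic placeholders; only the filtering-then-splitting structure mirrors the paper's procedure.

```python
# Sketch of coding-rate classification restricted to one modulation (QPSK):
# filter the rows for that modulation, then run a binary 70/30 evaluation.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
n = 1728                                     # Turbo-code subset size in the paper
modulation = rng.choice(["QPSK", "16QAM", "64QAM"], size=n)
rate = rng.choice(["1/2", "1/3"], size=n)
# Hypothetical 3-feature matrix that weakly separates the two rates:
X = rng.normal(size=(n, 3)) + (rate == "1/3")[:, None] * 2.5

mask = modulation == "QPSK"                  # fix the modulation first
X_tr, X_te, y_tr, y_te = train_test_split(
    X[mask], rate[mask], test_size=0.30, random_state=3)
acc = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr).score(X_te, y_te)
print(round(100 * acc, 1))
```

Repeating the same loop with `mask = modulation == "16QAM"` (and 64QAM) reproduces the per-modulation structure of the paper's coding-rate tables.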

B. THE RULE-BASED STRATEGY FOR CODING RATE CLASSIFICATION
Here, we conducted 3D and 4D rule-based graphical analyses with the selected parameters found in the previous section for the coding rate. We generated 3D and 4D scatter plots with the same combinations of parameters for visual perception of the 1/2 and 1/3 coding rate data distributions (see Figure 8). Figure 8(a) shows QPSK data with a smaller SNR; all data for the 1/2 and 1/3 code rates are spread over the whole range of the RMS Delay Spread and MED parameters. At a lower RMS value, the MCS level also lay at a lower MED value. SNR values of approximately 2-10 dB cover all MCS levels, while in the SNR range of 20-25 dB, a 1/2 coding rate MCS level was found. In Figure 8(b), all MCS levels of the QPSK data are in the lower SNR region of 2-10 dB and the 0-400 range of Coherence BW, distributed over almost the whole range of MED values. A few 1/2 MCS levels were at the higher SNR value of 20 dB, and a few 1/3 MCS levels were at higher CB values of 600-1400. Figure 8(c) shows the parameters' relationship with SNR; in that figure, SNR is the fourth coordinate. The highest SNR value, 10.743, corresponded to a much lower RMS Delay Spread value of 1.78, while another SNR value of 10.31 corresponded to an RMS value of 10.74. We could say that a higher SNR value sometimes goes with a comparatively high RMS value and sometimes with a low one; besides, the lowest SNR value goes with a comparatively higher RMS value. So, SNR also has a random relationship with RMS Delay Spread. In communication theory, SNR and MED have no direct relationship, but RMS Delay Spread and MED show a similar trend, which is also reflected in this figure: the lowest SNR value relates to the highest MED value, but across the whole distribution, the relationship shows a lot of randomness, not linearity. In addition, 16QAM and 64QAM show coding rate data distribution patterns similar to QPSK's. One remarkable point is that the two types of coding rate overlap too much.
Finding a natural threshold is a challenging job here! We will do every single-parameter analysis numerically with a rule-based strategy for coding rate classification, as was done for modulation classification. We will also do a multi-dimensional RB analysis.

1) 1D ANALYSIS
[QPSK] SNR: Because both code rates have approximately similar mean values, and the distributions of the 1/2 and 1/3 code rates overlap each other, we can select either mean value, or their average, as the threshold point for classification. Within the [−σ, σ] interval, the 1/3 code rate has a lower percentage of data coverage. So, a trial threshold was chosen at 7.4 (the mean of the 1/2 code rate), which increased the percentage of data in the 1/3 coding rate portion. Another trial threshold of 7.67, the mean of the 1/3 code rate, decreased the percentage of classifications for that code rate. Finally, SNR's threshold value was chosen at 7.5, which has an optimum accuracy percentage for both coding rates and a maximum overall accuracy of 61.11%.
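The trial-and-error search just described amounts to sweeping candidate thresholds around the two class means and keeping the one with the best overall accuracy. The sketch below uses synthetic SNR samples drawn around the two means quoted above (7.4 and 7.67); with such heavy overlap, the best achievable accuracy stays modest, mirroring the 61.11% reported.

```python
# Sketch of the 1D threshold search: 1/2-rate data is assigned below the cut,
# 1/3-rate data above it; the cut maximizing overall accuracy is kept.
import numpy as np

def best_threshold(x_half, x_third, candidates):
    """Pick the cut maximizing overall two-class accuracy."""
    best_t, best_acc = None, -1.0
    total = len(x_half) + len(x_third)
    for t in candidates:
        correct = np.sum(x_half < t) + np.sum(x_third >= t)
        acc = correct / total
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, 100.0 * best_acc

rng = np.random.default_rng(4)
x_half = rng.normal(7.4, 1.0, 400)    # hypothetical 1/2-rate SNR samples
x_third = rng.normal(7.67, 1.0, 400)  # hypothetical 1/3-rate SNR samples
t, acc = best_threshold(x_half, x_third, np.arange(6.0, 9.0, 0.05))
print(round(t, 2), round(acc, 2))
```

Because the two means differ by only 0.27 with unit spread, no single cut can do much better than chance plus a small margin, which is the core limitation of the 1D rule-based approach.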

RMS Delay Spread(−15 dB), MED(−15 dB), CB(−15 dB)(corr 5):
Based on the logic of the maximum overall accuracy percentage obtained with a specific threshold value, we held several trials for each parameter around its mean value; the primarily selected threshold values are summarized in Table 10.

2) MULTI-DIMENSIONAL ANALYSIS FOR MODULATION AND CODING RATE CLASSIFICATION
With the ML algorithm analysis, the best coding rate classification result was 98.26% with the BRT algorithm in the QPSK case with the three-parameter combination [MED, CB, SNR]. With the two-parameter combination [SNR, CB], the 16QAM data performed best at 99.6% in the same classification.
As in modulation classification, if we follow the [−σ, σ] interval for multi-dimensional coding rate classification in the rule-based analysis, we do not obtain better accuracy. With QPSK, for the [MED(−15 dB), CB(−15 dB)(corr 5), SNR] combination, the accuracy was 56.67% (1/2), 70.65% (1/3), and 63.36% (overall); for the [MED(−15 dB), RMS Delay Spread(−15 dB), SNR] combination, it was 65% (1/2), 69.2% (1/3), and 67.01% (overall). However, the two types of coding rate data overlap here. If we use hard-limit threshold values (no overlapping data), the overall accuracy drops to 23.26% and 24.82% for the above two three-parameter combinations, respectively. After that, we expanded the interval to [−1.2σ, 1.2σ] and held trials for different sets of three-parameter combinations. We achieved reasonable overall accuracy: 75.69% for [MED, CB, SNR] and 78.64% for [MED, CB, RMS]. Repeating a similar process with 16QAM and 64QAM, overall accuracies of 75.53% and 73.50%, respectively, were achieved for [MED, CB, SNR], and 78.64% and 76.81%, respectively, for [MED, CB, RMS]. Table 11 presents all these findings.
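The "hard limit" rule referenced above can be illustrated with synthetic data: thresholds are pushed to the extremes of the competing class so that only non-overlapping samples count as correct. With heavily overlapping classes, this crushes accuracy, which is what the roughly 23-35% hard-limit figures in the text reflect. All values below are invented for illustration.

```python
# Sketch of the hard-limit (no-overlap) accuracy computation: 1/2-rate samples
# count only below the minimum of the 1/3-rate data, and 1/3-rate samples only
# above the maximum of the 1/2-rate data.
import numpy as np

rng = np.random.default_rng(5)
x_half = rng.normal(7.4, 1.0, 500)    # hypothetical 1/2-rate feature values
x_third = rng.normal(7.67, 1.0, 500)  # hypothetical 1/3-rate feature values

lo, hi = x_third.min(), x_half.max()
correct = np.sum(x_half < lo) + np.sum(x_third > hi)
overall = 100.0 * correct / (len(x_half) + len(x_third))
print(round(overall, 2))   # far below the soft-threshold accuracy
```

The contrast between this figure and the soft-threshold accuracies above is the quantitative reason the text deems hard-limit rule-based classification unreliable for a UACN.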
With the same [−1.2σ, 1.2σ] interval, we also verified several two-parameter combinations. At the final stage, when finalizing the parameters, we verified the performance of all three two-parameter combinations and both three-parameter combinations obtained from coding rate classification, along with the four parameters and the 11 parameters available at the receiving end, for modulation classification with both ML algorithms and the rule-based strategy. The results were as sound as the previous ML analysis of modulation classification. The best accuracy of 96.83% was achieved for both the two-parameter [SNR, CB] and three-parameter [MED, CB, SNR] combinations with the KNN algorithm, and with four parameters for the BRT algorithm. Although 11 parameters showed 99.9% competence with the BRT algorithm, we should remember the real-time system implementation constraint. All 2D, 3D, 4D, and 11-dimensional results for both modulation and coding rate classification are summarized in Table 11.
From the confusion matrix in Figure 10 for MED(−15 dB), CB(−15 dB)(corr 5), SNR parameters, observe that 16QAM has comparatively more false alarms at 4.6%, and QPSK and 64QAM had more accurate classification at 96.4%.
Finally, we need to establish the thresholds of every selected parameter to tune the performance percentage. For the Incheon data, the combination of CB(−15 dB)(corr 5), SNR, and MED(−15 dB) was primarily chosen as the optimum three-parameter combination among all of them. We maximized or minimized the threshold values (for both classifications) to get the best result; the main objective is to minimize the overlap among all modulation and coding rate labels. The results with the finalized threshold values are shown in Table 12 (parts I and II). The scatter plots of these parameters for the QPSK modulation data are in Figure 11, where green dots show the 1/2 coding rate data distribution after thresholding, and red dots show the 1/3 coding rate data distribution after thresholding.

VII. TAEAN DATA ANALYSIS
For comparison purposes, we checked the results from the Incheon dataset analysis against the Taean dataset. In the Taean dataset, there are two types of experimental data: one with 1 km between the Rx and Tx, and the other with 3 km. The detailed dataset overview was described in [12], and the dataset is available from [13]. Because the rule-based strategy showed comparatively poor performance in the Incheon analysis, here we only analyzed the dataset with ML algorithms. There are 11 parameters for the different combinations, and we only verify the two- and three-parameter combinations that are available at the receiver. There are four types of modulation in this dataset: BPSK, QPSK, 16QAM, and 64QAM. For channel coding, only the Turbo code is used, and as before, there are the same two coding rates: 1/2 and 1/3. Coding rate classification again reached 100% in several cases. All results are summarized in Table 1 of the Appendix. Figure 12 shows the confusion matrices for modulation and coding rate classification of the Taean dataset.
Comparing these findings with the Incheon dataset findings, we see that [MED, CB, SNR] is a common combination for both the Incheon dataset and the Taean 1 km dataset, showing the highest accuracy of 98.26% in coding rate classification of the Incheon dataset. However, this combination did not show such competence with the Taean 3 km dataset. If we compare the performance summary in Table 4 from our previous work [12] with Table 11 for the Incheon dataset and Tables 13 and 14 in the Appendix for the Taean datasets, we see that the Boosted Regression Tree performed best with the 18 features of the Incheon dataset [12] at 99.97%; the SVM with the default kernel was the second-highest-performing algorithm, and the Pseudolinear Discriminant algorithm came in just behind it. None of the algorithms is good for the Taean datasets, even with a full set of 11 features.
On the other hand, in the present scenario, after the PCA analysis, we obtain a better choice of feature selections. For the Taean datasets, we also achieved 100% accuracy in coding rate classification with several modulation types. For 4D and higher parameter combinations in the Incheon dataset, the Boosted Regression Tree algorithm showed better performance at 99.9%, and with higher dimensions, the SVM with a Gaussian kernel also showed competence. Sometimes, the Naïve Bayes and pseudolinear algorithms showed some competence, but not usually, and they achieved only marginal percentages in classifying the modulation and coding rate; we discarded those results, either due to high dimensionality or due to the marginal results. However, this research reflects the high potential of the K-Nearest Neighbors algorithm in most of the 2D and 3D cases. With the Taean datasets, KNN performed best for all 2D and 3D parameter combinations, and this is also true for the Incheon dataset in the lower-dimensional analysis. We can conclude that, with just a few parameters, KNN is the best option for Underwater Acoustic Communications Networks.

VIII. CONCLUSION
In this study, the Incheon sea dataset from a 2 km-distance trial was first thoroughly analyzed for link adaptation of underwater communication networks, where uplink data are measured by using an acoustic signal at the UBSC. After that, the Taean sea trial datasets were compared to verify the findings. First, PCA was used to minimize the data volume for both datasets. Both Machine Learning algorithms and the rule-based strategy were used to classify the modulation along with the coding rate, and the accuracy of the individual classification of every transmitted modulation and coding rate type was measured. Also, minimal two- and three-parameter combinations were chosen for real-time implementation due to the energy constraints at the receiver, and the chosen parameters were compared with the Taean dataset parameters at both 1 km and 3 km distances. Finally, the set [MED(−15 dB), CB(−15 dB)(corr 5), RMS Delay Spread] was selected as the three-parameter set, and [SNR, CB(−15 dB)(corr 5)] as the two-parameter set. With two parameters, a 96.83% overall accuracy was found for modulation classification, and 100% accuracy was found for coding rate classification. In the Incheon dataset analysis, BRT and KNN performed better than all other ML algorithms; in the Taean dataset analysis, KNN performed distinctively well. With three parameters, coding rate classification achieved 100% accuracy with the KNN algorithm for both Taean datasets. Using two parameters, the RB approach achieved a maximum of 87.15% coding rate classification accuracy. With the [−1.2σ, 1.2σ] interval, approximately 78.64% accuracy was found in coding rate classification with the above-mentioned three parameters. But with hard-limit threshold values, performance was poor, reaching at most 35.51% for both classifications of the 1/3 code rate (Turbo code) with QPSK modulation, which shows much less reliability for the RB analysis.
This research obtained similar performance with a four-parameter set, and 100% with 11 parameters, but those results were discarded due to the reality of energy constraints. Finally, both the PCA and ML analyses showed that Doppler Spread and Frequency Shift are independent of AMC, which mitigates a major challenge posed by high Doppler Spread due to fast channel fading in UAC networks.

APPENDIX A TERMINOLOGIES
In this article, the following terms are used to measure the modulation and coding rate classification accuracy in both rule-based analysis and Machine Learning-based analysis.
a. Percentage of Data Coverage: This term is used in rule-based analysis. It describes the truncated data percentage for any sigma interval. It contains overlapped data for different modulation labels.
-Formula for the percentage of data coverage of any feature within the ±kσ interval, where k = 1, 1.5, or 2: for modulation, see equation (4) at the bottom of the next page (the two other definitions are the same); for coding rate, see equation (5) at the bottom of the next page (the other definition is the same).
b. Percentage for Classification Accuracy: This term is used in both rule-based and ML-based approaches. It describes the data percentage, considering threshold values, within the different modulations or coding rates, so there are no overlapping data; it represents true detection of the modulation labels or coding rates. For the rule-based strategy:
-Formula for the percentage of modulation classification accuracy using any feature within the ±kσ interval (±1σ, ±1.2σ, ±1.5σ, or ±2σ): equation (6) at the bottom of page 21.
-Formula for the percentage of coding rate classification accuracy using any feature within the ±kσ interval (±1σ, ±1.2σ, ±1.5σ, or ±2σ): equation (7) at the bottom of page 21.
For the ML approach, it measures how many test data could be correctly classified from the given test data for each modulation by using information about the training data. The formula is equation (6), but with the terms defined as follows:
n′_QPSK = no. of correctly classified test QPSK data, n_QPSK = no. of given test QPSK data. (8)
The other two definitions are the same. For coding rate classification, we can also use equation (7) in the same way.
c. Overall Accuracy (for modulation): This represents the percentage of total rational data coverage, or the total rational level of detection.
-Formula for Overall Accuracy: For rule-based analysis,
Overall Accuracy = (n′_QPSK + n′_16QAM + n′_64QAM) / (n_QPSK + n_16QAM + n_64QAM) × 100 (9)
For Machine Learning approaches, it measures the ratio of the total correctly classified data for each modulation from the given test data to the total data size, including all training and test data. The formula is as seen in (9), but with the corresponding ML terms; the two other definitions are the same.
Overall Accuracy (for coding rate): This represents the percentage of total rational data coverage, or the total rational level of detection.
-Formula for Overall Accuracy:
Overall Accuracy = (n′_half_codingrate + n′_onethird_codingrate) / (n_half_codingrate + n_onethird_codingrate) × 100 (11)
For an ML approach, it measures the ratio of the total correctly classified data for each coding rate from the given test data to the total data size, including all training and test data. The formula is as seen in (11), but with the terms defined as follows:
n′_half_codingrate = no. of correctly classified test half-coding-rate data, n_half_codingrate = total no. of half-coding-rate data. (12)