Covert Channel Detection: Machine Learning Approaches

The advanced development of computer networks and communication technologies has made covert communications easier to construct, faster, harder to detect and more secure than ever. A covert channel is a path through which secret messages can be leaked in violation of a system's security policy. Detecting such hidden and dangerous threats remains one of the most challenging problems in network security. Because covert channels exploit resources that were not designed for communication, traditional security measures fail to detect their existence. This review provides a brief introduction to covert channel definitions, types and developments, with a particular focus on detection techniques based on machine learning (ML). It reviews the most common covert channels and the ML techniques used to counter them, addressing their achievements and limitations. In addition, this paper presents a comparative experimental study of ML approaches commonly used in this field and reports the performance of these classifiers. The paper concludes that our information is still at risk, that no system can be considered fully secure, and that more work on the detection of covert channels is required.


I. INTRODUCTION
A covert channel is a means of establishing communication between two parties in order to leak information covertly, in violation of an organization's established security policies. This illegitimate communication was first defined by Lampson in 1973 [1], [2]; Girling later extended the concept to computer networks [3], [4], enabling covert channels to be established over network platforms. The advanced development of computer network techniques has created a rich environment for many covert channel scenarios that are difficult to detect and therefore pose serious challenges for those seeking to secure communication [5]- [8]. Network covert channels have proven effective in supporting many malicious activities, and building them is a popular and effective form of information hiding that raises security concerns [9]. Moreover, with the emergence of covert channel tools and techniques, hackers and attackers are able to avoid detection by network security devices [10].
The associate editor coordinating the review of this manuscript and approving it for publication was Tyson Brooks.
Unlike traditional methods of secret message transfer, a covert channel hides not only the transmission content but also the transfer path itself [11]. In particular, network covert channels protect two aspects of secret message transmission: the communication content and the connection. Network covert channels effectively improve the security of both aspects [12].
Covert channel techniques are being rapidly developed owing to the influence of advanced communication technology. Some factors that play major roles in developing covert channel techniques are summarized in [13]. These factors include the advanced developments in network and communication technologies, switching techniques and internal control protocol technology.
The authors in [14] highlighted that continued work to counter this ongoing threat is urgently needed. In addition, there is a lack of countermeasures that address multiple types of covert channels: most countermeasures are dedicated to a single type of covert channel. Although there have been many attempts to develop methods capable of countering many types of covert channels, these methods are either inefficient or cause considerable overhead. Long-term research efforts to increase developers' and engineers' awareness of the risks posed by network covert channels are required to find common ground and to avoid duplicated efforts and overlapping solutions [9]. Such awareness helps avoid, in the early design phases of protocols and services, the weaknesses that these attacks exploit.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This paper focuses on covert channels as security threats that breach our networks and data; however, some research papers have presented beneficial uses of covert channels [15]- [21]. This is unsurprising, since many techniques are double-edged swords [22].
This section provides a brief introduction to covert channel attacks and shows how rapidly this type of threat can develop into real challenges that need to be considered. Section II briefly introduces covert channel types, focusing on the two main types of covert techniques. Section III discusses the spread of covert channel techniques among new technologies, including the Internet of Things (IoT), the IPv6 protocol and VoLTE, showing how these technologies are vulnerable to covert channel attacks and provide a rich environment for establishing different covert channel techniques that pose many security challenges. Section IV, the core of this work, provides a thorough review of covert channel detection using machine learning (ML) approaches. ML classification models have proven their ability and efficiency in information security, as well as their importance and benefits in computer science in general. This section discusses the achievements and limitations of these techniques in detail, focusing on recent research to present the state of the art. Section V presents an experimental comparative study of eight classification models to demonstrate their performance in terms of accuracy and error, using a dataset developed by constructing a packet-length-based covert channel that exploits network packet lengths to convey secret messages. Section VI provides a thorough discussion, and Section VII concludes the paper.

II. COVERT CHANNEL TYPES
A covert channel is a communication channel used to transmit information by exploiting system resources that are not designed to convey data [23]. Commonly, there are two types of covert channels: timing and storage. In timing channels, the covert message is modulated into the timing behavior of an entity on the sending side to be retrieved by the receiving side [24], whereas in storage channels, a sender writes a covert message directly or indirectly into storage objects to be read by the receiver side [25]. The timing channels can be further divided into two subtypes: active and passive [26]. Some researchers refer to the combination of storage and timing channels in one approach as a third type, known as a hyper-covert channel [27]. This type can pose significant challenges, making it difficult to detect [28].
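To make the timing/storage distinction concrete, the following is a minimal sketch of a timing channel, assuming a simple binary scheme in which a short inter-packet delay encodes 0 and a long one encodes 1. The delay values and function names are hypothetical and not taken from any cited work; a storage channel would instead write the bits into a field or property of the packets themselves.

```python
# Minimal sketch of a binary covert *timing* channel: each covert bit is
# modulated into one inter-packet delay (IPD). Delay values are hypothetical.
SHORT_DELAY = 0.05  # seconds; encodes bit 0 (assumed value)
LONG_DELAY = 0.15   # seconds; encodes bit 1 (assumed value)


def encode_ipds(bits):
    """Sender: map each covert bit to an inter-packet delay."""
    return [LONG_DELAY if bit else SHORT_DELAY for bit in bits]


def send_times(ipds, start=0.0):
    """Turn delays into packet timestamps, i.e. what an observer captures."""
    times = [start]
    for delay in ipds:
        times.append(times[-1] + delay)
    return times


def decode_ipds(ipds):
    """Receiver: threshold each delay halfway between the two levels."""
    threshold = (SHORT_DELAY + LONG_DELAY) / 2
    return [1 if delay > threshold else 0 for delay in ipds]


secret = [1, 0, 1, 1, 0]
timestamps = send_times(encode_ipds(secret))
observed = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
recovered = decode_ipds(observed)  # [1, 0, 1, 1, 0]
```

Thresholding halfway between the two delay levels, rather than matching exact values, gives the receiver some tolerance to network jitter.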
Tian et al. [12] highlighted that the conventional classification, which divides covert channels into timing and storage channels, does not cover covert channels constructed by changing the transmission network architecture. They therefore proposed dividing the key technologies for network covert channel construction into two levels: the transmission network and the communication content. Interested readers can refer to [12] for more details on this classification.
Moreover, the classification of covert channels based on their behaviors, techniques, similarities, patterns, protocols, etc. has recently received the attention of the research community to help develop countermeasures that are capable of targeting multiple covert channels instead of having a countermeasure for each covert channel technique. Studies on this trend are presented in [6], [7].

III. COVERT CHANNELS & NEW TECHNOLOGIES
This section highlights the widespread use of covert channel techniques among some new technologies such as Internet of Things (IoT), IPv6 protocol, and VoLTE technologies. We also discuss how these technologies represent enriched environments to promote the construction of many covert channel techniques that pose real challenges.

A. COVERT CHANNELS OVER IoT
IoT applications and their associated technologies have facilitated the spread of covert channels. Many covert communication methods that exploit IoT protocols have been introduced, in the form of either storage or timing channels.
Covert channel threats to security and privacy in the IoT have only recently been recognized and have drawn the attention of security professionals; however, research in this field remains limited [29]. Most IoT devices have network interfaces that expose them to the public. These devices are characterized by limited resources, such as battery, memory and processing power, and they lack appropriate security measures. Therefore, they are vulnerable to exploitation by different types of attacks.
Cabaj et al. [29] stated that most published papers on covert channels in the IoT use data-hiding techniques in particular IoT protocols. For example, some storage covert channels exploit the Extensible Messaging and Presence Protocol (XMPP) [30]; one timing covert channel and two storage covert channels use the Building Automation and Control Networking protocol (BACnet) [31]; and two timing covert channels and six storage covert channels exploit the Constrained Application Protocol (CoAP) [32], with an extended work that includes a power consumption analysis of these CoAP covert channels given in [33]. Moreover, Smith [34] indicated that, although CoAP is widely used in the IoT, it has mostly been ignored in covert channel research. The same work pointed out that distributed covert channels are a new technology that requires research attention: such a channel spreads a covert message over many hiding techniques, which makes detection more difficult. Accordingly, the authors of [34] presented two covert channels: one exploits unverified fields of the CoAP protocol, and the other uses domain generation algorithms (DGAs) for the virtual distribution of hidden messages to create a timing-based distributed covert channel.
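As an illustration of a storage channel hidden in a CoAP header field, the sketch below places secret bytes in the Token of CoAP request headers. Choosing the Token field is an assumption made for illustration (the text above only says that [34] exploits unverified fields), and the helper names are hypothetical. The header layout (2-bit version, 2-bit type, 4-bit token length, 8-bit code, 16-bit message ID, then up to 8 token bytes) follows RFC 7252; options, payload and the UDP transport are omitted.

```python
import struct

COAP_VERSION = 1  # RFC 7252: the version field is always 1
TYPE_NON = 1      # non-confirmable message
CODE_GET = 0x01   # code 0.01 (GET)


def build_coap_header(message_id, token):
    """Fixed 4-byte CoAP header followed by a Token of 0-8 bytes (RFC 7252, Sec. 3)."""
    if len(token) > 8:
        raise ValueError("CoAP tokens are at most 8 bytes long")
    first_byte = (COAP_VERSION << 6) | (TYPE_NON << 4) | len(token)  # Ver | T | TKL
    return struct.pack("!BBH", first_byte, CODE_GET, message_id & 0xFFFF) + token


def extract_token(packet):
    """Receiver: read the token length (TKL) and return the token bytes."""
    token_length = packet[0] & 0x0F
    return packet[4:4 + token_length]


def leak(secret, first_message_id=0x1000):
    """Hypothetical sender: one request per 8-byte chunk of the secret."""
    return [build_coap_header(first_message_id + n, secret[offset:offset + 8])
            for n, offset in enumerate(range(0, len(secret), 8))]


packets = leak(b"top-secret-data")
recovered = b"".join(extract_token(p) for p in packets)
print(recovered)  # b'top-secret-data'
```

Because legitimate tokens are opaque values chosen by the client, an observer has no protocol-level reason to inspect them; in this toy version the secret is sent in clear, whereas a real attacker would likely encrypt it so the tokens look random.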
The authors in [35] demonstrated that data can be hidden in a cyber-physical system, such as a smart building, by making slight modifications to its components (e.g., controllers and sensors) or by exploiting unused registers to store secret data.
Moreover, the study in [36] aimed to demonstrate the vulnerability of IoT environments to covert timing channels over mobile networks. The authors investigated different covert timing channel construction approaches to examine their ability to build covert timing channels for the IoT, and classified five types of construction approaches for the IoT over 4G/5G mobile networks: packet reordering-based, retransmission-based, rate switching-based, scheduling-based and packet loss-based covert channels.
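The first of these approaches, packet reordering, can be illustrated with a hypothetical pairwise scheme: packets are sent in consecutive pairs, a pair kept in order encodes 0 and a swapped pair encodes 1. This is a simplified sketch, not the construction studied in [36], and it assumes the network itself does not reorder packets (natural reordering would introduce bit errors).

```python
# Hypothetical packet-reordering covert channel: packets travel in consecutive
# pairs, and swapping a pair's order encodes bit 1.
def reorder_encode(seq_numbers, bits):
    """Sender: keep pair i in order for bit 0, swap it for bit 1."""
    if len(seq_numbers) < 2 * len(bits):
        raise ValueError("need two packets per covert bit")
    wire = list(seq_numbers)
    for i, bit in enumerate(bits):
        if bit:
            wire[2 * i], wire[2 * i + 1] = wire[2 * i + 1], wire[2 * i]
    return wire


def reorder_decode(received, n_bits):
    """Receiver: a pair arriving out of sequence-number order decodes as 1."""
    return [1 if received[2 * i] > received[2 * i + 1] else 0 for i in range(n_bits)]


seqs = list(range(100, 112))  # sequence numbers of 12 cover packets
bits = [0, 1, 1, 0, 1, 0]
wire = reorder_encode(seqs, bits)
print(wire)  # [100, 101, 103, 102, 105, 104, 106, 107, 109, 108, 110, 111]
```

No payload or header bit is modified, which is why such channels are classed as timing (ordering) channels rather than storage channels.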
A recent study investigated whether the message queuing telemetry transport (MQTT) protocol, which has become popular in IoT applications, can be exploited by covert channel attacks. MQTT is a lightweight publish-subscribe protocol. The authors investigated MQTT version 5 and reported that a number of covert channels can exploit it; these channels are not feasible in previous versions because they rely on features introduced in version 5. This reflects the ongoing development of covert channels and their ability to be deployed even as network techniques advance [37], especially in the IoT, which has become a common communication platform. Moreover, Vaccari et al. proposed a tunnelling system that encapsulates messages over MQTT by exploiting its features, allowing cyberattacks to be executed, and indicated that MQTT is a good choice for this purpose compared with other protocols [38].
This subsection demonstrates the spread of different covert channel techniques among IoT protocols and reflects the amount of work required to counter these evolving threats, including by considering them during the protocol design phases.

B. COVERT CHANNELS AND IPV6
Although IPv6 security issues have been addressed and improved, some issues remain and require further investigation. These concern inherent design vulnerabilities of IPv6 and its incomplete implementation across operating systems. Moreover, the successful deployment of IPsec within IPv6 provides no guarantee of additional security against covert channel attacks [39]. Lucena et al. introduced and analyzed twenty-two different covert channels that exploit the IPv6 protocol [40]. Readers interested in IPv6 covert channels can refer to [12], [40]- [44].
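One IPv6 header field that can carry covert data is the 20-bit Flow Label. The sketch below is an illustrative assumption rather than one of the specific channels analyzed in [40]: it builds only the first 32 bits of an IPv6 header (4-bit version, 8-bit traffic class, 20-bit flow label) and hides two secret bytes per packet in the flow label. The function names are hypothetical.

```python
import struct


def ipv6_first_word(flow_label, traffic_class=0):
    """First 32 bits of an IPv6 header: version (4) | traffic class (8) | flow label (20)."""
    if not 0 <= flow_label < 1 << 20:
        raise ValueError("the flow label is a 20-bit field")
    if not 0 <= traffic_class < 1 << 8:
        raise ValueError("the traffic class is an 8-bit field")
    return struct.pack("!I", (6 << 28) | (traffic_class << 20) | flow_label)


def read_flow_label(header):
    """Receiver: mask out the low 20 bits of the first header word."""
    (word,) = struct.unpack("!I", header[:4])
    return word & 0xFFFFF


def leak(secret):
    """Hypothetical sender: two secret bytes per packet in the flow label."""
    if len(secret) % 2:
        secret += b"\x00"  # pad to a whole number of 16-bit chunks
    return [ipv6_first_word(int.from_bytes(secret[i:i + 2], "big"))
            for i in range(0, len(secret), 2)]


def recover(headers):
    """Receiver: concatenate the 16-bit chunks carried by each flow label."""
    return b"".join(read_flow_label(h).to_bytes(2, "big") for h in headers)


headers = leak(b"abc")
print(recover(headers))  # b'abc\x00' (padding byte included)
```

Using only 16 of the 20 available bits keeps the chunking byte-aligned; a denser encoding is possible at the cost of more bookkeeping.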

C. COVERT CHANNELS AND VOLTE
In covert timing channels, a secret (covert) message is typically modulated into the inter-packet delays (IPDs) of normal traffic. This is not applicable to VoLTE, because the IPDs of VoLTE traffic are fixed and cannot be modulated. This motivated the authors in [45] to introduce a covert channel in VoLTE traffic that modulates the covert message by extending or postponing periods of silence. To reduce the impact of packet loss, the authors used Gray code to encode the covert message. They demonstrated the undetectability of the proposed covert channel using statistical tests and reported that, in terms of robustness, it outperforms other IPD-based covert channels [45].
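The benefit of Gray code in this setting can be sketched as follows, assuming (hypothetically) that each 2-bit symbol is sent as one of four silence-period extension levels spaced 20 ms apart. Because adjacent levels have Gray codes that differ in exactly one bit, a receiver that misreads a level by one step corrupts only one bit. The level spacing, symbol size and function names are assumptions, not parameters from [45].

```python
STEP_MS = 20         # hypothetical spacing between silence-extension levels
BITS_PER_SYMBOL = 2  # four levels: 0, 20, 40, 60 ms
MAX_LEVEL = 2 ** BITS_PER_SYMBOL - 1


def gray_encode(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Inverse of gray_encode."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_to_extensions(bits):
    """Sender: each symbol selects the level whose Gray code equals the symbol."""
    if len(bits) % BITS_PER_SYMBOL:
        raise ValueError("message length must be a multiple of BITS_PER_SYMBOL")
    extensions = []
    for i in range(0, len(bits), BITS_PER_SYMBOL):
        symbol = int("".join(map(str, bits[i:i + BITS_PER_SYMBOL])), 2)
        extensions.append(gray_decode(symbol) * STEP_MS)
    return extensions


def extensions_to_bits(extensions_ms):
    """Receiver: snap each measured extension to the nearest level, then Gray-encode."""
    bits = []
    for ext in extensions_ms:
        level = min(max(round(ext / STEP_MS), 0), MAX_LEVEL)
        bits.extend(int(c) for c in format(gray_encode(level), f"0{BITS_PER_SYMBOL}b"))
    return bits


message = [0, 1, 1, 1, 1, 0, 0, 0]
sent = bits_to_extensions(message)  # [20, 40, 60, 0]
jittered = [ext + 4 for ext in sent]  # small timing noise is absorbed by rounding
print(extensions_to_bits(jittered) == message)  # True
```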
By exploiting the real-time interactive nature of VoLTE, in which data packets are sent in both directions (from sender to receiver and back), Zhang et al. [25] constructed a two-way covert channel in which the sender receives feedback from the recipient confirming receipt of the secret message. The construction combines two channels, one timing and one storage. In the timing channel, the secret message is modulated into the number of Silence Insertion Descriptor (SID) packets sent during silence periods, while the storage channel carries the feedback confirming receipt, injected into the real-time transport control protocol (RTCP). The authors discussed the robustness and undetectability of the proposed covert channels. Moreover, a video packet reordering covert channel over VoLTE, supported by ML algorithms, was developed in [46] to demonstrate that reliable covert communication can be constructed under complex networking constraints.
This example shows that even with a technology that is difficult to exploit by covert techniques, attackers can find ways to establish covert attacks.
The advanced development of covert channel methods is clearly evident, so developing effective countermeasures still requires more attention. New ideas for constructing network covert channels, such as reversible network covert channels, have prompted the development of new detection methods. Reversible network covert channels can restore the original overt data without leaving any evidence of their presence [47]. In addition, prevention mechanisms should be considered in the early phases of designing protocols and services.

IV. COVERT CHANNELS DETECTION
Covert channel techniques use network resources that are not designed for communication purposes (e.g., timing information and packet headers) to leak information; therefore, conventional security measures fail to detect their existence. In addition, the available detection methods are dedicated to discovering specific covert channels and cannot be extended to include more covert channels [48].
This section mainly focuses on detection approaches that are based on machine-learning classification models to investigate their achievements and limitations. This paper gives more attention to recent work over the last five years with more focus on the papers that have been published in high-impact journals and conferences, as well as those that are highly cited.
The importance of ML techniques in supporting the security and privacy of many applications is notable, and ML can contribute effectively to fulfilling current real-world requirements in the security field. However, attackers can evade ML approaches by mounting adversarial attacks. Therefore, it is critical to assess the vulnerabilities of ML approaches in the early phases of development. Sagar et al. analyzed different types of adversarial attacks that target ML approaches and presented defense strategies against them [51].
The authors in [52] presented a literature review of ML and deep learning techniques in network security, with a focus on recent research and on the latest applications in intrusion detection. They indicated that each detection approach has its advantages and disadvantages, and that the most effective approach has not yet been established. Datasets are essential, since no ML or deep learning approach works without data; however, creating an intrusion detection dataset is difficult and time-consuming, and existing datasets suffer from problems such as outdated content and unevenness [52]. Moreover, many researchers have created their own network covert channel datasets for research purposes; unfortunately, these datasets are not publicly available [53].
Shaukat et al. indicated that ML techniques are advanced methods for detecting cybercrime and play an important role in fighting cybersecurity attacks and threats, for example in malware, spam, intrusion, fraud and phishing detection. However, they also identified the limitations listed in Table 1, which apply to certain ML models frequently used in cybersecurity [49].
ML for covert channel detection has been widely studied [54]. Nafea et al. indicated that the SVM algorithm is the best approach for detecting covert data [55]. However, the success of ML approaches depends on the availability of the traffic samples that represent many types of covert channels, and not only traffic samples that represent specific types of covert channels. Having a separate solution for every type of covert channel is not practical, as it may cause more overhead in network performance and capacity. Therefore, more research is required to obtain a standard dataset to imitate many types of covert channels for an effective ML solution instead of a separate solution for each technique. In addition, ML algorithms will not be effective unless there are statistical variations between normal and covert traffic. In other words, if covert traffic imitates normal traffic behavior, such detection techniques fail.
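One simple way to check whether such statistical variation exists before training a classifier is to compare the distributions of a traffic feature (for example, inter-packet delays) in overt and covert samples. The sketch below computes the two-sample Kolmogorov-Smirnov (KS) statistic in plain Python; the KS statistic is our illustrative choice, not a method proposed in the cited works, and the delay values are hypothetical. A value close to 0 means the covert traffic imitates the overt feature distribution, so a classifier relying on that feature alone is unlikely to separate the two.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the empirical CDFs (0 = identical distributions, 1 = completely separated)."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the maximum gap occurs at one of the observed values
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical inter-packet delays (seconds): a naive channel vs. a mimicking one.
overt = [0.08, 0.11, 0.09, 0.12, 0.10, 0.13, 0.07, 0.10]
naive_covert = [0.05, 0.15, 0.05, 0.15, 0.15, 0.05, 0.15, 0.05]
mimic_covert = [0.09, 0.10, 0.12, 0.08, 0.11, 0.10, 0.13, 0.07]
print(ks_statistic(overt, naive_covert), ks_statistic(overt, mimic_covert))  # 0.5 0.0
```

The mimicking sample contains exactly the overt values in a different order, so its delay distribution is indistinguishable, even though the ordering could still carry information.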
Sagar et al. reported that ML has a significant role to play in many areas, e.g., real-time decision making and the processing of huge amounts of data; however, attackers can exploit ML vulnerabilities to mount adversarial attacks. For example, a malicious user may decrease the false positive (FP) rate while increasing the false negative (FN) rate in a way that leaves the total error rate unchanged. This gives attackers leverage to commit sophisticated attacks [51].
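A small worked example with hypothetical confusion-matrix counts (not results from this paper) shows how this can happen: on 200 test instances, moving eight errors from FP to FN leaves the total error rate at 10% while almost doubling the rate at which covert traffic is missed.

```python
def error_rates(tp, tn, fp, fn):
    """Total error rate plus the per-class rates an attacker may trade off."""
    total = tp + tn + fp + fn
    return {
        "error": (fp + fn) / total,  # overall misclassification rate
        "fpr": fp / (fp + tn),       # overt traffic flagged as covert
        "fnr": fn / (fn + tp),       # covert traffic that goes undetected
    }


# Hypothetical counts: 100 covert and 100 overt test instances.
baseline = error_rates(tp=90, tn=90, fp=10, fn=10)
shifted = error_rates(tp=82, tn=98, fp=2, fn=18)

for name, r in (("baseline", baseline), ("shifted", shifted)):
    print(f"{name}: error={r['error']:.2f} FPR={r['fpr']:.2f} FNR={r['fnr']:.2f}")
# baseline: error=0.10 FPR=0.10 FNR=0.10
# shifted: error=0.10 FPR=0.02 FNR=0.18
```

Monitoring only the total error rate would not reveal the shift, which is why per-class rates (FP and FN) are worth reporting separately.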
Based on the comparison of common classifiers presented in [49], Figure 1 shows the accuracy achieved by these classifiers on the NSL-KDD dataset when used for anomaly-based intrusion detection. All of the classification models achieved accuracies above 95%. The decision tree (DT) classifier outperformed the other models with an accuracy of 99.64%, followed by the deep belief network (DBN) and naïve Bayes (NB) classifiers. SVM and artificial neural network (ANN) classifiers formed the next tier, whereas random forest (RF) showed the lowest accuracy.
Caviglione indicates that network covert timing channels can be detected by using statistical indicators or performance metrics to evaluate the regularity of the time-based evolution of a traffic flow. This works when the deviation in the timing statistics of the flow is large. However, such channels are very difficult to spot when the attacker modifies the encoding approach or protocol, or injects an appropriate amount of noise [9]. Qu et al. [16] stated that, in covert timing channels, when the packet-delay threshold used to hide the covert message is equal to or less than a quarter of the mean interarrival time of overt traffic, distinguishing covert from overt traffic is difficult because their time ranges overlap. When the threshold is greater than or equal to double the mean interarrival time of overt traffic, there is no overlap, and overt and covert traffic are easy to distinguish [16]. Covert timing channels whose packet-delay threshold is at most a quarter of the mean overt interarrival time are therefore hard to predict, which makes it more challenging to obtain adequate detection methods for such scenarios.
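As an example of the kind of regularity indicator mentioned above, the sketch below computes a regularity score in the spirit of the test proposed by Cabuk et al. for IP covert timing channels: the inter-packet delays are split into non-overlapping windows, the standard deviation of each window is computed, and the score is the standard deviation of the pairwise relative differences between windows. Low scores indicate suspiciously regular traffic. The window size and the example delays are assumptions, and, as noted above, an attacker who injects appropriate noise can push the score back into the normal range.

```python
import statistics
from itertools import combinations


def regularity(ipds, window=100):
    """Regularity score of a flow's inter-packet delays (lower = more regular).

    Splits the delays into non-overlapping windows, takes each window's standard
    deviation sigma_i, and returns the standard deviation of
    |sigma_i - sigma_j| / sigma_i over all window pairs i < j.
    """
    sigmas = [statistics.pstdev(ipds[start:start + window])
              for start in range(0, len(ipds) - window + 1, window)]
    if len(sigmas) < 2:
        raise ValueError("need at least two full windows of delays")
    ratios = [abs(s_i - s_j) / s_i for s_i, s_j in combinations(sigmas, 2) if s_i > 0]
    return statistics.pstdev(ratios) if ratios else 0.0


# Hypothetical flows: a perfectly periodic two-level pattern vs. varying spread.
periodic = [0.1, 0.2] * 20
varying = [1, 2, 1, 2, 1, 5, 1, 5, 1, 3, 1, 3]
print(regularity(periodic, window=4), regularity(varying, window=4))
```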
To provide a clearer picture of the use of ML techniques to counter covert channel attacks, focusing on their pros and cons, Table 2 provides an in-depth review of recent ML methods for discovering covert channels, highlighting their achievements and limitations. Table 2 shows that most recent work has focused on DNS covert channels and that most datasets were collected by the researchers themselves from real traffic on the networks they investigated. This highlights the challenge of obtaining a common dataset that sufficiently reflects the normal behavior of DNS traffic across all network types. Interested readers can find more information on DNS insecurity in [56] and on DNS tunnel detection in [57].

A. METHOD
This section presents a comparative scenario that includes the most commonly used classification methods in the areas of information security and covert channels. Eight classification models were used in this study. One of these is an ensemble classification model based on a stacking technique that takes the outputs of the other classifiers with the expectation of improving classification accuracy. A packet-length covert channel was selected for investigation by this work. This type of covert channel exploits the variation in the network packet lengths to modulate a covert message, i.e., odd length refers to 1 and even refers to 0 or vice versa. When an attacker wants to send a message, he or she modifies the network packet lengths according to the message, and the receiver watches the packet lengths to retrieve the encoded message.
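The encoding rule described above (odd length for 1, even length for 0) can be sketched as follows. This is a minimal illustration, not the exact tool used to build the dataset: it assumes the sender can pad each cover packet by one byte to fix its parity, and the helper names and cover lengths are hypothetical.

```python
def text_to_bits(text):
    """UTF-8 encode the message and expand each byte into 8 bits (MSB first)."""
    return [int(b) for byte in text.encode("utf-8") for b in format(byte, "08b")]


def bits_to_text(bits):
    """Inverse of text_to_bits."""
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def modulate_lengths(bits, cover_lengths):
    """Sender: pad a cover packet by one byte whenever its parity is wrong."""
    if len(cover_lengths) < len(bits):
        raise ValueError("not enough cover packets for the message")
    return [length + 1 if length % 2 != bit else length
            for bit, length in zip(bits, cover_lengths)]


def demodulate_lengths(lengths):
    """Receiver: odd length -> 1, even length -> 0."""
    return [length % 2 for length in lengths]


cover = [60 + (37 * i) % 1400 for i in range(16)]  # hypothetical packet lengths
sent = modulate_lengths(text_to_bits("hi"), cover)
print(bits_to_text(demodulate_lengths(sent)))  # hi
```

Each packet grows by at most one byte, so the change in length statistics is small; a detector therefore has to rely on subtler features of the length sequence.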
Because of the lack of a public dataset for this type of covert channel technique and the fact that most research depends on a self-made dataset that considers a specific situation, a dataset of 180 instances has been developed. These instances included 90 instances of overt traffic and 90 instances of covert traffic. Wireshark, Python, and Scapy were used to construct the aforementioned dataset.
The test mining tools offered by the Orange software were used for dataset pre-processing. Orange is an open-source machine learning and data visualization tool that provides a powerful platform for data analysis and a diverse toolbox. Feature selection methods were applied to improve classification accuracy. Feature selection retains only the features that strongly influence the classification results and discards the others, which improves both classification accuracy and computational cost. The classification models were trained and tested using two training-testing splits: 70%-30% and 90%-10%. For each experiment, a random sampling validation method was used to repeat the training and testing phases to ensure valid and reliable results.
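The repeated random train/test procedure described above can be sketched in plain Python. This is an illustrative stand-in for Orange's random sampling validation, not its implementation: the nearest-centroid classifier and the one-feature data are hypothetical, and only the split fractions (70% and 90%), the 180-instance size and the 20 repetitions come from the text.

```python
import random
import statistics


def random_subsampling(X, y, fit_predict, train_fraction=0.7, repeats=20, seed=0):
    """Repeat a random train/test split and return mean and std of test accuracy.

    fit_predict(X_train, y_train, X_test) must return one label per test row.
    """
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_train = round(train_fraction * len(X))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)  # a fresh random split on every repetition
        train, test = indices[:n_train], indices[n_train:]
        predictions = fit_predict([X[i] for i in train], [y[i] for i in train],
                                  [X[i] for i in test])
        correct = sum(pred == y[i] for pred, i in zip(predictions, test))
        scores.append(correct / len(test))
    return statistics.mean(scores), statistics.pstdev(scores)


def nearest_centroid(X_train, y_train, X_test):
    """Toy one-feature classifier, used only to exercise the validation loop."""
    centroids = {label: statistics.mean(x for x, lab in zip(X_train, y_train) if lab == label)
                 for label in set(y_train)}
    return [min(centroids, key=lambda label: abs(x - centroids[label])) for x in X_test]


# Hypothetical one-feature dataset mirroring the 90 overt / 90 covert split.
gen = random.Random(1)
X = [gen.gauss(0.0, 0.5) for _ in range(90)] + [gen.gauss(1.0, 0.5) for _ in range(90)]
y = ["overt"] * 90 + ["covert"] * 90
for fraction in (0.7, 0.9):
    mean_acc, sd_acc = random_subsampling(X, y, nearest_centroid, train_fraction=fraction)
    print(f"train={fraction:.0%}: accuracy {mean_acc:.3f} +/- {sd_acc:.3f}")
```

Reporting the spread across repetitions alongside the mean helps show how sensitive a classifier is to the particular split, which matters with a dataset this small.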

B. IMPLEMENTATION AND RESULTS
The eight classification models, namely Stack (a multi-classifier approach), neural network (NN), naïve Bayes (NB), logistic regression (LR), random forest (RF), SVM, decision tree (DT) and KNN, were trained and tested using the enhanced dataset described in the previous section. Random sampling validation was repeated 20 times for each experiment to obtain reliable and valid results, and each classifier was trained and tested using two different training-testing splits. The classification accuracy, recall and precision of each model were computed using Equation 1, Equation 2 and Equation 3, respectively, where TP and TN denote covert and overt instances that are classified correctly, and FP and FN denote the classification errors defined below:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$
$$\text{Recall} = \frac{TP}{TP + FN} \quad (2)$$
$$\text{Precision} = \frac{TP}{TP + FP} \quad (3)$$
In addition, the confusion matrix (error matrix) was computed for each classification model; it indicates classification performance in terms of FP and FN errors.
The experimental results show outstanding performance for some classifiers and moderate performance for others. As expected, the multi-classifier approach achieved better performance in both scenarios, with training sizes of 70% and 90%; however, with a 90% training size, some single classification models, namely NB, NN and LR, reached the same accuracy as the multi-classifier model. In this case, a single classifier is preferable, because multi-classification approaches incur higher computational cost and are more time-consuming than single classification approaches. Table 3 and Table 4 show the results of all experiments for the two scenarios: Table 3 for the first scenario, with a 70% training size, and Table 4 for the second, with a 90% training size. The computed performance indicators were accuracy, recall and precision. Based on these indicators, we grouped the classifiers into four categories: very good, good, moderate and poor.
Stack, NB, NN and LR performed very well, achieving accuracy rates above 97.5%. They were followed by the second group, RF and SVM, which achieved accuracies of 96.4% and 96.9%, respectively. The DT classifier came next with a moderate accuracy of 88.3%, whereas the KNN classifier lagged behind with a poor accuracy of 68.6%. Figure 2 shows the accuracy achieved by each classifier over the two training-test splits. The four classifiers in the first group achieved high accuracy rates, with the Stack classifier on top across both training-test scenarios.
To evaluate classification errors, the confusion matrix was computed for all classifiers over the two training sets. The confusion matrix shows the FN and FP rates: FN is the number of covert instances incorrectly classified as overt, whereas FP is the number of overt instances incorrectly classified as covert. Table 5 lists the FP and FN values for all classifiers over the two training sets. The Stack classifier performed best, causing the fewest FN errors, with 0.017% FN and 0.026% FN for the 90% and 70% training sets, respectively. NB, NN and LR also performed well, with error rates close to those of the Stack classifier. They were followed by SVM and RF, whereas DT and KNN caused a considerable number of errors.
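The confusion-matrix counts and the three performance indicators can be computed as in the sketch below, assuming the standard definitions of accuracy, recall and precision with covert traffic as the positive class (so an FN is a missed covert instance and an FP is an overt instance flagged as covert, as defined above). The function names and the ten-instance example are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="covert"):
    """Return (TP, TN, FP, FN) with `positive` (covert) as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # covert instance classified as overt
        elif pred == positive:
            fp += 1  # overt instance classified as covert
        else:
            tn += 1
    return tp, tn, fp, fn


def indicators(tp, tn, fp, fn):
    """Standard accuracy, recall and precision (0.0 when a denominator is 0)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


y_true = ["covert"] * 5 + ["overt"] * 5
y_pred = ["covert"] * 4 + ["overt"] + ["overt"] * 4 + ["covert"]
counts = confusion_counts(y_true, y_pred)
print(counts, indicators(*counts))
# (4, 4, 1, 1) {'accuracy': 0.8, 'recall': 0.8, 'precision': 0.8}
```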
The ROC curves of the experiments conducted in this work, for the eight classifiers over the two training-testing splits, are presented in Figure 3 and Figure 4. These curves depict the performance of the investigated classifiers graphically: Figure 3 shows the ROC curves of all classifiers when 70% of the dataset is used for training and the rest for testing, whereas Figure 4 shows the ROC curves for the same classifiers when 90% of the dataset is used for training and 10% for testing.
The ROC curves show the performance of the eight classifiers and support the aforementioned findings and results.
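For readers reproducing such figures, the ROC points and the area under the curve (AUC) can be derived from each classifier's covert-class scores as sketched below. This is a generic, tie-aware version of the standard construction, not the code used to produce Figures 3 and 4; it assumes labels 1 for covert and 0 for overt, with at least one instance of each class, and the example scores are hypothetical.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) from sweeping a threshold down the covert scores.

    labels: 1 = covert (positive), 0 = overt. Tied scores move together, so a
    tie yields one diagonal segment instead of an arbitrary staircase.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one covert and one overt instance")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.3, 0.1]  # hypothetical covert-class scores
print(auc(roc_points(scores, [1, 1, 0, 0])))  # 1.0 (perfect ranking)
print(auc(roc_points(scores, [1, 0, 1, 0])))  # 0.75
```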

VI. DISCUSSION AND RECOMMENDATIONS
Machine learning algorithms work only when there is some variation between normal and covert traffic; therefore, if an adversary succeeds in imitating normal traffic, an ML algorithm will either fail to detect the channel or achieve poor detection accuracy.
For an ML algorithm to remain effective when monitoring live network traffic, it needs to be retrained periodically; otherwise, its efficiency will gradually decrease. In other words, the rapid evolution of both covert and overt traffic requires classification models to be updated and retrained periodically to counter these attacks. However, periodic retraining adds overhead, which affects network performance and quality of service.
Ongoing competition between security professionals and attackers requires a self-trained approach that automatically updates itself.
For the datasets developed in most of the covert channel detection approaches presented, an important question arises: how do researchers make sure that the normal traffic on which they base their work is really overt? An undiscovered type of covert channel may already be present. Researchers should therefore validate their findings using traffic captured from different networks under different conditions to generate trusted normal traffic, and then construct their covert traffic on this basis. The generated covert traffic also requires validation.
Many researchers have generated their own network-covert channel datasets for experimentation purposes; however, they are not available for public use. Moreover, existing datasets suffer from many problems, such as outdated content and unevenness.
Having a separate solution for every type of covert channel is not practical, because it may cause additional overhead on network performance and capacity, thereby degrading quality of service (QoS). Most available detection methods focus on discovering specific types of covert channels and cannot be extended to cover more. Therefore, developing multi-detection approaches capable of detecting different types of covert channels is highly recommended. Such approaches require careful design to achieve high detection accuracy with minimal overhead, as multi-detection approaches are subject to additional overhead that may degrade network performance and therefore QoS; striking this balance is a challenge. In addition, publicly available, validated datasets describing a covert channel or a group of covert channels are lacking; such datasets would help researchers develop and evaluate their proposed solutions.
To increase knowledge and understanding of covert channel techniques, we encourage efforts similar to the work presented in [94], which introduced a network security laboratory on data analysis for detecting TCP/IP covert channels. That laboratory is intended for teaching; similar work covering other types of covert channels is highly recommended.

VII. CONCLUSION
This paper investigates the efficiency of machine learning approaches in discovering covert channel attacks. It provides a brief introduction to covert channel attacks and highlights the widespread use of covert channel techniques among new technologies such as the Internet of Things (IoT), the IPv6 protocol and VoLTE. This shows how these technologies are vulnerable to exploitation by covert channel attacks and how they provide a rich environment for establishing different covert channel attack techniques that pose many challenges. The main contribution of this review is an examination of the efficiency of machine learning techniques in countering covert channel attacks, with a deep focus on their pros and cons. In addition, it presents a comparative study of eight ML classification approaches and reports the experimental results in terms of their performance and detection accuracy.
The paper concludes that ML algorithms contribute significantly to detecting covert channel attacks and can effectively fulfil current real-world requirements in the security field; however, when an attacker imitates normal traffic, ML algorithms either fail to detect the covert channel or their detection accuracy decreases. In addition, ML algorithms have many vulnerabilities that allow attackers to commit sophisticated attacks. Therefore, assessing the vulnerabilities of ML approaches in the early phases of development is urgently needed.
It is difficult to build ML algorithms that work efficiently with multiple covert channels; even where this is achieved, such an approach will be computationally costly and will increase network overhead, thus diminishing quality of service (QoS).
In the long term, if an attacker successfully evades statistical analysis of a covert channel, the features extracted by the employed detection model will fail. The research field therefore remains wide open to further contributions.