Comprehensive Survey of Machine Learning Approaches in Cognitive Radio-Based Vehicular Ad Hoc Networks

Nowadays, machine learning (ML), which is one of the most rapidly growing technical tools, is extensively used to solve critical challenges in various domains. Vehicular ad hoc network (VANET) is expected to be the key role player in reducing road casualties and traffic congestion. To ensure this role, a gigantic amount of data should be exchanged. However, current allocated wireless access for VANET is inadequate to handle such massive data amounts. Therefore, VANET faces a spectrum scarcity issue. Cognitive radio (CR) is a promising solution to overcome such an issue. CR-based VANET or CR-VANET must achieve several performance enhancement measures, including ultra-reliable and low-latency communication. ML methods can be integrated with CR-VANET to make CR-VANET highly intelligent, achieve rapid adaptability to the dynamicity of the environment, and improve the quality of service in an energy-efficient manner. This paper presents an overview of ML, CR, VANET, and CR-VANET, including their architectures, functions, challenges, and open issues. The applications and roles of ML methods in CR-VANET scenarios are reviewed. Insights into the use of ML for autonomous or driver-less vehicles are also presented. Current advancements in the amalgamation of these prominent technologies and future research directions are discussed.


I. INTRODUCTION
Machine learning (ML) is an artificial intelligence (AI) technique used to teach a system about the unknown and make efficient and effective decisions. The use of ML in nearly all aspects, such as robotics, business, arts, automated systems, biotechnology, and intelligent automated transportation systems, has become popular due to the availability of low-cost and highly capable (i.e., high computational power and huge data storage) machines and the presence of massive amounts of data. ML provides smart and fast decision making for improving system performance, including reliability, energy efficiency, and quality of service (QoS) [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Yan Huo.
Traffic congestion and safety have become vexing and complex issues in many urban areas due to the rapid increase in population and the proliferation of vehicles. Approximately 1.25 million people die every year worldwide due to road accidents, which are the leading cause of death among people aged between 15 and 29 years [2]. Congestion causes expensive delays, stress, pollution, and wasted fuel. In the U.S., the congestion cost was $305 billion in 2017 [3]. A smart and efficient transportation system can provide smooth traffic flow, reduced road accidents, and a green environment, which in turn improves economic competitiveness. Vehicular ad hoc network (VANET) is designed to improve traffic safety and ameliorate traffic congestion for reducing the travel time of commuters, particularly during peak hours.
The exponential growth of wireless devices has led to the need for a vast spectrum to support high-volume data transmission. However, spectrum scarcity (inadequate allocation compared with the demand) has become a hindrance to the deployment, support, and scaling of next-generation applications for commuters, including the Internet of Things (IoT), smart cities, virtual reality, augmented reality, and high-definition 3D video streaming services. The two main factors that cause spectrum scarcity are as follows: (a) frequency bands are allocated to licensed users based on the traditional fixed spectrum assignment policy and (b) a huge volume of real-time data is generated and transferred over a wireless medium in a dynamic environment. The authors in [4] showed that most bands are still vacant and suitable for secondary usage.
Cognitive radio (CR), which was introduced by Mitola and Maguire in [5], is a key enabling technology for spectrum sharing that allows devices to sense and use underutilized licensed channels (e.g., TV bands) dynamically in an opportunistic manner and for spectrum mobility that allows users to vacate licensed channels re-occupied by licensed users [6]. CR can play a vital role in solving the spectrum scarcity issue of VANET. Hence, CR-based VANET or CR-VANET is a promising technology to tackle road safety, congestion, and infotainment issues, and it serves as a basic building block for next-generation transportation systems, especially autonomous-driving vehicles.
This study focuses on the applications of ML in CR-VANETs to ensure that decision making is fast, highly reliable, secure, and energy-efficient. ML helps CR-VANETs become increasingly intelligent to adapt to uncertain radio environments rapidly and efficiently and reduces complexity. This study reviews the recent advancements and future directions of ML used in CR-VANETs.

A. MOTIVATION: NEED FOR ML IN CR-VANET
Numerous new vehicles are expected to appear on roads in the coming years, and they would cause serious traffic congestion that can paralyze urban areas and adversely affect the economies of countries. Apart from contributing to economic losses, poor management of transportation systems can cause stress to people, reduce working efficiency, and increase the number of accidents and casualties. To solve these issues, the smart transportation system or VANET must be improved to obtain an automated smart traffic system that provides useful information on road and traffic conditions and automated driving vehicles [7].
For a successful implementation of VANET, a massive amount of live data must be exchanged. According to Intel, a phenomenon called ''flooding of data'' is expected to occur, whereby each smart autonomous vehicle (AV) would generate and consume approximately 4 terabytes of data on average per day of driving [8]. This amount is many times larger than the amount of data that an average person currently generates.
An actively operating vehicle can generate an amount of data that 3000 people currently generate on average. These data, which can be gathered by sensors, cameras, and crowdsourcing, include road and traffic conditions, personal data, and application data (e.g., marketing, societal, and entertainment data). Therefore, data are the next ''oil'' in the transportation system. However, the bandwidth required to accommodate such massive real-time data exchange is scarce, resulting in network congestion, especially in urban areas.
Based on the traditional fixed spectrum assignment policy, two types of bandwidth or frequencies, namely, licensed and unlicensed, are available. Unlicensed frequencies, such as the industrial, scientific, and medical (ISM) band, are free to use and thus prone to interference [9], which can degrade QoS. Moreover, existing allocated frequency bands are insufficient to handle large amounts of data. For example, IEEE 802.11p (or IEEE 1609), which is also known as the dedicated short-range communication (DSRC) standard, has reserved 75 MHz of bandwidth in the frequency range of 5.850-5.925 GHz for vehicular networks; however, this bandwidth is insufficient to accommodate massive amounts of data [10]. Meanwhile, licensed bandwidths or frequencies, such as TV or military radio bands, are not highly utilized [11], thereby leaving these bands idle and inactive. In other words, spectral efficiency is lacking, and the CR-based wireless communication system is the best solution in these situations. In the CR system, an unlicensed user (or secondary user, SU) identifies any vacant or unoccupied licensed frequency owned by licensed users (or primary users, PUs). Upon identifying a vacant frequency band, the SU is allowed to use it provided that it does not interfere with any PU. Thus, the SU must release the frequency band when the PU's activities reappear. The SU must also ensure that its transmission power does not interfere with the PU's activities in the neighborhood [12].
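The opportunistic access rule described above can be illustrated with a short sketch. It is only a toy model under stated assumptions: the function name `su_next_action`, the action labels, and the PU activity pattern are hypothetical and are not taken from any CR standard or from the surveyed works.

```python
def su_next_action(pu_active: bool, su_transmitting: bool) -> str:
    """Return the SU's next action on one licensed band (hypothetical rule)."""
    if pu_active:
        # The PU has priority: an SU that is transmitting must release the band.
        return "vacate" if su_transmitting else "stay_idle"
    # The band is vacant, so the SU may use it opportunistically.
    return "keep_transmitting" if su_transmitting else "start_transmitting"


# One band observed over five sensing periods (made-up PU activity pattern).
pu_activity = [True, False, False, True, False]
transmitting = False
trace = []
for pu_active in pu_activity:
    action = su_next_action(pu_active, transmitting)
    trace.append(action)
    transmitting = action in ("start_transmitting", "keep_transmitting")

print(trace)
```

The sketch deliberately ignores sensing errors and transmission power control, both of which a real SU would have to handle.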
High-speed mobility and a dynamic environment have brought about additional complexities and challenges to CR-VANET compared with other wireless networks such as WSN (wireless sensor network). ML methods can ease these complexities and provide tremendous improvements in terms of network performance enhancement (e.g., reduced delay, increased reliability, secure performance, and energy efficiency) to CR-VANET. Although the energy capacity of vehicles is generally sufficient, the cumulative energy requirement of vehicles can be very high; thus, energy efficiency must be achieved in consideration of the huge carbon emission that can pose a threat to the green environment [13]. Another important issue is to improve the QoS and quality of experience (QoE) of the network because the conventional spectrum sensing, transmission adaptation, and handover in the CR system (see Section III.A for additional description) increase the delay, overhead, and energy consumption [14]. ML is an excellent candidate to enhance the network performance of CR-VANET [15]. Security enhancement is one of the major issues in CR-VANET. Here, a vehicle can pretend to be a PU and propagate false information to obtain spectrum access selfishly. ML can be used to detect such actions and enhance security [16], [17]. ML also provides an optimum route to CR-VANET users to avoid traffic jams and road accidents. ML can also play a vital role in the
VOLUME 8, 2020
• Various types of ML techniques, including their overview, limitations, and applications, are presented.
• Various usages of ML in CR-VANET, including spectrum sensing, spectrum switching, routing, congestion control, and security enhancement, are surveyed in detail.
• Several technological advancements in the aspects of this integration are described.
• The applications of ML to reduce road accidents and traffic congestion are presented.
• The usage of ML and CR for autonomous or driverless vehicles is described.
• The open issues, challenges, and future research directions of ML in CR-VANETs are discussed.
The performance of CR processes depends on the quality of spectrum sensing (the process of finding vacant spectrum). Good spectrum sensing must be fast, highly accurate, robust to interference and noise, and low in complexity and energy consumption [19]. However, achieving such spectrum sensing faces many challenges, such as the speed and direction of vehicles, multipath fading, shadowing, and heterogeneous QoS requirements [19]. To enhance the performance of the CR process and to solve these issues, ML can be applied in CR-VANET [23], [24]. From this survey, readers can appreciate the necessity of CR in VANET and of ML in CR-VANET and gain insight into the applications of several ML techniques in CR-VANET.
Moreover, a few open issues and research directions are provided to help readers conduct further research in this field. As mentioned, this is the first survey of its kind; readers can learn about ML, CR, VANET, and their amalgamation, together with several challenges and issues, in a single article. They can recognize the spectrum scarcity issues for the practical implementation of autonomous vehicles and learn how ML can help solve several challenges associated with the implementation.

C. RELATED WORKS
The areas of ML, CR, VANET, and CR-VANET and their amalgamations are presented. Several surveys [25]-[50] of these techniques are available; however, they are either presented separately or with limited amalgamations (refer to Figure 1). To the best of our knowledge, no comprehensive survey that describes the integration of ML in CR-VANET has been conducted.
A comprehensive survey of ML was conducted in [25]. The applications of ML in various areas, such as traffic prediction, routing, and classification of different networks, were discussed. A survey on deep learning was presented in [26]. In [27], Gosavi discussed the basic concept of the applications of reinforcement learning, which is an ML technique. A comprehensive survey of ML techniques in CR was conducted in [28]. Various CR implementations with the use of AI were presented in [29]. Various applications of ML in CRNs were discussed in [30]. Comprehensive details regarding the usage of various AI techniques in CRNs were discussed in [22]. The usage of various ML methods in dynamic spectrum access (DSA) was elaborately described in [15]. The recent advancement and applications of ML in VANETs were discussed in [31]. Detailed discussions of various ML methods used in VANETs were presented in [21], and the applications of various AI techniques in VANETs were discussed in [32].
A brief survey of CR was performed in [33]. Here, the fundamental concepts of CR and its various steps, taxonomies, challenges, and issues were discussed. Comprehensive details on CR were provided in [34], [35]. In [36], a description of the CR cycle, which consists of four steps of CR processes, namely, spectrum sensing, analysis, reasoning, and adaptation (Section II.C describes these details), was provided. Various spectrum sensing techniques were surveyed in [37]-[40]. The details of spectrum mobility and its issues were discussed in [41]. A survey on spectrum management was conducted in [12].
A comprehensive survey of VANETs was performed in [42]. The security, trust, and privacy issues of VANETs were surveyed in [43]. A tutorial survey of VANETs was presented in [44], and various routing issues of VANETs were surveyed in [45]. The motivations of VANET toward a green environment can be found in [46]. Various approaches and challenges, along with the open issues of CR-VANETs, were described in [10]. Several taxonomies, recent advancements, and security and privacy issues were also discussed in [10]. Various aspects of CR-VANETs were surveyed in [7], [47]-[50].
The current survey provides a review of ML-based CR-VANETs, including architectures, applications, taxonomies, and various networking issues in spectrum sensing, management, handover, energy, and security to reduce road accidents and congestion. Current issues and research directions toward intelligent CR-VANETs are also outlined. Figure 1 shows a summary of related work on these technologies and the position of this paper.

D. ORGANIZATION OF THIS PAPER
The acronyms used in this paper and their full forms are listed in Table 2. The remainder of the paper is organized as follows. Section II provides a basic overview, applications, and limitations of various types of ML techniques with their taxonomies. It also elaborately describes VANET and its relevant issues and presents a detailed overview of CR and its taxonomies, types, and other important issues. Section III describes the issues and applications of various ML methods used in CR-VANET. Section IV outlines the current challenges and future research directions, and Section V concludes this work. Figure 2 shows a thematic view of the arrangement of this paper.

II. OVERVIEW OF ML, VANET, AND CR
This section provides a detailed overview of ML, VANET, and CR. The taxonomies and advantages of these technologies are also provided.

A. ML
ML, which is a member of the AI family, enables a system to learn and increase its knowledge and experience with minimal human involvement. Similar to a human, a machine or a system can make appropriate decisions based on learned knowledge, experiences, and data after appropriate learning by using ML. ML is applied in multidisciplinary sectors. A few of these applications are listed in Table 3. In 1950, Alan Turing's revolutionary Turing test [51] inspired the world's researchers to consider the ML process, although the term ''machine learning'' was first coined in 1959 by Arthur Samuel, who wrote the first computer learning program [52]. Extensive studies have been conducted since the late 1990s; currently, the world is witnessing significant developments in ML.
ML has three main categories of learning methods, namely, supervised, unsupervised, and reinforcement. Other ML methods, which are variations of the three major learning methods, include semi-supervised learning, deep learning, online learning, transfer learning, and case-based reasoning [21]. Figure 3 shows the taxonomy of various ML approaches.

1) SUPERVISED LEARNING
The most frequently used ML method is supervised learning, in which a machine learns from a training (or labeled) dataset that contains examples or observations tagged with the right answers. After training, a separate testing dataset is used to evaluate how well the machine predicts outcomes for data it has not seen.
The relation of the input and output in supervised learning can be written simply as
$$Y = f(x) \tag{1}$$
where x and Y denote an input and output variable, respectively. The algorithm trains the system to learn mapping function f appropriately; thus, for any new data x, the system can reliably predict the outcome or value of Y. For example, the basic linear regression (a type of supervised ML) can be written based on Eq. (1) as
$$\hat{y} = \sum_{i} w[i]\, x[i] + b \tag{2}$$
where w[i] and b are the parameters that would be developed by training, x[i] is the feature of the data, and $\hat{y}$ is the predicted value of data. The performance of learning depends on the size and quality of the training dataset. Supervised learning is of two types, namely, classification and regression. In classification, the system learns from the input consisting of training data, and by using this learning, it classifies new observations or simply categorizes the labeled data. These data may be bi-class (for example, whether a frequency band is vacant or occupied) or multi-class in nature. Classification is used in speech recognition, face detection, handwriting recognition, and other areas.
FIGURE 3. Taxonomy of machine learning (based on the discussions of [21], [105], [31]).
While classification algorithms are used in discrete space, regression algorithms are used in continuous space. A regression algorithm learns a mapping function f from input variables X to predict a continuous output variable Y. For example, linear regression, a type of regression algorithm, aims to fit the best line through the data points. Regression is used to forecast or predict weather and risk in finance and various aspects of economics, trend analysis, drug response modeling, and other areas.
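As a minimal sketch of fitting the best line through data points, the following example applies ordinary least squares to a single feature. The training values are made up for illustration, and `fit_line` is a hypothetical helper name; the closed-form expressions are the standard least-squares estimates for slope w and intercept b.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope w and intercept b for y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Labeled training data (assumed values that lie exactly on y = 2x + 1).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(train_x, train_y)
prediction = w * 10.0 + b  # predicted continuous output for a new input x = 10
print(w, b, prediction)
```

With noisy real data the fitted line would not pass through every point; the estimates then minimize the sum of squared prediction errors on the training set.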
Classification and regression have several renowned algorithms. Table 4 presents an overview of the algorithms and their applications and limitations.

2) UNSUPERVISED LEARNING
In supervised learning, a large amount of data is required to train the system. In practice, providing the training dataset is difficult. Unsupervised learning has emerged as a solution to this situation. The system learns from unlabeled data, which are uncategorized or unclassified in nature. The idea is to find similarities or differences in data and act based on those similarities or differences. Unlabeled data are sorted based on their similarities and differences. Hence, in general, unsupervised learning has a more complex job than supervised learning. Unsupervised learning has been applied in self-driving cars, spectrum sensing in a distributed CRN, chatbots, facial recognition, social network analyses, market or customer segmentation, and speech recognition.
Unsupervised learning can be further categorized into two types, namely, clustering and dimension reduction. The aim of clustering is to segregate similar samples into clusters. Data samples are grouped in a way that a group has similar samples that are dissimilar to other groups' samples. Clustering is used in customer segmentation, separation of books in libraries, classification of species, and grouping of similar objects in search engines. Meanwhile, the aim of dimension reduction is to reduce the number of dimensions in order to improve the system's performance and provide optimal solutions. In short, it reduces the number of random variables by finding a set of principal variables. Dimension reduction is used in data summarization and compression, customer segmentation, trend detection, and analyses of multimedia, biological, and social network data.
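The clustering idea can be sketched with a one-dimensional k-means procedure, which is one common clustering algorithm and is used here only as an example. The sample values (imagined as unlabeled energy readings), the initial centroids, and the function name `kmeans_1d` are assumptions made for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group unlabeled 1-D samples around the nearest centroid (k-means sketch)."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each sample joins the cluster of its nearest centroid.
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled samples (made-up values): two visibly separate groups.
samples = [0.9, 1.1, 1.0, 4.8, 5.2, 5.0]
centroids, clusters = kmeans_1d(samples, centroids=[0.0, 6.0])
print(centroids, clusters)
```

No labels are supplied at any point; the two groups emerge only from the similarity of the samples, which is the defining property of unsupervised learning.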
Clustering and dimension reduction employ several algorithms. An overview of these algorithms and their usages and limitations are provided in Table 5.

3) REINFORCEMENT LEARNING
In reinforcement learning (RL), agents (or decision-makers) select appropriate actions by using mathematical approaches and receive rewards in an unpredictable environment [104]. RL is neither supervised nor unsupervised learning [105]. The main aim of RL is to exploit and maximize the long-term rewards to be received in the future. It is an individual learning process that interacts with the random environment. It is also considered a trial-and-error learning process; thus, it does not require any environment model and dataset for training in many cases. It can learn from the current data and environment, so it is suitable for real-time applications. For these reasons, it has been widely used in CRNs. In RL, an agent or a learner (such as a CR-based vehicle) interacts with the radio environment (comprising everything outside the agent). As shown in Figure 4, at each time step t, the agent observes the state of its surrounding environment $s_t \in S$, where S is a set of possible states. On the basis of state $s_t$, the agent selects an action $a_t \in A$, where A is a set of actions. At the next time step t + 1, the environment transits to a new state $s_{t+1}$, and the agent achieves a reward $r_t$. The target is to obtain an optimal policy (agent behavior) $\pi: S \to A$ that can maximize the reward at state S [105].
RL is used in many scenarios, such as in teaching robots, various schemes of CR (e.g., spectrum sensing and security issues), self-driving cars, industrial automation, finance sector, content optimization, and various applications of VANETs.
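The agent-environment interaction can be sketched as a simple loop. Everything in this example is a toy assumption: the two channel states, the two actions, the reward values, and the random placeholder policy are hypothetical and only show how a state, an action, a reward, and a next state follow one another at each time step.

```python
import random

STATES = ["idle", "busy"]        # hypothetical set of states S
ACTIONS = ["transmit", "wait"]   # hypothetical set of actions A


def step(state, action, rng):
    """Toy environment: return (reward, next_state) for one time step."""
    if action == "transmit":
        # Transmitting on an idle channel is rewarded; on a busy one it is penalized.
        reward = 1.0 if state == "idle" else -1.0
    else:
        reward = 0.0
    next_state = rng.choice(STATES)  # PU activity modeled as purely random
    return reward, next_state


rng = random.Random(0)
state = "idle"
total_reward = 0.0
history = []
for t in range(5):
    action = rng.choice(ACTIONS)  # placeholder policy; a learner would improve it
    reward, next_state = step(state, action, rng)
    history.append((t, state, action, reward, next_state))
    total_reward += reward
    state = next_state

print(history, total_reward)
```

A learning algorithm would replace the random action choice with a policy that is updated from the observed rewards.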
In model-based RL, an agent acts in the Markov decision process (MDP) and models the environment (given the reward function and transition probabilities) by using some experience or supervised learning. The agent learns the model and the policy value π that can provide the maximum reward. It involves minimal interaction between the agent and the environment. It is capable of rapid convergence to the optimal solution, and the accuracy of the transition models has a significant impact on the learning process [106].
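When the transition probabilities and reward function are known, an optimal policy can be computed by planning over the model. The sketch below uses value iteration, a standard dynamic-programming method chosen here as an example; it is not claimed to be the method of [106]. The two-state MDP, its probabilities, and its rewards are invented for illustration.

```python
# Known model (assumed numbers): P[s][a] = list of (probability, next_state, reward).
P = {
    "idle": {
        "transmit": [(0.8, "idle", 1.0), (0.2, "busy", 1.0)],
        "wait": [(0.8, "idle", 0.0), (0.2, "busy", 0.0)],
    },
    "busy": {
        "transmit": [(0.5, "idle", -1.0), (0.5, "busy", -1.0)],
        "wait": [(0.5, "idle", 0.0), (0.5, "busy", 0.0)],
    },
}
GAMMA = 0.9  # discount factor


def q_value(V, s, a):
    """Expected return of taking action a in state s under the known model."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-9):
    """Repeat Bellman optimality backups until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


V = value_iteration()
policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
print(V, policy)
```

The quality of the resulting policy depends entirely on how accurate the assumed transition model is, which matches the sensitivity noted above.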
In model-free RL (e.g., Q-learning), an agent does not need to learn a model of the environment (or simply does not know the MDP) to find the optimum policy for reward maximization. It relies on direct evaluations [107]; thus, it does not require prior knowledge on transitions and can be easily implemented. However, it has a slow convergence to the optimal solution [106]. Various categories of RL algorithms are briefly described in Table 6.

a: Q-LEARNING
Q-learning, which is the most frequently used type of RL, is an online algorithm that enables an agent to learn in an interactive manner with its surrounding environment. The main aim of Q-learning is to exploit the long-term rewards to be received in the future. It does not require any environment model and dataset for training. For these reasons, it is the most suitable option for the dynamic CR-VANET scenario, especially for addressing spectrum sensing issues. In Q-learning, an agent or a learner (e.g., a CR-based vehicle) interacts with the radio environment (comprising everything outside the agent). On the basis of the reward table, the agent selects the next action (which may be beneficial or harmful), updates a new value called Q-value for state-action pairs $Q(s_t, a_t)$, and stores the Q-values in the Q-table. For example, in CR-VANET scenarios, an action might be choosing and accessing any frequency band, and the state might be the location and time of the vehicle. If the sensed frequency band has interference from PUs, then the agent receives a negative reward; otherwise, it receives a positive reward.
Specifically, after taking every action, the agent receives the reward and updates its Q-value based on Eq. (3):
$$Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a \in A} Q(s_{t+1}, a) \right] \tag{3}$$
where
α: The learning rate, which determines how much the new Q-value overrides the previous Q-value. The α value ranges from 0 to 1. A high value of α indicates high learning speed, which may lead to fast convergence, although stability can be affected and could thus cause convergence failure. A low value of α indicates smooth learning, but the convergence rate can be slow.
γ: The discount factor, which implies how much importance is provided to future rewards.
r: The reward received by the agent. It consists of a short-term reward called delayed reward and a future reward called discounted reward.
The two policies for taking action are exploitation and exploration. When the agent selects exploitation (i.e., uses existing knowledge to select the best possible action), it uses an optimal policy. When it selects exploration (i.e., learns more knowledge), it uses a random policy. The agent receives positive delayed rewards when it selects an appropriate action for a particular state. A positive value increases the respective Q-value and vice versa [6]. Therefore, Q-learning aims to obtain an optimal policy (or agent behavior) $\pi: S \to A$ that can maximize the reward at state S [105].
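The update that Eq. (3) refers to can be sketched in code, assuming the standard tabular Q-learning form in which the old Q-value is blended with the received reward plus the discounted best Q-value of the next state. The state names, channel labels, and numeric values below are hypothetical.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    Assumed standard form:
    Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_a' Q(s', a'))
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = (1 - alpha) * Q[(state, action)] + alpha * (
        reward + gamma * best_next
    )
    return Q[(state, action)]


ACTIONS = ["ch1", "ch2"]      # hypothetical actions: which channel to access
Q = defaultdict(float)        # Q-table; unseen state-action pairs start at 0
Q[("s1", "ch2")] = 2.0        # assumed prior knowledge about the next state

# The agent accessed ch1 in state s0, saw no PU interference (reward +1), and moved to s1.
new_q = q_update(Q, "s0", "ch1", reward=1.0, next_state="s1", actions=ACTIONS)
print(new_q)  # 0.5 * 0 + 0.5 * (1 + 0.9 * 2.0) = 1.4
```

Setting alpha closer to 1 would let the new estimate override the old Q-value almost entirely, whereas a small alpha would change it only slightly per step.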
The optimal Q-value for a particular state can be written as
$$V^{*}(s_t) = \max_{a \in A} Q(s_t, a) \tag{4}$$
Therefore, the optimal policy can be written as
$$\pi^{*}(s_t) = \arg\max_{a \in A} Q(s_t, a) \tag{5}$$
Evidently, the convergence rate depends on the quality of the Q-table and the values of α and γ. The more reward an agent accumulates, the better the Q-table becomes. Therefore, the convergence speed is increased. However, the issue is that the Q-learning algorithm learns completely by itself and does not receive any help from others. For improved performance and convergence, it must achieve a balanced tradeoff between exploration and exploitation. Increased exploration provides enhanced learning (i.e., sacrifices immediate rewards in the hope for more future rewards) but slow convergence. Meanwhile, increased exploitation provides faster convergence that may lead to reduced performance.
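One common way to balance exploration and exploitation is the epsilon-greedy rule, sketched below as an example; the source does not state which rule the surveyed works use. The Q-table entries, the channel labels, and the value of epsilon are assumptions for illustration.

```python
import random


def greedy_action(Q, state, actions):
    """Exploitation: pick the action with the largest stored Q-value."""
    return max(actions, key=lambda a: Q.get((state, a), 0.0))


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """With probability epsilon explore at random; otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return greedy_action(Q, state, actions)


# Hypothetical Q-table for one state with two candidate channels.
Q = {("s0", "ch1"): 0.2, ("s0", "ch2"): 0.7}
ACTIONS = ["ch1", "ch2"]

rng = random.Random(1)
picks = [epsilon_greedy(Q, "s0", ACTIONS, epsilon=0.1, rng=rng) for _ in range(1000)]
share_best = picks.count("ch2") / len(picks)
print(greedy_action(Q, "s0", ACTIONS), share_best)
```

A larger epsilon makes the agent try the currently inferior channel more often, which helps it discover changes in PU activity at the cost of immediate reward.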

4) OTHER ML TECHNIQUES
ML has other popular variations, such as transfer learning, online learning, case-based reasoning, semi-supervised learning, and deep learning. They can be incorporated into supervised learning, unsupervised learning, and reinforcement learning. For example, deep-reinforcement learning is an algorithm where the deep-learning concept is used in RL; similarly, deep neural network (DNN) is an advanced version of neural network incorporated with a deep learning approach [129]. A brief overview of these variant ML methods is presented in Table 7.

a: DEEP LEARNING
Deep learning is a member of the ML family. This learning method is based on learning data representations (rather than task-specific algorithms). In deep learning, learning can be performed using supervised, unsupervised, and/or RL. It is inspired by information processing in the human neuron system. It has significant advancements compared with other ML methods. Deep learning has been applied in various fields, such as computer vision, natural language processing, audio recognition, and various issues in CR-VANETs. A typical architecture of deep learning is shown in Figure 5. The illustration and discussions were adopted from [21].
Deep learning consists of multiple layers of nonlinear processing units, and they are connected in a cascaded form, as shown in Figure 5. Each layer is used for feature extraction and transformation, where input data are transformed into a near-abstract and composite representation. The leftmost part is the input layer in which every node denotes a dimension of the input raw data. The subsequent layers are called ''hidden layers'' (i to l). The rightmost part is the output layer (m). Each node performs a nonlinear transformation on the weighted-sum of a subset of nodes in its previous layer.
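The layer-by-layer computation can be sketched as a small forward pass in which each node applies a nonlinear function to a weighted sum of the previous layer's outputs. The network size, weights, biases, and input values are invented for illustration, and the sigmoid and ReLU functions are used as two common choices of nonlinearity.

```python
import math


def sigmoid(a):
    """Sigmoid nonlinearity: 1 / (1 + e^(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def relu(a):
    """ReLU nonlinearity: max(0, a)."""
    return max(0.0, a)


def layer(inputs, weights, biases, activation):
    """Each node applies the activation to its weighted sum of inputs plus a bias."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


x = [1.0, 2.0]  # input layer: one node per input dimension (made-up values)
hidden = layer(x, [[0.5, 0.25], [-1.0, 1.0]], [0.0, 0.5], relu)  # hidden layer
output = layer(hidden, [[1.0, 1.0]], [0.0], sigmoid)             # output layer
print(hidden, output)
```

Training would adjust the weights and biases from data; this sketch only shows how an input is transformed once the parameters are fixed.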
The nonlinear function can be either a sigmoid function $f_S(a) = 1/(1 + e^{-a})$ or a ReLU function $f_R(a) = \max(0, a)$. The input I and output z relation can be written as
With the advancement of technology, VANET can provide several enhanced vehicular experiences and applications, such as road safety, traffic and road conditions, and comfort and entertainment. VANET is a type of mobile ad hoc network (MANET) that consists of vehicles with high-speed mobility. VANET emerged from the motivation of the intelligent transportation system (ITS) and wireless access in vehicle environment (WAVE) [48].
In VANET, vehicles are equipped with sensors, a global positioning system (GPS), multimedia systems, wireless connectivity, and navigation systems. A vehicle can sense the surroundings, such as obstacles and objects ahead (e.g., front vehicles), by using sensors to avoid collisions for emergency stops or slowdown. It can use immediate information on road conditions, such as congestion or accidents that occur ahead, from the network infrastructure. Onboard wireless connectivity with the network can provide users with entertainment and other social applications on the road. In other words, VANET provides improved user experience and reduces road accidents and congestion.
VANET provides network connectivity among vehicles and pedestrians and to the network infrastructure. The communication in VANETs can be categorized into the following types [143], and they are illustrated in Figure 6.
a. Vehicle-to-vehicle (V2V) communication takes place during data exchange from one vehicle to another without using any infrastructure, and it is mainly used for collision control and congestion avoidance to enhance vehicular safety and data relay.
b. Vehicle-to-infrastructure (V2I) or infrastructure-to-vehicle (I2V) communication occurs during data exchange between vehicles (i.e., onboard unit or OBU) and infrastructures, such as BTS, routers, AP, and roadside unit (RSU). Specific traffic information, such as the location, identification, and speed restriction (e.g., driving speed is more than the speed limit) of vehicles, is exchanged. The communication between a vehicle and an RSU is referred to as vehicle-to-RSU or V2R communication.
c. Infrastructure-to-infrastructure (I2I) communication takes place during data exchange between network infrastructures, such as BTS and RSU, for real-time traffic updates and important information exchange.
d. Vehicle-to-person or vehicle-to-pedestrian (V2P) communication takes place during data exchange between vehicles and pedestrians to ensure their safety on roads.
The other types of communication schemes in the vehicular network include vehicle-to-barrier (V2B) and vehicle-to-cloud (V2C) communication. V2B is a type of wireless communication between vehicles and the roadside barriers in VANET. This type of communication is required to mitigate run-off-road crashes that account for more than 50% of roadside crash fatalities [144]. The functions and motivation behind V2B and related practical experiments were provided in [144]. In VANET, V2C is communication between RSU and the base station with the cloud for various purposes, such as data analysis, decision making, and traffic prediction [145].
VANET is an extremely large-scale wireless network that spans entire road systems. The network topology of VANET is extremely dynamic due to the high-speed mobility of vehicles. For instance, on a highway, in a rural area, and in a congested area, a vehicle moves at a speed of approximately 40 m/s, 25 m/s, and 15 m/s, respectively. The network is also very dynamic due to a diverse range of applications with various QoS requirements. Several applications require immediate data exchange (i.e., very low end-to-end delay and very high reliability), such as the exchange of road safety messages, whereas other applications require high throughput, such as the transmission of infotainment (e.g., video or audio streaming) messages [21].

2) WIRELESS ACCESS STANDARDS OF VANET
The two wireless access standards for VANETs are dedicated short-range communication (DSRC) and wireless access in vehicular environments (WAVE). DSRC is used for short-range communication, such as toll collection at toll plazas, and for V2V and V2R communication. Specifically, a 75 MHz bandwidth within the frequency range of 5.850-5.925 GHz is allocated for DSRC by the Federal Communications Commission (FCC). This bandwidth is divided into seven channels. The first three and the last three channels (these six channels are known as service channels or SCHs) are used for exchanging safety and non-safety messages, whereas the middle channel (known as the common control channel or CCC) is used only for high-priority safety messages [146]. The first 5 MHz is used as the guard band, and each channel is 10 MHz wide [147].
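The band plan described above can be checked with simple arithmetic. The following is a minimal sketch in Python; the channel indices 1 to 7 are illustrative labels assumed here and are not the channel numbers used by the standard.

```python
# Sketch of the DSRC band plan described in the text (all values in MHz).
BAND_START_MHZ = 5850      # lower edge of the 5.850-5.925 GHz allocation
BAND_END_MHZ = 5925        # upper edge of the allocation
GUARD_MHZ = 5              # guard band at the bottom of the allocation
CHANNEL_WIDTH_MHZ = 10
NUM_CHANNELS = 7


def dsrc_channel_plan():
    """Return a list of (index, low_edge, high_edge, role) tuples.

    The index (1..7) is a hypothetical label used only for illustration.
    """
    plan = []
    low = BAND_START_MHZ + GUARD_MHZ
    middle = NUM_CHANNELS // 2          # zero-based position of the middle channel
    for i in range(NUM_CHANNELS):
        role = "CCC" if i == middle else "SCH"
        plan.append((i + 1, low, low + CHANNEL_WIDTH_MHZ, role))
        low += CHANNEL_WIDTH_MHZ
    return plan


if __name__ == "__main__":
    for index, low_edge, high_edge, role in dsrc_channel_plan():
        print(f"channel {index}: {low_edge}-{high_edge} MHz ({role})")
```

The guard band plus seven 10 MHz channels fill the 75 MHz allocation exactly, with the middle channel acting as the CCC and the remaining six as SCHs.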
DSRC was specified in 2003, and it is based on the IEEE 802.11a standard for wireless local area networks (WLANs). DSRC generates a large overhead and high latency due to the high speed of vehicles and the dynamic changes in network topology. Thus, DSRC is unsuitable for high-speed VANET. To make it adaptable and acceptable, a modified version of DSRC, called WAVE, was introduced; it consists of two protocol suites, namely, IEEE 802.11p and IEEE 1609 [147].
Other access technologies for VANET are also found in the literature. Figure 8 provides some examples of those standards, and their descriptions are given in the following subsection.

3) OTHER ACCESS TECHNIQUES FOR VANET

a: WI-FI FOR VANET
Wi-Fi is one of the most widely used wireless technologies. It is popular due to its low cost, high data rate, and easy installation. DSRC and WAVE are specified based on Wi-Fi technology. However, plain Wi-Fi or WLAN standards, such as IEEE 802.11(a/ac/b/e/g/n), can also be used by VANET as the access technology for V2V and V2I communications [148]. These techniques are also used for vehicle tracking services. Wi-Fi operates in the 2.4 and 5 GHz frequency bands with data rates of 11 Mb/s, 54 Mb/s, and even 1 Gb/s (the data rates of IEEE 802.11b, IEEE 802.11a, and IEEE 802.11ac, respectively). However, as the number of vehicles increases, the number of required access points also increases, which raises deployment complexity and cost. Moreover, its short coverage range (around 100 meters), low support for user mobility, and slow handover make it very challenging for this standard to cope with the fast-fading conditions of high-speed mobility in the VANET environment [149].
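The effect of the short coverage range can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, a circular coverage area with a radius of 100 meters and a vehicle that drives straight through its center at constant speed; the speeds are the typical values quoted earlier for highway, rural, and congested areas.

```python
def max_dwell_time_s(coverage_radius_m, speed_m_per_s):
    """Upper bound on the time a vehicle spends inside one access point's coverage.

    Assumes a circular cell crossed along its diameter at constant speed,
    which is an idealization and not a measured value.
    """
    if speed_m_per_s <= 0:
        raise ValueError("speed must be positive")
    return 2.0 * coverage_radius_m / speed_m_per_s


if __name__ == "__main__":
    for label, speed in (("highway", 40.0), ("rural", 25.0), ("congested", 15.0)):
        print(f"{label}: {max_dwell_time_s(100.0, speed):.1f} s per access point")
```

Under these assumptions a highway vehicle stays within one access point for only a few seconds, so association and handover must complete within that window.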

b: VISIBLE LIGHT COMMUNICATION (VLC) FOR VANET
VLC (IEEE 802.15.7) is a promising technology that can help solve spectrum scarcity. It generally works in the infrared (IR), visible light, and ultraviolet (UV) bands, in a spectrum range of 430-790 THz [150]. It has several advantages compared with DSRC's RF technology; for example, it causes no adverse electromagnetic interference (EMI), has low latency, offers additional bandwidth complementary to RF, and is less susceptible to security breaches [151]. The applications of VLC in VANET can be found in several V2V communication types, such as lane change information, V2B, pre-crash sensing, and traffic signaling. It can be used as an alternative to DSRC when overcrowding occurs [152]. The main disadvantages of VLC in VANET are as follows: it requires line-of-sight (LOS) communication; it has a very short range; it suffers from shadowing, interference from direct sunlight, absorption, and scattering; and it depends on weather conditions [151].

c: LTE AND DEVICE-TO-DEVICE (D2D) COMMUNICATION FOR VANET
The 3rd Generation Partnership Project (3GPP) introduced LTE for V2I communication and D2D communication for V2V communication [153]. LTE is one of the potential wireless access technologies for VANET. It has a high data rate, low latency, large coverage area, high penetration rate, and high-speed terminal support [154]. It can provide the high bandwidth and QoS required for the infotainment of vehicles on the road. The main constraints of LTE for VANET are as follows: it requires infrastructure, which limits communication to V2I (V2V communication via LTE is unsuitable because of the high latency), and in a dense area it faces network capacity issues because vehicles and the traditional applications of LTE together create a heavy traffic load. Araniti et al. [154] provided a detailed discussion on the applicability of LTE in VANET. Ucar et al. [155] also described the potential of using LTE in VANET.
D2D works as an ad hoc approach. Its reuse gain, proximity gain, and hop gain can increase spectrum efficiency and reduce communication delay [153]. However, high-speed mobility and frequent topological changes make reliable V2V communication challenging [153]. Therefore, a single communication standard cannot meet the complex QoS requirements of VANET. For this reason, a mix of multiple communication standards, such as the combination of LTE and D2D, is necessary to improve communication efficiency [153].

d: MILLIMETER WAVE (mmWave) COMMUNICATIONS FOR VANET
The standards discussed above cannot provide high-speed network connectivity to vehicles due to the high-speed mobility of the vehicles. LTE or D2D provides no more than 100 Mbps, whereas DSRC provides 3-27 Mbps [154]. By contrast, mmWave can provide more than 1 Gbps for V2V communications [156]. The mmWave band ranges from 30 GHz to 300 GHz. Recently, mmWave-based Giga-V2V (GiV2V) has attracted the attention of researchers for VANET communication. They have found mmWave suitable for the rich data requirements of high-definition cameras, LiDAR sensors, and so on. However, this communication standard has a limited range, high penetration loss, and poor diffraction, and it is affected by nearby electric poles, cellular towers, and Wi-Fi/cellular hotspots [157].

e: 5G COMMUNICATIONS FOR VANET
Spectrum scarcity, poor scalability, and limited ability to provide the required QoS in a dense network are some of the main concerns of DSRC. As discussed earlier, LTE is one of the key technologies expected to be adopted in vehicular communication. Therefore, 5G is foreseen as the successor of LTE for vehicular communication [158]. 5G adds features to the network, such as Proximity Service (ProSe), Mobile Edge Computing (MEC), and network slicing. ProSe provides location information and communication trends, which enable low latency, enhanced resource utilization, and less congestion. In VANET, the latency for safety message communication should be at most 100 ms (and 1 ms for AV cases). The MEC feature of 5G can help achieve such low and ultra-low latencies. Network slicing is the management of the network through logical separation. VANET communication has two main types of applications: i) safety-related applications and ii) infotainment (non-safety) applications. For safety applications, low latency and high reliability are the two major QoS requirements, whereas for infotainment, high bandwidth is the QoS requirement. Through network slicing, a vehicular network can be divided into two logical networks, one for safety applications and another for infotainment applications, because they have different QoS requirements.
To promote the use of 5G in the vehicular environment, the 5G Automotive Association (5GAA) was established on 27 September 2016 [159]. The association introduced and developed Cellular-V2X (C-V2X, where V2X stands for vehicle-to-everything), in which a cellular network (such as 5G) is used for vehicular communications. It showed that 3GPP-based cellular technology provides better performance and more robust radio access than IEEE 802.11p [159]. The first specification of 5G (5G New Radio or NR) came as Release 15 of the 3GPP. The upcoming Release 16, which is scheduled to be published in June 2020, targets 5G V2X, which covers advanced use cases beyond LTE V2X [160]. Enhancements to ultra-reliable low-latency communications (URLLC) are also planned. Nevertheless, large-scale deployment of 5G C-V2X might take a few more years because the technology needs to mature further and requires a large investment [161].
Cellular networks (LTE/5G) have some advantages over DSRC, such as higher bandwidth, larger coverage area, and higher data rate. However, cellular networks have some drawbacks compared with the DSRC standard, as listed below [161], [162]:
• V2X communication in DSRC is peer-to-peer and does not need the intervention of a network operator, whereas communication in a cellular network does.
• In a cellular network, data are sent through the uplink and downlink channels to reach their destinations, whereas in DSRC, data can be sent directly to the destinations.
• DSRC can operate anywhere by sending messages directly over the air, whereas a cellular system needs network coverage.
• In terms of cost, DSRC is much cheaper than the cellular network.
• In a cellular network, V2X communication competes for bandwidth with other users (such as user equipment (UE) or mobile users), whereas the DSRC band is completely dedicated to vehicular communication.

f: BLUETOOTH FOR VANET
Bluetooth technology [164] is mainly used in intra-vehicle applications, such as infotainment, phone calls, and navigation services. Owing to the low cost, low power consumption, robustness, and low delay of this mature technology, it can also be used for V2V and V2I communication [165]. However, the low data rate and the short communication range are two major constraints of Bluetooth for deployment in the VANET environment.

g: SATELLITE COMMUNICATIONS FOR VANET
Another potential access technology for VANET is satellite radio. It has a very wide coverage area. In general, it can be used for broadcasting purposes and as a backup technique if the cellular network cannot cover the area. The S-band of satellite radio operates at 2.3 GHz, and the Ku band operates in the 12 GHz to 18 GHz range. Satellite augmentation can be used to improve the performance of the GPS system, or it can be used for V2V communication [166]. Other reported applications of satellite include sensor data exchange, control center communication, vehicle tracking, real-time communication, and safety-related information exchange [167]. SafeTRIP is one of the successful projects in which satellite radio was used for VANET communication [167]. Satellite radio has high bandwidth (90 MHz), wide coverage range, and high scalability, but it suffers from severe delay and requires a large antenna. These issues make it unsuitable for the VANET environment, especially for safety message exchange. It can be integrated with other techniques, such as LTE or 4G/5G. Table 8 compares the access techniques discussed above.
Nevertheless, more research needs to be carried out in these areas, especially on combining the standards into a single platform (so that several standards compensate for each other). CR is one of the major techniques to enable such integration. It is discussed in the next section.

C. CR
The concept of CR was introduced by J. Mitola in [5]. Later, S. Haykin extended the concept in [170] with the insight that CR serves as an intelligent wireless communication system. The first standard of CR in wireless communication is IEEE 802.22.
CR is an intelligent wireless communication system in which a transceiver can intelligently adapt to the surrounding radio environment. The limited spectrum resource is efficiently utilized in CR. The main concept of CR is to use the under-utilized frequency bands opportunistically by changing the transmission parameters based on what is learned from the surrounding environment. The learning or CR process includes obtaining information on communication parameters and detecting any unused spectrum by sensing the environment. Appropriate utilization of the spectrum is achieved by adaptive and dynamic reconfiguration of the transmission parameters, such as transmission power, SNR value, and modulation scheme [171].
CR is built on software-defined radio (SDR) technology. SDR is a communication system in which software is used instead of conventional hardware, such as mixers, filters, amplifiers, modulators/demodulators, and detectors [172].
An important concept of CRN is the spectrum hole. It is a band of frequencies allocated to PUs (i.e., users who are authorized and assigned to use certain licensed channels) that, at a certain time and in a specific location, is not being used by the PUs. In CRNs, the SUs (i.e., users who use unlicensed bands and temporarily unused or underutilized licensed bands) can utilize the spectrum hole. When any PU reclaims the spectrum hole, the SUs must release the respective frequency bands.
CR enables SUs to sense the spectrum holes or the unoccupied spectrum (or vacant spectrum), select the best available frequency band, coordinate with other users and the spectrum requirement, adjust to the current situation, and vacate from the frequency band when PUs reclaim it. Then, the SU must sense for other unoccupied licensed bands, and the process goes on. This CR cycle is illustrated in Figure 9.
Sensing the radio environment includes obtaining information, such as channel characteristics, available spectrum, power consumption, and local policies. Through spectrum sensing and analysis, the SU detects the spectrum holes (or white spaces) or the unoccupied portions of the licensed spectrum and utilizes these holes. Through sensing, the SU also gains knowledge of the interference level, by which the SU can ensure that interference does not harm the PUs when they start using the same spectrum holes. By proper spectrum management and handoff activity, the SU selects the best possible frequency bands and routes to achieve QoS requirements. When the PU uses its frequency bands again, the SUs must release the occupied licensed bands and identify other available bands. The last phenomenon is known as ''spectrum mobility.''

1) SPECTRUM SENSING
Spectrum sensing (SS) is the process of obtaining spectrum usage information in a specific time, frequency, and location by observing the surrounding radio environment. The main task of SS is channel selection and vacant primary spectrum identification. Spectrum sensing is performed via three primary approaches, namely, cooperative, non-cooperative, and interference-based detection [173]. They are described as follows.

a: COOPERATIVE SS
Cooperative SS (CSS), also known as primary receiver detection, is a type of SS in which SUs or CR users share their spectrum information with each other to obtain a combined decision, which is more accurate than an individual decision [174]. CSS can be classified into three categories, namely, centralized CSS, decentralized CSS, and relay-assisted CSS. CSS provides improved sensing performance but requires extra information exchange, which adds overhead, energy consumption, and sensing time. CSS information also becomes obsolete rapidly due to mobility and rapid changes in the environment.

b: NON-COOPERATIVE SS
Here, every SU individually performs SS and decides the presence or absence of the PUs' activities in a frequency band. It is also known as primary transmitter detection. Non-cooperative detection methods include energy detection, matched filter detection, cyclostationary feature detection, wavelet-based detection, and covariance matrix-based detection [173]. Non-cooperative detection methods incur a small overhead; however, they depend on the network infrastructure, which may not be available at all places and may be affected by noise, interference, and the problem of hidden PUs.
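Among the non-cooperative methods listed above, energy detection is the simplest to illustrate. The following minimal sketch assumes real-valued samples, a known noise power, and an illustrative threshold factor of 1.5; the simulated PU signal is a hypothetical sinusoid, and none of these values come from the surveyed works.

```python
import math
import random


def average_energy(samples):
    """Mean of squared sample magnitudes (the energy-detection test statistic)."""
    return sum(abs(x) ** 2 for x in samples) / len(samples)


def energy_detect(samples, noise_power, threshold_factor=1.5):
    """Declare the PU present when the measured energy exceeds a threshold.

    threshold_factor is an illustrative tuning knob, not a value from the text.
    """
    return average_energy(samples) > threshold_factor * noise_power


def simulate_samples(n, noise_power, signal_amplitude=0.0, seed=0):
    """Gaussian noise plus an optional sinusoidal PU signal (toy channel model)."""
    rng = random.Random(seed)
    sigma = math.sqrt(noise_power)
    return [
        signal_amplitude * math.cos(0.2 * math.pi * k) + rng.gauss(0.0, sigma)
        for k in range(n)
    ]


if __name__ == "__main__":
    idle = simulate_samples(2000, noise_power=1.0, signal_amplitude=0.0, seed=1)
    busy = simulate_samples(2000, noise_power=1.0, signal_amplitude=2.0, seed=1)
    print("idle band flagged as occupied:", energy_detect(idle, 1.0))
    print("busy band flagged as occupied:", energy_detect(busy, 1.0))
```

The detector needs no knowledge of the PU waveform, which is why it is attractive for a single SU, but its decision depends on how well the noise power is known.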

c: INTERFERENCE-BASED DETECTION
FCC imposes a threshold value of interference to PUs. CR users must limit their transmission power, along with estimated noise power, to conform to the threshold value of the interference temperature level [38]. In many cases, measuring the interference temperature and comparing it with others is practically infeasible.

2) SPECTRUM ANALYSIS AND DECISION
After sensing and learning about the vacant primary spectrum, the best frequency band is selected based on interference, path loss, wireless link error, and link-layer delay [33].

3) SPECTRUM SHARING
Spectrum sharing is the management of spectrum distribution among CR users by maintaining QoS. Spectrum sharing has several classifications. On the basis of spectrum utilization, it can be classified as unlicensed and licensed. All users have the same priority in unlicensed spectrum sharing, whereas PUs have higher priority than SUs in licensed spectrum sharing. SUs can access both types of spectrum sharing only when PU is absent. Spectrum sharing can also be classified as centralized and distributed. A central node controls spectrum allocation and access in centralized spectrum sharing, whereas every single node controls the same in distributed spectrum sharing. Cooperative and non-cooperative are other types of spectrum sharing used in CR.
Spectrum sharing based on access technology is of three types.

a: INTERWEAVE SPECTRUM SHARING (ALSO KNOWN AS OPPORTUNISTIC SPECTRUM ACCESS (OSA))
The SUs find spectrum holes that are not occupied by the PUs and then use the vacant frequency bands restrictively. Thus, the co-existence of PUs and SUs is not allowed here. The SUs must vacate the frequency bands as soon as the PU reappears [175].

b: UNDERLAY SPECTRUM SHARING
The SUs are allowed to use licensed frequency bands together with the PUs as long as the SUs' signal power remains below the predefined acceptable threshold value of interference temperature at the receivers of all PUs. In general, the SUs utilize spread spectrum techniques to keep their transmission power lower than the interference temperature threshold [175].
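The underlay constraint can be sketched as a simple link-budget check. The example below is a simplification under stated assumptions: the interference temperature threshold is treated as a received-power limit in dBm, only one SU transmits, and the path losses to the PU receivers are known; the function names and numeric values are hypothetical.

```python
def max_underlay_tx_power_dbm(interference_limit_dbm, path_losses_db):
    """Largest SU transmit power (dBm) that respects the limit at every PU receiver.

    interference_limit_dbm: tolerated SU power at a PU receiver (assumed value).
    path_losses_db: path loss from the SU to each PU receiver, in dB.
    """
    if not path_losses_db:
        raise ValueError("at least one PU receiver is required")
    # The PU receiver with the lowest path loss is the binding constraint.
    return interference_limit_dbm + min(path_losses_db)


def violates_limit(tx_power_dbm, interference_limit_dbm, path_losses_db):
    """True if the SU signal exceeds the limit at any PU receiver."""
    return any(tx_power_dbm - loss > interference_limit_dbm for loss in path_losses_db)


if __name__ == "__main__":
    losses = [90.0, 110.0, 95.0]
    print("power cap (dBm):", max_underlay_tx_power_dbm(-100.0, losses))
```

Because the nearest PU receiver dominates, a moving vehicle would have to recompute this cap continuously as its distance to the PU receivers changes.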

c: OVERLAY SPECTRUM SHARING
PUs and SUs can transmit over the same spectrum simultaneously on the condition that the SUs must help the PUs' transmission via cooperative communication, such as cooperative relaying or coding techniques [176].

4) SPECTRUM MOBILITY
To ensure seamless communication, the SU must switch from one frequency band to another vacant band. This spectrum switching is known as spectrum mobility. It is required when the PU reappears in a frequency band or the link becomes broken (e.g., a user moves out of the transmission range due to mobility). Spectrum handoff and connection management are the two main processes in spectrum mobility. Several handoff strategies, such as non-handoff, pure reactive handoff, pure proactive handoff, and hybrid handoff, are available. CR technology can adapt to the surrounding radio environment by adjusting the operating parameters, such as carrier frequency, transmission power, and modulation scheme [5].
VANET provides a wide range of applications, such as road safety, congestion control, self-driving, ubiquitous connectivity, and entertainment. CR is expected to become an integral part of VANET in the coming years for solving the spectral scarcity issue due to the rapidly increasing number of vehicles. Various challenges and obstacles must be addressed by this promising combined technology. ML is expected to be applied in this amalgamation to solve such challenges and issues.

III. APPLICATIONS OF ML IN CR-VANET
ML can be used to address several issues of CR-VANET, such as ensuring road safety and reducing congestion, improving security and privacy, and enhancing routing and infotainment. This section discusses such applications of ML in CR-VANET. Figure 10 shows the taxonomy of the applications of ML in CR-VANET. Figure 11 shows how ML can be used in a CR-VANET scenario. In addition to DSRC, which is allocated for VANET, other frequency bands (or channels) from TV, cellular, or Wi-Fi networks may be used opportunistically by vehicles. Every vehicle must obtain information on vacant channels. Suppose that n channels are available in TV networks, and m and p channels are available in Wi-Fi and cellular networks, respectively. A car must sense the spectrum to identify the best available vacant channel. For example, after sensing all the channels in the spectrum, a car finds that the kth channel is the best possible vacant channel. Thus, this channel is selected and used by the car for data exchange. Figure 11 also shows that different vehicles have different data requirements. Several vehicles need to exchange real-time data, such as data from GPS, radar, camera, LIDAR, and sonar, whereas other vehicles need to exchange entertainment data or safety messages. Different data types have different QoS requirements; for example, safety messages must be exchanged without unacceptable delay. In summary, multiple vehicles on the road have different types of QoS requirements and exchange different volumes of data.
Sensing all these channels every time is highly inefficient because it consumes much time and energy. ML can be applied here for fast and improved cognitive processes. By using ML, the car can learn about the vacant channel at a specific place and time and hence does not need to sense all the channels again when it passes the same area at the same time. This approach reduces network overhead, delay, and energy consumption. ML also helps adjust CR-VANET based on the heterogeneous QoS requirement, the data volume requirement, and the priority of various data types.
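The idea of remembering which channels were vacant at a given place and time can be sketched with a simple frequency table. The example below is only an illustration and not a method from the surveyed works: the class name, the road-cell and hour keys, and the channel labels are hypothetical, and plain counting with Laplace smoothing stands in for a trained ML model.

```python
from collections import defaultdict


class VacancyHistory:
    """Counts, per (road cell, hour), how often each channel was found vacant."""

    def __init__(self):
        # (cell, hour) -> channel -> [vacant_count, total_count]
        self._stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, cell, hour, channel, vacant):
        """Store the outcome of one sensing operation."""
        entry = self._stats[(cell, hour)][channel]
        if vacant:
            entry[0] += 1
        entry[1] += 1

    def ranked_channels(self, cell, hour):
        """Channels ordered by estimated vacancy probability, best first."""
        channels = self._stats.get((cell, hour), {})

        def score(item):
            vacant, total = item[1]
            return (vacant + 1) / (total + 2)   # Laplace-smoothed estimate

        return [ch for ch, _ in sorted(channels.items(), key=score, reverse=True)]


if __name__ == "__main__":
    history = VacancyHistory()
    for _ in range(8):
        history.record("cell-7", 9, "tv-3", True)
    for _ in range(2):
        history.record("cell-7", 9, "tv-3", False)
    history.record("cell-7", 9, "wifi-1", False)
    print(history.ranked_channels("cell-7", 9))
```

A vehicle passing the same cell at the same hour could sense the top-ranked channels first and fall back to full sensing only when they turn out to be occupied.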
A vehicle may provide wrong information to other vehicles or pretend to be PU. This security issue can be solved by using ML. In a minimally congested zone, a vehicle can obtain ubiquitous connectivity for infotainment. However, in a seriously congested zone, a vehicle may not obtain the required information due to bandwidth scarcity; consequently, road accidents could occur. For example, a vehicle that is in front must send an immediate safety message to the vehicle behind it. However, the bandwidth is scarce, and the message is not sent; as a result, accidents occur. ML can accelerate such action by vehicles. The movement patterns of pedestrians can also be learned to avoid accidents. ML is the best tool to learn such patterns. Moreover, a vehicle can take the best route from the source to the destination, thereby avoiding congested roads, by determining the pattern of traffic conditions. ML can be used here for fast and improved decision making. Further details on these issues will be discussed in the following subsections.
Several issues (shown below) should be addressed in CR-VANET.

A. ML IN SPECTRUM SENSING AND MOBILITY MANAGEMENT IN CR-VANET
This subsection discusses the usage of ML for spectrum sensing and spectrum mobility management in CR-VANET.

1) SPECTRUM SENSING
Only 75 MHz bandwidth is allocated for VANET. The same allocation applies to European countries. Japan has allocated 10 MHz bandwidth for ITS in the 700 MHz band along with 10 MHz for DSRC in the 5.8 GHz band. An advantage of CR-VANET over traditional CRNs is that it has a DSRC channel in which a stable CCC can be formed for sharing spectrum sensing information among vehicles [18]. Nevertheless, this bandwidth is insufficient to accommodate the huge demand of growing VANETs. Dynamic spectrum access (DSA) of CR is a promising solution to overcome spectrum scarcity. Several TV channels in the ultra-high frequency (UHF) band, which ranges between 300 MHz and 3 GHz, and the very high frequency (VHF) band, which ranges between 30 and 300 MHz, are currently underutilized due to the transition to digital TV. Therefore, FCC allows these TV white spaces (TVWS) to be used by SUs (i.e., vehicles) through CR (e.g., IEEE 802.22) when these channels are not in use. The 2.4 and 5.8 GHz bands of Wi-Fi are two other options for unlicensed users. However, their coverage range is small, and they are unsuitable for vehicles moving at more than 20 m/s [177]. FCC also proposed the Citizens Broadband Radio Service (CBRS) to overcome spectrum scarcity issues. CBRS is a three-tiered spectrum-sharing scheme for the 3550-3700 MHz band [178].
Several studies on CR-VANET have dealt with TVWS and Wi-Fi signals for DSA [179]. Figure 11 shows the typical CR-VANET scenario. Vehicles can communicate with each other or with RSU by using the dedicated DSRC link or TVWS, Wi-Fi, or even cellular network links with the help of CR. Figure 12 shows some of the challenges of SS in CR-VANET where ML can be applied. One of the challenges of CR-VANET is that channel availability depends on the presence of PUs and vehicle speed. The functions, such as spectrum sensing and spectrum switching, occur frequently in CR-VANET. Therefore, spectrum availability can change dynamically, and spectrum sensing must be performed continuously to detect spectrum holes. The network environment, which is characterized by wireless propagation channels, network topologies, and traffic dynamics, for a vehicle, can change rapidly due to high-speed mobility. ML can be used to learn about the network environment so that rapid changes would not cause any problem to vehicles. Several Bayesian models (e.g., hidden Markov models), RL, DNN, or artificial neural networks can be used for vehicles' adaptation to the dynamicity of the complex environment [180], [171]. A vehicle might need to exchange reliable safety-critical messages that are strictly delay-sensitive. Meanwhile, a vehicle might exchange entertainment data that are delay-tolerant. Here, priority is provided to the first vehicle. In summary, different vehicles have different QoS requirements. ML approaches can be applied to determine the requirements of services by vehicles and provide priority in spectrum sharing accordingly.
Another challenging issue in spectrum sensing is security threats or attacks. Various security attacks, such as jamming, SS data falsification, primary user emulation, or bias attacks, affect spectrum sensing in CR-VANET. Further details on these attacks and their mitigations are described in the following subsection III.B. ML methods are useful in detecting malicious attacks and contribute to mitigation [6]. Another challenge in SS is the dynamicity of PU activities that affect the performance of spectrum sensing [19]. The high-speed mobility of vehicles and PU spectrum occupancy activities exert considerable effects on the probability of detection. ML can be a powerful tool to model PU activity and solve these issues. The authors in [181] used deep learning to predict PU activities. This learning is utilized by the SUs for appropriate spectrum sensing, which reduces the false detection of PUs' presence. In CSS, several vehicles provide sensing information to the fusion center, to the RSU, or to other vehicles. Appropriate synchronization should exist between these sensing data. The three types of fusion techniques are as follows: (i) hard fusion (AND rule, OR rule, etc.), (ii) soft fusion (maximum ratio combining), and (iii) learning-based fusion. Learning-based fusion by using ML outperforms other fusion techniques due to the rapid adaptation to the environment and high predictive capability [182]. In the same study, the authors used K-means clustering, Gaussian process, SVM, and weighted KNN learning methods to implement CSS. They found that learning-based fusion is better than conventional CSS. Therefore, ML can be very useful in synchronizing various sensing results provided by several vehicles.
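The hard fusion rules mentioned above can be sketched in a few lines. In this illustrative example, each vehicle reports a binary local decision to the fusion center; the AND and OR rules are the ones named in the text, and the majority rule is added here as a common k-out-of-n variant. The function name is hypothetical.

```python
def fuse_hard(decisions, rule="OR"):
    """Combine binary local sensing decisions (True means PU detected).

    rule: "AND" (all vehicles must agree), "OR" (any single detection suffices),
    or "MAJORITY" (more than half of the vehicles report a detection).
    """
    decisions = list(decisions)
    if not decisions:
        raise ValueError("need at least one local decision")
    votes = sum(bool(d) for d in decisions)
    if rule == "AND":
        return votes == len(decisions)
    if rule == "OR":
        return votes >= 1
    if rule == "MAJORITY":
        return votes > len(decisions) / 2
    raise ValueError(f"unknown fusion rule: {rule}")


if __name__ == "__main__":
    reports = [True, False, True, False, False]
    for rule in ("AND", "OR", "MAJORITY"):
        print(rule, fuse_hard(reports, rule))
```

The OR rule protects the PU most strongly but raises false alarms, whereas the AND rule does the opposite; learning-based fusion replaces such fixed rules with a trained classifier.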
ML can be utilized to provide dynamic information exchange with minimum overhead and delay and for real-time resource allocation (RA) with low complexity. Instead of sensing the entire available channel, a portion of the channel can be sensed to find the spectrum hole, thereby providing energy efficiency to the network. By using ML, a vehicle can learn the portion to be sensed even without sensing the channel that can be accessed opportunistically [183]. ML reduces the spectrum sensing time and increases the probability of PU detection while reducing the probability of false alarm (inaccurately assuming the presence of PU). Through collaboration, vehicles can exchange spectrum availability information to improve their spectrum knowledge. ML can help proliferate this learning as well.
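The two performance measures used throughout this discussion, the probability of detection and the probability of false alarm, can be estimated from labeled sensing outcomes. The following minimal sketch assumes that the true PU state is known for each sensing event, for example in a simulation; the function name is hypothetical.

```python
def detection_metrics(truth, decisions):
    """Estimate (P_d, P_fa) from paired observations.

    truth[i]: True if the PU was actually present in event i.
    decisions[i]: True if the SU declared the PU present in event i.
    """
    if len(truth) != len(decisions):
        raise ValueError("truth and decisions must have the same length")
    present = sum(1 for t in truth if t)
    absent = len(truth) - present
    hits = sum(1 for t, d in zip(truth, decisions) if t and d)
    false_alarms = sum(1 for t, d in zip(truth, decisions) if not t and d)
    p_d = hits / present if present else float("nan")
    p_fa = false_alarms / absent if absent else float("nan")
    return p_d, p_fa


if __name__ == "__main__":
    truth = [True, True, True, True, False, False, False, False]
    decisions = [True, True, True, False, True, False, False, False]
    print(detection_metrics(truth, decisions))
```

A sensing scheme is better when it raises the first value while lowering the second, which is the sense in which the ML-based schemes above are reported to improve on conventional sensing.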
Several studies were conducted on spectrum sensing in CR-VANETs by using ML methods. The authors in [15] discussed the use of various ML approaches in spectrum sensing issues of CR-VANETs. They discussed RL, case-based reasoning, SVM, and ANN in terms of spectrum sensing along with several challenges and opportunities.
In [183], the authors showed how ML can be used to obtain energy-efficient spectrum sensing. As the number of sensed channels increases, more time and energy are required. In full sensing (without using ML), CR senses every target channel on a random basis. In restricted sensing, CR only senses the best available channels learned using RL, thereby providing enhanced bandwidth efficiency. Minimum sensing is a scheme in which sensing can be stopped if the available spectrum is fully partitioned by learning. Here, after 1900 events, the total energy consumption was only approximately 1.72% of that of the full sensing scheme, assuming that energy consumption increases with the number of channels sensed.
In [18], the authors used data mining with historical data and ML approaches of the Dirichlet process for spectrum sensing in CR-VANETs. They used AP at the start and end of the road to collect and update sensing data from vehicles for improved spectrum sensing. In [66], a Bayesian classifier was used for centralized spectrum sensing, and in [97] and [184], non-parametric Bayesian was used for efficient cooperative spectrum sensing. Game theory approaches were utilized in [185] and [186] for the channel selection issue. In [177], the author proposed an architecture by using RL and case-based reasoning for VANET to enable automatic learning of the radio environment by vehicles. By using several ML tools, the authors in these studies obtained very good performance in spectrum sensing as evidenced by a high probability of PU detection and low probability of false alarm.
In [187], the authors used deep Q-learning to design an optimal data transmission scheduling scheme in CR-VANET for minimizing transmission costs. They used cache memory for the decision. Their scheme's convergence took place after 13,000-20,000 iterations at 28 m/s vehicle speed. Morozs et al. [188] proposed a scheme that integrates distributed Q-learning and CBR to facilitate several learning processes running in parallel. The RL method was considered for the CR network with RF energy harvesting in [189]. The proposed scheme was used for optimum switching between the transmit mode, energy harvesting mode, and receiving mode of the SUs. (To know more about energy harvesting and related technologies, refer to [190]). In [191], the authors proposed a two-stage learning algorithm to reduce the channel sensing period. They used RL and the Bayesian method for learning. Their algorithm selected the best spectrum by using RL and multi-armed bandit and then identified the interval duration between two sensing operations by using the Bayesian method. They aimed to reduce the overall sensing time by determining how much time can be skipped without sensing the channel again for any PU presence. Table 9 summarizes these studies and other relevant research.
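The channel selection stage of such learning schemes can be illustrated with a basic multi-armed bandit. The sketch below uses an epsilon-greedy strategy, which is an assumption made here for simplicity and not necessarily the strategy used in [191]; the vacancy probabilities, class name, and parameters are hypothetical, and the Bayesian estimation of the sensing interval is not covered.

```python
import random


class EpsilonGreedyChannelSelector:
    """Treats each channel as a bandit arm; reward 1 means the channel was vacant."""

    def __init__(self, num_channels, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * num_channels
        self.values = [0.0] * num_channels   # running mean reward per channel
        self._rng = random.Random(seed)

    def select(self):
        """Explore a random channel with probability epsilon, else exploit."""
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, channel, reward):
        """Incrementally update the mean reward of the sensed channel."""
        self.counts[channel] += 1
        n = self.counts[channel]
        self.values[channel] += (reward - self.values[channel]) / n


def run_demo(vacancy_probs, steps=5000, seed=0):
    """Simulate sensing against channels with fixed (hypothetical) vacancy rates."""
    env = random.Random(seed + 1)
    agent = EpsilonGreedyChannelSelector(len(vacancy_probs), epsilon=0.1, seed=seed)
    for _ in range(steps):
        channel = agent.select()
        reward = 1.0 if env.random() < vacancy_probs[channel] else 0.0
        agent.update(channel, reward)
    return agent


if __name__ == "__main__":
    agent = run_demo([0.2, 0.9, 0.5])
    print("sensing counts per channel:", agent.counts)
```

After enough sensing events the agent concentrates on the channel that is vacant most often, so most sensing effort is spent where a spectrum hole is likely.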

2) SPECTRUM MOBILITY MANAGEMENT
Spectrum mobility management, which refers to the spectrum handoff or stay-and-wait phenomenon, is one of the major tasks in CR-VANET. Spectrum handoff means that the SU has to release the spectrum currently in use and switch to another vacant spectrum when any PU appears or reappears. Stay-and-wait refers to the situation in which the SU pauses its transmission for a moment until the condition improves again [192].
For a smoother transmission of safety and non-safety messages, appropriate spectrum mobility management is required. Such management should also account for its long-term impact on network performance. Several challenging issues (e.g., handoff in a dynamic radio environment or handoff in multiple radio access networks) can be tackled using ML. In [193], the authors focused on spectrum adaptation (adjustment of the SUs' transmission behavior, such as the packet transfer rate) and spectrum handoff. For the spectrum adaptation issue, they used raptor codes. For the handoff issue, they used transfer learning, in which a learned node teaches or transfers its knowledge to the learning node. This transfer learning reduces the learning time and increases the convergence rate. They used transfer actor-critic learning (TACT) for this issue. Here, a ''student'' or learning SU learns from the ''teacher'' or expert SU regarding the spectrum decision. In [192], the authors used TACT for spectrum mobility management. They obtained better results than the traditional Q-learning approach. Their primary goal was to design an intelligent spectrum handoff and stay-and-wait decision for rate-less multimedia transmissions in dynamic CRN. They calculated the channel utilization factor to gain knowledge on channel quality and used CDF-enhanced raptor codes to adapt to dynamic channel conditions.

B. ML IN THE SECURITY ISSUES OF CR-VANET
Security is one of the most serious concerns in CR-VANET. Wrong information provided by a malicious vehicle or compromised RSU to legitimate vehicles can cause severe damage. For example, wrong information prevents a vehicle from performing an appropriate projection of the vehicles ahead and might cause accidents. A vehicle can provide wrong information regarding the presence of a PU to other vehicles to gain exclusive use of the spectrum. A pedestrian might obtain inaccurate information from the vehicle, make an inappropriate decision, and eventually face an accident. Several types of attacks occur in CR-VANET scenarios. Similar to other networking systems, the security issues of CR-VANET can be classified into the following major areas [195], [196]:
1) Confidentiality: Communication should be secret, and only the sender and genuine receiver should understand the message. Third parties should not be able to intercept or understand the message.
2) Authentication: The identification of legitimate users is ensured.
3) Authorization or access control: It controls the rights, privileges, and access domain of users.

4) Non-repudiation: The sender cannot deny sending the message, and the receiver cannot deny receiving it.
5) Data integrity: The message sent by the sender should not be altered.
6) Network availability: The network and its services should always be available for users.
Alexandros et al. in [197] surveyed several security threats in CR and CRN and their mitigation techniques. The authors in [6] described numerous security threats encountered in the CR environment. They reviewed the usages of RL to solve these security issues. Layerwise taxonomies and descriptions of various security aspects of CRN were presented in [198]. Engoulou et al. provided state-of-the-art security issues and challenges in VANET in [196]. Various routing-related security issues were surveyed in [199]. Other survey studies based on the security issues of VANET are available, such as [200], [201], and [202].
A summary of several important security threats identified in the literature on the CR-VANET scenario is presented in Table 10. Several attacks are found only in CR, others are only in VANET (indicated in the table), and a few occur in both networks. Figure 13 shows a typical scenario of PUEA in the CR-VANET scenario. For example, the TV spectrum is used by a vehicle for DSA purposes if no DSRC is available. An attacker wants to use this vacant TV spectrum selfishly. For this reason, it pretends to be a PU and sends a similar signal of PU to the SUs. The vehicles or the SUs consider this the presence of a PU and then release the spectrum to avoid interference. After the legitimate SUs release the spectrum, the attacker grabs the chance to use this vacant TV spectrum selfishly. ML can play a crucial role in mitigating these security issues. The pattern of the attacks or the attackers can be detected using ML. In [16], the authors proposed a two-level authentication of PUs and used SVM to train the system to detect the PUEA attacker. Li and Peng [17] used an unsupervised ML approach to solve PUEA and SSDF attacks. They assigned an adaptive identity value for identifying each SU to overcome the identification error and increase reliability. A malicious traffic detection technique that uses an AI-based jamming detector was proposed in [204]. The authors used deep learning to ensure malicious-free cooperative awareness message (CAM) communication.
To mitigate jamming attacks, learning about the radio channel model and the methods used for jamming is required. Otherwise, it becomes a challenging and complex task. According to previous observations, a user can learn an optimal policy by using ML to address such a challenge. In [220], the authors proposed a 2D anti-jamming communication scheme for CRN. In their scheme, an SU exploits the spread spectrum and user mobility to address this attack. They used a deep Q-network (DQN) to enable an SU to learn the optimal frequency hopping policy and decide whether to leave the jamming area. In [221], Xiao et al. formulated the power interaction between two SUs and a jammer as an anti-jamming transmission game. Given that the learning process of Q-learning is slow, they used Q-learning together with the ''win or learn fast'' policy hill-climbing (WoLF-PHC) algorithm, which is a multi-agent extension of Q-learning, to mitigate the jamming problem with the help of the relaying concept. The authors in [222] discussed the exploitation of the MDP model and the Q-learning algorithm to solve jamming attacks. They also emphasized making the Q-learning algorithm converge rapidly. Another approach to accelerating the learning speed of Q-learning as a jamming mitigation technique in CRN was presented in [223]. The authors used QV learning (a value function-based Q-learning) and SARSA to replace minimax-Q learning (a variation of Q-learning). Q-learning was also used to mitigate jamming attacks in [224], [225].
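How an SU can learn a frequency hopping policy against a jammer can be sketched with tabular Q-learning. The example below is a simplified illustration under stated assumptions, not the DQN scheme of [220] or any other cited method: the jammer is assumed to sweep the channels cyclically, the state is the channel jammed in the previous slot, and `N_CHANNELS`, the reward values, and the learning parameters are hypothetical.

```python
import random

N_CHANNELS = 4  # assumed number of channels available to the SU

def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=1):
    """Tabular Q-learning against a hypothetical sweeping jammer."""
    rng = random.Random(seed)
    # q[state][action]: state = channel jammed in the previous slot,
    # action = channel on which the SU transmits in the current slot.
    q = [[0.0] * N_CHANNELS for _ in range(N_CHANNELS)]
    jammed = 0
    for _ in range(steps):
        state = jammed
        jammed = (jammed + 1) % N_CHANNELS  # jammer sweeps to the next channel
        if rng.random() < epsilon:
            action = rng.randrange(N_CHANNELS)  # explore
        else:
            action = max(range(N_CHANNELS), key=lambda a: q[state][a])  # exploit
        reward = 1.0 if action != jammed else -1.0  # -1 if the slot is jammed
        # The next state is the channel that was jammed in this slot.
        target = reward + gamma * max(q[jammed])
        q[state][action] += alpha * (target - q[state][action])
    return q

def greedy_policy(q):
    """Best channel to use for each previously jammed channel."""
    return [max(range(N_CHANNELS), key=lambda a: q[s][a]) for s in range(N_CHANNELS)]

q_table = train()
policy = greedy_policy(q_table)
print("learned hopping policy:", policy)
```

After training, the greedy policy avoids the channel that the sweeping jammer will hit next. A real jammer is less predictable, which is one reason why the cited works turn to DQN and multi-agent variants.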
RL can be used to solve security issues related to the cooperation in CRN, such as Byzantine attack, SSDF, and CCDA. The authors in [226] used RL to teach each CR user to autonomously decide with whom to cooperate by learning cooperator behavior. The RL algorithm defines the appropriateness of the available cooperators and selects the most suitable ones to cooperate with. In [227], a reputation scheme was proposed with the help of the RL algorithm and on-policy Monte Carlo method to avoid malicious users.
In VANET, an attacker can inject wrong information (it may be spectrum-related or routing-related). The attacker can also compromise the roadside sensors to inject faulty data into legitimate vehicles. As a result, vehicles might miscalculate the safe spacing among them, which eventually leads to accidents. For example, as illustrated in Figure 14, the attacker injects malicious code into the sensors to compromise them. As a result, these sensors and the attacker send the wrong message to the bus. The bus miscalculates the safety spacing and might cross the safety spacing with the ambulance, eventually colliding with it.
Several studies, such as [228], were performed to solve this issue. The authors focused on wrong message detection and proposed to impose a fine on misbehaving vehicles. However, in these algorithms, the attackers' actions are assumed to be stable, which is unsuitable for practical implementation. ML can be used by the vehicle to learn about the attackers' actions based on their time-varying observations [229]. The pattern of the attacks can be tracked by using ML. Therefore, if any similar situation occurs, the attack can be easily detected. Similarly, the attacker's history can be recorded; therefore, if the same vehicle sends any messages, detecting and discarding such messages would be easy. The authors in [210] proposed an ML approach to classify various types of misbehavior in VANET. They used concrete and behavioral features of each vehicle that transmits safety messages. Their designed framework was used to differentiate a malicious vehicle from a legitimate one. Their scheme can be adopted to solve several security issues, such as Sybil, position forging, and identity spoofing attacks. They analyzed the features of vehicles, such as their geographical position, accepted range with respect to RSU, speed deviation, received signal strength, packet drop and capture ratio, and error rate. Another misbehavior detection scheme using the ML method was introduced in [230]. The authors used the feed-forward back-propagation ANN classification method in their proposed scheme. Zhang and Zhu proposed a privacy-preserving ML-based collaborative intrusion detection system (PML-CIDS) architecture [231]. This approach enables vehicles to collaboratively exchange information and share knowledge to detect the misbehavior of malicious vehicles. To mitigate several malicious attacks in VANETs, SVM was used in [77]. The developed framework can determine the boundary between malicious and legitimate vehicles. The authors modeled contextual information, such as velocity, temperature, and altitude, as SVM's feature vector. The authors in [232] used KNN and SVM to detect and classify the misbehavior of malicious vehicles. Another misbehavior detection approach using SVM was proposed in [233]. An IDS was developed by using ANN and fuzzified data to detect black hole attacks [234]. The system, which can detect misuse and anomaly, utilizes features extracted from the trace file as auditable data.
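The feature-based misbehavior classification described above can be sketched with a small KNN classifier. This is an illustrative example only and does not reproduce any of the cited frameworks: the two features (speed deviation and packet drop ratio), the hand-made samples in `TRAIN`, and the function name `knn_predict` are hypothetical. In practice, the features would need to be normalized and drawn from real traces.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical samples: (speed deviation in m/s, packet drop ratio) -> label.
TRAIN = [
    ((0.5, 0.02), "legitimate"),
    ((1.0, 0.05), "legitimate"),
    ((0.8, 0.03), "legitimate"),
    ((6.0, 0.40), "malicious"),
    ((7.5, 0.55), "malicious"),
    ((5.5, 0.35), "malicious"),
]

print(knn_predict(TRAIN, (0.9, 0.04)))  # close to the legitimate samples
print(knn_predict(TRAIN, (6.5, 0.50)))  # close to the malicious samples
```

An SVM-based detector, as used in [77] and [233], would replace the nearest-neighbor vote with a learned decision boundary over the same kind of feature vector.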
In [235], the authors proposed a collaborative security attack detection mechanism by using multi-class SVM to detect various types of attacks dynamically. In their scheme, a group of vehicles analyzes the incoming flow and sends flow information to the controller, which trains the multi-class SVM in a centralized manner. Subsequently, the controller creates an SVM classifier and distributes it to all vehicles. As a result, the vehicles can classify the types of attacks from the new incoming flow. According to the authors, this approach can protect against a wide range of attacks. In [206], the authors used supervised learning to mitigate DoS or DDoS attacks. They employed two open-source network intrusion detection systems (NIDS), namely, Bro and Corsaro, and two supervised ML approaches, namely, classification and regression tree (CART) decision tree and naive Bayes classifier. Aneja et al. proposed a hybrid IDS using ANN as a classification engine and a genetic algorithm as an optimization engine for feature subset selection to mitigate flooding attacks [236]. Yang et al. proposed a Sybil detection scheme based on mobility similarities among vehicles by using three ML classification models, namely, naive Bayes classifier, SVM, and decision tree [237]. They extracted mobility features from vehicle trajectories and trained the classifiers to differentiate the attacker from an honest vehicle. To counter location forgery, they utilized base stations (BS) as the location certifiers.
In [238], the authors used unmanned aerial vehicles (UAVs) based on hotbooting (a type of transfer learning) policy hill climbing (PHC, a model-free RL technique for the mixed-strategy game) to relay messages of vehicles to mitigate jamming attacks in VANET. A location verification system using DNN was proposed in [239] to mitigate several routing falsification attacks, such as position forging attack, wormhole, and gray hole. The authors proposed their scheme based on time of arrival (ToA) measurements from several verifying BS in vehicular networks. Another study [240] used swarm algorithms of AI to detect several routing-related security threats. In [241], a physical layer rogue edge detection (RED) scheme was proposed by using a Q-learning-based authentication system to mitigate MITM attacks. The authors used ambient radio signals and the received signal strength indicator (RSSI) of packets, by which they modeled the RED process as a dynamic spoofing detection game. Here, Q-learning was used to enable a vehicle to achieve the optimal authentication policy.
Security is one of the most serious concerns and challenges in the CR-VANET scenario because spectrum sensing and data transmission attacks can occur simultaneously. CR-VANET is more vulnerable than CRN and VANET individually. Therefore, a combined mitigation policy should be considered. A very small error or minor mistake can lead to a massive accident. A vehicle might collide with another vehicle just because it received a wrong message or has been compromised. ML can be used with CR-VANET to alleviate several security threats. It can be utilized to identify malicious vehicles, the misbehavior of vehicles, and the pattern of attacks, and to distinguish original messages from fake ones. Table 11 summarizes several of the works mentioned above.

C. ML IN ROAD SAFETY
This subsection discusses various road safety aspects where ML can be applied to ameliorate the overall performance. Figure 15 shows various applications of ML to ensure road safety.
VANET is mainly applied to reduce road accidents and fatalities. Every year, approximately 1.25 million people die due to road accidents. A report showed that around 90% of these accidents are due to human errors (speeding, not detecting the risk, slow response of drivers, abrupt lane change, drowsiness, and so on). Therefore, up to 90% of road accidents could be avoided by using several intelligent vehicle assistant technologies [242].
A total of 60% of road accidents can be avoided if the driver receives the safety message at least 0.5 seconds before the accident [243]. ML has made significant contributions to advanced driver assistance systems (ADAS) for the AV system. Intelligent vehicles using ML can inform the driver or warn of severely dangerous situations prior to an accident. Safety messages might not be exchanged due to the shortage of DSRC. Therefore, dynamic spectrum access is required to allocate an emergency spectrum for safety message exchange. The integration of CR and VANET plays a crucial role in reducing road accidents. However, the CR process (sensing, selecting, adapting, and mobility) should be rapid to ensure that safety messages arrive on time. In this regard, ML is a strong candidate to ensure such QoS for safety messages. Falsified information, jamming, and other security issues also lead to road accidents. ML is well suited to tackling these security threats (discussed in the last subsection) and eventually contributes to reducing road casualties.

1) BARRIER DETECTION, CRUISE, AND LONGITUDINAL CONTROL
Obstacle detection is one of the essential elements to reduce road accidents. The position of obstacles or any other front or back vehicles can be measured by using various sensors (RADAR, LIDAR, and camera) embedded into cars, roadside sensors, and GPS. A collision can be avoided if a safe space exists between vehicles. By gathering data from these sources and by using ML, a warning message can be sent to the driver or vehicle (for AV) for emergency braking or slowing down (or steering to the left or right) if any barrier or obstacle is encountered. Figure 16(a) shows that vehicle A senses its surroundings for obstacle detection. If A speeds up, then it would collide with C; if it suddenly brakes, then it would collide with B. Besides, it cannot go left or right due to other obstacles. In this scenario, the speed of A should be balanced to provide a warning message to B and keep a safe distance from it.
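The notion of a safe space can be made concrete with a basic kinematic check. The sketch below is a rule-based illustration, not an ML method from the cited works: it uses the standard stopping distance (reaction distance plus braking distance), and the reaction time, deceleration, and function names are assumed values chosen for illustration.

```python
def stopping_distance(speed_mps, reaction_time_s=1.0, decel_mps2=6.0):
    """Distance travelled while reacting plus distance needed to brake to a stop."""
    reaction_distance = speed_mps * reaction_time_s
    braking_distance = speed_mps ** 2 / (2.0 * decel_mps2)
    return reaction_distance + braking_distance

def needs_warning(gap_m, speed_mps, **kwargs):
    """True if the measured gap to the obstacle is shorter than the stopping distance."""
    return gap_m < stopping_distance(speed_mps, **kwargs)

# At 20 m/s (72 km/h) with the assumed parameters, about 53 m are needed to stop.
print(round(stopping_distance(20.0), 1))
print(needs_warning(40.0, 20.0))  # gap too short: warn or brake
print(needs_warning(60.0, 20.0))  # gap sufficient
```

A learned model would replace the fixed reaction time and deceleration with estimates that depend on the driver, the vehicle, and the road conditions.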
ML is useful in training vehicles in these scenarios. In [244], the authors proposed a general framework for robust on-road pedestrian and vehicle detection, recognition, and tracking based on a deep learning approach. For their framework, they initially produced a robust disparity map under various driving conditions by using the adaptive U-V disparity algorithm (refer to [245] for details on U-V disparity). After detection, they classified obstacles into vehicles, pedestrians, and unknown objects by using the tiled convolutional neural network. Then, another deep learning algorithm was developed to track detected obstacles in the subsequent frames. Dairi et al. in [246] proposed a detection scheme by using a stereovision-based method for an urban vehicular network. They used a deep-stacked auto-encoder (DSA) model with the KNN classifier to accurately and reliably detect the presence of obstacles. They utilized three real-life datasets, namely, the Malaga stereovision urban dataset (MSVUD), the Daimler urban segmentation dataset (DUSD), and the Bahnhof dataset. Another comprehensive work is [247]. Here, the authors proposed a learning-based driving event classification method by using decision trees and linear logistic regression to detect obstacles. Figure 16(b) shows that vehicle C detects a pedestrian in front; thus, it needs to perform emergency braking. However, if C presses the brake, it would collide with A and A with B. Therefore, C should send a warning message to A to slow down or to brake, and similarly, A should send the message to B. This emergency braking and warning message transfer should be performed first. Vehicles can be trained to handle this situation by using ML. As a result, they receive an automatic warning message to brake or to slow down. In [248], Chae et al. used DQN to design a system for autonomous braking. Their system automatically decides whether to apply the brake when facing the risk of an accident by using obstacle information obtained by sensors. In their proposed system, the reward is achieved when the vehicle eliminates the danger as early as possible.
Adaptive cruise control (ACC) is a system where the vehicle's speed and acceleration are maintained automatically. This is done based on the obstacle ahead or to keep a safe distance from the front vehicle. Initially, the system sends a warning message to the driver (for a human-driven vehicle), and if the driver takes no action, then it automatically adjusts the speed. For a driverless vehicle, the entire process is executed automatically. In [249], a cooperative ACC (CACC) system with the help of RL was proposed. The authors emphasized V2V communication to exchange the safety message for ACC. They used RL to design a controller for the safe longitudinal following of a front vehicle. Another CACC work is found in [250]. Here, the authors used supervised learning (trained with real driving data) and actor-critic RL to obtain an adaptive system. In actor-critic RL, which is also known as neural dynamic programming, both the value function and the action policy are approximated; it is suitable for problems where the model information is minimal. Zhu et al. proposed an adaptive longitudinal control method by using actor-critic RL in [251].
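The quantities that an ACC controller works with can be illustrated by a classical constant time-gap rule. This is a hand-tuned sketch given for orientation only; it is not the RL controller of [249], [250], or [251], and the gains, gap parameters, and acceleration limits below are assumptions.

```python
def acc_acceleration(gap_m, ego_speed, lead_speed,
                     standstill_gap=5.0, time_gap=1.5,
                     k_gap=0.2, k_speed=0.5,
                     a_min=-3.0, a_max=2.0):
    """Commanded acceleration (m/s^2) from a constant time-gap spacing policy."""
    # The desired gap grows with the ego speed (standstill gap plus time gap).
    desired_gap = standstill_gap + time_gap * ego_speed
    # React to the spacing error and to the relative speed of the front vehicle.
    accel = k_gap * (gap_m - desired_gap) + k_speed * (lead_speed - ego_speed)
    # Clamp to assumed comfort and braking limits.
    return max(a_min, min(a_max, accel))

print(acc_acceleration(20.0, 25.0, 20.0))   # too close and closing in: brake
print(acc_acceleration(100.0, 20.0, 25.0))  # large gap, faster leader: accelerate
print(acc_acceleration(35.0, 20.0, 20.0))   # at the desired gap, same speed
```

An RL-based controller learns a mapping from the same inputs (gap, ego speed, and front vehicle speed, possibly received via V2V) to an acceleration command instead of relying on fixed gains.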

2) LANE CHANGING ASSESSMENT
Another reason for road accidents is uncontrolled lane changing and keeping. This lateral control is highly required in road safety. Figure 16(c) shows that car A should change lane from the current lane to the left to maintain its speed and avoid collision with C. In the autonomous car system, this is performed by assessing the obstacles surrounding the car and other vehicles' position, speed, acceleration rate, and steering torque. A vehicle might have to change its lane to free that lane for an ambulance or emergency vehicle. Further details on this issue can be found in [252].
Several studies have found that various ML approaches can be used to train vehicles in terms of lane changing. In [253], the authors used DQN to train a vehicle to handle speed, overtaking, and lane change decisions. They compared their work with a reference model consisting of the IDM (used for modeling the longitudinal dynamics of a vehicle) and MOBIL (used for lane changing decisions) models and found that their approach performs better. Kim et al. used ANN and SVM to design an algorithm that improves the accuracy of classifying a driver's lane change intention [254]. They used various onboard sensors for the basic measurements. In [255], SVM was used to classify the driver's intention of lane changing. The authors in [256] compared the accuracy of various supervised learning approaches, such as SVM, naive Bayes, logistic regression, nearest neighbors, decision trees, extra trees, and random forest classifiers, in lane changing modeling.

3) MITIGATING SECURITY ISSUES
A security vulnerability is one of the major reasons for road accidents, especially in autonomous smart vehicles. A vehicle might be affected by wrong information due to various security attacks. As a result, it might take inappropriate decisions (inability to detect obstacles, a miscalculation in longitudinal and lateral control, and dangerous lane changing) that lead to severe road casualties. ML can play a significant role in mitigating the security threats that affect road safety. Section III.B has discussed these security threats and the usages of ML for their mitigation.

4) DRIVER VIGILANCE MONITORING
One of the main causes of road accidents is the distraction of drivers. Appropriate intelligent driver vigilance monitoring is mandatory to secure roads. In this monitoring system, cameras and embedded sensors are used to monitor real-time facial expressions of drivers. The data are processed to assess drivers' emotions (stress, anger, etc.) or whether or not they are sleepy. On the basis of this assessment, an intelligent vehicle takes appropriate actions (it may send warning messages, slow down the car, or slowly park on the roadside safely) [257]. ML is a tool to train vehicles regarding drivers' vigilance monitoring. In [258], the authors used SVM with Hu invariant moments to design a real-time eye detection method. This method can assess whether or not the driver focuses on the road (by judging eye movements and openness). If the driver is not focused, then the vehicle would send an alarm to the driver. Ding et al. proposed a method to detect drivers' postures by using pressure sensor data and the SVM classifier [259]. They placed pressure sensors between the driver and the driver's seat to collect data. The method can assess a driver's movement and activities (whether or not the driver is drowsy or inattentive to driving) by classification. In [260], an SVM-based drowsiness prediction method was proposed. The authors used eyelid-related parameters to design their prediction models.

5) ROAD SIGN AND TRAFFIC SIGNAL IDENTIFICATION
Appropriate identification of road signs and traffic signals is a key issue in road safety. If the detection is inaccurate, then the vehicle will take inappropriate actions that lead to accidents. Therefore, accurate identification and exact action based on road signs or signals are crucial to ensure road safety. For example, as shown in Figure 16(d), if car A cannot detect the red signal and does not stop, then it would collide with B. Then, car C has to detect the speed breaker in front of it to avoid any casualty. For AVs, road signs and traffic signals must be appropriately learned; otherwise, severe accidents would occur.
In [261], the authors used ANN for real-time traffic sign classification and identification. They classified signs into different shapes (triangle, square, etc.) and colors. Then, based on the shape and color combination, they classified signs into different classes, such as danger, information, obligation, or prohibition. After appropriate sign detection, the system sends an alert to the driver or vehicle to take the appropriate action. A convolutional neural network (CNN, a deep learning method) was used to recognize traffic signs in [262]. In [263], a traffic light and sign detection mechanism was designed. The authors used a modified CNN in their real-time experiments and a mini-batch selection mechanism to train vehicles on traffic light and sign datasets simultaneously.

6) SAFETY MESSAGES AND QoS
These accident reduction schemes are dependent on safety message exchange. If a message does not reach the driver or the system, then the vehicle cannot respond appropriately. For example, if the vehicle does not receive any safety message from the front vehicle to slow down or to stop, then it will collide with it and result in casualties. Safety messages are of two types, namely, alarm and beacon; such messages must be reliable and have very low latency. For safety message exchange, the latency must be less than 100 ms [264]. However, in high-traffic situations or serious traffic congestion in urban areas, the allocated DSRC might be exhausted. Therefore, the CR concept was introduced. Under this condition, a vehicle searches for a vacant spectrum and accesses this spectrum opportunistically. Therefore, CR with VANET plays a very significant role in accident reduction. A vehicle must execute the CR process quickly because the tolerable message latency is very low. The timely delivery of safety messages is a challenging task in VANETs due to vehicles' high-speed mobility and random traffic environments. ML is an effective tool in this regard. We have discussed the roles of ML in spectrum sensing issues in Section III.A. We have observed that several ML methods are used for fast spectrum sensing so that a vehicle can obtain the spectrum rapidly and communicate without any delay. We now focus on other issues related to safety messages and the roles of ML in addressing them.
In [13], the authors addressed safety and QoS concerns in a V2I scenario. They used DQN that learns an energy-efficient scheduling policy from inputs corresponding to the characteristics and requirements of vehicles located within the range of an RSU. Aside from ensuring road safety and acceptable QoS, their policy is expected to prolong the lifetime of battery-powered RSUs. In [265], the authors proposed a data collection protocol by using the distributed Q-learning algorithm. They used the relaying technique in their proposed scheme. Unnecessary network overhead can cause congestion in the radio network of VANET. Therefore, message exchange methods should keep overhead and communication costs low. A clustering-based learning algorithm was proposed to ensure such a low communication cost and network overhead in [266].

7) DATA CONGESTION
On roads, especially in urban areas, the presence of many vehicles creates data congestion. Data congestion occurs, particularly at the intersection points of roads. Network congestion occurs when all channels are occupied in a highly dense network; as a result, packets are lost and face delay, which eventually degrades the network performance. An appropriate congestion control mechanism is required to overcome this problem. In [90], the authors proposed an ML-based congestion control mechanism. They used RSU to control congestion with the proposed hybrid centralized and localized strategy by using a k-means algorithm. Their mechanism was used to cluster the messages used in VANETs. The parameters included the size, duration, type, and directions of the messages and the distance between the vehicles and RSU. The authors in [187] used deep Q-learning to propose a data transmission scheduling strategy for minimizing transmission costs and delays. They considered the CR spectrum, vehicular caching, the link between various transmission modes, the vehicle's mobility, and the QoS requirement.
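The message clustering step of such a congestion control mechanism can be sketched with a plain k-means routine. The example is a minimal illustration and not the implementation of [90]: only two of the listed parameters are used (message size and distance to the RSU), and the sample values in `MESSAGES`, the initial centroids, and the function name `kmeans` are hypothetical. Real features would also need scaling.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Basic k-means: assign points to the nearest centroid, then update centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                # The mean of each coordinate becomes the new centroid.
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical messages: (size in bytes, distance to the RSU in metres).
MESSAGES = [(100, 50), (120, 60), (110, 40), (800, 300), (820, 310), (790, 280)]
centroids, clusters = kmeans(MESSAGES, [MESSAGES[0], MESSAGES[3]])
print(centroids)
print(clusters)
```

Once messages are grouped, an RSU could assign different transmission settings to each cluster, which is the general idea behind clustering-based congestion control.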

8) VEHICLE'S HEALTH MONITORING
Accidents sometimes occur due to the system failure of vehicles. Several subsystems of a vehicle can fail at any time and can lead to accidents. These subsystems include fuel, ignition, exhaust, braking, and cooling systems [269].
For example, if the braking system suddenly fails while a vehicle is on the move, then a fatality might take place. If the driver can monitor the braking system early or is notified of the fault of the system early, then he could avoid the accident. Therefore, an appropriate vehicle health monitoring system is required. The system must have the ability to detect, correct, and predict failure and provide an appropriate messaging system.
A fault detection, prevention, and correction mechanism can be designed using several sensors and ML. In [269], the authors presented a vehicle monitoring and fault predicting system. For fault detection, they used four classifiers, namely, decision tree, SVM, random forest, and KNN. To collect the data, they utilized various sensors in a Toyota Corolla car. A driver can know about the internal conditions of systems and becomes aware of any future failure by using the system. An engine fault detection mechanism was proposed in [270] by using the Hilbert-Huang transform (HHT) and the SVM ML approach. Engine faults can be detected by analyzing the current performance, lubricating oil, vibration, and noise. Table 12 summarizes the papers mentioned in this subsection.

D. ML TO REDUCE TRAFFIC CONGESTION
We have discussed the impacts of traffic congestion, which affects not only the economy but also our daily social lives. It cost approximately $305 billion in 2017 in the U.S. alone [3]. Approximately 4.8 billion hours are wasted cumulatively, and 1.9 billion gallons of fuel are wasted globally [271]. Congestion also increases the stress level of drivers, thereby leading to road accidents. VANET has emerged as a solution to reduce the level of congestion, and CR is an integral part of it. Therefore, CR-VANET can contribute substantially to traffic jam reduction. ML is a potential candidate to enhance the performance of all aspects of CR-VANET. Figure 17 shows the areas of CR-VANET where ML can be applied to reduce traffic congestion.

1) TRAFFIC FLOW PREDICTION
Retrieving live traffic information has become easy with the help of ITS and the advancements of the Internet of Things (IoT). Live and stored historical data can help predict traffic flow, which is important for congestion reduction. They also help reduce fuel consumption and carbon emission. ML is a proven tool to achieve high prediction accuracy in real-time environments. In [272], the authors used big data and the deep learning (deep-layered hierarchical NN) approach for traffic flow prediction. They utilized stacked autoencoders to determine generic traffic flow features and the greedy layerwise algorithm for training purposes. They compared the results with those of other approaches, such as SVM, backpropagation NN, radial basis function NN, and the random walk forecasting model. They claimed that their proposed method provides better results than the others, with over 90% forecasting accuracy.
In [273], Ide et al. designed a model for traffic flow prediction. They correlated LTE data traffic with vehicular traffic to design their model. After the analysis, they identified Poisson regression trees as the best candidate for traffic flow prediction. The online learning weighted SVR approach was proposed for short-term freeway traffic flow prediction in [274]. In [275], the authors proposed a deep learning method for traffic speed prediction. They used CNN in their proposed scheme. In [276], a Bayesian model was adopted for traffic flow prediction and used for experimentation in the urban area of Beijing.

2) ROUTING AND LOAD BALANCING
Routing and load balancing of traffic is another solution to reduce traffic congestion. The VANET environment, despite its challenges, has advantages, such as a depiction of potential patterns of everyday traffic. By using ML, these patterns can be further exploited to establish a proper routing of traffic and for load balancing to reduce traffic congestion.
Road traffic conditions can be determined by using and analyzing satellite images, GPS measurements, various sensor data, and drivers' cell phone data. As a result, a driver can be informed about road traffic and can avoid congested roads. Deep RL can be used to analyze these data.
Berkeley Laboratory scientists, in collaboration with UC Berkeley, used deep RL to achieve congestion-free roads. Their traffic congestion reduction project was known as Congestion Impact Reduction via Connected and AV-in-the-loop Lagrangian Energy Smoothing (CIRCLES). This project was based on the open-source software framework called ''Flow.'' Their aim was to reduce traffic jams and save energy. ''Flow'' trains vehicles to learn about the behavior of the front and back vehicles and take appropriate actions. They have another project called ''DeepAir,'' in which they used deep RL and satellite imagery to estimate air quality impact (wind speed, pressure, precipitation, and temperature). This project provides an insight into the sources of pollutants and helps design appropriate routing and load balancing of traffic [277]. In [278], the authors used Q-learning and ANN to assess policies regarding the maximum driving speed allowed on highways so that traffic congestion is avoided. They considered traffic prediction in their scheme.

3) SMART PARKING
In many situations, an inappropriate parking system causes serious traffic congestion, especially in crowded urban areas. A driver takes a long time to park due to the lack of knowledge regarding the parking space; as a result, a long queue is created. On average, vehicle users spend 7.8 minutes on parking. This accounts for approximately 30% of the traffic flow in cities and causes traffic congestion, especially in peak hours [279]. Moreover, inappropriate parking hampers normal traffic flow. Therefore, appropriate parking management is required to alleviate traffic congestion.
Existing manual methods for parking management are inefficient, time-consuming, and annoying. Therefore, researchers have selected various ML approaches to achieve a smart and effective parking system. ML-based parking systems provide accurate and real-time parking information without the need for expensive infrastructure. Automated smart systems, such as the parking guidance and information (PGI) system, integrated with ML can alleviate such issues. Camera- and sensor-based systems are widely studied in the literature. Incorporating ML into these systems would provide more accurate, robust, and faster detection of free and occupied parking spaces [280]. ML is also capable of predicting parking occupancy in advance. More accurate occupancy forecasts improve parking guidance for vehicle users and reduce the time needed for parking.
A PGI system was designed by using deep CNN and binary SVM classifiers in [281]. The authors used public datasets (PKLot) with variations of illuminance and weather conditions. In PKLot, 12,417 images of three parking sites are available, thus generating 695,899 segmented parking spaces that are labeled in the package. Deep CNN was also used in [282] to detect vacant parking spaces. In [283], the authors proposed an illegal vehicle parking detection system by using deep learning. They utilized the single shot multibox detector (SSD) to design their detection model. Their system analyzed the state of tracked vehicles to determine whether a vehicle is illegally parked. A visual parking lot occupancy detection system was proposed by using CNN in [284]. The system only requires smart cameras; hence, it is simple and cost-effective. The authors performed experiments on PKLot and their own dataset (now publicly available). A Bayesian framework was designed to detect vacant parking spaces in [285]. The proposed plane-based method adopts a structural 3D parking space model, which has abundant planar surfaces.

4) ADVANCED TOLL SYSTEM
Traffic congestion is a regular phenomenon at tolling stations. A long queue is created due to the manual and/or slow tolling system, thereby leading to traffic congestion. Therefore, the tolling system can be developed by using advanced techniques, such as IoT and ML. Another approach to reducing traffic congestion is to implement a congestion or cordon fee (i.e., every vehicle is charged a toll when it uses the specified cordon or road of an urban area where traffic congestion is very high). For example, Singapore has introduced electronic road pricing to charge vehicles when they enter a certain cordon, and London charges vehicles operating within the Congestion Charge Zone [286]. To obtain a smarter congestion fee system, the authors in [286] used the RL algorithm to model a distance-based dynamic tolling system. In their model, no specified toll station is available to collect tolls; instead, roadside sensors are used. Here, vehicles can freely enter the toll lane at any point. The toll is calculated based on the vehicle's entry location and is controlled by the tolling system.
For automated toll collection, vehicles need to be appropriately classified and verified. This task is challenging, especially for heterogeneous vehicular traffic environments. The authors in [287] used SVM to classify vehicles and k-means to cluster vehicle signatures (where the class labels of vehicles are unavailable). Their methodology had four phases, namely, signal denoising, signal segmentation, feature extraction, and classification.
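The clustering step for unlabeled signatures can be illustrated with a minimal Python sketch of k-means. It is not the implementation of [287]; the two-dimensional feature vectors, the helper names, and the deterministic initialization from the first k points are assumptions made only for this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Cluster feature vectors into k groups.

    Initialization uses the first k points (an assumption chosen for
    reproducibility; random or k-means++ seeding is more common).
    """
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = []
        for c, members in enumerate(clusters):
            if members:
                # New centroid is the per-dimension mean of the members.
                updated.append(tuple(sum(dim) / len(members)
                                     for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical signatures: (normalized length, normalized signal peak).
signatures = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8),
              (8.8, 9.2), (0.8, 1.2), (9.2, 8.8)]
centroids, labels = kmeans(signatures, k=2)
```

In a pipeline such as the one described above, the resulting cluster labels could serve as provisional classes when expert labels are unavailable.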

5) ADAPTIVE TRAFFIC SIGNAL CONTROL
Traffic congestion occurs due to the inefficient control of traffic signals. To reduce congestion, an adaptive and intelligent traffic signal mechanism is required. Gao et al. [288] proposed a deep RL algorithm that automatically extracts useful features from raw and live traffic data (position, speed of vehicles, vehicle queue length, etc.) and learns the optimal policy for adaptive traffic signal control. They used machine-crafted features instead of human-crafted ones.
In [289], an intelligent traffic light control system based on Q-learning and neural networks was proposed to determine signal light times to minimize total delays in an isolated intersection. The authors used detectors to calculate current traffic at an intersection and extended or terminated the green time based on this information. In [290], the authors proposed a scheduling scheme for traffic signals in multi-intersection vehicular networks by using Q-learning and feedforward neural networks for value function approximation. A similar work was presented in [291] by using Q-learning. The authors also implemented their algorithms on open-source Java-based software called Green Light District (GLD). Table 13 summarizes the papers mentioned in this subsection.
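To make the Q-learning formulation for an isolated intersection concrete, the following Python sketch trains a tabular agent on a toy two-approach intersection. It does not reproduce any of the cited schemes; the queue cap, the arrival probability, the one-vehicle-per-slot discharge rule, and the reward (negative total queue length) are all assumptions chosen to keep the example small.

```python
import random
from collections import defaultdict

MAX_QUEUE = 5      # queues are capped so the state space stays small (assumption)
ACTIONS = (0, 1)   # 0 = keep the current green phase, 1 = switch phase


def step(state, action, rng, arrival_prob=0.3):
    """Advance the toy intersection by one time slot.

    state = (queue_ns, queue_ew, phase); phase 0 gives green to the
    north-south approach and phase 1 to the east-west approach.
    """
    q_ns, q_ew, phase = state
    if action == 1:
        phase = 1 - phase
    # The approach holding the green light discharges one vehicle per slot.
    if phase == 0 and q_ns > 0:
        q_ns -= 1
    elif phase == 1 and q_ew > 0:
        q_ew -= 1
    # Random arrivals on both approaches.
    if rng.random() < arrival_prob:
        q_ns = min(MAX_QUEUE, q_ns + 1)
    if rng.random() < arrival_prob:
        q_ew = min(MAX_QUEUE, q_ew + 1)
    reward = -(q_ns + q_ew)  # fewer waiting vehicles means a higher reward
    return (q_ns, q_ew, phase), reward


def train(episodes=200, slots=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(state, action)] -> estimated return
    for _ in range(episodes):
        state = (0, 0, 0)
        for _ in range(slots):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                        # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward = step(state, action, rng)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


def greedy_action(q, state):
    """Action with the highest learned value in the given state."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

The works cited above replace the table with a neural network when the state space (for example, several intersections) becomes too large to enumerate.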

E. ML IN RESOURCE ALLOCATION IN CR-VANET
The number of vehicles is increasing rapidly. Substantial amounts of messages should be exchanged because several new services and features are added regularly. However, the allocated resources (time, frequency, etc.) are limited. An appropriate resource management system is required to accommodate such a massive number of vehicles and their services with the minimum required QoS. ML has great potential to perform this RA job for CR-VANET.
Learning RA strategies directly by gathering experiences from the dynamic environment is more practically suitable and effective than traditional heuristic-based schemes. Concerning the research question, ''Can systems learn to manage resources on their own?'', the answer can be found in [292], in which an experiment was conducted using deep RL and DNN.
In [293], the authors used DQN to formulate the RA strategy as a joint optimization problem for CR-VANET. They jointly addressed three underlying resources enabling vehicular applications, namely, networking, caching, and computing, to enhance the performance of vehicular networks. Their proposed framework used the ideas of information centricity, which originated from information-centric networking. Their framework could enable dynamic adaptation of networking, caching, and computing resources to satisfy the QoS of different services of VANETs. The same work was extended in [294] with more analysis.
The cumulative energy consumption by the information and communication technology industry reached 616 TWh in 2013, and it is predicted to grow to 910 TWh by 2020; the annual carbon emission is expected to reach 235 Mto by 2020 [190]. Together with the spectrum shortage issue discussed earlier, this trend calls for energy- and spectrum-efficient RA strategies. Zhou et al. [295] proposed an RA scheme for real-time performance with a simple implementation method. They designed their system by using DNN, and they presented a training method to train neural networks. In [296], the authors considered the input and output of the RA algorithm as an unknown nonlinear mapping. If it is learned accurately and effectively by using DNN, then real-time RA is possible and requires only a few operations. An interesting DNN-based hierarchical predictive RA scheme was proposed in [297]. Prediction can be made based on the mobility and traffic load related to user behavior. The end-to-end prediction method accelerated the performance of under-utilized networks by predicting behavior-related information from historical data.
In [298], a DQN-based decentralized RA mechanism was presented for V2V communication. In this work, each V2V link is regarded as an agent and can make its own decisions to find the optimal spectrum and power for transmission. The proposed scheme did not require any global information for the agent, needed a minimal transmission overhead, and overcame the issue of the latency constraints of V2V messages. These advantages are difficult to achieve in traditional RA schemes where ML was not used.
V2I or V2R links require appropriate RA schemes that can tackle the inherent challenges of heterogeneous demands for resources and strict QoS requirements. The work in [299] focused on these issues. They used MDP in the RA scheme, in which the resources allocated for the long term are minimized. They also provided a state-of-the-art vehicular cloud model that combines resources from individual devices and systems in VANET and traditional cloud.

F. ML IN SPECTRUM-AWARE ROUTING IN CR-VANET
VANET routing is used to select the best path between the source and destination vehicle through a set of other nodes (which might be other vehicles, RSUs, and so on); thus, the message can be transferred with the best QoS (minimum allowed latency, maximum possible throughput, etc.). This routing is required especially for vehicular safety message exchange where the end-to-end delay must be less than the threshold value and the reliability must be high. The VANET topology changes frequently because the mobility of vehicles is high. This rapid change in network topology causes a delayed transfer of messages and data losses. The traditional routing protocol cannot cope with the dynamicity of VANETs. Therefore, robust and adaptive routing protocols should be available for VANET [300]. More details on the routing in VANET can be found in [301], [45].
Software-defined networking (SDN) is a highly beneficial technology for CR-VANET. SDN can manage the whole network efficiently and transform the complex network architecture into a simple and manageable one [302]. Non-SDN networks support only vendor-specific policies and offer no flexibility for dynamic network environments, whereas SDN overcomes these limitations. In SDN, a network administrator can control traffic from a centralized control console without having to touch individual switches, routers, or other devices. With SDN, the control of the routing processes in CR-VANET becomes very easy.
For CR-based routing, the routing modules must be aware of the surrounding radio environment. The cooperation between routing modules and spectrum awareness must be strong. The routing of CR depends on how spectrum information is gathered. The routing engine is provided with spectrum information in three ways, as follows [303]: i) by the external entities or database, ii) locally by each SU, and iii) hybrid (a mixture of i and ii).
The routing in CRN is highly dependent on the entire CR cycle and the behavior of PUs. It is also influenced by QoS metrics, such as nominal bandwidth, throughput, delay, and energy efficiency, with path stability and the presence of PUs [303].
For example, if the activity of PU is from moderate to low, then the topology of the SUs is relatively static. As a result, maximum QoS is achieved. On the contrary, the sudden arrival or re-arrival of PUs causes unexpected route failure. Instant rerouting is required for seamless communication. Therefore, the routing of CRN should be dynamic, adaptive, and intelligent. ML can be used to find the vacant spectrum rapidly and can predict the PU's behavior (when the PU is absent or when it reappears). These tasks are necessary for a stable and effective routing protocol for CRN. To determine the various types of CR routing with their features, advantages, and disadvantages, interested readers should refer to [303], [304].
CR-VANET is dynamic in nature, and route selection is one of its biggest challenges. Finding spectrum holes and handling the high speed of vehicles are among the major issues considered when selecting the routing protocol in CR-VANET.
Therefore, routing protocols specifically for VANET or routing protocols for CRN are not directly applicable to CR-VANET cases. Unstable and inappropriate routing leads to delay in the network, thereby reducing the overall performance of the network. To ensure stable routing, which provides improved QoS and energy efficiency by reducing end-to-end delay, ML tools can be used in CR-VANETs. Figure 18 shows a typical routing situation. By using a routing mechanism, the source selects the path SU1→SU2→SU4→SU6→SU9→SU10 as the best path because it has high bandwidth, low delay, low presence of PU, and high reliability. The route SU1→SU3→SU8→SU10 has a small number of hops and high throughput, but it is avoided because it is prone to the presence of PUs. However, because of the high-speed mobility of SU4, it might move out of the range of SU2 during transmission, and the best route might then fail. In this case, the best alternate route might be used, i.e., SU1→SU2→SU5→SU8→SU10. In summary, the overall routing in CR-VANET is different from traditional VANET or CR routing. Therefore, this CR-VANET routing is challenging to tackle.
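The route choice described for Figure 18 can be sketched as a weighted scoring of candidate routes with a fallback when a link breaks. The Python example below is only an illustration: the metric values, the weights, and the names `Route`, `score`, and `select_route` are hypothetical and are not taken from any cited protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    hops: tuple          # ordered SU identifiers from source to destination
    bandwidth: float     # normalized to [0, 1], higher is better
    delay: float         # normalized to [0, 1], lower is better
    pu_presence: float   # estimated probability that a PU occupies the route
    reliability: float   # normalized to [0, 1], higher is better


def score(route, w_bw=0.25, w_delay=0.25, w_pu=0.3, w_rel=0.2):
    """Weighted sum of route metrics; the weights are illustrative assumptions."""
    return (w_bw * route.bandwidth
            + w_delay * (1.0 - route.delay)
            + w_pu * (1.0 - route.pu_presence)
            + w_rel * route.reliability)


def is_usable(route, broken_links):
    """A route is usable if none of its consecutive hop pairs is broken."""
    return all(link not in broken_links
               for link in zip(route.hops, route.hops[1:]))


def select_route(routes, broken_links=frozenset()):
    """Return the highest-scoring usable route, or None if none remains."""
    usable = [r for r in routes if is_usable(r, broken_links)]
    return max(usable, key=score) if usable else None


# Hypothetical metric values for the three routes discussed in the text.
ROUTES = [
    Route(("SU1", "SU2", "SU4", "SU6", "SU9", "SU10"), 0.9, 0.2, 0.1, 0.9),
    Route(("SU1", "SU3", "SU8", "SU10"), 0.9, 0.1, 0.8, 0.5),
    Route(("SU1", "SU2", "SU5", "SU8", "SU10"), 0.7, 0.3, 0.2, 0.8),
]

best = select_route(ROUTES)
# SU4 has moved out of the range of SU2, so that link is marked as broken.
fallback = select_route(ROUTES, broken_links={("SU2", "SU4")})
```

An ML-based protocol would learn such weights or the PU presence estimates from observations instead of fixing them by hand.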
In [305], the authors proposed an SDN-based routing protocol by using the belief propagation algorithm for CR-VANETs. They found that their routing protocol is more stable and performs better than the traditional routing protocol for CR-VANETs. In this scheme, two vehicles can only communicate when they have agreed to use the same vacant channel. This scheme considers spectrum sensing and routing simultaneously. To solve the routing issue, the authors in [306] used a clustering technique that improves the network by reducing the excess routing overheads. It is also used to obtain a stable network because it reduces the effects of the dynamicity of channel availability. The authors designed a cluster-based routing protocol using RL and named it SMART. The authors in [307] used the RL algorithm to design a routing scheme (they called it weighted cognitive radio Q-routing or WCRQ-routing) for CRN. They investigated the effects of various attributes of RL, such as reward function, trade-off between exploitation and exploration, and convergence rate.
Experiments to validate VANET routing studies are difficult to conduct in real-life scenarios due to the high cost and risk involved. Therefore, to model and simulate the VANET environment, several mobility models were proposed. Prominent mobility models for VANETs include random waypoint, random walk, Manhattan grid, freeway, reference point group, and Gauss-Markov mobility models [300].

G. ML IN INFOTAINMENT IN CR-VANET
Infotainment refers to information and entertainment broadcasting. In VANET, value-added services, such as entertainment and advertising, are provided along with the safety message communication. Live streaming video communication, for example, is used not only for entertainment and advertising but also for accident management. By watching the live video of an accident case, traffic police or rescuers can make robust and effective decisions. Meanwhile, passengers can enjoy online services. For example, they can use any social media, video streaming websites, and navigation systems. Roadside companies can send an advertisement to vehicles to market their products or services. Nearby authorities can also provide warning or safety instructions directly to vehicles. Real-time parking navigation information can be obtained from a nearby parking lot. The potential services of VANET can also be applied to road entertainment or gaming between vehicles. Besides, other user services, such as LIDAR, OBU's sensors, and GPS, should exchange a substantial amount of data. Figure 19 displays such a situation, where a car or the car user can simultaneously experience such services.
The current standards of VANET are WAVE and DSRC, which suffer from large packet delay and spectrum scarcity. To overcome these issues, DSA of the vacant licensed spectrum provides a promising solution. Several channel and relay node (intermediary vehicle(s)) selection mechanisms should be available for smooth infotainment services. Some data are delay tolerant, whereas others are not. Therefore, various applications have various QoS requirements. ML methods can be applied to train vehicles to learn the surrounding radio environment for the spectrum information, the diverse QoS requirement, and the best candidate for relaying appropriate infotainment services.
In [308], the authors proposed a channel selection mechanism for video transmission. They prioritized safety application messages and selected the best DSRC and CR channels for smooth video transmission. They selected the CR channel in which the PU activity is minimal. They also chose a subset of strategic nodes (rather than selecting all) for rebroadcasting the content. Q-learning or DQN can be applied to accelerate the performance of this proposed mechanism. A vehicle would be trained using ML on the behavior of the PU to select the best channel and suitable nodes for the rebroadcasting.
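The idea of choosing the CR channel with minimal PU activity can be sketched as follows. This Python example is a simple frequency estimate from past sensing results, not the mechanism of [308] and not yet Q-learning or DQN; the channel names and the sensing histories are hypothetical.

```python
def pu_activity(history):
    """Fraction of sensing slots in which the PU was observed.

    history is a non-empty list of 0/1 flags (1 = PU busy, 0 = channel idle).
    """
    return sum(history) / len(history)


def select_cr_channel(sensing_history):
    """Return the channel whose estimated PU activity is lowest."""
    return min(sensing_history,
               key=lambda ch: pu_activity(sensing_history[ch]))


# Hypothetical sensing results collected by a vehicle over four slots.
sensing_history = {
    "ch1": [1, 1, 0, 1],
    "ch2": [0, 0, 1, 0],
    "ch3": [1, 0, 1, 0],
}
chosen = select_cr_channel(sensing_history)
```

A learning-based variant would update these estimates online and also account for the reward obtained after each transmission.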
The quality of the transmission for infotainment depends on channel selection, RA schemes, schedules, appropriate routing, and traffic prediction capability. The previous sections and subsections discussed the applications of various ML methods on these dependencies. From such discussions, we conclude that ML is a promising tool to provide the best infotainment experiences to users. The authors in [187] proposed a data scheduling method by formulating an MDP model to analyze the transmission performance of CR-VANET. They considered CR, states of vehicular caching, a correlation between various transmission modes, mobility of vehicles, and QoS data requirements. In their proposed optimal data transmission schedule scheme, they used the deep Q-learning method and the vehicle's caching to minimize the overall transmission costs.
In [309], the authors proposed a content-aware and on-demand clustering technique for video streaming in VANET. Here, vehicles with the same video requirement and mobility features are clustered. The authors constructed an overlay tree based on the relation between supply and demand of the videos in the VANET scenario. Various ML clustering techniques (e.g., k-means or Dirichlet process) can be integrated into their approach to enhance performance. ML-based video admission control and resource management algorithms were proposed in [310]. The authors developed a scheme by using ML that can extract the quality-rate characteristics of unknown H.264-encoded video frames. They used unsupervised feature learning with supervised classification techniques. Then, they were able to estimate the QoE parameters that characterize each video. In [311], the authors proposed a framework called cognition-based networks (COBANETS) that includes cognitive network nodes with an infrastructure for learning. They used modified DNN (called generative DNN or GDNN) and RL to develop the learning tool, by which the quality-rate characteristics of video flows were estimated and QoE-aware RA schemes were exploited.
The medium access control (MAC) standard for V2V communication is IEEE 802.11p (a member of the IEEE 802.11 or WLAN family). V2V is an ad hoc-based communication technique for vehicles in VANETs. The vehicle density in VANETs varies from sparse to hundreds of vehicles, all of which contend for limited channel access. An appropriate MAC is required to cope with this situation, especially for a dense urban network. In [312], the scalability problem of the IEEE 802.11p MAC protocol was discussed. The authors used the RL algorithm to modify the MAC for IEEE 802.11p to solve such issues. Their proposed MAC was claimed to reduce the packet collision probability and bandwidth wastage.

IV. OPEN ISSUES AND FUTURE RESEARCH DIRECTIONS
CR-VANET is a promising field for future research. ML is another potential area for research. Several works address CR-VANETs, ML in VANETs, and ML in CR; however, studies on ML in CR-VANETs are few. Therefore, researchers should explore this area. This section presents several open issues and future research challenges.

A. ADVANCED SPECTRUM SENSING AND MOBILITY MANAGEMENT ISSUES
The application of ML in these issues has been discussed in Section III.A. However, substantial work should still be conducted. For example, most current studies focus on TVWS, but in reality, other radio access networks or RANs (Wi-Fi, WiMAX, LTE, or 5G) are available and coexist in an overlapping manner.
These RANs have different characteristics and attributes. In CR-VANET, vehicles should have the capability to perform two or more non-safety message (audio or video) transmissions simultaneously. However, selecting the optimal network for spectrum handoff is a challenging job for a vehicle. The authors in [313] proposed multiple-attribute decision-making (MADM) methods to solve these issues. However, the use of ML could further improve the performance of such methods. Therefore, adaptive ML-based algorithms and frameworks are required to solve these issues. For the best spectrum management, multiple ML methods can be merged. For example, CBR, deep learning, and RL can be merged to perform the SS job in a dynamic manner.
For CR-VANET cases, vehicles or SUs have high-speed mobility, but in most of the cases, the PUs are considered stationary devices or nodes. The simultaneous mobility effects of SUs and PUs should be considered for improved and realistic results. Interferences occur due to PUs and SUs' activities. Shadowing or the hidden terminal problem is another issue for spectrum sensing. Therefore, further work is required to alleviate such an interference and shadowing problem by using ML.

B. SLOW CONVERGENCE OF RL
For CR-VANET, the RL algorithm, especially Q-learning, is the most suitable because it does not require any environmental model or training dataset and has high adaptability to the dynamic environment. However, the main problem of Q-learning is its slow convergence. A longer time is needed for learning purposes. To solve this issue, researchers have suggested combining Q-learning with other ML schemes, such as CBR. Prolonged learning time for vehicles is unacceptable. This issue can be a potential research topic. The slowness of Q-learning is due to its inherent functionalities. It learns everything by itself without taking any help. It faces a tradeoff between exploration and exploitation. To obtain rewards, further exploration is needed, and as a result, it consumes much time. Transfer learning is an interesting learning method. It can be applied to reduce the learning time of Q-learning. In transfer learning (such as TACT, the teacher-student learning approach, or the docitive learning approach), a vehicle can learn about the surrounding radio environment from other vehicles that have already learned about this environment [193]. For example, suppose that a vehicle requires 1000 iterations to learn one maximum-rewarded state-action value, whereas another vehicle already knows the state-action pair value for the maximum reward. If the experienced vehicle can transfer this knowledge to the learning vehicle, then the latter could skip those 1000 iterations. Thus, not every vehicle needs to iterate to learn the same Q-value. Transferring and sharing can accelerate learning and provide fast convergence. However, a vehicle might transfer wrong or false learning to the new learning vehicle. As a result, this learning vehicle would be misguided. Therefore, security issues regarding this transfer learning should be explored.
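The knowledge transfer idea can be sketched with two small Python functions: a standard tabular Q-learning update and a routine that seeds a learner's Q-table with values received from an experienced vehicle. This is an illustration under assumptions, not the TACT or docitive algorithms themselves; in particular, the `trust` factor that scales received values is a hypothetical safeguard against misleading teachers.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning update; unseen entries default to 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def transfer(student_q, teacher_q, trust=1.0):
    """Seed entries the student has not learned yet with teacher values.

    Values the student already holds are kept; received values are scaled
    by `trust` (hypothetical parameter in [0, 1]).
    """
    for key, value in teacher_q.items():
        if key not in student_q:
            student_q[key] = trust * value
    return student_q


# Hypothetical states ("s0") and actions (channel choices).
teacher = {("s0", "ch1"): 10.0}   # learned over many iterations
student = {("s0", "ch2"): 1.0}    # the newcomer knows little so far
transfer(student, teacher, trust=0.5)

fresh = {}
q_update(fresh, "s0", "ch1", reward=1.0, next_state="s1",
         actions=["ch1", "ch2"])
```

After the transfer, the student starts from an informed estimate for ("s0", "ch1") instead of zero, so fewer exploratory iterations are needed for that entry.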

C. OTHER COMBINED SECURITY ISSUES TO BE SOLVED
Along with the individual security threats in CR and VANET, other combined security threats (e.g., JSSDT attacks) should be investigated. For the infotainment issue, several studies were conducted based on V2V communication. Here, ensuring privacy is one of the major challenges. Most security mitigation techniques are based on learning from experiences (e.g., by exploiting the attacker's behavior). Therefore, these techniques cannot solve the zero-day attack (newly invented attack, not stored or experienced previously by the network or vehicles). This zero-day attack can be solved by using classifiers, such as SVM or naïve Bayes, and an expert-labeled dataset [314]. Further studies are required to implement this approach in the CR-VANET scenario.
When using RL, most previous studies considered a small state space, but in reality, the state space is large and dynamic. Multi-agent RL faces the curse of dimensionality (increases the state-action pairs exponentially). As a result, performing functions, such as determining malicious attacks, becomes slow. More work is required to solve ''the curse of dimensionality'' issue so that attack mitigation can be improved. Several attackers also use ML to design their attacks [6]. Highly sophisticated ML algorithms are required to fight such attacks.
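The exponential growth mentioned above can be made concrete with a short calculation. The Python sketch below assumes that every agent has the same number of local states and actions and that the joint table is indexed by the combination of all agents' states and actions; the sizes used are arbitrary examples.

```python
def joint_state_action_pairs(n_agents, states_per_agent, actions_per_agent):
    """Size of a joint Q-table over all agents' states and actions.

    Each agent contributes states_per_agent * actions_per_agent local
    pairs, and the joint table is their Cartesian product.
    """
    return (states_per_agent * actions_per_agent) ** n_agents


# Example with 10 local states and 4 actions per agent (arbitrary values).
sizes = {n: joint_state_action_pairs(n, 10, 4) for n in (1, 3, 5)}
```

With these example sizes, one agent needs 40 entries, three agents need 64,000, and five agents need 102,400,000, which shows why a tabular approach quickly becomes impractical.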
ML models are vulnerable to adversarial attacks. In this type of attack, the ML models are fooled by malicious input. For example, if a fake toxic traffic sign is placed on the road, an AV might perform misclassification. The human driver might consider the sign as a ''no overtaking'' sign, and the AV might view it as a ''speed limit'' sign. This misclassification could cause fatal road accidents [315]. This type of adversarial attack can occur in three stages, namely, training, testing, and model deployment. The three categories of defense against this attack are modifying data, modifying models, and using auxiliary tools [316]. Interested readers should refer to [316] for more details on this attack. This attack is new and highly threatening to the usage of ML in CR-VANET. Therefore, extensive studies must be carried out in this field.

D. INTELLIGENT AND ACCURATE AVs
In Section III.C, we discussed several vehicular safety-related issues that can be solved with the help of ML. We focused on AVs' smart services, such as ADAS, barrier detection, road sign detection, and lane changing, using ML. Several companies work with autonomous or driverless vehicle systems and their intelligence services. Examples include Waymo (formerly known as the Google self-driving project) [317], Tesla's Autopilot [318], and UBER's driverless car project [319]. Although they have revealed the excellent performance of self-driving vehicles by using various ML approaches, they still face many challenges. Their vehicles are still not as intelligent as human drivers. Several casualties have been reported due to these self-driving vehicles. A fatal incident involving a Tesla occurred in Florida in May 2016; the driver was killed while the car was in autopilot mode. The incident was due to wrong detection (the car's sensor system failed and could not differentiate between the white bright sky and a large white truck) [320]. UBER's self-driving car killed a pedestrian woman in March 2018 because it could not detect the pedestrian [321]. The reason for these incidents was the lack of detection accuracy.
Therefore, increased accuracy is needed for real-life experiments. The algorithms, such as DNN or CNN, used for detection purposes must be highly robust to fast-paced vehicles. Highly effective debugging, testing, and verification techniques should be developed using several ML algorithms.
For the high-speed mobility of vehicles, the processes of detection and taking appropriate actions after detection are challenging (especially because these processes must be performed instantly, i.e., without any delay). Moreover, most AVs use onboard cameras for barrier or pedestrian detection. Therefore, they can detect only those barriers or pedestrians that can be captured by the cameras. Unmanned aerial vehicle (UAV)-based, drone-based, or satellite-based imagery can be used to detect any barrier (any other vehicle or obstacle) or pedestrian in advance or can be adopted in curved road areas where the vehicle's visibility might be obstructed. For example, in Figure 20, the left-side car is unable to see the pedestrian and the right-side vehicle in advance. Detecting them only at the last moment at high speed and taking an inaccurate action might lead to an accident. Now, suppose that a UAV captures images of the pedestrian and vehicles and obtains their GPS values. After capturing the images and GPS values, it sends this information to the vehicular clouds for processing. After swift processing, the cloud sends a warning message to the left-side car that a pedestrian and a car are located in front. After obtaining this warning message, the car becomes cautious and takes appropriate actions (slowing down or changing its lane). These detection processes would become more effective and accurate if Drone2Map (an app that processes raw images captured by drones or UAVs into precise information by using cloud-based mapping and analysis tools, such as ArcGIS) and TensorFlow (an open-source ML library) are used with CNN and regional CNN. These techniques can also be used for the smart parking system. UAVs or drones can also be adopted to provide spectrum information to nearby vehicles. As a result, these vehicles are not required to undergo CR processes. These areas require further exploration.

E. SIMULATION TOOLS, TESTBEDS, AND DATASETS FOR ML IN CR-VANETS
Real-life experiments on CR-VANET are complex, risky, and expensive. Moreover, a complete testbed for CR-VANET and ML remains lacking. Thus, most studies found in the literature are based on simulations. However, suitable simulation tools that can provide several features (e.g., spectrum sensing, mobility models, traffic classification or regression, or applying any other ML) in an integrated form are lacking.
The three main parts of ML-based CR-VANET simulation are traffic simulation, network simulation, and data analysis. Separate tools are available for these parts: traffic simulators for traffic simulations, network simulators for network simulations, and VANET simulators for both traffic and network simulations. Figure 21 displays the simulation tools used for traffic simulations, network simulations, and data analysis.
ML and data analysis tools are used for ML in the CR-VANET perspective. For traffic data and mobility patterns, various traffic simulators, such as SUMO and MOVE, are used [322].
To add CR features to VANET network simulation, various network simulators can be utilized, such as NS2/3, NetSim, and OMNeT++ [323]. Other simulators, such as Veins and TraNS, are used for both traffic and network simulations. Several tools are employed for ML and data analysis, such as Python's ML libraries, TensorFlow, and MATLAB's ML toolbox. This discussion indicates that for experimenting with ML in the CR-VANET scenario, two or three tools should be used. This arrangement is complex and difficult. Therefore, a single simulation and ML platform is required. Moreover, numerous practical features of CR-VANET (such as security, Doppler effects, interference level, and shadowing issue) should be added in the simulation tools.
Few testbeds are available for CR experiments, such as USRP-N210 (or other versions) [334], GNU Radio (a free open-source software development toolkit for software-defined radio) [335], and VT-CORNET [336]. These testbeds can be used for stationary cases or along with a vehicle to obtain spectrum information on real-life scenarios [340]. These testbeds are utilized for the CR perspective only. The data captured by these testbeds are ultimately analyzed using software or data analysis tools. Other testbeds for CR and CR-VANET include Virginia Tech's CORNET [341], cognitive cars testbed [337], ORBIT [338], and UCLA's C-VeT [339]. Building a complete testbed for ML, CR, and VANET is still an open issue. Several real-life datasets are available individually for CR and VANET. Table 14 presents several datasets used for CR and VANETs along with the simulation tools and testbeds. These individual datasets are for CR and VANETs. The real-life implementation of CR-VANET by using ML requires many combined datasets. Moreover, a dataset varies from place to place (due to different policies, requirements, etc.). Therefore, further experiments on dataset generation are required for realistic studies.

F. INTEGRATION WITH BLOCKCHAIN TECHNOLOGY
Blockchain, which was introduced by Satoshi Nakamoto in 2008, was invented to serve as the public transaction ledger of the cryptocurrency (also known as virtual currency) called ''bitcoin'' [342]. It provides a distributed peer-to-peer network in which non-trusting members can interact with each other securely without a trusted third party. Although this technique was originally intended only for financial transactions, it is currently used in several other areas, such as network security. It can also be applied to ML-based CR-VANET, and a few works on this integration are available. For example, in [343], the authors used a permissioned blockchain approach to reach a consensus in a distributed SDN-based VANET and applied deep Q-learning to overcome the existing drawbacks of the permissioned blockchain. In [344], Dai et al. used blockchain technology and Q-learning to secure VANETs. In their proposed framework, OBUs in the VANET help each other mitigate possible attacks.
In permissioned blockchain, resource caching is a crucial issue. Future work on virtual caching resources can be performed to overcome this drawback of blockchain technology. Blockchain and deep learning can also be merged to address several issues in CR-VANETs, such as building strong trust management to reduce falsification and other attacks. In such a design, blockchain is used as the decentralized database, and ML is adopted to process the data. This approach provides more trusted and reliable results. In general, ML and blockchain can complement each other and improve each other's performance. For example, ML can be utilized to provide energy efficiency, rapid computation, and security to blockchain technology. Meanwhile, blockchain can alleviate the flaws of ML by providing data and model reliability and by tracing the decision-making process of machines (for further improvement). Therefore, integrating blockchain technology into ML-based CR-VANETs is an essential research direction.
TABLE 15. MLs covered in this paper.
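The division of labor described above, with blockchain as a tamper-evident data store and a learning component that processes the stored data, can be sketched in a few lines. The following is a minimal illustration only: it is a single-node hash chain with no consensus protocol, peer-to-peer networking, or permissioning, and a majority vote stands in for the ML model. All names (`append_block`, `verify_chain`, `fuse_reports`) and the report fields are hypothetical and do not reproduce the frameworks in [343] or [344].

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, prev_hash, payload):
    """SHA-256 digest over a canonical JSON encoding of the block contents."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a sensing report, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })


def verify_chain(chain):
    """Return True only if every block's hash and back-link are consistent."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev"], block["payload"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


def fuse_reports(chain):
    """Majority vote over stored occupancy reports.

    Assumption: this simple vote stands in for an ML model; the ledger is
    verified first so that falsified entries are rejected before fusion.
    """
    if not verify_chain(chain):
        raise ValueError("ledger failed verification")
    votes = [block["payload"]["occupied"] for block in chain]
    return sum(votes) * 2 > len(votes)
```

For instance, appending three reports from different OBUs and then altering the payload of one stored block causes `verify_chain` to return `False`, so `fuse_reports` refuses to use the falsified data. A realistic design would add a consensus mechanism among nodes and replace the vote with a trained model.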

V. CONCLUSION
VANET has emerged as a solution for improving road safety, reducing traffic congestion, supporting infotainment, and improving the QoE of users. CR was proposed to alleviate the spectrum scarcity issue caused by the exponential growth of VANETs. Therefore, CR-based VANETs, or CR-VANETs, have been considered a major research domain in recent years. ML has become an integral part of CR-VANETs to ease complexities and enhance network performance. The amalgamation of ML in CR-VANETs is still in its infancy, but it has great potential to be used in the near future. This survey presented the applications of ML in emerging CR-VANETs. An overview of VANETs and CR was provided. Various ML tools and their taxonomies, applications, and limitations were presented. The usages and recent advancements of ML methods in various aspects of CR-VANETs, such as spectrum sensing, resource allocation, security, and routing, were discussed. The roles of ML in reducing road accidents and traffic congestion were elaborated, and several aspects of the usages of ML in AVs were described. Using ML tools to leverage the benefits of CR-VANETs was also explained. Many other scopes need to be explored given that these fields are still in the preliminary stage. Several of these scopes, open issues, and future research trends were discussed in this paper. Table 15 shows the ML algorithms covered in this survey and their corresponding topics and reference numbers.
RAFIDAH MD. NOOR received the bachelor's degree in IT from Universiti Utara Malaysia, in 1998, the M.Sc. degree in computer science from Universiti Teknologi Malaysia, in 2000, and the Ph.D. degree in computing from Lancaster University, U.K., in 2010. She is currently an Associate Professor with the Department of Computer System and Technology, Faculty of Computer Science and Information Technology, University of Malaya, and the Director of the Centre of Mobile Cloud Computing Research, which focuses on high-impact research. She has performed nearly RM 665 606.00 for High-Impact Research, Ministry of Education Grant, and other research grants from the University of Malaya and public sectors. She has supervised more than 30 postgraduate students within five years. She has published more than 50 articles in Science Citation Index and Expanded Non-Science Citation Index. The proceeding articles were published in international/national conferences and in a few book chapters. Her research is related to transportation systems in computer science, including vehicular networks, wireless networks, network mobility, quality of service, and the Internet of Things.
His research interests include wireless sensor networks, the Internet of Things, optimization algorithms, and energy management. He was a recipient of a full scholarship to pursue his master's degree.