Artificial Intelligence Approaches for UAV Navigation: Recent Advances and Future Challenges

Unmanned aerial vehicle (UAV) applications have increased in popularity in recent years because of their ability to incorporate a wide variety of sensors while retaining cheap operating costs, easy deployment, and excellent mobility. However, controlling UAVs remotely in complex environments limits the capability of the UAVs and decreases the efficiency of the whole system. Therefore, many researchers are working on autonomous UAV navigation where UAVs can move and perform the assigned tasks based on their surroundings. With recent technological advancements, the application of artificial intelligence (AI) has proliferated. Autonomous UAV navigation is an example of an application in which AI plays a critical role in providing fundamental human control characteristics. Thus, many researchers have adopted different AI approaches to make autonomous UAV navigation more efficient. This paper comprehensively surveys and categorizes several AI approaches for autonomous UAV navigation implemented by several researchers. The AI approaches comprise mathematical-based optimization and model-based learning approaches. The fundamentals, working principles, and main features of the different optimization-based and learning-based approaches are discussed in this paper. In addition, the characteristics, types, navigation models, and applications of UAVs are highlighted to make AI implementation understandable. Finally, the open research directions are discussed to provide researchers with clear and direct insights for further research.


I. INTRODUCTION
Unmanned aerial vehicles (UAVs) are vehicles that can fly without a human pilot onboard [1]. Because of their high mobility, easy deployment, and low maintenance, UAVs are increasingly being used in civilian and military applications [2]-[4]. In addition, UAVs can accommodate a wide range of potential sensors for any crucial mission [5]. There are many UAV applications, such as wildfire monitoring [6], [7], crowd monitoring [8], target tracking [9], goods delivery [10], medical assistance, search and rescue (SaR) [11], emergency cellular deployment [12], and intelligent transportation. However, UAVs cannot perform optimally in a complex dynamic environment owing to their dependency on human control and the limitations of radio frequency (RF) communication [1]. Autonomous navigation of UAVs in large-scale dynamic environments is one of the key components for an optimal outcome. Localization and mapping techniques [13], [14], and sensing and avoidance techniques [15] are often used in traditional approaches to achieve autonomous navigation.
The implementation of 5G networks has paved the way for many new approaches to optimizing autonomous UAV navigation [16]. However, three-dimensional (3D) deployment, navigation, and resource utilization of UAVs are only a few of the engineering issues that have been explored in early research contributions [17], [18]. To solve these fundamental problems, powerful optimization methods such as convex optimization [17], game theory [19], transport theory [20], and stochastic optimization have been used. Although many nonlinear techniques have shown satisfactory results, they are typically confined to small and simple environments, such as the countryside or indoor areas. They cannot be applied directly to large-scale complex environments because passively avoiding large obstacles or constantly constructing maps is impractical in such environments [21].
Various navigation methods have been proposed to date and are categorized into three groups: inertial navigation, satellite navigation, and vision-based navigation [1]. Nonetheless, none of these approaches is perfect; thus, it is crucial to choose the optimal technique for autonomous UAV navigation based on the mission at hand [1]. For the past few years, researchers have been studying and attempting to automate UAV navigation in which UAVs learn from their surroundings. Autonomous navigation is one of the most critical aspects of UAV automation. One of the main challenges of autonomous UAV navigation is the avoidance of obstacles to reach the desired destination.
Artificial intelligence (AI) has become an essential element of nearly every engineering-related study field owing to recent advances in computer technology and hardware. AI is an ideal tool for solving complex problems where no specific solutions are available or conventional solutions require a considerable amount of hand-tuning. Automatic feature extraction, which eliminates costly hand-crafted feature engineering, is a significant distinction between AI and traditional cognitive algorithms. In general, an AI system can spot anomalies, forecast potential scenarios, respond to changing situations, gain insights into complicated issues involving vast quantities of data, and find patterns that a person might overlook [22]. It can exploit and learn from the surrounding big data to improve UAV maneuvering. Moreover, AI can manage onboard resources more intelligently than traditional optimization approaches.
AI methods can be divided into two groups based on their level of intelligence. The first group is the most fundamental, allowing the machine to react predictably to the environment. This enables UAVs to perform according to performance metrics. The second group of methods allows UAVs to communicate with their surroundings, enabling them to make decisions even when the environment is unpredictable [23]. Thus, AI techniques are increasingly being used to improve autonomous UAV navigation. There are many parameters in autonomous UAV navigation, and some are set using heuristic equations because solid closed-form solutions for their values do not exist or are computationally expensive to find. AI can help with these problems by forecasting parameters and calculating functions based on available data [22]. Furthermore, artificial neural networks (ANNs), a type of AI technique, can be used to model the objective functions of nonlinear problems that involve optimization or approximation [24].
Nonetheless, there are many challenges in applying AI in autonomous navigation, such as reducing training time, reducing computational power, reducing complexity, updating information for extended periods, and quick adaptation to new environments [25]. Thus, many researchers have investigated and proposed different solutions to overcome the challenges of autonomous UAV navigation while utilizing AI efficiently.

A. EXISTING SURVEYS
The significant advancements in AI and the contribution of AI to autonomous UAV navigation are the primary motivations behind this study. Several surveys on different aspects of UAVs have been published in the last decade. In [26], Souissi et al. discussed several state-of-the-art methods for UAV path planning, such as Dijkstra's algorithm [38], the A* algorithm [39], particle swarm optimization (PSO) [40], ant colony optimization (ACO) [41], probabilistic roadmapping [42], rapidly-exploring random trees (RRT), and multi-agent path planning [43], and outlined their advantages and disadvantages. Moreover, the authors categorized UAV path planning in terms of environmental modeling. UAVs with environmental knowledge have a higher probability of achieving an optimal or near-optimal solution than UAVs with no environmental knowledge; however, they cannot deal with sudden changes in the environment.
Meanwhile, Zhao et al. [30] discussed computational intelligence (CI)-based approaches for UAV path planning. The authors highlighted papers based on the genetic algorithm (GA), PSO, ACO [41], artificial neural network (ANN), fuzzy logic (FL), and Q-learning, considering online/offline planning and 2D/3D environments. However, research works based on AI/ML were not included in the paper. Radmanesh et al. carried out a comparative study of heuristic and non-heuristic UAV path planning algorithms in [31]. The authors tested algorithms such as the potential field, Floyd-Warshall, GA, greedy algorithm, multi-step look-ahead policy, Dijkstra's, A*, Bellman-Ford, and Q-learning algorithms, and mixed-integer linear programming (MILP), under three different obstacle scenarios in terms of computational time and optimality.
In contrast, Lu et al. [1] surveyed vision-based methods of UAV navigation while focusing on visual localization related to cellular-connected UAVs in wireless networks in [22], [33], [34], [36], [57]. In addition, Liu et al. [37] highlighted the AI-based approaches for resource allocation, big data handling, dynamic deployment, and trajectory design for UAV-aided wireless networks (UAWN). The majority of the surveys focus on AI for UAV-connected wireless communication or CI-based solutions for autonomous UAV navigation, explaining their future applications. However, none of them focuses solely on AI approaches for autonomous UAV navigation and potential future AI approaches, as shown in Table 1. Unlike these works, this paper discusses the present and future AI approaches for autonomous UAV navigation. This paper provides a comprehensive survey of this crucial paradigm of AI approaches covering all UAV navigation scenarios, identifying the prevailing gap in the literature that inspired the current research. This survey aims to help researchers work in the direction of AI-based methods in autonomous UAV navigation.

B. CONTRIBUTION
This study focuses on different AI approaches, such as deep learning, mathematical optimization methods, reinforcement learning, and transfer learning, for different types of UAV navigation. After analyzing different AI techniques, future research directions for UAV navigation are highlighted. The main contributions of this study are as follows.
• Many authors have simulated and incorporated different types of UAVs and their characteristics. Knowledge of the different UAV parameters is essential for implementing AI algorithms, because these parameters help researchers to develop appropriate system models and simulation scenarios. Moreover, setting a reasonable goal is very important for implementing AI algorithms. Thus, the key characteristics and types of UAVs are highlighted to familiarize the reader with UAV architecture. A brief overview of the UAV navigation system and an application-based categorization are provided, which will help new researchers to easily understand the various methods of AI implementation.
• AI is a vast area that includes different types of learning and optimization algorithms. Thus, the AI approaches for autonomous UAV navigation are divided into two parts: optimization-based and learning-based approaches. Different types of memory-free computational heuristic approaches are discussed in the optimization-based part. In these approaches, the AI agent has to perform all the necessary calculations from the beginning every time to obtain the optimal solution. Thus, optimization-based solutions have high time complexity and require high computational power. The memory-based computational learning approaches are discussed in the learning-based part. Here, the AI agent learns its surroundings, obtains an optimal policy, and saves it for future use. The agent can use and update the saved policy later if needed. Therefore, the fundamentals and working principles of several AI techniques implemented by different researchers for autonomous UAV navigation in terms of optimization-based and learning-based approaches are presented in this paper, as shown in Fig. 4.
• A comparative study of different optimization-based and learning-based AI approaches for autonomous UAV navigation is conducted in this paper. Here, the features of the approaches are identified and compared in terms of their complexity, hyper-parameters, and objectives. Extended categorizations of the optimization-based and learning-based AI approaches are included in the comparative analysis.

II. UAV CHARACTERISTICS AND NAVIGATION MODEL
The realization of unmanned aerial systems has been a significant challenge for engineers and scientists since the invention of airplanes. There are many different types of UAVs available today for military and civilian applications. UAVs are often classified based on characteristics related to shape, range, maximum take-off weight, and price, as shown in Table 2. One of the most crucial features of a UAV is its payload. The payload, that is, the maximum weight that a UAV can carry, is a measure of its lifting capability. UAV payloads can range from a few grams to hundreds of kilograms [33], [60]. The larger the payload, the more equipment and accessories can be carried, at the expense of the UAV's size, battery capacity, and flight time. Conventional payloads include cameras, sensors, mobile phones, and base stations for cellular assistance.
In general, UAVs can be categorized into four categories based on their flight mechanisms: fixed-wing UAVs, helicopters, loons, and multi-copters, as shown in Fig. 1. Fixed-wing UAVs can glide through the air, making them more energy-efficient and capable of carrying heavier payloads. In addition, fixed-wing UAVs can benefit from gliding to fly faster. However, they require more space to take off and land, and they cannot hover over a fixed position. Helicopters are a combination of multi-copters and fixed-wing UAVs. They can glide through the air with tail wings and take off and land vertically. In contrast, loons depend entirely on air pressure and have no motors for directed movement [59]. Lastly, multi-copters can take off and land vertically and hover over a certain place. Thus, they are well suited to a wide range of applications because of their exceptional maneuverability. However, multi-copters have limited flight time and use a considerable amount of energy because they always fly against gravity.
FIGURE 2: Application-based categorization of UAV navigation. Labels shown: indoor navigation, outdoor navigation, vision-based, signal-based, others.
As flying is the main characteristic of UAVs, UAV navigation can be categorized into four categories based on application: outdoor navigation, indoor navigation, navigation for SaR, and navigation for wireless networking, as shown in Fig. 2. Here, outdoor navigation includes applications such as surveillance, goods delivery, target tracking, and crowd monitoring, and indoor navigation includes applications such as indoor mapping, factory automation, and indoor surveillance. In addition, UAV navigation can be categorized based on navigation parameters: inertia-based, signal-based, and vision-based navigation. For inertia-based navigation, UAVs mainly use gyroscopes, accelerometers, and altimeters to guide the onboard flight controller [62]. UAVs use GPS modules and a remote radio head (RRH) in the case of cellular connectivity for signal-based navigation and cameras for vision-based navigation.
Initially, the altitude and horizontal controllers receive feedback from these sensors and guide the pitch and yaw controllers depending on the desired path plan. Then, the pitch and yaw controllers guide the elevators and ailerons to maneuver the UAV depending on the feedback of these sensors, as shown in Fig. 3 [61]. In autonomous navigation, UAVs obtain the desired path plan by utilizing various AI techniques, as shown in Fig. 3. Thus, this paper focuses on different AI approaches implemented by different researchers for UAV navigation.

III. OPTIMIZATION-BASED APPROACHES
Optimization-based approaches cover the traditional mathematical-based problem-solving algorithms of AI. These algorithms can achieve near-optimal solutions for non-deterministic polynomial-time hard (NP-hard) problems. However, these algorithms are quite complex in terms of time and space. This section briefly discusses the most widely used optimization-based AI approaches for autonomous UAV navigation, namely PSO, ACO, GA, the cuckoo search (CS) algorithm [63], simulated annealing (SA), differential evolution (DE), pigeon-inspired optimization (PIO), Dijkstra's algorithm, the A* algorithm, grey wolf optimization (GWO) [64], and other miscellaneous algorithms. Moreover, Table 3 shows a comparative analysis of these optimization-based AI approaches, where their main features, time complexities for m operations, and hyper-parameter counts are highlighted.

A. PARTICLE SWARM OPTIMIZATION (PSO)
Eberhart and Kennedy introduced PSO in 1995 [40]. PSO is a population-based search algorithm that simulates the behavior of animal groups, such as flocks of birds and swarms of bees. In PSO, each animal can be represented as a vector particle in a 3D space. PSO determines the movement of a particle depending on its current position and velocity. The velocity of the particle is continually updated based on the best position vector explored by the particle itself (P_best) and by the swarm (G_best), as shown in Fig. 5. PSO reaches the optimal point when it achieves its goal or the minimum possible error.
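The velocity update described above can be sketched as follows. This is a minimal illustration rather than the implementation of any surveyed work: the inertia weight w, the acceleration coefficients c1 and c2, the box bounds, and the cost function dist_to_goal are assumptions introduced for the example.

```python
import random


def pso(cost, dim, bounds, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `cost` inside a box; returns (best_position, best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]              # best position seen by each particle
    p_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=p_cost.__getitem__)
    g_best, g_cost = p_best[g][:], p_cost[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull towards P_best + pull towards G_best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < p_cost[i]:
                p_best[i], p_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_best, g_cost = pos[i][:], c
    return g_best, g_cost


def dist_to_goal(p, goal=(5.0, -3.0, 2.0)):
    """Hypothetical cost: Euclidean distance from a 3D point to a goal."""
    return sum((a - b) ** 2 for a, b in zip(p, goal)) ** 0.5


best, best_cost = pso(dist_to_goal, dim=3, bounds=(-10.0, 10.0))
```

In a navigation setting, the cost would typically also penalize collisions and path length; here it is only the distance to a single goal point.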
In UAV navigation, PSO considers UAVs as particles and controls their movement in a 3D space. In [65], Jalal modified the conventional PSO for offline UAV navigation while avoiding obstacles. The modified PSO (MPSO) functions like the conventional PSO; however, an additional error factor is modeled to ensure convergence. The main function of the error factor is to convert the infeasible paths generated by PSO into feasible paths. To preserve optimality, MPSO relocates and re-initializes particles that fall within an obstacle boundary. The efficacy of the MPSO was verified by simulating single and multiple obstacle scenarios.
Similarly, Phung et al. modified the conventional continuous PSO into a discrete PSO (DPSO) to solve the UAV path planning problem in [66]. The authors modeled the UAV path planning problem as a traveling salesman problem (TSP) while considering discrete 3D space and obstacles. Moreover, deterministic initialization, random mutation, edge exchange, and parallel implementation on a GPU were used to speed up the convergence of the DPSO. In 2018, Huang et al. proposed a competition strategy-based PSO (GBPSO) for selecting the global best path for UAVs in [67]. The proposed competition strategy compares the current global path with other global path candidates to select the optimal path for particles.

B. ANT COLONY OPTIMIZATION (ACO)
Colorni et al. first proposed ACO in 1991 [41] to solve NP-hard optimization problems. As the name suggests, the food-searching technique of ants inspired ACO. In the search for food, ants use a volatile chemical called pheromone to communicate and collaborate. Initially, ants start searching for paths towards the food source and release pheromones along the way. Once an ant reaches the food source, other ants follow the pheromone traces and discover other paths to reach the food source. Thus, the shortest path discovered by the ants will have a higher concentration of pheromones, as shown in Fig. 6. Moreover, the concentration of pheromones on the abandoned paths decays with time.
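The pheromone mechanism described above can be illustrated with a small sketch. It assumes a hypothetical weighted graph of waypoints, and the parameters alpha, beta, rho, and q (pheromone weight, distance weight, evaporation rate, and deposit constant) are common choices in the ACO literature rather than values taken from the surveyed works.

```python
import random


def aco_shortest_path(graph, start, goal, n_ants=10, iters=30,
                      alpha=1.0, beta=2.0, rho=0.5, q=1.0, seed=0):
    """Return (path, length) found by a basic ant colony on a weighted digraph."""
    rng = random.Random(seed)
    tau = {(u, v): 1.0 for u in graph for v in graph[u]}  # pheromone per edge
    best_path, best_len = None, float("inf")
    for _ in range(iters):
        tours = []
        for _ in range(n_ants):
            node, path, length = start, [start], 0.0
            while node != goal:
                options = [v for v in graph[node] if v not in path]
                if not options:
                    break  # dead end: this ant deposits nothing
                # attractiveness = pheromone^alpha * (1 / distance)^beta
                weights = [tau[(node, v)] ** alpha * (1.0 / graph[node][v]) ** beta
                           for v in options]
                nxt = rng.choices(options, weights=weights)[0]
                length += graph[node][nxt]
                path.append(nxt)
                node = nxt
            if node == goal:
                tours.append((path, length))
                if length < best_len:
                    best_path, best_len = path, length
        for edge in tau:                 # evaporation on every edge
            tau[edge] *= (1.0 - rho)
        for path, length in tours:       # shorter tours deposit more pheromone
            for u, v in zip(path, path[1:]):
                tau[(u, v)] += q / length
    return best_path, best_len


# Hypothetical waypoint graph: node -> {neighbour: distance}
GRAPH = {"S": {"A": 2, "B": 5}, "A": {"B": 1, "G": 6}, "B": {"G": 2}, "G": {}}
path, length = aco_shortest_path(GRAPH, "S", "G")
```

Evaporation is what lets abandoned edges lose influence over time, while the deposit term q / length concentrates pheromone on short routes.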
For autonomous UAV navigation, Cekmez et al. proposed a multi-colony ACO-based solution while avoiding obstacles in a 3D space in [68]. According to the authors, multi-colony ACO overcomes the premature convergence problem caused by single-colony ACO. Initially, the authors formulated the UAV navigation problem as a TSP, and then multiple UAV groups searched for optimal routes to the destination. In multi-colony ACO, the UAVs are responsible for not only intra-colony but also inter-colony pheromone value exchange. Similarly, Guan et al. proposed a double-colony ACO in which pheromones are generated by exploiting the GA in [69].
Jin et al. [70] proposed a combination of an artificial potential field (APF) and ACO named potential field ACO (PFACO) to overcome the premature convergence problem. APF is an obstacle avoidance algorithm that ensures the optimal speed and safety of the UAV in an environment with attractive (gravitational) and repulsive forces. Furthermore, the APF manipulates the transition probability of a UAV from one node to another in ACO to improve global searching. Moreover, the authors used the max-min ant system (MMAS) to find the best and worst paths and to weaken the worst path while updating the global pheromone value for faster convergence.

C. GENETIC ALGORITHM (GA)
The GA is a stochastic optimization algorithm that starts with a population of randomly produced chromosomes known as the initial population. Each chromosome is a series of genes encoded as numerical values. In UAV navigation, each chromosome (individual) represents a UAV trajectory that is restricted by the UAV dynamics. Genetic operations, such as crossover, mutation, selection, insertion, and deletion, alter the population in each generation, and the modified chromosomes are selected according to a fitness function. This procedure aims to reduce the fitness function as much as possible by identifying the chromosome with the near-minimal fitness value. Thus, the chromosomes achieve a near-optimal solution. The GA method is thoroughly explained in [71].
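The generational loop described above can be sketched as follows. This is an illustrative toy, not the encoding used in [71]: the chromosome here is assumed to be a list of lateral offsets of intermediate 2D waypoints, the circular obstacle is hypothetical, and only selection, crossover, and mutation are shown.

```python
import math
import random

START, GOAL = (0.0, 0.0), (10.0, 0.0)
OBSTACLE = (5.0, 0.0, 1.5)   # hypothetical circular obstacle: x, y, radius
N_GENES = 9                  # number of intermediate waypoints


def decode(chrom):
    """Turn a chromosome (y-offsets) into a list of 2D waypoints."""
    xs = [START[0] + (GOAL[0] - START[0]) * (i + 1) / (N_GENES + 1)
          for i in range(N_GENES)]
    return [START] + list(zip(xs, chrom)) + [GOAL]


def fitness(chrom):
    """Path length plus a penalty for each waypoint inside the obstacle.

    Only waypoints are checked, not the segments between them.
    """
    pts = decode(chrom)
    length = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    ox, oy, r = OBSTACLE
    penalty = sum(100.0 for p in pts if math.dist(p, (ox, oy)) < r)
    return length + penalty


def ga(pop_size=40, generations=150, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-4.0, 4.0) for _ in range(N_GENES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:2]                                   # elitism: keep the two best
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=fitness)   # tournament selection
            p2 = min(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, N_GENES)             # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [g + rng.gauss(0.0, 0.5) if rng.random() < p_mut else g
                     for g in child]                    # Gaussian mutation
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)


best = ga()
```

Because the two best individuals are copied unchanged into the next generation, the best fitness never gets worse from one generation to the next.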
In [71], Bagherian implemented GA to solve the NP-hard problem of UAV navigation. First, the author encoded the 3D position of the UAV into chromosomes that consist of the acceleration, climbing angle rate, and heading angle rate of a UAV at discrete time steps, as shown in Fig. 7. At the present time step, this chromosome is decoded to obtain the 3D coordinates of the UAV at the next time step. Then, the 3D coordinates are evaluated using a fitness function that considers the costs of the distance between two points, total path length, height, and obstacles. Afterward, the genetic operations are performed, where selection refers to selecting paths, crossover refers to exchanging path information, mutation deals with information loss, and insertion and deletion handle path information management.
Tao et al. improved the GA by designing a temporary path based on the encoding vector, in which each individual includes not only the location of the guide points but also their status variables [72]. Thus, it keeps track of whether each guide point is feasible, that is, whether it meets the constraint conditions, and whether the path between the connecting point and the next guide point has the lowest performance cost. The temporary path is practicable if all the guide points are feasible. The encoding method is based on the change in the UAV yaw angle sequence.
Yang et al. proposed a hierarchical recursive multi-agent GA (HR-MAGA) in [73]. During the evolution process of HR-MAGA, agents can detect the environment, communicate with their neighbors, and decrease their loss by employing the corresponding operators, which allows a good solution to be discovered quickly. Moreover, HR-MAGA can optimize the local path to obtain a more refined path using the hierarchical recursive process.
Meanwhile, Gao et al. proposed the opposite and chaos searching GA (OCGA) to speed up the convergence in [74]. An opposite and chaotic search is used to produce a high-quality initial population. Chaos searching can span a specific range of solutions. On the basis of chaotic searching, opposite searching can provide more suitable reverse sequences. The convergence speed is also an essential parameter in optimization. Thus, to accelerate convergence, the authors proposed a unique crossover technique based on the learning mechanism of teaching-learning-based optimization (TLBO).

D. SIMULATED ANNEALING (SA)
SA is a continuous-time approximation approach that tends to converge to the global minimum [48]. Annealing is a controlled heating and cooling process for metals that minimizes defects at the atomic level. When the metal is heated, the atoms vibrate and reconfigure themselves. Afterward, the metal is cooled slowly to ensure that the configuration has minimum energy. Otherwise, the atoms can become stuck in a configuration with a local minimum of internal energy. The SA algorithm emulates the same process to obtain a global minimum for NP-hard problems. The fundamental strategy for implementing the SA is to select random points in the surroundings of the present best point and quantify the cost functions [75]. Then, the UAVs move from one point to another, comparing the present and next point values. A parameter named temperature, which enters the Boltzmann-Gibbs probability density function, determines the acceptability of a point. Initially, the temperature is initialized with a very high value, and then it decreases with each iteration. As the temperature decreases, the probability of accepting a worse point gradually reduces until it reaches zero, as shown in Fig. 8. Thus, UAVs achieve their goals. However, SA optimization is a time-consuming process.
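The acceptance rule and cooling schedule described above can be sketched as follows. The geometric cooling factor, the initial temperature, the neighbourhood step size, and the quadratic cost function are assumptions made for illustration; a worse candidate is accepted with the Boltzmann-type probability exp(-delta / T).

```python
import math
import random


def simulated_annealing(cost, x0, step=1.0, t0=10.0, cooling=0.95,
                        iters=500, seed=0):
    """Minimise `cost` starting from x0; returns (best_point, best_cost)."""
    rng = random.Random(seed)
    x, fx = list(x0), cost(x0)
    best, f_best = x[:], fx
    t = t0
    for _ in range(iters):
        # random point in the surroundings of the present point
        cand = [xi + rng.uniform(-step, step) for xi in x]
        fc = cost(cand)
        delta = fc - fx
        # always accept improvements; accept worse points with prob. exp(-delta/T)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fc
            if fx < f_best:
                best, f_best = x[:], fx
        t *= cooling   # geometric cooling: temperature shrinks every iteration
    return best, f_best


def waypoint_cost(p):
    """Hypothetical cost: squared distance to a target at (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best, f_best = simulated_annealing(waypoint_cost, x0=(-8.0, 8.0))
```

At high temperature the search behaves almost like a random walk, and as the temperature approaches zero it degenerates into a greedy local search, which is why slow cooling improves the chance of escaping local minima at the price of longer run time.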
In [76], Behnck et al. proposed a modified SA algorithm that interprets the multi-UAV navigation problem as a multiple TSP (mTSP). The authors stochastically chose the points of interest so that the UAVs travelled smaller distances. Moreover, the energy consumption was considered within the temperature value. Later, Liu and Zhang [77] incorporated SA with ACO to solve the navigation problem, where the temperature value of SA depends on the pheromone value of ACO. In their approach, the optimal path must satisfy both ACO and SA conditions, ensuring an obstacle-free shortest path for the UAVs. Recently, Xiao et al. proposed a UAV path planning algorithm utilizing SA and a grid map in [78]. Their primary goal was to develop a 3D reconstruction of an area using multiple UAVs circulating energy-efficiently on an optimized path. However, the authors considered UAVs flying at a fixed altitude and did not consider obstacles while optimizing the flying paths.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

E. PIGEON-INSPIRED OPTIMIZATION (PIO)
Pigeons are among the most common birds on the planet, and Egyptians previously employed them to deliver messages and in numerous military operations. Pigeons use three homing aids to find their way home: the magnetic field, the sun, and landmarks [79]. Similarly, the fundamental PIO algorithm is based on the phenomenon of pigeon self-navigation and is primarily determined by two operators: the map and compass operator and the landmark operator. Pigeons in the wild go through several stages of nerve feedback when homing and use magnetic fields and landmarks to locate their flight path. The magnetic field factor, which occurs in the initial stages, is represented by the map and compass operator [80]. The map and compass operator helps the virtual pigeons locate themselves and calculate their velocity. The landmark operator helps to identify the global center coordinates for autonomous navigation, as shown in Fig. 9. Although the standard PIO has been shown to be superior in several areas, it still has significant flaws, such as a lack of diversity and premature convergence.
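The two operators can be sketched as follows. The update rules are assumptions based on the commonly cited form of the basic PIO (exponential velocity decay with a map and compass factor R, followed by a landmark phase that halves the flock and moves the remaining pigeons towards a fitness-weighted centre); they are not spelled out in the text above, and the cost function is assumed to be non-negative.

```python
import math
import random


def pio(cost, dim, bounds, n_pigeons=30, iters_map=60, iters_land=20,
        R=0.2, seed=0):
    """Basic PIO sketch for a non-negative cost; returns (best, best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_pigeons)]
    V = [[0.0] * dim for _ in range(n_pigeons)]
    best = min(X, key=cost)[:]

    # Phase 1: map and compass operator (velocity decays as exp(-R * t))
    for t in range(1, iters_map + 1):
        decay = math.exp(-R * t)
        for i in range(n_pigeons):
            for d in range(dim):
                V[i][d] = V[i][d] * decay + rng.random() * (best[d] - X[i][d])
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))
        cand = min(X, key=cost)
        if cost(cand) < cost(best):
            best = cand[:]

    # Phase 2: landmark operator (discard the worse half, fly to the centre)
    for _ in range(iters_land):
        X.sort(key=cost)
        X = X[:max(1, len(X) // 2)]
        weights = [1.0 / (cost(x) + 1e-12) for x in X]   # fitness of each pigeon
        total = sum(weights)
        centre = [sum(w * x[d] for w, x in zip(weights, X)) / total
                  for d in range(dim)]
        for x in X:
            for d in range(dim):
                x[d] = min(hi, max(lo, x[d] + rng.random() * (centre[d] - x[d])))
        cand = min(X, key=cost)
        if cost(cand) < cost(best):
            best = cand[:]
    return best, cost(best)


def goal_distance(p, goal=(5.0, -3.0, 2.0)):
    """Hypothetical cost: Euclidean distance to a 3D goal point."""
    return math.dist(p, goal)


best, best_cost = pio(goal_distance, dim=3, bounds=(-10.0, 10.0))
```

The rapid collapse of the flock towards a single centre in the landmark phase illustrates why the basic algorithm can lose diversity.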
In [81], Zhang et al. proposed a social-class PIO (SC-PIO) to overcome the shortcomings of traditional PIO for autonomous multi-UAV navigation. The authors divided the pigeons into different layers of classes, where lower-class pigeons followed the top-class pigeon. Pigeons can also move from one class to another, depending on the obtained optimal path. Hu et al. [82] proposed an adaptive operator quantum-behaved PIO (AOQPIO) to overcome the problems of PIO for autonomous UAV navigation. Moreover, the authors introduced chaotic strategies to generate initial solutions in advance to obtain a wider solution space coverage. Chaos is an apparently random motion that appears in a critical dynamic system. The features of chaotic motion are (1) high sensitivity to starting values, (2) ergodicity of motion trajectories, and (3) randomization.

F. CUCKOO SEARCH (CS) ALGORITHM
The CS algorithm replicates the natural egg-laying strategy of parasitic cuckoo birds. The cuckoo searches for a nest by random walk, utilizing Lévy flight, and lays eggs. A Lévy flight consists of frequent short flights and infrequent long flights drawn from the Lévy distribution [83]. The CS algorithm mainly follows three rules: each cuckoo randomly chooses a nest and lays one egg at a time, the nest with the best quality egg is passed to the next generation, and the total number of host nests is fixed, with an egg uncertainty probability in [0, 1] [63]. Egg uncertainty refers to the discovery of the cuckoo eggs by a host bird. In this case, the host can throw the eggs away or leave the nest for good. In the case of UAV navigation, UAVs are the cuckoos, and the coordinates are the nests. UAVs randomly choose a nest or coordinate to reach the target location. The target location can remain the same or change depending on the UAV's mission. If the coordinate is blocked by an obstacle, the UAV chooses another nest or coordinate to reach the target. Otherwise, the coordinate is considered the best solution and is carried to the next generation, where it is used to find the next coordinate.
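The three rules above can be sketched as follows. The Lévy step is drawn with Mantegna's algorithm, which is one common way to generate such steps; the exponent beta, the step scale, the discovery probability pa, and the cost function are assumptions for illustration, and the simplified random-walk form shown here is not taken from the surveyed works.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """One heavy-tailed step (many short moves, rare long ones), Mantegna's method."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def cuckoo_search(cost, dim, bounds, n_nests=15, iters=300, pa=0.25,
                  scale=0.5, seed=0):
    """Simplified cuckoo search; returns (best_nest, best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    costs = [cost(n) for n in nests]
    n_abandon = int(pa * n_nests)      # nests whose eggs are discovered
    for _ in range(iters):
        # Rule 1: a cuckoo lays one egg, reached by a Levy flight from a nest
        i = rng.randrange(n_nests)
        egg = [min(hi, max(lo, x + scale * levy_step(rng))) for x in nests[i]]
        egg_cost = cost(egg)
        j = rng.randrange(n_nests)     # randomly chosen host nest
        if egg_cost < costs[j]:
            nests[j], costs[j] = egg, egg_cost
        # Rules 2 and 3: the best nests survive, the worst ones are abandoned
        order = sorted(range(n_nests), key=costs.__getitem__)
        for k in order[n_nests - n_abandon:]:
            nests[k] = [rng.uniform(lo, hi) for _ in range(dim)]
            costs[k] = cost(nests[k])
    b = min(range(n_nests), key=costs.__getitem__)
    return nests[b], costs[b]


def goal_distance(p, goal=(5.0, -3.0, 2.0)):
    """Hypothetical cost: Euclidean distance to a 3D goal point."""
    return math.dist(p, goal)


best, best_cost = cuckoo_search(goal_distance, dim=3, bounds=(-10.0, 10.0))
```

An obstacle check could be added by assigning a large cost to blocked coordinates, so that such nests are the ones abandoned first.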
Xie and Zheng proposed an improved CS algorithm combining genetic operators for UAV path planning in [84]. The authors incorporated the crossover and mutation operators of the GA into the CS algorithm to speed up the convergence of the algorithm and avoid local optima. In contrast, Hu et al. implemented a conventional CS algorithm for UAV trajectory planning in an urban area, where each egg represents a trajectory and each trajectory includes multiple coordinates, in [85]. To reduce the computational load, the authors used Chebyshev collocation points to represent the coordinates. Moreover, they showed that, with optimal parameters, the CS can outperform the PSO algorithm.

G. DIJKSTRA'S AND A* ALGORITHMS
Dijkstra's algorithm is a weighted graph method that calculates the shortest distance between two nodes. Edsger Wybe Dijkstra, a Dutch mathematician and computer scientist, created this algorithm. The algorithm is employed in a variety of applications, including navigation [86]. In Dijkstra's algorithm, a starting point is chosen, and all the other nodes are initially regarded as infinitely distant. As the nodes are approached, the distances are updated. Dijkstra's algorithm examines the neighbors of a node at each step, and if a shorter path is discovered, the distances are updated. Similarly, the A* algorithm is a hybrid of Dijkstra's algorithm and the greedy best-first search because it can not only discover the shortest path but also employ a heuristic to steer itself [87]. The A* algorithm combines the information used by Dijkstra's algorithm (favoring vertices near the starting point) with the information used by the greedy best-first search (favoring vertices close to the target), as shown in Fig. 10. Many authors in [88]-[90] have proposed different modified versions of the Dijkstra's and A* algorithms that consider target tracking and real-time environment updates for obstacle avoidance for autonomous UAV navigation. However, these two algorithms are computationally complex compared to other optimization-based approaches.
VOLUME 4, 2016
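The relationship between the two algorithms can be illustrated with a single search routine on a hypothetical 2D occupancy grid: with the heuristic switched off the routine behaves as Dijkstra's algorithm, and with a Manhattan-distance heuristic it behaves as A*. The grid, the unit step cost, and the four-connected neighbourhood are assumptions for the example.

```python
import heapq


def grid_search(grid, start, goal, use_heuristic=True):
    """Shortest 4-connected path on a grid (0 = free, 1 = obstacle).

    Returns (path, n_expanded). use_heuristic=False gives Dijkstra's
    algorithm; use_heuristic=True gives A* with the Manhattan distance.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        if not use_heuristic:
            return 0
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    dist = {start: 0}                 # all other cells are implicitly infinite
    parent = {start: None}
    frontier = [(h(start), start)]    # priority = distance so far + heuristic
    closed = set()
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell in closed:
            continue                  # stale queue entry
        closed.add(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], len(closed)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = dist[cell] + 1
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd          # shorter path discovered
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (nd + h((nr, nc)), (nr, nc)))
    return None, len(closed)


GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
path_d, expanded_d = grid_search(GRID, (0, 0), (4, 4), use_heuristic=False)
path_a, expanded_a = grid_search(GRID, (0, 0), (4, 4), use_heuristic=True)
```

Both variants return a shortest path, but the heuristic lets A* skip cells that lie away from the target, so it never expands more cells than Dijkstra's algorithm on this kind of grid.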

H. DIFFERENTIAL EVOLUTION (DE)
Differential evolution (DE) is a population-based optimization method that was first proposed in 1997 [51]. It combines the parent or initial points with a few additional points from the overall population of paths to create new solutions. Each solution has a group of variables that are subjected to mutation, selection, and crossover search operators to generate new solutions, as shown in Fig. 11. DE only considers solutions that are better than their parents and passes them to the next generation. DE is straightforward to implement in real-life applications, such as UAV navigation, owing to its minimal control parameters.
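The mutation, crossover, and selection steps described above can be sketched as follows. The sketch uses the widely known rand/1/bin variant with assumed values for the scale factor F and the crossover rate CR; the bounds and the cost function are hypothetical.

```python
import random


def differential_evolution(cost, dim, bounds, pop_size=20, gens=100,
                           F=0.8, CR=0.9, seed=0):
    """DE (rand/1/bin) sketch; returns (best_vector, best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    costs = [cost(p) for p in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # mutation: combine three distinct members other than the parent
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)   # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:      # binomial crossover
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial.append(min(hi, max(lo, v)))
                else:
                    trial.append(pop[i][d])
            # selection: the trial replaces its parent only if it is no worse
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def goal_distance(p, goal=(5.0, -3.0, 2.0)):
    """Hypothetical cost: Euclidean distance to a 3D goal point."""
    return sum((x - g) ** 2 for x, g in zip(p, goal)) ** 0.5


best, best_cost = differential_evolution(goal_distance, dim=3,
                                         bounds=(-10.0, 10.0))
```

Apart from the population size, only F and CR need tuning in this variant, which reflects the small number of control parameters noted above.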
Ghambari et al. proposed a hybrid evolutionary algorithm combining the A* and DE algorithms to optimize the NP-hard problem of UAV navigation in [91]. Here, the DE algorithm is responsible for exploring and exploiting the entire flying space and generating multiple connected regions between the start and destination points while ensuring the shortest distance from the straight-line path in an admissible space. In contrast, the A* algorithm searches for the shortest paths in every region generated by the DE algorithm. Although this hybrid method decreases the overall computational time, higher computational power is required to compute the two algorithms simultaneously. In [92], Yu et al. proposed a constrained DE (CDE) to solve the UAV path planning problem in disaster scenarios, where mutation is performed selectively for better convergence. They modeled the UAV path planning with different nonlinear constraints and ranked all the probable traveling points depending on the fitness values and constraint violations. The CDE only selects points that have high fitness values and minimum constraint violations. Subsequently, the authors proposed the knee-guided DE algorithm (DEAKP) for autonomous UAV navigation in [93], where the knee point depends on the minimum Manhattan distance (MMD). The DEAKP algorithm reduces the overall computational complexity by focusing on knee solutions instead of Pareto front solutions in constrained multi-objective optimization scenarios. It identifies the knee solution based on the MMD and generates offspring for the next generation by combining it with non-dominated points.

I. GREY WOLF OPTIMIZATION (GWO)
Grey wolf optimization was first proposed in [64] in 2014 based on the prey hunting strategy of grey wolves. Grey wolves have a social hierarchy in which they are divided into alpha, beta, delta, and omega groups. Alpha wolves are the leaders, and the other groups follow them and help them make decisions. Initially, the alpha, beta, and delta groups search for the static target stochastically, while the omega wolves wait for the order to join. The alpha group's decision about the target has a higher priority than those of the beta and delta groups. To identify the exact position of the target, the alpha, beta, and delta wolves estimate the distance between their current position and the target position, as shown in Fig 12. After locating the target, the wolves signal the other wolves to join them in hunting the target. GWO has only two parameters: A, which is responsible for exploration and exploitation, and C, which helps the wolves avoid obstacles during the search.
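The roles of A and C can be made concrete with a minimal sketch of the standard GWO position update, in which every wolf moves toward the average of three estimates derived from the alpha, beta, and delta wolves. The linearly decreasing coefficient `a`, the cost function `distance_to_target`, the target location, and all default values are assumptions introduced here for illustration rather than details of the surveyed works.

```python
import random

TARGET = (3.0, -2.0)  # hypothetical static target (illustrative assumption)


def distance_to_target(position):
    """Squared Euclidean distance to the hypothetical target."""
    return sum((p - t) ** 2 for p, t in zip(position, TARGET))


def grey_wolf_optimize(cost, bounds, n_wolves=15, iterations=100, seed=0):
    """Minimize cost over box bounds with the standard GWO update."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best, best_cost = None, float("inf")
    for it in range(iterations + 1):
        # Rank the pack: the three best wolves act as alpha, beta, and delta.
        ranked = sorted(wolves, key=cost)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        if cost(alpha) < best_cost:
            best, best_cost = list(alpha), cost(alpha)
        if it == iterations:
            break
        # a shrinks from 2 to 0, shifting the pack from exploration to exploitation.
        a = 2.0 * (1.0 - it / iterations)
        new_wolves = []
        for wolf in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                estimates = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a   # |A| > 1 diverges, |A| < 1 converges
                    C = 2.0 * rng.random()           # random weight on the leader position
                    D = abs(C * leader[d] - wolf[d])  # estimated distance to the leader
                    estimates.append(leader[d] - A * D)
                position.append(min(max(sum(estimates) / 3.0, lo), hi))
            new_wolves.append(position)
        wolves = new_wolves
    return best, best_cost
```

Calling `grey_wolf_optimize(distance_to_target, [(-10.0, 10.0), (-10.0, 10.0)])` returns the best position found and its cost. Because A and C are drawn randomly at every step, the only quantities left to choose are the pack size and the iteration budget.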
In [94], Zhang et al. utilized conventional GWO to solve the 2D path planning problem of unmanned combat aerial vehicles (UCAVs) while ensuring minimum fuel usage and zero threat. The authors compared GWO with other optimization algorithms such as ACO, PSO, and CS in three different scenarios. Following this, Dewangan et al. implemented conventional GWO to solve the 3D path planning problem of UAVs while avoiding obstacles in [95]. The authors compared 3D GWO with other optimization methods on three different maps. Qu et al. proposed a hybrid algorithm named HSGWO-MSOS, which combines GWO with a modified symbiotic organisms search (MSOS), for UAV path planning in [96]. The authors combined GWO and MSOS to achieve fast convergence and efficient global and local exploitation of the environment. Finally, they analyzed the complexity of HSGWO-MSOS and showed that it outperformed the SA algorithm.

J. MISCELLANEOUS ALGORITHMS
In addition to the aforementioned optimization-based approaches, several other AI techniques are available and are discussed in this section. However, few significant studies have utilized these methods in recent years.

1) Improved intelligent water drop algorithm
In the improved intelligent water drop algorithm, the water drop visits only neighboring cells instead of following soil probability-based movement. The algorithm utilizes both the soil and the distance from the destination to guide the water drop. Moreover, the global soil update rate increases as the water drop path evolves, which affects both local and global path searching [97].

2) APF-based RRT-connect
An APF-based target attraction function is integrated into the basic rapidly exploring random tree (RRT)-connect algorithm to help the random tree grow in the direction of the goal. This function reduces the overall search space and the complexity of the algorithm, which ensures near-optimal convergence of the path planning problem [98].

3) Fuzzy logic
A fuzzy logic-based solution is proposed to control the leader-follower formation of a swarm of homogeneous UAVs and enable the swarm to avoid collisions and maintain the formation depending on the leader's moves [99].

4) Firefly fuzzy logic
A hybrid firefly fuzzy controller is proposed where the firefly algorithm estimates the intermediate turning angle considering the Euclidean distance from the obstacles and goal. Finally, the fuzzy logic confirms the final turning angle and speed, validating the measured distances using the firefly algorithm [100].

5) Glow-worm swarm optimization
In glow-worm optimization, worms move toward another worm with a higher luciferin content. UAVs first identify the obstacles around them and search for higher luciferin content in the neighboring points in the case of navigation. The point nearest to the goal has the highest luciferin content. The luciferin content is distributed randomly initially, and then it decays and gets updated with time [101].
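The decay-and-update behaviour of the luciferin content and the move toward a brighter neighbor can be sketched as follows. This is a minimal illustration based on the standard glow-worm swarm optimization rules. The function names, the decay and gain values, and the sensing radius are assumptions introduced for illustration.

```python
import math
import random


def update_luciferin(luciferin, fitness, decay=0.4, gain=0.6):
    """Decay each luciferin level and add a share of the current fitness.

    The decay and gain values are illustrative assumptions.
    """
    return [(1.0 - decay) * level + gain * fit
            for level, fit in zip(luciferin, fitness)]


def pick_brighter_neighbor(i, positions, luciferin, radius, rng):
    """Choose a neighbor within the sensing radius that has more luciferin than worm i.

    Brighter neighbors are more likely to be chosen; None means no such neighbor exists.
    """
    candidates = [j for j in range(len(positions))
                  if j != i
                  and math.dist(positions[i], positions[j]) <= radius
                  and luciferin[j] > luciferin[i]]
    if not candidates:
        return None
    weights = [luciferin[j] - luciferin[i] for j in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

In a navigation setting, `fitness` would be larger for points closer to the goal, so that the luciferin levels gradually point the UAV toward the destination.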

6) Modified central force optimization
The central force optimization (CFO) method depends on the law of gravity among particles. In UAV navigation, each point acts as a particle, and higher-mass particles attract the UAV. However, CFO tends to become stuck in local minima, and its memoryless search performs poorly. The search strategy of the PSO and the mutation capability of the GA are introduced in the modified CFO to mitigate these shortcomings [102].

7) Unsupervised SA
The UAV flying area is divided into multiple small areas for multiple UAVs. Then, the target points of the entire flying area are clustered using the k-means algorithm. Finally, each UAV autonomously flies towards the targets using the SA algorithm in each flying area [103].

8) Improved T-distribution evolution algorithm
An evolutionary algorithm based on an improved T-distribution is proposed for autonomous UAV navigation with little or no prior knowledge of the flying area. A directional perturbation operator obtained by the Sigmoid function is introduced to the improved T-distribution evolution algorithm to reduce the computational complexity, increase the convergence rate and make the algorithm more robust [104].

9) Bio-inspired predator-prey
The main concept of the predator-prey algorithm is that in a search space, there are many prey representing solutions, and the predators (e.g., UAVs) search for prey with the highest fitness value. Whenever the predator consumes a solution, new solutions are generated around the predator. Moreover, mutation and crossover are the main parameters that help the predators reach the optimal solution [105].

10) Predator-prey PIO
The predator-prey characteristics are incorporated into the traditional PIO to find the optimal path and speed up the convergence of the algorithm. The goal of the predator-prey mechanism is to eliminate the solutions with the lowest fitness values in the neighborhood, which increases the diversity of the population. Thus, UAVs tend to find optimal solutions faster [106].

IV. LEARNING-BASED APPROACHES
Learning-based approaches cover the traditional model-based AI algorithms. These algorithms can achieve near-optimal solutions for any given NP-hard problem with very low complexity. This section briefly discusses the most widely used learning-based AI approaches for UAV navigation: reinforcement learning (RL), deep learning (DL), asynchronous advantage actor-critic (A3C), and deep reinforcement learning (DRL) utilizing the Markov decision process (MDP), partially observable MDP (POMDP), or convolutional neural network (CNN). Moreover, Table 4 summarizes their main features, goals, time complexities with a number of operations m, number of layers, and number of hyperparameters and presents a comparative analysis of these learning-based AI approaches.

A. REINFORCEMENT LEARNING (RL)
Reinforcement learning is an effective and widely used AI technique that learns about the environment by performing various actions and determining the best operating strategy. An agent and an environment are the two fundamental components of RL. Using the MDP, the agent interacts with the environment and determines which action to take [107]. At each time step $t$, the agent observes its current state $s_t$ in the environment and takes action $a_t$, as depicted in Fig. 13. The environment then rewards the agent with reward $r_t$. Thereafter, the agent moves to a new state $s_{t+1}$. The agent's principal aim is to establish a policy $\pi$ that collects as much reward as possible from the environment. In the long run, the agent seeks to maximize the expected discounted total reward, defined by $\max\left[\sum_{t=0}^{T} \delta\, r_t(s_t, \pi(s_t))\right]$, where the discount factor $\delta \in [0, 1]$. When the state transition probabilities are known in advance, a Bellman equation called the Q-function (1) is built using the discounted reward. Initially, the agent explores each state of the environment by performing various actions and fills a Q-table with an entry for each state-action pair using the Q-function. The agent then begins to exploit the environment by performing the actions that have the highest Q-value in the Q-table.
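For reference, one commonly used form of the Bellman optimality equation for the Q-function is sketched below in the notation of this section. The transition probability $P(s_{t+1} \mid s_t, a_t)$ and the next-action symbol $a'$ are assumptions introduced here for illustration, and the exact form used in the cited works may differ.

```latex
Q(s_t, a_t) = r_t(s_t, a_t) + \delta \sum_{s_{t+1}} P(s_{t+1} \mid s_t, a_t) \, \max_{a'} Q(s_{t+1}, a')
```

The first term is the immediate reward, and the second term is the discounted value of acting greedily from the next state onward, averaged over the possible next states.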
Because of its self-learning capabilities and energy efficiency, RL is an excellent option for autonomous UAV navigation systems. The autonomous UAV navigation systems that have been used in the past are inefficient and sluggish. If RL is employed, each UAV acts as an agent and tries to fly towards the target. The target can be fixed or dynamic depending on the system model. The closer the UAV gets to the target, the more rewards it receives from the environment.
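The explore-then-exploit use of a Q-table can be illustrated with a minimal tabular Q-learning sketch in which a single UAV learns to reach a fixed target on a small grid. The grid size, the reward shaping (negative Manhattan distance, plus a bonus on arrival), the epsilon-greedy exploration, and all parameter values are assumptions chosen for illustration and do not reproduce any surveyed method.

```python
import random

ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # moves in +x, -x, +y, -y


def step(state, action, size, target):
    """Move on the grid (clamped at the borders) and return (next_state, reward, done)."""
    x = min(max(state[0] + action[0], 0), size - 1)
    y = min(max(state[1] + action[1], 0), size - 1)
    nxt = (x, y)
    if nxt == target:
        return nxt, 10.0, True
    # Assumed reward shaping: the closer to the target, the smaller the penalty.
    distance = abs(target[0] - x) + abs(target[1] - y)
    return nxt, -float(distance), False


def train_q_table(size=5, start=(0, 0), target=(4, 4), episodes=2000,
                  alpha=0.5, discount=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    q = {((x, y), a): 0.0
         for x in range(size) for y in range(size) for a in range(n_actions)}
    for _episode in range(episodes):
        state = start
        for _step in range(4 * size * size):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)                            # explore
            else:
                a = max(range(n_actions), key=lambda k: q[(state, k)])  # exploit
            nxt, reward, done = step(state, ACTIONS[a], size, target)
            best_next = 0.0 if done else max(q[(nxt, k)] for k in range(n_actions))
            # Temporal-difference update toward reward + discounted best next value.
            q[(state, a)] += alpha * (reward + discount * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, size=5, start=(0, 0), target=(4, 4), max_steps=16):
    """Follow the highest Q-value action from start and return the visited states."""
    path, state = [start], start
    for _step in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda k: q[(state, k)])
        state, _reward, done = step(state, ACTIONS[a], size, target)
        path.append(state)
        if done:
            break
    return path
```

After training, `greedy_path(train_q_table())` lists the grid cells the UAV visits when it only exploits the learned Q-table.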
Pham et al. proposed a Q-learning algorithm in [109] for autonomous UAV navigation, where Q-learning controls the proportional-integral-derivative (PID) controller parameters to navigate the UAV in a 2D indoor space. Later in [110], the authors upgraded Q-learning with function approximation based on fixed sparse representation (FSR) for better convergence. In contrast, Chowdhury et al. [111] proposed a received signal strength (RSS)-based Q-learning algorithm utilizing an ε-greedy policy for indoor SaR. To make the UAVs more efficient, Liu et al. [112] proposed a double Q-learning solution for navigating a UAV base station autonomously to serve ground users in 2D space. In [113], Colonnese et al. implemented a Q-learning algorithm for UAV path planning to improve the quality of experience (QoE) of ground users. The authors considered autonomous visits to charging stations to maintain uninterrupted communication. Liu et al. proposed a multi-agent Q-learning algorithm for UAV deployment and navigation to satisfy user QoE in a 3D space [114]. The authors separated the flying zone of each UAV by utilizing the GA and k-means algorithms. Furthermore, in [115], the authors incorporated the prediction of user movement for advance path planning utilizing an echo state network. Later, Hu et al. proposed a multi-agent Q-learning-based real-time decentralized trajectory planning algorithm in which multiple UAVs perform sense-and-send tasks assigned by the nearest BS in [116]. For better convergence and low complexity, the authors reduced the state-action space and incorporated a model-based reward system. Zeng and Xu proposed a Q-learning algorithm utilizing the temporal difference (TD) method to achieve an optimal solution for cellular-connected autonomous UAV navigation in [117]. In addition, the authors discretized the state-action space and introduced a linear function approximation with tile coding to handle the large state-action space.

B. DEEP REINFORCEMENT LEARNING (DRL)
Deep reinforcement learning (DRL) uses Q-values in the same way as Q-learning, except for the Q-table component, as illustrated in Fig. 14.

1) Markov Decision Process (MDP)
Oubbati et al. proposed an MDP-based DRL algorithm for the urban vehicular network to minimize the average energy consumption and maximize the vehicle coverage with multiple UAVs [120]. They considered centralized training of the UAVs to cover as many vehicles as possible with the minimum number of UAVs while avoiding obstacles and collisions. Oubbati et al. also proposed an MDP-based multi-agent DQN (MADQN) algorithm where two UAVs fly over several Internet of Things (IoT) devices to minimize the age of information (AoI) and enable wireless powered communication networks (WPCN) [121]. They utilized centralized training of the UAVs to maximize energy efficiency and avoid collisions.
Moreover, Wang et al. [21] proposed an obstacle-avoiding autonomous UAV navigation system that utilizes MDP-based DRL with extremely sparse rewards and non-expert helpers (LwH). LwH generates a policy before DRL training, which helps the agent reach optimality by setting dynamic learning goals. Theile et al. [122] proposed a CNN-based double DQN (DDQN) for UAV coverage path planning considering energy consumption and map-based movement. In contrast, He et al. proposed a vision-based DRL algorithm in which the navigation problem is modeled as an MDP, a CNN processes the visual input, and a twin delayed deep deterministic policy gradient (TD3) from demonstration is used for learning [125]. Chen et al. proposed an object detection-assisted MDP-based DRL for collision-free autonomous UAV navigation in [123]. The authors considered the positions of the objects, the UAV orientation angle and velocity, and the 2D coordinates in the state space for faster convergence.

2) Partially Observable Markov Decision Process (POMDP)
POMDP is an extension of the MDP in which the agent observes the environment and takes actions without knowing the actual state. POMDP considers all possible uncertainties of the environment to estimate the actions. It comprises an observation space, a state space, and an action space. It is a time-consuming process, but it can provide more precise solutions than MDP. Thus, Walker et al. proposed a POMDP-based DRL framework for autonomous UAV indoor navigation in [126]. Here, the agent utilizes MDP for global path planning and POMDP for local path planning while avoiding obstacles. In addition, the authors used trust region policy optimization (TRPO) to control the policy update during learning. In [127], Pearson et al. used POMDP to develop a vision-based DRL algorithm for autonomous UAV navigation. The authors proposed an extended double deep Q-learning (EDDQN) method, including a modified Q-function that uses the surrounding image to navigate the UAV to explore the environment. Similarly, Theile et al. proposed a POMDP-based DDQN for autonomous UAV navigation. The main goal of the UAV is to move around and harvest data from a certain area by utilizing the map of the area.
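Because the true state is hidden in a POMDP, the agent typically maintains a belief, that is, a probability distribution over states that is revised after every action and observation. The sketch below shows the standard Bayesian belief update. The two-state obstacle example, the sensor probabilities, and all names are hypothetical and serve only as an illustration.

```python
def update_belief(belief, action, observation, transition, observation_model):
    """Return the posterior belief after taking an action and receiving an observation.

    belief:            {state: probability}
    transition:        {(state, action): {next_state: probability}}
    observation_model: {next_state: {observation: probability}}
    """
    states = list(belief)
    # Prediction: push the current belief through the transition model.
    predicted = {s2: sum(transition[(s, action)][s2] * belief[s] for s in states)
                 for s2 in states}
    # Correction: weight each predicted state by the observation likelihood.
    unnormalized = {s2: observation_model[s2][observation] * predicted[s2]
                    for s2 in states}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical example: is the corridor ahead of a hovering UAV clear or blocked?
TRANSITION = {
    ("clear", "hover"): {"clear": 1.0, "blocked": 0.0},
    ("blocked", "hover"): {"clear": 0.0, "blocked": 1.0},
}
SENSOR = {
    "clear": {"obstacle": 0.1, "free": 0.9},    # assumed 10% false alarms
    "blocked": {"obstacle": 0.8, "free": 0.2},  # assumed 80% detection rate
}
prior = {"clear": 0.5, "blocked": 0.5}
posterior = update_belief(prior, "hover", "obstacle", TRANSITION, SENSOR)
```

Maintaining and updating such a belief at every step is one reason POMDP-based planning is more time-consuming than planning with a fully observed MDP.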

C. ASYNCHRONOUS ADVANTAGE ACTOR-CRITIC (A3C)
A3C is an advanced DRL algorithm where each agent consists of two networks: an actor network and a critic network. A3C is commonly used in multi-agent environments. The actor network is responsible for observing the current state of the environment and selecting actions. After executing the actions, the agents obtain rewards. Collecting all the states, actions, rewards, and next states of all UAVs, the critic network produces the Q-values and updates the actor network using a deep deterministic policy gradient (DDPG). A3C is highly efficient in multi-UAV scenarios. Thus, Wang et al. [129], [130] proposed an A3C-based DRL framework for autonomous UAV navigation to support mobile edge computing. Each UAV consists of a critic and an actor network. All the actor networks in the UAVs are trained using the same data from the entire network. However, the critic networks are trained using individual UAV data utilizing a multi-agent DDPG. Moreover, Wang et al. [131] proposed a fast recurrent deterministic policy gradient algorithm (fast-RDPG) to navigate UAVs in a large complex environment while avoiding obstacles. Fast-RDPG is an A3C-based online DRL algorithm that can easily handle POMDP problems and converges faster. In contrast, Liu et al. proposed an A3C-based DRL algorithm for decentralized energy-efficient autonomous UAV navigation for long-term cellular coverage in [132]. The authors used a modified policy gradient to update the target network by considering the observations of the actor network.

D. DEEP LEARNING (DL)
Deep learning is a common tool for vision-based UAV navigation. It comprises only the deep neural network (DNN) part of DRL. Building on recent improvements in tasks such as object identification and localization, image segmentation, and depth recognition from monocular or stereo images, several researchers have successfully used DNNs to identify roads and streets on key routes and in metropolitan regions, with the aim of achieving a high level of autonomy for self-driving cars [133]. DNNs can also be used to achieve autonomous navigation for UAVs in extremely difficult environments. There are different types of DNNs, such as the fully connected NN (FNN) and the CNN.
Menfoukh [133] proposed an image augmentation method utilizing a CNN for vision-based UAV navigation. Back et al. proposed vision-based UAV navigation utilizing CNN in [134], where UAVs perform trail following, disturbance recovery, and obstacle avoidance. In contrast, Pearson et al. [135] proposed autonomous trail following and steering for UAVs by utilizing real-time photos and CNN. For indoor navigation, Chhikara et al. proposed a GA-based deep CNN (DCNN-GA) architecture in which the hyper-parameters of the neural network are tuned using GA. The trained DCNN-GA was utilized for autonomous UAV navigation using transfer learning.

E. MISCELLANEOUS LEARNING ALGORITHMS
In addition to the aforementioned learning-based approaches, several other AI techniques exist, such as proximal policy optimization [137], Monte Carlo [138], linear regression, and deep deterministic policy gradient. However, only a limited number of studies have utilized these methods.

V. FUTURE RESEARCH DIRECTIONS
This section discusses and highlights future possible research directions based on the present research trends described in the previous sections. Previous sections have summarized and presented a comparative study of different AI approaches implemented by researchers. The AI sector is growing, and many efficient AI approaches have not yet been explored adequately for autonomous UAV navigation. The open research issues are summarized below.
• New approaches: Federated learning (FL) is at the top of the list of new approaches. The goal of FL is to train an AI model in a distributed manner across multiple devices using local datasets without sharing them. In addition, FL inherently reduces exposure to cyberattacks because UAVs do not need to share their data. FL can be integrated with any AI algorithm for autonomous UAV navigation, and it reduces the space and time complexity by utilizing central learning. However, FL has not yet been implemented for autonomous UAV navigation, which necessitates its exploration. In addition, the implementation of ontology-based approaches for navigating a UAV swarm has not yet been explored properly. Karimi et al. utilized an ontology-based approach to navigate robots in a construction site in [139], which can be adapted for UAV swarm navigation in future research.
• Energy consumption: UAVs use batteries as their primary power source to support all activities, including flight, communication, and processing. However, the capacities of UAV batteries are insufficient for extended flight. Many researchers have applied different techniques, such as sleep and wake-up schemes, mobile edge devices for external computing, and solar power, to optimize the energy usage of UAVs [140]. Solving the energy issue using energy harvesting while flying is a direction for future research. Alternatively, autonomous visits to a charging station guided by AI can also address this issue.
• Computational power: UAVs are very small compared to other vehicles. Thus, their memory and energy capacities are low, which gives them a low computational power. In contrast, the implementation of both the optimization-based and learning-based AI approaches requires high computational power. Overcoming this issue remains an open research problem. Developing efficient AI approaches with low computational power consumption can be a key solution to this problem. Thus, this area needs to be explored adequately.
• Physical threats: Physical threats occur frequently in surveillance and SaR UAV missions. Many AI approaches have been previously implemented for obstacle avoidance. However, there is no existing solution that avoids sudden physical threats. Consequently, AI-based solutions for physical threat avoidance require an in-depth investigation.
• Fault handling: Faults occur frequently in moving vehicles. Software faults are easy to handle with an onboard emergency program. However, AI-based solutions are lacking for faults that are difficult to handle, such as hardware problems, equipment failures, and inter-component communication failures. Therefore, this area remains an open research problem.
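As an illustration of how FL could combine locally trained models without exchanging raw data, the sketch below averages the model parameters reported by several UAVs, weighted by the number of local samples each one used. This sample-weighted averaging (often called federated averaging) is an assumption made here for illustration, since no specific aggregation rule is prescribed above, and the function name and values are hypothetical.

```python
def federated_average(local_weights, sample_counts):
    """Average parameter vectors from several UAVs, weighted by local dataset size.

    local_weights: list of equally sized parameter lists, one per UAV
    sample_counts: number of local training samples behind each parameter list
    """
    if len(local_weights) != len(sample_counts) or not local_weights:
        raise ValueError("need one sample count per parameter vector")
    total = sum(sample_counts)
    dim = len(local_weights[0])
    return [sum(weights[d] * n for weights, n in zip(local_weights, sample_counts)) / total
            for d in range(dim)]


# Two hypothetical UAVs report two-parameter models trained on 1 and 3 samples.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors and sample counts leave each UAV in this scheme, while the local datasets stay on board.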

VI. CONCLUSION
Autonomous UAV navigation has introduced great flexibility and increased performance in complex dynamic surroundings. This survey highlighted the essential characteristics and types of UAVs to familiarize the reader with the UAV architecture. Furthermore, the UAV navigation system and application-based classification were summarized to make it easier for researchers to grasp the concepts introduced in this survey.
In terms of optimization-based and learning-based methods, the fundamentals, operating principles, and critical features of numerous AI algorithms applied by different researchers for autonomous UAV navigation were described. Different optimization-based approaches such as the PSO, ACO, GA, SA, PIO, CS, A*, DE, and GWO algorithms were analyzed and highlighted. Many researchers have modified these methods according to their requirements to achieve optimal objectives. In addition, this survey categorized and analyzed learning-based algorithms such as RL, DRL, A3C, and DL. The researchers utilized different neural networks, learning parameters, and decision-making processes to fulfill their objectives. After analyzing all the AI approaches, comparative studies were presented that compare the methods on common ground. In summary, various resources and data related to autonomous UAV navigation and AI are available for further research and development. Furthermore, there is scope for improvement and novel ideas in different areas, such as big data processing, computing power, energy efficiency, and fault handling. Thus, this survey highlighted future research directions to speed up the present research on AI-based autonomous UAV navigation. Finally, although AI can be computationally expensive, it increases the overall performance of UAVs in terms of significant parameters, such as energy consumption, flight time, and communication delay, in a complex dynamic environment for any critical mission.