A Fast Large-Scale Path Planning Method on Lunar DEM Using Distributed Tile Pyramid Strategy

In lunar exploration missions, path planning for lunar rovers using digital elevation models (DEMs) is an active topic of academic research. However, path planning on large-scale DEMs has rarely been discussed, owing to the low time efficiency of existing algorithms. In this article, we therefore propose a fast path-planning method using a distributed tile pyramid strategy and an improved A* algorithm. The proposed method consists of three main steps. First, a tile pyramid is generated for the large lunar DEM and stored in the Hadoop distributed file system (HDFS). Second, a distributed path-planning strategy based on the tile pyramid (DPPS-TP) is used to accelerate path-planning tasks on large-scale lunar DEMs using Spark and Hadoop. Finally, an improved A* algorithm is proposed to speed up the path-planning task in each tile. The method was tested on lunar DEM images. The experimental results demonstrate the following. With a single-machine serial strategy on the source DEM generated by the Chang'e-2 CCD stereo camera, the proposed A* algorithm with random-access open and closed lists (OC-RA-A* algorithm) is 3.59 times faster than the traditional A* algorithm in long-distance path-planning tasks. Compared with a distributed parallel computation strategy on the same source DEM, the proposed DPPS-TP based on the tile-pyramid DEM is 113.66 times faster in long-range path-planning tasks.


I. INTRODUCTION
Lunar exploration using a lunar rover is the first step in human space exploration, and the path-planning problem for lunar rovers has been an important focus of research in lunar exploration projects [1].
The core idea of path planning is to determine an optimal path from the current starting position to the goal position in an unknown environment. Path-planning algorithms are generally divided into two categories: global path planning and local path planning [2], [3]. Global path planning uses global terrain and obstacle information to model the environment and calculate optimal paths, whereas local path planning usually models the environment from data collected during operation and then evaluates as many feasible paths as possible based on that model. Global path-planning algorithms include classical algorithms such as Dijkstra [4], Floyd [5], A* [6], and RRT [7], and intelligent algorithms such as the ant colony algorithm [8] and the genetic algorithm [9]. Local path-planning algorithms mainly include artificial potential field-based and neural-network-based methods. Among these methods, Dijkstra and Floyd can calculate the optimal path but require more time and a large amount of memory because they need to traverse and store all the points [10], [11]. The A* algorithm extends the Dijkstra algorithm with a heuristic search technique, which accelerates the path search [12]. The RRT algorithm finds paths quickly; however, the resulting path is usually neither optimal nor smooth [13]. The ant colony and genetic algorithms show strong performance in path planning; however, they rely heavily on parameter settings and converge slowly. Gan et al. [14] proposed an improved RRT algorithm with fast tree construction to reduce the time spent on path planning. Liu et al. [15] combined pheromone diffusion and geometric local optimization in an improved ant colony algorithm to address slow convergence. Bounini et al. [16] proposed a novel potential field method for robot navigation, and Qu et al. [17] proposed a modified pulse-coupled neural network model for real-time path planning of mobile robots in dynamic environments.
With the rapid development of modern mapping technology and the upgrading of sensor hardware, DEMs have become increasingly accurate and their data volumes increasingly large; consequently, pathfinding calculations based on DEMs are becoming increasingly time-consuming. To address this problem, Hong et al. proposed an improved A* algorithm using a closed list with a random-access data structure (C-RA-A* algorithm) [18]. Compared with the traditional A* algorithm (Trad-A* algorithm), it significantly improved the efficiency of path planning on DEMs. However, when the data volume of a DEM is excessively large, the applicability of this algorithm is limited by the memory size of the server. Compared with a single computer, distributed storage [e.g., Hadoop distributed file system (HDFS), HBase] and computational technologies (e.g., MapReduce, Spark) use the storage and computational resources of clusters and show tremendous advantages when data volumes increase dramatically. They are therefore extensively used in the storage [19], [20], [21], calculation [22], [23], segmentation [24], and path planning of massive remote sensing data. Wang et al. used a MapReduce-based distributed parallel Dijkstra algorithm to solve the shortest path problem, and it has significant advantages over the traditional Dijkstra algorithm for large-scale path planning [25]. However, for frequent iterative computation, MapReduce spends a considerable amount of time on disk IO for intermediate data. Alazzam et al. proposed an A* path-planning algorithm on Spark, and their results showed that the Spark-based A* algorithm is effective on large-scale graph data [26]. However, their algorithm cannot efficiently use the neighborhood property of DEM grids to obtain neighbor nodes.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
According to the above analysis, significant progress has been made, and a number of algorithms have been proposed to improve the efficiency of path planning. However, some problems remain to be explored. First, most existing path-planning algorithms focus on small areas, and relatively few studies have addressed path planning on large-scale data. Second, the efficiency of existing algorithms needs to be improved, particularly at large scale; moreover, when big data platforms are used, nodes are processed independently of one another, without exploiting the neighborhood relations of DEM grid cells. Finally, most research focuses on road and traffic data and on terrestrial DEMs, and few studies have addressed path planning on lunar DEMs. The objective of this article is therefore to propose a fast large-scale path-planning method for lunar DEMs. The proposed method adopts a distributed tile-pyramid strategy to improve path-planning efficiency on large-scale lunar DEMs. In addition, an improved A* algorithm is proposed that modifies the underlying data structure to speed up sub-path planning on tiles.
The main contributions of this article can be summarized as follows.
1) This article presents a fast large-scale path-planning method based on lunar DEM images. The time efficiency of the path search is significantly improved, especially in long-distance path-planning tasks.
2) We propose a distributed tile-pyramid strategy for large-scale path-planning tasks. This strategy calculates the path-planning nodes from coarse to fine according to the characteristics of the tile pyramid and applies divide-and-conquer to accelerate the process by distributing the data across the cluster.
3) The data structure of the A* algorithm is improved, thereby accelerating the sub-path-planning task.
4) The proposed method supports the fast search of long-distance three-dimensional (3-D) paths over thousands of kilometers. It is applicable not only to lunar rover path planning but also to path-planning tasks on Mars and Earth.
The remainder of this article is organized as follows. Section II presents the details of the methodology. Section III describes the dataset and experimental design. The experimental results and analyses are presented in Section IV. The influence of different tile sizes and degrees of parallelism is discussed in Section V. Finally, Section VI concludes the article.

II. METHODS
This article proposes a fast large-scale path-planning method for lunar DEMs using a distributed tile-pyramid strategy. As shown in Fig. 1, the proposed method comprises three components. First, a tile pyramid was generated for a large lunar DEM and stored in HDFS. Second, a distributed path-planning strategy based on a tile pyramid (DPPS-TP) was used to accelerate path-planning tasks on large-scale lunar DEMs using Spark and Hadoop. Finally, an improved A* algorithm (OC-RA-A* algorithm) was proposed to improve the speed of the path-planning task in each tile.

A. Pyramid Generation and Storage
The tile pyramid was constructed by downsampling the source DEM at a 2:1 ratio to generate successive layers and cutting each layer of the pyramid into rectangular tiles of the same size [27]. As shown in Fig. 2, the source DEM (2048×2048 px) forms the bottom layer (layer 0) of the tile pyramid; with a tile size of 512×512 px, the pyramid consists of three layers, of which layer 1 is 1024×1024 px and layer 2 is 512×512 px. The tiles of each layer have corresponding tile row and column numbers, which are used to quickly locate tiles in the tile pyramid.
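A minimal sketch of the pyramid construction follows, in plain Python so that it runs without GDAL or HDFS. It is not the authors' implementation: the 2:1 downsampling is assumed to average each 2×2 block of elevations (the resampling kernel is not stated here), DEM sides are assumed to be square powers of two, and the helper names `downsample_2to1`, `build_pyramid`, `cut_tiles`, and `num_layers` are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def downsample_2to1(grid: Grid) -> Grid:
    """Halve each dimension by averaging 2x2 blocks (assumed resampling kernel)."""
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(dem: Grid, tile_size: int) -> List[Grid]:
    """Layer 0 is the source DEM; stop once a layer fits in a single tile."""
    layers = [dem]
    while len(layers[-1]) > tile_size:
        layers.append(downsample_2to1(layers[-1]))
    return layers


def cut_tiles(layer: Grid, tile_size: int) -> Dict[Tuple[int, int], Grid]:
    """Cut one layer into equal square tiles keyed by (tile_row, tile_col)."""
    tiles = {}
    for tr in range(0, len(layer), tile_size):
        for tc in range(0, len(layer[0]), tile_size):
            tiles[(tr // tile_size, tc // tile_size)] = [
                row[tc:tc + tile_size] for row in layer[tr:tr + tile_size]
            ]
    return tiles


def num_layers(dem_size: int, tile_size: int) -> int:
    """Layer count implied by repeated 2:1 downsampling down to one tile."""
    layers, size = 1, dem_size
    while size > tile_size:
        size //= 2
        layers += 1
    return layers
```

Under these assumptions, `num_layers(2048, 512)` gives the three layers of Fig. 2, and for the 32768×32768 study area it gives 7, 5, and 3 layers for 512×512, 2048×2048, and 8192×8192 tiles, matching the pyramids used in Section V.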
HDFS was used in this article to store the tile pyramid of the DEM. Because HDFS provides high-throughput data access and a data redundancy mechanism, it is well suited to large-scale remote sensing data applications. An HDFS cluster uses a typical master-slave architecture in which the NameNode and the DataNodes control and manage distributed storage. The NameNode stores the metadata of the tile byte blocks, which records the storage location and order of the tile byte blocks on each DataNode; the DataNodes store the tile pyramid data and its redundant replicas. The NameNode communicates with the DataNodes over the network [28]. Fig. 3 shows how a DEM of 2048×2048 pixels is converted into a tile pyramid with a tile size of 512×512 and stored in HDFS. First, the tile pyramid is generated from the source DEM with a tile size of 512×512 px; subsequently, the tiles of each layer are ordered along a Z-order curve [29] or a Hilbert curve [29], [30]. Finally, the tile data and pyramid layer information are serialized and stored in HDFS.
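The Z-order ordering mentioned above can be illustrated with a Morton key, which interleaves the bits of a tile's row and column numbers so that tiles that are close in space tend to be close in storage order. This is a sketch under the assumption that tiles are sorted by (layer, Morton key) before serialization; the function names are hypothetical, and the Hilbert-curve alternative, which has better locality but a more involved encoding, is not shown.

```python
from typing import Iterable, List, Tuple

TileIndex = Tuple[int, int, int]  # (layer, tile_row, tile_col)


def morton_key(tile_row: int, tile_col: int, bits: int = 16) -> int:
    """Interleave bits: column bits at even positions, row bits at odd ones."""
    key = 0
    for i in range(bits):
        key |= ((tile_col >> i) & 1) << (2 * i)
        key |= ((tile_row >> i) & 1) << (2 * i + 1)
    return key


def z_order(tile_indices: Iterable[TileIndex]) -> List[TileIndex]:
    """Sort tiles by layer first, then along the Z-order curve within a layer."""
    return sorted(tile_indices, key=lambda t: (t[0], morton_key(t[1], t[2])))
```

Within a 4×4 layer this visits the tiles as (0,0), (0,1), (1,0), (1,1), (0,2), (0,3), (1,2), (1,3), and so on, i.e. one 2×2 quadrant at a time.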

B. Principle and Implementation of DPPS-TP
This article proposed DPPS-TP and implemented it using Spark. Using the pyramid model properties, the set of starting and ending points of each tile was calculated from coarse to fine, and the sub-path planning task on each tile was accelerated by Spark using the divide-and-conquer idea.
To reduce the iteration time, a layer-hopping process is used when moving from an upper tile layer to a lower one. As shown in Fig. 4, coarse-grained path planning is first performed on the topmost tile layer (n = 4), and from its result the set of starting and ending points for tile layer n − 2 (layer 2) is inferred. Distributed path planning is then performed on layer n − 2, and the procedure is repeated on successively lower layers until the path-planning result for the bottom tile layer is obtained.
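The layer-hopping schedule and the mapping of a coarse path point to a finer layer can be sketched as follows. The article only illustrates the n = 4 to n − 2 case in Fig. 4; how an odd top layer is handled (here, a final one-layer hop to layer 0) and mapping each coarse pixel to the centre of its block two layers down (a 4×4 pixel block) are assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (row, col) pixel index within a pyramid layer


def hop_schedule(top_layer: int, hop: int = 2) -> List[int]:
    """Layers visited from the top layer downward; always ends at layer 0."""
    layers = list(range(top_layer, -1, -hop))
    if layers[-1] != 0:
        layers.append(0)  # assumed: a shorter final hop reaches the source DEM
    return layers


def to_lower_layer(p: Point, hop: int) -> Point:
    """Map a pixel to the centre of its (2**hop x 2**hop) block `hop` layers down."""
    scale = 2 ** hop
    return (p[0] * scale + scale // 2, p[1] * scale + scale // 2)
```

For example, `hop_schedule(4)` returns `[4, 2, 0]` and `to_lower_layer((3, 5), 2)` returns `(14, 22)`. Skipping every other layer halves the number of Spark iterations at the cost of a coarser guide for each finer layer.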
This article used Spark to implement DPPS-TP. Apache Spark is a fast, general-purpose computational engine designed for large-scale data processing. It uses resilient distributed datasets (RDDs) and directed acyclic graphs (DAGs) to ensure that Spark tasks run quickly and correctly in a distributed environment. RDDs are the most fundamental data-processing model in Spark; an RDD is a resilient, immutable, partitioned in-memory collection that can be computed in parallel. A DAG is a set of vertices and edges, where vertices represent RDDs and edges represent the operations relating them. DAGs are essential in Spark for ensuring that distributed computations execute their tasks in the correct order [22], [31]. Research shows that the main reason Spark runs faster than MapReduce is that Spark builds DAGs to avoid unnecessary disk IO and thus improves task execution efficiency, whereas MapReduce jobs are independent of each other, so the results of every MapReduce job are written to disk [32], [33].
As shown in Fig. 5, in a master-slave cluster, Spark divides the path-planning task into multiple subtasks and assigns them to the slave nodes for execution. The slave nodes fetch the tile data from HDFS, execute the subtasks assigned by the master node, and finally write the results to HDFS.
In this article, DPPS-TP was implemented using Spark, which constructs distributed tile datasets and distributes sub-path planning tasks to the cluster for execution. As shown in Fig. 6, the main steps for implementing DPPS-TP using Spark are as follows.
1) Read the topmost pyramid tiles from HDFS and construct tile data RDDs in memory.
2) Input the starting and ending points for the topmost pyramid layer.
3) Based on the set of starting and ending points, filter the tile RDDs to keep the tiles that contain them.
4) Perform the distributed path-planning subtask for each tile in the filtered tile RDDs to obtain the tile path RDDs for this layer.
5) Determine whether this layer is the bottom layer of the tile pyramid; if it is not, execute step 6; if it is, output the tile path RDDs to HDFS.
6) Use the tile path RDDs of this layer to deduce the set of starting and ending points in the next pyramid layer on which local path planning must be performed.
7) Read the tiles of the next pyramid layer from HDFS and construct tile data RDDs in memory.
8) Feed the results of steps 6 and 7 into step 3 and continue from step 3.
Note that, because the starting and ending points of each tile are determined from the more abstract, lower-resolution layers, some connection points between adjacent tiles may be impassable. These impassable points are corrected by iterating over the global path points at the end of the program. This step has little impact on the final path-planning result; a detailed analysis is given in Section IV.
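Step 6 can be illustrated by splitting a coarse path, already mapped to the next layer's pixel coordinates, into runs of points that fall in the same tile and keeping the first and last point of each run as that tile's local start and end points. This is one plausible reading of how the per-tile start/end sets are deduced, not the authors' code; `tile_of` and `tile_endpoints` are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (row, col) pixel index in the current layer
Tile = Tuple[int, int]   # (tile_row, tile_col)


def tile_of(p: Point, tile_size: int) -> Tile:
    """Tile row/column that contains pixel p."""
    return (p[0] // tile_size, p[1] // tile_size)


def tile_endpoints(path: List[Point], tile_size: int) -> List[Tuple[Tile, Point, Point]]:
    """Group consecutive path points by tile; return (tile, start, end) per run."""
    runs: List[list] = []
    for p in path:
        t = tile_of(p, tile_size)
        if runs and runs[-1][0] == t:
            runs[-1][2] = p          # still in the same tile: move the run's end
        else:
            runs.append([t, p, p])   # the path enters a new tile
    return [(t, s, e) for t, s, e in runs]
```

A tile that the coarse path leaves and later re-enters yields two separate sub-tasks in this sketch. The last point of one run and the first point of the next are neighboring pixels on opposite sides of a tile edge; if either turns out to be impassable at the finer resolution, the connection breaks, which is the situation the final correction pass addresses.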
DPPS-TP reduces the time spent performing path planning tasks on large-scale DEMs by moving from coarse-grained tile path-planning tasks to fine-grained distributed tile path planning tasks in an iterative manner.
In the Spark implementation of DPPS-TP, the driver is responsible for broadcasting the set of starting and ending points of each tile and for controlling the number of iterations, and the workers are responsible for the sub-path planning task of each tile.
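The driver/worker division of labor can be mimicked locally with a thread pool standing in for Spark executors: the driver holds the per-tile (start, end) tasks, the pool runs a per-tile planner on each, and the driver stitches the sub-paths in order. This is a single-machine illustration, not Spark code; `run_layer` and the `plan_tile` callback (any per-tile planner, e.g. an A* variant) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Point = Tuple[int, int]
Tile = Tuple[int, int]
Task = Tuple[Tile, Point, Point]  # (tile, local start, local end)


def run_layer(tasks: List[Task],
              plan_tile: Callable[[Tile, Point, Point], List[Point]],
              workers: int = 4) -> List[Point]:
    """Driver: fan the per-tile sub-tasks out to workers, then stitch results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in task order, like an ordered collect.
        sub_paths = list(pool.map(lambda t: plan_tile(*t), tasks))
    path: List[Point] = []
    for sub in sub_paths:
        # Drop a duplicated junction point if a sub-path starts where the last ended.
        path.extend(sub[1:] if path and sub and sub[0] == path[-1] else sub)
    return path
```

In an actual Spark job, the task list would play the role of the broadcast variable and the per-tile planning would be a transformation over the filtered tile RDD; the thread pool is used here only because it returns results in task order, which keeps stitching trivial.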
As shown in Fig. 7, first, the driver and workers construct the tile RDDs and tile metadata of the topmost pyramid layer (i = n) from HDFS, and the driver broadcasts the start and end points of the path planning to the workers. Subsequently, the workers filter out the tile RDDs that do not participate in the path planning and perform sub-path planning for the remaining tiles according to the set of

A. Data
The 20-m-resolution lunar DEM generated from Chang'e-2 CCD stereo camera data was used in this experiment. The selected area extends from 77°N to 87°N and from 158°E to 158°W. It measures 32768×32768 pixels and covers a total area of 429496.7296 km², as shown in Fig. 9.
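The quoted area follows directly from the pixel count and the 20-m resolution, treating every pixel as a planar 20 m × 20 m cell (projection distortion at these latitudes is ignored in this back-of-the-envelope check):

$$
32768 \times 20\ \text{m} = 655.36\ \text{km}, \qquad (655.36\ \text{km})^2 = 429496.7296\ \text{km}^2
$$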

B. Experimental Design
Experiments were performed on a single local computer with a 2.90-GHz AMD Ryzen 7 processor and 32 GB of RAM, and on a distributed cluster of three virtual machines, each with 96 GB of RAM and 48 cores. In addition, a tile pyramid was generated from the DEM data and stored in HDFS before the experiments.
To ensure the feasibility of the planned paths, a tracked lunar vehicle was used as a reference, and a slope threshold of 20° was used to decide whether a grid cell is passable in the DEM grid path-planning experiments [34]. As shown in Fig. 10, the projected area of the vehicle is assumed to be one grid cell, and a 3 × 3 window is formed by the grid cell $(i, j)$ where the vehicle is located and its eight surrounding grid cells, where $Z_{i,j}$ is the elevation of cell $(i, j)$. The slopes in the vertical/horizontal and oblique directions were calculated using (1) [35], where CellSize is the size of each grid cell:
$$\theta_{\mathrm{hv}} = \arctan\frac{\lvert Z_{i+a,\,j+b} - Z_{i,j}\rvert}{\mathit{CellSize}},\ \lvert a\rvert + \lvert b\rvert = 1; \qquad \theta_{\mathrm{diag}} = \arctan\frac{\lvert Z_{i+a,\,j+b} - Z_{i,j}\rvert}{\sqrt{2}\,\mathit{CellSize}},\ \lvert a\rvert = \lvert b\rvert = 1 \tag{1}$$
The evaluation function of the A* algorithm used in this experiment is
$$F(P) = G(P) + H(P) \tag{2}$$
In (2), $G(P)$ is the actual distance cost from the starting point to node $P$, and $H(P)$ is the estimated distance cost from node $P$ to the end point. In this article, $G(P)$ was obtained by accumulating Euclidean distances, as shown in (3), and $H(P)$ was obtained by calculating the Manhattan distance, as shown in (4):
$$G(P) = G(P') + \sqrt{(x_P - x_{P'})^2 + (y_P - y_{P'})^2 + (z_P - z_{P'})^2} \tag{3}$$
$$H(P) = \lvert x_P - x_{P_{\mathrm{end}}}\rvert + \lvert y_P - y_{P_{\mathrm{end}}}\rvert + \lvert z_P - z_{P_{\mathrm{end}}}\rvert \tag{4}$$
In (3) and (4), $P'$ is the predecessor of $P$ on the path; $G(P')$ is the actual distance from the starting point to $P'$; $G(P)$ is the actual distance from the starting point through $P'$ to $P$; $P_{\mathrm{end}}$ is the end point; and $x$, $y$, and $z$ are the horizontal and vertical positions of a node and its elevation, respectively.
In this article, four experiments were designed to verify and explore the efficiency of DPPS-TP and the OC-RA-A* algorithm. As given in Table I, the experiments use starting and ending points separated by different distances. Except for the experiment on DPPS-TP with different degrees of parallelism, which used long paths only, four sets of points were selected for the experiments to simulate the starting and ending points of long, medium, medium, and short paths.
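The passability test of (1) and the cost terms of (3) and (4) can be sketched as below. Assumptions not stated in this section: a neighbour is passable when its directional slope is at most 20°, diagonal neighbours are √2·CellSize away, and node coordinates (x, y, z) are expressed in the same ground units (metres), i.e. pixel indices already multiplied by CellSize. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def slope_deg(dem: Grid, i: int, j: int, di: int, dj: int, cell_size: float) -> float:
    """Directional slope from (i, j) to a neighbour; diagonal runs are sqrt(2) longer."""
    run = cell_size * (math.sqrt(2) if di and dj else 1.0)
    rise = abs(dem[i + di][j + dj] - dem[i][j])
    return math.degrees(math.atan(rise / run))


def passable_neighbours(dem: Grid, i: int, j: int, cell_size: float,
                        max_slope: float = 20.0) -> List[Tuple[int, int]]:
    """Cells of the 3x3 window whose slope from (i, j) does not exceed max_slope."""
    out = []
    for di, dj in NEIGHBOURS:
        ni, nj = i + di, j + dj
        if 0 <= ni < len(dem) and 0 <= nj < len(dem[0]):
            if slope_deg(dem, i, j, di, dj, cell_size) <= max_slope:
                out.append((ni, nj))
    return out


def step_cost(p: Sequence[float], q: Sequence[float]) -> float:
    """Increment added to G in (3): 3-D Euclidean distance between adjacent nodes."""
    return math.dist(p, q)


def heuristic(p: Sequence[float], end: Sequence[float]) -> float:
    """H in (4): 3-D Manhattan distance from node p to the end point."""
    return sum(abs(a - b) for a, b in zip(p, end))
```

Because the Manhattan estimate can exceed the Euclidean cost of a diagonal or sloped move, this heuristic is not admissible in general: it tends to expand fewer nodes, but A* is then no longer guaranteed to return the strictly shortest path.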

C. Evaluation Indicators
Time cost and path accuracy were used to measure the performance of the proposed strategy. The time cost is defined as the total running time of path planning; for distributed computing, it is divided into the cluster start-up time and the parallel computing time. The total time was calculated using
$$T_{\mathrm{total}} = T_{\mathrm{start}} + T_{\mathrm{compute}} \tag{5}$$
where $T_{\mathrm{start}}$ is the cluster start-up time and $T_{\mathrm{compute}}$ is the parallel computing time. In addition, the average deviation between the planned path and the reference path is calculated to evaluate path-planning accuracy. The reference path is defined as the shortest passable path from the starting point to the ending point computed with the traditional A* algorithm, and the planned path is the path obtained in each experiment.
The average deviation is calculated as the mean offset between paired points of the planned path and the reference path, as shown in (6):
$$\bar{D} = \frac{\mathit{CellSize}}{n}\sum_{i=1}^{n}\sqrt{(x_i - xg_i)^2 + (y_i - yg_i)^2} \tag{6}$$
where $(x_i, y_i)$ and $(xg_i, yg_i)$ denote the horizontal and vertical pixel coordinates of the planned path and the reference path, respectively, $n$ denotes the number of point pairs, and CellSize denotes the size of each grid cell. In total, 90% of the points in the

IV. RESULTS
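A sketch of the average deviation in (6) follows, assuming the planned and reference paths have already been paired point by point (the pairing procedure is not described in this section) and that the offset of each pair is its Euclidean pixel distance scaled by CellSize. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int]  # (x, y) pixel coordinates


def average_deviation(planned: List[Pixel], reference: List[Pixel],
                      cell_size: float) -> float:
    """Mean Euclidean offset of paired points, converted to ground units."""
    if not planned or len(planned) != len(reference):
        raise ValueError("expects the same, non-zero number of paired points")
    total = sum(math.hypot(x - xg, y - yg)
                for (x, y), (xg, yg) in zip(planned, reference))
    return total / len(planned) * cell_size
```

With the 20-m DEM, `average_deviation([(0, 0), (3, 4)], [(0, 0), (0, 0)], 20.0)` is 50.0 m: one pair coincides and the other is 5 pixels apart.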

A. Results of Time Cost Comparison of Different Methods
The proposed method consists of two parts: the OC-RA-A* algorithm for sub-path planning on each tile and DPPS-TP for parallel processing of the tiles. To verify the efficiency of the OC-RA-A* algorithm for path planning over different distances on tiles, this experiment applies a single-machine serial computing strategy to the source DEM to mimic the path-planning task of the different A* algorithms on tiles.
To verify the efficiency of the improved A* algorithm in the proposed method, the proposed OC-RA-A* algorithm is compared with the C-RA-A* and Trad-A* algorithms. As given in Table II, when a single-machine serial computing strategy is used for the long-distance path-planning task on the source DEM, the time cost of the Trad-A* algorithm is 3.59 times that of the OC-RA-A* algorithm, and the time cost of the C-RA-A* algorithm is 2.96 times that of the OC-RA-A* algorithm. Note that the data scale and computational environment used in this experiment differ from those used by Hong et al. [18], and that slope maps were not pre-generated for the DEM in this experiment.
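The random-access idea behind OC-RA-A* can be sketched on a boolean passability grid. This is an assumed reading, not the authors' implementation: both the open list's costs and the closed list are flat arrays indexed by cell, so membership and cost checks are O(1) rather than list scans, while a binary heap still selects the node with the smallest f. For brevity, the sketch uses planar step costs (1 or √2 pixels) and a straight-line heuristic instead of the 3-D costs of (3) and (4); `astar_random_access` is a hypothetical name.

```python
import heapq
import math
from typing import List, Optional, Tuple

Point = Tuple[int, int]


def astar_random_access(passable: List[List[bool]], start: Point,
                        goal: Point) -> Optional[List[Point]]:
    """8-connected grid A* with O(1) open/closed bookkeeping via flat arrays."""
    rows, cols = len(passable), len(passable[0])
    n = rows * cols
    best_g = [math.inf] * n   # open-list costs, random access by row * cols + col
    closed = bytearray(n)     # closed list as a flat flag array
    parent = [-1] * n         # predecessor cell index for path recovery

    def h(p: Point) -> float:
        # Planar straight-line estimate (a simplification, see lead-in).
        return math.dist(p, goal)

    best_g[start[0] * cols + start[1]] = 0.0
    heap = [(h(start), start)]
    while heap:
        _, cur = heapq.heappop(heap)
        ci = cur[0] * cols + cur[1]
        if closed[ci]:
            continue          # stale heap entry, detected in O(1)
        if cur == goal:
            path, k = [], ci
            while k != -1:
                path.append(divmod(k, cols))
                k = parent[k]
            return path[::-1]
        closed[ci] = 1
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if not (dr or dc):
                    continue
                nr, nc = cur[0] + dr, cur[1] + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or not passable[nr][nc]:
                    continue
                ni = nr * cols + nc
                ng = best_g[ci] + math.hypot(dr, dc)
                if not closed[ni] and ng < best_g[ni]:
                    best_g[ni], parent[ni] = ng, ci
                    heapq.heappush(heap, (ng + h((nr, nc)), (nr, nc)))
    return None
```

The per-cell arrays cost memory proportional to the whole grid, which is consistent with the memory limitation of single-machine OC-RA-A* noted in Section VI and is one reason to run such a planner per tile.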
To further validate the efficiency of DPPS-TP, it is compared with a single-machine serial computing strategy on the source DEM. As shown in Fig. 11 and Table II, when the OC-RA-A* algorithm is used for long-distance path-planning tasks, the time cost of the single-machine serial strategy on the source DEM is 1.57 times that of DPPS-TP. Meanwhile, as shown in Fig. 12 and Table III, in a distributed computing environment, the time cost of a distributed parallel strategy on the source DEM for the long-range path-planning task is 113 times that of DPPS-TP.
Several of these results require explanation. As shown in Fig. 11 and Table II, the proposed method is advantageous only for long-distance path-planning tasks: for shorter tasks, the time DPPS-TP spends on cluster start-up and RDD transformations exceeds the computation time of the path planning itself. In addition, in the DPPS-TP experiments, the cluster start-up time was always between 1 and 1.3 s, because it depends only on the cluster configuration. The pathfinding distance has relatively little effect on the speed of DPPS-TP because sub-path planning on the tiles is fast and most of the running time of DPPS-TP is spent on RDD operators. As shown in Fig. 12 and Table III, distributed path planning performed directly on the source DEM is slow because it transforms the DEM grid into discrete point data, which dramatically increases the number of nodes involved in the distributed search.

B. Results of Accuracy Comparison of Different Methods
To verify the accuracy of the large-scale path planning, the proposed DPPS-TP was compared with a single-computer serial computing strategy and a distributed parallel computing-only strategy.
Figs. 13 and 14 show the path-planning results of the single-machine and distributed path-planning strategies, respectively. As can be seen from the figures, the paths obtained by the different strategies are very similar. The pathfinding distances obtained with DPPS-TP are slightly longer than those obtained with the single-machine serial strategy, because in the tile pyramid, the path nodes found in an upper layer cannot exactly determine the optimal starting and ending points of each tile in the lower layer. Tables IV and V compare the accuracy of the single-machine and distributed path-planning strategies. The paths found by the single-machine serial computing strategy and the distributed parallel computing strategy are identical to the reference path because they involve no mapping between tiles of different resolutions; therefore, their deviation is zero. The path found by DPPS-TP deviates slightly from the reference path because the starting and ending points of the local path planning on the tiles are not optimal. As the tables show, the deviation is smaller than 1 km in most cases; even for long-distance path planning, e.g., planned distances greater than 600 km, the deviation is less than 1.2 km, while the time efficiency is greatly improved. These results indicate the effectiveness of the proposed strategy.

A. Influence of Using Different Numbers of Parallel Cores
To explore the influence of the number of parallel cores on the time cost of path planning using DPPS-TP, seven degrees of parallelism were tested on the long-distance pathfinding task.
The results shown in Fig. 15 and Table VI indicate that the parallelism of the distributed computation affects the program running time in path planning over long distances, and the lower the number of parallel cores, the higher the time overhead of DPPS-TP.

B. Influence of Constructing Different Tile Pyramids
To explore the influence of different tile pyramids on the efficiency of path planning using DPPS-TP, this experiment used three different tile pyramids for the pathfinding task. The DEM data were used to construct tile pyramids with tile sizes of 512×512, 2048×2048, and 8192×8192, which have seven, five, and three layers, respectively. The results shown in Fig. 16 and Table VII indicate that the tile size of the stored pyramid also affects the efficiency of DPPS-TP.
The experiments were run from the topmost level of each tile pyramid down to the bottom level. DPPS-TP had the lowest time overhead with a tile size of 2048×2048, followed by a tile size of 8192×8192. The time expense with 512×512 tiles was larger than with 2048×2048 tiles because a pyramid of 512×512 tiles requires processing a larger number of tiles. The time expense with 8192×8192 tiles was also greater than with 2048×2048 tiles because planning local sub-paths takes longer on tiles of size 8192×8192. As shown in Fig. 17 and Table VIII, in terms of pathfinding distance, a tile size of 8192×8192 performs better than tile sizes of 2048×2048 and 512×512, and a tile size of 512×512 yields the worst pathfinding distances. This is because, as the number of tiles participating in DPPS-TP increases, so does the uncertainty of the connection points between adjacent tiles, which can lead to longer paths.
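A quick tile count for the 32768×32768 study area makes this trade-off concrete. The sketch only counts tiles per layer under 2:1 downsampling; DPPS-TP processes only the tiles the path crosses, so these totals are upper bounds on the number of sub-tasks, not measured workloads. `tiles_per_layer` is a hypothetical helper.

```python
from typing import List


def tiles_per_layer(dem_size: int, tile_size: int) -> List[int]:
    """Tile counts from layer 0 (source DEM) up to the single-tile top layer."""
    counts, size = [], dem_size
    while True:
        per_side = max(1, size // tile_size)
        counts.append(per_side * per_side)
        if size <= tile_size:
            return counts
        size //= 2  # 2:1 downsampling to the next layer
```

It gives 4096, 256, and 16 bottom-layer tiles (5461, 341, and 21 tiles over the whole pyramid) for 512×512, 2048×2048, and 8192×8192 tiles. A path of fixed ground length therefore crosses roughly four times as many 512×512 tiles as 2048×2048 tiles, while each 8192×8192 tile holds 16 times as many pixels as a 2048×2048 tile, which is in line with the results above.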

VI. CONCLUSION
In this article, we propose a fast path-planning method using a distributed tile pyramid strategy to address the low time efficiency of existing algorithms in large-scale path-planning tasks. In addition, an A* algorithm supporting random-access data structures is proposed to reduce the time cost of path planning. In experiments with a single-machine serial computing strategy on lunar source DEM data, the OC-RA-A* algorithm, by virtue of its random-access data structures, is 3.59 times faster than the conventional A* algorithm in long-distance path-planning tasks. However, the OC-RA-A* algorithm requires a large memory capacity, and the single-machine serial computing strategy is limited by machine memory; therefore, a distributed path-planning strategy based on tile pyramids is further proposed. Through divide-and-conquer and coarse-to-fine refinement, this strategy is 1.57 times faster than the single-machine serial computing strategy on lunar source DEM data. In addition, by retaining the neighborhood information of each pixel, it is 113.66 times faster than the distributed parallel computation strategy on the lunar source DEM.