Multisource Classification of Meridiani Planum's Aeolian Landscape Using HiRISE and Opportunity Images Analysis Based on Deep Learning

The aim of the research was to analyze the possibilities of using deep learning methods for classifying multisource image data for Mars. The main goal was to develop a methodology for integrating image data acquired from orbiters (the MRO mission's HiRISE camera) and in situ (the Opportunity rover's NAVCAM camera) and to use their combined analytical potential. We used a VGG-16-based network, which is well characterized in the literature and has been successfully applied in a wide range of applications. The article proposes a methodology for the supervised classification of landforms on Mars. The proposed solution, which combines deep neural networks with multisource image data, was evaluated on the Meridiani Planum area. We found that our approach classified aeolian reliefs correctly for more than 94% of the test dataset. The classification accuracy increased to almost 96% when panoramas developed from Opportunity's images and the derivatives of the digital terrain models were used during the classification process. The proposed concept of multisource classification and the customized deep learning system can be extended to the analysis of other regions of Mars and to multispectral imaging without losing the generalizability of the solution.


I. INTRODUCTION
SINCE aeolian processes play a dominant role on present-day Mars, the bedforms created due to these processes cover almost the entire Martian surface. Ripples are one of the most common aeolian features on Mars; they can be up to 1 m high and up to several meters wide, and form straight or sinuous ridges created by the accumulation of small particles (from ∼100 µm to ∼1 mm in diameter). The orientation of these bedforms is perpendicular to the direction of the wind that is responsible for their creation. On Mars, such bedforms exist as ripple fields, but can also be found as isolated features [1], [2].
The increasing amount of spaceborne and groundborne data about Mars' surface enables large-scale terrain relief recognition. However, this activity is slow because it is performed manually, and it requires automation. To contribute to this process, we attempted to analyze and classify these forms using machine learning methods. Because these forms appear over almost the entire surface of Mars, it is vital to employ automatic techniques to study their distribution and parameters. To develop such automatic methods, we started with a region that is well covered by both orbiter and in situ data, and which serves as a ground truth for analyzing bedforms and terrain. Imaging data from the rover allow for detailed, high-resolution analysis of terrain morphometry, while HiRISE data allow for spatial context analysis. Multisource data fusion makes it possible to exploit the complementary information in each dataset, as confirmed, for example, in [3].
The authors chose an area of Meridiani Planum (MP) that was investigated by one of two rovers on the Mars exploration rover (MER) mission [4]. What characterizes this area is its flat surface, which is interrupted only by impact craters of various ages [5], [6]. This region's uniformity favors the development of automated terrain classification methods.
The Opportunity rover explored this region from 2004 to 2018 and took approximately 200 000 images of landforms, rocks, and sediments. Opportunity traveled more than 40 km across an area that, from a morphological point of view, can be divided into two classes of terrain: plains and craters. The surface of the plains is made of a flat layer of sulfate-rich sandstones partially covered by loose sediments (made up of sand and gravel). On these plains, there are vast fields of ripples located on the sand-gravel covers or directly on the bedrock.
In this research, the objective of the analysis was to automate the classification of three geomorphological settings within the MP area: ripple fields, ripples on bedrock, and sand-gravel covers. Simultaneously, this classification would enable a detailed analysis of the distinguishable and unambiguous geomorphological features within the MP area. In other words, these three classes would enable the characterization of the terrain's surface in terms of the presence or absence of ripples and sand-gravel covers.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Deep learning methods have strong representation learning and classification abilities for spatial big data processing. They can automate the extraction and classification of landforms, and therefore enable the analysis of image data from both the orbiter high resolution imaging science experiment (HiRISE) and the navigational camera on the Opportunity rover. The authors put forward the hypothesis that fusing orbiter and in situ data with deep learning, and integrating image data with the derivatives of the digital terrain models (DTMs), would significantly increase the accuracy of the automatic classification of the aeolian geomorphological settings found on Mars. The use of the neural network in the research work makes it possible to process both image data and three-dimensional (3-D) models developed from HiRISE data and image panoramas from the rover. In fact, the image data from Opportunity's camera panoramas can be processed into a multidimensional feature vector and analyzed in a joint classification process together with orthophotos and DTMs from HiRISE. The approach proposed in this article, and tested for the MP region, can be used to automate relief classification on Mars.
The issue of Martian aeolian landscape classification has been the subject of only a few scientific publications, which used orbiter imagery of varying spatial resolution or a DTM as the sole data source [9], [12]. This article addresses the subject of developing a methodology for integrating and processing multisource photogrammetric data for Mars. The use of stereo images from the high-resolution HiRISE camera allowed not only the development of an orthophoto map, but also a DTM and its derived models: curvature, slopes, topographic position index (TPI), terrain ruggedness index (TRI), etc. The use of these models has significantly improved the quality of relief form classification. The integration of orbiter data with images taken by the Mars rover's navigational camera (NAVCAM) allowed us to develop a holistic methodology for integrating multisensory data and obtaining satisfactory classification results. The development of an innovative method for processing image data from orbiter and in situ sources allows for synergies and the "enrichment" of spatial data, and provides an important methodological contribution to fundamental research concerning Mars.
The layout of the article is as follows: Chapter II discusses related works on the classification of geomorphological landforms using machine learning methods and DTM creation; while Chapter III discusses the proposed proprietary research methodology that uses deep learning for processing multisource, spatial big data to classify aeolian reliefs on Mars. Chapter IV discusses the conducted research, taking into account both the selection of source data and their processing methods. This part of the article also includes a critical discussion of the results. Chapter V summarizes the research, and proposes a continuation of this article and the directions it might take.

II. PRELIMINARIES AND RELATED WORKS
The first decade of the 21st century saw the first attempts to automatically classify Martian landforms [7], [8], [9]. These works focused on the segmentation of craters from other landforms using low-resolution DTM data obtained from the Mars Orbiter Laser Altimeter (MOLA) sensor. Machine learning algorithms, such as support vector machines, were used for the automated landform segmentation [6]. The increase in the amount and quality of the imaging data from Mars has led to a better understanding of Mars' surface. Deep learning algorithms have become widely used for the automatic detection, classification, and segmentation of landforms on the planet. Such algorithms were used both for impact [7], [8], [10], [11] and geomorphological (including aeolian) forms [12], [13], [14]. Deep learning techniques have been applied successfully to the most widely used imagery and elevation data sources collected from Mars: MOLA [7], high resolution stereo camera (HRSC) [10], context camera (CTX) [12], and HiRISE [21], [15]. Bickel et al. [16] used convolutional neural networks (CNNs) to automate rockfall mapping on Mars and the Moon. The NAVCAM installed on the Mars rovers was used both for navigation and scientific research. Maki et al. [17] described the camera's parameters and possible uses.
There are also works concerning the segmentation and detection of landforms based on images taken by the Martian rovers. Wagstaff et al. [18] proposed a neural network for detecting the content of images taken by the Curiosity rover to provide a content-based search of such images using a web interface.
There are several studies relating to the use of deep learning techniques to map geomorphological structures, which is also the primary goal of this article. Barret et al. [13] are currently using neural networks for the segmentation of geomorphological forms that are visible in HiRISE imagery, investigating the Oxia Planum and Mawrth Vallis areas. Another paper, similar to the current article, is also by Barret et al. [13], which is aimed at creating a product that will be helpful to planetary geomorphologists. Wilhelm et al. [15] introduced a dataset for machine learning solutions for the geomorphological analysis of Mars. However, these works did not focus only on aeolian forms, and did not consider additional data sources such as elevation models or images from the rovers.
Thus far, only a few studies, by Rothrock et al. [19] and Tao et al. [20], have used orbiter and in situ data in one pipeline. Their studies used HiRISE imagery for determining optimal landing site traversability for future rovers, and in situ data for wheel slip predictions. However, these two data sources were used separately. While some studies use DTMs in deep learning processes concerning Mars [10], there has been no work combining image data and elevation models in a deep learning pipeline for the semantic segmentation of Martian terrain. However, such an approach has been used on data from Earth [21]. Furthermore, none of the published works have attempted to use data from both the rover and the orbiter in a single deep learning segmentation pipeline. Such attempts relating to Earth have been successful using orbiter data and street view services for the building type and land-use classification of urban areas [22], [23], [24]. Efficient, state-of-the-art practices of semantic segmentation are described in [25], [26].
Combining the rover's NAVCAM imagery and the orbiter HiRISE data is a critical step in knowledge acquisition based on diverse and mutually complementary data. This issue was the subject of research by Li et al. [27], Di et al. [28], and Alexander et al. [29].
The quality of the analyses and image classification resulting from deep learning methods relies heavily on skillful source data processing. Our research used information from both the analysis of HiRISE orbiter data and images from the NAVCAM on the Opportunity rover. HiRISE image processing consists mainly of image orientation, followed by the generation of a terrain model from the images (or a stereo pair). Nonetheless, DTM generation from Mars images can be challenging, and many scientists are working on both DTM improvement and DTM evaluation, which can be a demanding task due to the lack of reference data for Mars (such as those provided by GNSS on Earth). It is possible to generate DTMs using a photogrammetric workflow. Various privately developed or commercially available approaches and software can generate DTMs. Kirk et al. [30] compared various software packages used for DTM generation: DLR HRSC Team Pipeline; SOCET SET; and Ames Stereo Pipeline (ASP). The most popular open-source software for DTM generation for Mars is ASP [31]. NASA's ASP is an open-source tool used for the stereo image processing of data acquired from orbiters around Earth or other planets [32]. ASP requires the installation of the Integrated Software for Imagers and Spectrometers (ISIS), a digital image processing software package developed by the United States Geological Survey for NASA. ASP enables DTMs, orthoimages, and 3-D models to be generated.
There are also articles in the literature about the comparison between Mars DTMs generated from different orbiter systems. Kirk et al. [33] calculated slope angles based on DTMs to assess the appropriateness of landing places for the Phoenix Mars Lander. Thus, it is possible to compare DTMs from the HiRISE images to DTMs from different systems, for example, HRSC or MOLA measurements.
Another prevalent task found in the scientific literature is the co-registration of multiresolution DTMs. HiRISE images, characterized by very high spatial resolution, do not cover Mars completely. Therefore, much of the research on DTMs for Mars relates to different approaches that demonstrate the possibilities of multiresolution DTM co-registration. Lin et al. [34] worked on the automated co-registration of MOLA, HRSC, and HiRISE DTMs using surface matching techniques. Wang and Wu [35] presented a different approach, in which they first co-registered the CTX and HiRISE images using characteristic points, and then the DTMs were co-registered based on the new position of the images.
The issue of the detection and analysis of Martian aeolian forms has also been the subject of research and publication by Bandeira et al. [36], Bandeira et al. [37], Carrera et al. [38], and Va et al. [39].

III. METHODOLOGY
The research aim of this study was twofold: to develop a methodology for using deep learning methods to automate the classification of geomorphological settings found in the MP region; and to perform a quantitative and qualitative analysis of the results. However, automating the classification of the image data requires that they be preprocessed. This is particularly vital in the case of Martian data, which lack the unambiguous spatial reference that GNSS systems provide for Earth. Therefore, the methodology developed in this work includes steps that georeference the HiRISE and NAVCAM image data by processing them into an orthophotomap, DTMs, and the derivatives of the elevation model developed using orbiter data (from HiRISE), and into the panoramas "seen" by the Opportunity rover (from NAVCAM). The methodology comprised the following stages.
1) The processing of a stereo pair of HiRISE images into a DTM and an orthophoto.
2) Obtaining DTM derivative models: curvature models, TRI, and TPI.
3) The processing of in situ images taken by the NAVCAM camera into a coherent panorama with an equalized tonal level and precisely defined spatial orientation.
4) The processing of the above-mentioned image data using the principal component analysis (PCA) method in order to reduce the dimensionality of the problem.
5) The combined processing of orbiter data (orthophotomap, derived DTM models) and in situ panoramas by a neural network with a well-defined architecture (VGG-16).
6) The cartographic visualization of the obtained classification results together with a comprehensive quantitative and qualitative evaluation.
Each of the stages required the development of original algorithms (or significant modification of existing methodological solutions) dedicated to processing image data of a specific type. For most tasks we also developed our own scripts (mainly in Python). Because no GNSS exists for Mars, the orientation, localization, and spatial referencing of the imaging data was an important problem. Solving this problem was an important methodological contribution to the analysis of spatial data that had no georeferencing.
Fig. 2 presents the general outline of the research methodology. The research employed two data sources for all subsequent computations: HiRISE orbiter imagery and the Opportunity rover's NAVCAM images. A digital orthophotomap with a high spatial resolution based on the HiRISE imagery and the digital elevation model of the area of interest, along with derived raster models (namely TPI, TRI, longitudinal, and cross-sectional curvatures) were created. Semantic features were extracted from Opportunity's images and interpolated onto the spatial domain. A CNN model capable of working with many data sources enabled all the data to be used during the semantic segmentation process.
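The DTM derivatives named above can be illustrated on a small elevation grid. The sketch below assumes common definitions that are not spelled out in this article: TPI as the cell elevation minus the mean of its eight neighbours, and TRI as the mean absolute elevation difference to those neighbours. The function names and the sample grid are hypothetical, and border cells are not handled.

```python
def neighbours(dtm, r, c):
    """Return the elevations of the eight cells surrounding interior cell (r, c)."""
    return [dtm[r + dr][c + dc]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]


def tpi(dtm, r, c):
    """Topographic position index: elevation minus the mean neighbour elevation (assumed definition)."""
    nb = neighbours(dtm, r, c)
    return dtm[r][c] - sum(nb) / len(nb)


def tri(dtm, r, c):
    """Terrain ruggedness index: mean absolute difference to the neighbours (assumed definition)."""
    nb = neighbours(dtm, r, c)
    return sum(abs(dtm[r][c] - z) for z in nb) / len(nb)


# Hypothetical 3 x 3 grid: a 1 m bump on flat ground.
bump = [[1.0, 1.0, 1.0],
        [1.0, 2.0, 1.0],
        [1.0, 1.0, 1.0]]
print(tpi(bump, 1, 1), tri(bump, 1, 1))
```

A positive TPI marks a cell that stands above its surroundings (for example a ripple crest), while TRI grows with local roughness regardless of sign.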
This chapter focuses on describing the particular stages of the research.

A. HiRISE Data Preprocessing
The HiRISE data were downloaded from the planetary data system (PDS) [40] in the form of HiRISE experiment data records (EDRs) files. Files for eight images were enough to produce four stereo pairs and to create DTMs that covered Opportunity's traverse. The following is a list of the stereo pairs:
The HiRISE data were processed using the NASA ASP (version v2.6.2) and ISIS (version 3.6.0) on Ubuntu 18.04 OS.
After downloading, the EDRs were combined into one image. In the next step, the common areas of each stereo pair of images (the so-called overlap) were selected automatically to prepare the images for further processing. Finally, point clouds from the stereo pairs were generated. From the point clouds, DTMs in the form of raster files were produced. Later, the HiRISE images were orthorectified using the DTMs to remove the influence of terrain height on the images. Following this, the four orthoimages and the DTMs were mosaicked. The DTM and orthoimage mosaics were aligned horizontally to the data from

B. NAVCAM Data Preprocessing
The NAVCAM camera was chosen for the in situ investigation (see Fig. 3). This instrument, designed to provide terrain context for other instruments, acquired a significant number of landscape-type images. It also had a larger field of view (FOV) than the panoramic camera. The navigational camera was a CCD stereoscopic instrument, with each camera having a 45° × 45° FOV and an angular resolution of 0.82 milliradians per pixel (mrad/pixel). Its depth of field ranged from 0.5 m to infinity. It was mounted on a mast 1.54 m above the Martian surface and had a stereo baseline separation of 20 cm [17]. Creating 360° panoramic views of the local terrain is possible after mosaicking NAVCAM images. The approach used in this article is a development and modification of the methodology proposed by Cao et al. [22]. The architecture developed by the authors of the current article has a reduced number of trainable parameters, which yielded better results for a relatively small dataset.
Also of key importance is the issue of selecting the learning data. The work of Cao et al. [22] did not take into account altitude data (DTM), which was a substantial part of the research for the current article.
As proposed by Cao et al. [22], to use the rover's images in a multisource deep learning process, it was necessary to transform them into panoramic images representing the in situ terrain around a given point. Opportunity's NAVCAM images, which are available on the NASA PDS at a resolution of 1024×1024, are organized by the Martian day (sol) on which each image was acquired. Additionally, the rover's traverse includes metadata that allow each image to be associated with the rover's location and the corresponding position of the camera. With this information, it is possible to create a spatially localized panorama for each image-taking site of the traverse. During the photo stitching process, histogram matching reduced the radiometric differences between the images. The final panorama image was created from the cylinder-projected original images. Areas in which two photos overlapped were merged by selecting every second pixel of each image and combining these into a mosaic (see Fig. 3). To ensure spatial consistency between panoramas, each began in a northern direction. The resolution of a single panorama was 2048×6992.
Since NAVCAM's primary purpose was to navigate the rover, multiple images were not taken at each point of the rover traverse. On average, 30% of each panorama was covered by data pixels (i.e., non-NaN pixels). There were 2905 panoramas created of the study area.

C. In Situ Semantic Feature Extraction
An algorithm proposed by Cao et al. [22] formed the basis for the workflow for extracting features from the rover's imagery data and interpolating them into the orbiter image domain. This process consisted of two main stages: feature extraction, and interpolation (see Fig. 4). In the first stage, a pretrained neural network extracted feature vectors from the input image. A deep CNN pretrained on the Places365 dataset [41] was used for this purpose. The Places365 dataset consists of 10 million scene photographs separated into three macro-classes: indoor; nature; and urban.
Each panorama image was divided into four equal parts, facing four different directions. Then, using the pretrained places-CNN network, a 512-dimensional feature vector was extracted for each part. The four vectors were concatenated into one 2048-dimensional vector representing each panorama location. Finally, PCA was used to reduce the dimensions of the feature vector to 50. Cao et al. [22] successfully used this number of representative features.
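The splitting and concatenation steps can be sketched as follows. This is only an illustration under stated assumptions: `dummy_extract` is a hypothetical stand-in for the pretrained CNN (it just repeats the mean pixel value 512 times), the panorama is a plain list of rows, 6992 is assumed to be the panorama width, and the subsequent PCA reduction to 50 components is not shown.

```python
def split_columns(width, parts=4):
    """Return (start, stop) column ranges that cut a panorama into equal parts.

    Trailing columns are dropped if width is not divisible by parts.
    """
    step = width // parts
    return [(i * step, (i + 1) * step) for i in range(parts)]


def dummy_extract(image, size=512):
    """Hypothetical placeholder for the pretrained CNN feature extractor."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [mean] * size


def panorama_descriptor(panorama, extract=dummy_extract, parts=4):
    """Concatenate the feature vectors of the equal-width parts of one panorama."""
    width = len(panorama[0])
    descriptor = []
    for start, stop in split_columns(width, parts):
        part = [row[start:stop] for row in panorama]
        descriptor.extend(extract(part))
    return descriptor


# Tiny 2 x 8 "panorama": four parts of two columns each.
panorama = [[0, 0, 10, 10, 20, 20, 30, 30],
            [0, 0, 10, 10, 20, 20, 30, 30]]
descriptor = panorama_descriptor(panorama)
print(len(descriptor))      # 4 parts x 512 features
print(split_columns(6992))  # column ranges for a 6992-pixel-wide panorama
```

Because every panorama starts facing north, the same slice of the concatenated vector always describes the same viewing direction.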
It was necessary to express the feature vectors representing the surroundings of Opportunity's traverse in the same domain as the orbiter data. The Nadaraya-Watson interpolation was used to interpolate features into the orbiter image domain. This method, which is a generalization of the inverse distance weighting method, can give Opportunity's closer panoramas higher importance and cut off, at a predetermined distance, panoramas that are further away, where the in situ features should not impact the decision process.
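A minimal sketch of Nadaraya-Watson interpolation with a distance cut-off is given below. The Gaussian kernel and the `bandwidth` value are assumptions made for illustration, since the kernel is not specified in the text; the 50 m default for `cutoff` follows the threshold reported in the implementation details. All function and parameter names are hypothetical.

```python
import math


def nadaraya_watson(query, sites, features, bandwidth=20.0, cutoff=50.0):
    """Kernel-weighted mean of the feature vectors observed at `sites`.

    query    -- (x, y) position of the target pixel, in metres
    sites    -- list of (x, y) panorama positions, in metres
    features -- list of feature vectors, one per site
    Returns None when no site lies within `cutoff` of `query`.
    """
    dim = len(features[0])
    acc = [0.0] * dim
    total = 0.0
    for (x, y), feat in zip(sites, features):
        d = math.hypot(x - query[0], y - query[1])
        if d > cutoff:
            continue  # panoramas beyond the cut-off do not contribute
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # assumed Gaussian kernel
        total += w
        for k in range(dim):
            acc[k] += w * feat[k]
    if total == 0.0:
        return None
    return [a / total for a in acc]


sites = [(0.0, 0.0), (20.0, 0.0), (500.0, 0.0)]
features = [[1.0, 0.0], [3.0, 4.0], [100.0, 100.0]]
midpoint = nadaraya_watson((10.0, 0.0), sites, features)
print(midpoint)
```

At the midpoint between the first two sites both receive equal weight and the distant third site is ignored, so the result is the plain average of the two nearby feature vectors.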

D. Semantic Segmentation Network
An encoder-decoder CNN architecture served as the base network for the multisource semantic segmentation of the aeolian relief on Mars. Such networks consist of two modules: an encoder and a decoder. The first one extracts the features from the input data, and the second one upsamples the extracted features to reconstruct the original shape of the data. This approach allows for the pixel-wise segmentation of input raster images. The use of convolutional layers enables the network to model the spatial features within the images, which is crucial for achieving good segmentation results (see Fig. 5).
It should be noted that the combined processing of image data from HiRISE and panoramas from the Opportunity rover required data preprocessing. The orthophotomap and DTM developed from the orbital images were registered in the Martian spatial reference system. The panorama developed from NAVCAM images was preprocessed based on the rover's location and camera parameters and the angles (horizontal and vertical) of the imaging direction. The image features were spatially interpolated.
The encoder part of the network was based on the VGG-16 architecture [42]. It consisted of five convolutional blocks. The first two blocks comprised two convolutional layers followed by batch normalization [43] and the rectified linear unit activation function [44]. The subsequent two blocks had three such layers. Each block was followed by a max-pooling layer that halved the spatial size of the output. Instead of a fifth block of VGG-16, two convolutional layers were used.
The decoder module was also based on VGG-16, but without the last block, and with upsampling layers (by repeating columns and rows) instead of max pooling. The decoder also reduced the number of convolutional filters: the encoder had 17 million trainable parameters, while the decoder consisted of 1.5 million such parameters. The modifications to the base network architecture (e.g., the reduction in the decoder's parameters and the change to the final layer of the encoder) were introduced because, when processing the spatial big data from Mars, the modified network performed better on test data than the classic deep neural network architecture used to process image data collected on Earth.
Finally, a single convolution layer with a spatial resolution equal to the input image finished the network. The softmax activation function [45] produced the probability that a given pixel would be affiliated with one of five classes. The class with the highest probability was assigned to each pixel. Fig. 5 shows an overview of the neural network model's architecture.
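The final per-pixel decision can be sketched as a softmax followed by an argmax. The snippet below is a generic illustration, not the authors' code; the five input scores are made-up values for a single pixel.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Return the index of the class with the highest softmax probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


pixel_scores = [0.1, 2.0, -1.0, 0.5, 0.0]  # hypothetical scores for five classes
print(softmax(pixel_scores))
print(predict_class(pixel_scores))
```

Since softmax is monotonic, the argmax of the probabilities equals the argmax of the raw scores; the probabilities matter mainly for the cross-entropy loss during training.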
We used two data fusion methods in the network architecture. The digital elevation model's data were stacked with the orbiter imagery in the main part of the network and analyzed in the same fashion as the spectral bands of an RGB image; thus the input consisted of three layers. Integrating the additional 50 in situ feature maps in the same way could significantly reduce the influence of the other bands on the semantic segmentation (assuming that the imagery data are both reliable and the main source of information, and that other data sources are supplementary). Taking this into account, we implemented the approach introduced by Cao et al. [22]. In this approach, an additional encoder extracts semantic features from the in situ feature maps, and the last layer of each of its convolutional blocks is fused with the main encoder through concatenation with the related feature maps generated from the orbiter data.

IV. EXPERIMENTS AND ANALYSIS
The research covered the section of MP over which the Opportunity rover had traveled (see Fig. 6). Due to the depth of the image obtained from the NAVCAM installed on the rover, an area limited spatially to a 100 m buffer around the rover's route was selected for analysis. First, the orbiter imagery and DTM products were cropped to the area of interest. The original data were also normalized to a value range of 1-255 (with 0 reserved for "no data" pixels). Elevation models have lower resolution than orbiter images, so the DTM products were resampled to match the resolution of the HiRISE images (27 cm/pixel) in such a way that each pixel of image data corresponded to information from the nearest pixel of the DTM products. Finally, test data along with labeling were divided into 256×256 georeferenced tiles. Information about the position of each tile was necessary to create matching interpolated terrain features. There were 7631 tiles created for the entire Opportunity traverse buffer area (see Fig. 6).
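The normalization step can be illustrated with a short sketch. It assumes simple linear min-max scaling, which is not stated in the text, and represents missing pixels as `None`; the function name is hypothetical.

```python
def normalize_band(values):
    """Rescale valid values linearly to 1..255 and map missing values (None) to 0."""
    valid = [v for v in values if v is not None]
    if not valid:
        return [0] * len(values)
    lo, hi = min(valid), max(valid)
    span = hi - lo
    out = []
    for v in values:
        if v is None:
            out.append(0)   # 0 is reserved for "no data"
        elif span == 0:
            out.append(1)   # constant band: every valid pixel gets the lowest code
        else:
            out.append(1 + round((v - lo) / span * 254))
    return out


elevations = [10.0, None, 20.0, 15.0]  # hypothetical DTM samples in metres
print(normalize_band(elevations))
```

Keeping 0 outside the valid range means a tile can carry its own no-data mask without a separate layer.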
The original NAVCAM image dataset contains 51 308 images. However, these are stereo images; thus, only the images from the right-side camera were used to create panoramas. Moreover, images with decayed resolution and those that did not spatially intersect with an area of interest were also eliminated. This resulted in 2904 panoramas to be used for feature extraction. The distribution of the panorama locations was not even: Fig. 6 visualizes this distribution.

A. Training and Test Data
The classification used in the labeling was created based on the two main terrain relief features found within the research area. The first was the surface type, either bedrock or loose sediments. The second feature concerned the occurrence of ripples, which may cover the underlying surface entirely, partially, or not at all. The resultant combinations of these two main features produced five potential labeling classes. Two of these classes were excluded from classification because, in the first case, no large surfaces were covered solely by bedrock, and second, loose sediments covered partially by ripples were challenging to differentiate from loose sediments (see Fig. 7).
The final classes used in the analysis consisted of: 1) ripple fields; 2) ripples on bedrock; 3) sand-gravel covers; and 4) others, which included craters and linear tectonic forms. Manually labeled vector data (divided into three classes: ripple fields, class 101; ripples on bedrock, class 102; and sand-gravel covers, class 104) served as the basis for training and testing the CNN model. The authors chose 11 areas for training purposes and 13 for testing. The process of labeling and processing the vector data utilized ArcGIS and QGIS software environments. A polygon with an associated class number represented an area corresponding to one class. Fig. 8 shows an example of a labeled testing area. All labeled areas were rasterized and tiled along with the corresponding orbiter DTM and interpolated in situ feature data. As a result, 523 tiles for each dataset were created for training and 108 for testing. Fig. 9 presents a pixel-wise summary of the training and test datasets (with the "other" class neglected).

B. Data Augmentation
Because the training dataset was not large enough, it had to be augmented artificially by creating extra samples using data augmentation techniques. Many implementations with limited datasets have employed such techniques successfully [46]. For each training dataset tile, five augmented tiles were created by applying the following transformations.
3) Random contrast change between −25% and 25%.
After data augmentation, the entire training dataset consisted of 3138 tiles per data source.
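The random contrast change can be sketched as follows. The exact definition of contrast used in the study is not given, so this example assumes that pixel deviations from the tile mean are scaled by a factor drawn uniformly from −25% to +25%; the function names are hypothetical and no-data handling is omitted.

```python
import random


def change_contrast(tile, factor):
    """Scale deviations from the tile mean by (1 + factor), clipped to 0..255."""
    pixels = [p for row in tile for p in row]
    mean = sum(pixels) / len(pixels)
    return [[min(255, max(0, round(mean + (p - mean) * (1 + factor)))) for p in row]
            for row in tile]


def random_contrast(tile, rng, limit=0.25):
    """Apply a contrast change drawn uniformly from [-limit, +limit]."""
    return change_contrast(tile, rng.uniform(-limit, limit))


rng = random.Random(0)            # fixed seed so the sketch is reproducible
tile = [[100, 200], [100, 200]]   # hypothetical 2 x 2 tile
augmented = random_contrast(tile, rng)

# Each of the 523 training tiles is kept and joined by five augmented copies.
tiles_after_augmentation = 523 * (1 + 5)
print(augmented, tiles_after_augmentation)
```

The tile count reproduces the figure reported above: 523 original tiles plus five augmented tiles each give 3138 tiles per data source.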

C. Experiments
The authors conducted six experiments (see Table I) to test the impact of data source combinations on the final quality of the model:
1) HiRISE images only: the input data consisted only of HiRISE images using a single encoder.
2) HiRISE images and TPI/TRI layers: TPI and TRI layers stacked with HiRISE images using a single encoder.
3) HiRISE images and curvature layers: longitudinal and cross-sectional curvature layers stacked with HiRISE images using a single encoder.
4) HiRISE images integrated with the in situ feature maps: a combination of the HiRISE images using the first encoder with features fused from the in situ feature maps generated using the second encoder.
5) HiRISE images and TPI/TRI layers integrated with the in situ feature maps: the same setup as in experiment 4, with additional TPI and TRI layers stacked with the HiRISE images.
6) HiRISE images and curvature layers integrated with the in situ feature maps: the same setup as in experiment 4, with additional longitudinal and cross-sectional curvature layers stacked with the HiRISE images.

D. Implementation and Hardware Details
All CNN models were implemented using the Keras library [47] on top of the TensorFlow framework [48]. The first four blocks of each encoder were initialized with the corresponding weights of the VGG-16 network trained on the ImageNet dataset [49]. Other convolutional layers of the network were initialized using the He initialization [50]. Each convolutional layer had L1 and L2 regularizations applied. Cross-entropy was used as the loss function, and stochastic gradient descent as the optimizer. The learning rate was initially set at 0.01 and then divided by ten after the 15th, 25th, and 35th epoch. The entire learning process lasted for 50 epochs. After every training epoch, data were randomly shuffled to ensure that the model learned independently of the sample order. The cut-off threshold for the significance of the in situ features was set to 50 m.
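The step-wise learning rate schedule described above can be written as a small function. This sketch assumes epochs are counted from 1 and that "after the 15th epoch" means the lower rate applies from epoch 16 onward; the function name is hypothetical, and in practice such a function could be passed to a framework's learning rate scheduler callback.

```python
def learning_rate(epoch, base=0.01, milestones=(15, 25, 35), factor=10.0):
    """Learning rate for a 1-indexed epoch, divided by `factor` after each milestone."""
    drops = sum(1 for m in milestones if epoch > m)
    return base / factor ** drops


# Rates over the 50-epoch run, printed at the points where the value changes.
for epoch in (1, 15, 16, 25, 26, 35, 36, 50):
    print(epoch, learning_rate(epoch))
```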
The CENAGIS infrastructure, a computing infrastructure for conducting spatial big data analyses created at the Faculty of Geodesy and Cartography of the Warsaw University of Technology in 2021, was used for model training and data preprocessing. Computations were conducted using a separate Docker [51] environment with access to 128 gigabytes of memory, 8 Intel Xeon Silver 4216 CPU cores, and 1 NVIDIA Tesla T4 GPU.

E. Evaluation Metrics
The authors used global and per-class evaluation metrics to assess the results: overall pixel accuracy, per-class pixel accuracy, per-class precision, per-class recall, and per-class F1 score. Additionally, a normalized confusion matrix was created for each experiment. The metrics are as follows.
1) Overall pixel accuracy

$$\text{Overall accuracy} = \frac{\operatorname{tr}(CM)}{N}$$

where tr is the trace of the matrix, $CM$ is the confusion matrix, and $N$ is the number of pixels in all classes.
2) Per-class accuracy
3) Per-class precision
4) Per-class recall
where $c$ represents the index of a given class in the confusion matrix, $n$ is the number of all classes, $CM_{ij}$ is the element in the $i$th row and the $j$th column of the confusion matrix, and (c) is the number of elements in class $c$ [for (2)-(4)].
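The metrics used in this section can be computed directly from a confusion matrix, as in the sketch below. It rests on assumptions that the text does not confirm: rows are taken as reference classes and columns as predicted classes, per-class accuracy is computed one-vs-rest as (TP + TN) / N, and the normalized confusion matrix is normalized by row. The function names and the example matrix are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def overall_accuracy(cm: Matrix) -> float:
    """Trace of the confusion matrix divided by the total pixel count."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def per_class_metrics(cm: Matrix, c: int) -> Dict[str, float]:
    """One-vs-rest metrics for class c (assumes rows = reference classes)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    reference_count = sum(cm[c])                       # pixels that truly belong to c
    predicted_count = sum(cm[i][c] for i in range(n))  # pixels predicted as c
    fn = reference_count - tp
    fp = predicted_count - tp
    tn = total - tp - fn - fp
    precision = tp / predicted_count if predicted_count else 0.0
    recall = tp / reference_count if reference_count else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def normalize_rows(cm: Matrix) -> List[List[float]]:
    """Divide each row by its sum so each reference class sums to 1."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in cm]


if __name__ == "__main__":
    # Made-up 3-class matrix, for illustration only (not data from the study).
    example = [[50, 5, 5], [10, 80, 10], [0, 10, 30]]
    print("overall accuracy:", overall_accuracy(example))
    for c in range(len(example)):
        print(f"class {c}:", per_class_metrics(example, c))
    print("row-normalized:", normalize_rows(example))
```

If the confusion matrix follows the opposite convention (rows as predictions), precision and recall swap roles, so the orientation should be checked before comparing values with Table II.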
The per-class F1 score takes both the precision and recall metrics into account

$$F1_c = \frac{2 \cdot \text{precision}_c \cdot \text{recall}_c}{\text{precision}_c + \text{recall}_c}$$

V. RESULTS AND DISCUSSION
The conducted experiments showed that using the HiRISE data together with deep learning methods produces over 94% accuracy in the automatic classification of aeolian relief in the MP region. It should be emphasized that, because of the volume of spatial data to be processed, analyzing the training and validation datasets is a lengthy process that requires specialized computing infrastructure based on graphics processors.
Using data fusion, i.e., the combined use of orbiter imaging and elevation data (DTM and its derivative TPI and TRI models), further improves classification accuracy. Using an even broader scope of data, i.e., the fusion of HiRISE orbiter data (image and elevation data) and in situ data (panoramas recorded by Opportunity's NAVCAM), produces the best results.
It is noteworthy that both the quantitative analysis (see Table II) and the qualitative analysis (see Fig. 10) were essential for evaluating the results that were obtained. The relatively small difference in accuracy of about 1 percentage point (95.94% for Set 5 and 94.90% for Set 2) does not fully reflect the quality of the final classification. The fusion of orbiter and in situ data produces a "smooth" image of individual geomorphological settings (see Fig. 10), whereas the results for the other sets of source data show sharper edges between feature classes.
When analyzing the results, one should also emphasize that they depend not only on the amount of source data and the methods used for their preprocessing, the neural network architecture, or the parameterization of the deep learning process, but also on the quality of the vector data used in the machine learning process. The preparation of a training data set requires that the data be divided into object classes (this approach employed three such classes), and a manual process is used for determining the individual divisions. The boundary between individual classes, for example, ripple fields and ripples on bedrock, is blurry and may differ depending on the observer's knowledge, experience, and intuition. The research used training data that was selected manually by two independent observers to minimize this problem. The same method was used to determine the validation and testing data.
It should also be noted that the specific character of Mars and its landscape plays an essential role in automating the spatial data classification process. The conducted experiments showed that using a deeper and more complex neural network architecture (SegNet [52]) did not produce better quantitative results, and resulted in a qualitatively worse semantic segmentation. The experiments also showed that, in the SegNet tests, using additional data sources (i.e., in situ images) did not improve the segmentation quality. This stems from the specific character of the aeolian landforms in the MP area. Obtaining these results (see Table III) required a series of numerical experiments applying many different modifications to the neural network architecture. The accuracy gain of the fused approach is about 1 percentage point, which is not a large improvement in the quantitative assessment; however, the difference is visible in the qualitative assessment of the results (see Fig. 10).
The main objective of the research was to test the impact of using combinations of different spatial data types on the performance of the network. The network used in this research was based on VGG-16, a known and well-tested architecture. However, in the future, it will be necessary to investigate the performance of the built system using another, more efficient architecture (such as EfficientNet [53] or CoAtNet [54]), which may translate into better results in less time. This task will require the development of an optimal methodology for connecting the intermediate layers of each encoder, taking into account the specifics of the network architecture used.
This issue requires further research, as the analysis and classification of aeolian features on Mars is of interest to many research teams [55].

VI. CONCLUSION
This article has shown that, for the analysis of multisource data that describes the surface of Mars, selecting the appropriate methodology and geoinformation tools is crucial. Because the HiRISE camera and the Opportunity rover have collected spatial big data over a number of years, the processing of this data requires machine learning methods that adopt a deep learning approach. Developing a methodology for data analysis and classification also requires defining the object classes distinguished in the automatic classification process. This article assumes that ripple fields, ripples on bedrock, and sand-gravel cover are intrinsic to the MP region. The differentiation of these object classes reflects the structure and distribution of the aeolian relief observed in this region of Mars. However, the developed approach is universal enough that, without loss of inference accuracy, it is possible to either generalize or refine the distinguished classes, or add new ones with morphological features that are characteristic of other areas of the planet.
It is important to note that, to obtain satisfactory scientific and cognitive classification results, the fusion of source data, their preprocessing, and the appropriate choice of deep neural network architecture are essential. Using both the image data from the orbiter and the data obtained in situ by the rover in the machine learning process improved our results. When analyzing the results, one should also note that using a broad range of source data and the derivatives of DTMs produces a "smooth" image made up of individual subdivisions, which is analogous to manual classification. Selecting the neural network architecture also plays a vital role in this process. The deep learning network models that we used increase the rate of correct classification and, like data fusion, contribute to the regularization of the shapes of individual features.
The results make it possible to develop a map of the dominant land cover types for the Opportunity rover's traverse; the developed methodology is also the first step toward developing a comprehensive, multilayer geomorphological map of Mars.
According to the analyses, there are complex, modular geomorphological features in many places that may be interpreted as multilayer formations. The development of the research methodology proposed in this article will enable the classification of modular features and serve as a basis for developing tools that automate the classification process of land cover types and individual objects such as ripples. An in-depth numerical analysis of the morphometric parameters of individual forms and the determination of their features, such as spacing, morphometry and morphology, and crest direction, constitutes an initial step towards inferring, from static data analysis (e.g., image data), the lengthy morphodynamic processes that create aeolian landforms.
This article analyzes classification capabilities based on a DTM developed from HiRISE data and panoramas obtained from the Opportunity rover. Further work will focus on the use of the 3-D models developed from the Opportunity rover's stereo images, as well as on comparing the interpretability of the terrain models obtained from orbiter data (HiRISE) and from the rover (Opportunity).
Further research in this area will apply the methodology we have developed to the data collected by the Perseverance rover's significantly greater number of cameras, and to the image data obtained by the Ingenuity drone. Using the data collected during these missions will enable the development of a digital elevation and terrain model with a resolution an order of magnitude greater than that available from the Opportunity rover, thus enabling machine learning methods to automate the classification of specific morphological features on Mars.

ACKNOWLEDGMENT
This project was funded by POB Research Centre Cybersecurity and Data Science of Warsaw University of Technology within the Excellence Initiative Program-Research University (ID-UB), and the Anthropocene Priority Research Area budget under the program "Excellence Initiative-Research University" at the Jagiellonian University.