Artisanal and Small-Scale Mine Detection in Semi-Desertic Areas by Improved U-Net

In this letter, we propose a deep learning (DL)-based approach that exploits open-source multispectral Sentinel-2 data and a small inventory to map artisanal and small-scale mining (ASM). The study area is in central northern Burkina Faso (Africa) and is characterized by a semi-desert environment that makes mapping challenging. In sub-Saharan Africa, ASM represents a source of subsistence for a significant number of individuals. However, because ASM activities are often illegal and uncontrolled, the materials employed in the excavation process are highly dangerous for the environment as well as for the lives of the people involved in the mining activities. One of the most important aspects regarding ASM is the record of their spatial location, which, at the moment, is missing in most African regions. The performance evaluation of two state-of-the-art DL architectures [U-Net and attention deep supervised multiscale U-Net (ADSMS U-Net)] is provided, along with an in-depth analysis of the predictions for both dry and rainy seasons. The ADSMS U-Net architecture yields generally more accurate predictions than the basic U-Net, allowing us to better discriminate ASM in such an environment. The findings show that the proposed approach can detect ASM in semi-desertic areas starting from a few samples, at a low cost in terms of both human and financial resources.


I. INTRODUCTION
ARTISANAL and small-scale mining (ASM) represents a source of subsistence for a significant number of households in sub-Saharan Africa [1]. Burkina Faso is the fastest-growing gold producer in Africa and currently the fifth largest on the continent [2]. Recent estimates indicate that around one million miners work in ASM in Burkina Faso, over two-thirds of whom are engaged in gold production, providing a livelihood to almost four to five million people [3]. Despite the significant contribution of the mining sector to the country's economic development, ASM triggers major environmental, social, and health issues, since many operations exist in the informal mining sector and rely on an unskilled workforce using rudimentary tools and techniques [4]. The most serious environmental impacts of ASM are connected to mercury emissions and surface waters, as well as soil degradation and siltation of aquatic bodies [5]. Mercury poses major health concerns, as do respiratory problems caused by breathing dust and tiny particles from blasting and drilling [4] and accidents caused by extraction operations carried out without any protection or specified safety standard. In terms of societal implications, land-use conflicts and terrorist attacks targeting ASM activities have become frequent in Burkina Faso [6]. Furthermore, precarious housing conditions and child labor are common at ASM sites. Given the often-illicit condition of ASM activities in Burkina Faso, their location is usually unknown. ASM inventories are required by government institutions as well as legal mining companies to efficiently plan and implement management strategies for ASM formalization. However, the availability of in situ datasets is highly limited throughout this area. The increased activity of armed groups and intercommunity conflicts impede field data collection; thus, there is an urgent need to create remote sensing-based technologies capable of monitoring and inventorying ASM activities.
Various studies have used pixel-based approaches to detect ASM and associated deforestation using Earth observation (EO) data [7], [8], [9]. The main issue with those models lay in their inability to correctly discriminate water from alluvial mine sites [9]. The absence of geographical context is a major flaw in pixel-based classifiers. Since each pixel is treated separately, a "speckled" appearance is observed in the output maps [10]. Recently, deep-learning (DL) methods, and, specifically, convolutional neural networks (CNNs), have shown promising results in a wide range of image processing applications [11]. CNNs can consider spatial relationships between target class pixels and their surroundings. However, despite the apparent potential, DL-based methods for ASM detection are still in their infancy. Few papers have been published in which machine learning (ML) methods are employed to map ASM. Gallwey et al. [12] employed U-Net and open-source Sentinel-2 multispectral imagery to perform multitemporal mapping of ASM and associated environmental changes in the tropical country of Ghana. Nyamekye et al. [13] used artificial neural network (ANN), random forest (RF), support vector machines (SVMs), and CNN on Sentinel-2 imagery to perform change analysis in the ASM landscape of the Birim Basin, Ghana. This brief literature review shows the intense and adverse impact that ASM has on the environment as well as on the human population. Moreover, it shows that very few studies investigate DL-based automated approaches for ASM detection, and none of them explores the very challenging semi-desertic environment. Given the abovementioned importance of ASM inventories, we employ an improved U-Net architecture, originally designed for skin lesion segmentation by Abraham and Khan [14], and we adapt it to automatically map ASM on multispectral Sentinel-2 data. A performance comparison with the well-known U-Net architecture is provided. Furthermore, both dry and rainy seasons are investigated to define the most suitable settings for ASM mapping purposes.

A. Study Area
The study area is in the Bam province of Burkina Faso's Centre-Nord region. This region is part of West Africa's Sudano-Sahelian zone. The climate is semiarid, characterized by a dry tropical climate with a short rainy season that goes from June-July to September-October and a long dry season. The area receives a mean annual rainfall in the range of 600-800 mm. Annual mean temperatures are usually high, July being the warmest month and January being the coldest, with temperatures in the range of 30°-36° and 20°-25°, respectively. The vegetation of the region is mainly composed of woody savanna and steppe [15] on sandy soils with low organic matter and nutrient content. Mean annual rainfall and soil nutrient content govern vegetation cover in the study area; consequently, vegetation growth occurs during the rainy season [16]. These characteristics make the presence of bare soil common in the study area during the dry season. Rain-fed subsistence agriculture is the primary economic activity [15]. However, the rapid development of the gold mining sector has dramatically transformed the economy of Burkina Faso [17], and many individuals have turned into what Lahiri-Dutt [18] defined as "extractive peasants," who combine agriculture and mining in subsistence settings. In fact, in the Centre-Nord region of Burkina Faso, 95% of households are engaged in agriculture and 75% of households report that the ASM sector is their principal source of income [2], [19].

B. Materials
Multispectral open-source Sentinel-2 level 2A (bottom of atmosphere) imagery is acquired from the Copernicus Access Hub [20]. Sentinel-2 images from the dry and rainy seasons are acquired for April 29, 2021 and July 28, 2021, respectively. For both acquisitions, cloud coverage is less than 10%. We chose these data with the goal of developing a low/zero cost mapping technique that can be used by both commercial companies and public institutions to identify and map ASM. We tried various combinations of Sentinel-2 bands and chose the red, green, blue (RGB), and near-infrared (NIR) bands as the best composite to detect ASM in this work. The ASM inventory, used as ground truth and for model validation, was manually digitized on Sentinel-2 imagery. We used Pleiades and Google Earth imagery as an aid to accurately discriminate ASMs [21]. Because the spatial extent of the ASM does not change during the wet season, we used the ground truth from the dry season for both dates. Many digitized ASM are not visible in Google Earth since the last high-resolution image available in the area was acquired in November 2019. The polygons are then converted to raster and resampled to the resolution of the optical data (10 m), using the "nearest" interpolation operator to preserve the binary mask values (0, 1).

A. Dataset Creation
We downloaded a large Sentinel-2 tile that covers the whole research region and created a ground truth mask with the same size and geographic coverage as the Sentinel-2 tile. The optical tile is composed of 3710 × 3836 pixels, which correspond to 14 231.56 km², and four bands. The pixel count in the study area shows that 99.96% of the pixels belong to the background class and 0.04% to mapped ASMs. The tile is divided into patches of 128 × 128 pixels with no overlap, for a total of 3028 image patches, of which we keep only the 96 that have at least one labeled ASM pixel; 57 patches without ASM are randomly selected and added to the dataset to provide the models with more background information. Out of those 153 patches (before augmentations), we use 80% for training, 10% for validation, and 10% for testing. We chose 128 × 128 as the best patch size since neither increasing the field of view to 256 × 256 nor decreasing it to 64 × 64 improved model accuracy. Finally, the pixel values of the images are normalized between 0 and 1 in relation to the minimum and maximum pixel values of each original patch.
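The dataset creation steps described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, edge pixels that do not fill a whole patch are assumed to be dropped, and the min-max scaling is assumed to use a single minimum and maximum per patch across all four bands (the text does not say whether scaling is per band).

```python
import numpy as np

PATCH = 128  # patch size reported in the text


def tile_image(image, mask, patch=PATCH):
    """Cut an (H, W, C) image and its (H, W) mask into non-overlapping patches.

    Assumption: rows/columns that do not fill a whole patch are discarded.
    """
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    img_patches, mask_patches = [], []
    for r in range(rows):
        for c in range(cols):
            win = (slice(r * patch, (r + 1) * patch),
                   slice(c * patch, (c + 1) * patch))
            img_patches.append(image[win])
            mask_patches.append(mask[win])
    return np.stack(img_patches), np.stack(mask_patches)


def select_patches(img_patches, mask_patches, n_background, rng):
    """Keep every patch with at least one ASM pixel plus random empty patches."""
    has_asm = mask_patches.reshape(len(mask_patches), -1).any(axis=1)
    pos = np.flatnonzero(has_asm)
    neg = np.flatnonzero(~has_asm)
    neg = rng.choice(neg, size=min(n_background, len(neg)), replace=False)
    keep = np.concatenate([pos, neg])
    return img_patches[keep], mask_patches[keep]


def minmax_per_patch(img_patches, eps=1e-8):
    """Scale each patch to [0, 1] with its own minimum and maximum."""
    lo = img_patches.min(axis=(1, 2, 3), keepdims=True)
    hi = img_patches.max(axis=(1, 2, 3), keepdims=True)
    return (img_patches - lo) / (hi - lo + eps)


def split_indices(n, rng, train=0.8, val=0.1):
    """Shuffle n patch indices and split them into train/validation/test."""
    idx = rng.permutation(n)
    n_train, n_val = int(round(train * n)), int(round(val * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With the 153 patches reported above, `split_indices(153, rng)` under this rounding convention gives 122 training, 15 validation, and 16 test patches; the exact counts used in the letter are not stated.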

B. U-Net
U-Net was first employed in biomedical image segmentation [22] and has been used in a range of semantic segmentation applications, generally yielding excellent results [23]. A contracting path (encoder) captures low-level representations, while an upsampling path (decoder) captures high-level ones. The encoding path is comparable to a conventional CNN structure, consisting of sequential convolution blocks. Each block has two convolutional layers with a kernel size of 3 × 3, each activated by the rectified linear unit (ReLU) activation function [24]. In the encoder path, a 2 × 2 max-pooling layer is placed at the end of each convolutional block to perform downsampling, while in the decoding path, it is replaced by a 2 × 2 upsampling layer. Skip connections between the encoding and decoding convolutional blocks aid in the recovery of fine-grained features in the prediction.
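The resampling and skip-connection bookkeeping described above can be illustrated with a few NumPy helpers. This sketch deliberately leaves out the 3 × 3 convolutions and only shows how feature-map sizes change; the function names are hypothetical, and nearest-neighbour upsampling is an assumption (the text only states that a 2 × 2 upsampling layer is used).

```python
import numpy as np


def max_pool_2x2(x):
    """2 x 2 max pooling on an (H, W, C) array with even H and W."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def upsample_2x2(x):
    """2 x 2 nearest-neighbour upsampling on an (H, W, C) array (assumed mode)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def skip_concat(decoder_feat, encoder_feat):
    """Skip connection: stack decoder and encoder features along the channel axis."""
    return np.concatenate([decoder_feat, encoder_feat], axis=-1)
```

For a 128 × 128 input patch, one pooling step yields a 64 × 64 map, and upsampling it restores 128 × 128 so that it can be concatenated with the encoder features of the same level.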

C. Attention Deep Supervised Multiscale Regional U-Net (ADSMS U-Net)
The conventional U-Net structure presents some limitations, especially when dealing with highly imbalanced data and small targets, as in this case. To mitigate these issues, we adopt the U-Net-like architecture designed by Abraham and Khan [14], which improves the balance between precision and recall in the identification of small targets. The ADSMS U-Net architecture (see Fig. 1) is based on the U-Net described above, which is known for working well in data scarcity contexts. However, cascading convolutions may cause false detections for small objects that show large shape variability [25]. Soft attention gates (AGs) are used to select relevant spatial features from low-level maps [14] and, therefore, mitigate the issue. Finally, because different kinds of class information are more easily accessible at different scales, an input image pyramid is injected before each max-pooling layer of the encoder. This technique, when combined with deep supervision, increases segmentation accuracy for datasets in which small features might get lost in the encoding convolutions. Deep supervision acts as a powerful "regularization" when training data are limited and networks are relatively shallow.
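To make the role of the soft attention gates more concrete, the following NumPy sketch implements a simplified additive attention gate. It is an illustration under stated assumptions, not the architecture of [14]: the gating signal `g` is assumed to be already resampled to the spatial size of the skip features `x`, the 1 × 1 convolutions are written as per-pixel matrix products without biases, and the weight matrices `w_x`, `w_g`, and `psi` are hypothetical placeholders for learned parameters.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(x, g, w_x, w_g, psi):
    """Simplified additive soft attention gate.

    x   : (H, W, Cx) skip-connection features from the encoder
    g   : (H, W, Cg) gating signal from the decoder (same H, W assumed)
    w_x : (Cx, Ci), w_g : (Cg, Ci), psi : (Ci, 1) 1 x 1 convolution weights
    Returns the gated features and the attention map alpha in [0, 1].
    """
    q = np.maximum(x @ w_x + g @ w_g, 0.0)  # ReLU of the summed projections
    alpha = sigmoid(q @ psi)                # (H, W, 1) per-pixel weights
    return x * alpha, alpha
```

Because `alpha` lies between 0 and 1, the gate can only attenuate skip features, which is how irrelevant background regions are suppressed before they reach the decoder.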

D. Supervised Classification
Semantic segmentation describes the relationship between pixels and class labels, here ASM and background. We use two encoder-decoder network structures to classify each pixel and build a final output patch with the same size as the input one. The ASM detection task is handled as a binary classification problem, using the classes ASM and background. As stated above, two different CNN architectures are employed in this study, U-Net and ADSMS U-Net. Regularization strategies were employed to minimize overfitting due to the small training set. We choose vertical and horizontal random flip, random zoom, and random shear as image augmenters, considering that flipping a satellite ASM image results in new ASM images exhibiting realistic shapes and orientations. Moreover, the focal Tversky loss ($FTL_c$) function [14] is used to mitigate the strong imbalance between ASM and background classes. Focal loss functions have proved highly capable of balancing the tradeoff between precision and recall when training on small targets [26]. The focal Tversky loss nonlinearly concentrates training on hard cases with a Tversky similarity index ($TI_c$) (1) of less than 0.5, while excluding easy examples from the function

$$TI_c = \frac{\sum_{i} p_{ic}\, g_{ic}}{\sum_{i} p_{ic}\, g_{ic} + \alpha \sum_{i} p_{i\bar{c}}\, g_{ic} + \beta \sum_{i} p_{ic}\, g_{i\bar{c}}} \tag{1}$$

where $p_{ic}$ denotes the likelihood that pixel $i$ belongs to the ASM class $c$, while $p_{i\bar{c}}$ denotes the likelihood that it belongs to the non-ASM class $\bar{c}$. The same holds for the ground truth values $g_{ic}$ and $g_{i\bar{c}}$, respectively. In the case of a substantial class imbalance, the $\alpha$ and $\beta$ weights can be modified to increase the recall. Finally, the $FTL_c$ function can be defined as

$$FTL_c = \sum_{c} \left(1 - TI_c\right)^{1/\gamma} \tag{2}$$

where $\gamma$ ranges between 1 and 3. To minimize the loss function, we used a stochastic gradient descent strategy based on an adaptive estimation of first- and second-order moments (Adam). The Adam optimizer is known for being beneficial in applications where data are noisy and/or gradients are sparse [27].
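As an illustration of how the focal Tversky loss behaves in the binary ASM/background case, the sketch below evaluates it with NumPy for a single class. It is not the training code used in this work: the default values `alpha=0.7`, `beta=0.3`, and `gamma=4/3` are assumptions (the values selected here are not reported), and the small `eps` term is added only for numerical stability.

```python
import numpy as np


def tversky_index(p, g, alpha=0.7, beta=0.3, eps=1e-7):
    """Tversky index for the ASM class.

    p : predicted ASM probabilities, g : binary ground truth (same shape).
    alpha weights false negatives, beta weights false positives.
    """
    p = np.asarray(p, dtype=float).ravel()
    g = np.asarray(g, dtype=float).ravel()
    tp = np.sum(p * g)            # ASM predicted as ASM
    fn = np.sum((1.0 - p) * g)    # ASM predicted as background
    fp = np.sum(p * (1.0 - g))    # background predicted as ASM
    return (tp + eps) / (tp + alpha * fn + beta * fp + eps)


def focal_tversky_loss(p, g, alpha=0.7, beta=0.3, gamma=4 / 3):
    """Focal Tversky loss (1 - TI)^(1/gamma) for a single class."""
    ti = tversky_index(p, g, alpha, beta)
    return max(1.0 - ti, 0.0) ** (1.0 / gamma)
```

With `alpha > beta`, missing an ASM pixel costs more than flagging a background pixel, which is the mechanism by which the weights can be tuned toward higher recall.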
A loop-based hyperparameter tuning strategy was applied to determine the best combination. We iteratively trained the model over a set of hyperparameter combinations: the number of filters (2, 4, 8, 16, and 32), the batch size (2, 4, 8, 16, and 32), and the learning rate (10e-4, 5e-4, 10e-5, and 5e-5). Each model is trained for up to 250 epochs, and a model checkpoint strategy is used to save the weights associated with the lowest validation loss. Automated early stopping with a patience of 35 epochs is used to reduce the training/tuning time.
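The tuning loop described above can be sketched with the standard library alone. This is a schematic outline under stated assumptions: `train_fn` is a hypothetical callable that yields one validation loss per epoch for a given configuration, the learning rates are copied as printed in the text (note that Python reads `10e-4` as 0.001), and the stopping rule mirrors the usual patience convention of halting after 35 epochs without improvement.

```python
from itertools import islice, product

FILTERS = [2, 4, 8, 16, 32]
BATCH_SIZES = [2, 4, 8, 16, 32]
LEARNING_RATES = [10e-4, 5e-4, 10e-5, 5e-5]  # as printed in the text
MAX_EPOCHS, PATIENCE = 250, 35


def train_with_early_stopping(epoch_losses, patience=PATIENCE, max_epochs=MAX_EPOCHS):
    """Consume per-epoch validation losses and stop when they stall.

    Returns (best_loss, best_epoch, epochs_run). The epoch of the best loss
    is where a checkpoint of the weights would be saved.
    """
    best_loss, best_epoch, epochs_run = float("inf"), -1, 0
    for epoch, loss in enumerate(islice(epoch_losses, max_epochs)):
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_loss, best_epoch, epochs_run


def grid_search(train_fn):
    """Try every combination and return the one with the lowest validation loss."""
    results = []
    for filters, batch, lr in product(FILTERS, BATCH_SIZES, LEARNING_RATES):
        best_loss, best_epoch, _ = train_with_early_stopping(train_fn(filters, batch, lr))
        results.append((best_loss, filters, batch, lr, best_epoch))
    return min(results)
```

The three lists define 5 × 5 × 4 = 100 configurations, which is why early stopping matters for keeping the total tuning time manageable on a laptop.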
The trained models are used to predict classes on an unseen test dataset. Predictions are validated against the manual inventory described previously. The model output has two classes (ASM and background), and the precision (3), recall (4), F1-score (5), and intersection over union (IOU) score (6) are determined using the following formulas, where TP, TN, FP, and FN stand for true positives, true negatives, false positives, and false negatives, respectively:

$$\text{Precision: } p = \frac{TP}{TP + FP} \tag{3}$$

$$\text{Recall: } r = \frac{TP}{TP + FN} \tag{4}$$

$$\text{F1-score: } f = \frac{2 \cdot p \cdot r}{p + r} \tag{5}$$

$$\text{IOU-score: } i = \frac{TP}{TP + FP + FN} \tag{6}$$
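The four scores can be computed directly from a predicted and a reference binary mask. The sketch below is a minimal NumPy version with hypothetical function names; returning 0.0 when a denominator is zero is an assumption made here to avoid division errors on empty patches.

```python
import numpy as np


def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN between two binary masks of equal shape."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, fn, tn


def segmentation_scores(tp, fp, fn):
    """Precision, recall, F1-score, and IOU for the ASM class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}
```

None of the four scores uses TN, which keeps them informative when the background class dominates. F1 and IOU are also tied by IOU = F1 / (2 − F1), so an F1-score of 64.49% corresponds to an IOU of about 47.59%.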

IV. RESULTS AND DISCUSSION
We present and discuss the findings of using U-Net and ADSMS U-Net models to detect ASM on Sentinel-2 multispectral imagery (see Fig. 2). Segmentation results are presented for both rainy and dry seasons (see Table I). Hyperparameters are tuned for each model to produce the best possible outcomes. As stated earlier, the goal of this study is to investigate and analyze the feasibility of implementing a DL-based ASM mapping approach in a semi-desertic area using open-source data and a small training dataset.
During the tuning phase, all models trained in the area showed a strong tendency to overpredict the ASM class, with FPs in correspondence with urban areas, bare soil, and water bodies. We found the first explanation for this issue in the relation between the 10-m spatial resolution of Sentinel-2 imagery and the dimensions and intrinsic characteristics of ASM. ASM have small dimensions, ranging from 68.46 to 207 867.85 m², with a frequency distribution that shows 300 mines out of 342 between 68 and 10 000 m². The vertical excavation is typically 1 m; however, the spectral reflectance of the mines varies across the area. We discarded the idea of categorizing ASM into distinct groups based on soil reflectance since the inventory was too small. Finally, as stated above, the study area is a challenging environment with evident high spatial variability. Nevertheless, model tuning reduced the FPs in the final results. Generally, the models trained on the dry season yielded more accurate results than the ones calibrated on the rainy season. In the latter, we noticed that some ASM are flooded, and their spectral signatures are related to muddy water. Moreover, small artificial reservoirs are quite common in the study area due to the scarcity of freshwater during the dry season. An analysis of the predictions indicates that, in the rainy season, most of the FPs are related to small artificial reservoirs, while most of the FNs concern the flooded ASMs. ADSMS U-Net outperformed the standard U-Net model in both seasons, achieving the best result in the dry period, with the highest precision (72.30%), F1-score (64.49%), and IOU (47.59%). The explanation for this result can be found in the model architecture, along with the focal Tversky loss function, which has been designed to tackle highly imbalanced segmentation tasks. However, the classical U-Net architecture trained with the same loss function achieves competitive results, showing in the best scenario (dry season) an F1-score of 62.06%.
The current workflow is built in Python, with ArcMap handling GIS processing and TensorFlow [28] handling DL algorithms and functions. All experiments were executed on a macOS laptop with a 2.2-GHz six-core Intel Core i7 processor, a 256-GB SSD, and 16 GB of RAM.

V. CONCLUSION
In this letter, we introduce an approach to map ASM in a challenging environment, and we evaluate, compare, and discuss the prediction performance of the proposed architecture against that of the well-known U-Net. Moreover, we investigate mapping performance in both rainy and dry seasons from the perspective of two state-of-the-art segmentation models. We deliberately chose a challenging area for ASM mapping to create an adaptable method that is reliable in extreme detection conditions. At the same time, the approach may be easily scaled to less challenging environments, such as highly vegetated areas. This research presents one of the first published multispectral CNN models for ASM mapping and the first CNN-based mapping of ASM in the challenging semi-desertic environment. Furthermore, to the best of our knowledge, this is the first study in which the ADSMS U-Net model is used in combination with EO data. The findings show that the proposed approach can detect ASM in semi-desertic areas starting from a few training samples, at low cost in terms of both human and financial resources, and may help to design and implement management strategies for ASM formalization as well as to plan environmental remediation interventions.