A Meta-Analysis of Convolutional Neural Networks for Remote Sensing Applications

Since the rise of deep learning in the past few years, convolutional neural networks (CNNs) have quickly found their place within the remote sensing (RS) community. As a result, the community has transitioned away from other machine learning techniques, and CNNs have achieved unprecedented improvements in many specific RS applications. This article presents a meta-analysis of 416 peer-reviewed journal articles and summarizes CNN advancements and their current status in RS applications. The review process includes a statistical and descriptive analysis of a database comprising 23 fields, including: 1) general characteristics, such as applications, study objectives, sensors, and data types, and 2) algorithm specifications, such as types of CNN models, parameter settings, and reported accuracies. This review begins with a comprehensive survey of the relevant articles, without applying any specific criteria, to give readers an idea of general trends, and then investigates CNNs within different RS applications to provide specific directions for researchers. Finally, a conclusion summarizes the potential, critical issues, and challenges related to the observed trends.

the rising diversity of objectives capable of being resolved, the use of ML in RS applications is expected to increase [4].
ML methods in the context of RS cover a vast range of applications, including land use and land cover (LULC) classification, change detection, object detection, and feature selection and extraction [2]. LULC, in particular, benefits from advancements in ML methods [5]. The growing number of satellite platforms with various revisit times has increased the ability to accurately capture natural and human-made changes to the Earth's surface [6], and ML methods have increasingly been used to address related change detection problems [7]. Similarly, the development of high spatial resolution instruments installed on airborne and spaceborne platforms has resulted in an increase in applications of ML for special object detection [8]. ML also plays a vital role in dimensionality reduction [9] of hyperspectral images composed of many essential features for several scientific applications [10]. Other roles ML plays in RS applications include spectral unmixing, regression, and image fusion [1]. Within each category, several ML algorithms have been introduced based on sensor type, study objective, ancillary data, and limitations such as spatial resolution and training sample size [11]. The applicability and effectiveness of these algorithms have been demonstrated in many geoscience and RS tasks [12], [13].
The most commonly used ML algorithms in RS are artificial neural networks (ANN), support vector machines (SVM), decision trees (DT), and ensemble methods, such as random forest (RF) [14]. Each of these methods carries specific advantages. For example, SVM can best tackle high-dimensionality problems and limited training data [15], while RF does not require the fine-tuning of a large number of hyperparameters and can easily be used for both simple and complex computations [16], [17]. These two methods share the advantage of lower computational complexity and higher interpretability capabilities [14].
Over the past few years, however, there has been an ongoing shift toward using deep learning (DL) methods in ML applications [18]. DL, which is characterized by neural networks (NN), is the fastest-growing trend in big RS data analysis and is regarded as a breakthrough technology [19]-[22]. DL has been used in many areas of research, such as speech recognition [23], stereo vision [24], biomedicine [25], time-series analysis [26], agriculture [27], and medical image recognition [28]. Although DL has the disadvantages of: 1) being a "black box" by nature, which limits its interpretability, and 2) requiring greater amounts of training samples compared to other ML methods, it has become a hotspot in the realm of ML and has been adopted by many researchers in the geoscience and RS community [29].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To date, several DL architectures have been introduced, of which the stacked autoencoder (SAE), convolutional NN (CNN), generative adversarial network (GAN), deep belief network (DBN), and recurrent NN (RNN) have become mainstream [30]. Of these DL networks, CNN is the most popular and the most published [31]. Along with the development of DL methods, CNNs have emerged as an incredibly powerful tool by providing both remarkable performances in image processing and the ability to work in a wide variety of applications in the vision community [32]. In the past few years, biologically inspired CNNs have emerged and proven to be effective in a diverse range of fields to which image processing is fundamental, from social media [33] to precision medicine [34] and robotics [35].
A particularly beneficial characteristic of CNNs is data processing in multiple arrays and automatic feature extraction ability, which has received acknowledgment in the RS community [18], [36]. Moreover, the inherent characteristics of CNNs, such as local connectivity and weight sharing, allow this DL method to tackle the drawbacks of artificial feature extraction by considering the 2-D structures and reducing network parameters using convolutional filters [32]. CNN-based approaches have benefited from the recent exponential increase in RS technologies that includes various image types (optical, RADAR, temperature and microwave radiometer, altimeter, etc.) with complex characteristics (high dimensionality, multiple scales, and nonstationary) [37].
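The effect of local connectivity and weight sharing on the number of network parameters can be illustrated with a short calculation. The following is a minimal sketch; the patch size (64 x 64 pixels, 4 bands), the 3 x 3 kernel, and the 32 output units/filters are hypothetical values chosen for illustration and are not taken from the reviewed studies.

```python
def dense_params(height, width, channels, units):
    """Weights plus biases of a fully connected layer on a flattened patch."""
    return height * width * channels * units + units


def conv_params(kernel, channels, filters):
    """Weights plus biases of a 2-D convolutional layer.

    Each filter is shared across all image positions, so the count
    does not depend on the image height or width.
    """
    return kernel * kernel * channels * filters + filters


# Hypothetical setup: 64 x 64 patch, 4 spectral bands, 32 units/filters.
dense = dense_params(64, 64, 4, 32)
conv = conv_params(3, 4, 32)
print(dense, conv)  # 524320 1184
```

Under these assumed sizes, the convolutional layer needs several hundred times fewer parameters than the fully connected one, which is the reduction the paragraph above attributes to convolutional filters.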
CNNs are composed of a set of blocks that make them particularly suitable for image analysis. The multiple layers of operations, such as convolution, pooling, and nonlinear activation functions, allow for the hierarchical extraction of high-level abstract features [31], [38], [39]. Therefore, CNNs have been successfully used in image preprocessing, scene classification, pixel-based classification and image segmentation, and object detection [40]-[44]. For example, CNNs have been used in numerous studies to improve image classification results [36], [46]-[48], to extract buildings and nonbuilding regions automatically [49], and to detect built-up areas [50]. Scarpa et al. [51] proposed and analyzed CNN-based methods to estimate spectral features when optical data are missing. In another example, a CNN regression was proposed to develop a model applicable to hyperspectral imagery for estimating concentrations of phycocyanin and chlorophyll-a [52]. CNNs have also been used in OpenStreetMap data quality assessment [53], oil spill segmentation [54], ship position detection and direction prediction [55], multimodal RS image registration [56], road extraction [57], and many other areas of study [58]-[60].
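The three operations named above (convolution, nonlinear activation, and pooling) can be sketched in a few lines of plain Python. This is an illustrative toy, not an implementation from any of the reviewed studies: the 5 x 5 single-band image, the 2 x 2 edge kernel, ReLU as the activation, and 2 x 2 max pooling are all assumptions made for the example.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2 x 2 max pooling; a trailing odd row/column is dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


# Hypothetical image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical kernel that responds to dark-to-bright horizontal transitions.
kernel = [[-1, 1],
          [-1, 1]]

features = max_pool2x2(relu(conv2d(image, kernel)))
print(features)  # [[2, 0], [2, 0]]
```

The pooled map keeps a strong response only where the edge lies, which is a small-scale picture of how stacked layers turn raw pixels into more abstract features.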
Several related review papers have been published owing to the strong performance of DL methods relative to other state-of-the-art methods in RS. For example, a review by Ball et al. [31] focused on the theories, tools, and challenges of using DL algorithms in the RS community. In other reviews, Zhang et al. [45] and Zhu et al. [19] summarized recent advances in DL methods and discussed related challenges in RS applications. Following the explosive growth of new DL methods in different RS applications and their striking achievements, some review papers have focused on task-based reports of DL methods [61]-[65]. Liu et al. [66] presented a systematic review of the application of DL techniques in the field of pixel-based image fusion. A recent comprehensive review by Tsagkatakis et al. [30] addressed RS image enhancement, including super-resolution, denoising, restoration, pan-sharpening, and fusion.
Most of the existing review manuscripts covering major DL concepts related to RS applications consider all DL architectures. An early effort to review CNNs in general was made by Rawat and Wang [67], who focused on the application of CNNs in image classification tasks and discussed their rapid advancement in recent years and the contribution CNNs had made to DL developments. Many other articles provide readers with a summary of the basic concepts of CNNs in different applications such as radiology [68], biology [69], and action recognition [70].
The majority of the CNN review papers are descriptive, often with no quantitative assessment, and tend to focus on applications other than RS. Accordingly, this study's main objective is to describe and discuss the RS-based applications of CNN through a meta-analysis of published papers and to provide RS experts with a "big picture" summary of current research in this field. As a whole, the contributions of this article are: 1) whereas almost all the review papers in RS applications cover all related DL network structures, this article reviews the publications dedicated to the use of CNNs for RS applications alone and summarizes hotspots based on the paper frequency and accuracy. Moreover, it discusses trends and specific setups for different subtasks, and 2) this review defines a complete framework for professionals and even nonexperts, outlining ongoing research and the architectures that receive the most attention in each application.
To fulfill the proposed meta-analysis task and to construct a database of case studies, more than 400 peer-reviewed articles are reviewed, and many other papers are cited. To create a context for what follows, we first summarize, in Section II, the systematic literature search, which followed the preferred reporting items for systematic reviews and meta-analyses (PRISMA). After presenting the general characteristics of CNNs in Section III, the application of CNNs to different study objectives is discussed in Section IV. Finally, in Section V, concluding remarks are presented.

II. METHODS
A systematic literature search was performed using the Web of Science (WoS) to identify relevant articles for this comprehensive review. The WoS is one of the largest bibliographic databases, covering scholarly literature from virtually every discipline. Notably, the PRISMA methodology was followed for study selection [71]. After some trials, a title/abstract/keyword search was performed in WoS using the following search query: "convolutional neural network*" OR "CNN" OR "FCNN" OR "fully convolutional neural network*" OR "deep learning" for the title and "Remote sensing" for abstract/keyword, checking to include papers that used data from the most common RS platforms (search date: June 12, 2020). This search returned 1038 papers, which served as the basis for further paper surveys.
Of the 1038 studies initially identified, 664 were published in peer-reviewed journals, and the remainder were mostly proceedings papers. After a detailed investigation and eligibility assessment of the 664 journal papers, 248 were determined to be unrelated to this meta-analysis and were removed. The journals with more than five papers are listed in Table I.
The 416 eligible papers were included in the meta-analysis and described using the following 23 fields: title, year, journal, citation, first author's institution's country, RS data, study type, application, CNN model, processing unit, training sample, area of the data, spatial resolution, geographical coverage, framework, learning strategy, number of layers, dataset, CPU/GPU run, processing time, convolutional kernel dimension, accuracy, and accuracy metric. A summary of the literature search is shown in Fig. 1.
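The screening counts reported in this section can be checked with simple arithmetic. The short sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
identified = 1038        # records returned by the WoS query
journal_articles = 664   # peer-reviewed journal papers among them
excluded = 248           # journal papers judged unrelated after eligibility assessment

non_journal = identified - journal_articles   # mostly proceedings papers
included = journal_articles - excluded        # papers entering the meta-analysis

print(non_journal, included)  # 374 416
```

The result of the second subtraction matches the 416 eligible papers on which the database is built.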

A. General Characteristics of Studies
There has been a steep upward trend in the use of CNNs from 2014, the point at which the first RS-related application of CNNs was introduced [72] (see Fig. 2). The exponential trend of annual publication frequency peaks in 2019 and includes more than one-third of the database articles. The expansion in the use of CNNs continued in the current year (2020). In the first half of 2020, the number of papers published exceeds the number of published papers for the equivalent period in 2019.
Studies were conducted in 34 different countries and regions across six continents, the majority of which are based in Asia (72%), Europe (15%), and North America (9%). Among the different countries, most of the contributions came from China, with about 63% of total studies, followed by the USA (7%) and Germany (4%). Further analysis revealed that only six countries had published more than ten papers and 13 countries contributed only one paper.
Fig. 3 shows the usage frequency of CNNs across a wide range of applications. Of the 416 reviewed articles, LULC was the most frequent application (with about 155 studies), followed by object detection, scene classification, and urban studies with 68, 32, and 25 studies, respectively. The remaining articles include specific applications in crop- (25), disaster- (14), cloud- (12), tree- (10), forestry- (6), and water- (5) related research. Some other applications also benefited from CNNs, including sea ice, agriculture, and wetland mapping; these are clustered in Fig. 3 as "other" because each had fewer than five associated articles.

B. Sensors and Data Types
The first published RS-related work using CNNs appeared in 2014 and addressed vehicle detection using multispectral satellite images [72]. Since then, CNNs have been applied in research using numerous RS data types (see Fig. 4). The largest share of this research has used multispectral satellite images, making up 51% of the database, mainly Landsat archives, Worldview-4, and Quickbird-2 imagery. Since early 2016, CNNs have been used for hyperspectral data analysis in about 12% of the studies. Through the development of CNN architectures and their achievements in research using different types of data, by 2017, CNNs were increasingly used to analyze other data types, including aerial (16%), unmanned aerial vehicle (UAV) (6%), RADAR (5%), and light detection and ranging (LiDAR) (2%). Almost 7% of the studies have also used a combination of different data types.

C. CNN Frameworks and Models
The DL community's framework and library development is highly dynamic and offers different possibilities to speed up the training process with interactive interfaces [73]. The most used frameworks and libraries and their annual publication frequency in CNN studies were also summarized graphically. A detailed search through the literature revealed that TensorFlow is the most widely used CNN implementation framework, accounting for about 43% of total usage. TensorFlow is a free and open-source end-to-end DL framework for numerical computations using data-flow graphs [74]. It is designed to be highly portable, running on platforms of various scales, from a single CPU to a GPU or a GPU cluster [75].
Built on top of TensorFlow, the Keras library has been employed for CNN implementation as well. Keras, which supports almost all models of CNNs, was executed on both CPU and GPU. The second most used framework is Caffe (22% of studies). Caffe is well suited for machine vision and forecasting applications and permits networks with sophisticated configurations [76]. Caffe's specific properties include suitability for image processing tasks with CNNs, accessibility to pretrained networks, and easy coding in Python and MATLAB [73], [77]. MATLAB, the third most used platform for CNN implementation, was utilized in almost 19% of the studies. MATLAB's beneficial characteristics are its simplicity, especially for practitioners, its various visualization tools, and the capability of deploying models on a variety of servers and devices [78]. The remaining libraries include Torch (10%), Theano (5%), and MXNET (1%). A comparison of the frameworks' annual usage frequency shows that TensorFlow and Torch have had the steepest growth rates since they were introduced for CNN-based RS applications, followed by MATLAB with a lower rate of increase. On the other hand, Theano reached its peak in 2018 and has declined since, with no cases thus far in 2020. Caffe, which was first used in 2015, experienced a steep increase in usage in 2017 and afterward held its position in CNN implementations.

IV. CNNS FOR DIFFERENT STUDY OBJECTIVES
A survey among RS-related studies revealed that CNNs could be applied to almost any significant RS task, making them a promising option for handling various problems. Fig. 7 shows that 54% of the reviewed papers (225 papers) are devoted to classification problems, including LULC and scene classification studies. CNNs have been used in several other study objectives, including object detection (23%), image segmentation (7%), data fusion (4%), image super-resolution (3%), image matching (2%), image correction (2%), and regression (1%). Recently, CNNs have been applied to other study objectives, such as image retrieval, prediction, quality assessment, and unmixing, making up about 4% of the studies.
Following the review of the publications in different research domains, an in-depth review of different study types and their respective findings is provided in the following sections. Based on the proximity of the research domains and their paper frequency, papers were categorized into three distinct groups (i.e., image classification, object detection and segmentation, and others).

A. Classification
The number of classification-based studies focusing on different applications (the number of papers is shown in parentheses), alongside the statistical analyses of the reported overall accuracy (OA), is shown in Fig. 8. Most of the classification tasks focused on LULC, scene classification, and crop, urban, cloud, and disaster classification, with about 52%, 14%, 12%, 5%, 5%, and 4%, respectively. Other classification tasks, including agriculture, forestry, wetland, and sea ice, comprise about 8% of the total cases but are not shown in Fig. 8. The classification accuracy assessment shows the maximum average accuracy and the lowest variability for cloud studies, with almost 95.5%, followed by scene classification with about 95.2%. Urban studies have the lowest average accuracy among the applications, with an average OA of 90%.
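For readers less familiar with the metric, OA is the fraction of correctly classified samples, i.e., the sum of the diagonal of the confusion matrix divided by the total number of samples. The sketch below illustrates this under stated assumptions: the three-class confusion matrix is hypothetical and does not come from any reviewed study.

```python
def overall_accuracy(confusion):
    """Overall accuracy: correctly classified samples / all samples.

    `confusion[i][j]` is the number of samples of reference class i
    that were assigned to class j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical 3-class confusion matrix (rows: reference, columns: predicted).
confusion = [
    [50, 2, 3],
    [4, 40, 1],
    [1, 4, 45],
]

oa = overall_accuracy(confusion)
print(f"OA = {oa:.1%}")  # OA = 90.0%
```

Because OA pools all classes, it can hide poor performance on rare classes, which is one reason individual studies often report additional metrics alongside it.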
As mentioned earlier, many CNN classification studies are devoted to LULC applications, potentially because of the extensive scope of relevant datasets available for training the networks. The papers in this group mostly focused on hyperspectral image classification by using benchmark datasets, including Salinas [airborne visible/infrared imaging spectrometer (AVIRIS) sensor], Pavia University [reflective optics system imaging spectrometer (ROSIS-03) sensor], and Indian Pines (AVIRIS sensor) [88]. The second most studied classification task is scene classification, which is generally defined as a procedure to categorize a specific scene theme, e.g., a part of a forest, an agricultural landscape, a river, etc. [89]. A majority of these studies apply high-resolution RS images because of the availability of many large-scale high-resolution datasets in recent years [90], [91]. In scene classification, the most commonly used datasets are the UC-Merced dataset [92], the aerial image data set (AID) [93], and NWPU-RESISC 45 (RS image scene classification) datasets [94].
A detailed survey of the article database shows that about 46% of classification studies used spaceborne multispectral RS images for classification tasks (see Fig. 9). The most frequent multispectral satellite data sources are Landsat 8, Gaofen 1-2, and Sentinel 2. The least used data type is LiDAR, making up about 2% of the whole database. The remaining types of RS data are aerial images (18%), hyperspectral (13%), UAV (7%), multidata (7%), and RADAR (6%). For the aerial images, the Vaihingen Semantic Labelling dataset [95] and Potsdam Semantic Labelling dataset [96] were the most used datasets, and for hyperspectral images, the most used datasets were the Salinas, Indian Pines, and Pavia University datasets. In recent years, UAV data have been deployed in some CNN classification tasks, and their usage is expected to increase because the cost of UAVs has fallen in recent years [97]. The capability of automatic data acquisition has made UAVs a convenient tool for some classification tasks such as geological mapping [97], crop yield prediction [98], and wetland mapping [99]. Integration of different RS data types (i.e., multidata) is beneficial for different tasks such as tree species diversity mapping using LiDAR and high-resolution multispectral images [100], soybean yield prediction by fusion of weather data and MODIS products [101], and coastal land cover classification by integration of optical and RADAR satellite images [102].
A statistical analysis of different data types showed that using multidata resulted in the maximum average OA (96.2%) and lower variability. In the case of single-data research, the mean classification accuracy of hyperspectral datasets is the highest at 96%, followed by UAV (94.20%), multispectral (93.58%), aerial (93.48%), RADAR (92.75%), and LiDAR (90.67%). Fig. 10 shows the average accuracy obtained for CNN classification based on the spatial resolution of the remotely sensed image dataset, together with the respective number of published papers (shown in parentheses). The papers were categorized based on the spatial resolution into very high (<1 m), high (between 1 and 5 m), and medium and low resolution (>5 m).
Data with a spatial resolution of <5 m (very high and high) were used in 63% of the publications, mainly composed of studies using 1 m resolution. Based on this analysis, it can be concluded that the mean classification accuracy varies consistently with spatial resolution: it increases as the resolution becomes coarser. The mean accuracy for very high- and high-resolution datasets is 90.10% and 92.48%, respectively. The maximum mean accuracy is obtained for datasets with spatial resolutions >5 m, with about 93.82% mean OA.
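The resolution grouping used in this analysis can be written as a small helper. This is a sketch under one explicit assumption: the text does not say to which class a dataset of exactly 5 m belongs, so the function below (with a hypothetical name) assigns it to the "high" class.

```python
def resolution_class(gsd_m):
    """Map a spatial resolution in metres to the categories used in the text.

    Assumption: a resolution of exactly 5 m is treated as "high".
    """
    if gsd_m <= 0:
        raise ValueError("spatial resolution must be positive")
    if gsd_m < 1:
        return "very high"
    if gsd_m <= 5:
        return "high"
    return "medium and low"


# Mean OA per class as reported in the text.
mean_oa = {"very high": 90.10, "high": 92.48, "medium and low": 93.82}

for gsd in (0.3, 2.0, 30.0):
    label = resolution_class(gsd)
    print(f"{gsd} m -> {label} (mean OA {mean_oa[label]}%)")
```

The example resolutions of 0.3, 2, and 30 m are illustrative values only.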
A survey of the usage frequency of different CNN models for classification tasks revealed that VGG variants were the most frequently used backbones, followed by ResNet-50 and AlexNet. In about 52% of the studies, stochastic gradient descent (SGD) was used for parameter optimization of the CNN models [89], [103], while the remaining studies used the adaptive moment estimation (Adam) optimizer [104]. Moreover, the training process was conducted for more than 100 epochs in about 68% of the cases.
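The difference between the two optimizers can be sketched on a toy problem. The following is an illustrative example rather than a setup from any reviewed study: it minimizes the hypothetical one-parameter loss f(w) = (w - 3)^2 with a plain gradient step (the update rule of SGD, here with an exact rather than a mini-batch gradient) and with the standard Adam update. The learning rates and step counts are arbitrary assumptions.

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def sgd_step(w, g, lr=0.1):
    """Gradient descent update: move against the gradient by a fixed rate."""
    return w - lr * g


def adam_step(w, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam update: per-parameter step scaled by running gradient moments."""
    m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


w_sgd = 0.0
for _ in range(100):
    w_sgd = sgd_step(w_sgd, grad(w_sgd))

w_adam, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    w_adam, m, v = adam_step(w_adam, grad(w_adam), m, v, t)

print(round(w_sgd, 4), round(w_adam, 2))
```

Both runs approach the minimizer w = 3. In practice the choice between the two depends on the network and dataset, which is consistent with the nearly even split observed in the reviewed classification studies.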

B. Object Detection and Image Segmentation
As shown in Fig. 7, about 30% of the CNN-based RS studies involve image segmentation and object detection tasks. A close inspection of the usage frequency of different sensor types in image segmentation-based studies shows that the largest share use multispectral satellite images (60%), followed by aerial images (18%) and multidata (8%). The other articles were devoted to UAV, RADAR, LiDAR, and panchromatic images with almost 7%, 4%, 2%, and 1%, respectively. Open datasets and their respective applications significantly influenced these topics by providing information related to various land covers. A survey of the article database showed that two publicly available datasets, Potsdam Semantic Labelling [96] and NWPU VHR-10 [105], have established themselves as baseline datasets for CNN-based image segmentation and object detection. They are followed by DOTA (dataset for object detection in aerial images) [105], UC-Merced [92], and the Massachusetts buildings and roads datasets [106]. However, to investigate method development in large-scale research areas, some studies used custom spaceborne datasets. Google Earth was the most employed spaceborne data source, followed by Gaofen 1-2, Quickbird-2, and Worldview-4. Of all the reviewed papers for object detection and segmentation tasks, 68% of the CNN models are applied to analyze datasets with 1 m or finer spatial resolutions. SGD was used as the optimizer in about 75% of the studies, while the Adam optimizer was used in the remaining 25%. In 72% of the cases, the number of epochs was set to fewer than 100, and in the remaining cases (i.e., 28%), the number of epochs was more than 100.
An overview of the popular designs in the publications showed a focus on region-based CNN (R-CNN)-inspired architectures [107] and their series of improvements, including fast R-CNN [108], faster R-CNN [109], and mask R-CNN [110]. The most frequent application, representing 39% of studies, was land cover mapping, with other application categories including agriculture (15%), urban (11%), forest (10%), wetland (12%), disaster (3%), and soil (2%). The remaining applications, comprising about 8% of the case studies, mainly consist of mining area classification, water mapping, benthic habitat, rock types, and geology mapping.
Concerning the different architectures designed for image segmentation and object detection, the VGG variants have been the most used backbone models (34%), followed by the ResNet family (30%). The Inception, SegNet, LeNet, and GoogleNet backbones were much less commonly used.

C. Other Applications
Along with the above-mentioned conventional applications in RS, CNNs have also been applied in other research areas, such as data fusion, super-resolution, change detection, and image registration. Because of their specific capability of feature extraction and learning, CNNs demonstrate a strong ability to delineate the relationship between different data, which has been used in panchromatic/multispectral data fusion applications [111]-[113]. The first papers in this category were motivated by the impressive performance of CNNs in a large number of closely related super-resolution problems [113], [114]. A comprehensive overview of the fusion studies in the article database shows a trend of using pixel-based processing units and a residual learning strategy with SGD optimizers, mostly implemented in MATLAB [115].
Super-resolution, which aims to enhance spatial resolution, is an ongoing research topic in computer vision and RS [114]. The latest super-resolution trend has focused on example (learning)-based techniques, which include a training phase on pairs of low-resolution and high-resolution images [116]. Example-based techniques have seen enhanced accuracies with the introduction of CNNs to generic super-resolution problems [117]. However, RS imagery exhibits a different level of complexity than images in other fields such as computer vision, which delayed the use of CNNs in RS image super-resolution until 2018, when a specific super-resolution CNN architecture was introduced to adapt to multispectral satellite imagery [116]. An overview of the related papers shows that all the CNN models were 2-D structured, with Adam and SGD used equally for parameter optimization and the number of epochs ranging from 80 to 600.
In recent years, DL methods have been successfully applied in natural image change detection-based applications [118]. Different DL-based methods have also been applied to various change detection tasks, such as urban dynamics [119], LULC applications [120], or landslides [121]. CNN models were first employed for high-resolution remotely sensed image change detection in 2018 using faster R-CNN [119] and have gained attention since then. A review of the database shows that most of the studies devoted to change detection applications applied a range of data types, from spaceborne and airborne optical to RADAR images.
Image registration is a fundamental part of many RS tasks, such as image fusion and change detection [122], [123]. In the first such study, in 2018, Ye et al. [122] fine-tuned the VGG-16 model using custom RS data to obtain deep CNN features and build an automatic registration algorithm. CNNs have shown powerful performance in the registration of RGB and infrared [124], SAR [125], multimodal [56], and aerial RS images [126].
Another active area that uses CNNs is image correction, which includes categories such as image denoising [127], image reconstruction [128], and image compensation [129]. CNNs were introduced to this field in 2017 to build a nonparametric color-correcting scheme for multispectral images [130] and, the year after, for removing haze from RS images [131]. Overall, eight studies in the article database used CNN models for correction purposes, most of which were applied to multispectral satellite images.
Besides the above applications of CNNs in RS image analysis, CNNs have also been applied in other areas, including regression [52], image retrieval [132], prediction [133], quality assessment [134], and hyperspectral unmixing [135]. CNN models have achieved outstanding performance in each of these applications by presenting a novel way to solve them. Considering the high accuracy of CNNs, it is expected that they will continue to find use in other research areas.

V. CONCLUSION AND PROSPECTS
This article presents a comprehensive review of CNNs in RS data analysis. It summarizes their progression and advancement since their emergence in the RS field in terms of general characteristics and technical specifications. Based on the detailed analysis of the article database, we come to the following conclusions.
1) There is an increasing trend of using CNNs in RS applications beginning in 2014, and since then, their use has expanded into many new research areas.
2) The use of CNNs in the context of RS started with the analysis of multispectral images. As CNNs were increasingly used for efficient problem solving based on different data types and platforms, researchers began to incorporate CNNs in their projects using other data types such as LiDAR and RADAR.
3) Advances in new, free, and open-source frameworks and libraries with highly dynamic interfaces allowed the RS community to research new study objectives. The survey of different frameworks identifies TensorFlow as the most used framework. Based on the yearly usage frequency, it is expected to hold its position in the coming years.
4) A survey of the CNNs' parameters shows that SGD and Adam are the most frequently used optimizers, and, in most cases, the number of epochs was set to more than 100.
5) Classification tasks using various data types and sensors are the focus of most of the studies (54%) using CNNs. Classification results have been shown to be better when using multisource data and when using images with a spatial resolution coarser than 5 m. As classification tasks require large quantities of training samples, researchers tackled this problem by improving their training dataset's efficiency using a transfer learning strategy. For this aim, most of the case studies used VGG variants.
6) We could not further analyze and summarize the processing time because it was either not available or reported without specifying whether the entire time included the optimization of metaparameters. However, in contrast to the general belief, numerous cases report a training time of less than 1 h. It is generally true that deep networks need considerably more processing time for training (though the testing/simulation process is generally quick). However, with continuous increases in processing power, deep networks are readily usable, particularly by incorporating both CPUs and GPUs. It would be interesting to evaluate the time saved by using pretrained networks and fine-tuning them, but currently, no statistics have been reported from which to extract conclusive information.
7) CNNs have typically been utilized for data with up to three dimensions, which is common for RS data. However, 1-D CNNs have shown promising results in discovering intricate patterns in high-dimensional data, especially for sequential data. This capability of extracting meaningful information from high-dimensional vector data has indicated 1-D CNNs as promising alternatives to conventional methods in some regression problems.
8) The majority of studies focused more on architecture design, and few studies elaborated on time efficiency while training networks. With the growth of big RS data and their use in operational production, which requires much more training time than typical research, much more attention is required to develop time-efficient networks that meet the requirements of practical projects. However, this problem has been partly solved by using free or commercial public cloud computing platforms, such as those of Google, Amazon, Microsoft, and IBM. Cloud computing can be used for the development of computationally intensive CNNs by providing high-speed and flexible facilities that can handle huge amounts of data.
Based on the 416 reviewed articles in this survey, it is evident that CNNs have pervaded every aspect of RS image analysis. This has happened very fast, as over 96% of the contributions, a total of 400 papers, were published from 2017 onward. For example, the first application of CNNs in object detection studies happened in 2014, whereas they were first used for image registration tasks in 2018. However, the growth rate of applying CNNs to different tasks and data types is challenged by the lack of large training datasets. Although CNNs can be considered newly introduced algorithms in RS, they are now clearly among the top performers in most RS applications. Despite this progress, the study of CNN-based approaches is currently at its beginning stages, and there is still much potential for new developments, particularly in applications such as hyperspectral unmixing, image retrieval, and image quality assessment. Another striking conclusion is that only a few studies have been conducted in new application areas.
As a result, there is a gap in examining different aspects of CNNs. Therefore, in order to get the best results, researchers should consider investigating new CNN architectures. From this perspective, the design of new network architectures for specific tasks, the generation of large-scale datasets for network training, the integration of conventional techniques according to the RS data, and the advancement and analysis of existing networks concerning their architectures, optimization techniques, and regularization strategies are still open topics that are closely related to each other and should be jointly considered.