Adaptive Granulation-Based Convolutional Neural Networks With Single Pass Learning for Remote Sensing Image Classification

Convolutional neural networks (CNNs), with characteristics such as spatial filtering, a feed-forward mechanism, and backpropagation-based learning, are widely used for remote sensing (RS) image classification. The fixed architecture of a CNN, with its large number of network parameters, is learned over many iterations, thereby increasing the computational burden. To deal with this issue, an adaptive granulation-based CNN (AGCNN) model is proposed in the present study. AGCNN works in the framework of fuzzy set theoretic data granulation and adaptive learning, upgrading the network architecture to accommodate the information of new samples and avoiding iterative training, unlike a conventional CNN. Here, granulation is performed both on the 2-D input image and on its 1-D representative feature vector, as obtained after a series of convolution and pooling layers. While the class-dependent fuzzy granulation on the input image space exploits more domain knowledge for uncertainty modeling, rough set theoretic reducts computed on the granulated features select only the relevant ones as input to the CNN. During classification of unknown patterns, a new principle of roughness minimization with weighted membership is adopted on overlapping granules to deal with ambiguous cases. All these together improve the classification accuracy of AGCNN while reducing the computational time significantly. The superiority of AGCNN over some state-of-the-art models in terms of different performance metrics is demonstrated for hyperspectral and multispectral images, both quantitatively and visually.


I. INTRODUCTION
CONVOLUTIONAL neural network (CNN) is one of the most extensively used deep neural network (DNN) models for remote sensing image classification; it can process multiple arrays of data, such as multiband remote sensing images with regularly arranged pixels. A CNN consists mainly of three hierarchically connected structures: the convolutional layer, the pooling layer, and the fully connected neural network. The series of convolutional and pooling layers operates on the input image to obtain informative features of the objects in it. These informative features are then used to classify the objects through fully connected neural network layers. CNN can extract informative features of ill-defined objects in scene-level/pixel-level remote sensing images [1]. This characteristic motivated researchers to use it in various remote sensing image analysis tasks such as image registration [2], image restoration [3], image fusion [4], image segmentation [5], change detection [6], and land use/land cover classification [7], [8]. Hong et al. [9] used a CNN with multimodality learning (MML) to classify the objects in complex scene images; the model used pixelwise classification and spatial information modeling for the said task. Hong et al. [10] presented a mini graph-based CNN to classify the objects in hyperspectral remote sensing images. The model is computationally faster than the conventional graph-based CNN, can classify unknown samples without retraining the network, and improves the classification performance. Kumar [11] proposed a knowledge-encoded CNN model for multispectral remote sensing image classification, where morphological operators were used in the convolutional layers to obtain informative features of the objects. Wu et al. [12] presented a cross-channel reconstruction module-based CNN to classify the objects in hyperspectral and synthetic aperture radar (SAR) images.
The model used a reconstruction strategy across modalities to learn the features of the objects in the image. Hong et al. [13] developed a shared and specific feature learning-based CNN model to classify the objects in multispectral, hyperspectral, and SAR images. The model decomposes multimodal RS data to enable information blending across heterogeneous data sources. Li et al. [14] proposed a transfer learning-based deep CNN for object classification in high-resolution remote sensing images. The model works on the principle of transferring the weight parameters of convolution layers during the learning process to attain better performance. Li et al. [15] used a 3D-CNN to classify the objects in high-resolution hyperspectral remote sensing images. Concepts like principal component analysis and autoencoders are used in the 3D-CNN to deal with the high dimensionality of the data.
While the performance of CNNs in remote sensing image classification is appreciated, they have limitations: the model operates directly on raw images whose pixel-level uncertainty may lead to misclassification of objects. Further, the large number of network parameters in the convolutional layers and the fully connected neural network, together with multiple training iterations, may lead to overfitting and increased computational complexity/time [16].
A. Related Work

1) Uncertainty in the Pixels: The pixels in raw images carry uncertainty, which arises from the belongingness of a pixel to more than one class in the region of interest. Uncertainty in the pixels of an image also occurs due to its acquisition at different spatial, spectral, radiometric, and temporal resolutions [17]. Classifying these raw images without addressing the uncertainty in the pixels would lead to misclassification of the objects and affect the overall performance of the CNN. This underlines the necessity of addressing the uncertainty in the pixels of remote sensing images. Concepts from fuzzy sets are used for this purpose. Pal and Mitra [18] suggested class independent (CI) granulation to represent the uncertainty in the pixels using the π-type membership function; these membership values are further processed using a fully connected NN. This method is used in many remote sensing image analysis approaches, e.g., [19], [20], [21], [22], [23], [24], and [25]. Bastin et al. [26] used fuzzy spectral signatures to represent the uncertainty in the pixels of Landsat images, with multilayered stacks of membership images for easy visualization of the uncertainty. Dungan et al. [27] used probability distributions to represent the uncertainty in the pixels of the input image, visualized through interactive maps of first-, second-, and third-order statistics. While Zhong et al. [28] used an adaptive memetic fuzzy clustering algorithm and Wu et al. [29] used a fuzzy local C-means clustering-based method to deal with the uncertainty of pixels in remote sensing images, Chen et al. [30] applied a fuzzy rough set-based adaptive genetic algorithmic method to address this task. These fuzzy-set-based methods use class independent approaches to handle uncertainty in the pixels of the input image.
These methods do not consider the class-belonging information of the pixels, which can result in misclassification of objects in the image. The methods in these studies represent the uncertainty in the pixels of input images for pixel-level classification of multispectral remote sensing images.
In recent years, advancements in imaging technology have produced images with finer spatial resolution. The objects in these images are classified using patch-based classification methodologies, unlike pixel-level methods. Also, the uncertainty in the pixels of hyperspectral image patches is complex to address due to the finer spatial resolution of the images. Recently, a few methods have been suggested to deal with the uncertainty in the pixels of images in patch-based hyperspectral remote sensing image classification. Zhao et al. [31] used autoencoder-based spectral unmixing to handle uncertainty in the pixels of hyperspectral remote sensing images, and a CNN-based autoencoder network to classify them. Li et al. [32] used a pixel-by-pixel clustering framework to represent the uncertainty in the pixels of hyperspectral remote sensing images; the model used a deep CNN architecture with a large number of weight parameters for classification. In the present study, we use class-dependent (CD) granulation [33] for the first time to deal with the uncertainty in the pixels of image patches for patch-level hyperspectral/multispectral remote sensing image classification. It has previously been used in pixel-based classification of multispectral remote sensing images in [34], [35], [24], and [25]. In the CD method, a pixel in the input image is represented by its fuzzy membership to the classes in the dataset [33]. This increases the dimensionality of the input image and, thereby, the computational complexity of the model. In such situations, a rough set theoretic approach [36] is used to select the informative features by eliminating redundant ones.
2) Adaptive Granulation, Single-Pass Learning, and Roughness of Granules: The image patches in a dataset have diverse object features within a class. To learn these features, a large number of weight parameters/filter coefficients is used in a CNN, which increases its computational time. Recently, the concept of data granulation has been used to group the features of the objects (as granules) in the image patches. Processing such granules is known as granular computing. Granular computing is inspired by human cognition, in which computations are performed on information granules rather than on individual samples/patterns. A granule is a group of elements/objects with similar characteristics [37], [38]. The concepts of granulation have been applied in data analysis [39], [40], [41], [42], [43]. Zeng et al. [44] suggested a multilevel data granulation-based CNN model for higher resolution remote sensing patch images. The model used granulation to capture the deep spatial and spectral information of objects in the image patches, which improved its performance. Jean et al. [45] used data granulation on hyperspectral RS image patches. The model used shape/size features of the objects in the image patch to select fewer filter coefficients in the convolutional layer of the CNN, and it could achieve better performance in fewer iterations. These models used granulation to account for the diverse object features in the image patches and to select fewer filter coefficients in the convolutional layers of the CNN. The granulation-based CNN models produced better performance in less time than conventional CNNs.
Recently, Leite et al. [46], [47] introduced data granulation and adaptive learning in a neural network for classifying real-time data, naming it the adaptive granular neural network. The work was motivated by the adaptive neural networks suggested by Alexandridis et al. [48] and Palnitkar et al. [49]. The adaptive granular neural network learns the diversified features of the objects in the image patches by updating the fuzzy membership of granules in a single training iteration, called a single pass. These characteristics make it easy for the adaptive granular neural network to learn the features within a few seconds and result in better performance than a conventional neural network. This concept is used for remote sensing image classification in studies such as [50], [51], [52], and [53]. While the adaptive granular neural network is credited with learning the features in the images within a single pass, it has two limitations to be addressed. 1) In the testing stage, the adaptive granular neural network assigns an unknown sample to the class to which the sample has maximum membership. This works well for datasets with an equal number of samples per class. In real-life situations, a dataset is likely to have unequal numbers of samples in its classes; in such situations, considering only the maximum membership of a sample will not improve the overall accuracy of the model. 2) For datasets with overlapping classes, an unknown feature vector may have the maximum membership value for more than one class, making the decision-making indecisive. In our present study, these two limitations are addressed through the concepts of weighted class-membership and roughness of overlapping granules in decision-making. We introduce a CNN-based deep architecture with eleven layers involving fuzzy adaptive granulation for hyperspectral/multispectral patch-based image classification.
The model combines the advantages of CD granulation of the input image with reduced features and dynamic granulation that evolves automatically to adapt to the input features of objects in the image patch. The system achieves better classification accuracy in much less time. We name it the adaptive granulation-based convolutional neural network (AGCNN).

B. Novelty
The novelty and contributions of the proposed model are described as follows.
1) Developing a fuzzy adaptive granulation-based deep network architecture requiring a smaller number of weight parameters and only single pass learning.
2) Incorporating the weighted class-membership and a roughness measure on overlapping granules for handling ambiguous patterns during decision-making, where the use of the roughness measure is unique.
3) Using CD granulation on input images, followed by computation of rough reducts, to retain only the relevant class-based information of pixels as input features.
4) Demonstrating the superiority of the proposed model over many state-of-the-art deep models, both quantitatively and visually.

C. Problems Addressed
The concepts of CD granulation, adaptive granulation, and roughness of overlapping granules used in the AGCNN model address three major issues in remote sensing image classification.
1) Granulation is useful in modeling the indiscernibility of pixels for further processing. In our work, CD granulation based on class label information better handles the indiscernibility in the pixels of hyperspectral image patches.
2) Adaptive granulation is used to derive salient information from the hyperspectral image patches. Its use results in fewer weight parameters in the network architecture and enables single pass learning of the network, improving the performance at reduced computational time.
3) Hyperspectral images contain pixels of classes with overlapping characteristics, and such pixels often possess maximum membership to more than one class, making the model indecisive during their classification. To resolve this problem, the roughness of overlapping granules is used in the decision-making process.
All these factors make the proposed AGCNN superior in terms of performance and computational time.

II. PROPOSED ADAPTIVE GRANULATION-BASED CNN FOR REMOTE SENSING IMAGE CLASSIFICATION
The functional mechanism of the proposed AGCNN model is divided into five modules, as shown in Fig. 1.
1) In the fuzzy granulation layer, the pixels of the input image are represented by their memberships to the granules (Section II-B).
2) In the feature selection layer, informative features are selected from the granulated features using a rough set theoretic approach (Section II-C).
3) A series of convolution and pooling layers (CLs and PLs) is used to obtain the 1-D representative feature vector of the objects in the image.
4) The feature vectors are used to build the architecture of the adaptive granulation-based network.
5) The weighted membership of the feature vector and a roughness measure on the granules are used to label the unknown test images.
The layers of AGCNN are shown in Fig. 2.
In Fig. 2, eight layers (L1, L2, L3, ..., L8) of AGCNN are given with the corresponding output at each layer. L1 is the input image, L2 is the fuzzy granulation layer with output J, L3 is the feature selection layer with output K, L4 is a convolution layer with output L, L5 is a pooling layer with output M, and L6 and L7 are consecutive convolutional and pooling layers. L8 is the 1-D representative feature vector of the input image. The remaining three layers (L9, L10, and L11) of the AGCNN model are given in Fig. 5.
The description of the output at each layer of AGCNN is given in Table I. In Table I, the size of the input image (I) is considered as (m, n, p) [e.g., (9, 9, 4)], where m is the number of rows, n is the number of columns, and p is the number of bands in the image. In the fuzzy granulation layer, a pixel in I is represented by its membership to c classes, where c is the number of classes. The output of the granulation layer (J) has size (m, n, q), where q = p × c is the number of granulated features. The output of the feature selection layer has size (m, n, r), where r < q is the number of selected features out of q. A series of convolutional and pooling layers is applied to the informative features (m, n, r) to obtain the 1-D representative feature vector (1, 1, r) of an image. The 1-D feature vectors generated from the images are used to build the architecture of the AG-based network (Fig. 5).
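The shape bookkeeping of Table I can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is ours, and the concrete class count and reduct size are assumed for the multispectral example (9, 9, 4).

```python
# Illustrative sketch of the tensor shapes through the early AGCNN layers,
# following the q = p * c and r < q relations described in the text.

def agcnn_shapes(m, n, p, c, r):
    """Return the output shape of each early AGCNN layer.

    m, n, p : rows, columns, and bands of the input patch I
    c       : number of classes (CD granulation yields one membership per class)
    r       : number of features kept by the rough set reduct (r < p * c)
    """
    q = p * c  # granulation layer: every band expands into c class memberships
    assert r < q, "the reduct must select fewer features than granulation produces"
    return {
        "L1_input": (m, n, p),
        "L2_granulation": (m, n, q),
        "L3_feature_selection": (m, n, r),
        "L8_feature_vector": (1, 1, r),  # after the conv/pool stages
    }

shapes = agcnn_shapes(m=9, n=9, p=4, c=6, r=10)
print(shapes["L2_granulation"])  # (9, 9, 24)
```

The assumed values (c = 6, r = 10) only serve to show that granulation grows the feature count before the reduct shrinks it.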

A. Input Image
The input image patch (I) is fed forward through the series of layers in AGCNN. The size of I is chosen based on the spatial resolution of the image. For hyperspectral images, the size of I is taken as (9, 9, 110) because the image is acquired at finer spatial resolution; for multispectral images, it is taken as (20, 20, 4) because the image is acquired at coarser spatial resolution. The selection criterion is that the finer the spatial resolution of the image, the smaller the input image to the CNN, and the coarser the resolution, the larger the input image.

B. Granulation Layer
In the conventional CNN model, the convolution filters operate on raw images. In the present study, they operate on the membership values of pixels to the granules. In the fuzzy granulation layer, the pixels of the input image are represented by their memberships to the granules. There are two types of pixel granulation methods: 1) CI granulation and 2) CD granulation.
1) Class Independent: In CI granulation, a pixel (Px) with p features is represented by its membership to three fuzzy granules, namely, low (l), medium (m), and high (h) [18]. The membership of a pixel is computed using the π-type function

μ(f; i, α) = 1 − 2(|f − i|/α)²,  for 0 ≤ |f − i| ≤ α/2
           = 2(1 − |f − i|/α)²,  for α/2 ≤ |f − i| ≤ α
           = 0,                  otherwise                    (1)

where α and i are the radius and center point of the fuzzy granule, respectively. The membership of a feature is maximum (1) at the midpoint of the granule, is 0.5 at the crossover points, and decreases to 0 on either side of the center of the granule. The membership of Px to the granules is [μ_l(f_p), μ_m(f_p), μ_h(f_p)], where μ_l(f_p), μ_m(f_p), and μ_h(f_p) are the memberships of Px to the granules l, m, and h for the feature value f_p. The membership of each pixel to the three granules is computed using the π-type function.
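As a hedged sketch, the π-type membership can be written as follows; the function names are ours, and the low/medium/high centers and the radius are illustrative, not values prescribed by the paper.

```python
# Minimal sketch of the standard pi-type membership (center i, radius alpha)
# assumed here, and of mapping one feature to the low/medium/high granules.

def pi_membership(f, center, radius):
    """Pi-type membership of feature value f to a granule (center, radius)."""
    d = abs(f - center)
    if d <= radius / 2:
        return 1.0 - 2.0 * (d / radius) ** 2   # inner half: near-1 plateau
    if d <= radius:
        return 2.0 * (1.0 - d / radius) ** 2   # outer half: decays to 0
    return 0.0

def ci_granulate(f, centers, radius):
    """Memberships of one feature value to the low/medium/high granules."""
    return [pi_membership(f, c, radius) for c in centers]

print(pi_membership(5.0, 5.0, 2.0))  # 1.0 at the granule center
print(pi_membership(6.0, 5.0, 2.0))  # 0.5 at the crossover point
```

A pixel with p features thus yields 3p membership values, one triple per feature.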
2) Class Dependent: CI granulation does not consider the class-belonging information of a pixel. For this reason, class-dependent (CD) granulation was suggested by Pal et al. [33]. In CD granulation, the features of a pixel are represented in terms of their memberships to the classes in the dataset. The membership of feature f_j to a class is computed using a π-type function whose crossover points P and Q have membership 0.5 and whose center R has maximum membership 1. For a granule, R = mean(f) (i.e., the mean of feature f over the class), and P and Q are computed from R together with min(f) and max(f), the minimum and maximum values of feature f in the class. X and Y are called the extreme points of the granule, where X = R − (Q − P) and Y = R + (Q − P). In CD granulation, a pixel Px is represented by its memberships to the c classes, [μ_1(f_p), ..., μ_c(f_p)], where μ_c(f_p) is the membership of Px to class c for the feature value f_p.
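The CD granule geometry can be sketched as below. Two assumptions are ours, not the paper's: P and Q are taken as the midpoints between R and the class min/max (the exact formulas are in [33]), and the membership curve is a piecewise-linear stand-in for the π-type function, pinned to the stated landmarks (1 at R, 0.5 at P and Q, 0 at the extreme points X and Y).

```python
# Hedged sketch of CD granulation for one feature of one class.

def cd_granule_params(values):
    """Return (P, R, Q); P and Q are ASSUMED midpoints between R and min/max."""
    lo, hi = min(values), max(values)
    R = sum(values) / len(values)  # center, membership 1 (as in the text)
    P = (lo + R) / 2.0             # assumed crossover, membership 0.5
    Q = (R + hi) / 2.0             # assumed crossover, membership 0.5
    return P, R, Q

def cd_membership(f, P, R, Q):
    """Piecewise-linear stand-in: 1 at R, 0.5 at P and Q, 0 at X and Y."""
    X, Y = R - (Q - P), R + (Q - P)  # extreme points from the text
    if f <= X or f >= Y:
        return 0.0
    if f <= P:
        return 0.5 * (f - X) / (P - X)
    if f <= R:
        return 0.5 + 0.5 * (f - P) / (R - P)
    if f <= Q:
        return 1.0 - 0.5 * (f - R) / (Q - R)
    return 0.5 * (Y - f) / (Y - Q)

print(cd_membership(4.0, 3.0, 4.0, 6.0))  # 1.0 at the center R
```

A pixel's p features are mapped this way against each of the c classes, giving the (m, n, p × c) granulated output.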

C. Feature Selection Layer
In the feature selection layer, a rough set theoretic approach is used to select the informative features from the granulated features. The set of such informative features is called a reduct. A reduct is a minimal set of features that can discern the samples in the dataset [36]. Let U be the set of pixels in an image and A the set of features/bands, with A = (B, C), where B is the set of conditional features and C is the decision feature. For a subset X of U, the lower approximation of X with respect to B is the set of pixels whose B-indiscernibility classes are fully contained in X, and the upper approximation is the set of pixels whose B-indiscernibility classes intersect X. The dependency of C on B is γ_B(C) = |POS_B(C)|/|U|, where POS_B(C) is the union of the lower approximations of the decision classes of C. The maximum value of γ_B(C) is 1, indicating that the features in B can discern all the pixels in U with respect to C.
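The dependency degree underlying reduct computation can be sketched on toy data. The band names and decision values below are hypothetical; a reduct is then a minimal feature subset B' with γ_B'(C) equal to that of the full set.

```python
# Sketch of the rough set dependency degree gamma_B(C) = |POS_B(C)| / |U|.
from collections import defaultdict

def dependency(samples, B, decision):
    """samples: list of dicts feature -> value; B: conditional feature names."""
    # Partition U into B-indiscernibility blocks (equal values on all of B).
    blocks = defaultdict(list)
    for s in samples:
        blocks[tuple(s[f] for f in B)].append(s)
    # A block lies in the positive region iff all its samples share one decision.
    pos = sum(len(b) for b in blocks.values()
              if len({s[decision] for s in b}) == 1)
    return pos / len(samples)

U = [
    {"b1": 0, "b2": 1, "cls": "road"},
    {"b1": 0, "b2": 1, "cls": "road"},
    {"b1": 1, "b2": 1, "cls": "water"},
    {"b1": 1, "b2": 0, "cls": "soil"},
]
print(dependency(U, ["b1", "b2"], "cls"))  # 1.0: both bands discern all pixels
print(dependency(U, ["b2"], "cls"))        # 0.25: b2 alone leaves 3 pixels ambiguous
```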

D. Adaptive Granulation-Based Network
In a conventional CNN, the feature vector is fed forward through fully connected layers with a fixed architecture. Furthermore, the output error is computed and the weight parameters of the CNN are updated by back-propagating this error over a number of iterations, which increases the time complexity of the CNN. In the present study, we consider an adaptive granulation-based network with fewer parameters, and the model learns in only a single pass over the training samples.

1) Architecture of Adaptive Granulation-Based Network:
The architecture of the adaptive granulation-based network is implemented in three steps: a) creating rectangular granules in the feature space from the training feature vectors, b) upgrading granules to accommodate new feature vectors, and c) building the network from the generated granules. The feature vectors generated from the training images by passing through layers L1, L2, L3, ..., L8 are used as the labeled dataset. During training, a granule is created either with the first feature vector, with the feature vector of a new class, or when a feature vector does not fit within the existing granules; a granule is upgraded when a new feature vector falls within its scope. The creation/upgradation of a granule is shown in Fig. 3(a) and (b). In Fig. 3(a), the xth granule of the cth class (g_xc) is created along feature axis F1. The parameter σ is the standard deviation of the class of the feature vector and is used to obtain the extent of the granule. In Fig. 3(b), g_xc is updated to accommodate a new feature value f1^new along feature axis F1. An example of granules in the F1-F2 feature space is shown in Fig. 4, where A1 and A2 are overlapping (ambiguous) regions among the granules g_32, g_2c, and g_14. Note that a class can have more than one granule.
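The create-or-upgrade rule of Fig. 3 can be sketched as follows, under assumptions of ours: a granule is a per-feature interval of half-width σ, and a `tol` parameter (hypothetical, not in the paper) decides whether a vector is close enough to an existing granule of its class to stretch it rather than spawn a new one.

```python
# Simplified sketch of rectangular-granule creation/upgradation in one pass.

class Granule:
    def __init__(self, x, sigma, label):
        self.lo = [xi - s for xi, s in zip(x, sigma)]  # lower bound per feature
        self.hi = [xi + s for xi, s in zip(x, sigma)]  # upper bound per feature
        self.label = label

    def contains(self, x):
        return all(l <= xi <= h for xi, l, h in zip(x, self.lo, self.hi))

    def near(self, x, tol):
        return all(l - tol <= xi <= h + tol for xi, l, h in zip(x, self.lo, self.hi))

    def upgrade(self, x):
        # Stretch the box to accommodate the new feature vector (Fig. 3(b)).
        self.lo = [min(l, xi) for l, xi in zip(self.lo, x)]
        self.hi = [max(h, xi) for h, xi in zip(self.hi, x)]

def fit_single_pass(vectors, labels, sigma, tol):
    """One pass over the training vectors: create or upgrade granules."""
    granules = []
    for x, y in zip(vectors, labels):
        g = next((g for g in granules if g.label == y and g.near(x, tol)), None)
        if g is None:
            granules.append(Granule(x, sigma, y))  # first vector / new class / no fit
        else:
            g.upgrade(x)
    return granules

gs = fit_single_pass([[0.0], [0.5], [5.0]], [0, 0, 1], sigma=[1.0], tol=0.0)
print(len(gs))  # 2
```

With these toy values, the second vector falls inside the first granule, while the vector of class 1 forces a new granule, so one granule per class remains.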
The architecture of the adaptive granulation-based network is built from the number of granules (shown in Fig. 5). In Fig. 5, the number of input nodes is equal to the number of features in the feature vector. The number of adaptive layer (AL) nodes is equal to the number of granules generated from the feature vectors. The number of output layer (OL) nodes is equal to the number of classes in the dataset. The roughness measure layer (RML, optional) is used to measure the ambiguity in the overlapping regions of granules when a feature vector has equal membership to more than one class.

2) Training the Adaptive Granulation-Based Network: During training, the membership of a feature vector (P) to a class c is obtained by adding its memberships to the granules of class c, i.e., Y_c = Σ_x Y_xc, where Y_c is the membership of P to class c and Y_xc is the membership of P to the xth granule of class c. The membership of P to a granule is obtained using a trapezoidal membership function. Y_1c, Y_2c, ..., Y_xc are the outputs at the adaptive layer nodes. The membership of P to a class (Y_c) and the sum of the weight parameters (v_c) of class c are fed forward to the output layer, where v_1c, v_2c, ..., v_xc are the weight parameters between the granules of class c and output node O_c. The output at node O_c is computed from these quantities using the norm (S) and conorm (T) operators, combined through a constant e ∈ [0, 1].
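The adaptive layer computation, trapezoidal granule membership summed per class, can be sketched as below. The per-feature trapezoid parameters and the minimum across features as the granule membership are assumptions of this sketch, not specifications from the paper.

```python
# Sketch of Y_c = sum over granules of class c of the trapezoidal membership.

def trapezoidal(x, a, b, c, d):
    """Classic trapezoid: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def class_membership(p, granules_of_c):
    """Y_c for feature vector p.

    Each granule is a list of per-feature (a, b, c, d) trapezoids; the vector's
    membership to a granule is taken here as the minimum over features.
    """
    Y_c = 0.0
    for g in granules_of_c:
        Y_c += min(trapezoidal(x, *params) for x, params in zip(p, g))
    return Y_c

g1 = [(0.0, 1.0, 2.0, 3.0)]           # a one-feature granule
print(class_membership([1.5], [g1]))  # 1.0: inside the core of the only granule
```

The resulting Y_1c, ..., Y_xc values are the adaptive layer outputs that, together with the weights v_xc, feed the output node O_c.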
A max operator is used to find the node with the maximum value in the output layer. If the output layer node with the maximum value represents the true class of P, the weight parameters are not updated; otherwise, they are updated using the constants γ, ξ ∈ [0, 1]. This process is implemented by passing each feature vector in the training set only once; accordingly, the learning process is named "single pass." The single pass training of the adaptive granulation-based network reduces the computational time drastically. Further, the adaptive architecture of the network has a small number of weight parameters (e.g., v_1c, v_2c, ..., v_xc for class c).
3) Testing of the Adaptive Granulation-Based Network: In the testing stage, an unknown feature vector is fed forward through the adaptive granulation-based network. The values at the nodes of the output layer are obtained as explained in Section II-D2, and the unknown feature vector is assigned to the class of the output layer node with the maximum value.

a) Weighted membership: In real-life situations, a dataset is likely to have unequal numbers of samples in its classes. To take this into account, we multiply the membership of a feature vector to a class by the relative frequency of occurrence of that class (class probability), and call the result the weighted class membership. In a two-class problem, the relative frequency of occurrence of class 1 is ρ_1 = n_1/(n_1 + n_2), where n_1 and n_2 are the numbers of feature vectors belonging to class 1 and class 2, respectively. Similarly, the relative frequency of occurrence of class 2 is ρ_2 = n_2/(n_1 + n_2), with ρ_1 + ρ_2 = 1. Therefore, if Y_c is the class membership of a feature vector for the cth class, its weighted membership is ρ_c Y_c, where ρ_c is the relative frequency of occurrence of class c.

b) Roughness measure: For a dataset with overlapping classes, an unknown feature vector may have the maximum membership value for more than one class, i.e., for more than one output node. This makes the decision-making indecisive (a tie). The problem arises because the unknown feature vector has equal membership to granules of different classes, i.e., overlapping granules. In such situations, we use the roughness measure of the overlapping granules to classify P. The overlapping granules are defined as the granules of different classes to which the feature vector P has nonzero membership values. For example, in Fig. 6(a), P has nonzero membership to the granules g_32 and g_2c.
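The weighted class-membership of a) above can be sketched as follows; the class names and counts are illustrative only.

```python
# Sketch of weighted class-membership: scale each class output Y_c by the
# relative frequency rho_c of that class in the training set.

def weighted_memberships(Y, class_counts):
    """Y: dict class -> membership; class_counts: dict class -> training count."""
    n = sum(class_counts.values())
    rho = {c: class_counts[c] / n for c in class_counts}  # rho values sum to 1
    return {c: rho[c] * Y[c] for c in Y}

Y = {"water": 0.9, "urban": 0.8}
counts = {"water": 100, "urban": 300}  # unequal class sizes
w = weighted_memberships(Y, counts)
print(max(w, key=w.get))  # urban: 0.75 * 0.8 = 0.6 beats 0.25 * 0.9 = 0.225
```

Note how the raw membership alone would have favored the rarer class, which is exactly the situation the weighting is meant to correct.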
The roughness of an overlapping granule g is R_g = 1 − |gX|/|ḡX|, where |gX| is the cardinality of the lower approximation of g [(7)], |ḡX| is the cardinality of the upper approximation of g [(8)], and X is the set of feature vectors in g. The lower approximation of g is the set of feature vectors that definitely belong to g. The upper approximation of g comprises the feature vectors that definitely belong to g together with those in the ambiguous/overlapping region.
In tie situations, P is assigned to the class for which the roughness of the overlapping granule is minimum after adding P to that granule. For example, initially, the roughness values of granules g_32 and g_2c (Fig. 6) are R_g32 = (1 − 4/6) = 1/3 and R_g2c = (1 − 3/4) = 1/4, respectively. The roughness of the granules g_32 and g_2c is then computed after adding P to each of them individually. It is found that the changed R_g32 = 3/7 > R_g2c = 2/5; that is, the roughness of g_2c after adding P to it is less than that of g_32 after adding P to it. P is therefore assigned to class c [shown in Fig. 6(b)], resolving the tie. Note that when |g_32 X| = |g_2c X| and |ḡ_32 X| = |ḡ_2c X|, R_g32 = R_g2c, which implies that P cannot be labeled.
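The tie-break can be sketched with the worked numbers from the example, under the assumption (consistent with the figures) that adding the ambiguous vector P to a granule enlarges only its upper approximation.

```python
# Sketch of the roughness-minimization tie-break on overlapping granules.

def roughness(lower, upper):
    """R_g = 1 - |lower approximation| / |upper approximation|."""
    return 1.0 - lower / upper

def tie_break(candidates):
    """candidates: dict class -> (|lower|, |upper|) of its overlapping granule.

    Returns the class whose granule would be least rough after absorbing P,
    or None when the roughness values still tie (P cannot be labeled).
    """
    after = {c: roughness(lo, up + 1) for c, (lo, up) in candidates.items()}
    winner = min(after, key=after.get)
    if sum(1 for v in after.values() if v == after[winner]) > 1:
        return None
    return winner

# g32: |lower| = 4, |upper| = 6; g2c: |lower| = 3, |upper| = 4 (as in the text).
print(tie_break({"class2": (4, 6), "classc": (3, 4)}))  # classc: 2/5 < 3/7
```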

III. RESULTS AND DISCUSSION
The performance of AGCNN is demonstrated on eight remote sensing image datasets, described in Table II.
1) Hyperspectral Remote Sensing Images: Four hyperspectral RS image datasets were considered in the present study to evaluate the performance of the AGCNN model: a) HYDICE - Washington DC Mall (HWM), b) AVIRIS - Salinas (ASA), c) ROSIS - Pavia University (RPU), and d) AVIRIS - Indian Pines (AIP). The datasets consist of densely overlapped pixels in feature space; the distribution of the densely overlapped pixels of the HWM dataset in the F1, F2, F3 feature space is given in Fig. 7.
2) Multispectral Remote Sensing Images: In addition to the hyperspectral RS image datasets, four multispectral RS image datasets were used to test the performance of the AGCNN model: a) Landsat OLI - Kolkata (LOK), b) Sentinel MSI - Visakhapatnam (SMV), c) IRS LISS IV - Hyderabad (ILH), and d) IRS LISS III - Delhi (ILD). The description of these datasets is given in Table II.

A. Model Description
In the present study, six models were considered to demonstrate the performance of AGCNN; their criteria and descriptions are given as follows. A fourfold comparative analysis is implemented among the six models. In the first facet of comparison, the models M1, M2, and M3 are compared to assess the improvement in performance due to CD granulation. In the second facet, M3 and M4 are compared to establish the superiority of the CNN with adaptive granulation over the conventional CNN with CD granulated input. In the third facet, a comparison among M4, M5, and M6 shows the improvement in AGCNN due to weighted class-membership and the roughness measure in decision-making. In the fourth facet, M6 is compared with M1 to establish the superiority of AGCNN over the conventional CNN with ungranulated input. The models were implemented on a system with an Intel(R) Core(TM) i7-2600 CPU at 3.40 GHz (4 cores, 8 logical processors).

B. Training and Testing of Models
Eight datasets are created by considering image patches of the images given in Table II (for the HWM dataset, the image patch size is 9 × 9 × 191). Each dataset is divided into two disjoint sets, a training set and a test set. The parameters of models M1 to M6 are computed using the training set, and the performance of each model is evaluated on the test set. The division is done such that 20%, 40%, 60%, and 80% of the image patches are used for training and the remaining 80%, 60%, 40%, and 20%, respectively, for testing.

C. Performance Metrics
In the present study, the metrics overall accuracy (OA), positive predictive value (PP), sensitivity (Sn), F-score (F), number of iterations (NI), computational time (Tc), and dispersion measure (Dm) [33] are considered to quantify the performance of the six models. Overall accuracy (OA) is the average of the accuracies obtained by the models for 20%, 40%, 60%, and 80% training. Furthermore, the class-level performance analysis of a model is implemented using the metrics PP, Sn, F, NI, Tc, and Dm, which are derived from the confusion matrix (CM). PP (also called precision or user's accuracy) is the number of true positives divided by the sum of true positives and false positives. Sn (recall) is the number of correctly predicted positives divided by the total number of actual positives. The higher the values of PP and Sn, the better the performance of the model. F-score is the harmonic mean of PP and Sn and lies between 0 and 1; a model with an F-score near 1 performs better. NI is the number of iterations for which the model is trained. The total time for training and testing the model is the computational time (Tc). Dm quantifies the distribution of classified feature vectors among the classes; the lower the Dm value, the better the performance of the model.
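The per-class metrics derived from the confusion matrix can be sketched as follows; the counts are illustrative.

```python
# Sketch of PP (precision), Sn (sensitivity/recall), and F-score for one class,
# computed from confusion-matrix counts.

def class_metrics(tp, fp, fn):
    pp = tp / (tp + fp)          # positive predictive value / user's accuracy
    sn = tp / (tp + fn)          # correct positives over all actual positives
    f = 2 * pp * sn / (pp + sn)  # harmonic mean of PP and Sn, in [0, 1]
    return pp, sn, f

pp, sn, f = class_metrics(tp=80, fp=20, fn=10)
print(round(pp, 3), round(sn, 3), round(f, 3))  # 0.8 0.889 0.842
```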

D. Performance of Models with Hyperspectral RS Datasets
The performances of six models were tested with four hyperspectral RS datasets.

1) Performance of Models With HWM Dataset:
The architecture of six models after layer (L 8 ) and the number of weight parameters are given in Table III. In Table III, an input image I (9×9×191) is feed-forwarded to all the six models. The MLP in M 1 has 191 input nodes, the number of hidden nodes were considered as 50, and the number of output nodes is equal to the number of classes (6). M 1 is trained for 80 iterations. M 1 does not have a granulation layer, reduction layer, and series of CLs and PLs. In M 2 , image (I) is passed through a granulation layer, reduction layer, two CLs and PLs, and MLP. CD granulation of input image has increased the number of features and these  9. OA(%) and T c (s) of recently published adaptive CNN models ACNN [55] and ACNCNN [54], and M 4 , M 5 , and M 6 for HWM dataset.
features are reduced in the feature selection layer. M3 is similar to M2 but with a CD granulated input image; the number of features generated using CD granulation (1146) is larger than with CI granulation (576). In the reduction layer, these features are reduced to 216 (for M3) and 204 (for M2), respectively. The number of input layer nodes (in M3, M4, M5, and M6) is equal to the number of reduced features (216). The number of adaptive layer nodes is equal to the number of granules generated during the evolving process (108). The performances of the six models were tested with the HWM dataset, and the results are given in Table IV. In the first facet of comparison, M3 with the CD granulated image obtained 92.97% OA, which is 5.73% and 1.13% better than M1 and M2, respectively. The F-value of M3 is 0.65, which is comparatively better than those of M1 and M2. M2 is 4.6% better than M1, indicating the improvement in OA due to the CI granulated input image. CD granulation of pixels in the input image improved the performance by 1.13% compared with CI granulation. The computational time Tc of M3 is 69 s, while the Tc of M2 and M1 is 64 and 56 s, respectively; the CD granulation of the input image generated a larger number of features, increasing Tc. AGCNN, with its three novel characteristics of CD granulation of pixels in the input image, adaptive granulation-based architecture, and weighted class-membership and roughness measure on overlapping granules in decision-making, produced step-by-step increments in OA of 5.73%, 2.15%, and 2.22%, respectively, in comparison with the basic CNN model (M1). Pictorial representation of OA (%) (gray bars) and Tc (s) (blue bars) from M1 to M6 is shown in Fig. 8. In Fig. 8, the computational time drops drastically from 69 s (M3) to 3.8 s (M4).
The decrease in computational time from M3 to M4 is due to the adaptive granulation-based network with single-pass learning in M4. The metrics OA and F provide overall information about the models. In support of OA and F-score, the metrics PP, Sn, and Dm were considered to understand the class-level performance of the models. The performance of M1 and M6 in terms of PP, Sn, and Dm is given in Table V. In Table V, M6 has better PP and Sn for all six classes compared with M1. The PP and Sn of M6 for class 1 are 93.21% and 95.77%, respectively, while those of M1 are 83.74% and 90.24%, respectively. One may note the evolution of improvement over the models M4 to M6 with additional features, vis-a-vis the best-performing comparing method ACNCNN. For example, M4 with the adaptive granulation-based architecture is 2.95% better than ACNCNN in terms of OA, and it is ten times faster than ACNCNN. This superiority of M4 over ACNCNN is due to the ability of the granulation-based network architecture to accommodate the information content of new samples in single-pass learning. M5 is 3.98% better than ACNCNN in terms of OA and is also ten times faster. The further increase in performance here is due to the weighted class-membership-based decision-making. M6, which adopts the principle of roughness minimization in deciding on ambiguous patterns, further enhances the difference to 5.17%, again at ten times the speed. While the comparing adaptive CNN models were trained for 80 iterations, the proposed AGCNN model is trained for a single iteration. The computational merit of AGCNN over the other CNN-based models is due to the smaller number of weight parameters in its architecture and single-pass learning. AGCNN has a few hundred weight parameters, while the other CNN models have thousands.
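The decision rule that distinguishes M5 and M6, weighted class membership with roughness minimization as the tie-breaker on overlapping granules, can be sketched as follows. The dictionary-based granule representation, the roughness formula via lower/upper approximation cardinalities, and all function names here are illustrative assumptions, not the authors' implementation.

```python
def roughness(granule):
    """Rough set roughness of a granule: 1 - |lower approx.| / |upper approx.|.
    A granule whose lower and upper approximations coincide has roughness 0."""
    return 1.0 - granule["lower"] / granule["upper"]

def classify_overlap(memberships, granules):
    """memberships: {class: weighted class membership of the sample}.
    Pick the class with the highest weighted membership (as in M5); when
    several classes tie, assign the sample to the tied granule with minimum
    roughness, i.e., the least ambiguous one (as in M6)."""
    best = max(memberships.values())
    tied = [c for c, m in memberships.items() if abs(m - best) < 1e-9]
    if len(tied) == 1:
        return tied[0]
    return min(tied, key=lambda c: roughness(granules[c]))
```

For example, with two tied granules of roughness 0.2 and 0.5, the sample is assigned to the class of the first (less rough) granule.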
AGCNN converges to a global minimum in a single iteration, while the comparing CNN models converge to a minimum error over multiple iterations. These characteristics of AGCNN make it computationally faster than the existing models. Fig. 9 shows the OA (%) and Tc (s) of the comparing deep CNN models with the highest and lowest performance, as examples; it clearly depicts the superiority of M6 (best-performer) over ACNCNN [54]. Fig. 10(a)–(d) show a (zoomed) part of the classified images by M1, ACNCNN, and M6, along with the class labels, for the HWM dataset. As seen, the model M6 could classify the objects in the image much better than M1 and ACNCNN, M1 being the worst among the three. For example, the road in Fig. 10(c) is clearly visible compared to those in Fig. 10(a) and (b). For the convenience of readers, as an example, the panoramic view (of which the said image is a part) of the images classified by M1 and M6 for the HWM dataset is shown in Fig. 11(a) and (b), respectively, corresponding to the worst and best performance.
Similarly, the agriculture patches in the ASA data (Fig. 12), the bridge in the LOK data (Fig. 13), and the concrete constructions in the SMV data (Fig. 14) are seen to be much more distinctly extracted (identified) by M6 as compared with the other two deep models. All these corroborate the findings obtained by the aforesaid quantitative indices and further demonstrate the superiority of the proposed deep model M6.
b) Performance of models M1 to M6 with other datasets: The performances of the deep models M1 to M6 were also tested on the other datasets. The results of M1 and M6 are given in Table VII as examples of the worst and best performing models in terms of F-score, OA, and computational time. The superiority of M6 over M1 is also verified using the other metrics.

IV. CONCLUSION
An adaptive granulation-based CNN model with eleven layers is proposed for remote sensing image classification. The model AGCNN (M6) considers the class-belonging information of pixels in image patches through class-dependent granulation, and selects informative features from the granulated ones in terms of rough set theoretic reducts. The model evolves its architecture automatically to accommodate the information content of new samples during the learning process. During classification, it incorporates the weighted class membership and a roughness measure on overlapping granules representing more than one class. The principle of roughness minimization on overlapping granules enables appropriate class assignment to doubtful pixels. The concept of using a roughness measure in case of a tie is novel. The adaptive granulation characteristics of AGCNN reduce the number of weight parameters and enable it to learn in a single pass. In our study, the number of weight parameters is seen to be much smaller (viz., 108) than the thousands of weight parameters in a conventional CNN. Further, the single-pass training resulted in a much reduced computational time (a few seconds), as compared to more than 100 s for a conventional CNN.
All these features make the proposed model superior to many state-of-the-art adaptive CNN-based models. For example, the AGCNN model M6 produces 97.34% OA with a Tc of 3.8 s for the HWM dataset, compared to an OA of 92.17% with Tc = 38.36 s by the best performing recently developed adaptive model (ACNCNN). The proposed model is ten times faster than ACNCNN, as tested with the remote sensing datasets. The superiority of AGCNN is also demonstrated visually on the output classified images.

A. Limitations
In AGCNN, a feature vector in the overlapping region of granules is assigned to a class based on the roughness of the constituting granules. This method can therefore label the sample even in a tie situation, as explained in Section II-D3b. However, the model cannot label a sample in the overlapping region if the granules have equal roughness. Also, during the training of AGCNN, the granules evolve to accommodate the input samples; in this evolving process, the size of a granule is increased by a constant value, which may lead to excessive overlapping of granules and hence to misclassification of samples.
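The constant-step granule expansion behind this limitation can be illustrated with a minimal sketch; the hyperbox granule representation and the step size DELTA are assumptions for illustration, not details from the paper.

```python
DELTA = 0.05  # assumed constant expansion step per dimension

def expand_granule(low, high, sample):
    """Grow a hyperbox granule [low, high] by a fixed step DELTA along
    each dimension where the new sample falls outside it. Because the
    step is constant rather than data-driven, repeated expansions can
    push neighboring granules into heavy overlap."""
    new_low = [l - DELTA if s < l else l for l, s in zip(low, sample)]
    new_high = [h + DELTA if s > h else h for h, s in zip(high, sample)]
    return new_low, new_high
```

For example, a sample just outside the box along one dimension still triggers a full DELTA expansion in that dimension, regardless of how far outside it lies.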

B. Future Scope
The model can be further improved by addressing the aforesaid two drawbacks. The concept of neighborhood roughness of the overlapping granules can be incorporated to classify the ambiguous samples in the overlapping region for better class labeling. Further, the extent of granules during adaptive granulation can be controlled by the roughness of overlapping neighboring granules.

ACKNOWLEDGMENT

S. K. Pal acknowledges the National Science Chair, SERB-DST, Govt. of India. The authors would like to thank Prof. Landgrebe and Prof. P. Gamba for providing the hyperspectral remote sensing datasets.