Change Detection in Hyperdimensional Images Using Untrained Models



Sudipan Saha, Member, IEEE, Lukas Kondmann, Qian Song, Member, IEEE, and Xiao Xiang Zhu, Fellow, IEEE

Abstract: Deep transfer-learning-based change detection methods depend on the availability of sensor-specific pretrained feature extractors. Such feature extractors are not always available due to a lack of training data, especially for hyperspectral sensors and other hyperdimensional images. Moreover, models trained on easily available multispectral (RGB/RGB-NIR) images cannot be reused on such hyperdimensional images due to their irregular number of bands. While hyperdimensional images have a large number of spectral bands, they generally show much less spatial complexity, thus reducing the need for convolution filters with large receptive fields. Recent work in computer vision has shown that even untrained deep models can yield remarkable results in some tasks, such as super-resolution and surface reconstruction. This motivates us to make the bold proposition that an untrained lightweight deep model, initialized with a weight initialization strategy, can be used to extract useful semantic features from bitemporal hyperdimensional images. Based on this proposition, we design a novel change detection framework for hyperdimensional images that extracts bitemporal features using an untrained model and compares the extracted features using deep change vector analysis to distinguish changed pixels from unchanged ones. We further use the deep change hypervectors to cluster the changed pixels into different semantic groups. We conduct experiments on four change detection datasets: three hyperspectral datasets and a hyperdimensional polarimetric synthetic aperture radar dataset. The results demonstrate that the proposed method is suitable for change detection in hyperdimensional remote sensing data.

I. INTRODUCTION
Recently, deep learning has attracted significant attention in earth observation [1]. Following this trend, deep-learning-based methods have been developed for change detection (CD) [2], an important topic in earth observation. CD plays a pivotal role in several applications, including disaster management [3], [4], urban monitoring [5], and precision agriculture [6]. While CD methods can be supervised [7]-[9] or semisupervised [5], unsupervised methods are preferred in the literature [2], [10] because collecting labeled multitemporal data is challenging. Before the emergence of deep learning, change vector analysis (CVA) and its object-based variants [10], [11] were widely used for unsupervised CD. Deep CVA (DCVA) and other transfer-learning-based methods [2], [3], [12] embed the concept of CVA in a transfer learning framework. While transfer-learning-based methods do not train or fine-tune the deep model, they depend on the availability of a pretrained feature extractor that can capture the semantics of the input images. In more detail, such methods project the bitemporal images into a deep feature space using a pretrained deep feature extractor and subsequently compare the images in the projected domain. Thus, they perform CD by reusing a deep model that was previously trained for some unrelated task, e.g., image classification. Most deep-transfer-learning-based CD methods are designed for synthetic aperture radar (SAR) amplitude images and multispectral images with few bands.
Remote sensing deals with a plethora of sensors with different spatial, spectral, and temporal characteristics. In many cases, a large number of bands is required to efficiently represent the information in remote sensing images. The best-known example is hyperspectral images, which sample a broad range of the electromagnetic spectrum in hundreds of spectral bands [13]-[17]. Some CD applications require rich spectral information, and hyperspectral images can be very useful in such cases, e.g., monitoring of mining activity [18]. In spite of this, little attention has been paid to developing deep-transfer-learning-based CD methods for hyperspectral images [19], [20]. This can be attributed to the lack of labeled hyperspectral data, which impedes the availability of any pretrained network. In more detail, a transfer-learning-based hyperspectral CD method can be developed only if a pretrained model is available for the same data, which is often not the case for hyperspectral images. Remarkably, due to the lack of training data, some supervised hyperspectral image classification models are trained and tested on pixels from the same image [21]. Even if sufficient training data are collected for a particular hyperspectral sensor and geography, the resulting model will not be straightforwardly applicable to another hyperspectral sensor. Currently, there are a large number of hyperspectral sensors with differences in spectral coverage and number of bands, e.g., the DLR earth sensing imaging spectrometer (DESIS) has 180 bands while precursore iperspettrale della missione applicativa (PRISMA) has 237 bands [13]. Please see Table I for a comparison of the number of bands of different spaceborne hyperspectral sensors. Due to such differences, a model trained for one hyperspectral sensor cannot be used for transfer-learning-based CD on another hyperspectral sensor. Additionally, unmanned-aerial-vehicle (UAV) based hyperspectral imaging has become increasingly popular in various applications, such as agricultural monitoring [13]. Such UAV-based hyperspectral sensors may exhibit spectral coverage entirely different from that of the satellite-based ones.

TABLE I SOME SPACEBORNE HYPERSPECTRAL SENSORS [13]
In addition to hyperspectral data, another example of hyperdimensional data in remote sensing is the polarimetric synthetic aperture radar (PolSAR) image. Compared with single-polarimetric SAR data, PolSAR images contain more polarimetric information about the targets and are useful for discriminating double-bounce scatterers (such as buildings) from volume scatterers (such as forest) and surfaces using target decomposition methods [22]. Thus, PolSAR data are beneficial for applications such as land classification and building extraction [23]. In practical PolSAR applications, the decomposed results [23]-[25], rather than the raw PolSAR data, are usually used for further analysis, and these constitute a hyperdimensional (tens to over one hundred channels) data cuboid.
Models trained for multispectral (RGB/RGB-NIR) or SAR amplitude images cannot be effectively reused for feature extraction from hyperdimensional images due to their irregular number of bands. To transfer RGB-trained models to hyperdimensional images, we must choose only three bands from the hyperdimensional images, thus losing a significant amount of information. Another possible solution is to modify the first layer of the pretrained model.
Ulyanov et al. [26] showed that the structure of a network is often sufficient to capture important low-level features from images without any training. This is highly relevant for hyperdimensional images: it is challenging to transfer a model trained on RGB images to hyperdimensional images, whereas it is trivial to initialize a model that ingests as many image channels as desired. This strategy is certainly not as good as learning complex spatial features from abundant labeled images, but it is good enough for CD in hyperdimensional images. Arguably, the spatial complexity of hyperdimensional images is not high in most cases, as can be seen in Table I. This is also evident from the fact that some works on hyperspectral image classification use only 1-D convolution [27]. While spatial complexity still has an important role to play in hyperspectral multitemporal analysis, we argue that it is not as critical as in high-resolution multispectral images. This raises the question of whether the complexity in low-spatial- and high-spectral-resolution multitemporal hyperdimensional images can be captured by an untrained deep model, merely initialized with a deep model initialization strategy [28], [29]. This possibility is supported by the fact that untrained models have recently shown remarkable performance in some computer vision tasks where spatial complexity is much more critical than in hyperspectral images, e.g., deep image prior [26].
We propose an unsupervised CD method for hyperdimensional images that uses an untrained deep model as the deep feature extractor. The proposed method does not need any prior knowledge about the input or the arrangement of the spectral bands. In addition to distinguishing the changed pixels from the unchanged ones (binary CD), we also extend the method to multiple CD. The key contributions of this article are as follows.
1) This article shows that even an untrained model, merely initialized with a weight initialization technique [28], can be used to capture the spatio-temporal semantics, especially for hyperdimensional data where pretrained models are generally not available. Based on this, this article proposes a CD method that can effectively segregate changed pixels from unchanged ones in hyperdimensional images.
2) This article further extends the method to multiple/multiclass CD, using the deep change vectors obtained from the untrained model to cluster the changed pixels into different groups.
3) This article experimentally validates the proposed approach on three bitemporal hyperspectral scenes, as well as on bitemporal hyperdimensional PolSAR data, showing the versatility of the approach.
The rest of this article is organized as follows. Some relevant works are discussed in Section II. Section III discusses the proposed method. Section IV presents the datasets and results related to hyperspectral images. Results related to PolSAR data are presented in Section V. Finally, Section VI concludes this article.

II. RELATED WORK
In order of relevance to our work, this section briefly discusses: 1) unsupervised CD; 2) hyperdimensional CD methods; and 3) deep image prior.

A. Unsupervised CD
Unsupervised CD methods are generally based on a pixelwise difference operation, i.e., CVA [30], or on clustering [31]. With the emergence of high-resolution imaging, object-based variants of CVA, e.g., parcel change vector analysis (PCVA) [11], incorporated the notion of spatial context in CVA. Morphological filters have also been employed to capture object information [32]. Deep-learning-based unsupervised CD methods, e.g., DCVA [2], are based on transfer learning. DCVA combines CVA with feature extraction by a pretrained deep network, on the assumption that a pretrained model is available for the target geography and sensor. In addition to optical images, transfer-learning-based frameworks have also shown success in SAR amplitude image analysis [3].

B. CD in Hyperdimensional Images
Very few deep-learning-based CD methods have been proposed for hyperdimensional (hyperspectral or other hyperdimensional) images [33]-[35]. In [33], the authors identified high dimensionality and limited datasets as unique challenges for hyperspectral CD. To alleviate these challenges, they devised a preclassification-based end-to-end CD framework. Another supervised framework, the recurrent 3-D fully convolutional network (Re3FCN), was introduced by Song et al. [35]. Re3FCN merges a 3-D fully convolutional network (FCN) and a convolutional long short-term memory. Chen and Zhou [36] proposed a supervised CD method consisting of the following three steps: reduction of the spectral dimension, joint affinity tensor construction, and binary (changed or unchanged) classification by a CNN. While these works successfully introduce deep learning to hyperspectral CD, they do not present any solution for circumventing the limited availability of datasets in hyperspectral multitemporal analysis. These works use pixels from the same image for training and evaluation. Using such large supervised networks when training and test pixels belong to the same scene may lead to overoptimistic accuracy assessment, as shown by Molinier and Kilpi [37]. Thus, it is crucial to design unsupervised/transfer-learning-based approaches, like the ones proposed for multispectral and SAR images [2], [3]. In addition to hyperspectral images, hyperdimensional CD has also been studied in the context of PolSAR images [24]. To the best of the authors' knowledge, all deep-learning-based hyperdimensional CD methods are proposed in the context of binary CD, without delving into multiple/multiclass CD.

C. Deep Image Prior
Deep models are generally trained on large labeled datasets. This leads us to believe that the excellent performance of CNNs is due to their capability to learn realistic features or data priors from the data. However, several recent works have shown that this explanation is not entirely correct. In one of the first such works, Zhang et al. [38] showed that an image classification network can overfit the training images even when the labels are randomized. This hints that the success of a deep network is possibly not always due to a large amount of labeled data, but sometimes due to the structure of the network. Delving further into this topic, Ulyanov et al. [26] investigated this phenomenon in the context of image generation. They showed that a large part of the image statistics is captured by the structure of the generator CNN itself. Instead of following the usual paradigm of training CNNs on a large dataset, they fitted CNNs on a single image for image restoration problems. The network weights were randomly initialized. Their simple setup provided remarkable results for various image restoration problems, e.g., denoising and super-resolution. This phenomenon is remarkable as it demonstrates the power of untrained networks. Several subsequent works have followed a similar approach, demonstrating the success of untrained networks for different computer vision problems, including surface reconstruction [39] and photo manipulation [40]. Another similar line of research is the random projection network [41], proposed in the context of high-dimensional data, where the network architecture requires an input layer with a huge number of weights, making training infeasible. The random projection network [41] tackles this challenge by prepending to the network an input layer whose weights are initialized with a random projection matrix.

III. PROPOSED METHOD
Let us assume that we have a pair of coregistered hyperdimensional images $X_1$ and $X_2$ having $B_0$ bands, where $B_0$ is much larger than the usual number of bands in a multispectral image. No training labels or suitable pretrained network are available to us. Our goal is twofold.
1) Binary CD: Distinguish the changed pixels ($\Omega_c$) from the unchanged ones ($\omega_{nc}$).
2) Multiple CD: Further cluster the changed pixels into semantically meaningful groups.
To accomplish these goals, we initialize a deep model in which the number of input channels and the number of kernels in the intermediate layers are set according to the dimension of $X_1$ and $X_2$. This deep model, while untrained, is initialized with an appropriate weight initialization technique [28]. We then use this network to extract a set of features from the bitemporal images. The pixelwise difference is obtained as the deep change vector, which is thresholded to identify the changed pixels. Once the changed pixels are segregated, they are further clustered based on the deep change vectors for multiple CD. The proposed hyperdimensional CD framework is called untrained hyperdimensional multiple DCVA (UHM-DCVA) and is shown in Fig. 1.

A. Feature Extraction
Deep models trained for multispectral images can ingest input images of few channels/bands, on the order of three to ten [42], [43]. In contrast, hyperdimensional remote sensing images have $B_0$ channels, where $B_0$ is generally larger than 200. Thus, deep models trained on multispectral images are not suitable for ingesting the hyperdimensional $X_1$ and $X_2$. To overcome this challenge, we use an untrained model for deep feature extraction from $X_1$ and $X_2$. The model, being untrained, can be initialized with the capacity to ingest any number of input channels and to project them to any number of kernels in the successive layers.
Conforming to the dimension of $X_1$ and $X_2$, we design the first convolution layer such that it ingests the hyperdimensional image of $B_0$ channels and projects it to $\beta_0 B_0$ kernels, where $\beta_0 > 1$. We use $3 \times 3$ filters, i.e., the weight of the first layer has dimension $3 \times 3 \times B_0 \times \beta_0 B_0$. In our experiments, we set $\beta_0 = 4$. The following convolution layer ingests an input of dimension $\beta_0 B_0$ and projects it to dimension $\beta_1 \beta_0 B_0$. For simplicity, we set $\beta_1 = 1$. In this way, more layers can be added to the network. Increasing the number of layers captures a larger spatial receptive field. Considering the coarse spatial resolution of most hyperdimensional images, we postulate that the network need not be as deep as is common in multispectral image analysis (further validated in Section IV). The rectified linear unit (ReLU) function is used between successive convolution layers. Pooling operations and fully connected layers are not used. Hence, the spatial size of the input is preserved through successive layers. The key structure of the network is shown in Table II and Fig. 2.
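As a rough illustration of how the layer sizes scale with the number of bands, the following sketch counts the convolution weights for the layout described above. It assumes 224 input bands (the AVIRIS band count mentioned in the results), applies the second expansion factor to every layer after the first, uses five layers, and ignores biases. The function names are hypothetical and are not taken from the authors' code.

```python
def conv_weight_count(in_channels: int, out_channels: int, kernel: int = 3) -> int:
    """Number of weights in one kernel x kernel convolution layer (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


def layer_widths(b0: int, beta0: int = 4, beta1: int = 1, num_layers: int = 5):
    """Return (in_channels, out_channels) for each convolution layer.

    Assumption: beta1 is applied to every layer after the first one.
    """
    widths = []
    in_ch, out_ch = b0, beta0 * b0
    for _ in range(num_layers):
        widths.append((in_ch, out_ch))
        in_ch, out_ch = out_ch, beta1 * out_ch
    return widths


if __name__ == "__main__":
    for index, (cin, cout) in enumerate(layer_widths(224), start=1):
        print(f"layer {index}: {cin} -> {cout}, weights = {conv_weight_count(cin, cout)}")
```

Under these assumptions the first layer maps 224 channels to 896 and holds 1 806 336 weights, while each later layer maps 896 channels to 896 and holds 7 225 344 weights.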
Although the model is untrained, its weights are initialized with the He initialization method [28]. This strategy draws the initial weights independently from the same distribution. Although weight initialization was originally proposed to obtain an efficient starting point for better training, we use it to obtain a superior feature extractor that can subsequently be used as the deep feature extractor in the proposed CD framework. Note that weight initialization does not involve any training. Once initialized, the deep model is used to extract a set of features from $X_1$ and $X_2$ separately, as detailed in Section III-B.
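A minimal sketch of He initialization for one convolution layer is given below. It assumes the normal-distribution variant with zero mean and standard deviation $\sqrt{2/n}$, where $n$ is the fan-in (kernel area times input channels). The helper name, the seed, and the small layer size are illustrative assumptions; a practical implementation would more likely rely on the initializer provided by a deep learning library.

```python
import math
import random


def he_normal_weights(in_channels: int, out_channels: int, kernel: int = 3, seed: int = 0):
    """Draw untrained convolution weights from N(0, 2 / fan_in).

    Returns one flat list of fan_in weights per output kernel.
    """
    rng = random.Random(seed)
    fan_in = kernel * kernel * in_channels
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(out_channels)]


if __name__ == "__main__":
    # Tiny layer for illustration: 8 input channels projected to 32 kernels.
    weights = he_normal_weights(in_channels=8, out_channels=32)
    flat = [w for kernel_weights in weights for w in kernel_weights]
    mean = sum(flat) / len(flat)
    sample_std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
    print("target std:", math.sqrt(2.0 / 72), "sample std:", sample_std)
```

No gradient step appears anywhere in this sketch: the weights are drawn once and then only used for forward passes.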

B. Binary CD
All bands of $X_1$ and $X_2$ are normalized to have values between 0 and 1. The untrained model is applied separately to $X_1$ and $X_2$ to extract a set of deep features for each pixel in the scene [2]. Using the same model on both images ensures that two very similar inputs (pixels) are mapped to similar representations in the feature space while dissimilar pixels are mapped to dissimilar feature representations, since they are processed through the same set of functions. Furthermore, a variance-based feature selection strategy is applied as in [2]. Deep features are extracted from the last layer of the network, and the pixelwise deep change hypervector ($G$) [2] is obtained as the difference between the deep features of $X_1$ and $X_2$. The components of $G$, $g_d$ ($d = 1, \ldots, D$), tend to zero for unchanged pixels ($\omega_{nc}$), while they tend to larger (positive or negative) values for the changed pixels ($\Omega_c$). To segregate $\Omega_c$ from $\omega_{nc}$, we compute the deep magnitude $\rho$ for each pixel as the Euclidean norm of $G$
$$\rho = \lVert G \rVert_2 = \sqrt{\sum_{d=1}^{D} g_d^2}.$$
$\rho$ maps the $D$-dimensional $G$ into a 1-D index while preserving the main properties of the changes. Unchanged pixels tend to generate smaller $\rho$ than changed pixels. This is used to segregate $\Omega_c$ and $\omega_{nc}$ by applying a threshold $\tau$. While any suitable thresholding method [44] can be used, we use Otsu's thresholding [45] to compute $\tau$. Any pixel having $\rho > \tau$ is assigned to $\Omega_c$, and to $\omega_{nc}$ otherwise.
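The thresholding step can be sketched as follows, starting from per-pixel feature vectors that are assumed to have been extracted already. The Otsu routine here is a plain histogram-based version with 256 bins written for illustration; the bin count, the function names, and the toy feature values are assumptions, not details from the paper.

```python
import math


def deep_magnitude(features_t1, features_t2):
    """Euclidean norm of the per-pixel difference between two feature sets."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
        for f1, f2 in zip(features_t1, features_t2)
    ]


def otsu_threshold(values, bins: int = 256) -> float:
    """Histogram-based Otsu threshold maximizing the between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_var, best_tau = -1.0, lo
    count_low, sum_low = 0, 0.0
    for i in range(bins - 1):
        count_low += hist[i]
        sum_low += centers[i] * hist[i]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        between = count_low * count_high * (mean_low - mean_high) ** 2
        if between > best_var:
            # Threshold at the upper edge of bin i.
            best_var, best_tau = between, lo + (i + 1) * width
    return best_tau


def binary_change_map(features_t1, features_t2):
    """Return (mask, tau): mask[p] is True where rho > tau (changed pixel)."""
    rho = deep_magnitude(features_t1, features_t2)
    tau = otsu_threshold(rho)
    return [r > tau for r in rho], tau


if __name__ == "__main__":
    # Toy features for seven pixels; the last three differ strongly between dates.
    t1 = [[0.10, 0.0], [0.12, 0.0], [0.09, 0.0], [0.11, 0.0], [2.0, 0.0], [2.1, 0.0], [1.9, 0.0]]
    t2 = [[0.0, 0.0]] * 7
    mask, tau = binary_change_map(t1, t2)
    print(mask, tau)
```

On real data the magnitudes of unchanged and changed pixels overlap far more than in this toy case, which is why the choice of thresholding method is examined separately in the experiments.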

C. Multiple CD
The changed pixels ($\Omega_c$) are further analyzed in an unsupervised way based on $G$ to segregate different kinds of change without any a priori knowledge about the kinds of change themselves [2]. However, we assume a priori knowledge of the number of kinds of change ($K$). $G$ is a high-dimensional vector, and clustering is challenging in such a high-dimensional space [46]. To overcome this, we first binarize/discretize the components of $G$ [2], [47]. The components of $G$ are likely to be either positive or negative, and different kinds of change are likely to show different patterns across the components $g_d$ ($d = 1, \ldots, D$) of $G$. Binarization simplifies the information in $G$ while preserving the information descriptive of clusters. $G$ is binarized to $G_{\text{bin}}$, with components greater than 0 set to 1 and components smaller than 0 set to 0. $G_{\text{bin}}$ is also $D$-dimensional, like $G$.
Denoting the number of changed pixels (pixels in $\Omega_c$) as $N_c$, we have $N_c$ binary vectors of dimension $D$ each. Conversely, representing each feature as a vector, we have $D$ vectors of dimension $N_c$ each. We expect pixels belonging to the same kind of change to exhibit similar binary signatures, and pixels belonging to different kinds of change to exhibit dissimilar binary signatures. Furthermore, many features exhibit similar binary signatures and are thus redundant for discriminating different types of change. Out of the $D$ features, the feature that shows the most similarity to the other $D - 1$ features can be defined as the most informative feature. To this end, $R(i, j)$ measures the correlation distance [48] between two $N_c$-dimensional features $i$ and $j$, scaled to the range 0-1 [2], where 1 represents the farthest features. $R_d$ ($d = 1, \ldots, D$) measures the informativeness of an individual feature
$$R_d = -\sum_{j=1,\, j \neq d}^{D} R(d, j).$$
In the abovementioned equation, the term within the summation computes the distance of a feature from the other features; coupled with the negation, $R_d$ measures how similar feature $d$ is to the other $D - 1$ features. The most informative feature $d^*$ is selected by choosing the feature that maximizes $R_d$
$$d^* = \arg\max_{d} R_d.$$
The chosen $d^*$ can be used to group the pixels in $\Omega_c$ into two classes.
The next most informative feature can be selected by following the abovementioned process, after first discarding the most informative feature $d^*$ and the features made redundant by it. This hierarchical process allows us to select a set of informative features that are further used to cluster $\Omega_c$ into the desired number of classes $\omega_{c1}, \omega_{c2}, \ldots, \omega_{cK}$.
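The first step of this selection can be sketched as below: binarize the change vectors, score each feature by its negated summed correlation distance to the others, and split the changed pixels on the best-scoring feature. Several details are assumptions made only for this illustration: the correlation distance is taken as $(1 - r)/2$ with $r$ the Pearson correlation, a constant feature is given the maximal distance 1, zero-valued components are binarized to 0, and ties are broken by the lowest index. The later hierarchical steps are not implemented, because the text does not specify how redundant features are identified.

```python
import math


def binarize(change_vectors):
    """Set components greater than 0 to 1 and all others to 0."""
    return [[1 if c > 0 else 0 for c in g] for g in change_vectors]


def correlation_distance(u, v) -> float:
    """(1 - Pearson r) / 2, so that the distance lies in [0, 1]."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    norm_u = math.sqrt(sum(a * a for a in du))
    norm_v = math.sqrt(sum(b * b for b in dv))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: a constant feature is maximally distant
    r = sum(a * b for a, b in zip(du, dv)) / (norm_u * norm_v)
    return (1.0 - r) / 2.0


def most_informative_feature(g_bin):
    """Return (d_star, scores), where scores[d] is the negated summed distance."""
    num_features = len(g_bin[0])
    columns = [[row[d] for row in g_bin] for d in range(num_features)]
    scores = [
        -sum(correlation_distance(columns[d], columns[j])
             for j in range(num_features) if j != d)
        for d in range(num_features)
    ]
    d_star = max(range(num_features), key=scores.__getitem__)
    return d_star, scores


if __name__ == "__main__":
    # Toy change vectors: 4 changed pixels, 3 features.
    g = [[0.5, 0.4, -0.2], [0.7, 0.6, 0.3], [-0.3, -0.2, 0.1], [-0.6, -0.5, -0.4]]
    g_bin = binarize(g)
    d_star, scores = most_informative_feature(g_bin)
    groups = [row[d_star] for row in g_bin]  # two-class split on the chosen feature
    print(d_star, scores, groups)
```

In the toy example the first two features carry identical binary signatures, so either one scores highest and the third, which disagrees with both, scores lowest.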

A. Datasets
We validate the proposed method on the following three publicly available bitemporal hyperspectral scenes [49], [50].
1) The Santa Barbara bitemporal scene is acquired on 2013 [see Fig.
The changed pixels are further grouped into five change types: type 1 (5558 pixels), type 2 (1331 pixels), type 3 (79 pixels), type 4 (1557 pixels), and type 5 (1461 pixels), shown in Fig. 7(a).
Please note the following.
1) For the Santa Barbara and Bay Area scenes, reference information is not known for a fraction of the pixels. However, these datasets were not prepared by us; they are publicly available and have been used in previous research works [49], [50]. Hence, we follow the reference maps available with those datasets.
2) We evaluate the binary CD method on all three scenes, but the multiple/multiclass CD method only on the Hermiston scene, as a multiple change reference map is available only for this scene.

B. Compared Methods
We compared the proposed method to the following unsupervised methods.
1) CVA using the hyperdimensional pixel values. The comparison to CVA is crucial to understand whether the proposed method provides any additional benefit over mere pixel difference.
2) PCVA [11], which captures the spatial information as superpixels. This comparison helps us to understand whether the spatio-temporal context in hyperdimensional images can be captured simply by a superpixel-based analysis.
3) Spectral angle mapper Z-score image differencing (SAMZID) [19], which is designed specifically for hyperspectral CD based on the spectral angle mapper and image differencing. The method, as originally proposed in [19], consists of an unsupervised predictor phase and a supervised learning phase. We exclude the supervised phase and apply thresholding [45] to the change map obtained after the unsupervised predictor phase. As proposed in [19], two variants are compared: SAMZID Sin and SAMZID Tan.
4) Autoencoding of bitemporal Hyperspectral Images for Change Vector Analysis (AICA) [34], a deep-learning-based unsupervised CD method proposed for hyperspectral images that combines CVA with autoencoder-based training.
5) DCVA [2] with a feature extractor pretrained on a large-scale computer vision dataset using the VGG16/VGG19 architecture [42]. This comparison is important to understand whether a simple transfer learning approach can be used instead of the proposed method. The pretrained VGG architecture can ingest only three channels, so we select three optimum (RGB) channels from the hyperspectral image to feed to the network. We use three different configurations: using the first convolutional layer of VGG16 (DCVA3Channels-1), the second convolutional layer of VGG16 (DCVA3Channels-2), and the fifth convolutional layer of VGG16 (DCVA3Channels-3).
6) DCVA as mentioned above, except that we modify the first layer of the network by replicating its weights to match the number of channels of the hyperspectral images. In this way, we can feed the entire unmodified hyperspectral images to the network. We use two different configurations: using the first convolutional layer of VGG16 (DCVAAllChannels-1) and the second convolutional layer of VGG16 (DCVAAllChannels-2).
7) A variant of the proposed method using dilated convolutional layers (dilation set to 3), to understand whether the proposed method can benefit from a larger receptive field.
8) A 1-D variant of the proposed method using 1 × 1 kernels instead of 3 × 3 kernels. This helps us to understand whether both the spatial context and the spectral information contribute to the CD result.
The first two compared methods are from the classical CD literature. The third and fourth methods are from the hyperspectral CD literature and specifically exploit properties unique to hyperspectral images. The following two methods are based on deep transfer learning. Since the proposed method is unsupervised and requires neither training nor a pretrained network, it is not compared to any supervised [36] or preclassification-based [33] hyperspectral CD method. The last two methods are variants of the proposed method and are shown on the Santa Barbara scene.

C. Settings and Other Details
The results are reported as the average of five runs. Comparison is performed in terms of sensitivity (accuracy in percentage computed over reference changed pixels), specificity (accuracy in percentage computed over reference unchanged pixels), and overall accuracy. In more detail, given the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and accuracy is (TP+TN)/(TP+TN+FP+FN), all scaled by 100. For multiple CD, the kappa score is provided.
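The evaluation measures can be written out as a short sketch. The three binary measures follow the formulas stated above; the kappa function assumes the standard Cohen's kappa computed from a confusion matrix, which the text does not spell out. The function names and the example counts are purely illustrative.

```python
def binary_cd_scores(tp: int, tn: int, fp: int, fn: int):
    """Return (sensitivity, specificity, accuracy), each scaled by 100."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


def cohen_kappa(confusion) -> float:
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: predicted)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(size)) / total
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up counts, not results from the paper.
    print(binary_cd_scores(tp=80, tn=90, fp=10, fn=20))
    print(cohen_kappa([[80, 20], [10, 90]]))
```

Reporting sensitivity and specificity separately matters for CD because changed pixels are usually a small minority, so overall accuracy alone can look high even when most changes are missed.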
We perform a number of additional experiments on the Santa Barbara scene.
1) For the proposed method, we use a five-layer network; however, we provide a comparison of performance as the number of layers is changed.
2) For the proposed method, we generally use the He weight initialization method [28]; however, its performance with respect to another weight initialization method [29] is discussed.
3) For the proposed method, we use Otsu's threshold determination method [45]; however, its performance with a few other thresholding methods is shown.
4) We show the variation of the result as $\beta_0$ is varied.

1) Santa Barbara:
We first analyze the impact of increasing the number of layers for the proposed method (see Table III).

TABLE IV CD RESULTS FOR THE SANTA BARBARA SCENE
The proposed method's result is reported as the average of five runs.
We observe that both sensitivity and specificity gradually increase up to four layers. When five layers are used, sensitivity increases while specificity decreases slightly. With six layers, performance does not improve and instead decreases. While adding more convolution layers enlarges the spatial receptive field of the filters and increases their complexity, this benefit saturates quickly given the coarse resolution of the hyperspectral images. Henceforth, we use five layers for all experiments related to the proposed method.
CVA obtains a sensitivity of 76.92 and a specificity of 96.69 [see Fig. 3(d)]. Remarkably, PCVA performs worse than CVA, showing that the spectral and temporal complexity of hyperspectral bitemporal images cannot be captured by a mere superpixel-based representation. Being designed for hyperspectral CD, SAMZID Sin, SAMZID Tan, and AICA outperform CVA and PCVA. DCVAAllChannels-1 and DCVAAllChannels-2 are outperformed by DCVA3Channels-1 [see Fig. 3(e)] and DCVA3Channels-2. This shows that the structure of the network is important. The VGGNet architecture, originally proposed for 3-channel input, can work satisfactorily while ingesting only 3 out of the 224 spectral bands of the AVIRIS sensor. However, attempting to forcefully feed the network with all bands results in a decrease in performance. The proposed method [see Fig. 3(f) and Table IV] clearly outperforms all the compared methods (including its dilated and 1-D variants), obtaining a sensitivity of 87.98, a specificity of 98.57, and an accuracy of 94.40. This shows the advantage of the proposed method in ingesting input bitemporal images of arbitrary dimension, which cannot be achieved with the transfer learning settings (DCVAAllChannels or DCVA3Channels). The proposed model can capture the change information, which is evident from the visualization of two randomly selected features (in the deep-difference domain) in Fig. 4. Remarkably, the proposed method's 1-D variant, which only captures spectral context, outperforms the dilated-convolution-based variant. This indicates that the spectral information plays a more important role in CD than the spatial context information for the considered hyperspectral data. This also partly explains why the proposed unsupervised method outperforms transfer learning from models trained on computer vision data.
The performance of the proposed method may vary if another weight initialization strategy is used instead of the He initialization method [28]. For example, if Xavier weight initialization [29] is used, the proposed method obtains a sensitivity of 80.12% and a specificity of 94.27%, which is still superior to most compared methods in Table IV.
For thresholding, Otsu's method [45] is used, as it is popular in unsupervised CD methods [2], [51]. However, any other suitable method [52]-[55] can be used with similar results, as shown in Table V for the ISODATA method [52], [53] and Li's method [54].
In Section III-A, we chose $\beta_0 = 4$. In Table VI, we show the variation of the result with different values of $\beta_0$, which supports this choice.
2) Bay Area: The Bay Area scene shows a complex urban area along with vegetation patches. As in Santa Barbara, PCVA, DCVAAllChannels-1, and DCVAAllChannels-2 do not obtain satisfactory results. CVA [see Fig. 5(d)], SAMZID Sin, SAMZID Tan, AICA, DCVA3Channels-1 [see Fig. 5(e)], and DCVA3Channels-2 obtain superior results in comparison. The proposed method outperforms all of them in terms of sensitivity, specificity, and accuracy [see Fig. 5(f)]. Detailed quantitative results are shown in Table VII.

TABLE VIII CD RESULTS FOR THE HERMISTON SCENE
The proposed method's result is reported as the average of five runs.
3) Hermiston: The spatial complexity of Hermiston is lower than that of the other two scenes. The changes form simple geometric patterns in this scene. The results obtained for this scene are similar to those for the other two scenes. Quantitative results are shown in Table VIII. The proposed method [see Fig. 6(f)] either outperforms the other methods or obtains comparable specificity. The proposed method also outperforms CVA [see Fig. 6(d)], PCVA, SAMZID Sin, SAMZID Tan, AICA, DCVAAllChannels-1, and DCVAAllChannels-2 in terms of sensitivity. However, DCVA3Channels-1 and DCVA3Channels-2 obtain higher sensitivity than the proposed method. This relative success of the transfer-learning-based setup on this dataset can be attributed to the lower spatial complexity of the scene.

E. Multiple CD Results
A multiple CD reference map is available only for the Hermiston scene. The reference map is shown in Fig. 7(a). The result obtained by the proposed method, using deep features extracted with the untrained model, is shown in Fig. 7(c). It is evident that the proposed method is able to detect the important semantic changes. There is certainly overlap between the classes shown in blue and red. However, it is clear from Figs. 6(a) and (b) that the blue and red classes represent similar semantic notions, making it difficult for the unsupervised multiple CD method to differentiate them.
To understand whether the proposed multiple/multiclass CD scheme benefits from using the untrained model as a feature extractor, we compare it to the result obtained by using the original hyperspectral data [see Fig. 7(b)]. The proposed method is visually superior to this baseline. The proposed method obtains a kappa of 0.80, compared with 0.72 obtained using the original hyperspectral data.
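For reference, the agreement measure can be sketched as follows, assuming the reported kappa is Cohen's kappa and that the unsupervised cluster labels have already been matched to the reference classes (neither point is stated explicitly in the text). The labels below are toy values, not the Hermiston data, and the function name is illustrative.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa between two equally long label sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(reference)
    # Fraction of pixels on which the two maps agree.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Agreement expected by chance from the marginal class frequencies.
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both maps contain the same single class
    return (observed - expected) / (1.0 - expected)


# Toy labels: 0 = unchanged, 1 to 3 = hypothetical change classes.
reference = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3]
predicted = [0, 0, 0, 0, 1, 1, 2, 2, 3, 1]
print(f"kappa = {cohen_kappa(reference, predicted):.2f}")
```

Unlike overall accuracy, kappa discounts the agreement that would arise by chance from the class proportions alone, which matters when the unchanged class dominates the scene.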

V. RESULTS ON DECOMPOSED POLSAR DATA
The decomposed POLSAR bitemporal data is a pair of 138-band real-valued images acquired using UAVSAR over an urban area in San Francisco in September 2009 and May 2015, first presented in the work by Najafi et al. [24]. For comparison, we use the same set of methods as for hyperspectral CD, except those specifically designed for hyperspectral images (SAMZID and AICA) and DCVA3Channels-1/2, as there are no R, G, B bands available in this case. The proposed method quantitatively outperforms all compared methods, as tabulated in Table IX.

VI. CONCLUSION
In this work, we presented an unsupervised CD method for hyperdimensional images. Labeled training data is scarce for hyperdimensional images, and models trained on multispectral sensors cannot be directly applied to them due to the mismatch in dimension. The proposed method overcomes this problem by simply using an untrained model for feature extraction from bitemporal hyperdimensional images. As the feature extractor model is untrained, it can be initialized with as many input channels as desired using an appropriate weight initialization technique. Moreover, the number of filters in the subsequent layers can also be chosen flexibly, as there is no training involved. Extensive experiments on four hyperdimensional datasets show the superiority of the proposed approach. The proposed approach is also capable of clustering the changed pixels into semantically meaningful groups, as shown for the Hermiston dataset. While the idea seems bold and new in the context of remote sensing, a similar idea has been verified before in the computer vision and machine learning literature, e.g., deep image prior. The proposed approach benefits from the fact that hyperdimensional images generally exhibit less spatial complexity, owing to the cost of achieving high resolution in both the spectral and spatial domains. Thus, the applicability of the method to very high spatial resolution hyperdimensional sensors may not be straightforward and will be investigated in future work. Our future work will also investigate untrained models in the context of hyperspectral image classification. As a final note, the proposed approach should be seen not as a competitor to supervised methods but as complementary to them.
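To illustrate why an untrained extractor places no constraint on the number of input bands, the sketch below compares bitemporal features for a single pixel. It is a simplified stand-in and not the network used in this work: it uses two per-pixel fully connected layers (equivalent to 1 × 1 convolutions, so only spectral context is captured), the layer widths 32 and 16 are hypothetical, the spectra are synthetic, and all function names are illustrative.

```python
import math
import random


def make_untrained_layer(n_in, n_out, rng):
    """Random, never-trained weights with He-style scaling (n_out x n_in)."""
    std = math.sqrt(2.0 / n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def extract_features(pixel, layers):
    """Pass one pixel's spectrum through the layers, with ReLU after each."""
    x = pixel
    for weights in layers:
        x = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    return x


def change_magnitude(pixel_t1, pixel_t2, layers):
    """Euclidean norm of the difference between the bitemporal features."""
    f1 = extract_features(pixel_t1, layers)
    f2 = extract_features(pixel_t2, layers)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


rng = random.Random(0)
n_bands = 224  # any band count works, since nothing is pretrained
layers = [make_untrained_layer(n_bands, 32, rng),
          make_untrained_layer(32, 16, rng)]

# Synthetic spectra: one pixel, a noisy copy of it, and an unrelated spectrum.
spectrum = [rng.random() for _ in range(n_bands)]
unchanged = [v + rng.gauss(0.0, 0.01) for v in spectrum]
changed = [rng.random() for _ in range(n_bands)]

print("unchanged pixel:", change_magnitude(spectrum, unchanged, layers))
print("changed pixel:  ", change_magnitude(spectrum, changed, layers))
```

Changing `n_bands` to 138, for example, requires no other modification, because the first layer is simply drawn with the matching number of input weights.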
3(a)] and 2014 [see Fig. 3(b)] with the AVIRIS sensor (224 spectral bands) over the Santa Barbara region in California, United States. The spatial dimensions of the images are 984 × 740 pixels. Reference information is known for only 132 552 pixels, out of which 80 418 pixels are unchanged and 52 134 pixels are changed [see Fig. 3(c)].
2) The Bay Area bitemporal scene was acquired in 2013 [see Fig. 5(a)] and 2015 [see Fig. 5(b)] with the AVIRIS sensor (224 spectral bands) over the area surrounding the city of Patterson (California). The spatial dimensions of the images are 500 × 500 pixels. Reference information is known for only 60 610 pixels, out of which 29 393 pixels are unchanged and 31 217 pixels are changed [see Fig. 5(c)].
3) The Hermiston scene [see Figs. 6(a) and (b)] was acquired in 2004 and 2007 with the Hyperion sensor (242 spectral bands) over the Hermiston City area in Oregon, United States. Bands B001-B007, B058-B076, and B225-B242 are not calibrated; hence, we exclude them from our processing. The spatial dimensions of the images are 390 × 200 pixels. A total of 68 014 pixels are labeled as unchanged. The remaining pixels are changed [see Fig. 6(c)].

Fig. 4. Visualization of two randomly selected features, as generated by the proposed model, on the Santa Barbara scene. It is evident that the features capture the change information.

TABLE I NUMBER OF BANDS AND GROUND SAMPLING DISTANCE (GSD) FOR

TABLE II KEY STRUCTURE OF FIVE-LAYER UNTRAINED FEATURE EXTRACTOR NETWORK ASSUMING NUMBER OF CHANNELS IN INPUT IMAGE IS 224
All convolution layers are followed by ReLU activation.

TABLE III PERFORMANCE VARIATION OF THE PROPOSED METHOD ON THE SANTA BARBARA SCENE AS THE NUMBER OF LAYERS IS VARIED
All results are reported as the average of five runs.

TABLE V VARIATION OF THE RESULT FOR SANTA BARBARA SCENE AS THRESHOLD DETERMINATION SCHEME IS VARIED

TABLE VI VARIATION OF THE RESULT FOR SANTA BARBARA SCENE AS β0 IS VARIED

TABLE VII CD RESULTS FOR THE BAY AREA SCENE
The proposed method's result is reported as the average of five runs.

TABLE IX CD RESULTS FOR SAN FRANCISCO POLSAR SCENE
The proposed method's result is reported as the average of five runs.