A Comprehensive Tensor Framework for the Clustering of Hyperspectral Paper Data With an Application to Forensic Document Analysis

In forensic document analysis, the authenticity of a document must be properly checked in the context of a suspected forgery. Hyperspectral Imaging (HSI) is a non-invasive way of detecting fraudulent papers in a multipage document. A forged paper in a multipage document may differ substantially from the rest of the papers in its age, type, color, texture, and so on. Each pixel in HSI data can be used as the material fingerprint of the spatial point it corresponds to. Hence, hyperspectral data of paper samples made of the same substance have similar characteristics and can be grouped into a single cluster. Similarly, paper samples made of different substances have different spectral properties. This paper relies on this heuristic and proposes a tensor based clustering framework for hyperspectral paper data, with an application to detecting forged papers in multipage documents. In this framework, the information embedded in the hyperspectral patches of the papers to be clustered is arranged into the individual lateral slices of a third order tensor. Further, this work employs the self-expressiveness property of submodules, and an objective function is formulated to extract a self-expressive representation tensor with low multirank and an f-diagonal structure. The objective function of the proposed method incorporates an $l_{\frac {1}{2}}$-induced Tensor Nuclear Norm (TNN) and $l_{\frac {1}{2}}$ regularization to impart better low rankness and f-diagonal structure to the representation tensor. Experimental results of the proposed method were compared with state-of-the-art subspace clustering approaches. The results demonstrate the improved performance of the proposed method over the existing clustering algorithms.


I. INTRODUCTION
Hyperspectral sensors generate the hyperspectral image (HSI) of a spatial scene in hundreds of spectral bands [1]. Usually, HSI data is characterized by a large number of images captured at different wavelengths [2]. In the HSI data of a given object/material, the information embedded in the various spectral bands varies, as the material absorbs or reflects differently at different wavelengths [3]. Since HSI data contains a plethora of information in both its spectral and spatial bands, it has attracted considerable interest in sectors such as geology, remote sensing [4], agriculture [5], forensic research [6], and so on. HSI techniques were previously used mainly in various fields of satellite imaging and remote sensing [7]. Later, HSI approaches were widely employed in a number of applications such as food quality inspection [2], medical imaging [8], forensic analysis, and material science [9]. Furthermore, when compared to invasive methods, HSI methodologies are considered a good candidate for the non-invasive characterization of material properties [6], [10].

The associate editor coordinating the review of this manuscript and approving it for publication was Jad Nasreddine.
An HSI data cube is typically represented in three dimensions, with two spatial domains and one spectral domain [4], [5]. The spectral information is stacked along the third dimension [5]. Hence, a single HSI pixel can be viewed as an $N$ dimensional vector $p \in \mathbb{R}^{N}$, where $N$ denotes the total number of spectral bands [5]. In general, each point or pixel of an HSI data cube characterizes a given material by its spectrum at that point. As a result, HSI pixel information can be employed as a material fingerprint for the object/material under consideration, which can be used to differentiate the characteristics of different materials [6].
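As a toy illustration of this view (using a randomly generated cube and hypothetical dimensions, not the paper's dataset), each pixel of an HSI cube can be read off as an $N$ dimensional spectral vector and two such "fingerprints" can be compared with a simple cosine score:

```python
import numpy as np

# Hypothetical HSI cube: 64 x 64 spatial pixels, N = 186 spectral bands
# (random values stand in for real reflectance data).
rng = np.random.default_rng(0)
cube = rng.random((64, 64, 186))

# A single pixel is an N-dimensional spectral vector p in R^N.
p = cube[10, 20, :]

def spectral_cosine(a, b):
    """Cosine similarity between two spectra, one simple way to compare
    the material fingerprints of two spatial points."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Pixels from the same material would score near 1, while spectrally distinct materials score lower.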
As a non-destructive, contactless method enriched with information from numerous spectral bands, HSI has gained substantial recognition in the field of forensic document analysis [6]. In recent years, a considerable number of cases have been reported involving the fraudulent manipulation of legal documents. In the context of a suspected forgery, the authenticity of the document needs to be examined. This is normally done by examining both the ink and the paper that were used to create the suspected document [6], [9]. Suppose the inclusion of a distinct paper type is identified in a multipage document; that document may then be suspected of being forged. Hyperspectral analysis can be effectively used to detect the fraudulent paper or papers in the scenario described above. Hyperspectral paper data is paper data that has been subjected to the HSI imaging technique [6]. Consider the hyperspectral data of a single paper, and assume that the entire paper is composed of the same material. Then, the spectra corresponding to different sample areas of the hyperspectral paper data would share similar spectral properties, unless some portion of the paper under consideration is forged [6], [10]. To put it another way, the hyperspectral data of paper samples made of the same substance have similar characteristics, and these samples may be grouped into a single cluster [4]. On the other hand, if the paper samples considered are made of different materials, their spectra exhibit different spectral properties, and those samples may fall into different clusters [6]. A forged item, whether one or many, in a multipage document may differ from the rest of the papers due to a variety of factors such as material, type, age, texture, and color [6].
HSI methods, in comparison to three channel imaging techniques, can better detect and evaluate different objects by analyzing their spectral fingerprints through a wider spectrum [11]. As a result, the challenge of document analysis in a suspected forgery can be effectively addressed by the clustering analysis of hyperspectral paper data samples [6], [10].

II. RELATED WORKS
Because of the large dimensionality of hyperspectral data, supervised classification with prior labelling of the data is difficult to implement [12]. Unsupervised clustering methods, which do not require prior labelling but partition datapoints based on their inherent similarity, are used to overcome this challenge [12]. The literature on hyperspectral data clustering contains many techniques that adapt conventional unsupervised clustering methods. In remote sensing applications, methods such as fuzzy c-means [13], k-means [14], and spectral clustering [15] have been used to classify pixels for the clustering of hyperspectral images [16]. However, due to the large dimensionality of hyperspectral data, their performance is limited. Methods such as PCA [17], ICA [18], and LDA [19] have been used in certain works to reduce the dimensionality of hyperspectral data [6], [20]. In other works, such as the clustering of hyperspectral paper data and hyperspectral ink data, t-distributed Stochastic Neighbour Embedding (t-SNE) has been used and has surpassed the aforementioned methods [6], [9].
In recent decades, principles of subspace modeling have been applied in the field of hyperspectral image clustering [12]. Due to the multidimensionality and the highly correlated information contained in neighbouring spectral bands, hyperspectral data can be structured as multiple low dimensional subspaces embedded within a large dimensional space [12]. This assumption is based on the fact that HSI data often contains large homogeneous regions, and the pixels inside those regions can have similar spectral properties [1]. With the advancement of Sparse Representation (SR) and Low Rank Representation (LRR) models, many works have been proposed in recent decades to meet the challenge of HSI clustering [21]. Because of its robustness, Sparse Subspace Clustering (SSC) was used in HSI clustering [22]. Exploiting the abundance of spatial information and the high spectral correlation, Zhang et al. proposed a spectral-spatial SSC (S4C) for the effective grouping of HSI data [4]. Similarly, an $l_2$ norm regularized SSC proposed by Zhai et al. was employed in the field of hyperspectral remote sensing imagery [16], [23]. Then, Whang et al. introduced a Fast High Order SSC (FHoSSC) with a Cumulative Markov Random Field (MRF) for subspace segmentation that took advantage of superpixels [16]. A number of Low Rank Subspace Clustering (LRSC) approaches have been proposed for the clustering of hyperspectral imagery, based on LRR techniques [12]. In addition, in works such as [24] and [25], hypergraphs have been used to obtain accurate information about the manifold structure. Following that, Xu et al. proposed a superpixel based LRSC that leverages hypergraphs for the classification of hyperspectral images [12].
A number of works have recently been published that exploit the properties of higher order tensors by utilizing multilinear algebra and abstract algebra [26]. Since the invention of the t-product proposed by Kilmer et al. [27], the multiplication of third order tensors has become substantially simplified. Some recent works introduced the concept of free submodules by assuming that datapoints from a large dimensional space lie near a union of free submodules [28]. In Sparse and Low Rank Submodule Clustering (SLRSmC), Kernfeld et al. employed the Union of Free Submodules (UoFS) model, which is based on the self-expressiveness property of free submodules [29]. Relying on this self-expressive representation, Wu et al. proposed a Structure Constrained Low Rank Submodule Clustering (SCLRSmC) framework for the clustering of 2D images [26]. In SCLRSmC, the images to be clustered are stacked into the lateral slices of a third order data tensor. Further, the data tensor is modeled as the t-product of the data tensor itself and a structured low rank coefficient tensor, based on the self-expressive representation [26].
According to the past literature on HSI data clustering, the capabilities of tensor frameworks have not been properly exploited in the majority of existing works. Moreover, due to the multiple spectral bands present in hyperspectral data, most of the proposed works suffer from increased computational complexity [30]. Tensor based frameworks, on the other hand, are relatively easy to implement, and computations can be simplified using multilinear algebra tools [26]. In addition, depending on the problem and its needs, datapoints can be arranged into appropriate slices of a tensor in different orientations [31]. Since the diverse information of hyperspectral data is embedded in its hundreds of spectral bands, a tensor based framework is a good candidate for hyperspectral data clustering. Above all, a comprehensive framework for clustering hyperspectral paper data that fully explores the capabilities of a third order tensor space has not yet been developed. Based on the aforementioned aspects, it is evident that there is still room for a tensor based clustering framework that could effectively address the challenge of clustering hyperspectral paper data.
In this work, we assume that if every single paper in a multipage document is made of the same material, then attributes such as texture, color, and age will also be identical throughout the entire portion of that paper [6]. Based on this assumption, the hyperspectral samples fetched from different parts of a single paper will have similar spectral signatures, and those samples will lie in a single submodule [6], [26]. Hence, we further believe that if all of the pages in a multipage document are made of the same material, then hyperspectral data samples taken from various parts of all those papers would almost certainly lie in a single submodule. Therefore, a forged paper can be easily identified because the hyperspectral data samples of the forged item will be placed in a different submodule. We plotted the normalized reflectance spectra of the hyperspectral data of the various papers presented in FIGURE 2 (a) to underline the above-mentioned aspect. It has been observed that the reflectance spectra of those papers differ significantly from one another and are easily distinguishable (FIGURE 2 (b)). Hence, there is a high probability that samples from various papers lie in different clusters/submodules. We also plotted the reflectance spectra of a number of specimens collected from the same paper. For this, we used five hyperspectral samples of dimension $\mathbb{R}^{S_1 \times S_3 \times N_s}$ ($S_1 = 10$, $S_3 = 10$, $N_s = 186$) from Paper 1 and Paper 12, shown in FIGURE 2 (a). FIGURE 2 (c) shows that the spectra of the samples taken from the same paper (say, Paper 1 or Paper 12) are overlaid on one another. The indistinguishable spectra of those samples from the same paper are very likely to belong to a single submodule.
On the basis of the heuristic described above, we propose a submodule clustering framework for hyperspectral paper data by embedding each hyperspectral data sample in a third order tensor space. An overview of the proposed framework is given in FIGURE 1, with an illustration of the hyperspectral data of four papers arranged in a tiled format. Among them, Paper 1 and Paper 2 are assumed to be composed of the same material. Furthermore, the fetched hyperspectral 3D patches/blocks $\in \mathbb{R}^{S_1 \times S_3 \times N_s}$ and their different submodules are also shown in FIGURE 1. Now, we list the major contributions of the proposed method:
1) We develop a comprehensive tensor based framework for the clustering of hyperspectral paper data. First, the 3D patches $\in \mathbb{R}^{S_1 \times S_3 \times N_s}$ from the hyperspectral paper data are arranged into the lateral slices of a third order tensor, termed the input data tensor. Employing the t-product, the third order input data tensor is then self-expressively represented as the product of the input data tensor itself and a low rank structured coefficient tensor.
2) We use the heuristic that three dimensional paper patches fetched from different locations of the same paper have stronger correlations, whereas patches taken from distinct papers exhibit lower correlations. The proposed method incorporates this heuristic through a dissimilarity matrix that clearly captures the different correlations that exist between the hyperspectral paper data samples/patches.
3) To the best of our knowledge, the proposed methodology is the first to incorporate a complete tensor based framework for the clustering of hyperspectral paper data.

III. PRELIMINARIES AND TECHNICAL BACKGROUND
This section illustrates the notations and mathematical concepts used in our paper. Important terms, their notations, and the corresponding descriptions are given in TABLE 1. Following that, the mathematical preliminaries and their expressions are presented.
D. TENSOR MULTIRANK [26]
The multirank of a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is a vector $p \in \mathbb{R}^{n_3}$ whose $k^{th}$ element is equal to the rank of the $k^{th}$ frontal slice of $\hat{\mathcal{X}}$, where $\hat{\mathcal{X}}$ represents the Fourier transform of $\mathcal{X}$. Then, $\hat{\mathcal{X}} = \mathrm{fft}(\mathcal{X}, 3)$ denotes the DFT along the third dimension of the tensor $\mathcal{X}$ [28].

E. t-SVD [26]
The t-SVD of a third order tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is given by $\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{T}$, where $\mathcal{U} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ and $\mathcal{V} \in \mathbb{R}^{n_2 \times n_2 \times n_3}$ are orthogonal tensors. Then, $\mathcal{S} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is an f-diagonal tensor, whose frontal slices are diagonal matrices [26]. The t-SVD of the tensor $\mathcal{X}$ can be found using the SVDs of the frontal slices of its Fourier tensor $\hat{\mathcal{X}}$ [26]. For the frontal slice $\hat{\mathcal{X}}^{(k)}$ of the Fourier tensor $\hat{\mathcal{X}}$, the SVD is given by $\hat{\mathcal{X}}^{(k)} = \hat{\mathcal{U}}(:,:,k)\, \hat{\mathcal{S}}(:,:,k)\, \hat{\mathcal{V}}(:,:,k)^{H}$, for $k = 1, 2, \ldots, n_3$.

F. $l_{\frac{1}{2}}$-INDUCED TNN
Consider a tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ with t-SVD $\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{T}$; then its $l_{\frac{1}{2}}$-induced TNN can be expressed as
$$\|\mathcal{X}\|_{\circledast,\frac{1}{2}} = \sum_{k=1}^{n_3} \sum_{i} \big(\hat{\mathcal{S}}(i, i, k)\big)^{\frac{1}{2}}.$$
The solution for the $l_{\frac{1}{2}}$-induced TNN can be deduced in a number of steps. Each frontal slice $\hat{\mathcal{S}}^{(k)}$ carries singular values $s_1 \geq s_2 \geq \cdots \geq s_N \geq 0$ at its diagonal positions, and those values can be represented by a vector $s = (s_1, s_2, \ldots, s_N) \in \mathbb{R}^{N}$. First, the half thresholding function stated in Eq. (1) is applied to each member of $s \in \mathbb{R}^{N}$. This is accomplished with the half thresholding operator $H_{\lambda,\frac{1}{2}}(\cdot)$, as proposed in [32]. The half thresholding operator $H_{\lambda,\frac{1}{2}}(\cdot)$ is a non-linear mapping and, for any vector $s = (s_1, s_2, \ldots, s_N)$, it can be expressed as
$$H_{\lambda,\frac{1}{2}}(s) = \big(h_{\lambda,\frac{1}{2}}(s_1), h_{\lambda,\frac{1}{2}}(s_2), \ldots, h_{\lambda,\frac{1}{2}}(s_N)\big), \quad (1)$$
with
$$h_{\lambda,\frac{1}{2}}(s_i) = \begin{cases} \frac{2}{3}\, s_i \Big(1 + \cos\big(\frac{2\pi}{3} - \frac{2}{3}\varphi_{\lambda}(s_i)\big)\Big), & |s_i| > th \\ 0, & \text{otherwise}, \end{cases} \qquad \varphi_{\lambda}(s_i) = \arccos\Big(\frac{\lambda}{8}\Big(\frac{|s_i|}{3}\Big)^{-\frac{3}{2}}\Big),$$
where $th = \frac{\sqrt[3]{54}}{4}\lambda^{\frac{2}{3}}$ represents the threshold [32], [33]. Then, the entire procedure to find the $l_{\frac{1}{2}}$-induced TNN can be summarized in Algorithm 1.
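The two definitions above can be sketched numerically: the multirank is read off from the slice-wise ranks of the Fourier tensor, and the $l_{\frac{1}{2}}$-induced TNN sums the square roots of the slice-wise singular values. A minimal NumPy sketch of those definitions (the function names are ours, not from the paper):

```python
import numpy as np

def multirank(X, tol=1e-10):
    """Tensor multirank: the k-th entry is the rank of the k-th frontal
    slice of the Fourier tensor X_hat = fft(X, axis=2)."""
    Xf = np.fft.fft(X, axis=2)
    return [int(np.linalg.matrix_rank(Xf[:, :, k], tol=tol))
            for k in range(X.shape[2])]

def l_half_induced_tnn(X):
    """l_{1/2}-induced TNN: sum of the square roots of the singular
    values of every frontal slice of the Fourier tensor."""
    Xf = np.fft.fft(X, axis=2)
    return sum(np.sum(np.sqrt(np.linalg.svd(Xf[:, :, k], compute_uv=False)))
               for k in range(X.shape[2]))
```

A tensor whose spatial frontal slices share a common low rank factor has a correspondingly low multirank, since the DFT only forms linear combinations of those slices.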

IV. PROPOSED METHOD
This section begins with an illustration of the self-expressiveness property of free submodules [26]. According to the self-expressive representation, a member or datapoint in a submodule can be represented as a t-linear combination of the other members of the same submodule. To put it another way, consider $K_{n_3}$ to be the set of tube fibers belonging to $\mathbb{R}^{1 \times 1 \times n_3}$, which forms a commutative ring under regular addition and the t-product [26]. The self-expressive representation of submodules can be expressed as a t-linear combination of oriented matrices and mode-3 fibers by making use of multilinear algebra. An oriented matrix with dimensions $n_1 \times 1 \times n_3$ can be formed from a matrix of size $n_1 \times n_3$ by twisting it perpendicular to the page [29], [34]. Let $K_{n_3}^{n_1}$ denote the set of $n_1 \times 1 \times n_3$ oriented matrices. Further, a single oriented matrix can also be viewed as a one dimensional vector of dimension $n_1$ whose elements are $1 \times 1 \times n_3$ tube fibers [26]. The set of oriented matrices can then be considered as an $n_1$ dimensional free module over the ring $K_{n_3}$ [26], [29]. Consider a generating set $\{\vec{\mathcal{D}}_1, \vec{\mathcal{D}}_2, \ldots, \vec{\mathcal{D}}_{n_1}\}$; then any element $\vec{\mathcal{X}} \in K_{n_3}^{n_1}$ can be uniquely represented as a t-linear combination of the $\vec{\mathcal{D}}_i$s [26]. Mathematically, it can be represented as
$$\vec{\mathcal{X}} = \sum_{i=1}^{n_1} \vec{\mathcal{D}}_i * \vec{z}_i,$$
where $\vec{z}_i \in K_{n_3}$ [26]. This is analogous to the generalization of a vector space over a field [26]. Hence, with the t-product, a linear combination for submodules can be executed with the corresponding coefficients as mode-3 or tube fibers [29], [34]. Now, consider $F_{n_3}^{n_1}$, a free submodule that is a subset of the module $K_{n_3}^{n_1}$. Then, given a set of $L$ free submodules $\{{}^{l}F_{n_3}^{n_1}\}_{l=1}^{L}$, any element belonging to a free submodule can be represented as a t-linear combination of elements in the union of the $L$ free submodules [26].
In such a representation, non-zero tube fibers represent the coefficients corresponding to other elements belonging to the same submodule, and zero tube fibers represent coefficients corresponding to elements from other free submodules [26]. The t-product is defined via circular convolution in the spatial domain. However, it can be replaced and simplified by multiplication in the Fourier domain using the Discrete Fourier Transform (DFT) [26].
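This simplification is exactly how the t-product is computed in practice: a DFT along mode 3 turns circular convolution into independent matrix products per frontal slice. A minimal sketch (our own helper, not code from [26]):

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x m x n3): circular
    convolution along mode 3, computed as slice-wise matrix products
    in the Fourier domain followed by an inverse DFT."""
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Cf = np.stack([Af[:, :, k] @ Bf[:, :, k] for k in range(A.shape[2])],
                  axis=2)
    return np.real(np.fft.ifft(Cf, axis=2))
```

The identity element of this product is the tensor whose first frontal slice is the identity matrix and whose remaining slices are zero.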
A. ARRANGEMENT OF HYPERSPECTRAL DATA INTO A THIRD ORDER TENSOR
Let $X_i \in \mathbb{R}^{S_1 \times S_3 \times N_s}$, $i = 1, 2, \ldots, N$, represent a set of hyperspectral patches that have been sliced from hyperspectral paper data. In the above depiction, $S_1 \times S_3$ represents the spatial dimension of the hyperspectral patch, and $N_s$ represents the number of spectral bands. Then, $N$ denotes the total number of samples taken from different pages of a multipage document. Further, consider a matrix $B_i \in \mathbb{R}^{S_1 S_3 \times N_s}$, $i = 1, 2, \ldots, N$, which encapsulates the spectral information of each hyperspectral data sample $X_i \in \mathbb{R}^{S_1 \times S_3 \times N_s}$. The information contained in each spectral band of dimension $S_1 \times S_3$ of $X_i$ is encoded into the respective column of the matrix $B_i$. In other words, each spectral band of dimension $S_1 \times S_3$ is reformed into a column vector of dimension $S_1 S_3$, and there are $N_s$ such column vectors for a single hyperspectral patch $X_i$. The matrix $B_i$ thus contains all the information in a hyperspectral patch sliced from the hyperspectral paper data. The entire process described above is then repeated for each of the $N$ hyperspectral data samples to be clustered.
Consider a third order tensor $\mathcal{X} \in \mathbb{R}^{S_1 S_3 \times N \times N_s}$, which integrates the information contained in each hyperspectral data sample $X_i \in \mathbb{R}^{S_1 \times S_3 \times N_s}$, $i = 1, 2, \ldots, N$, into a third order tensor space. By the process of twisting, each matrix $B_i$ can be converted into an oriented matrix $\vec{B}_i \in \mathbb{R}^{S_1 S_3 \times 1 \times N_s}$, as illustrated in FIGURE 4. The aforementioned process is repeated for the $N$ hyperspectral data samples, so that a collection of $N$ oriented matrices is obtained. The set of oriented matrices is then organized into the lateral slices of the three dimensional tensor $\mathcal{X} \in \mathbb{R}^{S_1 S_3 \times N \times N_s}$; this process is illustrated in FIGURE 5. By this process, each lateral slice $\mathcal{X}(:, i, :) \in \mathbb{R}^{S_1 S_3 \times 1 \times N_s}$ of the three dimensional data tensor $\mathcal{X}$ holds the diverse information contained in the respective hyperspectral data sample.
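The patch-to-lateral-slice arrangement described above can be sketched as follows (a minimal illustration; the dimensions and helper name are ours):

```python
import numpy as np

def build_data_tensor(patches):
    """Stack N hyperspectral patches (each S1 x S3 x Ns) into the
    lateral slices of a third order tensor X in R^{S1*S3 x N x Ns},
    following the B_i / twisting construction described above."""
    S1, S3, Ns = patches[0].shape
    X = np.empty((S1 * S3, len(patches), Ns))
    for i, P in enumerate(patches):
        # B_i: band k of the patch, vectorized, becomes column k.
        X[:, i, :] = P.reshape(S1 * S3, Ns)
    return X
```

Each lateral slice `X[:, i, :]` then carries the full spectral content of sample `i`, as required by the union-of-free-submodules model.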

B. PROBLEM FORMULATION
To solve the clustering problem, the next step is to create a self-expressive representation of the hyperspectral paper data. For this task, we assume that the hyperspectral paper data placed in the lateral slices belong to a union of $L$ free submodules [26]. In the self-expressive representation, a datapoint belonging to a submodule can be expressed as a t-linear combination of other datapoints in the same submodule [26]. Consequently, there exists a coefficient tensor $\mathcal{Z}$ whose relationship with the data tensor $\mathcal{X}$ can be expressed in terms of the t-product as $\mathcal{X} = \mathcal{X} * \mathcal{Z}$ [26], [31], [35]. In the above representation, the data tensor $\mathcal{X}$ itself acts as the dictionary. The coefficient tensor $\mathcal{Z}$ should evolve with a low tensor multirank and an f-block diagonal structure for improved representation and reduced computational cost [26], [34]. Hence, the proposed method imposes a low tensor multirank and a structure constraint on the coefficient tensor. Minimizing the tensor multirank, however, is non-convex; thus, its convex surrogate, the Tensor Nuclear Norm (TNN), is a good candidate. The TNN of $\mathcal{Z}$, denoted by $\|\mathcal{Z}\|_{\circledast}$, can be expressed as the sum of the singular values of all the frontal slices of $\hat{\mathcal{Z}}$ [26]. Many recent methods, such as [26], [31], [36], have employed the TNN to impose the low tensor rank constraint on the representation tensor in their optimization problems.
In order to find the TNN of a particular tensor, the $l_1$ norm is employed to compute the absolute sum of the singular values of its frontal slices. The popularity of $l_1$ minimization is due to the fact that it is convex and that a sparse solution can be obtained with less computational bottleneck [32]. However, recent studies have observed that $l_q$ $(0 < q < 1)$ regularization techniques provide sparser solutions than the $l_1$ norm [32], [33]. For a vector $x \in \mathbb{R}^{N}$, the $l_q$ regularization problem from the observation $y = Ax$ can be represented as
$$\min_{x} \; \|y - Ax\|_2^2 + \lambda \|x\|_q^q,$$
where $y \in \mathbb{R}^{m}$ and $A \in \mathbb{R}^{m \times N}$. Then, $\|x\|_q$ represents the $l_q$ quasi-norm, defined by $\|x\|_q = \big(\sum_{i=1}^{N} |x_i|^q\big)^{\frac{1}{q}}$. The unit ball representations of various norms are illustrated in FIGURE 6, in which the $l_2$ norm ball is spherical, whereas the $l_1$ norm ball is diamond shaped. It is observed from FIGURE 6 (a) and FIGURE 6 (b) that $l_1$ regularization provides a sparser solution than the $l_2$ norm. As the value of $q$ is further reduced, the unit ball assumes the shape shown in FIGURE 6 (c), in which there is a higher probability for the $y = Ax$ line to coincide with the axes. Hence, the probability of achieving a sparser solution increases as the value of $q$ decreases from 1 toward 0. For $q \in [\frac{1}{2}, 1)$, the solution becomes sparser for smaller values of $q$, but no significant change is observed in performance for $q \in [0, \frac{1}{2})$ [32], [33], [37]. Hence, fixing the index $q = \frac{1}{2}$, $l_{\frac{1}{2}}$ regularization has been chosen as an improved regularization technique that yields a sparser solution than $l_1$ minimization [32], [33].
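For concreteness, the half thresholding operator of Xu et al. [32] that underlies $l_{\frac{1}{2}}$ regularization can be sketched element-wise as below; this is our transcription of the commonly stated formula, so treat the threshold constant as an assumption to verify against [32]:

```python
import numpy as np

def half_threshold(s, lam):
    """Element-wise half thresholding operator H_{lam,1/2}(.) as stated
    in Xu et al. [32] (our transcription; constants are an assumption).
    Entries below the threshold are set to zero; large entries pass
    through nearly unchanged."""
    s = np.asarray(s, dtype=float)
    out = np.zeros_like(s)
    th = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)  # threshold
    big = np.abs(s) > th
    phi = np.arccos((lam / 8.0) * (np.abs(s[big]) / 3.0) ** (-1.5))
    out[big] = (2.0 / 3.0) * s[big] * (
        1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out
```

Note the shrinkage behaviour: small entries are zeroed outright, while entries far above the threshold are barely attenuated, which is what produces sparser solutions than soft ($l_1$) thresholding.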
Moreover, the iterative half thresholding algorithm proposed by Xu et al. provides a fast solution and convergence for the $l_{\frac{1}{2}}$ approach, despite the non-convex nature of the $l_{\frac{1}{2}}$ quasi-norm [32]. Furthermore, within some constraints, Xu et al. also verified the convergence of the half thresholding algorithm to a stationary point via a dynamic system methodology [33]. The strong sparsity inducing ability of $l_{\frac{1}{2}}$ regularization has been successfully exploited in many sparsity problems [37]. Motivated by the aforementioned approaches and benefiting from this strong theoretical background, the $l_{\frac{1}{2}}$-induced TNN is employed in the proposed method, replacing the $l_1$ norm with the $l_{\frac{1}{2}}$ quasi-norm in the expression of the TNN, in order to obtain a more accurate tensor low rank representation. The formulation of the $l_{\frac{1}{2}}$-induced TNN has already been detailed in Section III. Furthermore, an appropriate block diagonal structure for the representation tensor encourages the clustering of multi-view data and improves clustering performance. Clearly, among the samples of the hyperspectral paper data, objects belonging to a single submodule have strong correlations, while objects belonging to distinct submodules have lower correlations [26]. Hence, the correlation between different datapoints in the hyperspectral data can be captured with a dissimilarity matrix, denoted by $P_{DM} \in [0, 1]^{N \times N}$, whose entries are given by [38]
$$(P_{DM})_{ij} = \frac{1 - r_{x_i x_j}}{2}, \quad i, j \in \{1, 2, \ldots, N\},$$
where $r_{x_i x_j}$ is the Pearson correlation coefficient [39],
$$r_{x_i x_j} = \frac{\sum_{m=1}^{M} (x_i(m) - \mu_{x_i})(x_j(m) - \mu_{x_j})}{\sqrt{\sum_{m=1}^{M} (x_i(m) - \mu_{x_i})^2}\, \sqrt{\sum_{m=1}^{M} (x_j(m) - \mu_{x_j})^2}}, \quad M = S_1 S_3 N_s.$$
Generally, the Pearson correlation coefficient indicates a measure of the linear relationship between two variables. Unlike Euclidean scores, this metric shows how closely two variables are correlated. The value $r_{x_i x_j} = +1$ indicates a positive correlation between the variables $x_i$ and $x_j$, whereas $r_{x_i x_j} = -1$ stands for a negative correlation. Further, $\mu_{x_i}$ and $\mu_{x_j}$ represent the sample means of $x_i$ and $x_j$.
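A sketch of this dissimilarity matrix, computed from the vectorized lateral slices just described; the mapping $(1 - r)/2$ from correlation to dissimilarity is our assumption, chosen only so that entries land in $[0, 1]$:

```python
import numpy as np

def dissimilarity_matrix(X):
    """Sketch of P_DM in [0,1]^{N x N} from a data tensor X of shape
    (S1*S3, N, Ns): x_i = vec(X(:, i, :)), r = Pearson correlation.
    The mapping (1 - r)/2 is an assumption: it yields 0 for perfectly
    correlated samples and 1 for perfectly anti-correlated ones."""
    N = X.shape[1]
    V = np.stack([X[:, i, :].ravel() for i in range(N)])  # rows are x_i
    R = np.corrcoef(V)                                    # Pearson r
    return (1.0 - R) / 2.0
```

Entries near zero flag sample pairs likely to share a submodule, which is how the matrix steers the structure constraint toward an f-diagonal representation.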
In the above expression, $x_i = \mathrm{vec}(\mathcal{X}(:, i, :))$ and $x_j = \mathrm{vec}(\mathcal{X}(:, j, :))$, where $\mathrm{vec}(\cdot)$ indicates the vectorization of a lateral slice into a one dimensional vector [39]. The proposed method integrates the following aspects into its optimization problem.
1) The proposed method incorporates the $l_{\frac{1}{2}}$-induced TNN to impart better low rankness on the representation tensor $\mathcal{Z}$.
2) Integrating the dissimilarity matrix into the proposed method captures the higher correlations that occur between members of the same submodule and the lower correlations between members of distinct submodules. Furthermore, it aids in better capturing the f-diagonal structure.
3) Since $l_{\frac{1}{2}}$ regularization can give a sparser solution than the $l_1$ norm, the submodule structure constraint is modified using the $l_{\frac{1}{2}}$ quasi-norm.
Furthermore, using the abilities of the $l_{\frac{1}{2}}$-induced TNN and $l_{\frac{1}{2}}$ regularization, a single stage optimization problem is formulated to obtain a better self-expressive representation from which to retrieve the underlying clusters. Combining all of the above, the proposed optimization problem can be formulated as
$$\min_{\mathcal{Z}} \; \|\mathcal{Z}\|_{\circledast,\frac{1}{2}} + \lambda_1 \|\mathcal{X} - \mathcal{X} * \mathcal{Z}\|_F^2 + \lambda_2 \|P_{DM} \odot \mathcal{Z}\|_{\frac{1}{2}}^{\frac{1}{2}},$$
where $\|\cdot\|_{\frac{1}{2}}$ represents the $l_{\frac{1}{2}}$ quasi-norm, $\|\cdot\|_F$ denotes the Frobenius norm, $\odot$ denotes element-wise multiplication, and $\mathcal{X}$ represents the third order data tensor $\mathcal{X} \in \mathbb{R}^{S_1 S_3 \times N \times N_s}$. Further, we employ variable splitting for $\mathcal{Z}$, such that $\mathcal{Z} = \mathcal{C}$ and $\mathcal{Z} = \mathcal{Q}$, giving
$$\min_{\mathcal{Z}, \mathcal{C}, \mathcal{Q}} \; \|\mathcal{C}\|_{\circledast,\frac{1}{2}} + \lambda_1 \|\mathcal{X} - \mathcal{X} * \mathcal{Z}\|_F^2 + \lambda_2 \|P_{DM} \odot \mathcal{Q}\|_{\frac{1}{2}}^{\frac{1}{2}} \quad \text{s.t.} \;\; \mathcal{Z} = \mathcal{C}, \; \mathcal{Z} = \mathcal{Q}.$$
In the above expression, $\lambda_1$ and $\lambda_2$ denote the regularization parameters. The constrained problem is transformed into an unconstrained one using the Augmented Lagrangian (AL) method [26], given by
$$\mathcal{L}(\mathcal{Z}, \mathcal{C}, \mathcal{Q}, \mathcal{G}_1, \mathcal{G}_2) = \|\mathcal{C}\|_{\circledast,\frac{1}{2}} + \lambda_1 \|\mathcal{X} - \mathcal{X} * \mathcal{Z}\|_F^2 + \lambda_2 \|P_{DM} \odot \mathcal{Q}\|_{\frac{1}{2}}^{\frac{1}{2}} + \langle \mathcal{G}_1, \mathcal{Z} - \mathcal{C} \rangle + \langle \mathcal{G}_2, \mathcal{Z} - \mathcal{Q} \rangle + \frac{\mu}{2}\big(\|\mathcal{Z} - \mathcal{C}\|_F^2 + \|\mathcal{Z} - \mathcal{Q}\|_F^2\big),$$
where the tensors $\mathcal{G}_1$ and $\mathcal{G}_2$ are the Lagrangian multipliers, $\mu \geq 0$ denotes the penalty parameter, and $\langle \cdot, \cdot \rangle$ denotes the inner product. The above problem can be solved by iteratively minimizing the Lagrangian $\mathcal{L}$ over one tensor while keeping the others constant.
$\mathcal{C}$ Subproblem: The update expression for $\mathcal{C}$ is given by
$$\mathcal{C}_{[j+1]} = \arg\min_{\mathcal{C}} \; \|\mathcal{C}\|_{\circledast,\frac{1}{2}} + \langle \mathcal{G}_{1[j]}, \mathcal{Z}_{[j]} - \mathcal{C} \rangle + \frac{\mu_{[j]}}{2}\|\mathcal{Z}_{[j]} - \mathcal{C}\|_F^2,$$
which can be transformed into the following form,
$$\mathcal{C}_{[j+1]} = \arg\min_{\mathcal{C}} \; \|\mathcal{C}\|_{\circledast,\frac{1}{2}} + \frac{\mu_{[j]}}{2}\Big\|\mathcal{C} - \Big(\mathcal{Z}_{[j]} + \frac{\mathcal{G}_{1[j]}}{\mu_{[j]}}\Big)\Big\|_F^2.$$
The solution to the above subproblem is obtained by
$$\mathcal{C}_{[j+1]} = \mathcal{H}_{\tau}\Big[\mathcal{Z}_{[j]} + \frac{\mathcal{G}_{1[j]}}{\mu_{[j]}}\Big],$$
where $\mathcal{H}_{\tau}[\cdot]$ is the singular value half thresholding operator and $\tau = \frac{1}{\mu_{[j]}}$ is the threshold value.
$\mathcal{Q}$ Subproblem: The update expression for $\mathcal{Q}$ is given by
$$\mathcal{Q}_{[j+1]} = \arg\min_{\mathcal{Q}} \; \lambda_2 \|P_{DM} \odot \mathcal{Q}\|_{\frac{1}{2}}^{\frac{1}{2}} + \frac{\mu_{[j]}}{2}\Big\|\mathcal{Q} - \Big(\mathcal{Z}_{[j]} + \frac{\mathcal{G}_{2[j]}}{\mu_{[j]}}\Big)\Big\|_F^2.$$
The above equation can be decomposed into $N_s$ expressions, and the $k^{th}$ frontal slice of $\mathcal{Q}$ can be updated by
$$Q^{(k)}_{[j+1]} = H_{\frac{\lambda_2 P_{DM}}{\mu_{[j]}}}\Big[Z^{(k)}_{[j]} + \frac{G^{(k)}_{2[j]}}{\mu_{[j]}}\Big],$$
where $Q^{(k)}_{[j+1]}$ is the $k^{th}$ frontal slice of $\mathcal{Q}$ and $H_{\frac{\lambda_2 P_{DM}}{\mu_{[j]}}}[\cdot]$ is the half thresholding operator [32], [37], applied element-wise with the $(m, n)^{th}$ threshold scaled by the $(m, n)^{th}$ entry of $P_{DM}$.
$\mathcal{Z}$ Subproblem: The subproblem for updating $\mathcal{Z}$ is given by
$$\mathcal{Z}_{[j+1]} = \arg\min_{\mathcal{Z}} \; \lambda_1 \|\mathcal{X} - \mathcal{X} * \mathcal{Z}\|_F^2 + \frac{\mu_{[j]}}{2}\Big(\Big\|\mathcal{Z} - \mathcal{C}_{[j+1]} + \frac{\mathcal{G}_{1[j]}}{\mu_{[j]}}\Big\|_F^2 + \Big\|\mathcal{Z} - \mathcal{Q}_{[j+1]} + \frac{\mathcal{G}_{2[j]}}{\mu_{[j]}}\Big\|_F^2\Big).$$
Taking the Fourier transform on both sides, the problem decouples across the frontal slices, and the analytic solution for the update of the $k^{th}$ frontal slice is given by
$$\hat{Z}^{(k)}_{[j+1]} = \Big(2\lambda_1 \hat{X}^{(k)H}\hat{X}^{(k)} + 2\mu_{[j]} I\Big)^{-1}\Big(2\lambda_1 \hat{X}^{(k)H}\hat{X}^{(k)} + \mu_{[j]}\big(\hat{C}^{(k)}_{[j+1]} + \hat{Q}^{(k)}_{[j+1]}\big) - \hat{G}^{(k)}_{1[j]} - \hat{G}^{(k)}_{2[j]}\Big).$$
Finally, the multipliers are updated as $\mathcal{G}_{1[j+1]} = \mathcal{G}_{1[j]} + \mu_{[j]}(\mathcal{Z}_{[j+1]} - \mathcal{C}_{[j+1]})$ and $\mathcal{G}_{2[j+1]} = \mathcal{G}_{2[j]} + \mu_{[j]}(\mathcal{Z}_{[j+1]} - \mathcal{Q}_{[j+1]})$. In the algorithm, the stopping criterion is measured by condition (20). The proposed method is summarized in Algorithm 2.
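The $\mathcal{Z}$ update above reduces to one small linear solve per Fourier-domain frontal slice. The following toy helper shows that slice-wise decoupling pattern by solving a generic t-linear system $\mathcal{X} * \mathcal{Z} = \mathcal{B}$ in the least squares sense (an illustration of the mechanics only, not the paper's exact update rule):

```python
import numpy as np

def t_solve(X, B):
    """Least squares solve of the t-linear system X * Z = B, one
    Fourier frontal slice at a time -- the same decoupling used by
    the Z update (a toy helper, not the exact update rule)."""
    Xf, Bf = np.fft.fft(X, axis=2), np.fft.fft(B, axis=2)
    Zf = np.empty((X.shape[1], B.shape[1], X.shape[2]), dtype=complex)
    for k in range(X.shape[2]):
        # Independent small solve per frontal slice in the DFT domain.
        Zf[:, :, k] = np.linalg.lstsq(Xf[:, :, k], Bf[:, :, k],
                                      rcond=None)[0]
    return np.real(np.fft.ifft(Zf, axis=2))
```

This decoupling is what keeps the per-iteration cost of the tensor framework modest: each of the $N_s$ solves involves only an $N \times N$ system rather than one giant coupled problem.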

V. EXPERIMENTS

A. HYPERSPECTRAL PAPER DATASET PREPARATION
The hyperspectral images of the papers to be clustered are captured using a push-broom hyperspectral camera, the HySpex VNIR-1800, with wavelengths ranging from 400 nm to 1000 nm at a spectral sampling of 3.18 nm. Further, a pre-processing software, HySpex RAD [9], is used to perform basic camera corrections, including dark current subtraction, sensor correction, and radiometric calibration. The arrangement for creating the hyperspectral paper data is illustrated in FIGURE 7 (a), where the papers are arranged in a tiled format. The dataset is prepared using papers of various colors, thicknesses, textures, ages, and manufacturers; the types of papers are given in TABLE 2. The obtained hyperspectral paper data has a spatial dimension of 7500 × 1800 and consists of 186 spectral bands. The hyperspectral paper data sample areas $\in \mathbb{R}^{S_1 \times S_3 \times N_s}$ are selected from a wide range, from $\mathbb{R}^{10 \times 10 \times 186}$ to $\mathbb{R}^{50 \times 50 \times 186}$, for creating the input data tensor $\mathcal{X} \in \mathbb{R}^{S_1 S_3 \times N \times N_s}$. Please refer to Section IV-A, which describes the process of arranging the hyperspectral patches into the lateral slices of the third order input data tensor.

B. EXPERIMENTAL RESULTS
This section presents the experimental results of the proposed method and the state-of-the-art methods. Sparse Subspace Clustering (SSC) [40], Low Rank Subspace Clustering (LRSC) [41], Least Square Regression (LSR) [42], Structure Constrained Low Rank Representation (SCLRR) [38], Structure Constrained Low Rank Submodule Clustering (SCLRSmC) [26], and $l_0$-LRSC [43] are the state-of-the-art clustering algorithms chosen for comparison. Similarly, Accuracy (ACC), Normalized Mutual Information (NMI), Purity, Adjusted Rand Index (ARI), F-score, Precision, and Recall are the quality metrics employed for evaluating the algorithms. All of the metrics described above have already been defined in various papers. The definitions and equations of Accuracy and NMI can be found in [44] and [45]. Expressions for Purity and ARI are given in [31] and [46]. Similarly, F-score, Precision, and Recall are expressed in terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN); the corresponding expressions can be found in [47], [48]. Most of these metrics have been widely used in the clustering methods described in the literature as well as in the state-of-the-art [34], [49]. The values of these metrics are normalized to the range [0, 1], with 1 indicating perfect clustering; in practice, values close to 1 imply good clustering results. In this work, all of the algorithms have been subjected to at least 20 trials, and the evaluation metrics are presented as mean and standard deviation (m ± σ).
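Of these metrics, ACC is the one that needs a label alignment step: predicted cluster ids are matched one-to-one to ground-truth labels with the Hungarian algorithm before counting agreements. A minimal sketch of the standard definition (as in [44]), using SciPy's assignment solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(labels_true, labels_pred):
    """ACC: fraction of points whose predicted cluster id, after the
    best one-to-one relabelling (Hungarian algorithm), matches the
    ground truth."""
    t = np.asarray(labels_true)
    p = np.asarray(labels_pred)
    D = int(max(t.max(), p.max())) + 1
    w = np.zeros((D, D), dtype=int)
    for ti, pi in zip(t, p):
        w[pi, ti] += 1                     # contingency counts
    row, col = linear_sum_assignment(-w)   # maximize total agreement
    return w[row, col].sum() / t.size
```

Without this relabelling, a perfect clustering whose cluster ids happen to be permuted would be scored incorrectly.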
We compare the performance of the proposed method on hyperspectral samples of different sizes $\in \mathbb{R}^{S_1 \times S_3 \times N_s}$, where the spatial dimensions $S_1 \times S_3$ are varied from 10 × 10 to 50 × 50. Furthermore, the number of papers selected for clustering is divided into four cases. In Case I, the first ten papers (Paper 1 to Paper 10) listed in TABLE 2 are considered; for Case II, the first twenty papers, i.e., HP proofing paper to 65 gr/m² white paper, are used, and so on. Similarly, for the data tensor $X$ composed of several hyperspectral paper data patches of dimensions $\in \mathbb{R}^{20\times20\times186}$, the proposed method has shown better performance than the other methods; the comparison results are tabulated in TABLE 4. For the data tensor $X \in \mathbb{R}^{400\times400\times186}$ in Case IV, the proposed method obtained an ACC value of 0.9075 ± 0.0027 and an F-score of 0.9005 ± 0.0025. Among the compared methods, SCLRSmC produces comparatively good results for all cases listed in TABLE 4. Methods such as LRSC and LSR show a severe decline in performance as the number of papers taken for clustering increases. The other methods, SCLRR and $l_0$-LRSC, produce good results for Case I and Case II of TABLE 4, but fail for increasing dimensions of the data tensor $X$.
We further tested our method by changing the spatial dimensions $S_1 \times S_3$ of the hyperspectral patch to 30 × 30, 40 × 40, and 50 × 50; the obtained results are reported in TABLE 5. Overall, the proposed method performs well under the conditions described in TABLE 3, TABLE 4, and TABLE 5. Moreover, it outperforms the existing methods and maintains consistent performance throughout the different scenarios considered in this work. The proposed method surpasses existing methods due to the following factors. The tensorial arrangement of the hyperspectral samples stacks them into distinct lateral slices, and each lateral slice of the data tensor $X$ accommodates the information from all the spectral bands. Also, the dissimilarity matrix employed in the proposed method helps enforce a proper f-diagonal structure on the representation tensor $Z$, which clearly exposes the high correlation among intra-cluster datapoints and the low correlation between datapoints belonging to different clusters. In addition, a comparison of the affinity matrices generated by the proposed method and the state-of-the-art methods was conducted; the learned affinity matrices are given in FIGURE 8. The proposed method generates an affinity matrix with an accurate block diagonal structure, as can be seen in FIGURE 8 (g). This is another reason for the proposed method's consistency in producing good clustering outcomes. Among the compared methods, SCLRSmC and $l_0$-LRSC produce comparatively better affinity matrices. At the same time, as seen in FIGUREs 8 (a), (b), and (c), methods such as SSC [40], LRSC [41], and LSR [42] fail to maintain the required block diagonal structure in their affinity matrices.
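To make the role of the affinity matrix concrete, the sketch below collapses a representation tensor $Z \in \mathbb{R}^{N \times N \times N_s}$ into a symmetric affinity matrix and feeds it to spectral clustering. The way the slices are collapsed (averaging magnitudes over the third mode) is our assumption for illustration, not necessarily the paper's exact construction; a cleanly block-diagonal $Z$ then yields a perfect partition.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def affinity_from_representation(Z):
    """Collapse a representation tensor Z (N x N x Ns) into a symmetric
    N x N affinity matrix by averaging magnitudes over the third mode."""
    A = np.abs(Z).mean(axis=2)
    return (A + A.T) / 2

# toy Z: two clusters of 3 samples each, block-diagonal in every slice
labels_true = np.array([0, 0, 0, 1, 1, 1])
N, Ns = labels_true.size, 5
Z = np.zeros((N, N, Ns))
for i in range(N):
    for j in range(N):
        if labels_true[i] == labels_true[j]:
            Z[i, j, :] = 1.0

A = affinity_from_representation(Z)
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)
```

A block diagonal affinity like this one is exactly the structure visible in FIGURE 8 (g); off-block mass, as in the SSC/LRSC/LSR matrices, is what degrades the final spectral clustering step.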
Hence, from the evaluation results, it is clear that the proposed method can be used to detect a forged paper in a multi-page document. The proposed method effectively clusters the hyperspectral samples selected from different papers: as intended, samples with similar spectral properties are grouped into a single cluster, while those with different spectral properties are assigned to their respective clusters. These results directly demonstrate the validity of the heuristic proposed in this study. The forged paper(s) in a multi-page document will be mapped into separate clusters while the original papers of the document are mapped into a single cluster, and thereby the forged papers can be detected easily.

C. COMPARISON OF EXECUTION TIME
Even though the proposed method consistently produces good clustering results, its execution time increases with increasing dimensions of the data tensor $X$. The time required by the proposed method for various dimensions is shown in FIGURE 9 (a). For a data tensor $X \in \mathbb{R}^{1600\times400\times186}$, composed of hyperspectral patches of size 40 × 40 × 186, the computational time of the proposed method is nearly 3000 seconds; for $X \in \mathbb{R}^{2500\times400\times186}$, the execution time reaches 5069 seconds. Hence, the computational time of the proposed method increases roughly linearly with the dimensions of the hyperspectral slices, as represented in FIGURE 9 (a). Similarly, we analyzed the computational time required by all the algorithms. FIGURE 9 (b) shows the computational time comparison of all the algorithms on the data tensor $X \in \mathbb{R}^{S_1S_3 \times N \times N_s}$, where $S_1 \times S_3 = 10 \times 10$, $N_s = 186$, and $N$ varies from 100 to 400. Similarly, in FIGURE 9 (c), we selected hyperspectral patches $\in \mathbb{R}^{20\times20\times186}$ for all cases reported in TABLE 4.
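A rough way to probe this scaling on one's own hardware is to time the FFT along the third mode, which is one of the dominant per-iteration steps in t-SVD-based updates. This is a generic benchmarking sketch under our own assumptions, not the authors' timing harness:

```python
import time
import numpy as np

def time_fft_along_third_mode(s1s3, n, ns, repeats=3):
    """Average wall-clock time of an FFT along the third mode of a
    random tensor of shape (s1s3, n, ns) over several repeats."""
    X = np.random.rand(s1s3, n, ns)
    t0 = time.perf_counter()
    for _ in range(repeats):
        np.fft.fft(X, axis=2)
    return (time.perf_counter() - t0) / repeats

# e.g. compare a 10x10-patch tensor against a 20x20-patch tensor
t_small = time_fft_along_third_mode(100, 100, 186)
t_large = time_fft_along_third_mode(400, 100, 186)
```

Timing the same step at several patch sizes reproduces the qualitative growth visible in FIGURE 9 (a), though absolute numbers depend on the machine.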
Among the compared methods, SCLRSmC takes the most computational time, as shown in FIGURE 9 (b). Methods such as SSC, LRSC, and $l_0$-LRSC consume less execution time, but their results deteriorate considerably in the varying scenarios described above. Hence, a disadvantage of the proposed method is the large computational time required for its execution. However, since the proposed method incorporates the information embedded in all spectral bands, this may be acceptable to some extent. The redundancy among spectral bands could be reduced by incorporating simultaneous band selection into the proposed method; the hyperspectral samples could then be represented by a minimal number of spectral bands, reducing the computational time considerably. We will address this challenge in future work.

D. PARAMETER TUNING, CONVERGENCE AND COMPUTATIONAL COMPLEXITY
The optimum values of the regularization parameters $\lambda_1$ and $\lambda_2$ have been determined by a grid search to achieve the best clustering results. For the proposed method, we fixed $\lambda_1 = 0.0085$ and $\lambda_2 = 0.0045$ for all the experiments. We evaluated the convergence behaviour of the proposed method in terms of the evaluation metric ACC as well as the representation error term $\|X - XZ\|_\infty$, presented in FIGUREs 10 (a) and (b) respectively. The proposed algorithm shows a good convergence rate, converging within 10-20 iterations. The computational complexity of the proposed method lies in the updates of $C \in \mathbb{R}^{N \times N \times N_s}$ and $Q \in \mathbb{R}^{N \times N \times N_s}$. The $C$ update involves the $l_{\frac{1}{2}}$-induced TNN, which requires $O(N^2 N_s \log_2 N_s + \frac{1}{2} N_s N^2 + N^3)$ operations, and the $Q$ update with $l_{\frac{1}{2}}$ regularization requires $O(N^2 N_s)$ operations per iteration. Overall, the proposed method bears moderate computational complexity.
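The parameter selection described above can be sketched as an exhaustive grid search. Here `run_method` is a hypothetical stand-in for the full clustering pipeline (returning, say, mean ACC for a given $(\lambda_1, \lambda_2)$ pair); the toy score surface below is ours and simply peaks at the values reported in the paper.

```python
import itertools

def grid_search(run_method, lam1_grid, lam2_grid):
    """Exhaustive grid search: evaluate every (lam1, lam2) pair and
    return the one with the highest score from run_method."""
    return max(itertools.product(lam1_grid, lam2_grid),
               key=lambda pair: run_method(*pair))

# toy score surface with its maximum at (0.0085, 0.0045)
score = lambda l1, l2: -abs(l1 - 0.0085) - abs(l2 - 0.0045)
best = grid_search(score,
                   [0.001, 0.0045, 0.0085, 0.01],
                   [0.001, 0.0045, 0.0085, 0.01])
```

In practice each `run_method` call would execute the full optimization and clustering, so the grid is kept coarse and the convergence tolerance on $\|X - XZ\|_\infty$ keeps each run short.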

VI. CONCLUSION
A tensor framework for the clustering of hyperspectral paper data, with an application to forensic document analysis, has been proposed. In this framework, the spectral information from hyperspectral patches fetched from the papers to be clustered is stacked into the lateral slices of a third-order tensor. The objective function of the proposed method incorporates the $l_{\frac{1}{2}}$-induced TNN, which improves the low-rankness of the representation tensor. Similarly, the structural constraint imposed through $l_{\frac{1}{2}}$ regularization and the dissimilarity matrix facilitates the f-diagonal structure of the representation tensor. The formulated optimization problem has been solved using the Inexact Augmented Lagrangian Method. The proposed method has been evaluated and compared with state-of-the-art clustering techniques. The results show that the proposed method produces consistent clustering results and outperforms the other methods.