Unsupervised Functional Link Artificial Neural Networks for Cluster Analysis

In this paper, we propose a novel method of cluster analysis called unsupervised functional link artificial neural networks (UFLANNs), which inherits the best characteristics of functional link artificial neural networks (FLANNs) and self-organizing feature maps (SOFMs). UFLANNs adopt three types of basis functions, namely Chebyshev polynomials, Legendre polynomials, and power series, to map the input data into a new feature space of higher dimension, where the objects are clustered based on the competitive learning principle of SOFMs. The effectiveness of this algorithm has been tested on various artificial and real-life datasets, including remote sensing images. A thorough comparison with other popular clustering algorithms shows that the proposed method is promising in revealing clusters in many complex datasets.


I. INTRODUCTION
An unsupervised learning system evolves to extract potential characteristics or regularities from the underlying data without being told what outputs or class labels are desired for the given feature vectors [1], [2]. In other words, the learning system perceives and categorizes persistent input vectors without any feedback from the environment, in particular from an external supervisor or critic. This type of learning is therefore frequently employed for data clustering [3]-[6], feature extraction [7], and similarity detection [8]. In a nutshell, this paper focuses on developing an unsupervised learning method for data clustering.
A metric space is a tuple $(X, d)$, where $X$ is a set and $d : X \times X \rightarrow [0, \infty)$ is a metric satisfying the axioms (i) $d(x, y) = 0$ if and only if $x = y$, (ii) $d(x, y) = d(y, x)$, and (iii) $d(x, z) \leq d(x, y) + d(y, z)$. A set $P \subseteq X$ is given together with a parameter $k$. The aim is to find a set $C \subseteq P$ of $k$ points such that the maximum distance of a point in $P$ to its closest point in $C$ is minimized; in other words, minimize the cost $r_C^{\infty}(P) = \max_{p \in P} d(p, C)$. Formally, the clustering problem with $k$ centers is to find a set $C$ of $k$ points such that $r_C^{\infty}(P)$ is minimized, i.e., $r_{opt}^{\infty}(P, k) = \min_{C,\,|C|=k} r_C^{\infty}(P)$. That is, every point in a cluster is at distance at most $r_C^{\infty}(P)$ from its respective centre of gravity (mean), and clustering with $k$ centers is an intractable problem. As of now, we have no polynomial-time exact algorithms for solving intractable problems, so attention must turn to approximation algorithms. This is one motivation for developing UFLANN.
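To make the objective concrete, the following minimal Python sketch evaluates the $k$-center cost $r_C^{\infty}(P)$ for a candidate centre set. The helper name `kcenter_cost` and the toy points are our own illustration, not part of any cited algorithm.

```python
import numpy as np

def kcenter_cost(P, C):
    """r_C(P): maximum distance from any point in P to its nearest centre in C."""
    # Pairwise distances between every point and every centre.
    d = np.linalg.norm(P[:, None, :] - C[None, :, :], axis=2)
    # Distance of each point to its closest centre, then the worst case over P.
    return d.min(axis=1).max()

# Toy example: three points on a line, one centre at the origin.
P = np.array([[0.0], [1.0], [3.0]])
C = np.array([[0.0]])
print(kcenter_cost(P, C))  # 3.0
```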
In neural networks, unsupervised learning attempts to learn to respond to different feature vectors with different parts of the network structure [4], [42]. With no information available about the desired outputs, unsupervised learning in artificial neural networks updates the weight vectors based only on the given input vectors and their connectedness [5].
The competitive learning network is a very popular approach for achieving this type of unsupervised data clustering [42], [43]. Every input neuron $i$ is connected to every output neuron $j$ with weight $w_{ij}$. The number of inputs is the size of the input vector, while the number of outputs equals the number of clusters into which the data points are to be divided. A cluster center's position is specified by the weight vector associated with the connections to the corresponding output neuron. The output neuron with the largest activation is selected for further processing, which is what ''competitive'' or ''winner take all'' implies [1], [34]. The Euclidean distance is used as the dissimilarity measure for competitive learning [43], so the winner is the output neuron whose weight vector is closest to the input; its weights are updated using equation (1):
$$w_k(t + 1) = w_k(t) + \eta(t)\,\big(x - w_k(t)\big), \tag{1}$$
where $w_k$, $x$, $t$, and $\eta$ denote the weight vector of the $k$-th neuron of the output layer, the input vector, the iteration number, and the learning rate, respectively. A competitive learning network performs data clustering on the given input instances. When the process is completed, the input data are divided into disjoint clusters (in the case of hard clustering) such that similarities between sample points in the same cluster are larger than those between points in different clusters. A main drawback of competitive learning is that a randomly initialized weight vector may lie far from every input vector and consequently never gets updated. This situation can be prevented by initializing the weight vectors with instances from the input data itself, thereby ensuring that all of the weight vectors get updated when all the input samples are presented. An alternative approach is to update the weight vectors of both the winning and losing neurons, but with a significantly smaller learning rate $\eta$ for the losers; this is commonly referred to as ''leaky learning''.
It is highly desirable to change the learning rate $\eta$ in the weight update formula of equation (1) dynamically. An initially large value of $\eta$ explores the search space widely; later on, a progressively smaller value refines the weight vectors. The operation is very similar to the temperature cooling strategy of simulated annealing [44].
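A minimal sketch of competitive learning with the update of equation (1) and a decaying learning rate follows. The data-based initialization reflects the dead-unit remedy discussed above; the linear decay schedule is just one illustrative choice.

```python
import numpy as np

def competitive_learning(X, k, epochs=50, eta0=0.5, rng=np.random.default_rng(0)):
    """Winner-take-all clustering per equation (1), with a decaying learning rate.

    Prototypes are initialised from the data itself, so every prototype wins
    for at least one input (avoids dead units)."""
    w = X[rng.choice(len(X), size=k, replace=False)].copy()
    for t in range(epochs):
        eta = eta0 * (1.0 - t / epochs)                   # large early, small late
        for x in X[rng.permutation(len(X))]:
            c = np.argmin(np.linalg.norm(w - x, axis=1))  # winner = closest prototype
            w[c] += eta * (x - w[c])                      # eq. (1): w_c += eta (x - w_c)
    return w
```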
Competitive learning lacks the ability to add new clusters when necessary. Furthermore, if the learning rate $\eta$ is constant, competitive learning does not guarantee stability in revealing clusters; the winning unit that responds to a particular pattern may keep changing during training. On the other hand, if $\eta$ decreases with time, it may become too small to update cluster centers when new data of a different probabilistic nature are presented. Carpenter and Grossberg [23] referred to this as the ''stability-plasticity dilemma'', which is common in designing a machine learning system. Adaptive Resonance Theory (ART), introduced by Grossberg, proposes a solution to the above dilemma. Based on ART, Carpenter and Grossberg proposed a series of similar networks, including ART1 [24], ART2 [24], ART3 [24], [25], and ARTMAP [24], [26].
On the other hand, if the output neurons of a competitive learning network are arranged geometrically (such as in a one-dimensional array or a two-dimensional lattice), then we can update the weight vectors of the winner as well as the neighbouring losers. This capability corresponds to the notion of Kohonen feature maps [27], [28]. The above unsupervised neural networks suffer from many difficulties, such as slow learning, trapping in local optima, and poor scaling for patterns with a large number of elements. Coping with these problems is our second motivation. An alternative is to enhance the original representation right from the start, in a linearly independent manner, so that separating hyper-planes can be learned more readily [47], [48]. One way of enhancing the initial representation of a pattern is to describe it in a space of increased dimension through functional links. It has also been observed that supervised FLANNs have achieved remarkable success in many areas of pattern classification [29]-[31]. Therefore, if we can combine the best attributes of supervised FLANNs and competitive learning networks like SOFM [32] for uncovering clusters, then some of the problems of traditional clustering approaches, including SOFMs, can be easily addressed.
Further, in the search for an efficient, robust, and scalable clustering algorithm, the development of deep learning has recently added a set of new algorithms under the umbrella of deep clustering, e.g., the Autoencoder [52], the Generative Adversarial Network (GAN) [57], the Variational Autoencoder (VAE) [58], Deep Subspace Clustering [56], and a few more presented in [51], [53], [54]. A detailed survey of deep clustering from the perspective of network architecture can be found in [55]. Although a series of developments have been made in the area of deep clustering, its computational cost in the absence of very powerful computing and visualization hardware opens up avenues for the development of new clustering algorithms, and this is the third motivation of this journey toward developing UFLANN.
Like SOFM, UFLANN is trained by competitive learning. Since it learns a weight vector configuration without being told explicitly of the existence of clusters at the input, it is said to undergo a process of self-organized or unsupervised learning. This is to be contrasted with supervised learning such as gradient-based learning, the delta rule, and back-propagation [40], [41].
The advantage of our method over SOFM in discovering natural clusters is easy to see: instead of groping around until a suitable sequence of transformations is found, we enhance the original representation right from the start, so that separating hyper-planes can be learned more rapidly. In UFLANN, we enhance the initial representation of a pattern by describing it in a space of increased dimension.
The rest of the paper is set out as follows. In Section II, we discuss preliminary material on FLANNs and SOFMs. Our contribution is presented in Section III. The experimental details and conclusions are presented in Sections IV and V, respectively, followed by the list of references.

II. PRELIMINARIES

A. FLANNs
The widely used functional link neural network was designed to be computationally more feasible and efficient than the multi-layer perceptron. The architecture consists of a single layer of neurons and uses polynomial (or other basis) functions to expand the feature space, which increases variance and speeds convergence, allowing the algorithm to fit more complex functions. The non-linearity is introduced at the input through the functional expansion rather than through hidden layers, which makes the gradient easier to back-propagate than in general neural networks, where adding layers increases the computational complexity. The introduced non-linearities, whatever they may be, help us learn a better representation of the data. A simple FLANN model [33] for a pattern with two features is shown in Figure 1. In this single-layer FLANN, a two-dimensional input vector $x = [x_1, x_2]^T$ is mapped to a higher-dimensional space by functional expansion using trigonometric functions:
$$\phi = [x_1, \sin x_1, \sin 2x_1, \cos x_1, \cos 2x_1,\; x_2, \sin x_2, \sin 2x_2, \cos x_2, \cos 2x_2,\; x_1 x_2]^T.$$
The expanded feature vector is mapped to the output activation $O_k$ using the weight vector $w_k$, the linear sum being
$$s_k = \sum_{j} w_{kj}\,\phi_j + \theta_k,$$
where $\theta_k$ is the bias of the output. FLANN obtains the weights iteratively by comparing the acquired sum with the ground-truth class labels. The learning error of FLANN is back-propagated by calculating the cost
$$\varepsilon_k = t_k - f(s_k),$$
where $t_k$ is the desired output, $\varepsilon_k$ is the final error, and $f$ is a non-linear activation, usually a logistic function that squashes values into a specific range. The final weights are updated using the policy
$$w_{kj}(t + 1) = w_{kj}(t) + \Delta w_{kj}(t), \qquad \Delta w_{kj}(t) = \eta(t)\,\varepsilon_k\,\phi_j,$$
where $\eta(t)$ is the learning rate.
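As an illustration, the sketch below implements the trigonometric expansion of Figure 1 and one delta-rule update with a logistic activation. The helper names and the learning rate value are our own, and the update follows the standard FLANN formulation sketched above rather than any particular reference implementation.

```python
import numpy as np

def trig_expand(x1, x2):
    """Trigonometric functional expansion of a two-feature pattern (cf. Figure 1)."""
    return np.array([x1, np.sin(x1), np.sin(2*x1), np.cos(x1), np.cos(2*x1),
                     x2, np.sin(x2), np.sin(2*x2), np.cos(x2), np.cos(2*x2),
                     x1 * x2])

def flann_step(w, theta, x1, x2, t_k, eta=0.1):
    """One delta-rule update for a single-output FLANN with logistic activation."""
    phi = trig_expand(x1, x2)
    o_k = 1.0 / (1.0 + np.exp(-(w @ phi + theta)))  # O_k = f(s_k)
    eps = t_k - o_k                                  # error against the desired output
    w += eta * eps * phi                             # Delta w = eta * eps * phi
    theta += eta * eps
    return w, theta, eps
```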

B. SOFMs
Kohonen's self-organizing feature map (SOFM) [32], [34] is an artificial neural network based on unsupervised learning. It works by a competitive learning algorithm that fits the given dataset over time and produces a learned map in the form of a 2D lattice that compresses the vector space so as to best approximate the density distribution of the data, as shown in Figure 2(b). The map simultaneously distributes the quantization prototypes on the rigid lattice while preserving their neighbourhood relations in the data space; a learned SOFM thus carries clustering information that can be extracted and used to find relationships among the data points, making SOFMs widely used for cluster extraction and data analysis tasks [35]-[37]. Let an input vector $x = [x_1, x_2, x_3, \ldots, x_n]$ be mapped to each cell $v$ in the rectangular/hexagonal structure shown in Figure 2(a). The $m \times n$ lattice of neurons is typically much smaller than the set of input vectors. Each neuron $i$ comprises a weight vector $w_i = [w_{i1}, w_{i2}, w_{i3}, \ldots, w_{in}]^T \in \mathbb{R}^n$; the adaptation criterion is based on the Euclidean distance norm between $x$ and $w_i$. The neuron whose weight vector $w_c$ has minimum distance to $x$ is the winner. The updating of weight vectors is inspired by biological neurons that affect spatially neighbouring cells through lateral feedback connections and interactions.
Thus, in the neighbourhood $\rho_c(w_c, k, t)$ around the updated cell $c$ at a given time step $t$, the ripples of the weight revision propagate within a locality whose radius diminishes with time, until towards the end only the best-matching unit (the ''winner neuron'') gets updated. The procedure is:
1. At each time $t$, present an input $x(t)$ and select the winner, $c = \arg\min_i \|x(t) - w_i(t)\|$.
2. Update the weights of the winner and its neighbours, $w_i(t + 1) = w_i(t) + \eta(t)\,\rho_c(w_c, i, t)\,\big(x(t) - w_i(t)\big)$.
Repeat until the map converges.
Here, $\eta(t)$ is a suitable learning rate and $\rho_c(w_c, t)$ is the neighbourhood updating function with exponential decay.
The performance of a map for this optimization task is commonly measured by the mean squared error (MSE)
$$MSE = \frac{1}{L}\sum_{j=1}^{L} \big\| x_j - w_{c(x_j)} \big\|^2,$$
where $\| \cdot \|$ is the Euclidean distance norm and $c(x_j)$ indexes the best-matching unit for $x_j$. The algorithm can be briefly described as follows:
1. Each neuron's weights are initialized either randomly or using some prior domain knowledge.
2. An input vector is selected at random from the set of training data.
3. Every neuron is examined to find the one whose weights are closest to the input vector. The winning node is commonly known as the ''Best Matching Unit (BMU)''.
4. The neighbourhood of the BMU is then computed. The number of neighbours gradually decreases over time.
5. The winning neuron's weights are rewarded by becoming more like the sample vector; the neighbours also become more like the sample vector. The closer a node is to the BMU, the more its weights get updated; the farther a neighbour is from the BMU, the less it learns.
6. Repeat steps 2 to 5 until a stopping criterion is reached.
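The six steps above translate almost line-for-line into the following minimal NumPy sketch of a SOFM; the lattice size, decay schedules, and epoch count are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

def train_sofm(X, m=5, n=5, epochs=100, eta0=0.5, sigma0=2.0,
               rng=np.random.default_rng(0)):
    """Minimal SOFM on an m x n lattice with a Gaussian neighbourhood (steps 1-6)."""
    W = rng.uniform(X.min(), X.max(), size=(m * n, X.shape[1]))       # step 1
    grid = np.array([[i, j] for i in range(m) for j in range(n)], dtype=float)
    for t in range(epochs):
        eta = eta0 * np.exp(-t / epochs)          # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)      # shrinking neighbourhood radius
        for x in X[rng.permutation(len(X))]:      # step 2
            c = np.argmin(np.linalg.norm(W - x, axis=1))              # step 3: BMU
            h = np.exp(-np.sum((grid - grid[c])**2, axis=1)
                       / (2 * sigma**2))                              # step 4
            W += eta * h[:, None] * (x - W)       # step 5: winner and neighbours move
    return W
```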

III. UNSUPERVISED FUNCTIONAL LINK NEURAL NETWORKS
Recall that a number of neural network models for unsupervised learning, particularly clustering, have been proposed so far [5]. In pattern recognition problems the pattern vectors tend to form clusters, with a center for each cluster, and therefore the first step of a typical unsupervised learning technique is to estimate these clusters. The clusters are usually separated by regions of low pattern density. We present here a new method of unsupervised learning in higher-order neural networks, the unsupervised FLANN. The proposed method reveals clusters in two major phases. In the first phase, we map the given input vectors into a higher-dimensional space. The details of this mapping are given below.
In this work, we use the recurrence relations of the Chebyshev polynomials, the Legendre polynomials, and the power series to generate the first three polynomials of each category.
The recurrence relation for the Chebyshev polynomials is
$$T_0(x) = 1, \quad T_1(x) = x, \quad T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x).$$
In the Chebyshev approximation the average error can be large, but the maximum error is minimized; Chebyshev approximations of a function are therefore said to be min-max approximations of the function.
The Legendre polynomials form an $L^2([-1, 1])$-orthogonal set of polynomials and are also a good choice for approximation. They can be generated by the recurrence relation
$$P_0(x) = 1, \quad P_1(x) = x, \quad (n+1)\,P_{n+1}(x) = (2n+1)\,x\,P_n(x) - n\,P_{n-1}(x).$$
Similarly, the power series terms follow the recurrence
$$S_0(x) = 1, \quad S_{n+1}(x) = x\,S_n(x),$$
which generates $1, x, x^2, x^3, \ldots$. In all the above cases, we consider the first three polynomials for functional expansion. The reason is that, according to a Principal Component Analysis (PCA), values added from higher-order polynomials contribute very little to the accuracy or the right shape of the clusters; moreover, considering more polynomials for functional expansion leads to a higher computational cost.
Each of the above polynomial families has its merits and demerits; to offset the demerits of any single family, we use all three families coherently for functional expansion. For example, let the input vector be $x = \langle x_1, x_2 \rangle$. Each feature is expanded by the first three polynomials of each family, i.e., $x_i \mapsto \big(T_1(x_i), T_2(x_i), T_3(x_i),\; P_1(x_i), P_2(x_i), P_3(x_i),\; x_i, x_i^2, x_i^3\big)$, so a two-dimensional vector is mapped to an 18-dimensional vector. In general, for a feature vector of $n$ dimensions the new dimension of the feature space is $3 \times (n \times 3) = 9n$.
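The expansion can be written out explicitly as below; we assume the constant zeroth-order terms are dropped so that each feature contributes nine values, matching the 18 dimensions quoted above for a two-feature vector.

```python
import numpy as np

def expand_feature(x):
    """First three Chebyshev, Legendre, and power-series polynomials of one feature."""
    cheb = [x, 2*x**2 - 1, 4*x**3 - 3*x]              # T1, T2, T3
    leg  = [x, (3*x**2 - 1)/2, (5*x**3 - 3*x)/2]      # P1, P2, P3
    pwr  = [x, x**2, x**3]                            # x, x^2, x^3
    return cheb + leg + pwr                           # 9 values per feature

def functional_expansion(x):
    """Map an n-dimensional pattern to 9n dimensions (18 for n = 2)."""
    return np.array([v for xi in x for v in expand_feature(xi)])

print(functional_expansion(np.array([0.5, -0.2])).shape)  # (18,)
```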
In the second phase of our proposed method, a competitive learning network, also known as a Kohonen feature map or topology-preserving map, is used to reveal the natural clusters. Figure 3 depicts the architecture of the proposed method.
A step-by-step description of the proposed unsupervised FLANN for discovering clusters is as follows:
1] The given set of input vectors is mapped to a set of higher-dimensional vectors using the first three polynomials from each category (Chebyshev polynomials, Legendre polynomials, and power series):
$x \rightarrow x_f$, where $x$ is the input vector and $x_f$ is the vector obtained after functional expansion.
2] The connection weights of the network are initialized using the PCA of the input data obtained after functional expansion.
3] Select the winning output neuron as the one with the highest similarity between its weight vector $w_i$ and the expanded vector $x_f$. If the Euclidean distance is chosen as the dissimilarity measure, the winning neuron $c$ satisfies
$$\|x_f - w_c\| = \min_i \|x_f - w_i\|,$$
where the index $c$ refers to the winning neuron.
4] Let $N_c$ denote the set of indices corresponding to a neighbourhood around $c$. The weights of the winner and its neighbouring units are then updated by
$$\Delta w_i = \eta(t)\,(x_f - w_i), \quad i \in N_c,$$
where $\eta(t)$ is a small positive learning rate. Instead of defining a crisp neighbourhood of the winning unit, we use a neighbourhood function such as a Gaussian $\rho_c(w_c, i)$ around the winning unit $c$, defined as
$$\rho_c(w_c, i) = \exp\!\left(-\frac{\|\rho_i - \rho_c\|^2}{2\,\sigma(t)^2}\right),$$
where $\rho_i$ and $\rho_c$ are the positions of the output units $i$ and $c$, respectively, and $\sigma(t)$ reflects the scope of the neighbourhood radius, a monotonically or exponentially decreasing function of time. Using the neighbourhood function, the update formula can be rewritten as
$$\Delta w_i = \eta(t)\,\rho_c(w_c, i)\,(x_f - w_i),$$
where $i$ now ranges over all the output units.
To achieve better convergence, the learning rate $\eta(t)$ and the neighbourhood size (i.e., $\sigma(t)$) should be decreased gradually over time.
5] Finally, the set of output neurons holds the learned representation of the clustered data, so new data can be passed through the network and mapped to the corresponding neurons for classification. A minimal sketch of the whole procedure is given below.
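The following sketch strings the two phases together, reusing the `functional_expansion` helper from the earlier listing. The PCA-based weight initialization of step 2] is approximated by laying the lattice out in the plane of the top two principal components of the expanded data, which is one reasonable reading of that step rather than the authors' exact procedure.

```python
import numpy as np

def uflann(X, m=5, n=5, epochs=200, eta0=0.5, sigma0=2.0):
    """Two-phase UFLANN sketch: functional expansion, then SOFM-style clustering."""
    Xf = np.array([functional_expansion(x) for x in X])        # phase 1 / step 1]
    # Step 2]: initialise weights in the PCA plane of the expanded data.
    Xc = Xf - Xf.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    grid = np.array([[i, j] for i in range(m) for j in range(n)], dtype=float)
    W = Xf.mean(axis=0) + (grid - grid.mean(axis=0)) @ Vt[:2]  # span the top-2 PCs
    rng = np.random.default_rng(0)
    for t in range(epochs):                                    # phase 2: steps 3]-4]
        eta = eta0 * np.exp(-t / epochs)
        sigma = sigma0 * np.exp(-t / epochs)
        for xf in Xf[rng.permutation(len(Xf))]:
            c = np.argmin(np.linalg.norm(W - xf, axis=1))      # winner (step 3])
            h = np.exp(-np.sum((grid - grid[c])**2, axis=1) / (2 * sigma**2))
            W += eta * h[:, None] * (xf - W)                   # neighbourhood update
    return W, Xf
```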

IV. EXPERIMENTAL DETAILS
To investigate the advantages of UFLANN over other unsupervised learning approaches, numerous experimental scenarios were considered. Both real-world and synthetic datasets were used for the learning tasks, and multiple metrics were employed to study the benefits of the proposed methodology thoroughly.

A. DATASET DESCRIPTION
This section describes the datasets that were used. Among the real-world datasets, the basic Iris dataset [38], [39] was used (referred to as Dataset 1); the input features, i.e., sepal length, sepal width, petal length, and petal width, served as the input to the unsupervised algorithms. The data divide into three classes: Setosa, Versicolour, and Virginica. The dataset was split into two mutually exclusive parts, 90% for training and 10% for testing. Although the terms training and testing are confusing in the context of unsupervised learning, here 90% of the data was used for class discovery and the remaining 10% for testing. Figure 4 shows the synthetic data (Dataset 2 and Dataset 3), for which the network had m = 25 neurons applied over L = 750 points on the 2D plane. Figure 4(a) was designed to cover edge cases, in order to analyze effectively whether the neighbourhood function follows the exponential decay in $\eta(t)$ and $\eta(\nu, k, t)$ over the epochs $t = 1, 2, 3, \ldots, t_{max}$, with $t_{max} = 10000$. It represents concentric circles, generated point-by-point using
$$x = h + r\cos\theta, \qquad y = k + r\sin\theta,$$
where $(h, k)$ is the center and $r$ the radius of each concentric circle. Similarly, Figure 4(b) was made to generate half-moons using line spacing on the data points.
For the outer circle:
$$x = \cos(\mathrm{random}(0, \pi/2)), \qquad y = \sin(\mathrm{random}(0, \pi/2)), \tag{23}$$
and the inner circle is generated analogously with an offset. Figures 5 and 6 show the well-known spatial chain-link dataset, generated with the same methodology as in equations (18) and (19) ($L = 2000$ points on the 3D plane, as offered in the Fundamental Clustering Problems Suite [4]), but with the axes changed; in Figure 6, Gaussian noise is added to investigate its effect on UFLANN's results. The learning algorithm had m = 49 neurons applied over the data in both cases. The results of a total of 112 experiments were averaged to account for the noise in Figure 6. This dataset is preferred as a benchmark for unsupervised clustering because, according to Pearson's coefficients, the data are not linearly separable, so a PCA projection onto a vector subspace is impossible without loss of information ($\sigma_1 = 33.95$, $\sigma_2 = 33.46$, $\sigma_3 = 32.57$). A few satellite images were also scraped from the internet to visualize and compare the clustering power of the proposed approach in terms of colour clustering for segmentation tasks; these are discussed in the latter part of the paper.
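A hypothetical re-creation of the concentric-circles generator (Dataset 2) using the parametric form above; the radii, centre, and random seed are arbitrary illustrative choices, not the paper's settings.

```python
import numpy as np

def concentric_circles(L=750, radii=(1.0, 2.0), centre=(0.0, 0.0),
                       rng=np.random.default_rng(0)):
    """Sample L points on concentric circles via x = h + r cos(t), y = k + r sin(t)."""
    h, k = centre
    pts = []
    for r in radii:
        theta = rng.uniform(0, 2 * np.pi, size=L // len(radii))
        pts.append(np.column_stack([h + r * np.cos(theta),
                                    k + r * np.sin(theta)]))
    return np.vstack(pts)

X = concentric_circles()
print(X.shape)  # (750, 2)
```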
In addition to the above datasets, we applied the proposed method to two well-known datasets, MNIST and CIFAR-10, that are very popular in the clustering literature. MNIST [45]: a collection of 28 × 28 black-and-white pictures of handwritten digits (0-9), with ≈ 60,000 labeled images. This dataset is an extension of the NIST dataset.
CIFAR-10 [46]: this consists of 60,000 32 × 32 colour images in 10 classes, with 6,000 images per class. The images depict 10 real-life objects such as airplane, cat, and deer. The dataset is divided into five training batches and one test batch; the test batch contains 1,000 randomly selected images from each class, and the training set contains the remaining images in random order.

B. EVALUATION METRICS
The performance of a clustering algorithm depends on the type of dataset used and on the type of improvement in performance that is sought. For our study, we consider common clustering metrics that have been used to compare the performance of such algorithms.
Firstly, we use a standardized accuracy evaluation metric, because the synthetic/real data we generated/scraped had predefined classes, which lets us treat clustering as a standard classification problem and penalize the metric for wrongly generated predictions. The first metric is thus the unsupervised clustering accuracy (ACC):
$$ACC = \max_{map} \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{y_i = map(c_i)\},$$
where $y_i$ is the ground-truth label, $c_i$ is the cluster assignment generated by the algorithm, and $map$ ranges over all possible one-to-one mappings between cluster assignments and labels. Secondly, we use the Completeness score, a measure of a cluster labelling given the ground-truth labels: a clustering result satisfies completeness if all data points that are members of the same class are assigned to the same cluster. This is a useful metric because it is independent of the absolute values of the labels: permuting the class or cluster label values does not change the score in any way.
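The ACC metric can be computed by solving the one-to-one mapping with the Hungarian algorithm, e.g., via `scipy.optimize.linear_sum_assignment`; the sketch below assumes integer labels starting at 0 and is one common way to realize the formula, not necessarily the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Unsupervised clustering accuracy via the best one-to-one label mapping."""
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                          # count cluster/class co-occurrences
    rows, cols = linear_sum_assignment(-cost)    # maximise matched counts
    return cost[rows, cols].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])            # same partition, permuted labels
print(clustering_accuracy(y_true, y_pred))      # 1.0
```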
Thirdly, we use the Adjusted Rand Index, a form of the Rand Index that is corrected for the chance grouping of elements. The calculation of the Rand Index relies on a class-based contingency matrix over a set of $N$ elements $S = \{s_1, s_2, \ldots, s_N\}$ with two partitions; for two such partitions, the Rand Index is
$$RI = \frac{tp + tn}{tp + tn + fp + fn},$$
where $tp$, $tn$, $fp$, and $fn$ denote true positives, true negatives, false positives, and false negatives over pairs of elements, respectively.
The Adjusted Rand Index is the corrected-for-chance version of the Rand Index; its baseline is the expected similarity of all pairwise comparisons between clusterings specified by a random model. The Rand Index yields values between 0 and +1, but the Adjusted Rand Index can yield negative values for clusterings worse than chance.
We also computed the Silhouette Coefficient, which is mostly used when the ground-truth labels are not known, but we have not included those results because it did not show a considerable difference across clusterings. This is understandable, as it is calculated from the mean intra-cluster distance $a$ and the mean nearest-cluster distance $b$ for each sample; the Silhouette Coefficient of a sample is
$$s = \frac{b - a}{\max(a, b)}.$$
In our work, we recognize that popular clustering metrics would prove inadequate for the famous concentric-circles problems.
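For reference, all of the reported metrics (and the silhouette coefficient) are available in scikit-learn; the toy arrays below are illustrative only.

```python
import numpy as np
from sklearn.metrics import (adjusted_rand_score, completeness_score,
                             silhouette_score)

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels_true = np.array([0, 0, 1, 1])
labels_pred = np.array([1, 1, 0, 0])   # same partition, permuted labels

print(completeness_score(labels_true, labels_pred))   # 1.0: permutation-invariant
print(adjusted_rand_score(labels_true, labels_pred))  # 1.0; can be negative
print(silhouette_score(X, labels_pred))               # mean of (b - a) / max(a, b)
```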

C. ENVIRONMENTAL PARAMETERS
The training for the proposed methodology was carried out on a personal computer with an Intel Core i5 (7th Gen) processor, 8.00 GB of RAM, and Microsoft Windows 10 Home Edition. The programming was carried out in Python 3.6 within the Anaconda 3 development environment with Spyder. The datasets were divided into a 90:10 training-to-testing ratio for all data. The hyper-parameters used during experimentation are specified in Section D. To validate our method on the MNIST [45] and CIFAR-10 [46] datasets, each input image is flattened into a vector (784 units for MNIST); each row then works as a feature vector and is taken as an input to the clustering algorithm.

D. RESULTS AND DISCUSSION
The performance of the proposed unsupervised FLANN described in Section III is compared with various competitive baselines and methods in Table 1. The proposed method was also applied to the well-known MNIST and CIFAR-10 datasets; the accuracy and the completeness scores are presented in Tables 4 and 5. Figure 7 shows the plot for the ''averaged experiments''; BIRCH and K-means perform well only on linearly separable data. Moreover, in Figure 7(b) the additional 2D lattice maps provide insight into the data density structure of particular clusters in our approach. As can be seen, UFLANN fares better than all the other approaches in terms of data representation by efficiently segregating the data: it forms two categories, with the number of detected clusters (mean ± std. deviation) = (2.0 ± 0.0), a perfect match as represented by the symbols. Figure 8(a), (b), the averaged result of 55 experiments on the data of Figure 5, provides clear evidence that K-means and Mini-Batch K-means find 4 and 2 cluster categories, respectively. The performance of clustering algorithms varies because of the random initialization of the starting point; thus, to evaluate how sensitive the unsupervised FLANN is to initialization, UFLANN was executed 50-80 times on each dataset described in Section IV-A. Table 2 compares the accuracy of the different clustering algorithms on the same set of datasets. Similarly, Table 3 compares the silhouette coefficients of the various datasets for the set of algorithms considered in this work. The Iris dataset (Dataset 1) is not included in the table, as it was considered earlier and our approach showed a fairly competitive baseline with respect to it. We replaced the competitive baselines with the Agglomerative and DBSCAN clustering algorithms, as they proved more useful in our experiments. In both cases, our algorithm fares better than the other baselines.
However, most competitive baselines come very close to the proposed methodology. Further investigation into these algorithms revealed flaws that were exposed during our experiments, making them fail. In the remainder of this section, we discuss how minor changes in the datasets cause the competitive baselines to break down, and how our algorithm significantly outperforms the considered baselines in each case.
Applying our proposed method to the MNIST dataset yields high accuracy, as tabulated in Table 4. Since the MNIST dataset consists of digits on a black background, it was easy for UFLANN to identify the underlying function representing each digit. Again, UFLANN gives better results than all the other algorithms because the input to UFLANN passes through the functional expansion network, which represents the input in the form of different polynomial functions (Legendre and Chebyshev polynomials and the power series). The completeness scores in Table 4 make it clear that the proposed method is able to partition the dataset properly, which shows that there is no imbalance or bias towards a certain cluster. It can therefore be concluded that the proposed model has learned the underlying relationships in the data properly and acts as a good classifier for each cluster. Figure 9 depicts hierarchical clustering using the agglomerative technique with Euclidean affinity and Ward linkage. The drawback of this approach is the algorithm itself, which uses the greedy linkage criterion
$$\min\{\,d(a, b) : a \in A,\; b \in B\,\}$$
over sets of observations $A$ and $B$ to choose core cluster centers by intra-cluster variance, rather than expanding or learning a vector subspace or explicitly differentiable features. Experiments were carried out using different threshold values $d_{coef} = 0.1, 0.2, 0.3, 0.4, \ldots, 2.0$ and $k = 0, 1, \ldots, 4$ clusters under observation to see how the algorithm behaves. This leads to unusual clustering patterns, as visualized in the dendrograms in Figure 9(b).
Thus, the number of clustering categories was observed to lie within (4.0 ± 2.0). Density-based clustering was analyzed as a competitive baseline owing to its efficiency in near-perfect classification, as shown in Figure 10. Through various observations, it can be seen that the algorithm banks heavily on recursively finding all density-connected points and assigning them to the same cluster as the core point. Thus, if the capacity of the first cluster is reduced to less than half that of the other, the algorithm fails, as in Figure 10; multiple values of Gaussian noise were tried and the average result was taken after analysis. Finally, the results of our proposed methodology and the comparison with the basic SOFM are seen in Figure 11. Because the 2D grid of neurons is trained by a partial back-propagation on the current neuron and its nearby neighbourhood of neurons, the clustering is much more efficient, as seen in Figure 11(b). The results provided here are the best cases, derived from 62 of the 85 trials carried out (73%). Furthering our study into the non-linearity of representation and the performance on the previous dataset, we experimented by scattering the points of a specific cluster over a more generalized area, i.e., adding noise so as to expand the vector subspace and thereby affect the distance metric of features between the clusters. Finally, the advantages of UFLANN can be seen in Figure 12, where it outperforms the basic SOFM algorithm, as the functional link expansion allows the learning neurons to better map the data of Figure 6.
Clustering on Image Data: The reduction of image size by discarding ''less-important'' information such as colours is an established concept. This technique generally utilizes unsupervised algorithms and can be used to segment satellite images by performing colour segmentation. Since the proposed unsupervised FLANN belongs to this family of algorithms, we apply it to images and compare it experimentally with previous efforts. Figure 13 illustrates the clustering methodology of K-means; its most common pitfall is that the number of clusters must be specified beforehand, which does not give the algorithm enough dynamicity to adapt to more complex data. The images are generally converted to HSV format to ease clustering, but this may lead to the loss of important features in small images with similar colours. Figure 14 shows the application of hierarchical clustering to such data; since the algorithm requires all points to be considered in the linkage table, and for complex data like images the number of features is exponentially greater, this method also reduces the quality of the image and hence causes a loss of features. Density-based clustering, used earlier as a baseline, is generally not used for such tasks, as the pixels clutter around regions of high pixel concentration, as seen in Figure 14(b).
For images, we improved our algorithm so that the data are fed to the algorithm in batches (the batch size is another hyper-parameter), effectively making use of T-based clustering [4] so that data points that are very near to each other do not end up in different clusters.
In Figure 15, using our proposed approach, we see that the pixel values of the image increase overall, giving a more enhanced image, because the distances between neighbouring pixel values increase exponentially. This also has its pitfalls, as it increases the Gaussian noise; we therefore mean-normalize the data, as in Figure 15(c) and Figure 16(c). Being a linear transformation, the normalization does not cause a great decrease in performance, but it allows us to learn more features while suppressing the unnecessary small features that would otherwise add spurious clusters. The increase in the distance between different features effectively allows us to segment the image based on its colour representations. We also observed that increasing the number of neurons allows the network to learn more complex colours and to distinguish better between the colour representations of the images. UFLANN will especially aid situations of low lighting or dim photography, as it expands the feature space so that bleak colour representations become separable; this is also illustrated in Figure 16.

E. TIME COMPLEXITY ANALYSIS
In this section, we go into the details of UFLANN with a brief analysis of our algorithm in comparison with the benchmark algorithms under consideration. The time complexity of the SOFM is considered to be O(NC), where N is the input vector size and C is the number of presentation cycles. Correspondingly, UFLANN in the worst case performs a given classification in O(k·n·m) time for a given sample, where k is the number of polynomial expansions and n, m are the sizes of the 2D lattice on which the input vector is compared with each of the weights to find the BMU. The average time taken by the algorithm is 0.026 s for inference, compared with 0.021 s for the SOFM algorithm and 0.015 s for the sole FLANN algorithm on non-image data. The slight increase in time is compensated by the gain in accuracy achieved by the network. When compared with the other benchmark algorithms on image datasets, UFLANN achieves faster inference than most algorithms except simple ones like K-means, which again is due to the time-accuracy trade-off.

V. CONCLUSION
The proposed approach, UFLANN, has been designed by inheriting the best attributes of FLANN and the competitive learning mechanism of SOFM. The experiments show that UFLANN is more accurate than the other algorithms taken up in this paper for comparison. The results obtained show that this methodology can be used as a complete solution to clustering, with acceptable accuracy for both numerical and image datasets. The main issues of colour image segmentation are systematically addressed in this paper, including perceptual uniformity in colour representation, colour reduction by mean normalization, and clustering in unsupervised segmentation. The UFLANN algorithm should prove very useful for colour image segmentation in low-lighting conditions. The approach shows good results and has a straightforward application in the vision domain. In a nutshell, we provide a novel unsupervised learning method based on FLANN that can find application in a variety of fields such as remote sensing image processing, earthquake studies, outlier detection, insurance, and marketing.