Toward More Efficient WMSN Data Search: Combining FJLT Dimension Expansion With PCA Dimension Reduction

With the rapid development of 5G technology, the scale and dimensionality of the data processed by Wireless Multimedia Sensor Network (WMSN) applications will be larger than ever before, and searching such high-dimensional data becomes very difficult for WMSN applications. This paper proposes a more efficient WMSN data search algorithm based on the fruit fly olfactory neural framework, combined with the Fast Johnson-Lindenstrauss Transform (FJLT) and Principal Component Analysis (PCA), called Fast Johnson-Lindenstrauss Transform combined with Principal Component Analysis-based Fly Locality-Sensitive Hashing (FP-FLSH). First, the data features are quantified numerically. Then, within the fruit fly olfactory nervous system framework, the data are projected to a higher-dimensional metric space using the low-distortion FJLT projection. Finally, the dimensionality reduction step adopts a PCA strategy to retain the maximum amount of information and constructs the search index structure. Experiments are conducted on three large-scale benchmark data sets. Compared with the current mainstream search algorithms, the proposed method is more efficient and can be effectively applied to WMSN applications.


I. INTRODUCTION
Wireless Multimedia Sensor Network (WMSN) is a new kind of sensor network that has been widely used in security monitoring, intelligent transportation, environmental monitoring, and other fields. WMSN sensor nodes are equipped with cameras, microphones, and other sensors, and can collect and process video, audio, image, and other multimedia data from the physical environment. However, with the development of 5G technology, the dimensions and scale of WMSN data are larger than before [1]. Searching such high-dimensional data becomes very difficult for WMSN applications. A large number of studies show that fast search [2]–[4] has great application potential for WMSN applications, so building a search structure with good performance is of great significance.
The associate editor coordinating the review of this manuscript and approving it for publication was Anandakumar Haldorai .
At present, much research on WMSN data search has been performed [5]–[8]. These studies show that WMSN data search consists of three steps: (1) preprocessing the original monitoring data, (2) building an index structure from the standardized data, and (3) mapping the query object into the index structure to obtain the query result. Building a good index structure from standardized data is the fundamental step. Research on index construction [9]–[12] can be divided into spatial-partitioning-based methods, random-projection-based hashing methods, and learning-based hashing methods. Although these methods have made some progress in query accuracy, they suffer from storage-space and computation-cost problems in high-dimensional spaces (data dimension exceeding 100), which are mainly reflected in three aspects.
Problem 1: Conventional tree index structures [13], [14] perform well for small-scale data search, but their performance degrades when processing WMSN data.
Problem 2: The random-projection hashing structure needs a long index code to achieve good search performance, which consumes considerable memory resources.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Problem 3: The learning-based hash structure achieves good search performance but requires a long training time, consuming considerable time resources.
Aiming at these problems, this paper proposes a new data search method called FP-FLSH. Unlike the latest data search methods, the main contributions of this paper are the following.
1) The FP-FLSH method proposed in this paper uses a low-distortion projection method, the FJLT, for dimensional expansion. Following the olfactory system of the fruit fly, the characterized WMSN data are projected to a higher-dimensional metric space.
2) A high-quality index code solution is proposed. We use the PCA method to reduce the dimensions and retain the features carrying the most information.
3) The method has better robustness and retrieval precision for WMSN data and performs well without constructing long index codes. Compared with the FLSH method published in Science [15], our FP-FLSH method is more efficient.
The rest of this paper is organized as follows. Section II describes the related work. Section III proposes the novel FP-FLSH algorithm. Section IV discusses the distance preserving properties of the FP-FLSH algorithm. Section V conducts extensive experiments and compares the results with some mainstream algorithms using three real large-scale data sets. The conclusion and future work are given in Section VI.

II. RELATED WORK
Research on index structures for WMSN data is critical in many areas, such as information retrieval, machine learning, and pattern recognition. In general, data index structures can be divided into three categories.
In the first category, the index structure is based on spatial partitioning. Among the most representative of such methods are the KD-Tree [13], R-Tree [14], etc. These algorithms perform well when the data dimension does not exceed 20. However, when dealing with high-dimensional data, they encounter the ''curse of dimensionality'', and their performance significantly decreases, sometimes below that of a linear scan [16]. To solve the problem of data search in high-dimensional space, many scholars have studied approximate neighbor search [17]–[19], which optimizes the time complexity of the algorithm with respect to similarity queries.
In the second category, the index structure is a hashing method based on random projection. Among the most representative of such methods are Locality-Sensitive Hashing (LSH) [20], p-stable LSH [21], order-statistics LSH [22], etc. Indyk of Stanford University proposed LSH [20]. The condition on a hash function family H, with hash functions h belonging to H, is determined as follows. Let D(x, y) denote the distance between points x and y in a collection X, and let Pr_H denote the probability that two points are mapped to the same bucket after hashing. A family H is called (r1, r2, p1, p2)-sensitive, where r1 < r2 and p1 > p2, if for all x, y ∈ X:
if D(x, y) ≤ r1, then Pr_H[h(x) = h(y)] ≥ p1;
if D(x, y) ≥ r2, then Pr_H[h(x) = h(y)] ≤ p2.
Given a set P of n data points in a metric space (X, D), the LSH algorithm selects L different hash functions h1, ..., hL and maps each data point x to a length-L hash code g(x) = (h1(x), ..., hL(x)). Datar of Stanford University proposed an improved LSH based on p-stable distributions [21]; by adopting a stable distribution in each dimension, the algorithm can adapt to different distance metrics. Subsequently, Mayank and Panigrahy of Stanford University successively proposed new search algorithms [23], [24] in Euclidean space. To map similar data points to similar hash codes with high probability, traditional LSH usually requires a large number of hash tables, which undoubtedly increases computational complexity and memory occupancy. To solve this problem, order-statistics LSH [22] was proposed by Kave Eshghi of Hewlett-Packard Laboratories in the United States. This method uses properties of rank distributions to develop a locality-sensitive hashing family with a good collision rate for the cosine measure, but it takes longer to process a query.
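As a concrete illustration of the p-stable scheme described above, the following minimal Python sketch (not the authors' implementation; the dimension, bucket width w, and random seeds are arbitrary illustrative choices) builds L hash functions of the form h(x) = ⌊(a·x + b)/w⌋ and shows that nearby points collide in more buckets than distant ones:

```python
import numpy as np

def make_pstable_hash(dim, w=4.0, rng=None):
    """One p-stable LSH function h(x) = floor((a.x + b) / w), as in [21]."""
    rng = rng or np.random.default_rng(0)
    a = rng.standard_normal(dim)      # 2-stable (Gaussian) projection vector
    b = rng.uniform(0, w)             # random offset in [0, w)
    return lambda x: int(np.floor((a @ x + b) / w))

rng = np.random.default_rng(42)
hashes = [make_pstable_hash(8, rng=rng) for _ in range(16)]  # L = 16 functions

x = rng.standard_normal(8)
y_near = x + 0.01 * rng.standard_normal(8)   # a close neighbour of x
y_far = 10 * rng.standard_normal(8)          # a distant point

code = lambda v: tuple(h(v) for h in hashes)  # length-L hash code g(v)
near_matches = sum(a == b for a, b in zip(code(x), code(y_near)))
far_matches = sum(a == b for a, b in zip(code(x), code(y_far)))
print(near_matches, far_matches)  # nearby points agree on more hash values
```

The collision probability per function decreases monotonically with distance, which is exactly the (r1, r2, p1, p2)-sensitivity property.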
The FJLT method [25] was proposed by Nir Ailon and Bernard Chazelle of Princeton University. The method uses a Heisenberg-style uncertainty principle of the random Fourier transform [26] to precondition the data before sparse projection, but this low-distortion embedding was only demonstrated in theory and did not show its superiority in practical applications.
In the third category, the index structure is a hashing method based on learning. Among the most representative of such methods are Kernelized Locality-Sensitive Hashing (KLSH) [27], Spherical Hashing (SPH) [28], Principal Component Hashing (PCH) [29], etc. Brian of UC Berkeley proposed the KLSH algorithm [27], in which the kernel method is introduced into the index structure; by mining the internal structure of the data, additional training time is accepted in exchange for higher retrieval precision. Jae-Pil Heo of the Korea Advanced Institute of Science and Technology proposed SPH [28], in which the hash bits are determined by projecting the data onto a hypersphere instead of a hyperplane to maintain the spatial consistency of the original data points. PCH [29] was proposed by Jing of Microsoft Research Asia, introducing principal component hashing into retrieval to enhance the robustness of the algorithm to different data distributions.
In the past few years, some new research has emerged in this field, including the Deep Convolutional Hashing (DCH) method [30] proposed by Sapkota, the HashNet method [31], and the Deep Visual-Semantic Quantization (DVSQ) method [32] proposed by Cao. Although index structures based on deep neural networks have certain advantages in retrieval accuracy, their training time on large-scale data sets is long, and the training quality is highly sensitive to network parameters; therefore, obstacles to practical application remain. Sanjoy proposed a novel random-projection method, FLSH [15], in Science, which uses the hashing process of the fruit fly's olfactory nerve to index data sets. FLSH still has some problems: although it uses the fruit fly olfactory nerves to simulate the hashing process, its random sparse matrix causes a great loss of similarity, and the winner-passing strategy is not a feasible method.
Therefore, this paper proposes the FP-FLSH algorithm that is based on the fruit fly olfactory neural framework combined with the FJLT mapping and the PCA algorithm. The method can minimize the similarity loss of data.

III. FP-FLSH: LOCALITY-SENSITIVE HASHING ALGORITHM A. OVERALL FRAMEWORK OF FP-FLSH
The FP-FLSH method proposed in this paper provides an effective solution for WMSN data search. The method consists of three basic modules, as shown in Fig. 1.

1) DATA FEATURE PROCESSING
Feature processing of WMSN data. In this step, the feature vectors of image data are composed by extracting various features, including color, texture, and shape features. Audio data are transformed into feature vectors by extracting frequency-domain and wavelet features. Video feature extraction adopts the Word2vec model, which does not depend on video tags.

2) FJLT MAPPING
The FJLT mapping consists of the sparse JL matrix, the Walsh-Hadamard matrix, and the diagonal matrix. The characterized WMSN data are projected to higher-dimensional spaces by the FJLT mapping.

3) RETAIN MAIN COMPONENTS BY PCA
The projected data adopt a PCA strategy that preserves the most informative data features, reducing the similarity loss between data objects and constructing high-quality index codes.

B. RANDOM PROJECTION BASED ON FJLT DIMENSION EXPANSION
The FJLT is a low-distortion linear map from R^d to R^m with a random distribution. A random embedding ϕ ~ FJLT is composed of three real-valued matrices, as shown in Fig. 2: ϕ = PHD, where P and D are random matrices and H is a deterministic matrix. It is well known that a sparse matrix alone is not suitable for low-distortion embedding: in particular, when the input data are sparse vectors, the variance of the estimator is too high, which inevitably causes the data to lose a large amount of precision. Matrix P is essentially a sparse matrix, so it cannot be used as a fast JL transform on its own. The Heisenberg principle of the Fourier transform is used to overcome this obstacle: the mapping HD ensures that the data are smooth, and since HD is orthogonal, the Euclidean norm is kept constant. Therefore, this Fourier-transform-based random projection minimizes the distortion and enhances the distance preservation of the transform matrix, resulting in a transform with higher time efficiency and precision.
1) Matrix P: a sparse m × d matrix whose elements are independently distributed, where m = αd, α is a parameter, and d is the original dimension. With probability 1 − q, P_ij is set to 0; otherwise, P_ij is drawn from a normal distribution with expectation 0 and variance q^(-1):
P_ij ~ N(0, q^(-1)) with probability q, and P_ij = 0 with probability 1 − q.
The sparsity constant q is given in [25] as q = min{Θ((log^2 n)/d), 1}.
2) Matrix H: a d × d normalized Walsh-Hadamard matrix, H_ij = d^(-1/2) (−1)^⟨i−1, j−1⟩, where ⟨i, j⟩ is the dot product (modulo 2) of the binary representations of i and j.
3) Matrix D: a d × d diagonal matrix; with probability 1/2, D_ii is set to 1, and otherwise D_ii is set to −1. For input data X ∈ R^(n×d), each row x is smoothed as u = HDx.
The FJLT mapping is used to project the data into a higher-dimensional metric space, thereby bionically simulating the olfactory nervous system of the fruit fly. Compared with a sparse mapping alone, the FJLT mapping has better coverage, which better preserves the accuracy of the data.
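The ϕ = PHD construction above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the sparsity q is fixed to a constant for simplicity rather than set by the formula from [25], and d is assumed to be a power of two so the Walsh-Hadamard matrix exists:

```python
import numpy as np

def fjlt(X, m, q=0.3, rng=None):
    """Sketch of the FJLT embedding phi = P H D. q is fixed here for
    illustration; [25] chooses q = min{Theta((log^2 n)/d), 1}."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape                                 # d must be a power of two
    D = rng.choice([-1.0, 1.0], size=d)            # random sign flips (matrix D)
    H = np.array([[1.0]])
    while H.shape[0] < d:                          # build the Walsh-Hadamard matrix
        H = np.block([[H, H], [H, -H]])
    H /= np.sqrt(d)                                # normalise: H is orthogonal
    mask = rng.random((m, d)) < q                  # sparse pattern of P
    P = mask * rng.normal(0.0, np.sqrt(1.0 / q), size=(m, d))
    return (P @ (H * D)) @ X.T                     # (m, n): columns are embeddings

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 64))                   # 5 points in R^64
Y = fjlt(X, m=6 * 64, rng=rng).T                   # expand to m = 6d dimensions
print(Y.shape)                                     # (5, 384)
```

Note that `H * D` scales the columns of H by the random signs, which is exactly the product H·diag(D); the smoothing step HD is applied before the sparse projection P, as in the text.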

C. PCA ALGORITHM BASED FEATURE EXTRACTION
For the data projected into the high-dimensional space, we want to retain the characteristics of the data to the greatest extent. The issue then becomes how to extract the principal components of the projected data, which is a dimensionality reduction process. The key is how to choose the basis that retains the most information. Suppose we have a set of n-dimensional vectors that we need to reduce to k-dimensional (k < n) vectors; we must then choose k basis vectors that retain the most features. Suppose first that there are only two fields, a and b, grouped as the rows of a matrix X:
X = ( a1 a2 ... am ; b1 b2 ... bm ).
Multiplying X by its transpose and scaling by 1/m gives
(1/m) XX^T = ( (1/m)Σ a_i^2 , (1/m)Σ a_i b_i ; (1/m)Σ a_i b_i , (1/m)Σ b_i^2 ),
and it can be seen that the two diagonal elements of this matrix are the variances of the two fields (assuming each field has been centered to zero mean). By the rules of matrix multiplication, this generalizes: for m n-dimensional data records arranged by column in an n × m matrix X, let C = (1/m) XX^T. C is symmetric, the diagonal of the matrix holds the variance of each field, and the element in row i, column j (equal to that in row j, column i) is the covariance of the two fields i and j. Based on this derivation, achieving the optimization goal is equivalent to diagonalizing the covariance matrix: the elements other than the diagonal become zero, and the diagonal elements are arranged from largest to smallest. We further observe the relationship between the covariance matrices before and after a basis transformation. Let C be the covariance matrix of the original matrix X, let P be a matrix whose rows are the new basis vectors, and let Y = PX be the data after transforming X to the basis P, with covariance matrix D. Then
D = (1/m) YY^T = (1/m)(PX)(PX)^T = P((1/m) XX^T)P^T = PCP^T.
Thus, the optimization goal becomes finding a matrix P such that PCP^T is a diagonal matrix whose diagonal elements are arranged from largest to smallest; the first k rows of P are the bases sought.
From the above, the covariance matrix C is a symmetric matrix. In linear algebra, real symmetric matrices have a series of very useful properties:
1) The eigenvectors corresponding to different eigenvalues of a real symmetric matrix are orthogonal.
2) If an eigenvalue λ has multiplicity r, then there are r linearly independent eigenvectors corresponding to λ, and these r eigenvectors can be unitized and made mutually orthogonal.
Therefore, a real symmetric matrix of n rows and n columns always has n orthonormal eigenvectors. Let the n eigenvectors be e1, e2, ..., en, and form them into a matrix by column: E = (e1 e2 ... en). Then, for the covariance matrix C, the following holds: E^T C E = Λ, where Λ is a diagonal matrix whose diagonal elements are the eigenvalues corresponding to the eigenvectors (possibly repeated). Therefore, we obtain the matrix P that we need: P = E^T, a matrix in which the unitized eigenvectors of the covariance matrix are arranged as rows, each row being an eigenvector of C. If the rows of P are ordered from top to bottom by the eigenvalues in Λ, then multiplying the matrix consisting of the first k rows of P by the original matrix X yields the matrix Y that holds the most information.
As for determining the number of principal components, i.e., the value of k: if k is too large, the compression rate of the data is too low (in the limit, k = n merely rotates the data onto a different basis); conversely, if k is too small, the approximation error is too large. We usually use the percentage of retained variance to determine k. In general, let λ1, λ2, ..., λn denote the eigenvalues of Λ in descending order, so that λj is the eigenvalue of the corresponding eigenvector ej. If we retain the first k components, the retained fraction of the variance is (Σ_{j=1}^{k} λj) / (Σ_{j=1}^{n} λj). We then choose the minimum value of k satisfying
(Σ_{j=1}^{k} λj) / (Σ_{j=1}^{n} λj) ≥ 1 − τ;
in practical applications, according to the conclusion of [33], τ is a very small value. PCA thus preserves the most informative data features, and the main information remains after the dimension shrinks. This maximizes the query accuracy of the query mechanism in the WMSN.
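The variance-ratio rule above can be sketched as follows. This is an illustrative sketch: the layout with one row per field follows the derivation, and the threshold τ = 0.05 and the synthetic data are arbitrary example choices:

```python
import numpy as np

def pca_components(X, tau=0.05):
    """Keep the smallest k whose retained variance ratio is >= 1 - tau,
    following the covariance derivation C = (1/m) X X^T on centred rows."""
    Xc = X - X.mean(axis=1, keepdims=True)    # rows are fields, columns are records
    m = X.shape[1]
    C = (Xc @ Xc.T) / m                       # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)      # returned in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, 1.0 - tau) + 1)       # smallest k meeting the bound
    P = eigvecs[:, :k].T                      # first k eigenvectors, as rows
    return P @ Xc, k                          # Y = P X, the reduced data

rng = np.random.default_rng(0)
base = rng.standard_normal((3, 200))          # 3 strong latent directions
X = np.vstack([base, 0.01 * rng.standard_normal((7, 200))])  # 10-dim, rank ~3
Y, k = pca_components(X, tau=0.05)
print(k)   # 3: almost all variance lives in the three strong directions
```

Because the seven remaining fields carry only ~0.01% of the variance each, the cumulative ratio crosses 1 − τ exactly at the third component.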

D. FRUIT FLY OLFACTORY NEURAL FRAMEWORK
University of California scholars Sanjoy, Charles, and Saket published the FLSH algorithm in Science [15]. The FLSH algorithm is inspired by the olfactory nervous system of the fruit fly and is a new variant that combines that system with locality-sensitive hashing. The whole process has three main steps. The first step preprocesses the input data, as is done in many computing pipelines. The second step expands the number of neurons: the Projection Neurons (PNs) are amplified into Kenyon Cells (KCs) through a sparse matrix M, an m × d sparse binary random matrix with m = 20d. Each entry of M is selected independently: M_ij = 1 with probability p and M_ij = 0 otherwise. In the third step, strong inhibition feedback from a single inhibitory neuron implements the winner-passing mechanism: the values of the top k Kenyon cells are retained (usually k is 5% of m) and the rest are set to zero. For an input x, this winner-passing mechanism produces a sparse vector z ∈ R^m (called a tag) with z_i = (Mx)_i if (Mx)_i is among the k largest entries of Mx, and z_i = 0 otherwise. However, we find that sparse binary random matrices are not suitable for low-distortion data projection, and the neuron suppression of the winner-passing mechanism largely sacrifices the similarity between data objects. Therefore, this paper models FLSH and proposes a framework model of the fruit fly olfactory nervous system. The model has two parts: first, the extracted data are mapped to a higher-dimensional space; second, the most informative data features are retained and used as indexes. The most important contribution of the locality-sensitive hashing strategy based on the fruit fly olfactory nervous system framework is that it changes the traditional construction of locality-sensitive hashing indexes and establishes a connection between cognitive neural systems and approximate neighbor search. This provides a new idea for hash-based search.
The label generated by the fruit fly neural framework model can maintain the expected distance of the input odor, minimize the loss of similarity, and optimize the accuracy of the FP-FLSH algorithm.
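For reference, the FLSH-style tag generation described above (sparse binary expansion followed by top-k inhibition) can be sketched as follows. The ratio m = 20d and the 5% retention come from the description of [15]; the sampling probability p = 0.1 is an assumed example value, since the exact probability is elided in the text:

```python
import numpy as np

def fly_hash(x, M, k):
    """FLSH-style tag: sparse binary expansion, then keep the top-k
    Kenyon-cell activations and zero the rest (the inhibition step)."""
    y = M @ x                              # PN -> KC expansion
    z = np.zeros_like(y)
    top = np.argsort(y)[-k:]               # indices of the k largest responses
    z[top] = y[top]
    return z                               # sparse tag in R^m

rng = np.random.default_rng(0)
d, m = 50, 20 * 50                         # m = 20d, as in [15]
p = 0.1                                    # assumed: each KC samples ~10% of PNs
M = (rng.random((m, d)) < p).astype(float) # sparse binary random matrix
x = rng.standard_normal(d)
z = fly_hash(x, M, k=int(0.05 * m))        # retain 5% of the KCs
print(np.count_nonzero(z))                 # 50 active cells
```

The tag is m-dimensional but only k = 0.05m entries are nonzero, which is what makes it cheap to store despite the dimension expansion.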
Lemma 1: If two inputs x, x′ ∈ R^d are projected to z, z′ ∈ R^m, where q is the sparsity constant of formula (2), then by formula (14) the distance between the mapped data is determined by the parameters m and q. Since q is fixed by formula (2), we only discuss the parameter m: when m is large enough, z concentrates tightly around its expected value, and formula (15) shows that the first step of the fruit fly olfactory neural framework produces tags that preserve the distances of the input data in expectation. In the second step, we retain the most informative data features through PCA, so that the main information remains after the dimension shrinks.
Together, these results demonstrate that the fruit fly olfactory neural framework model can improve the accuracy of the FP-FLSH algorithm. The computational complexity of the entire FP-FLSH algorithm is determined by the third step, the PCA method, which preserves the top k most informative data features; its time complexity is O(kn^2). Therefore, the computational complexity of the FP-FLSH algorithm grows quadratically with the amount of data.

IV. DISTANCE PRESERVATION ANALYSIS OF FP-FLSH
The conventional hashing approach reduces the dimension of the data to achieve fast search, but the similarity of the data is greatly affected during hashing. In the FP-FLSH algorithm proposed in this paper, to establish the index structure, the data are first projected to a higher-dimensional metric space using the FJLT projection transformation. Then, using the PCA method, the most informative data features are retained and used as the index. To show that the proposed FP-FLSH algorithm has good distance preservation, we introduce the concept of maximum regression similarity.
Definition: Let the size of the data before the dimension change be n and its dimension be l. If the dimension after hash mapping and dimension reduction is g, with similarity loss parameter ε1, the corresponding maximum regression similarity is TR1 = 1 − ε1; this function represents the maximum retention of data object similarity. If the dimension after hash mapping and dimension reduction is S = v · g, with similarity loss parameter ε2, the corresponding maximum regression similarity is TR2 = 1 − ε2. Then the difference of the maximum regression similarities is TR = TR2 − TR1.

Algorithm 1 FP-FLSH
Input: query data Q = (q1, q2, ..., qd) ∈ R^(1×d);
n: the amount of data; r: the number of approximate nearest neighbors to search;
d: the initial dimension of the data; k: the number of hash codes retained by PCA;
m: the dimension after FJLT mapping.
1: Dataset feature processing → X
2: X is mapped by FJLT into Y = (y1, y2, ..., ym) ∈ R^(n×m); y = FJLT(Q), Q ∈ X
3: Y is reduced by PCA into M = (m1, m2, ..., mk) ∈ R^(n×k); p = PCA(y), y ∈ Y
4: R = Query(p, M, r)
Output: R, the neighbors of Q
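The pipeline of Algorithm 1 can be sketched end-to-end as follows. This is a simplified sketch, not the authors' code: a dense Gaussian matrix stands in for the full FJLT product PHD, PCA is computed via SVD, and the query step is a brute-force scan over the k-dimensional index codes:

```python
import numpy as np

def fp_flsh_index(X, alpha=6, k=32, rng=None):
    """Sketch of Algorithm 1's index build: expand to m = alpha*d dimensions,
    then keep the top-k principal components as index codes. A dense Gaussian
    stand-in replaces the full FJLT product P H D for brevity."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    m = alpha * d
    R = rng.standard_normal((m, d)) / np.sqrt(m)    # stand-in projection
    Y = X @ R.T                                     # n x m expanded data
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    P = Vt[:k]                                      # top-k principal axes
    return Y @ P.T, R, P                            # n x k index codes

def query(codes, q_code, r):
    """Return the r index codes nearest to the query code."""
    d2 = np.sum((codes - q_code) ** 2, axis=1)
    return np.argsort(d2)[:r]

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 100))
codes, R, P = fp_flsh_index(X, rng=rng)
q_vec = X[17] + 0.001 * rng.standard_normal(100)    # perturb a known point
q_code = (q_vec @ R.T) @ P.T                        # same projection pipeline
print(17 in query(codes, q_code, r=5))              # True: neighbour recovered
```

Because both the expansion and the PCA step are linear, the query is mapped with exactly the same matrices as the database points, so a small input perturbation stays small in code space.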
Theorem: Data dimension expansion achieves higher maximum regression similarity than data dimension reduction, where the original data dimension is l, the dimension after expansion is v2, and the dimension after reduction is v1. Proof: According to formula (16), when n and g are fixed, the value of TR increases as v increases; that is, the maximum regression similarity of the expanded dimension is better than that of the reduced dimension. When n and v are fixed, the value of TR decreases as g increases, and when g → +∞, TR → 0. According to this theorem, expanding the dimension improves the maximum regression similarity compared with reducing it: expanding the dimensions of the data reduces the loss of similarity between data. Moreover, according to findings in neuroscience on the fruit fly olfactory system [34], the olfactory nerve of the fruit fly transmits different odors to more neurons to improve olfactory discrimination. The algorithm in this paper expands the data dimension and therefore has better distance preservation.

V. EXPERIMENT AND RESULT ANALYSIS A. EXPERIMENTAL PLANNING
In the experimental part, we test the performance of the proposed FP-FLSH algorithm on the approximate neighbor search task for WMSN data. We randomly select one percent of the points as queries; a returned point is considered a correct query object if it lies within the nearest two percent of points to the query (as measured by Euclidean distance). All data points in the database are sorted according to their Euclidean distance to the query. Each set of experiments is repeated 20 times, and the search results are averaged to represent the accuracy.
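The evaluation protocol above (a fraction of points as queries, the nearest 2% by Euclidean distance as ground truth, recall averaged over queries) can be sketched as follows; the data set, the toy random-projection ranker, and all sizes here are illustrative stand-ins, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 16))

q_idx = rng.choice(1000, size=10, replace=False)   # ~1% of points as queries
recalls = []
for qi in q_idx:
    d2 = np.sum((data - data[qi]) ** 2, axis=1)
    gt = set(np.argsort(d2)[:20])                  # nearest 2% = ground truth
    # a toy "approximate" search: rank by distance in a random 8-dim projection
    R = rng.standard_normal((8, 16))
    a2 = np.sum((data @ R.T - data[qi] @ R.T) ** 2, axis=1)
    found = set(np.argsort(a2)[:20])
    recalls.append(len(gt & found) / 20)           # fraction of ground truth found

mean_recall = sum(recalls) / len(recalls)
print(mean_recall)                                 # accuracy for this toy ranker
```

In the paper's experiments the toy ranker is replaced by each hashing method under test, and the averaging is repeated 20 times per configuration.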
Next, we will perform three rounds of experiments on the algorithm according to the above criteria.
Experiment 1: Project the data to a higher dimension using the FJLT matrix. Since the projected dimension directly affects the accuracy of the neighbor search, different values of α are tried to determine the optimal value of α.
Experiment 2: The data are projected into the high-dimensional space using the FJLT mapping and the sparse binary mapping, respectively. The neighbor search is then performed directly, and the distance preservation of the two mappings is compared.

B. EXPERIMENTAL DATA SETS
To demonstrate the retrieval performance of the algorithm under different data distributions, we use WMSN data sets from three different fields for the comparison experiments. These three data sets are commonly used evaluation data sets in the WMSN field.
1) SIFT: image data containing 10,000 SIFT features, each represented by a 128-dimensional vector.
2) MNIST: handwritten digit recognition data containing 10,000 MNIST features, each represented by a 784-dimensional vector.
3) GLOVE: word data containing 10,000 GLOVE features, each represented by a 100-dimensional vector.

Experiment 1:
This experiment is designed to investigate the magnified dimension m = αd after the data are projected through the FJLT matrix and determine the optimal value of α. During the experiment, α is set to 1, 2, 4, 6, and 10 for the three different data sets. In addition, using the PCA method, the reserved hash code lengths are 16-bit, 32-bit, and 64-bit. In this way, the optimal value of α is determined. That is, the optimal precision of the neighbor search is determined.
The experimental results are shown in Fig. 3. The data are projected to a higher-dimensional space by the FJLT matrix; the larger α is, the higher the accuracy of the neighbor search. However, when α is greater than 6, meaning the data are projected to more than 6 times their original dimension, the accuracy of the neighbor search does not significantly increase, while the time and space complexity of the algorithm keep growing with the expanded dimension. Therefore, α = 6 is the best value: using the FJLT matrix to project the initial data to 6 times the original dimension achieves the best results.
Experiment 2: This experiment is designed to compare the distance preservation of the two mappings when expanding the data. The FJLT mapping and the sparse binary mapping are used to expand the data to 1, 2, 4, 8, and 16 times their own dimension; we then directly perform the neighbor query and compare the distance preservation of the two mappings' expanded dimensions.
(Fig. 4: comparing the coverage of the two projection matrices on (a) the SIFT data, (b) the GLOVE data, and (c) the MNIST data.)
The experimental results are shown in Fig. 4. On the three data sets, the FJLT mapping is better than the sparse binary mapping during the expansion of the data features. Therefore, using the FJLT mapping to project data into a higher-dimensional metric space is more suitable for the fruit fly olfactory neural network simulation algorithm. Compared with the traditional sparse mapping, the FJLT mapping has better coverage, which better preserves the accuracy of the data.

Experiment 3 (Comparative Experiment):
The FP-FLSH algorithm is compared with the following algorithms; the experimental results are shown in Fig. 5.
1) LSH [20]: a hashing algorithm based on random projection; the projection vectors map similar data to the same hash bucket in Hamming space.
2) DSH [35]: unlike existing random-projection-based hashing methods, density hashing uses the geometry of the data to guide the selection of the projections (hash tables).
3) SPH [28]: the hash bits are determined by projecting the input data onto a hypersphere instead of a hyperplane, maintaining the spatial consistency of the original data points.
4) ITQ [36]: reduces the mapping error of the index structure by rotating the dimension-reduced data.
5) FLSH [15]: the fruit fly olfactory system is utilized in the projection strategy and selection, combining the fruit fly olfactory system with LSH.
6) RS-FLSH [37]: a new sample-based fly locality-sensitive hashing model that simulates the randomness of synapse formation between neurons.
7) BCH-LSH [38]: the source data are mapped to the hash space by using the distance properties of the BCH code design.
From the experimental results, the retrieval accuracy of FP-FLSH is better than that of the other LSH algorithms. Fig. 5 shows the performance of the algorithms in the comparative experiments on the three data sets as a function of the code length. On all three data sets, even with a shorter coding length, the search performance of FP-FLSH remains better than the mainstream hashing algorithms, which means the proposed algorithm is suitable for scenarios with low memory budgets. On the MNIST data set, the search performance of FP-FLSH is also better than the mainstream hashing algorithms, indicating that the FP-FLSH algorithm is applicable to higher-dimensional data sets as well.
This also shows that the FP-FLSH algorithm proposed in this paper is a good combination of locality-sensitive hashing and the fruit fly olfactory nervous system.
Through the comparison experiments on the three data sets, we can see that FP-FLSH has better robustness to data with different scales, different dimensions and different distributions. The FP-FLSH method can optimize the data index structure, which can improve the search performance of the algorithm.
In this section, we have carried out experiments on the proposed FP-FLSH method. Experimental results show that FP-FLSH method is suitable for WMSN data search.

VI. CONCLUSION AND FUTURE WORK
With the rapid development of 5G technology, the WMSN has broad application prospects in many fields. Because WMSN data are large in scale and high in dimension, an index structure with good performance must be established for practical applications. In this paper, a novel FP-FLSH method for WMSN data is proposed. The method builds on the fruit fly olfactory neural framework and uses the low-distortion FJLT random projection to project the data into a higher-dimensional metric space; the PCA method is then used to preserve the most informative features. Experiments on three real-world data sets show that the proposed FP-FLSH algorithm has advantages compared with the latest LSH algorithms.
In future work, we will devote ourselves to studying the low distortion projection matrix and finding a more suitable projection matrix based on the fruit fly olfactory system to further improve our query accuracy. In addition, the secondary extraction strategy for data object features after projection will also be a focus of future research work. We will further enhance the performance of the WMSN data search in conjunction with the work of this group.