Improved Genetic Algorithm to Optimize the Wi-Fi Indoor Positioning Based on Artificial Neural Network

To make more effective use of Wi-Fi fingerprint data for positioning, an improved adaptive genetic algorithm (IAGA) is proposed to optimize the BP (Back Propagation) neural network; the resulting method is called IAGA-BP. In this method, the selection, crossover and mutation operations of the genetic algorithm are used to optimize the weights and biases of the BP neural network. On the one hand, the proposed algorithm improves the selection operator of the adaptive genetic algorithm on the basis of the optimal preservation strategy. That is, the population of each generation is sorted by fitness from highest to lowest; the top 20% of the population is passed directly to the next generation, while the worst 20% is eliminated. The remaining 80% of the population is selected by a roulette algorithm based on the selection probability of each individual, so that the population size remains unchanged. On the other hand, the crossover and mutation probability formulas of the adaptive genetic algorithm are improved. The crossover and mutation rates are adjusted according to the fitness of each individual and the current evolution stage of the population, so as to preserve superior individuals and genes. The simulation results show that, compared with the traditional Wi-Fi positioning method, the proposed method converges faster and achieves a better positioning accuracy of 2.48 meters.


I. INTRODUCTION
With the rapid development of cellular network technology and smartphones, location-based services have been studied by numerous researchers at home and abroad. GPS is the most popular solution for outdoor positioning because of its excellent performance in open areas. However, the indoor accuracy of GPS drops severely due to signal fading and the multipath effect [1], [2]. With the recent growth of Wi-Fi infrastructure in major cities, Wi-Fi has become the most common indoor network [3]. It plays an increasingly important role in indoor positioning technology.
At present, RSS (Received Signal Strength) measurements from multiple Wi-Fi APs (Access Points) are commonly used in indoor positioning technology. This approach can be further divided into RSS ranging positioning and RSS location fingerprint positioning [4]. Literature [5] proposed a sensor fusion framework using smartphone inertial measurement unit (IMU) sensor data and RSS measurements. A fingerprint database is established by setting up several fingerprint reference points in the location area, collecting RSS from the APs at each point, and storing the obtained RSS values in the database. The establishment of the Wi-Fi fingerprint database is illustrated in Fig.1.
The associate editor coordinating the review of this manuscript and approving it for publication was Chun-Wei Tsai.
The traditional indoor positioning algorithm uses the collected RSS data and the corresponding recorded distances to fit a distance path loss model. Because the signal varies over time, the RSS values are not stable. Even after the data is smoothed and denoised, a large error remains. For this reason, an artificial neural network algorithm is used to solve the mapping problem between fingerprint data and distance.
A neural network is an intelligent algorithm model that simulates the thinking mode of the human brain. By learning rules from a large amount of data, a neural network is a highly efficient tool for solving nonlinear problems [6]. The BP (Back Propagation) neural network is a multi-layer feedforward network, and its structure is shown in Fig.2. It learns by gradient descent on the error function. The learning process mainly includes the forward propagation of the input signal and the back propagation of the error [7]. In forward propagation, the input signal from the input layer is processed in the hidden layer and transmitted to the output layer. If the actual output is inconsistent with the expected output, error back propagation is carried out. In back propagation, the error is passed from the output layer back toward the input layer, and the connection weights are modified continuously, which gradually reduces the error [8]. However, the BP neural network also has some defects, such as slow convergence and a tendency to fall into local optima [9].
The genetic algorithm is a global, randomized search and optimization algorithm that borrows from the evolution of populations in nature and simulates the inheritance and variation of individual chromosomes in a population. It maps the problem to be solved onto the living environment of the population, and each individual in the population represents a potential solution of the problem [10]. The algorithm measures how well individuals adapt to the environment through a fitness value, which reflects the quality of each individual. Superior genes are identified according to the fitness of individuals and are passed on to the next generation through selection, crossover and mutation. At this point, the algorithm has completed one simulated step of the evolution of a natural population. As the algorithm iterates, the overall fitness of the population improves continually, and the best individual in the population keeps approaching the optimal solution of the problem [11]. The genetic algorithm was first proposed by J. H. Holland of the University of Michigan in the United States [12]. However, when it carries out crossover and mutation operations, the relative merits of individuals are not considered, and only a fixed crossover rate and mutation rate are adopted. This limits the convergence speed of the algorithm. Therefore, Srinivas and Patnaik [13] proposed the AGA (Adaptive Genetic Algorithm), in which the crossover and mutation probabilities of individuals are adjusted adaptively according to the surrounding environment. However, in the early stage of population evolution, individuals with high fitness rarely get the opportunity to evolve, so the optimal solution obtained is not necessarily the global optimum, which increases the possibility of the evolution process converging to a local optimum.
In recent years, many researchers have studied and improved the genetic algorithm. In [14], a differential evolution mutation strategy was introduced into the genetic algorithm to increase diversity. In [15], the simulated annealing method was used to select the population and improve its diversity.
So far, there has been some research on combining the genetic algorithm with the BP neural network. In [16] and [17], the selection operation adopts the optimal preservation strategy, and the crossover and mutation operations adopt fixed crossover and mutation rates. These methods do not consider the influence of population evolution on individuals. Literature [18] adopts a mixed coding method. Each chromosome is made up of control codes, weight codes and threshold codes. Control codes are binary coded, while weight codes and threshold codes are real coded. During crossover and mutation, only the weight codes and threshold codes are operated on, and the control codes are left unchanged. Literature [19] designs adaptive operators whose mutation rate and crossover rate are generated automatically in inverse proportion to the variance of the population fitness. However, there is little research on applying the genetic algorithm combined with a neural network to indoor positioning. In the current literature, we found two related articles. Paper [20] proposed a cascading artificial neural network localization model aided by a genetic algorithm. The main idea of that paper is the cascading artificial neural network-based positioning model rather than the improvement of the genetic algorithm. Under the same time complexity (one neural network model), its estimation error is about 2.8 meters, which is higher than that of our method. In [21], a method to predict pedestrian step size based on a neural network and a genetic algorithm was proposed. However, that paper adopts fixed crossover and mutation rates and does not improve the genetic algorithm. Compared with these works, this paper improves the genetic algorithm as follows. Our proposed algorithm comprehensively considers the level of population evolution. The cosine similarity of the crossover individuals is calculated to adjust the crossover rate and mutation rate, so as to avoid slow evolution of the population at the initial stage and the annihilation of superior genes at the later stage. This helps keep the evolution process from falling into a local minimum.
VOLUME 8, 2020
This paper proposes a positioning method combined with machine learning, which takes the data in the Wi-Fi fingerprint database as the training set to train the BP neural network positioning model. In the model training stage, this paper uses the genetic algorithm to optimize the initial weights and biases of the BP neural network to overcome its disadvantages, such as slow convergence and a tendency to fall into a local minimum. This paper focuses on the improved selection operator and the adaptive crossover and mutation rates, which effectively improve the optimization ability of the genetic algorithm. The evolution process ends when the set termination conditions are met. After the model is trained, the RSS data of the test set is fed to the model to obtain the predicted coordinates. The positioning error of the proposed method is 1.07 meters less than that of the BP neural network method proposed by [22].
As for the non-ANN positioning methods, paper [23] proposed a KNN positioning algorithm with Gaussian function weighting, with an average error of 2.68 meters. In [24], a new weighted adaptive position estimation algorithm was proposed, with a mean error of 2.98 meters. The error in this paper is 0.2 meters and 0.5 meters less than theirs, respectively.
The contributions of this study include two aspects as follows: First, we theoretically and experimentally validated the feasibility of using a genetic algorithm to optimize the BP neural network, and provide an improved adaptive genetic algorithm that optimizes the initial weights and biases of the BP neural network.
Second, we provide a new solution for Wi-Fi fingerprint positioning. To the best of our knowledge, our study is the first to optimize a BP neural network for indoor Wi-Fi positioning by using an improved genetic algorithm. Combining the proposed positioning method with the improved BP neural network helps analyze the features of the Wi-Fi fingerprint data. We did not use a CNN (Convolutional Neural Network) because CNNs are mainly used in natural language processing and image and video processing and require large data sets for training, whereas a BP neural network is sufficient for a model trained on a small RSS data set.

II. IMPROVED ADAPTIVE GENETIC ALGORITHM
To address the defects of the BP neural network, its initial weights and biases are optimized using the global search ability of the genetic algorithm.
The genetic algorithm simulates the process of biological evolution, which includes selection, crossover and mutation. The optimized BP neural network can better predict the output of the function, is less likely to fall into a local optimum during the BP search, and finds a solution faster and more stably.

A. CODING AND FITNESS FUNCTION SELECTION
This paper uses real coding. In the search process, the reciprocal of the mean square error is used as the criterion for evaluating an individual. The fitness function is given in (1), where $x$ is an individual of the population, corresponding to the weights and biases of the BP neural network, $E(x)$ is the mean square error of the output of the corresponding BP neural network, and $C$ is a constant. The larger the fitness, the better the individual $x$.
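As a minimal sketch of this evaluation step, the snippet below computes the mean square error of a candidate's outputs and takes a scaled reciprocal as the fitness. The form C / E(x) is an assumption that is consistent with the description above; the function names and the default C = 1 are illustrative and are not taken from the paper.

```python
def mean_square_error(predicted, expected):
    """Mean of the squared differences over all output values."""
    if len(predicted) != len(expected) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(predicted)


def fitness(mse, c=1.0):
    """Assumed fitness: scaled reciprocal of the MSE (larger is better)."""
    if mse <= 0:
        raise ValueError("MSE must be positive to take its reciprocal")
    return c / mse
```

Under this assumption, halving the error of an individual doubles its fitness, so ranking by fitness is the same as ranking by ascending error.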

B. GENETIC OPERATIONS
1) IMPROVED SELECTION OPERATOR
Traditional genetic algorithms often use the fitness ratio method and the optimal individual preservation strategy to select individuals. Although this approach gives good parents a high selection probability, it is still subject to random error, so individuals with high fitness may be eliminated, which weakens competitiveness [25]. Therefore, this paper proposes a new selection operation that can effectively select the superior individuals in the population. The algorithm steps are as follows:
a) Determine an initial population and calculate the fitness value of each individual in the population.
b) Sort the individuals in the population in descending order of fitness.
c) Copy the top 20% of individuals in the population to the next generation and eliminate the last 20%.
d) Select the remaining individuals by the roulette algorithm, so that the population size remains unchanged during the evolutionary process.
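The four steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the roulette pool is assumed to be every individual except the eliminated bottom group, sampling is with replacement, and the names `improved_selection`, `elite_frac` and `rng` are hypothetical.

```python
import random


def improved_selection(population, fitnesses, elite_frac=0.2, rng=random):
    """Return a new population of the same size as the input.

    The top `elite_frac` are copied unchanged, the bottom `elite_frac`
    are discarded, and the remaining slots are filled by
    fitness-proportional (roulette) sampling.
    """
    n = len(population)
    ranked = sorted(zip(population, fitnesses),
                    key=lambda pair: pair[1], reverse=True)
    n_elite = int(n * elite_frac)
    elites = [ind for ind, _ in ranked[:n_elite]]
    # Assumption: the roulette pool excludes only the discarded bottom group.
    survivors = ranked[: n - n_elite]
    pool = [ind for ind, _ in survivors]
    weights = [fit for _, fit in survivors]
    # random.choices samples with replacement, proportionally to the weights.
    chosen = rng.choices(pool, weights=weights, k=n - n_elite)
    return elites + chosen
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is convenient when comparing selection variants.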
The roulette algorithm is also known as the proportional selection method. Its basic idea is that the probability of an individual being selected is proportional to its fitness [26]. The process of selecting individuals using the roulette algorithm is as follows:
a) Calculate the selection probability of each individual using (2),
$$p(x_i) = \frac{f(x_i)}{\sum_{j=1}^{N} f(x_j)} \quad (2)$$
where $x_i$ represents the $i$-th individual in the population, $f(x_i)$ represents the fitness value of $x_i$, and $\sum_{j=1}^{N} f(x_j)$ represents the sum of the fitness values of all individuals in the population.
b) Calculate the cumulative probability of each individual using (3),
$$q_i = \sum_{j=1}^{i} p(x_j) \quad (3)$$
where $\sum_{j=1}^{i} p(x_j)$ represents the sum of the selection probabilities of all individuals up to and including the $i$-th individual.
c) Generate a random number $r$ between 0 and 1. The first individual $x_i$ for which $q_i > r$ is selected.
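One spin of the roulette wheel, following steps a) to c), might look like the sketch below. The function name `roulette_select` and the optional argument `r` (which allows a fixed random number to be supplied) are illustrative choices, and the rule of picking the first index whose cumulative probability exceeds r is the usual convention, assumed here.

```python
import random
from itertools import accumulate


def roulette_select(fitnesses, r=None):
    """Return the index of the individual picked by one spin of the wheel."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must have a positive sum")
    probs = [f / total for f in fitnesses]   # selection probabilities, as in (2)
    cumulative = list(accumulate(probs))     # cumulative probabilities, as in (3)
    if r is None:
        r = random.random()                  # uniform in [0, 1)
    for i, q in enumerate(cumulative):
        if q > r:
            return i
    return len(fitnesses) - 1                # guard against rounding in the last q
```

For fitness values 1, 1 and 2, the cumulative probabilities are 0.25, 0.5 and 1.0, so any r in [0.5, 1) picks the third individual, which therefore wins half of the spins.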

2) CROSSOVER OPERATION
The crossover operation of the genetic algorithm means that two matched chromosomes exchange some of their genes with each other according to a certain crossover probability, thus forming two new individuals [27]. In the crossover formulas, $w_{mi}$ and $w_{ni}$ represent the $i$-th bit of the $m$-th and $n$-th genes respectively, and $b$ is a random number in [0, 1].
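As an illustration only, the sketch below assumes the commonly used arithmetic crossover for real-coded genes, in which gene i of each child is a convex combination of the two parents' genes weighted by b. This form, the function name and its arguments are assumptions and are not the paper's stated formulas.

```python
import random


def arithmetic_crossover(parent_m, parent_n, i, b=None):
    """Blend gene i of two real-coded chromosomes; returns two new children."""
    if b is None:
        b = random.random()  # assumed uniform in [0, 1)
    child_m, child_n = list(parent_m), list(parent_n)
    # Each child keeps (1 - b) of its own gene and takes b of the other's.
    child_m[i] = parent_m[i] * (1 - b) + parent_n[i] * b
    child_n[i] = parent_n[i] * (1 - b) + parent_m[i] * b
    return child_m, child_n
```

With this form the two new gene values always lie between the parents' values, and their sum is preserved.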

3) MUTATION OPERATION
The mutation operation of the genetic algorithm obtains a new individual by mutating an individual randomly selected from the population according to a certain mutation probability [28]. In the mutation formulas, $w_{max}$ and $w_{min}$ are the highest and lowest values of the gene $w_{mn}$ respectively, $r$ and $r_2$ are random numbers in [0, 1], $g$ is the current iteration number, and $G_{max}$ is the maximum iteration number.
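The symbols listed above match a non-uniform mutation whose step size shrinks as the iteration count grows. The sketch below assumes one common variant, with step factor r2 * (1 - g / G_max)^2 and the direction chosen by r; it is not claimed to be the exact formula used in the paper, and the function name is hypothetical.

```python
import random


def nonuniform_mutation(w, w_min, w_max, g, g_max, r=None, r2=None):
    """Mutate one real-coded gene; the result stays inside [w_min, w_max]."""
    if r is None:
        r = random.random()
    if r2 is None:
        r2 = random.random()
    # Assumed step factor: shrinks to zero as g approaches g_max.
    step = r2 * (1 - g / g_max) ** 2
    if r > 0.5:
        return w + (w_max - w) * step   # move toward the upper bound
    return w - (w - w_min) * step       # move toward the lower bound
```

Early generations can therefore make large moves across the search range, while late generations only fine-tune the gene.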

C. ADAPTIVE GENETIC ALGORITHM
When carrying out crossover and mutation operations, the standard genetic algorithm does not consider the relative merits of individuals, but only adopts fixed crossover and mutation rates, which greatly limits the convergence speed of the algorithm. Therefore, Srinivas and Patnaik [13] proposed AGA, in which the crossover and mutation probabilities of individuals in the population are adjusted adaptively according to the surrounding environment. In the AGA formulas (7) and (8) for $p_c$ and $p_m$, $f'$ is the larger fitness value of the two individuals to be crossed, $f_{max}$ is the maximum fitness of the population, $f_{avg}$ is the average fitness of the population, $f$ is the fitness of the individual to be mutated, and $k_1$ to $k_4$ are adaptive control parameters. According to (7) and (8), when the individual fitness is lower than the average fitness of the current population, a higher crossover and mutation probability is adopted. When the individual is close to the maximum fitness of the current population, a lower crossover and mutation probability is adopted to reduce the destruction of superior genes. However, in the early stages of population evolution, individuals with high fitness rarely get the opportunity to evolve, and the optimal solution obtained is not necessarily the global optimum, which increases the possibility of the evolution converging to a local optimum [29].
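The behaviour described above can be sketched with the piecewise rule of Srinivas and Patnaik: the rate falls linearly to zero as the fitness approaches the population maximum and is a constant for below-average individuals. The parameter names below are descriptive stand-ins for the control parameters, whose index assignment is not given in the text, and the default values are illustrative assumptions.

```python
def aga_rates(f_cross, f_mut, f_max, f_avg,
              pc_scale=1.0, pc_low=1.0, pm_scale=0.5, pm_low=0.5):
    """Return (p_c, p_m) for one crossover pair and one mutation candidate.

    f_cross: larger fitness of the two individuals to be crossed
    f_mut:   fitness of the individual to be mutated
    """
    spread = f_max - f_avg
    if spread <= 0:
        # Degenerate population (all fitness equal): use the constants.
        return pc_low, pm_low
    p_c = pc_scale * (f_max - f_cross) / spread if f_cross >= f_avg else pc_low
    p_m = pm_scale * (f_max - f_mut) / spread if f_mut >= f_avg else pm_low
    return p_c, p_m
```

Note that the best individual of a generation receives p_c = 0 and p_m = 0 under this rule, which is the source of the early-stage stagnation discussed in the text.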

D. IMPROVED ADAPTIVE GENETIC ALGORITHM (IAGA)
In this paper, the calculation methods of the crossover rate and the mutation rate in the adaptive genetic algorithm are improved. In the improved formulas, $p_c$ is the crossover probability, $p_m$ is the mutation probability, $f'$ is the larger fitness value of the two parent individuals in the crossover operation, $f$ is the fitness value of the individual undergoing mutation, $f_{max}$ is the maximum fitness in the population, $f_{avg}$ is the mean fitness of all individuals whose fitness is greater than the population average fitness, $x_i$ and $y_i$ are the two individuals to be crossed, and $s_c$ is the cosine similarity of the gene encodings of the two individuals to be crossed. $k_1$ to $k_6$ are adaptive control parameters and are constants in (0, 1), $iter$ is the current generation number, and $T$ is the total number of generations. The improved formulas comprehensively consider the level of population evolution and use the cosine similarity of the crossover individuals to adjust the crossover rate and mutation rate. This avoids slow evolution of the population at the initial stages and the destruction of superior individual genes at the later stages, which helps the population evolve in a better direction.
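The cosine similarity of two real-coded individuals is the standard quantity: the dot product of the two gene vectors divided by the product of their Euclidean norms. A minimal sketch follows; the function name is illustrative, and how the result enters the rate formulas is not shown here.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two gene vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

A value near 1 means that the two parents encode nearly proportional weight vectors, so crossing them is unlikely to produce a substantially different child.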

III. WI-FI POSITIONING BASED ON IAGA-BP ALGORITHM
The Wi-Fi positioning method based on the IAGA-BP algorithm is divided into two main stages [30]. The first is the model training stage. Before model training, the transfer function, training function, network performance function and various parameters of the BP neural network should be determined. Once that is completed, the prepared training data set is input into the BP neural network for learning. The second is the positioning stage, in which the test data set is input into the trained model to verify the accuracy of the positioning model. The BP neural network optimized by the genetic algorithm can be divided into three parts: structure determination of the BP neural network, optimization by the genetic algorithm, and prediction by the BP neural network. The specific process is shown in Fig.3.
Among these, the BP neural network structure is determined by the number of input and output parameters, which in turn determines the length of each individual in the genetic algorithm. The genetic algorithm optimization step uses the genetic algorithm to optimize the weights and biases of the BP neural network. Each individual in the population encodes all the weights and biases of one network. The fitness function is used to calculate the fitness value of each individual. The genetic algorithm searches for the individual with the best fitness through selection, crossover and mutation. In the prediction step, the best individual found by the genetic algorithm supplies the initial weights and biases of the network, and the prediction results obtained after network training are used to verify the quality of the model.
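To make the link between an individual and a network concrete, the sketch below splits a flat, real-coded gene list into the weight matrices and bias vectors of a network with a single hidden layer. The ordering (hidden weights, hidden biases, output weights, output biases) and the function name are assumptions chosen for illustration.

```python
def decode_chromosome(genes, n_in, n_hidden, n_out):
    """Split a flat gene list into (W1, b1, W2, b2).

    W1 has n_hidden rows of n_in weights; W2 has n_out rows of n_hidden weights.
    """
    expected = n_in * n_hidden + n_hidden + n_hidden * n_out + n_out
    if len(genes) != expected:
        raise ValueError(f"expected {expected} genes, got {len(genes)}")
    pos = 0
    w1 = [genes[pos + r * n_in: pos + (r + 1) * n_in] for r in range(n_hidden)]
    pos += n_in * n_hidden
    b1 = genes[pos: pos + n_hidden]
    pos += n_hidden
    w2 = [genes[pos + r * n_hidden: pos + (r + 1) * n_hidden] for r in range(n_out)]
    pos += n_hidden * n_out
    b2 = genes[pos: pos + n_out]
    return w1, b1, w2, b2
```

Any fixed ordering works, as long as the same ordering is used when the best individual is written back into the network as its initial weights and biases.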

A. MODEL TRAINING STAGE
1) INPUT LAYER AND OUTPUT LAYER
The model in this paper takes the RSS values at each reference location from 16 APs as the input data, and the coordinates (X , Y ) of each reference location are used as the output data, so the number of nodes in the input layer is 16 and the number of nodes in the output layer is 2.

2) HIDDEN LAYER
Relevant studies show that a neural network with one hidden layer can approximate a nonlinear function with arbitrary accuracy as long as there are enough hidden nodes [31]. In this paper, a BP neural network with one hidden layer is used to establish the prediction model. At present, there is no definitive formula for the number of hidden layer neurons; only empirical formulas are available as a guide, and the final number is determined through experience and repeated experiments. This paper refers to the following empirical formula [32] to select the number of hidden layer neurons,
$$h = \sqrt{n + m} + \alpha \quad (14)$$
where $h$ is the number of hidden layer neurons, $n$ is the number of neurons in the input layer, $m$ is the number of neurons in the output layer, and $\alpha$ is a constant in [1, 10]. According to (14), the number of neurons can be between 5 and 15. In this experiment, the number of neurons in the hidden layer is 13.
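As a worked check of these numbers, the sketch below assumes that (14) is the common empirical rule sqrt(n + m) + alpha, and that the network is fully connected with one bias per hidden and output neuron. Both function names are illustrative.

```python
import math


def hidden_neuron_range(n_in, n_out, alpha_min=1, alpha_max=10):
    """Lower and upper bound on the hidden layer size under the assumed rule."""
    base = math.sqrt(n_in + n_out)
    return base + alpha_min, base + alpha_max


def chromosome_length(n_in, n_hidden, n_out):
    """Number of weights and biases encoded by one individual."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


low, high = hidden_neuron_range(16, 2)   # roughly 5.24 and 14.24
genes = chromosome_length(16, 13, 2)     # 208 + 13 + 26 + 2 = 249
```

With 16 inputs and 2 outputs the bounds are consistent with the stated range of 5 to 15, and a 16-13-2 network would give each individual 249 genes under these assumptions.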

3) MODEL DETERMINATION
This paper uses the MATLAB Neural Network Toolbox for network training. In this model, the Levenberg-Marquardt (LM) algorithm is used during network training to update the weight and bias values, while the initial weights and biases are optimized by the genetic algorithm proposed in this paper. The specific process is shown in Fig.3. The mean square error (MSE) is used as the loss function to estimate the fitting performance. The training data must be normalized before being input into the neural network. The network parameters must also be set before training: the maximum number of neural network iterations is 5000, the expected error is 0.00000001, and the learning rate is 0.01. Once the neural network reaches the expected error during repeated learning, learning is complete and the model training is finished.

B. POSITIONING TEST STAGE
The normalized test data set (the RSS values of the APs at different locations) is input into the trained model, and the model outputs the results according to the characteristics of the test data. The outputs are then de-normalized to obtain the predicted data, that is, the (X, Y) coordinates corresponding to each test sample.
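The normalization and de-normalization steps can be sketched with simple min-max scaling. The paper does not state the scaling range, so the [0, 1] range used here is an assumption, as are the function names; the bounds are taken from the training data and reused for the test data and for mapping the outputs back to coordinates.

```python
def fit_min_max(values):
    """Return the (min, max) of the training values used for scaling."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot scale a constant feature")
    return lo, hi


def normalize(v, lo, hi):
    """Map v from [lo, hi] to [0, 1] (assumed range)."""
    return (v - lo) / (hi - lo)


def denormalize(v_norm, lo, hi):
    """Inverse of normalize: map a value in [0, 1] back to [lo, hi]."""
    return v_norm * (hi - lo) + lo
```

Each input feature (one per AP) and each output coordinate would have its own pair of bounds.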

IV. EXPERIMENTS

A. EXPERIMENTAL DATA SET
This paper uses the data set published by Barsocchi et al. [33] at the 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN2016). RSS fingerprint data at 325 different locations were collected using a mobile phone. After deleting the Wi-Fi APs with low RSS values, the remaining 16 APs are used as the input data samples, and the corresponding (X, Y) coordinates are used as the output labels. 260 samples were randomly selected as the training set, and the remaining 65 samples form the test set. The RSS heat maps for the training and test sets are shown in Fig.4 and Fig.5.
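A random 260/65 split of the 325 samples could be produced as in the sketch below. The function name and the `seed` argument are illustrative; the paper does not describe how its random selection was performed.

```python
import random


def split_samples(samples, n_train, seed=None):
    """Shuffle a copy of the samples and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Sample indices 0..324 stand in for the 325 fingerprint records.
train, test = split_samples(range(325), 260, seed=1)
```

Fixing the seed keeps the split identical across runs, which matters when comparing AGA and IAGA on the same data.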

B. EXPERIMENTAL RESULTS AND ANALYSIS
The data set with 65 samples mentioned above is used to test the positioning performance of the BP neural network optimized by the improved genetic algorithm. First, the normalized data set with 260 samples was used to train the network with the improved algorithm. Fig.6 shows how the population fitness under AGA and IAGA changes as the number of iterations increases. The trained neural network model is used to predict the positions of the 65 test samples. The actual and estimated coordinates are shown in Fig.7, the neural network training error curve is shown in Fig.8, and the regression analysis is shown in Fig.9.
As shown in Fig.6, the performance curve of the AGA-based BP neural network becomes stable at the 23rd iteration, while the proposed IAGA algorithm becomes stable at the 15th iteration. At the same time, the fitness level of AGA is about 0.36, while the proposed IAGA reaches about 0.4, and a higher fitness level corresponds to better weights and biases. Fig.7 shows the positions estimated by the BP neural network optimized by the proposed algorithm. The test data is a random sample of fingerprint data collected in two orthogonal corridors. The points with large offsets in Fig.7 correspond to the staircase in the corridor, where the signal strength is weak, which affects the experimental accuracy. Although some estimated values are somewhat far from the real positions, in general the positioning results are quite close to the real coordinates. Since the data set used in this paper is provided by IPIN, the positioning results of this paper are compared with the latest positioning accuracy of the competition. The average error is as low as 2.4848 meters, which is about 34.6% lower than the 3.8 meters of the champion's result in the IPIN2019 Smartphone-based group [34]. Compared with Ref [20], under the same time complexity (one neural network model), the estimation error of that method is about 2.8 meters, which is higher than that of our method. Compared with the traditional positioning method, the positioning error of the proposed method is 1.07 meters less than that of the BP neural network method proposed by [22]. As for the non-ANN positioning methods, paper [23] proposed a KNN positioning algorithm with Gaussian function weighting, with an average error of 2.68 meters. In [24], the average error of a new weighted adaptive position estimation algorithm is 2.98 meters. Their errors are 0.2 meters and 0.5 meters higher than the error in this paper, respectively. The algorithm proposed in this paper thus improves the accuracy of Wi-Fi fingerprint positioning.
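The average error quoted above is assumed here to be the mean Euclidean distance between the actual and the predicted (X, Y) coordinates over the test samples; the paper does not spell out the metric. A minimal sketch with an illustrative function name:

```python
import math


def mean_positioning_error(actual, predicted):
    """Mean Euclidean distance between paired (x, y) coordinates."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    distances = [math.hypot(ax - px, ay - py)
                 for (ax, ay), (px, py) in zip(actual, predicted)]
    return sum(distances) / len(distances)
```

Because this is a mean, a few outliers such as the staircase points can raise the figure noticeably even when most estimates are close.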
Fig.8 shows that the optimized BP neural network reaches the set error accuracy of 0.017164 after 7 iterations. Fig.9 shows the regression analysis of neural network training and prediction. Part of the data is used for training, part is used to validate the training, and the rest is used for testing. The upper left panel is the regression for training, the upper right is for validation, the lower left is for testing, and the lower right is the overall result. The fit is good at each stage, which indicates that the trained BP neural network model works well.

V. CONCLUSION
In this paper, an improved adaptive genetic algorithm is proposed to optimize the BP neural network, yielding a new Wi-Fi positioning model. The genetic algorithm is combined with the BP neural network to overcome the disadvantages of the BP neural network, such as slow convergence and a tendency to fall into a local minimum. This paper focuses on the improved selection operator and the adaptive crossover and mutation rates, which effectively improve the optimization ability of the genetic algorithm. Applying this model to Wi-Fi fingerprint positioning helps analyze the features of the Wi-Fi fingerprint data. The experimental results show that the IAGA-BP model can effectively improve the accuracy of Wi-Fi positioning and accelerate the convergence of the neural network. In future work, we will consider fusing Wi-Fi fingerprint positioning with smartphone sensor positioning using the particle filter. In addition, we will combine the particle filter method with a resampling method based on reinforcement learning to provide robustness against positioning failure.

CHUNLEI WU received the Ph.D. degree majoring in computer application technology from the Ocean University of China, in 2014. He is currently an Associate Professor with the College of Computer and Communication, China University of Petroleum at East China. His current interests include image and video processing, and machine learning. He has authored or coauthored more than 30 journal articles and conference papers and textbooks.