Optimized Data Association Based on Gaussian Mixture Model

Data association is the foundation of state estimation in mobile robot simultaneous localization and mapping (SLAM). To address the false associations and high computational complexity of the joint compatible branch and bound (JCBB) algorithm, we propose an optimized JCBB data association algorithm based on Gaussian mixture clustering. First, a local association strategy limits data association to a local region, reducing the number of features involved in data association at the current moment. Second, the Gaussian mixture clustering algorithm is applied within the local region to divide the observations at the current moment into several groups that have little correlation with each other. Finally, the JCBB data association algorithm is applied within each group, and the optimal solution is obtained according to the mutual exclusion criterion and the optimal criterion. The experimental results verify that the algorithm improves the accuracy of data association, reduces the computational complexity, and improves the efficiency of data association.


I. INTRODUCTION
Simultaneous localization and mapping (SLAM) technology is the key to realizing robot navigation. In SLAM, a mobile robot in an unknown environment, starting from an initial position, continuously senses the environment with its own sensors and uses the sensor data to estimate its own position, thereby achieving self-localization [1]-[4]. Meanwhile, the environment map is built incrementally on the basis of this localization to support positioning and navigation. Several solutions have been proposed for the SLAM problem, including laser SLAM such as GMapping [5] and LOAM [5], [6]; visual SLAM such as LSD-SLAM [7], ORB-SLAM2 [8], and RTAB-MAP [6], [9]; and semantic SLAM such as DS-SLAM [7], [10] and DynaSLAM [11]. Data association, also known as the consistency problem, is a key problem in mobile robot SLAM.
Specifically, the purpose of data association is to determine whether there is a correspondence between the measured values obtained by the mobile robot and the map features in the existing environment map at different times and in different environment regions, and, based on this correspondence, whether the measured values come from the same entity in the environment. Even a few data association failures can cause the algorithm to diverge. Therefore, it is vital to study data association algorithms in order to ensure the accuracy of SLAM state estimation [8], [9], [10], [12]-[14].
The associate editor coordinating the review of this manuscript and approving it for publication was Ran Cheng.
In research on mobile robot SLAM, the most widely applied data association methods are the independent compatible nearest neighbor (ICNN) algorithm [11], [15] and the joint compatible branch and bound (JCBB) algorithm. The ICNN algorithm is relatively easy to implement and has strong real-time performance. However, because it does not fully consider the correlation between features, its performance degrades considerably when the environment changes. Compared with the ICNN algorithm, the JCBB algorithm proposed by Castellanos et al. [12], [16] improves the association accuracy, but it requires a large amount of computation and its real-time performance is poor. Researchers have continued to optimize data association algorithms, putting forward the probabilistic data association algorithm, the joint maximum likelihood association algorithm, and the multi-hypothesis tracking algorithm [12], [16]. These algorithms improve the computational efficiency of data association in mobile robot SLAM and solve the data association problem of multitarget tracking [13], [14], [17], [18].
Singer and Stein [15], [19] proposed a multiple-hypothesis JCBB algorithm to solve the ambiguous data association problem and improve map accuracy. Neira and Tardos [16], [20] adopted the mutual exclusion criterion and the optimal criterion to improve the accuracy of association. Blackman [17], [21] introduced the k-means clustering algorithm into the SLAM data association algorithm (KJCBB) for further optimization, using k-means to cluster the observations. The number of groups is determined according to the region in which the robot is located, so adjacent landmarks that should be in the same observation group are likely to be assigned to different observation groups, eventually leading to erroneous association results. Dallil et al. [18], [22] proposed a fast JCBB for mobile robot SLAM based on density-based spatial clustering of applications with noise (DBSCAN). As a result, the association accuracy was improved and the running time of the algorithm was significantly reduced. Wang and Englot [19], [23] proposed a Gaussian mixture model based JCBB data association algorithm (GEMJCBB) that groups the measurements to reduce the computational complexity.
The above studies have contributed to the optimization of SLAM data association algorithms and provided a theoretical basis. In this paper, a new optimized data association algorithm based on the Gaussian mixture model [20], [21], [24], [25] is proposed to address the false associations and high computational cost of current SLAM data association algorithms. First, a local association strategy limits data association to a local region, reducing the number of features involved in data association at the current moment. Second, the Gaussian mixture clustering algorithm is applied within the local region to divide the observations at the current moment into several groups that have little correlation with each other. Finally, the JCBB data association algorithm is applied within each group, and the optimal solution is obtained according to the mutual exclusion criterion and the optimal criterion. The advantages of the algorithm are as follows: 1. It reduces the number of observations and map features participating in association at the same time, thus greatly reducing the computational complexity of the JCBB algorithm; 2. It uses the Gaussian mixture model to produce a reasonable grouping for the local association region, which does not need to be chosen empirically according to the environment, thus improving the accuracy of the algorithm. Section II introduces the SLAM data association problem. Section III presents the proposed SLAM data association algorithm. Experimental results and analysis are given in Section IV. The conclusions and future work are presented in the last section.

II. SLAM DATA ASSOCIATION MODEL
A. SLAM DATA ASSOCIATION
Mobile robot SLAM depends not only on the state of the robot itself but also on information about the external environment. The SLAM data association problem involves three kinds of correspondence: the relationship between the observations obtained by the robot's sensors at different times or in different regions; the relationship between observations and existing environment map features; and the relationship between existing environment map features. Through comparative analysis of these three relationships, it is determined whether the observations and features come from the same entity in the environment region. This process can be regarded as using the observation-feature matching relationship to search the environment state space. A new observation obtained by the robot falls into one of three cases: first, it corresponds to a feature already built into the environment map; second, it corresponds to a new environment feature; third, it is spurious (a virtual set), that is, the observation does not reflect a real physical landmark but is caused by sensor noise or specular reflection.
Suppose there are $n$ map features $F = \{F_1, F_2, \cdots, F_n\}$ in the environment area of the robot, and the laser sensor measures $m$ observations $Z = \{Z_1, Z_2, \cdots, Z_m\}$. SLAM data association must establish a hypothesis $H_m = \{j_1, j_2, \cdots, j_m\}$ that matches each observation $Z_i$ with a feature $F_{j_i}$. When the sensor measurement $Z_i$ does not match any feature in the map, $j_i = 0$. The measurement $Z_i$ and the corresponding feature $F_{j_i}$ are related by the measurement function $f_{ij_i}(x, y) = 0$, indicating that the relative position of the measurement and the corresponding feature must be 0.

B. JOINT COMPATIBLE BRANCH AND BOUND ALGORITHM
The JCBB algorithm is one of the independent matching association algorithms based on single observation in the research field of mobile robot SLAM. During the implementation of the algorithm, the joint compatibility test is used to combine the observed features and the map features acquired by the mobile robot, while the branch and bound method is used to search the association solution space.
Under the association hypothesis $H_m = \{j_1, j_2, \cdots, j_m\}$, the joint observation equation of the map features is given by (1), the joint innovation by (2), and the covariance of the joint innovation by (3). The joint compatibility test criterion is given by (4). If (4) holds, all observed features and map features are considered jointly compatible.
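For reference, the four expressions named above (joint observation equation, joint innovation, its covariance, and the compatibility test) are commonly written as follows in the JCBB literature. This is a sketch in assumed notation, not necessarily the notation used in this paper: $\hat{x}$ and $P$ denote the state estimate and its covariance, $\hat{y}$ and $S$ the stacked measurements and their covariance, and $\chi^2_{d,\alpha}$ the chi-square threshold with $d = \dim(f_{H_m})$ degrees of freedom at confidence level $\alpha$. Observations with $j_i = 0$ contribute no rows.

```latex
% Joint implicit measurement function for hypothesis H_m (stacked pairings)
f_{H_m}(x, y) =
\begin{bmatrix} f_{1 j_1}(x, y) \\ \vdots \\ f_{m j_m}(x, y) \end{bmatrix} = 0

% Joint innovation: the function evaluated at the current estimates
h_{H_m} = f_{H_m}(\hat{x}, \hat{y})

% Covariance of the joint innovation (first-order linearization)
C_{H_m} = H_{H_m} P H_{H_m}^{T} + G_{H_m} S G_{H_m}^{T},
\quad
H_{H_m} = \left.\frac{\partial f_{H_m}}{\partial x}\right|_{(\hat{x}, \hat{y})},
\quad
G_{H_m} = \left.\frac{\partial f_{H_m}}{\partial y}\right|_{(\hat{x}, \hat{y})}

% Joint compatibility test on the squared Mahalanobis distance
D^{2}_{H_m} = h_{H_m}^{T} C_{H_m}^{-1} h_{H_m} < \chi^{2}_{d, \alpha}
```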
The main purpose of applying the branch and bound criterion to the data association problem of mobile robot SLAM is to traverse the solution space and obtain the best solution vector. The joint compatibility condition in (4) is used as the criterion for traversing the branches of the association interpretation tree. The search order is determined according to the Mahalanobis distance, and the whole association solution space is divided into several subsets. Within the subsets, the monotonically non-decreasing number of pairings is used as the bounding condition. Finally, the association hypothesis with the largest number of pairings is selected as the optimal association solution.
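The search described above can be sketched as a depth-first traversal of the interpretation tree. This is a minimal illustration under stated assumptions, not the authors' implementation: `dist` is assumed to hold precomputed squared Mahalanobis distances, and `jointly_compatible` is a hypothetical callable standing in for the joint compatibility test.

```python
from typing import Callable, List, Sequence


def branch_and_bound(
    dist: Sequence[Sequence[float]],
    jointly_compatible: Callable[[List[int]], bool],
) -> List[int]:
    """Return the hypothesis with the most pairings.

    dist[i][j] is the (assumed precomputed) squared Mahalanobis distance
    between observation i and map feature j + 1. In the result H,
    H[i] = j + 1 is a pairing and H[i] = 0 means observation i is unmatched.
    """
    m = len(dist)
    best: List[int] = [0] * m
    best_pairs = 0

    def pairs(h: List[int]) -> int:
        return sum(1 for j in h if j != 0)

    def search(h: List[int]) -> None:
        nonlocal best, best_pairs
        i = len(h)
        if i == m:  # leaf: a complete hypothesis
            if pairs(h) > best_pairs:
                best, best_pairs = h[:], pairs(h)
            return
        # Bound: even pairing every remaining observation cannot beat the best.
        if pairs(h) + (m - i) <= best_pairs:
            return
        # Branch: try features for observation i, nearest first.
        for j in sorted(range(len(dist[i])), key=lambda c: dist[i][c]):
            candidate = h + [j + 1]
            if jointly_compatible(candidate):
                search(candidate)
        search(h + [0])  # null branch: leave observation i unmatched

    search([])
    return best
```

Note that this sketch keeps the first hypothesis found with the maximum number of pairings, so ties are not resolved by distance.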

III. OPTIMIZED DATA ASSOCIATION BASED ON GAUSSIAN MIXTURE CLUSTERING
In the traditional JCBB SLAM data association algorithm, the solution space of the observations is described by an interpretation tree model. To obtain the best data association result, an incremental search combining branch and bound with joint compatibility is adopted. This process takes all sensor observations into account and captures the relationships among them, so the accuracy of data association is higher and the robustness is stronger. However, the JCBB algorithm associates all observations at the current moment with all environmental features that already exist in the map. In large, dense environments, the number of features increases rapidly over time, resulting in high computational complexity. The algorithm can be improved by reducing the number of observations and map features participating in association at the same time. First, a local association region is obtained to reduce the number of environmental features participating in association at the same time. In addition, the Gaussian mixture clustering algorithm is adopted to group the observations, reducing the number of observations associated at once. The specific implementation is as follows.

A. LOCAL ASSOCIATION STRATEGY
In practice, the observation range of the robot is limited, so it is not necessary to associate the observations with every known feature. Features that are far away from the robot can be ignored by setting an association threshold in advance and treating only the features that fall within the threshold as candidates for association. The association threshold is set to $r + d$, where $r$ is the effective scanning distance of the laser sensor and $d$ is a compensation distance. The compensation distance allows the local map to contain the environment map features that match the observations as comprehensively as possible. The local association area is represented as follows:
$$\sqrt{(x_f - x_r)^2 + (y_f - y_r)^2} \le r + d \tag{5}$$
where $(x_f, y_f)$ is the position of the feature point and $(x_r, y_r)$ is the position of the robot. As shown in Fig. 1, the local association area is a dotted circle with radius $r + d$ centered on the robot. Dots represent existing features in the map, and asterisks represent the new observations from the sensor. This preprocessing step obtains the local association region, so that the number of environment map features involved in mobile robot SLAM data association at a single moment is effectively reduced.
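The local association step amounts to a distance filter around the robot. A minimal sketch follows, assuming planar positions stored as `(x, y)` tuples; the function name `local_features` and the example values are illustrative only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def local_features(features: List[Point], robot: Point, r: float, d: float) -> List[int]:
    """Return indices of map features within the association threshold r + d."""
    xr, yr = robot
    return [
        i
        for i, (xf, yf) in enumerate(features)
        if math.hypot(xf - xr, yf - yr) <= r + d
    ]


# Hypothetical map: only the first two features fall inside the circle.
features = [(1.0, 1.0), (10.0, 0.0), (30.0, 30.0)]
print(local_features(features, robot=(0.0, 0.0), r=8.0, d=2.0))
```

Only the returned features would then be passed to clustering and association.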

B. GAUSSIAN MIXTURE CLUSTERING
The observations generally present an obvious distribution pattern while the mobile robot travels through the environment. In [21], the k-means clustering algorithm is applied to group the observed data. K-means easily converges to a local optimum when the amount of observed data is large, and in practice it is sensitive to noise and outliers. K-means is applicable only to numerical data, and it cannot effectively handle non-convex data or clusters of irregular shape. In [22], a density-based method is used to solve this problem. This method can not only find clusters of arbitrary shape but also process noisy data more accurately. However, the results obtained by density-based clustering depend on the fixed parameters used to identify clusters, so the algorithm is still affected by the sparsity of the data: with the same parameters, relatively sparse data is divided into multiple classes and relatively dense data is merged into one. In recent years, there have been several studies on clustering algorithms related to the GMM, such as [22]-[31]. The probability density function of the Gaussian mixture model (GMM) simplifies the data processing steps and assigns the data accurately to each Gaussian mixture component. For parameter estimation, the GMM applies the expectation-maximization (EM) algorithm, which improves the efficiency of data analysis [32], [33]. Therefore, we adopt the Gaussian mixture clustering algorithm to group the observations. The clustering and grouping diagram of data association is shown in Fig. 2. First, the observed feature set $Z = \{Z_1, Z_2, \cdots, Z_m\}$ and the number of Gaussian mixture components $k$ are taken as the input, and the Gaussian mixture distribution model is initialized. The value of $k$ in the Gaussian mixture model is selected according to the environment.
According to the range of observations scanned by the laser, the number of groups is generally 3 to 5. The Gaussian mixture model (GMM) assumes that the data are generated from several single Gaussian models (GSM), where $\sigma$ is the variance of the model. The probability density function is
$$p(z) = \sum_{i=1}^{k} \pi_i N(z; \mu_i, \sigma_i) \tag{6}$$
where $\pi_i$ is the weight factor, which represents the probability of choosing the $i$th mixture component, and $\sum_{i=1}^{k} \pi_i = 1$. A single Gaussian distribution $N(z; \mu, \sigma)$ represents one cluster group in the Gaussian mixture distribution, as follows:
$$N(z; \mu, \sigma) = \frac{1}{(2\pi)^{d/2} |\sigma|^{1/2}} \exp\left(-\frac{1}{2} (z - \mu)^{T} \sigma^{-1} (z - \mu)\right) \tag{7}$$
where $d$ is the dimension of $z$. The Gaussian mixture component that generates sample $z_j$ $(j = 1, \cdots, m)$ is expressed as a random variable $q_j$, whose prior probability is $P(q_j = i) = \pi_i$. According to Bayes' theorem, the posterior probability that sample $z_j$ $(j = 1, \cdots, m)$ was generated by the $i$th Gaussian component is
$$\gamma_{ji} = P(q_j = i \mid z_j) = \frac{\pi_i N(z_j; \mu_i, \sigma_i)}{\sum_{l=1}^{k} \pi_l N(z_j; \mu_l, \sigma_l)} \tag{8}$$
Gaussian mixture clustering divides the observation feature set $Z$ into $k$ components, denoted by $Group = \{Group_1, \cdots, Group_k\}$, and the component mark of each observation sample is
$$\lambda_j = \arg\max_{i \in \{1, \cdots, k\}} \gamma_{ji} \tag{9}$$
The Gaussian mixture model is a clustering algorithm with $k$ clustering centers, and the Gaussian function is computationally convenient. If the classification of the samples is unknown, $(\pi, \mu, \sigma)$ can still be calculated given only the sample points: for a given observation data set $Z$, the model parameters can be solved by maximum likelihood estimation.
The number of data points observed by the laser sensor is $m$, and they obey a certain distribution $\Pr(z_i; \lambda)$. The goal of the algorithm is to find the parameter $\lambda$ that maximizes the probability $\prod_{i=1}^{m} \Pr(z_i; \lambda)$ of the generated data points, where $\prod_{i=1}^{m} \Pr(z_i; \lambda)$ is called the likelihood function. In general, the probability value of a single data point is relatively small, and the product of the probability values of $m$ data points is even smaller, so floating-point underflow occurs very easily. Therefore, the logarithm of the likelihood function, $\sum_{i=1}^{m} \ln \Pr(z_i; \lambda)$, known as the log-likelihood function, is normally used instead. The log-likelihood function of the GMM is as follows:
$$LL(Z) = \sum_{j=1}^{m} \ln\left(\sum_{i=1}^{k} \pi_i N(z_j; \mu_i, \sigma_i)\right) \tag{10}$$
Therefore, the expectation-maximization (EM) algorithm can be applied to iteratively optimize the model on the observation data set, and the observation samples can then be classified according to the trained model. The specific steps are as follows. First, one of the $k$ cluster groups is selected at random. Then, the observed data samples are substituted into the selected cluster group to determine whether they belong to it; if not, the cluster group is re-selected.
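The underflow argument can be checked numerically. The short sketch below uses made-up per-point probabilities (an assumption for illustration) to show the direct product collapsing to zero in double precision while the sum of logarithms stays finite.

```python
import math
from typing import List


def likelihood(probs: List[float]) -> float:
    """Direct product of per-point probabilities."""
    return math.prod(probs)


def log_likelihood(probs: List[float]) -> float:
    """Sum of log-probabilities, which avoids underflow."""
    return sum(math.log(p) for p in probs)


# Hypothetical example: 100 data points, each with probability 1e-5.
probs = [1e-5] * 100
print(likelihood(probs))      # 0.0, because 1e-500 is below the double range
print(log_likelihood(probs))  # a finite negative number
```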
Expectation step: assuming that the model parameters are known, calculate the expectation of the latent variable taking the values $q_1, q_2, \cdots$, that is, the probability that $Q$ takes each of $q_1, q_2, \cdots$. In the Gaussian mixture model, this step calculates the probability $\gamma_{ji}$ that the observed sample data point $z_j$ was generated by each cluster group.
Maximization step: the maximum likelihood method is used to calculate the model parameters. The $\gamma_{ji}$ obtained in the expectation step is taken as the probability that the observation sample $z_j$ was generated by the $i$th cluster group. If $LL(Z)$ is maximized by $(\pi_i, \mu_i, \sigma_i)$, then setting $\partial LL(Z) / \partial \mu_i = 0$ gives
$$\mu_i = \frac{\sum_{j=1}^{m} \gamma_{ji} z_j}{\sum_{j=1}^{m} \gamma_{ji}} \tag{11}$$
where the sample weight is the posterior probability of each sample belonging to the component. Setting $\partial LL(Z) / \partial \sigma_i = 0$ gives
$$\sigma_i = \frac{\sum_{j=1}^{m} \gamma_{ji} (z_j - \mu_i)(z_j - \mu_i)^{T}}{\sum_{j=1}^{m} \gamma_{ji}} \tag{12}$$
Subject to $\pi_i > 0$ and $\sum_{i=1}^{k} \pi_i = 1$, maximizing $LL(Z)$ gives
$$\pi_i = \frac{1}{m} \sum_{j=1}^{m} \gamma_{ji} \tag{13}$$
The model parameters are updated according to (11) to (13), the component mark $\lambda_j$ of each $z_j$ is determined according to (9), and $z_j$ is assigned to the corresponding component, namely $Group_{\lambda_j} = Group_{\lambda_j} \cup \{z_j\}$, which finally gives the division result $Group = \{Group_1, \cdots, Group_k\}$.
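The expectation and maximization steps can be sketched in a few lines. The code below is a simplified illustration, not the authors' implementation: it assumes scalar (1-D) observations, fixed initial means supplied by the caller, unit initial variances, and a fixed iteration count. It also assumes every sample lies near at least one component, so the normalizing sum never underflows to zero.

```python
import math
from typing import List, Tuple


def gaussian(z: float, mu: float, var: float) -> float:
    """1-D Gaussian density with mean mu and variance var."""
    return math.exp(-((z - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm(
    z: List[float], init_mu: List[float], iters: int = 50
) -> Tuple[List[float], List[float], List[float], List[int]]:
    """Fit a k-component 1-D GMM by EM; return (pi, mu, var, labels)."""
    k, m = len(init_mu), len(z)
    mu = list(init_mu)
    var = [1.0] * k
    pi = [1.0 / k] * k
    gamma = [[0.0] * k for _ in range(m)]
    for _ in range(iters):
        # E-step: posterior gamma[j][i] that z[j] came from component i.
        for j in range(m):
            w = [pi[i] * gaussian(z[j], mu[i], var[i]) for i in range(k)]
            total = sum(w)
            gamma[j] = [wi / total for wi in w]
        # M-step: weighted mean, weighted variance, and mixing weight.
        for i in range(k):
            n_i = sum(gamma[j][i] for j in range(m))
            mu[i] = sum(gamma[j][i] * z[j] for j in range(m)) / n_i
            spread = sum(gamma[j][i] * (z[j] - mu[i]) ** 2 for j in range(m)) / n_i
            var[i] = max(spread, 1e-6)  # floor keeps the density finite
            pi[i] = n_i / m
    # Component mark: the most probable component for each sample.
    labels = [max(range(k), key=lambda i: gamma[j][i]) for j in range(m)]
    return pi, mu, var, labels
```

For example, `em_gmm([0.9, 1.0, 1.1, 4.9, 5.0, 5.1], [0.0, 6.0])` separates the six values into two groups of three. A practical version would add a convergence check on the log-likelihood and handle 2-D observations with full covariance matrices.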
At every step of the mobile robot's motion, the GMM algorithm divides all observations into several groups with little correlation between them. The JCBB data association method is then applied to each group to obtain local association results. Each group is represented as a layer in the interpretation tree, and the best pairings are selected from the interpretation tree as the final result. In this paper, the proposed algorithm is abbreviated as GEOJCBB. The algorithm description is shown in Fig. 3.

C. OPTIMIZED METHOD
In practice, JCBB data association suffers from two false association problems. One is that different observed features are matched with the same feature in the map. The other is that JCBB may accept an association solution that is not optimal when there are multiple association solutions with the maximum number of pairings. To address these problems, we introduce the mutual exclusion rule and the optimal criterion.
Mutual exclusion rule: a map feature is allowed to be associated with only one observation. That is, once a map feature has been matched, any other observation that matches the same map feature is rejected, and that observation can only be associated with an unmatched map feature or treated as a new feature.
Optimal criterion: if several association solutions have a number of pairings equal to the maximum number of pairings, the association solution with the minimum joint Mahalanobis distance is selected as the final association result.
Here, $n$, $H_i$, and $H_k$ represent the maximum number of pairings, the $i$th association solution, and the optimal association solution, respectively.
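The two rules can be illustrated as a filter followed by a tie-break. The sketch below is an assumption-laden illustration rather than the paper's implementation: each candidate is a pair of a hypothesis list (with 0 meaning unmatched) and a precomputed joint Mahalanobis distance, and the function names are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[List[int], float]  # (hypothesis, joint Mahalanobis distance)


def satisfies_mutual_exclusion(hyp: List[int]) -> bool:
    """True if no map feature (nonzero index) is paired with two observations."""
    matched = [j for j in hyp if j != 0]
    return len(matched) == len(set(matched))


def count_pairs(hyp: List[int]) -> int:
    """Number of observations paired with a map feature."""
    return sum(1 for j in hyp if j != 0)


def select_optimal(candidates: List[Candidate]) -> List[int]:
    """Among valid hypotheses with the most pairings, pick the smallest distance."""
    valid = [(h, d) for h, d in candidates if satisfies_mutual_exclusion(h)]
    n = max(count_pairs(h) for h, _ in valid)  # maximum number of pairings
    ties = [(h, d) for h, d in valid if count_pairs(h) == n]
    return min(ties, key=lambda hd: hd[1])[0]


# Hypothetical candidates: the third reuses feature 1 and is rejected.
candidates = [([1, 2, 0], 3.2), ([1, 3, 0], 1.7), ([1, 1, 2], 0.5), ([0, 2, 0], 0.1)]
print(select_optimal(candidates))  # [1, 3, 0]
```

As written, `select_optimal` raises `ValueError` when no candidate passes the mutual exclusion check.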

IV. EXPERIMENTAL RESULTS AND ANALYSIS
The simulation environment shown in Fig. 4 was designed on the simulation platform developed by Neira et al. [34] to verify the improvement in association performance of the proposed GEOJCBB data association algorithm. The "*" symbols in the figure represent the existing features in the environment, and the solid line represents the planned path of the robot. To verify the efficiency and accuracy of the association results, the positioning accuracy, association time, and association performance of the proposed algorithm were compared with those of the JCBB, KJCBB, and GEMJCBB data association algorithms.

A. PATH FITTING RESULT
The data association and path fitting results of the JCBB, KJCBB, GEMJCBB, and GEOJCBB data association algorithms are shown in Fig. 5, where panels a, b, c, and d correspond to JCBB, KJCBB, GEMJCBB, and GEOJCBB, respectively. The solid green line is the actual path of the robot, the blue "*" symbols indicate the actual features in the environment, the red "+" symbols are the predicted features, and the black line is the estimated path. As shown in Fig. 5, the estimated path fits the true path well for all four algorithms, with no obvious visual difference.
The pose estimation error curves in Fig. 6 and Fig. 7 show the X-axis and Y-axis errors between the actual path and the estimated path in the simulation environment. The mean estimation errors of SLAM in the X-axis direction based on the JCBB, KJCBB, GEMJCBB, and GEOJCBB algorithms were 0.5184, 0.4352, 0.3895, and 0.3029, respectively. The mean estimation errors in the Y-axis direction were 0.6306, 0.5450, 0.3951, and 0.2389, respectively. The simulation results show that the proposed GEOJCBB-based SLAM data association algorithm provides more reliable association results and improves the pose estimation accuracy of SLAM compared with the other three algorithms.

B. ASSOCIATION EFFICIENCY
In the simulation environment, the robot moves counterclockwise at uniform speed from the initial state. Twenty Monte Carlo simulation experiments were conducted for SLAM based on each of the four association algorithms. Table 1 shows the average association time of the four association algorithms over the 20 simulation experiments.
According to Table 1, the average association time of the proposed GEOJCBB SLAM data association algorithm in the simulation environment is 135.0218 seconds. The analysis of the simulation results shows that the average association time of the proposed algorithm is less than that of the other three algorithms. Specifically, KJCBB, GEMJCBB, and GEOJCBB all group the observations, which reduces the dimension of the observation vector in the joint compatibility calculation and thus the computational complexity. Meanwhile, GEOJCBB adopts the Gaussian mixture model to cluster and group the observations. The probability density function of the GMM can accurately assign the observed sample data to each mixture component and simplify the data processing steps. In addition, the GMM uses the EM algorithm to estimate its parameters, which significantly improves the data analysis speed of the algorithm. Finally, the GEOJCBB algorithm delimits the local association area in the data preprocessing step, reducing the number of environmental features participating in the data association at the same time and further improving the efficiency of the algorithm.

C. ASSOCIATION PERFORMANCE
The four indexes used to evaluate the performance of data association are the accuracy of association (TP rate, TPR), the accuracy of new landmarks added to the map (TN rate, TNR), the error rate of association (FP rate, FPR), and the missing rate of association (FN rate, FNR). TP, TN, FP, and FN are defined as follows. TP: true positive, the number of correct measurement-feature pairings detected. TN: true negative, the number of correctly rejected pairings, namely the number of new environment features detected. FP: false positive, the number of incorrect measurement-feature pairings detected. FN: false negative, the number of correct measurement-feature pairings that were not detected.
The number of observations at the current moment is Total:
$$Total = TP + TN + FP + FN$$
Precision is the proportion of the samples predicted as positive that are actually positive. It is calculated as follows:
$$Precision = \frac{TP}{TP + FP}$$
Recall, or true positive rate (TPR), is the proportion of the detected correct observation-feature pairings in the total number of correct observation-feature pairings, and it is calculated as follows:
$$Recall = \frac{TP}{TP + FN}$$
To evaluate the algorithm more comprehensively, the most common measure is the F-measure, which is the weighted harmonic mean of Precision and Recall:
$$F = \frac{(\alpha^2 + 1) \cdot Precision \cdot Recall}{\alpha^2 \cdot Precision + Recall} \tag{19}$$
In (19), $\alpha$ equals 1. Therefore, the F-measure in this paper is:
$$F = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$
Fig. 8 shows the data association performance of the JCBB, KJCBB, GEMJCBB, and GEOJCBB algorithms in the simulation environment. According to Table 2, the F-measure of the proposed algorithm in the simulation environment is improved compared with the other three algorithms.
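As a worked illustration of these metrics, the sketch below computes Precision, Recall, and the weighted F-measure from pairing counts. The counts in the example are made up for illustration and are not results from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of detected pairings that are correct."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of correct pairings that were detected (TPR)."""
    return tp / (tp + fn)


def f_measure(tp: int, fp: int, fn: int, alpha: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; alpha = 1 weights them equally."""
    p, r = precision(tp, fp), recall(tp, fn)
    return (alpha**2 + 1) * p * r / (alpha**2 * p + r)


# Hypothetical counts: 8 correct pairings, 2 wrong pairings, 2 missed pairings.
print(f_measure(tp=8, fp=2, fn=2))  # precision = recall = 0.8, so F is about 0.8
```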

V. CONCLUSION
Data association is the foundation of state estimation in mobile robot SLAM. The JCBB algorithm is currently a common SLAM data association algorithm that can obtain reliable results. However, in large, dense environments, the number of map features and observations participating in the association increases rapidly with time, increasing the amount of computation of the JCBB algorithm. In practical applications, a good data association algorithm should not only have high accuracy but also offer good real-time performance and low computational complexity. This paper presents a SLAM data association algorithm based on Gaussian mixture clustering, which improves on JCBB by reducing the number of observations and map features participating in the association at the same time. First, a local association strategy limits data association to a local region, reducing the number of features involved in data association at the current moment. Second, the Gaussian mixture clustering algorithm is applied within the local region to divide the observations at the current moment into several groups that have little correlation with each other. Finally, the JCBB data association algorithm is applied within each group, and the optimal solution is obtained according to the mutual exclusion criterion and the optimal criterion. Experimental results show that the proposed JCBB algorithm optimized by Gaussian mixture clustering can obtain accurate association results, reduce computational complexity, improve algorithm efficiency, and provide a reliable guarantee for localization in mobile robot SLAM. However, the experiments were carried out only in a simulation environment. In the future, we will carry out experiments in the real world.
DINGQI REN was born in Hebei, China. He is currently pursuing the master's degree with the Beijing University of Technology. His research interests include robotics and artificial intelligence, and the main research work now is robot navigation problem.
XIAOQING ZHU received the Ph.D. degree in control science and engineering from the Beijing University of Technology, in 2015. He is currently the Deputy Dean of College of Artificial Intelligence and automation with the Faculty of Information Technology, Beijing University of Technology, and he is also a Guest Professor with the College of Electronic and Information, Nanchang Institute of Technology. His research interests include robotics and machine learning.
SHAODA LIU was born in Hebei, China. He is currently pursuing the master's degree with the Beijing University of Technology. His research interests include robotics and artificial intelligence, and the main research work now is robot navigation problem.