Normal Information Diffusion Distribution and Its Application in Inferring the Optimal Probability Density Functions of the Event Coordinates From the Microseismic or Acoustic Emission Sources

To obtain optimal probability density functions (PDFs) or cumulative distribution functions (CDFs) of the event coordinates from microseismic or acoustic emission sources, the normal information diffusion (NID) method based on the “3σ” truncated interval is introduced. Six sets of event coordinate data from the locating sources are used to compare the goodness-of-fit of the NID method, log-logistic (3P) method, lognormal method, and normal method. The results show that the Kolmogorov-Smirnov (K-S) and chi-square test values of the NID distributions (NIDDs) are always less than those of the log-logistic (3P) distributions (LLD3s), lognormal distributions (LNDs), and normal distributions (NDs); the cumulative probability values of the NIDDs are equal to 1, while those of the LLD3s, LNDs, and NDs are less than 1; and the curves of the NIDDs are multimodal and can reflect the fluctuation of the event coordinate data. It is concluded that the NIDDs are the optimal PDFs or CDFs of the event coordinates from microseismic or acoustic emission sources. It is suggested that the NID method can be further used in locating methods for microseismic or acoustic emission sources to improve the locating accuracy.


I. INTRODUCTION
Microseismic monitoring technology is one of the most effective means to monitor and analyze the stability of large-scale rock masses, slopes, and tunnels [1]-[7]. At present, the basic problems of microseismic monitoring technology have not been studied in depth; in particular, the key and difficult problem of locating microseismic sources has not been well solved. So far, there are more than ten major positioning methods, including geometric methods, physical methods, and mathematical methods [8]-[13]. In studying how to determine the event coordinates of a microseismic source or an acoustic emission source, Dong et al. [11]-[13] proposed the three-dimensional analytical solutions for an unknown velocity system and probability density function (TDAS-UVS-PDF) method. In the TDAS-UVS-PDF method, the probability density function was selected from among the log-logistic, log-logistic (3P), logistic, and normal distributions. Dong considered that the optimal probability distribution was the log-logistic (3P) distribution (LLD3) based on hypothesis testing [12], [13]. There are three issues when using the above distributions to fit these analytical solutions. (1) It is presupposed that the analytical solutions follow one of a few known distributions, and hypothesis testing is performed only on this finite set of candidates, so the resulting optimal probability distribution is very limited [14]-[16]. (2) When the amount of data in a set of analytical solutions is small, the sample has a large dispersion or volatility, and unimodal distributions cannot reflect this dispersion or volatility well. (3) The limited sample data are distributed in a finite interval, whereas the supports of the above distributions are often infinite or semi-infinite intervals, so a mismatch between the intervals objectively exists when these distributions are used to fit the analytical solutions. Considering these three issues, it is doubtful whether the LLD3 is the optimal probability distribution of the event coordinates from microseismic or acoustic emission sources when using the TDAS-UVS-PDF method. Therefore, this paper focuses on these three issues to obtain the optimal probability distributions of analytical solutions from microseismic or acoustic emission sources.
The associate editor coordinating the review of this manuscript and approving it for publication was Aniruddha Datta.
In recent years, the kernel density estimation method has developed rapidly [17] and has been widely used in the field of natural disaster risk analysis [18]-[20]. This method has also been applied in other fields with good results [21]-[23]. The normal information diffusion (NID) method is a kind of kernel density estimation method. Its greatest advantage is the ability to make full use of the information in the data itself. The NID distribution (NIDD) is a linear combination of a finite number of normal distributions, and its distribution parameters can be obtained directly from the observed data. We apply the NID method to the inference of the probability density functions of analytical solutions. To solve the problem of mismatched intervals, a truncated interval based on the “3σ” statistical theory is applied in this work.
The aim of this study was to obtain the optimal probability distributions of analytical solutions by investigating the goodness-of-fit of the NIDD, LLD3, LND, and ND methods using six sets of analytical solutions. The remainder of this paper is organized as follows. First, the concepts of the NIDD, LLD3, LND, and ND are briefly introduced. Next, the performances of the NIDD, LLD3, LND, and ND are evaluated based on the Kolmogorov-Smirnov (K-S) and chi-square tests, the cumulative probability value, the probability density function (PDF) curve, and the cumulative distribution function (CDF) curve. Finally, several conclusions are drawn from the results of this study.

II. NID DISTRIBUTION, LOG-LOGISTIC (3P) DISTRIBUTION, LOGNORMAL DISTRIBUTION, AND NORMAL DISTRIBUTION

A. NID DISTRIBUTION (NIDD)
The probability density function of the NID distribution is described as

$$f(x) = \frac{1}{n h \sqrt{2\pi}} \sum_{i=1}^{n} \exp\left[-\frac{(x - x_i)^2}{2h^2}\right], \quad L \le x \le R,$$

where $n$ is the sample size, $x_i$ ($i = 1, 2, \ldots, n$) is the observed value of the sample, $h$ is the window width of a standard normal distribution, and $L$ and $R$ are the left endpoint and right endpoint of a given interval, respectively. Based on the principle of choosing the nearest value, the window width $h$ can be given as

$$h = \frac{\gamma \, (x_{\max} - x_{\min})}{n - 1},$$

where $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the sample, respectively, and $\gamma$ is a coefficient related to the sample size $n$.
When n is equal to or greater than 17, the corresponding value of γ is 1.420693101 [24]. The basic principles of the NID method were developed by Huang et al. [25].
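As a rough illustration of the definitions above, the following sketch evaluates a normal diffusion density on a given interval. It assumes the commonly used form of the estimator (an equal-weight sum of normal densities centred on the observations) with h = γ(x_max − x_min)/(n − 1). The rescaling by the kernel mass inside [L, R], the function names, the toy sample, and the interval endpoints are assumptions made for illustration only and are not taken from the paper.

```python
import math

GAMMA_LARGE_N = 1.420693101  # value quoted in the text for n >= 17


def _phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def nid_window_width(sample, gamma=GAMMA_LARGE_N):
    """Window width h = gamma * (x_max - x_min) / (n - 1) (assumed form)."""
    return gamma * (max(sample) - min(sample)) / (len(sample) - 1)


def _mass_inside(sample, h, left, right):
    """Average kernel mass that falls inside [left, right]."""
    return sum(_phi((right - xi) / h) - _phi((left - xi) / h)
               for xi in sample) / len(sample)


def nid_pdf(x, sample, h, left, right):
    """Diffusion density at x, rescaled (assumption) to integrate to 1 on [left, right]."""
    if x < left or x > right:
        return 0.0
    raw = sum(math.exp(-((x - xi) ** 2) / (2.0 * h * h)) for xi in sample)
    raw /= len(sample) * h * math.sqrt(2.0 * math.pi)
    return raw / _mass_inside(sample, h, left, right)


def nid_cdf(x, sample, h, left, right):
    """CDF on [left, right]; 0 at the left endpoint and 1 at the right endpoint."""
    x = min(max(x, left), right)
    part = sum(_phi((x - xi) / h) - _phi((left - xi) / h)
               for xi in sample) / len(sample)
    return part / _mass_inside(sample, h, left, right)


# Toy sample of 17 values (hypothetical, not data from the paper).
sample = [10.2, 10.9, 11.4, 11.5, 12.0, 12.3, 12.4, 13.1, 13.3,
          13.8, 14.0, 14.6, 15.2, 15.3, 16.1, 16.8, 17.5]
h = nid_window_width(sample)
left, right = 5.0, 23.0  # hypothetical interval endpoints L and R
print(h, nid_cdf(left, sample, h, left, right), nid_cdf(right, sample, h, left, right))
```

When the interval extends several window widths beyond the data, the kernel mass outside [L, R] is negligible and the rescaling factor is practically 1.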

B. LOG-LOGISTIC (3P) DISTRIBUTION (LLD3)
The PDF and CDF of the log-logistic (3P) distribution are described as

$$f(x) = \frac{\alpha}{\beta}\left(\frac{x - \gamma}{\beta}\right)^{\alpha - 1}\left[1 + \left(\frac{x - \gamma}{\beta}\right)^{\alpha}\right]^{-2}, \qquad F(x) = \left[1 + \left(\frac{\beta}{x - \gamma}\right)^{\alpha}\right]^{-1}, \quad x > \gamma,$$

where α is the shape parameter, which determines the shape of the distribution function and can change its properties; β is the scale parameter, which determines the scale of the distribution function on its interval (a change in β only compresses or expands the range of the distribution function and does not change its basic shape); and γ is the location parameter, which determines the position of the distribution function along the horizontal axis.
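The roles of the three parameters can be checked with a small sketch. It assumes the commonly used three-parameter log-logistic form, f(x) = (α/β) z^(α−1) (1 + z^α)^(−2) and F(x) = 1/(1 + z^(−α)) with z = (x − γ)/β for x > γ. The function names and parameter values are hypothetical.

```python
def lld3_pdf(x, alpha, beta, gamma):
    """PDF of the three-parameter log-logistic distribution (assumed standard form)."""
    if x <= gamma:
        return 0.0
    z = (x - gamma) / beta
    return (alpha / beta) * z ** (alpha - 1.0) / (1.0 + z ** alpha) ** 2


def lld3_cdf(x, alpha, beta, gamma):
    """CDF of the three-parameter log-logistic distribution (assumed standard form)."""
    if x <= gamma:
        return 0.0
    z = (x - gamma) / beta
    return 1.0 / (1.0 + z ** (-alpha))


# Hypothetical parameters: shape 4, scale 2, location 1.
alpha, beta, gamma = 4.0, 2.0, 1.0

# The median sits at gamma + beta regardless of the shape parameter.
print(lld3_cdf(gamma + beta, alpha, beta, gamma))  # 0.5
```

Shifting γ moves the whole curve along the axis, while changing β stretches it around γ; the support is semi-infinite (x > γ), which is the interval mismatch discussed in the introduction.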

C. LOGNORMAL DISTRIBUTION (LND)
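The PDF and CDF of the lognormal distribution are described as

$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \qquad F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right), \quad x > 0,$$

where µ is the location parameter and σ is the scale parameter, with the same meanings as above, and $\Phi$ is the Laplace integral (the standard normal CDF).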

D. NORMAL DISTRIBUTION (ND)
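The PDF and CDF of the normal distribution are described as

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \qquad F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right), \quad -\infty < x < \infty,$$

where µ is the location parameter and σ is the scale parameter, with the same meanings as above, and $\Phi$ is the Laplace integral (the standard normal CDF).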

III. APPLICATIONS ON INFERRING THE OPTIMAL PDFS OF THREE-DIMENSIONAL ANALYTICAL SOLUTIONS FROM THE LOCATING SOURCES
To compare the fitting abilities of the NID, LLD3, LND, and ND methods in approximating the PDFs or CDFs, six sets of analytical solutions from the locating sources were used as examples. The K-S and chi-square test values and the cumulative probability values were calculated. In addition, the PDF and CDF curves were drawn to illustrate the fitting abilities of these methods intuitively.

A. STATISTICAL ANALYSIS OF OBSERVED DATA
Six sets of observed data of analytical solutions, as shown in Table 1, are chosen as examples for this purpose. These data were acquired from blasting tests in the Dongguashan copper mine and have been properly calculated and processed; the details can be found elsewhere [13]. The basic information on the observed data is listed in Table 2, which has eight parts: the sample number, sample size, minimum value, maximum value, mean value, standard deviation, skewness, and interval endpoint values. Each set contains 84 analytical solutions. To eliminate invalid data from the original data, the “3σ” statistical theory is used as the truncation method. According to the requirements of the “3σ” statistical theory, the mean values, standard deviations, and skewness values of the six sets of analytical solutions were calculated.
Further, the values of the left and right endpoints were also calculated. The calculation results of the above statistical characteristics are listed in Table 2.
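The truncation step can be sketched as follows. The sketch assumes that the interval endpoints are L = mean − 3σ and R = mean + 3σ, that σ is the sample standard deviation, and that values outside [L, R] are discarded in a single pass; the text does not spell out these details, and the function names and data are hypothetical.

```python
import statistics


def three_sigma_interval(values):
    """Return (L, R) = (mean - 3*std, mean + 3*std) using the sample standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return mean - 3.0 * std, mean + 3.0 * std


def truncate(values):
    """Keep only the values inside the 3-sigma interval (single pass, assumed)."""
    left, right = three_sigma_interval(values)
    kept = [v for v in values if left <= v <= right]
    return kept, left, right


# Hypothetical data: twenty plausible coordinates and one gross outlier.
values = [9.0, 9.5, 10.0, 10.5, 11.0] * 4 + [100.0]
kept, left, right = truncate(values)
print(len(values), len(kept), round(left, 2), round(right, 2))
```

Here the outlier inflates the standard deviation but still falls outside the interval, so 20 of the 21 values are kept.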

B. COMPARISON OF THE K-S TEST VALUES AND THE CHI-SQUARE TEST VALUES
The K-S test is used to determine whether a hypothesized distribution is acceptable and to choose the best one when two or more distributions are concurrently acceptable. The K-S test principle is not described in detail here; it can be found in Lilliefors' work [26]. In this study, the K-S test values for the six sets of analytical solutions are calculated at the 5% significance level and are listed in Table 3, which includes the K-S test values for the NIDDs, LLD3s, LNDs, and NDs and the critical K-S values. The NIDDs pass the test against the critical values, and the K-S test values of the NIDDs are always less than those of the LLD3s, LNDs, and NDs. Because a smaller K-S test value indicates a better fit, this analysis clearly indicates that the NIDDs have a better fitting ability than the LLD3s, LNDs, and NDs.
VOLUME 8, 2020
To guard against the K-S test results being coincidental, the chi-square test was also applied. The chi-square test values for the six sets of analytical solutions were also calculated at the 5% significance level, as listed in Table 4, which includes the chi-square test values of the NIDDs, LLD3s, LNDs, and NDs and the critical chi-square values. The NIDDs, LLD3s, LNDs, and NDs all passed the test against the critical values. For each event coordinate, the chi-square test value of the NIDD was always less than those of the LLD3, LND, and ND. For example, for event No. 2-X, the chi-square test value of the LLD3 is up to 21 times that of the NIDD. As with the K-S test value, the smaller the chi-square test value, the better the fit. The same conclusion is reached: the NIDDs fit better than the LLD3s, LNDs, and NDs.
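The two test statistics compared above can be sketched in a few lines. The sketch assumes the usual definitions: the K-S statistic is the largest gap between the empirical distribution and the hypothesized CDF, and the chi-square statistic sums (observed − expected)²/expected over bins. The large-sample 5% critical value 1.36/√n is a common approximation and may differ from the tabulated critical values used in the paper; the function names, bin edges, and sample are hypothetical.

```python
import math


def ks_statistic(sample, cdf):
    """Largest gap between the empirical step function and the hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical_5pct(n):
    """Large-sample approximation of the 5% critical value (assumption)."""
    return 1.36 / math.sqrt(n)


def chi_square_statistic(sample, cdf, edges):
    """Pearson statistic over bins given by edges; the last bin is closed on the right.

    Assumes the edges cover the sample and every bin has a positive expected count.
    """
    n = len(sample)
    stat = 0.0
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        last = k == len(edges) - 2
        observed = sum(1 for x in sample if lo <= x < hi or (last and x == hi))
        expected = n * (cdf(hi) - cdf(lo))
        stat += (observed - expected) ** 2 / expected
    return stat


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used only as a toy hypothesis."""
    return min(max(x, 0.0), 1.0)


# Hypothetical sample: ten evenly spaced points in (0, 1).
sample = [(i - 0.5) / 10 for i in range(1, 11)]
d = ks_statistic(sample, uniform_cdf)
chi2 = chi_square_statistic(sample, uniform_cdf, [0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
print(d, chi2, ks_critical_5pct(84))
```

For a sample size of 84, as used for each set of analytical solutions, the approximate critical value is about 0.148; a fitted distribution passes when its K-S statistic is below the critical value.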

C. COMPARISON OF THE CUMULATIVE PROBABILITY VALUES
The cumulative probability value is an important index that reflects the fit of a probability distribution on a truncated integration interval. A function satisfies the basic properties of a probability density function only when its cumulative probability value is equal to 1. The cumulative probability values of the NIDDs, LLD3s, LNDs, and NDs for the six sets of analytical solutions are shown in Table 5. The cumulative probability values of the NIDDs are always equal to 1.0000, whereas those of the LLD3s, LNDs, and NDs are less than 1.0000. The cumulative probability values therefore show that the NIDDs are better than the LLD3s, LNDs, and NDs on the truncated interval.
The values of F(L) and F(R) of the NIDDs, LLD3s, LNDs, and NDs for the six sets of analytical solutions are also calculated and listed in Table 5. Because the interval is truncated, the CDF values of the LLD3s, LNDs, and NDs at the left endpoints are not exactly equal to 0, and those at the right endpoints are not exactly equal to 1.0000. However, the F(L) and F(R) values of the NIDDs are always equal to 0 and 1.0000, respectively. In terms of the cumulative probability value, the NIDDs fit better than the LLD3s, LNDs, and NDs.
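A short calculation shows why a distribution with infinite support falls short of 1 on a truncated interval. The sketch takes the cumulative probability value to be F(R) − F(L) and, purely for illustration, assumes a normal distribution whose parameters µ and σ are the same ones that define the interval [µ − 3σ, µ + 3σ]; the fitted parameters in the paper need not satisfy this.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cumulative_probability(cdf, left, right):
    """Probability mass that a distribution places on [left, right]."""
    return cdf(right) - cdf(left)


mu, sigma = 0.0, 1.0  # hypothetical parameters
left, right = mu - 3.0 * sigma, mu + 3.0 * sigma
p = cumulative_probability(lambda x: normal_cdf(x, mu, sigma), left, right)
print(round(p, 4))  # 0.9973
```

The missing mass of about 0.0027 lies in the two tails outside the interval, which is also why F(L) is slightly above 0 and F(R) slightly below 1 for such a distribution.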

D. COMPARISON OF THE PDF AND CDF CURVES
To show the fitting abilities of the NIDDs, LLD3s, LNDs, and NDs intuitively, the comparative PDF curves of the observed data for the six sets of analytical solutions are plotted in Fig. 1. Event No. 1-X is taken as an example to analyze the fitting advantages of the NIDDs. The histogram of event No. 1-X shows that the observed data fluctuate. The PDF curves of the LLD3, LND, and ND are unimodal; that is, each curve rises gradually at first and then falls. Such a curve cannot describe the fluctuation of the histogram well. In contrast, the PDF curve of the NIDD is multimodal and is almost consistent with the trend of the histogram. Therefore, the PDF curve of the NIDD reflects the fluctuation of the histogram better than those of the LLD3, LND, and ND. The results for the other events are similar to those for event No. 1-X.
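The contrast between unimodal and multimodal curves described above can be checked numerically by counting local maxima on a grid. The sketch below uses a hypothetical two-cluster sample and a hand-picked window width h; neither comes from the paper, and the grid-based mode count is only a simple illustration.

```python
import math
import statistics


def kernel_mixture_pdf(x, sample, h):
    """Equal-weight sum of normal densities centred on the observations."""
    total = sum(math.exp(-((x - xi) ** 2) / (2.0 * h * h)) for xi in sample)
    return total / (len(sample) * h * math.sqrt(2.0 * math.pi))


def normal_pdf(x, mu, sigma):
    """Density of a single normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def count_modes(pdf, left, right, steps=2000):
    """Count local maxima of pdf on an evenly spaced grid over [left, right]."""
    xs = [left + (right - left) * k / steps for k in range(steps + 1)]
    ys = [pdf(x) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1])


sample = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # hypothetical two-cluster data
h = 1.0                                   # hypothetical window width
mu, sigma = statistics.mean(sample), statistics.stdev(sample)

modes_mixture = count_modes(lambda x: kernel_mixture_pdf(x, sample, h), -4.0, 9.2)
modes_normal = count_modes(lambda x: normal_pdf(x, mu, sigma), -4.0, 9.2)
print(modes_mixture, modes_normal)
```

The kernel mixture keeps one peak per cluster, while the single normal curve smooths both clusters into one peak between them.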
In the PDF plots, the histogram depends on the number of bins and is therefore not unique. To better demonstrate the fitting abilities of the NIDDs, the CDF curves of the NIDDs, LLD3s, LNDs, and NDs and the staircase-like empirical distributions (EDs) are plotted in Fig. 2. For these event coordinates, no gap is observed between the NIDDs and the EDs in the CDF curves, whereas there is a clear gap between the EDs and the LLD3s, LNDs, and NDs. In conclusion, compared with the LLD3s, LNDs, and NDs, the NIDDs fit or approximate the EDs better.

IV. CONCLUSIONS
This paper introduces the NID method based on the “3σ” truncated interval to infer the optimal PDFs of analytical solutions from the locating sources of sensor networks. Six sets of analytical solutions were used to illustrate the fitting advantages of the NIDDs compared with the LLD3s, LNDs, and NDs. Several conclusions were drawn.
(1) The normal information diffusion distribution (NIDD), which is based on the truncated interval given by the “3σ” statistical theory, was introduced in this study. On this truncated interval, the cumulative probability values of the NIDDs are always equal to 1.0000, whereas those of the LLD3s, LNDs, and NDs are less than but very close to 1.0000. The K-S and chi-square test values of the NIDDs are less than those of the LLD3s, LNDs, and NDs. From the perspective of hypothesis testing and the cumulative probability value, the results indicate that the fitting abilities of the NIDDs are better than those of the LLD3s, LNDs, and NDs.
(2) The PDF and CDF curves of the NIDDs, LLD3s, LNDs, and NDs were drawn to illustrate the fitting effects intuitively. The PDF curves of the NIDDs are multimodal, which indicates that the NIDDs can describe the fluctuation of the histogram better than the LLD3s, LNDs, and NDs. Additionally, the CDF curves of the NIDDs coincide with the EDs more closely than those of the LLD3s, LNDs, and NDs.
(3) The NIDDs are the optimal PDFs or CDFs of the event coordinates. We suggest that the NID method can be further used to improve the locating accuracy in locating methods of the microseismic or acoustic emission sources.