Fast-Moving Sound Source Tracking With Relative Doppler Stretch

Tracking a fast-moving sound source in the air in an acoustic way has rarely been seen in the current literature. The speed of the source in the air is often not negligible compared to the speed of sound in the air. Time-varying propagation distance and noticeable Doppler effect also bring difficulties to traditional tracking methods with the time difference of arrival (TDOA) and direction of arrival (DOA). In this paper, we propose a particle filtering framework with relative Doppler stretch. We utilize the propagation delayed measurement (PDM) model and correct the posterior probability in traditional particle filter with propagation delayed state being a bridge. This method avoids the complex pre-processing of the raw acoustic signals. The simulation results show that the algorithm has an expected performance and is superior to the existing methods. The whole tracking process which starts from raw signals is shown for the first time, and the key factors affecting the pre-processing of the raw signals that are not mentioned in other articles are also discussed in this paper.


I. INTRODUCTION
For most source tracking scenarios, radar is the most considered. However, for a low-altitude aircraft or a ground vehicle, such as an unmanned aerial vehicle, a helicopter or a tank, the requirements for radar devices will become expensive and complicated. If the source has the anti-radar capability, the performance will significantly reduce. Thus, a passive acoustic tracking method for the moving source is proposed in this paper as a supplement or an alternative to the above scenarios.
Acoustic methods have been widely used in stationary or slow-moving source tracking and localization [1]- [3]. The state-of-the-art deep learning is also used to adapt to the complex acoustic environment [4], [5]. However, when the velocity of a moving source cannot be ignored compared to the speed of sound in the air, the received acoustic signals radiated by the source will be greatly distorted due to Doppler effect. The time difference of arrival (TDOA), which is commonly used to locate a stationary or slowmoving sound source, will not work in such a situation with distorted signals. To extend this method to fast-moving source positioning and tracking, a new method was proposed The associate editor coordinating the review of this manuscript and approving it for publication was Jingchang Huang .
to compensate for the TDOA error caused by the Doppler effect [6], [7]. In [6], [7], the wideband cross-ambiguity function based on continuous wavelet transform was used to jointly estimate the relative time scale and time delay between two sensors. However, due to the retardation effect, the signals received at the same time instant by different microphones correspond to the signals emitted in different positions for a moving sound source. Thus, observation time intervals for different microphones should be different so that the signals from the same source position can be included in the observation, which brings trouble to the sound source tracking. To this end, Genescà et al. [8] proposed a method that the signals from each microphone could be synchronized to compensate for the retardation effect. With this method, the relative Doppler stretches from seven microphones were directly calculated through the one-dimensional wideband cross-ambiguity function with equal observation time interval. Based on the previous work, Martín et al. [9], [10] used the Simple Genetic Algorithm (SGA) to search for the position of the source and carried out the simulation study with the takeoff-climb-out data extracted from the database of the FAA's Integrated Noise Model (INM) for a Boeing 737-400. Next, a radio-controlled airplane localization is experimentally tested in [11]. The simulations and experiments showed a good performance.
Similarly, considering that the acoustic signals are distorted in amplitude and frequency, if the Doppler effect can be removed, the signals can be used to estimate TDOA directly. This method was called de-Dopplerization [12]. So far, time domain and frequency domain de-Dopplerization were developed to recover signals, but the velocity should be known accurately before de-Dopplerization, which was difficult in passive tracking.
Besides TDOA, the direction of arrival (DOA) is another measurement to estimate the position of the sound source. In [13], the authors used a planar five-element array to estimate the angle and range of the aircraft in each time interval. In [14], the authors used distributed sensors to locate the vehicular sources with DOA. In [15], the authors proposed a method that could adapt to multi source tracking with DOA batches. In [16], the moving acoustic source was located by intersecting azimuth lines with double arrays. Nevertheless, the DOA of a fast-moving sound source is highly-dynamic. There were a few studies about the highlydynamic DOA estimation with dynamic compressive sensing [17], [18]. Because dynamic compressive sensing is a difficult time-varying optimization problem, it is difficult to get satisfactory results. In [17], a weighted L 1 minimization algorithm is proposed to improve the performance by utilizing the DOA changing scope as a prior information. However, the preprocessing is still complex and time-consuming in tracking process.
In fact, there has rarely been a satisfactory acoustic tracking frame for a moving source in the air. The tracking frame contains two problems, one is that the speed of a moving source cannot be neglected, the other is that source motion and observation time do not match due to the time-varying distance. Both will cause numerous errors in tracking without compensation. The authors [19]- [21] proposed a propagation delayed measurement particle filter (PDM-PF) to find a solution. They changed the traditional state transition model and measurement model. The time delay and time delayed state were added to form an augmented state vector, then they predicted the current state from the delayed augmented state. The simulation experiment was carried out to show its performance with DOA, yet they assumed that the DOA estimation error was in a small range without building the relationship between the raw acoustic signals and tracking performance. Moreover, obtaining an accurate DOA for a moving source is not easy as mentioned above. Therefore, this paper proposes an algorithm to overcome the above limitations. We use randomly distributed microphones to collect signals and directly calculate the relative Doppler stretch of the signals received by each pair of microphones without compensating for the retardation effect. Then a general particle filter framework based on the method of propagation delayed state is constructed to track the fastmoving source with relative Doppler stretch.
The structure of the rest of the article is as follows. Section II describes the acoustic characteristics of a moving sound source. Section III introduces the framework of the method proposed in the article. Section IV shows the basic settings of simulation. The results discussions are also included in this section. Finally, conclusions are placed in Section V.

A. A SOUND FIELD GENERATED BY A MOVING SOUND SOURCE
When the size of the source is much less than the soundpropagation path-length, the size of the source is assumed to be ignored. Thus, the source can be regarded as a monopole sound source radiating noise in a spherical manner in the entire trajectory. In addition, the sound propagation model is considered an ideal propagation model without the effect of temperature.
Considering M randomly distributed microphones, according to [10], the sound pressure signals received by the microphone m(m = 1, . . . , M ) at time t is where v is the velocity of the source, . 2 denotes the vector 2-norm, s(t) is the sound source strength, τ m is the propagation time of the acoustic signal to the microphone direction vector and the vector of the sound wave propagation direction pointing to the microphone m, R m represents the soundpropagation path-length to the microphone m, and c is the speed of sound propagating in the air which is 344m/s in this article. Let the distance from the initial position to the source be R 0 , the sampled signals are where k represents the number of samples, t is the sampling interval, and • denotes rounding up.

B. RELATIVE DOPPLER STRETCH (RDS)
As shown in Figure 1, the sampling interval and motion interval do not match due to the variation of the path-length R m . In a short time period, T , the sound radiation characteristics and velocity v of the moving source can be assumed constant. From time t to time t + T , if the displacement of the moving source is much less than the sound-propagation path-length, i.e., v 2 ×T R m , the angle θ m can be considered constant. Then, due to Doppler effect, the frequency f d m of the signals received by the microphone m satisfies where f represents the frequency of the sound source. And θ m is obtained as where r m represents the position of the microphone m, v t−τ m and P t−τ m denotes the velocity and position of the source respectively at time t − τ m , and τ m satisfies Let d m = c c− v 2 ×cosθ m represents the Doppler stretch, then the relationship between the source signals and the received signals can be written as where q m (f ) and s(f ) is the Fourier transform of q m (t) and s(t) respectively, A m is the amplitude coefficient which is assumed constant in a short time interval. Thus, for a microphone pair (m, n) constructed by any two microphones, we can obtain where d m,n = d m d n , and φ(m, n) represents the phase difference.
Then, according to [7], for the microphone pair (m, n), the cross-ambiguity function of the signals is defined as 2 ) refers to the pth pair of microphones formed by microphone m and n.
As mentioned in [11], the ambiguity function will not be time domain restrictive if the signals have been synchronized. If without synchronization, we directly use two moduli |q m (f )| and |q n (δf )| ( e jφ(m,n) = 1) and Equation (8) becomes: The maximum value of X p (δ) will correspond to δ = d m d n which is also called relative Doppler stretch (RDS).
To calculate the value of X p (δ), a 1-dimensional version of the discrete form is used. Let where L N is the available number of frequency bins satisfying Nyquist sampling theorem, then L N is obtained as where f s is the sampling frequency, f is frequency resolution, and • denotes rounding down. For the discrete version, resampling of the frequency axis is needed, which is calculated by linear interpolation. Additionally, with Equation (4), the relative Doppler stretch can also be written as So far, we have only focused on subsonic sources. Assuming that the maximum speed of the low-altitude aircraft is v max , we can define the search range of δ is c−v max c+v max , c+v max c−v max in Equation (9). For a 2D motion source, eight scalar unknowns appear in Equation (12) including the position and velocity of the source at two different time t − τ m and t − τ n . Nevertheless, if we know its motion model, there are only four scalar unknowns. Thus, at least four microphones are required to form six microphone pairs to obtain more than four relative Doppler stretches. Besides, from Equation (12), the relative Doppler stretch obtained from Equation (9) is corresponding to two source states at two different moments if two received signals have not been synchronized. Thus, when using relative Doppler stretches, a correction of the tracking model is required.

C. PROPAGATION DELAYED MEASUREMENT (PDM) AND PROPAGATION DELAYED STATE (PDS)
For a tracking problem, the following state space model is considered where We assume that the transformation in Equation (13) is invertible which satisfies We consider a general measurement model where n T k is noise and g represents the function from state to measurement. According to section II.B, for the pth pair of microphones, when the relative Doppler stretch, δ p,T k , is used as the measurement, this model needs to be corrected as where δ p,T k is also called propagation delayed measurement (PDM), similarly, we define where x D T k is the propagation delayed state (PDS). The PDS is constructed by two different delayed states from the current state x T k . Therefore, the PDS establishes the relation between the current state and the PDM like a bridge.
With Equation (5) and (14), the problem of obtaining PDS is shown as where P T −τ m is composed of the position parameter in the state vector x T k −τ m . This kind of numeric root-finding problem can be solved with the Newton-Raphson method when the model is known in Equation (14).

III. PROPAGATION DELAYED STATE PARTICLE FILTER WITH RELATIVE DOPPLER STRETCH (PDS-PF-RDS)
As can be seen above, the tracking model is non-linear and non-Gaussian, thus, the particle filter is used here to provide the optimal solution. The particle filtering algorithm is a form of recursive Bayesian filter. It is a non-linear filtering algorithm based on the Bayesian formula to realize state predictions and state updates. The main idea is to use a group of random samples (particles) to describe the posterior probability distribution, adjust the weight of each sample through the measurement model to approximate the actual probability distribution, and use the weighted average of the samples as an estimate of the current state, which is applicable for arbitrary non-linear, non-Gaussian systems.

A. SAMPLING IMPORTANCE RESAMPLING (SIR) PARTICLE FILTER
For a typical particle filter, for example [22], the sampling importance resampling (SIR), the sampling framework is expressed as follows: 1) Initialization. Sampling N particles {x (i) T k , i = 1, 2, . . . , N } from the initial prior probability with uniform weight.

2) Prediction and updating. Predicts the new set of particles {x
(i) T k } according to state transition model, the weight {w 3) Estimation. The current state estimate can be obtained as

B. PROPAGATION DELAYED STATE PARTICLE FILTER
In our case, when relative Doppler stretches are used, the measurements can be shown as where δ p represents the relative Doppler stretch of the pth pair of microphones. The relative Doppler stretches are considered to be independent of each other. Based on the SIR particle filtering framework, the weight {w However, as mentioned above, the relative Doppler stretch δ p is a propagation delayed measurement which is corresponding to the PDS x D Tk . Therefore, with PDS being a bridge, the posterior probability can be corrected as where x D(i) The non-linear transformation of probability can be solved by unscented transform(UT) [23]. UT can generate a set of where c is the dimension of the state x We should use another likelihood function to describe it. The likelihood function need show the peak corresponds to likely PDS (i.e., a larger likelihood function value will be treated as a more likely PDS than a smaller value). If we use the Gaussian likelihood function δ p ;δ p x p , the estimation of δ p will be a time-consuming searching process due to the small searching step and repeated process of resampling in Equation (9) and σ p can only be determined through iterating. Thus, a pseudo-likelihood function is proposed to describe the probability distribution P δ p | x where p (x D(i) T k [j]) represents the relative Doppler stretch formed by the two delayed states in x D(i) T k [j], the exponential weight r is used to strengthen the main lobe peak and reduce the side lobe peak so that the likelihood function is closer to the true value. The closer to the true relative Doppler stretch (RDS), the greater value of the likelihood function and the greater weight. The pseudo-likelihood function does not require a search to find the peaks but imposes a weight on the possible state proportionally.
One cycle of the PDS-PF-RDS is depicted in Algorithm I based on the SIR particle filter.

IV. SIMULATION
This section describes the computer simulation of tracking a 2D moving sound source with four microphones, including basic parameter settings, simulation results and discussions.

A. IMPLEMENTATION
Four microphones are used to verify the tracking ability of the method in the 2D scene. For comparison, we choose the same simulation scene used in [19]- [21]. The distribution of these microphones and the source trajectory are shown in Figure 2, and their position coordinates are shown in TABLE 1. The size of the source is ignored. The source moves at a radius of 500m at a constant turn rate w = −0.12rad/s. The initial position is [−500m, 800m] T .

Algorithm 1 PDS-PF-RDS
1. Sampling N particles from initial probability distribution P(x T o ) with uniform weight to form the particle set 2) For the pth pair of microphones, obtain the PDS m corresponding to the PDS and the covariance S 3) With UT transform, get the sigma points r • Update the weight as

Normalize w (i) and get
Get the current estimation of source state:

1) SIGNAL PARAMETERS
The acoustic signals radiated by some moving sources, such as unmanned aerial vehicles or ground vehicles, are often dominated by a few harmonics [24], [25]. In this paper, the VOLUME 8, 2020 single-frequency signal is superimposed to simulate an acoustic signal with a band [1000Hz−1500Hz]. This method is also used in [26], [27] for a moving sound source simulation. The sampling frequency of the microphone is 10KHz, the noise attached to the received signals is Gaussian white noise, and the signal-to-noise ratio is SNR = 20dB. The signal frequency spectrums of the four microphones from t = 4s to t = 5s are shown in Figure 3. Figure 3 illustrates that the frequency spectrums of different microphones at different positions differ due to the different Doppler stretches and the unequal distance traveled.

2) MODEL PARAMETERS
The source motion is modeled with a discretized coordinated turn model with an unknown constant turn rate and with Cartesian velocity. The state variable is and the discretized form of the model is given as where F is state transition matrix, w T k ,T k+1 is the process noise with zero mean and covariance S k , Let T = T k+1 − T k , then F and S k can be obtained as where σ x , σ y , represent the standard deviations of the velocity in x, y directions, σ w is the standard deviations of the turn rate. we assume σ x = σ y = 1m/s 2 , σ w = 0.01rad/s 2 .
The configuration of the simulation parameter is as follows. The number of particles is N = 1000. We assume that the initial particle set is generated from the Gaussian distribution N ((x (i) ; µ; diag 100 2 , 10 2 , 100 2 , 10 2 , 0.05 2 ), and µ is generated from the Gaussian distribution N ((µ; x t 0 ; diag 100 2 , 10 2 , 100 2 , 10 2 , 0.05 2 ) which represents the errors of obtaining the initial position. Considering the propagation time of the signals at the first moment, we start tracking from t 0 = 4s to t end = 45s with the time interval T = 1s. A total number of 1000 Monte Carlo runs were made.

B. TRACKING PERFORMANCE METRICS
The Root Mean Square Error (RMSE) of position and velocity of the source are used as a reference to evaluate the performance of the method, which is defined as where k is the estimation step,l m,k and l m,k refers to the estimated value and real value respectively, and mc refers to the cycle index of the Monte Carlo simulation running.

C. RESULT DISCUSSIONS
To assess the performance of the proposed method, PDS-PF, a comparative experiment with the existing moving sound source tracking methods with DOA is conducted, which is the PDM-PF [19]. In addition, the direct PF regardless of the propagation time delay is also conducted. The differences of the three methods are summarized in TABLE 2.
The RMS position and velocity errors are given in Figure 4 and Figure 5 respectively. The performance of the  PDS-PF-RDS algorithm turns out to be superior to PDM-PF in terms of the position error and velocity estimation, the error of the PDS-PF-RDS can quickly reduce to a small value faster than other algorithms. And the error of the PF method increases fast, although it decreases after 25 seconds, it is still much bigger than the other two methods. In the whole tracking process, the performance of the PDS-PF-RDS is much more stable than PDM-PF and PF. The error of the PDM-PF increases after 10s and is even bigger than the error of the PF around 28s although it can reduce to a small value quickly after 30s.
In order to explain the performance differences of different algorithms, we show the true propagation delay in Figure 6. According to Equation (15) and (16), when the delay is neglected in PF, the current source states are not corresponding to the measurement, the relative Doppler stretch (RDS). When the delay is large, the bias between the current states and propagation delayed states (PDS) is not negligible. Thus, though the error also decreases when the delay is becoming smaller after 35s, the accumulation of the errors before 35s is too large to get a satisfactory performance with the PF method.
In [19], it is reported that when the delays change quite fast, the PDM-PF performance becomes inaccurate, although  the estimates can quickly recover after this period ends. In fact, the stability is affected by the change speed of the propagation delay. When the propagation delay changes fast, the corresponding propagation delayed state (PDS) also changes quickly. In order to keep up the fast changes, we need a smaller time interval to estimate the PDS, but our state transition model changes at a fixed time interval. However, in PDS-PF-RDS, the performance is always stable because randomly distributed microphones can reduce the influence of delay changes caused by the spatial position of a single microphone. Moreover, from Equation (12), relative Doppler stretch contains both the velocity information and position information of the source compared to DOA which is only related to the source position. Thus, when the relative Doppler stretch (RDS) is used as the measurement, the estimation of the posterior probability can be more accurate.
Next, we will carefully discuss some key impact parameters which influence the tracking performance of the propagation delayed state particle filter with relative Doppler stretch. VOLUME 8, 2020

1) EFFECT OF THE TIME INTERVAL
In section II.A, we assume that the frequency spectrum is constant in a short time interval. However, the time interval is not as short as possible. On one hand, a shorter time interval means more tracking steps which requires more computation time in the same tracking process, on the other hand, a shorter time interval results in lower frequency resolution. Therefore, we compare the relative Doppler stretch obtained by theory and measurement for different time intervals to make a compromise. Figure 7 illustrates the average errors of the relative Doppler stretch estimation with different time intervals. From Figure 7, we can observe that the error is very small in a short time interval less than about 1s but will rise much faster with a larger interval. Thus, we choose the time interval is T = 1s in our case. The real value and estimation of relative Doppler stretch is shown in Figure 8.
From Figure 8, the estimated relative Doppler stretch has errors but is similar to the true value. It means that Equation (9) and (10) can work well in obtaining the relative Doppler stretch with T = 1s. In [19], the reason for the interval chosen or the method for DOA estimation is not discussed, but it is worth studying to keep a balance between accuracy and efficiency as mentioned above.

2) EFFECT OF SIGNAL-TO-NOISE RATIO (SNR)
To evaluate the impact of the noise on the tracking performance, corresponding simulation experiments of different SNRs are carried out.
We discuss the performance of Equation (9) under different SNRs and give the limitations. The signals are from the microphone pair (1, 2) from t = 4s to t = 5s, and Figure 9 shows the results.
From Figure 9, we can observe that the Ambiguity function, X (δ), has a good performance when SNR > 15dB. However, the side lobe will be higher than the main lobe when SNR < 10dB. According to Figure 9, the pseudo-likelihood function, X (δ p ), in Equation (29) can only describe the Though the exponential weight r can be used to strengthen the main lobe peak, there is still a wrong peak in the likelihood function under low SNR environment. When the particles weights update in our tracking frame, the particles related to the PDS corresponding to the wrong peak will get very big weights. As a result, the error between the true state and the weighted average of the particles will become large and the number of effective particles will decrease.

V. CONCLUSION
In this paper, we propose a propagation delayed state particle filter framework of acoustic tracking for a moving sound source with relative Doppler stretch (PDS-PF-RDS).
We avoid using complex microphone arrays but randomly distributed microphones to reduce the hardware difficulty in real-world application. We obtain the relative Doppler stretches of each pair of sensors directly by Equation (9) to form the observation vector. Compared to obtaining DOA or TDOA of a moving source, this method does not need intricate preprocessing of the raw signals, such as using de-Dopplerization [12] or dynamic compressive sensing [17]. At the same time, we correct the observation model and construct the propagation delayed state (PDS) to be a bridge to get the corrected posterior probability. Moreover, we use the value X (δ) of the ambiguity function to approximate the weight of each particle instead of calculating the complex posterior probability directly. The simulation results show that the proposed tracking framework has smaller RMS position and velocity errors and can adapt to the changes of the propagation delay in the whole tracking process compared to other algorithms. In addition, we show the whole process of tracking a moving sound source with raw data received by sensors instead of ignoring the preprocessing of the signals.
As a limitation, the current method is only discussed in the subsonic scenes. When tracking a supersonic source, the received signal will contain both leading and lagging signals, and the measurement will no longer be reliable. However, it can still work well in tracking most subsonic sound sources and play an essential role in the multi-source information monitoring system. We will carry out field experiments in the next step, and the research on experiments will be presented in future work.
GUO CHEN received the B.Eng. degree in mechanical engineering from the Ha'erbin Institute of Technology, in 2013. He is currently pursuing the Ph.D. degree in acoustics engineering with the China Academy of Engineering Physics. His current research interests include acoustic signal processing and acoustic localization.
YONGGANG LU received the B.Eng. degree in engineering mechanics from Zhejiang University and the Ph.D. degree from the Beijing Institute of Technology.
He is currently a Professor and a Chief Engineer with the Institute of Systems Engineering, China Academy of Engineering Physics. He has presided over a number of national defense scientific research projects. His research interest includes systems engineering. He received the Military Progress Prize in Science and Technology for many times. VOLUME 8, 2020