94 GHz Asymmetric Antenna Radar for Speech Signal Detection and Enhancement via Variational Mode Decomposition and Improved Threshold Strategy

To further improve the detection distance and sensitivity of bio-radar, a 94 GHz asymmetric antenna radar sensor is employed to detect speech signals. However, radar speech is often mixed with various types of noise, which seriously degrades the quality and intelligibility of the speech signal. Therefore, a novel method based on variational mode decomposition (VMD) and an improved threshold strategy (ITS) is proposed in this paper to improve the quality and intelligibility of radar speech. VMD is an adaptive decomposition method that overcomes the mode aliasing and end effect problems of empirical mode decomposition (EMD). ITS overcomes the limitations of the traditional wavelet threshold and achieves the best compromise between speech intelligibility and noise reduction. First, EMD is applied to determine the number of decomposition modes, and the radar speech is then decomposed into several limited-bandwidth intrinsic mode functions (IMFs) by VMD. Second, ITS is employed to remove noise from the useful modes, which are identified by the Pearson correlation coefficient (PCC). The performance of the proposed method is evaluated by the perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and composite measures (CMs). The experimental results show that the radar sensor can detect speech signals over a long distance and that the proposed method effectively improves the quality and intelligibility of the radar speech signal. Owing to this performance, the proposed method provides a promising alternative for various applications related to the enhancement of radar speech and traditional microphone speech.


I. INTRODUCTION
Speech, one of the important physiological signals of the human body, is the most important and effective means of human communication. At present, the technologies for obtaining speech signals can be divided into air conduction and non-air conduction detection. However, their shortcomings limit the development of these techniques for speech detection [1], [2], [3], [4], [5]. In recent years, bio-radar technology has been developed for use in a variety of remote sensing applications [6], [7]. This article mainly focuses on the application of bio-radar technology in human speech

The associate editor coordinating the review of this manuscript and approving it for publication was Hasan S. Mir.

FIGURE 1. Schematic diagram of the 94 GHz asymmetric antenna bio-radar sensor system.

wavelet shrinkage [16], minimum mean-square estimator (MMSE) [17]. Afterwards, various improved enhancement algorithms based on the aforementioned methods were proposed [18], [19], [20], [21]. These methods promoted the development of traditional microphone speech enhancement technology. For radar speech, the noise sources are more complex than those found in speech acquired using a traditional microphone. In our previous studies, we investigated many speech signal enhancement methods aimed at different radar systems. These methods restrain the residual noise and improve the speech quality of the corresponding radar system [22], [23], [24], [25]. Although the noise of radar speech is greatly suppressed, the improvement in intelligibility is not obvious. Therefore, it is necessary to find a new method that improves intelligibility while reducing noise.

The empirical mode decomposition (EMD) method is an adaptive method for processing nonlinear and nonstationary signals [26]. Khaldi et al. were the first to use the EMD method to enhance speech quality; their results showed that the method is effective for removing additive white Gaussian noise from speech [27]. Chatlani and Soraghan proposed an EMDF method for speech enhancement that is particularly effective in low-frequency noise environments [28]. Zao et al. proposed an EMD and Hurst-Based Mode Selection (EMDH) method for speech enhancement; the results show that the EMDH method improves the segmental signal-to-noise ratio and an overall quality composite measure [29]. However, the shortcomings of mode aliasing and end effect restrict the development of the EMD method in the field of speech enhancement [30], [31]. To overcome the limitations of the EMD technique, a new adaptive signal processing method called variational mode decomposition (VMD) was proposed. This method provides a solution to the decomposition problem that is theoretically well founded and easy to understand, and it can decompose a complex signal into an ensemble of band-limited intrinsic mode functions (IMFs). Experiments show that the VMD method outperforms EMD with regard to tone detection, tone separation, and noise robustness [32]. In recent years, the VMD method has been widely used in various fields such as fault diagnosis, underwater acoustics, and signal denoising. Mohanty et al. [33] show that the VMD is more efficient than EMD method

ing noise [11]. If each IMF is filtered, we find that although the noise is suppressed, the intelligibility of the radar speech is poor. Moreover, the traditional estimated noise level is not accurate in estimating IMF noise, which affects the quality of the enhanced speech. Furthermore, although soft thresholding is widely employed to remove noise from signals, it may cause signal over-processing, resulting in speech signal distortion [27].

asymmetric antenna radar speech signal into several limited-bandwidth intrinsic mode functions (IMFs). Second, an improved threshold strategy (ITS) is employed to remove noise from the useful modes, which are determined by the Pearson correlation coefficient (PCC).

The remainder of this paper is organized as follows. In Section 2, the experimental environment, the speech corpus, the detection results, and the noise characteristic analysis of the 94 GHz asymmetric antenna bio-radar speech are presented. In Section 3, the method for improving the quality of 94 GHz asymmetric antenna radar speech is proposed. In Section 4, the performance evaluation is provided. In Section 5, experimental results of the proposed algorithm are demonstrated and the performance is evaluated. Finally, the conclusion is provided in Section 6.

The 94 GHz asymmetric antenna radar system is mainly composed of a transmitting antenna, a receiving antenna, a transmitter module, and a receiver module. An antenna with a diameter of 600 mm is used as the receiving antenna; its beam width is 0.4 degrees at the −3 dB level, and its gain is 50 dB. An antenna with a diameter of 200 mm is used as the transmitting antenna; its beam width is 1 degree at the −3 dB level, and its gain is 41.7 dB. A more detailed description of the 94 GHz asymmetric antenna radar detection theory and system is given in our previous work [10], [12].

The target sound sources selected in this paper include a human sound source and a simulated sound source. Five healthy volunteers (4 males and 1 female) served as the human sound source in the speech detection experiment. All of the volunteers (20 to 28 years old) were native speakers of Mandarin Chinese, and none of them had a history of voice training or voice disorders. All of the experimental procedures were in accordance with the rules of the Declaration of Helsinki, and all volunteers signed the appropriate consent forms. The experiments were carried out in two scenarios. The

traditional microphones, such as high directional sensitivity, strong acoustical disturbance. Therefore, the noise sources of radar speech are different from those of a traditional microphone. Radar speech is mainly interfered with by electromagnetic noise such as inter-modulation interference and cross-modulation interference, circuit noise such as thermal noise and scattered noise, and environmental noise such as side lobe clutter and human body swerve noise. In order to effectively remove the noise from radar speech, we used the 94 GHz asymmetric antenna radar system to record the noise signal at two different time periods with no sound source present. Spectral analysis of the noise signal was then carried out, as shown in Figure 5.

From Figure 5, we can observe that the noise is spread over the whole frequency range and is especially strong in the low-frequency components of the speech. In addition, single-frequency noise appears in the middle and high frequencies. On the whole, the noise distribution of the 94 GHz asymmetric antenna radar speech is non-Gaussian. Therefore, it is of great significance to apply an appropriate enhancement algorithm to improve the quality of the 94 GHz asymmetric antenna bio-radar speech signal

structed, which is expressed as follows:

where K is the number of modes, $u_k$ (k = 1, 2, ..., K)

The Lagrangian L expression is:

In the formula, α is the bandwidth parameter of the quadratic penalty term, * denotes the convolution operator, δ(t) is the unit pulse function, and λ(t) is the Lagrangian multiplier.
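For orientation, the constrained variational problem and the augmented Lagrangian of the standard VMD formulation can be written as below. This is a sketch that assumes the expressions referred to above follow the usual VMD form; the input signal symbol $x(t)$ and the center frequencies $\omega_k$ are taken from the surrounding text, and the exact layout and numbering are assumptions.

```latex
% Constrained problem: minimize the summed bandwidth of the K modes
\min_{\{u_k\},\{\omega_k\}} \; \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
  e^{-j \omega_k t} \right\|_2^2
\quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = x(t)

% Augmented Lagrangian with quadratic penalty (weight \alpha) and multiplier \lambda(t)
L\left(\{u_k\},\{\omega_k\},\lambda\right) =
  \alpha \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
  e^{-j \omega_k t} \right\|_2^2
  + \left\| x(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2
  + \left\langle \lambda(t),\; x(t) - \sum_{k=1}^{K} u_k(t) \right\rangle
```

The first term penalizes the bandwidth of each mode around its center frequency, while the last two terms enforce that the modes sum to the input signal.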

The expression for updating $u_k^{n+1}$ can be expressed as:

By means of the Parseval/Plancherel Fourier isometry, equation (3) can be converted from the time domain to the frequency domain, and the expression of each mode in the frequency domain can be obtained as follows:

where $\hat{\lambda}^{n+1}$ is updated by the following formula:

where τ is the noise tolerance parameter. Then, for the updated center frequency $\omega_k^{n+1}$ of each mode, the expression in the frequency domain can be obtained as follows:

where $\omega_k^{n+1}$ is the center of gravity of the power spectrum of the k-th mode, and $\hat{u}_k(\omega)$ is equivalent to the Wiener filter of the current residual $\hat{x}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega)$.
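The update rules described above (Wiener-filter update of each mode, center-of-gravity update of each center frequency, and dual ascent on the multiplier) can be sketched in Python with NumPy. This is a minimal illustration of the standard VMD iteration, not the authors' implementation: the function name `vmd`, the default values of `alpha`, `tau`, `tol` and `max_iter`, the mirror extension, and the uniform initialization of the center frequencies are all assumptions.

```python
import numpy as np


def vmd(signal, K, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD sketch (assumed defaults; even-length input only).

    Returns (modes, center_freqs); frequencies are in cycles per sample.
    """
    x = np.asarray(signal, dtype=float)
    T = len(x)
    if T % 2:
        raise ValueError("this sketch assumes an even-length signal")
    half = T // 2
    # Mirror-extend both ends to reduce boundary effects (common VMD practice).
    ext = np.concatenate([x[:half][::-1], x, x[T - half:][::-1]])
    N = len(ext)
    freqs = np.arange(N) / N - 0.5          # frequency axis after fftshift
    pos = slice(N // 2, N)                  # non-negative frequencies
    x_hat = np.fft.fftshift(np.fft.fft(ext))
    x_hat[: N // 2] = 0                     # keep the one-sided spectrum
    u_hat = np.zeros((K, N), dtype=complex)
    omega = 0.5 * np.arange(K) / K          # uniform initial center frequencies
    lam = np.zeros(N, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter update of mode k on the current residual.
            residual = x_hat - (u_hat.sum(axis=0) - u_hat[k])
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency = center of gravity of the mode's power spectrum.
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / (np.sum(power) + 1e-30)
        # Dual ascent on the Lagrangian multiplier (no effect when tau = 0).
        lam = lam + tau * (x_hat - u_hat.sum(axis=0))
        # Stop when the modes change by less than the tolerance.
        if np.sum(np.abs(u_hat - u_prev) ** 2) / N < tol:
            break

    # Rebuild a Hermitian-symmetric spectrum so that each mode is real-valued.
    modes = np.zeros((K, T))
    for k in range(K):
        full = np.zeros(N, dtype=complex)
        full[pos] = u_hat[k, pos]
        full[1 : N // 2 + 1] = np.conj(u_hat[k, pos][::-1])
        full[0] = np.conj(full[-1])
        u = np.real(np.fft.ifft(np.fft.ifftshift(full)))
        modes[k] = u[half : half + T]       # drop the mirrored extension
    order = np.argsort(omega)
    return modes[order], omega[order]
```

For a two-tone test signal such as `np.cos(2*np.pi*0.02*n) + 0.5*np.cos(2*np.pi*0.2*n)`, calling `vmd(sig, K=2)` is expected to return center frequencies close to the two tone frequencies. With `tau = 0` the multiplier stays at zero, which is a common choice for noisy signals where exact reconstruction is not required.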

On the whole, the VMD algorithm updates each mode in the frequency domain and then converts it to the time domain by the inverse Fourier transform. The mode updating steps are as follows:
(1) Initialize $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$, $\hat{\lambda}^1$, and n = 0;

(3) According to Formula (5), update $\hat{\lambda}$;
(4) For a given discriminant precision e > 0, until the iteration constraint is satisfied:
Stop the iteration; otherwise, return to step (2).

In this paper, PCC is used to distinguish between the useful modes and the noise modes after VMD decomposition. The PCC values are calculated between the original radar speech signal and each IMF as follows:

$$\mathrm{PCC} = \frac{\sum_{t} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t} (x_t - \bar{x})^2}\,\sqrt{\sum_{t} (y_t - \bar{y})^2}} \qquad (9)$$

where $x_t$ and $y_t$ are two random variables, and $\bar{x}$ and $\bar{y}$ are their respective averages.

The higher the PCC value, the stronger the correlation between the IMF and the original radar speech signal. If the PCC value is high, the IMF is regarded as a useful mode; otherwise, it is regarded as a noise mode. Thus, in order to find the useful modes, a fixed threshold (FT) was defined as $10^{-1}$. If the PCC value between the original radar speech signal and an IMF is greater than the FT, the IMF is regarded as a useful mode.
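The mode selection rule described above can be sketched as follows. This is a minimal illustration assuming the IMFs are stored as rows of a NumPy array; the function name `select_useful_modes` and its arguments are hypothetical, and only the fixed threshold of 0.1 comes from the text.

```python
import numpy as np


def select_useful_modes(signal, imfs, threshold=0.1):
    """Return the PCC of each IMF with the signal and the indices above the threshold."""
    signal = np.asarray(signal, dtype=float)
    imfs = np.asarray(imfs, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is the PCC.
    pcc = np.array([np.corrcoef(signal, imf)[0, 1] for imf in imfs])
    useful = np.flatnonzero(pcc > threshold)
    return pcc, useful


# Toy example: a strong low-frequency component and a weak high-frequency one.
t = np.arange(1000) / 1000.0
low = np.sin(2 * np.pi * 5 * t)
high = 0.01 * np.sin(2 * np.pi * 200 * t)
pcc, useful = select_useful_modes(low + high, [low, high])
```

In this toy case only the first component exceeds the threshold, so it alone would be kept for reconstruction.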

In order to effectively suppress the noise of the original radar speech, an appropriate threshold T should be selected to remove the noise of the useful modes before reconstruction. The wavelet threshold has been widely used in noise reduction [46]; the threshold is estimated as follows:

$$T = \sigma \sqrt{2 \ln N}$$

where N is the signal length and σ is the estimated noise level. In the experiment, it was found that the estimated noise level plays an important role in removing noise from the radar speech signal. However, the traditional estimated noise level is not accurate in estimating IMF noise, which affects the quality of the enhanced speech. Furthermore, although soft thresholding is widely employed to remove noise from signals, it may cause signal over-processing, resulting in speech signal distortion.

Therefore, in this paper, we propose an improved threshold strategy (ITS) for removing noise from the radar speech signal. It includes two aspects: a new noise level estimate and an improved soft thresholding function.
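As a point of reference for the traditional scheme discussed above, the universal threshold and the classical soft and hard thresholding functions can be sketched as below. This shows the baseline only, not the improved threshold strategy; the function names are hypothetical, and the median-absolute-deviation noise estimate is an assumption (it is the estimate commonly paired with the universal threshold).

```python
import numpy as np


def universal_threshold(coeffs, sigma=None):
    """T = sigma * sqrt(2 ln N), with N the number of samples."""
    coeffs = np.asarray(coeffs, dtype=float)
    if sigma is None:
        # Assumption: robust MAD-based noise level estimate.
        sigma = np.median(np.abs(coeffs)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(coeffs.size))


def soft_threshold(coeffs, T):
    """Shrink every sample toward zero by T; samples below T become zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - T, 0.0)


def hard_threshold(coeffs, T):
    """Keep samples whose magnitude exceeds T unchanged; zero the rest."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) > T, coeffs, 0.0)
```

Soft thresholding reduces the amplitude of every retained sample by T, which is one way to see why it can over-process a speech signal, whereas hard thresholding leaves retained samples untouched but introduces discontinuities at ±T.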

The new noise level estimate of each IMF is given by:

where L is the length of the initial silent segment of the radar speech signal, and $\mathrm{IMF}_i(t)$ is the i-th band-limited intrinsic mode function decomposed by VMD.

In order to obtain a compromise between noise reduction and intelligibility of radar speech, an improved soft thresholding function is employed to remove the noise of the useful modes before reconstruction.

VOLUME 10, 2022

where m is the compensation factor.

The final PESQ score is obtained as follows:

A higher PESQ score indicates better speech quality; the highest score is 4.5, which indicates no distortion.
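For reference, the raw PESQ score in ITU-T P.862 is a linear combination of two disturbance values. The form below is a sketch that assumes the PESQ expression referred to above follows the standard mapping; the symbols $d_{\mathrm{sym}}$ and $d_{\mathrm{asym}}$ (average symmetric and asymmetric disturbance) are notation introduced here.

```latex
\mathrm{PESQ} = 4.5 - 0.1\, d_{\mathrm{sym}} - 0.0309\, d_{\mathrm{asym}}
```

With both disturbances equal to zero the score reaches its maximum of 4.5, which is consistent with the statement that 4.5 indicates no distortion.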

The STOI was proposed by Taal et al. for predicting the intelligibility of noisy speech [49]. The evaluation results show that STOI has a high correlation with speech intelligibility in listening tests. In addition, STOI works well for noisy speech with additive noise across different noise types and SNRs. The STOI score ranges between 0 and 1, and a higher STOI score indicates better speech intelligibility.

To achieve a higher correlation with subjective evaluation scores, the composite measure (CM) method was proposed for predicting the quality of noisy speech enhanced by noise suppression algorithms [47]. The authors used the ITU-T P.835 methodology to evaluate the speech quality along three dimensions: the composite measure for signal distortion (CSIG), the composite measure for noise distortion (CBAK), and the composite measure for overall speech quality (COVL). CSIG is a measure for signal distortion (SIG) formed by linearly combining the log likelihood ratio

seen that the original radar speech is decomposed into 16 IMFs by EMD, and the frequency of the IMFs decreases gradually as the IMF index increases. According to equation (8), the decomposition mode number is equal to seven.

According to the decomposition mode number, Figure 8a shows the seven IMFs of the original radar speech signal decomposed by the VMD method, and Figure 8b shows the spectrum corresponding to each IMF. The longer the radar detection distance, the more severe the attenuation of the high-frequency components of speech; and the wider the frequency range of the speech signal itself, the higher the speech quality.

The speech signal frequency range of a normal person is 0-4000 Hz, and the limit frequency range that determines intelligibility is 300-1000 Hz. As presented in Figure 8, the frequencies of the IMFs increase gradually as the IMF index increases. It can also be inferred that the waveforms of the first four IMFs are similar to a speech signal, and the waveforms of the last three IMFs are similar to a noise signal. We can assume that some modes consist mainly of noise and interference without any speech information, and these should be rejected before reconstruction.

Figure 9 shows the PCC values between the original radar (OR) speech signal and each IMF using equation (9). According to the FT, the first four modes are selected as the useful modes for reconstruction.

Figure 10 shows the comparison of the enhanced radar speech signals. Figure 10a presents the waveform and spectrogram of the clean speech signal synchronously acquired. Figure 10b presents the waveform and spectrogram of the original radar speech. Figure 10c-f present the waveform and spectrogram of the original radar speech after processing using the

in Figure 10f, and the energy stripe of the spectrogram is very clear. The waveforms and spectrograms show that the quality and the intelligibility of the original radar speech are significantly improved by the proposed method.

The evaluation measure results of the original radar speech and the speech enhanced by four speech enhancement methods are presented in Table 1. It can be seen from the table that the proposed method yields the highest scores, which indicates that the enhanced speech signal processed by the proposed method has the highest speech quality and intelligibility.

Although the PESQ score of the speech enhanced by the EMD shrinkage method is lower than that of the original radar speech, the STOI and CM scores are all higher than those of the original radar speech. Combining it with the result of spectrograms, it can

the original radar speech with SNR 0, 5, 10 and 15 dB.

Table 2 and Figure 11, we can find that the proposed method yields higher scores than the original radar speech and the other three methods. This suggests that the proposed method is effective in removing the white noise of the radar speech, and the quality and intelligibility of the radar speech signal are greatly improved. It can also be found that the STOI score is not increased for the other three methods, but the CM scores are increased for the wavelet-soft and EMD shrinkage methods, especially for the EMD shrinkage method. This suggests that the EMD shrinkage method can obtain a good tradeoff between intelligibility and noise reduction, but the results are not entirely satisfactory. For the proposed method, the quality and intelligibility of the radar speech signal can be greatly improved.

For pink noise, the evaluation measure results are shown in Figure 12 and Table 3. For hfchannel noise, the evaluation measure results are shown in Figure 13 and Table 4. We can find that the proposed method is superior not only in white noise but also in pink and hfchannel noise conditions. However, we can observe that the scores are, to some degree, not increased for speech processed by the wavelet-hard, wavelet-soft and EMD shrinkage methods in the pink noise condition. For hfchannel noise, the results are consistent with those in white noise described above. This also indicates that the proposed method is quite effective in the hfchannel noise condition.
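The noisy test conditions at 0, 5, 10 and 15 dB SNR mentioned above can be produced by scaling a noise recording before adding it to the speech. The sketch below shows one common way to do this; it is an assumption about the procedure rather than the authors' code, and the function name `mix_at_snr` is hypothetical. The noise array is assumed to be at least as long as the speech.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the resulting SNR equals snr_db (in dB)."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: speech.size]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that p_speech / (scale^2 * p_noise) = 10^(snr_db / 10).
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


# Example: a synthetic tone standing in for speech, mixed with white noise.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 0.01 * np.arange(8000))
noise = rng.standard_normal(8000)
noisy = {snr: mix_at_snr(speech, noise, snr) for snr in (0, 5, 10, 15)}
```

The same function applies to pink or hfchannel noise recordings, since only the average power of the noise enters the scaling.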

To further test the performance of the proposed algorithm in improving the quality and intelligibility of the original radar speech signal, the English letter sequence ''a-b-c-d'' detected by the 94 GHz asymmetric antenna radar was enhanced by four speech enhancement methods. Figure 14 shows the comparison of the enhanced radar speech signals for ''a-b-c-d''. Figure 14a presents the waveform and spectrogram of the clean speech signal synchronously acquired. Figure 14b shows the waveform and spectrogram of the original radar speech signal. Figure 14c shows that the noise of the original radar speech has not been effectively reduced and that some new noise is introduced, which results in severe radar speech distortion. Figure 14d shows that the EMD soft method is effective in reducing the combined noise of the radar speech, but there is still too much remnant noise, so the quality of the radar speech is not improved. From Figure 14e, it can be seen that the noise has mostly been removed. However, the clarity of the energy stripe is affected to some extent. Figure 14f shows that the proposed method can effectively reduce the noise across all of the frequency components, and the quality and the intelligibility of the radar speech signal are greatly improved. These results are further supported by Table 5.

In this paper, a 94 GHz asymmetric antenna bio-radar is employed to detect speech signals. The asymmetric antenna bio-radar system has a high gain and the ability to acquire speech signals from a remote distance. However, the original radar speech signal is always disturbed by complex noise, which includes ambient, electromagnetic and electrical circuit noise. These types of noise greatly degrade the quality of the radar speech. Owing to the special characteristics of the radar speech signal, a novel method based on VMD, EMD and ITS is proposed to improve the quality and the intelligibility of the original radar speech signal. In our experiments, we show that the proposed method clearly outperforms the wavelet-hard, wavelet-soft and EMD shrinkage methods. Furthermore, the PESQ, STOI and CM scores indicate that the proposed method can effectively enhance the quality and the intelligibility of the original radar speech signal.

In conclusion, the proposed method is more suitable than 721 the other above mentioned methods for 94 GHz asymmet-