Reverberation Suppression Method for Active Sonar Systems Using Non-negative Matrix Factorization with Pre-trained Frequency Basis Matrix

In active sonar systems, detection always suffers from reverberation interference from multiple scatterers in oceanic environments; therefore, numerous studies have been conducted on reverberation suppression. Recently, a non-negative matrix factorization (NMF)-based method was proposed and successfully applied to reverberation suppression. However, the conventional NMF-based method makes convergence challenging because the frequency basis matrix is initialized without considering reverberation characteristic information from oceanic environments. To solve these problems, We propose an improved NMF-based reverberation suppression method adopting a pre-trained reverberation basis matrix and modified sparse update rule. The proposed method is evaluated by analyzing simulation and sea experiment data and the study confirmed that the detection performance was improved compared to the conventional method under various signal-to-reverberation ratio conditions. Several topics are also discussed to analyze the proposed method in detail.


I. INTRODUCTION
A CTIVE sonar is an underwater surveillance system to detect, track, and localize targets. An active sonar system transmits pulses to investigate underwater situations and analyze received signals. During propagation, both the target as well as the unwanted scatterers reflect the transmitted pulse. Therefore, interference signals, called reverberation, corrupt the received signal due to unwanted scatters. Because reverberation strongly correlates with the signal reflected by the target, it deteriorates target detection performance. Accordingly, research on how to suppress reverberation is essential for active sonar systems [1,2,3,4,5,6].
Autoregressive (AR) prewhitener for continuous wave (CW) reverberation is an early approach suppressing the reverberation spectrum using inverse filters, designed by estimated AR coefficients [7]. An AR prewhitener uses the adjacent signal with reverberation as a reference signal to estimate the reverberation spectrum and design an inverse filter. The designed inverse filter is used to suppress reverberation from the current signal with both the target and reverberation.
Because a linear frequency modulated (LFM) pulse has a wideband spectrum, the AR prewhitener for CW reverberation works incorrectly. Therefore, Choi et al. [8] proposed an AR prewhitener using dechirping transformation as preprocessing. This pre-processing makes the received LFM signal tonal spectrum a CW case. Thus, like CW case, an inverse filter can be designed using AR modeling.
Alternatively, the principal component inversion (PCI) algorithm has been proposed [9]. It models the reverberation signal as the sum of multiple reflected signals and estimates a low-rank approximation of the forward signal matrix using singular value decomposition. As a follow-up paper, the signal subspace extraction algorithm [10] modeling reverberation as the sum of the higher and lower part, and a PCIsupport vector machine (SVM) algorithm [11] combining PCI and SVM, were proposed.
Although AR and PCI-based reverberation suppression methods have been widely used, pulse characteristics are interpreted and used only from a time or frequency domain perspective. Recently, Lee et al. [12] proposed a non-negative matrix factorization (NMF)-based CW reverberation suppression method that simultaneously uses pulse time and frequency characteristics. The received signal is transformed into the time-frequency domain, called a spectrogram, and the spectrogram is decomposed with frequency and time basis matrices. Because a CW target echo is represented as a straight line in the spectrogram, ideally, only a few frequency and time bases are needed to represent the target component. If a new spectrogram is reconstructed using only the target echo basis, the reverberation is removed, and only the target echo signal appears. That is, the reverberation has been suppressed.
As with the AR prewhitener, the NMF-based CW reverberation suppression method has been extended to be applied to LFM reverberation [13]. The modified NMF-based reverberation suppression method adopted two pre-processing techniques, namely, dechirping transformation and modulo operation. The dechirping transformation presents the LFM echo signal as a tonal-like spectrum, as with Choi's method. The modulo operation prevents the disconnected effect of block signal processing.
Although the usefulness of the conventional NMF-based reverberation suppression method has been verified, a problem exists that convergence might become challenging, depending on environmental fluctuation because parts of the W matrix are forced to be fixed, and others are randomly initialized [12]. Scaling between the target and reverberation frequency basis component also creates problems.
In this paper, we propose an improved NMF-based reverberation suppression method with a pre-trained reverberation frequency basis matrix to solve the convergence problem of the conventional NMF-based method. Pre-training the reverberation frequency basis matrix is performed using adjacent signals considering the Itakura-Saito distance, which measures the spectrum's similarity. Because the proposed method learned the frequency basis matrix in advance, its NMF update process converges stably. To solve the scaling problem, the power ratio parameter was adopted to balance the imbalance between the target and reverberation basis. A modified sparse update rule was also adopted to improve the NMF update process.
The rest of this article is organized as follows. In Section II, we present the problem statement, signal model, and conventional NMF-based reverberation suppression method. In Section III, we present the proposed NMF-based reverberation suppression method. In Section IV, we present the computer simulation and sea experiment analysis, respectively, to evaluate the proposed method. In Section V, several topics for the proposed method are discussed. Finally, in Section VI, we conclude this study.

A. PROBLEM STATEMENT
The CW pulse used in active sonars can be expressed as where f c is the center frequency and T is the pulse length. Fig. 1 shows the concept of receiving a signal in an active sonar environment. The sonar platform moves forward with speed v and the beam is steered to direction θ. The received beam signal at target direction θ can be expressed as where s e (t) and s r (t) are the received target echo and reverberation signals, respectively, and n(t) is the noise signal, modeled as a zero-mean Gaussian distribution. The target echo signal can be expressed as where a e is the attenuation factor and τ e is the time delay of the target echo signal. The target signal's frequency spectrum can be expressed as where S p (f ) is the Fourier transform of the CW pulse in (1). The Doppler shift frequency of echo signal f e can be expressed as where v, θ, c, and v e are the platform speed, beam steering angle, the speed of sound in underwater, and the relative target speed with the platform, respectively. From (5), the target spectrum is around f c 1 + 2v c cos θ , but decided by its motion.
The reverberation signal can be modeled as the sum of replica signals of the transmitted pulse from multiple scatters [6,14]. Because the beam pattern attenuates the reverberation effect from other directions, the reverberation signal in the received beam signal at direction of θ can be expressed as follows: where a i is the attenuation factor affected by the beam pattern. τ i is the time delay modeled as a uniform distribution over the interval. The frequency spectrum of the reverberation signal can be expressed as Typically, scatters are assumed immobile; therefore, the Doppler shift frequency of ith scatter f di is determined by platform speed v and beam steering angle θ and can be expressed as where θ i is the scatter angle. From (8), the reverberation spectrum is distributed in range of in the frequency domain. Note that because the beam pattern's sidelobe attenuates the reverberation from other directions, reverberation spectrum appears in the form of spreading around f c 1 + 2v c cos θ . Fig. 2 shows the concept of the frequency spectrum of the received beam signal described (2) to (8). From this figure, the received CW target echo signal will be buried in the reverberation spectrum [1,14].
The goal of this study is extracting the target echo signal from the received signal corrupted by reverberation. To accomplish this goal, the received signal's spectrogram is analyzed using NMF, and the components of the target echo and reverberation are decomposed. Note that we focus on CW reverberation suppression because it is easy to extend the NMF-based CW reverberation suppression method to the LFM reverberation case using the pre-processing method proposed by Kim et al. [13].

B. NON-NEGATIVE TIME-FREQUENCY MODEL OF THE RECEIVED SIGNAL
Lee and Seung originally developed the NMF method [15], inspired by psychological and physiological evidence for parts-based representations of the brain. The NMF method limits data to non-negative leading to a parts-based representation because they allow only additive combinations. Therefore, the NMF method has the advantage of representing the sparse matrix.
To apply the NMF method to the received signal, it must be expressed as a non-negative matrix. Therefore, the timefrequency analysis method is applied to the received signal and converted into spectrogram data by taking only the magnitude component. For this purpose, a short-time Fourier transform (STFT) is generally used and can be expressed as [16] [ where x(m) is the sampled received beam signal, w(m) is the window function of l w samples, m is the index of each sample, and ∆h is the hop size. k and n are the frequency bin and time frame bin indices, respectively. The magnitude square of (9) yields a spectrogram V as follows: where [V] (k,n) is the (k, n) element of matrix V. Furthermore, the phase information is defined as where Re(a) and Im(a) are the real and imaginary parts of a, respectively.

C. CONVENTIONAL NMF-BASED REVERBERATION SUPPRESSION METHOD AND ITS LIMITATIONS
From the non-negative spectrogram in (10), we could develop the NMF method. NMF decomposes the non-negative matrix into a multiplication of two non-negative matrices [15]. One advantage of the NMF method is the ability of part-based representation, meaning that NMF can analyze the sparse components in the matrix. In the CW active sonar, the CW echo signal is shown as a straight line, sparse components in the spectrogram. NMF can extract the sparse component by analyzing the frequency and time information simultaneously; therefore, NMF is suitable for sonar reverberation suppression. The NMF model is expressed as where V ∈ R + K×N , W ∈ R + K×L , H ∈ R + L×N , and E ∈ R + K×N . The matrix V is the spectrogram of the input signals with K frequency bins and N time frames, and W and H are the frequency and time basis matrices, respectively. Further, L is the rank of NMF. The matrices W and H can be estimated from the input spectrogram V using the multiplication update (MU) rule using a cost function with the Kullback-Leibler (KL) divergence [17].
Recently, Lee and Lim [12] proposed an NMF-based CW reverberation suppression algorithm. To establish the frame of the algorithm, they first divided the input spectrogram into the target echo and reverberation parts and applied different constraints to each basis. The frequency basis matrix W and time basis matrix H are, respectively, expressed as where W P ∈ R + K×L P and W R ∈ R + K×L R are, respectively, the target echo and reverberation parts of the frequency basis matrix; H P ∈ R + L P ×N and H R ∈ R + L R ×N are, respectively, the target echo and reverberation parts of the time basis matrix; L P and L R are the basis numbers of the target echo and reverberation, respectively, and L = L P + L R . Then, the NMF model in (12) can be rearranged as Because of the division, NMF could estimate information on the target echo and reverberation separately. In (15), the additivity assumption of the target echo part W P H P and reverberation part W R H R in the NMF model (15) is reasonable because it agrees with the signal model (2). The CW target echo components were shown as horizontal lines in the spectrogram. In other words, the CW target echo information is concentrated in a narrow or single frequency bin and is continuous along several time bins. Fig. 3 shows the meaning of this assumption. In the figure, matrix V is decomposed as W P , W R , H P , and H R . As W P is fixed in advance, after decomposition, information on the target (green cell) in V appears in a form in which a specific region of the H P matrix is activated on a specific frequency basis in W P . Therefore, we can reconstruct the target echo signal from W P and H P , shown inside the red rectangles.
To use the CW pulse information, in the conventional NMF-based method, W P is initialized with a set of onehot encoded vectors, which represent the specific Doppler target echoes and are fixed during the update. The target echo frequency basis matrix W P can be expressed as follows: where w P,l ∈ R + K×1 is the l th target echo frequency basis. w P,l is expressed as The value one is placed at the frequency bin corresponding to the target echo-Doppler shift. This structure is similar to the Doppler replica of the matched filter, verifying all hypotheses of the target Doppler to be detected. Then, other matrices are updated using the MU rule with several constraints until convergence is achieved. Finally, echo information is reconstructed from target echo basis matrices. Although the conventional NMF-based reverberation suppression method has been well used [12,13], room for performance improvement exists. The conventional NMFbased reverberation suppression method's limitations can be explained as follows. First, the NMF update process of the conventional method is sensitive to environmental conditions because parts of the W matrix are forced to be fixed, and others are randomly initialized. Therefore, hyperparameters must be set well for NMF updates to converge stably. Second, the conventional method is sensitive to the imbalance between the target and reverberation signals because the level scaling between the target and reverberation frequency basis components is excluded in the initial W matrix. Accordingly, a problem might occur in that the time basis power is unfairly estimated and can be concentrated on either the target or reverberation. Consequently, convergence might not be performed well, or even if convergence occurs, reverberation suppression might fail because non-target components are emphasized. These problems are related to setting the initial NMF value, and similar cases have been reported in studies [18,19].

A. CONTRIBUTION OF THE PROPOSED METHOD
We modified the conventional NMF-based reverberation suppression method to overcome its limitations. The main contributions of this article are as follows. First, the reverberation frequency basis matrix W R was initialized (pre-trained) before NMF update to making it stable and robust with environmental fluctuations. This concept is widely employed in speech, audio, and acoustic signal processing [20,21,22], but we apply it to the sonar reverberation suppression method for the first time. Second, a power normalization scheme was adopted to compensate for the imbalance between the target echo and reverberation signal. The Frobenius norm was employed to normalize the target echo frequency basis matrix W P and reverberation frequency basis matrix W R , and a power ratio parameter was adopted to tune their relationship. Third, sophisticated sparse update rules adjusting the update rate were proposed. The proposed sparse update rules measure the sparseness of an updated matrix and applie it to the step-size parameter of the NMF update.

B. PRE-TRAINING THE FREQUENCY BASIS MATRIX WITH POWER NORMALIZATION
In the conventional method, W R was randomly initialized and updated using a simple MU rule. However, since the target and reverberation are received complexly through the oceanic environment, and the target is often buried in the reverberation signal. Therefore, not only the target component but also the unnecessary reverberation component might be activated during the NMF update process. Although the NMF method provides few constrained options to enhance the convergence performance, it is sensitive to the hyperparameter selection. Fig.4 (a), (b), and (c) show the converged H matrix using the conventional method, where parameter α was set to 1, 10, and 50, respectively. In the figures, the red dashed line is the boundary between the target and reverberation base (based on the boundary line, the upper and lower parts mean the target and reverberation bases, respectively). For α = 10, the target component appeared because NMF converged with ease. For α = 1, however, unnecessary components besides the target component appeared. For α = 50, the target base component was inactivated.
To overcome the limitations of the conventional method, we adopted a methodology of taking information from the reference reverberation signal in research on AR prewhiteners [7,8] and combined it with the NMF method. This scheme can be implemented by learning the reverberation frequency basis matrix W R in advance. Notably, this scheme of pre-trained basis matrix is widely used in speech, audio, and acoustic signal processing [20,21,22].
The reverberation frequency basis matrix W R can be trained with the adjacent time or beam signals under the assumption that adjacent signals would have similar statistical characteristics of reverberation in the current block signals. We used the simple MU rule to train the W R in advance as follows: where V R is the spectrogram reference signal, which con-taining reverberation components;H R is a dummy reverberation time basis matrix, used temporarily to train W R . Notably, we used the tilde symbol to distinguish H R , which is updated in the main NMF process. There is a possibility that the pre-trained and actual reverberation frequency bases might mismatch. However, if the reference reverberation signal is locally stationary, there is no problem using it [7,8].
Moreover, a little mismatch can be compensated for during the NMF update process. The detailed mismatch issue will be discussed in Section V-B. Through (16) to (19), we can initialize frequency basis matrix W. However, the relationship between the level of target echo and reverberation signals is excluded in this formula. In this study, we devised a formula considering the imbalance level. To complete the formula of frequency basis matrix W, we normalized both target echo part W P and reverberation part W R using the Frobenius norm and adopted a new parameter as where γ is the power ratio parameter and A 2 F is the Frobenius norm of matrix A defined as adding the square of the absolute value to all elements of A. The power ratio parameter γ should be adjusted appropriately to ensure the performance. This issue will be discussed in Section V-A.

C. ESTIMATING THE TIME BASIS MATRIX WITH THE MODIFIED SPARSE UPDATE RULE
To use the target echo information that has continuous features in the time bin, H P is estimated using a cost function expressed as [12,13,23] C(W P , H P ) = C RE (W P , H P )+αC T C (H P )+βC T LL (H P ), where C RE (W P , H P ), C T C (H P ), and C T LL (H P ) are the costs of reconstruction error, temporal continuity (TC), and temporal length limitation (TLL), respectively. α and β are the weighting parameters for TC and TLL, respectively [12,23].
The gradient of each cost is the weighted sum of the gradients of each cost and expressed as Notably, we omitted the parentheses for convenience. If we split this gradient into positive and negative terms, (22) is expressed as Then, we can express the total gradient term as where the positive terms ∇ + H P C and the negative terms ∇ − H P C are, respectively, given by Consequently, H P can be estimated using [23] See Appendix A for detailed definition of each cost and derivation of each gradient term. In reference paper [13], a sparse update rule for matrix H P was proposed to improve NMF performance. Although the proposed sparse update rules work well, the rapid update rate due to sparse constraints causes a problem of highlighting local non-target components and fail to find the target information, even if the NMF algorithm converges. Therefore, we modified sparse update rules for H P as follows where I is an identity matrix. ∇ + H P C and ∇ − H P C are the positive and negative total gradient terms in (25) and (26), respectively.D is a modified normalized basis power matrix controlling the NMF update rate more sophisticated than the conventional method. See Appendix B for the detailed derivation process and Section V-C for a detailed discussion.
Estimating the reverberation basis is achieved using the MU rule without additional constraints as follows

D. RECONSTRUCTING TARGET ECHO SIGNAL
The NMF method is applied iteratively using (28), (29), and (30) until convergence is attained. After convergence, the target echo information can be estimated by multiplying the matrices W P and H P . The NMF results can be expressed aŝ Because (31) has only magnitude information, phase information is needed to complete the target echo signal spectrogram. It is easily achieved using the original phase information in (11) [12]. The output time signal with reverberation removed is reconstructed by applying the inverse STFT as followsx (m) = ISTFT V exp (∠X) where ISTFT means inverse STFT transformation, the inverse process of (9). We summarize the proposed reverberation suppression method in Table 1.

IV. PERFORMANCE EVALUATION A. SIMULATION ANALYSIS
We evaluated the proposed method using simulation. The received beam signal was generated based on (2), (3), and (6) with 0.4 s CW pulse. The number of point scatterers to synthesize the reverberation signal was set to 1000. The arrival time was set to have a uniform distribution of 3 s, and the attenuation factor was set to a Gaussian distribution with a mean of 1 and a standard deviation of 0.1. The platform speed was set to 4 m/s such that 2v c = 0.0053. STFT was performed using a 0.5 s Hamming window and a 75% overlap. The numbers of NMF target and reverberation echo bases (L P and L R ) were set to 33 and 330, respectively. The TC weighting parameter α, TLL weighting parameter β, power normalization factor γ, and expected echo length L T were set to 10, 1, 0.75, and 0.52 s, respectively.
The NMF method's matrices were initialized to perform the update process. The time basis matrix H was initialized with a non-negative value, and the target echo frequency basis matrix W P was initialized with one-hot encoding (16)  and (17). For the conventional method, the reverberation frequency basis matrix W R was initialized with a nonnegative value. However, for the proposed method, the MU rule learned W R using the reference reverberation signal. Fig. 5 shows the time signal of the generated beam signal. Fig.6 and Fig.7 show the spectrograms of the target echo signal and generated beam signal, respectively. The target echo signal was set to arrive at 1 s with a 1.0026 normalized Doppler. The signal-to-reverberation ratio (SRR) was set to −19 dB. In Fig.7, the target echo signal, highlighted with a red ellipse, is buried in the reverberation; thus, challenging to identify.
In Fig. 8, we present the estimated frequency and time basis matrices (W and H, respectively) of the conventional and proposed methods, respectively. The figure shows that each basis of the reverberation frequency basis matrix of the conventional method does not occupy a specific frequency bin. However, each basis of the reverberation frequency basis matrix of the proposed method occupied a specific frequency bin, and time bases were activated in specific regions. This phenomenon occurred because the reverberation basis matrix of the proposed method acquired reverberation information from the reference reverberation signal; therefore, it could accurately estimate the reverberation components. Fig. 9 (a) and (b) depict the NMF output of conventional and proposed methods, respectively. From Fig. 9 (a), energy occurred in components other than the target, but the result of the proposed method in Fig. 9 (b) showed only the target component; most reverberation components were removed. Notably, near the normalized frequency 1, components were strongly suppressed because the NMF algorithm accurately estimated the reverberation components.
We used a normalized matched filter to compare the proposed method's detection performance when the reverberation suppression algorithm was and was not applied. The normalized matched filter is defined as follows [24] where s p (t, η) is the Doppler replica of the transmitted CW pulse in (1), and x(t) is the received target echo signal in (2). In the simulation, the normalized Doppler range was set to 0.9947 to 1.0053. Fig. 10 shows the normalized matched filter results at an SRR of −19 dB. The normalized matched filter results using the original signal could not show a clear target echo peak. However, the normalized matched filter results using the output signal of the proposed method showed a clear target echo peak because reverberation was effectively suppressed. Although the normalized matched filter results using the output signal of the conventional method also showed a target echo peak, its performance was inferior to the proposed method.
We quantitatively analyzed the detection performance by calculating the receiver operating characteristics (ROC) curve performing 300 Monte-Carlo simulations per SRR in Fig. 11. In this figure, four cases are compared corresponding to without pre-processing, random initializing with the conventional sparse update rule, random initializing with the proposed sparse update rule, and pre-trained initializing with the proposed sparse update rule. From these figures, the proposed sparse update rule is superior to the conventional sparse update rule, and when combined with the pre-trained initializing method proposed in this paper, the proposed NMF-based reverberation suppression method performs better than the conventional method in all conditions.

B. SEA EXPERIMENTS ANALYSIS
To further evaluate the proposed NMF-based reverberation suppression method, data of sea experiments conducted at the Eastern Sea of Pohang, Republic of Korea, were analyzed. A 1.0 s CW pulse and an echo repeater were used to imitate the target echo signal.     12 shows the received beam signal of sea experiment data. A direct blast signal arrived at 4 s, and reverberation was subsequently received. Unfortunately, the received echo repeater signal arrived far from the direct blast arrival, approximately 17 s. Consequently, it was in the noise-limited region. Therefore, we must cut out the echo repeater signal and relocate it to the reverberation-limited region to reasonably evaluate the reverberation suppression methods. Fig. 13 shows the spectrogram of the echo repeater signal, and Fig. 14 shows the signal synthesized by cutting the target echo repeater signal. The target echo signal was set to arrive at the 10 th time bin, and Doppler was estimated to be 1.003. Figs. 15 (a) and (b) show the NMF output of the conventional and proposed methods, respectively. When the conventional method was used, reverberation components remained, whereas they were effectively removed when the proposed method was used and target echo was emphasized. From Fig.  15 (b), the component around normalized frequency 1.002 was significantly attenuated because the component corresponding to the pre-trained reverberation basis component was effectively removed. This phenomenon was predicted in the simulation results. Fig. 16 shows the normalized matched filter results of sea experiment data. Fig. 16 (a) shows the normalized matched filter results using original sea experiment data, and no target echo peak appeared because the reverberation remained strong. It masked the target echo signal. Figs. 16 (b) and (c) show the normalized matched filter results using the output signal of the conventional and proposed methods, respectively. Although the conventional method suppressed the reverberation components, superior reverberation suppression performance was seen when using the proposed method. Therefore, the proposed method outperforms the conventional method, even in the actual sea environment.

V. FURTHER DISCUSSIONS A. DISCUSSION FOR HYPERPARAMETER SETTINGS
In the proposed NMF method, three major hyperparameters α, β and γ affect the NMF performance results. To evaluate the performance according to hyperparameters, we conducted 100 Monte-Carlo simulations and calculated ROC curves at SRR −25 dB. Fig. 17 (a), (b), and (c) shows the ROC curves with various α, β and γ, respectively. While analyzing certain parameters, other parameters are the same Sec. IV. From these figures, the value of α and β does not significantly affect the performance. Therefore, the proposed method converges stably, regardless of hyperparameter tuning, unlike the conventional method. This robustness is because the frequency basis matrix is trained in advance using a reference signal. For the hyperparameter γ case, ranges of 0.7 to 0.9 show good performance. With our further investigations, the optimal value for γ depends on the reverberation level. Therefore, it can be fine-tuned if we know the reverberation level in advance. Such hyperparameter tuning typically occurs in the other reverberation suppression method, like the PCI algorithm [9,10].

B. DISCUSSION FOR MISMATCH ISSUE OF PRE-TRAINED REVERBERATION BASIS MATRIX
In this paper, we proposed a pre-trained reverberation basis matrix using a reference signal. Therefore, the spectrum VOLUME 4, 2016 similarity between the reverberation signals in the target beam and reference signals directly affects the performance.
To analyze the reference mismatch issue, we simulated some reference reverberations, which have various spectrum similarity conditions. Note that spectrum similarity is measured using the Itakura-Saito distance [25]. Fig. 18 shows a comparison of matched filter results with the Itakura-Saito distance is 0.58, but when it is increased to about 0.78, the performance deteriorates noticeably. This phenomenon is interpreted that the reference signal should have similar spectral characteristics as the reverberation signal of the target beam signal. Fig. 19 shows the ROC curve calculated using 100 Monte-Carlo simulations at SRR −25 dB using Itakura-Saito distances of 0, 0.5, and 1. In this figure, the performance decreases as the Itakura-Saito distance increases, but a value of 0.5 guarantees better performance than the conventional method. Fig. 20 shows the Itakura-Saito distance according to the time difference measured from sea experimental data. In the figure, as the time difference increases, the Itakura-Saito distance gradually increases. However, the signal within 1 s has an Itakura-Saito distance as low as 0.2 to 0.4, and most signals within 3 s have an Itakura-Saito distance lower than 0.6. From the ROC analysis in Figs. 18 and 19, it is desirable to use a reference signal with an Itakura-Saito distance of 0.6 or less. Therefore, it is desirable to use a signal close to the target signal within roughly 3 s as a reference signal.

C. DISCUSSION FOR THE NEW SPARSE UPDATE RULE
In this subsection, the update process and results of the algorithm were analyzed to verify that the proposed sparse update rule mechanism is valid. Fig. 21 compares the sparseness of the normalized power vector d, whose elements are the power of each time basis vector. Therefore, d being sparse means that only a certain time basis vector is being strongly activated. In this figure, the sparseness continuously increases in the conventional case and decreases at some point in the proposed case. This phenomenon can be interpreted as appearing because the step-size is adjusted to a lower level when the sparseness becomes too large in the proposed case. Figs. 22 (a) and (b) show the l2-norm of each row vector of H P when using the conventional and proposed cases, respectively. In this figure, only one specific basis power continues to increase in the conventional case because, if it initially converges erroneously, excessive sparsity continuously emphasizes the corresponding component. However, in the proposed case, the power of multiple bases converges to a lower level as the update is repeated, reducing the problem of excessive sparsity.
Figs. 23 (a) and (b) show NMF results using the conventional and proposed cases, respectively. While the conventional method emphasized only one non-target component, the proposed method successfully emphasized the target component because, as analyzed in Figs. 21 and 22, the proposed method adjusts the step-size according to sparseness.

VI. CONCLUSIONS
An NMF-based reverberation suppression method with a pretrained frequency basis matrix was proposed. The proposed method improved the initialization problem of the conventional method and the sparse update rule. The proposed method was verified by evaluating it using simulation and sea experiment data. In the simulation, the comparative analyses of spectrogram, normalized matched filter, and ROC curve are conducted and the proposed method showed superior detection performance than the conventional method when it converges in valid SRR conditions. In the spectrogram and normalized matching filter analysis using sea experimental data, the proposed method showed superior performance as expected in the simulation. Hyperparameter setting, pretrained reverberation basis mismatch issue, and new sparse update rule were discussed to analyze the proposed method from various viewpoints.

APPENDIX A REVIEW OF COST FUNCTION AND GRADIENT TERMS IN THE NMF METHOD
The cost function comprises three parts. The reconstruction error (RE) C RE is defined by the KL divergence, which is generally used in the NMF method as [17] C RE (W P , and its gradient terms are given as The temporal continuity (TC) is measured by assigning the cost of large changes between adjacent frames for each row component in H P . The TC cost function is defined as [23]  and its gradient terms are given as where H P ←1 , H P ←1 are variants of H P shifted one column to the left and right, respectively, and 1 N ×N is an N ×N matrix whose elements are all ones. Furthermore, A 2 indicates an element-wise square of matrix A. The temporal length limiting (TLL) is used to penalize activated components longer than the expected length of the target echo signal. The TLL cost function is defined as [12] and its gradient terms are given as where [A] (k,n) is the (k, n) element of matrix A, and l T is the expected target echo length. H P (k,n) is calculated using the moving sum of each target echo time basis, i.e., the rows of H P . It is expressed as

APPENDIX B DERIVATION OF SPARSE UPDATE RULE WITH ADJUSTING UPDATE RATE USING SPARSENESS
The original MU rule in (27) is a modified element-wise gradient descent algorithm, expressed as follows: where η (k,n) is the step-size parameter for updates. C, we can obtain (27). Using this process, if we appropriately control the step-size, we can reflect our intention.
Recently, Kim et al. [13] proposed a sparse constraint update rule. They implemented sparse constraints by setting the step-size parameter proportional to the power of each basis vector as where d r is the normalized power of each basis vector. It is expressed as where [H P ] r is the rth row vector of H P and · 2 indicates the l 2 -norm. Notably, we constrain each power of the echo time basis, not each element of H P . If we substitute (44) into (45), it yields the new element-wise update rule as (47) We can formulate these element-wise update rules in a matrix form as follows: where I is an identity matrix, ∇ + H P C and ∇ − H P C are positive and negative total gradient terms in (25) and (26), respectively. D is a diagonal matrix, comprising the normalized power of each basis vector. It can be expressed as where d is the normalized power vector and diag( · ) indicates a diagonal matrix. The problem of the conventional sparse update rule in (48) is that it can highlight local non-target components that appeared during the NMF update because of too much emphasis on sparseness. This phenomenon might cause an inability to find the target information, even if the NMF algorithm converges. Therefore, we proposed a new step-size parameter, which is the weighted sum of the original and sparse step-size parameters. The weighting ratio was determined by measuring the sparseness that appeared according to the NMF update.
The proposed step-size parameter is expressed as C (51) where S(d) is the sparseness of the updated matrix. From this equation, the original and sparse step-size parameters are combined via a sparsity term. As sparsity increases, the preceding weights become smaller; consequently, the sparse step-size has a smaller impact and vice versa.
The sparseness of basis power vector d can be measured as follows [26]: where L P is the number of elements in vector d, and | · | is the absolute value.
The modified step-size parameter shows that d r in (45) appears in a modified form (d r + S(d)(1 − d r )). Therefore, the proposed sparse update rule can be written as follows, with the new normalized power matrixD: LG Electronics, and from 2014 to 2018, he was an assistant professor at the Department of Electronics Engineering, Kyonggi University, Suwon, Republic of Korea. Since 2018, he has been an assistant professor (promoted to associate professor in 2020) at the School of Electronics Engineering, Kyungpook National University, Daegu, Republic of Korea. His research interests include acoustic, sound and music signal processing, array signal processing, and blind source separation. VOLUME 4, 2016