Tianshu Qu - IEEE Xplore Author Profile

Showing 1-25 of 25 results

As artificial intelligence-generated content (AIGC) continues to evolve, video-to-audio (V2A) generation has emerged as a key area with promising applications in multimedia editing, augmented reality, and automated content creation. While Transformer and Diffusion models have advanced audio generation, a significant challenge persists in extracting precise semantic information from videos, as curr...
The Transformer model, particularly its cross-attention module, is widely used for feature fusion in target sound extraction, which extracts the signal of interest based on given clues. Despite its effectiveness, this approach suffers from low computational efficiency. Recent advancements in state space models, notably the recent Mamba model, have shown comparable performance to Transformer-based me...
Traditional feedback active noise control (ANC) algorithms are built upon linear filters, which leads to reduced performance when dealing with real-world noise. Deep learning-based feedback ANC algorithms have been proposed to overcome this problem. However, methods relying on pre-trained neural networks exhibit performance degradation when encountering noise from unseen scenes in the training...
Current Sound Event Localization and Detection (SELD) methods mainly adopt the output format of SELDnet, in which the Direction of Arrival (DOA) is predicted for each category rather than for each event; these methods therefore cannot handle the simultaneous occurrence of the same type of sound event in different directions. Although track-wise methods can detect such homogeneous overlap, they are still limi...
In a reverberant environment, interferences such as reflections and background noise can degrade the perception of the sound source signal. Although DNN-based methods have made a tremendous breakthrough in addressing this issue, the performance of these models is highly dependent on the completeness of the training dataset, which limits their generalization to unknown environments. In thi...
In current methods for sound field translation based on spherical harmonic (SH) analysis, the solution based on the addition theorem usually faces the problem of singular values caused by large matrix condition numbers. The influence of different distances and frequencies of the spherical radial function on the stability of the translation matrix will affect the accuracy of the SH coe...
This paper presents a multi-channel speech separation system for an unknown number of speakers. Using a single model, it can be applied to cases with different numbers of speakers through iterative speech separation based on beam signals. It first determines the spatial directions where speakers are located (Direction of Arrival, DOA), and then the beam signals in each direction are obtained with spectr...
The performance of higher-order Ambisonics (HOA) signals obtained using the spherical harmonics decomposition method is degraded by two primary sources of error: noise pollution in the low-frequency band and spatial aliasing in the high-frequency band. Inspired by the HOA signal upscaling method, which exploits the sparsity of the sound field, this paper proposes a sound field decompo...
Reverberation is generally considered harmful to speech intelligibility and degrades performance on speech-related tasks. However, in this work we propose a framework that takes advantage of the early reflections (ER), which are part of the reverberation, to tackle the speech enhancement problem. First, a fully convolutional neural network (FCNN) is introduced to estimate the directions of arrival (DOA) ...
The Ambisonic technique has been widely used in fields such as sound-field reproduction, sound source localization, and acoustic noise control. In this paper, a recently proposed Ambisonic decoding method, the partially matching projection method (PMPD), is described in detail and extended for more practical use with near-field sources and reverberant playback environments. Several experiments we...
Higher-order Ambisonics (HOA) is a 3D sound decomposition and reproduction technology that represents 3D sound fields based on a series of spherical harmonic functions. However, practical application of HOA technology is limited by narrow available bandwidth, low-frequency noise pollution, and high-frequency spatial aliasing. Inspired by the sparse recovery method, anti-spatial aliasing HOA encoding...
In recent years, deep neural networks have been applied in many fields. In this paper, a time-domain sound source localization method based on unsupervised learning is proposed, in which auto-encoder neural networks are adopted so that operations such as time-delay compensation can be removed and there is no need to prepare training data with precise alignment labels. In order to improve its performanc...
The lack of data is a major problem in individual HRTF modeling. There are many HRTF databases, but each database only has limited HRTFs with different characteristics, such as distance-dependent HRTFs or individual HRTFs. How to effectively model HRTFs through several different databases is an important task. In this paper, a method for predicting individual distance-dependent HRTFs using a few a...
The head-related transfer function (HRTF) plays an important role in the construction of 3D auditory displays. This article presents an individual HRTF modeling method using deep neural networks based on spatial principal component analysis. The HRTFs are represented by a small set of spatial principal components combined with frequency and individual-dependent weights. By estimating the spatial princi...
Although the traditional Permutation Invariant Training (PIT) approach has attracted much attention recently, it performs poorly on datasets with an unknown number of speakers. In this paper, we propose an approach based on beamforming and deep models (BDM) to solve this problem. BDM first estimates the number of speakers using a sound source localization algorithm and then enhances the ...
Traditional eigen-beam-based localization algorithms are usually not employed on non-spherical microphone arrays, for which the eigen beam is hard to obtain. In this paper, transfer functions are introduced to calculate the eigen beam on a non-spherical microphone array. Based on this, three localization algorithms including the eigen beam based intensity vector, eigen beam based ...
In recent years, much research has focused on sound source localization based on neural networks, which is an appealing but difficult problem. In this paper, a novel time-domain end-to-end method for sound source localization is proposed, in which the model is trained by two strategies with both cross-entropy loss and mean square error loss. Based on the idea of multi-task learning, a CNN is used as the sh...
The Ambisonic technique has recently been widely used for sound field recording and reproduction. However, the basic Ambisonic decoding method breaks down when the playback loudspeakers are distributed unevenly. Various methods have been proposed to solve this problem. This paper introduces several improvements to a recently proposed Ambisonic decoding method, the matching projection method, for un...
In this paper, a method for modeling distance-dependent head-related transfer functions (HRTFs) is presented. The HRTFs are first decomposed by spatial principal component analysis. Using deep neural networks, we model the spatial principal component weights at different distances, which allows HRTFs to be predicted at arbitrary spatial distances. The objective and subjective experiments are conducte...
When the number of sound sources is unknown, the traditional weighted MUSIC algorithm is usually implemented based on a sparsity assumption named W-Disjoint Orthogonality (WDO), which may not be suitable in many scenarios. In this paper, a modified weighted MUSIC algorithm is proposed to improve localization performance for multiple sound sources. Instead of using the maximum eigenvalue as the w...
The basic Ambisonics decoding method breaks down when the playback loudspeakers are distributed unevenly. This paper proposes a modified Ambisonics method, the matching projection decoding method, to solve this problem. The matching projection decoding method is a kind of greedy algorithm. It first calculates the projection value of the object Ambisonics signal over each Ambisonics signal ...
In an Ambisonics decoding system, the uneven placement of loudspeakers, the non-identical characteristics of each sound playback channel, and the reverberation of the listening room will destroy the spatial perception of the virtual sound. In this paper, an environment-adaptive loudspeaker calibration method is proposed for accurately reproducing the sound field using the Ambisonics system. The t...
A decorrelator is a module that restores the specific correlation properties between stereo signals in a parametric stereo audio decoder, which is vital for preserving the spatial information of the stereo signals. Generally, existing decorrelators are prone to a comb-filter effect, which results in an undesirable “metallic” sound, and produce temporal smearing such as pre- and post-echo artefacts. Thi...
The basis pursuit algorithm is one of the most popular methods of sparse coding. The goal of the algorithm is to represent a signal using as few coefficients as possible, which makes it suitable for acoustic signal compression. This paper presents a lossless coding/decoding method using the basis pursuit algorithm. In this method, wavelet packet bases were used to compose the dictionary because of their natu...
A measurement of head-related transfer functions (HRTFs) with high spatial resolution was carried out in this study. HRTF measurement is difficult in the proximal region because of the lack of an appropriate acoustic point source. In this paper, a modified spark gap was used as the acoustic sound source. Our evaluation experiments showed that the spark gap was more like an acoustic point source th...