Ashutosh Pandey - IEEE Xplore Author Profile

Showing 1-19 of 19 results


Deep learning-based speech enhancement (SE) methods often face significant computational challenges when meeting low-latency requirements, because of the increased number of frames to be processed. This paper introduces the SlowFast framework, which aims to reduce computation costs specifically when low-latency enhancement is needed. The framework consists of a slow branch that analyzes the …
We present a streamlined framework for complex spectral masking that processes multichannel speech with minimal computational demands, enhancing both spectral magnitude and phase by integrating low-compute models with the Multi-Channel Wiener Filter (MCWF). Our methodology employs a two-stage, end-to-end training approach where a deep neural network (DNN) first estimates MCWF weights, followed by …
MetricGAN, a notable generative approach, provides an effective framework for training speech enhancement models to produce high metric scores. However, we identify two key limitations of current MetricGAN-family models, i.e., neglecting certain mainstream metrics during evaluation and conducting evaluation exclusively at high SNR. First, we comprehensively assess MetricGAN models using mainstream me…
We present a novel model designed for resource-efficient multichannel speech enhancement in the time domain, with a focus on low latency, lightweight design, and low computational requirements. The proposed model incorporates explicit spatial and temporal processing within deep neural network (DNN) layers. Inspired by frequency-dependent multichannel filtering, our spatial filtering process applies multi…
Continuous speaker separation aims to separate overlapping speakers in real-world environments like meetings, but it often falls short in isolating speech segments of a single speaker. This leads to split signals that adversely affect downstream applications such as automatic speech recognition and speaker diarization. Existing solutions like speaker counting have limitations. This paper presents …
We introduce a time-domain framework for efficient multichannel speech enhancement, emphasizing low latency and computational efficiency. This framework incorporates two compact deep neural networks (DNNs) surrounding a multichannel neural Wiener filter (NWF). The first DNN enhances the speech signal to estimate NWF coefficients, while the second DNN refines the output from the NWF. The NWF, while…
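The pipeline above centers on a classical multichannel Wiener solution with learned statistics. As a rough single-frequency-bin illustration in NumPy (not the paper's NWF; the statistics here come from oracle signals rather than a DNN's speech estimate, and all names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy multichannel scene: 4 mics, 1000 frames at one frequency bin.
mics, frames = 4, 1000
steer = rng.standard_normal(mics) + 1j * rng.standard_normal(mics)   # array response
s = rng.standard_normal(frames) + 1j * rng.standard_normal(frames)   # target speech
n = 0.1 * (rng.standard_normal((mics, frames))
           + 1j * rng.standard_normal((mics, frames)))               # noise
x = steer[:, None] * s[None, :] + n                                  # mixture

# Wiener solution w = R_x^{-1} r_xs; in the framework above these
# statistics would be derived from the first DNN's estimate.
R_x = x @ x.conj().T / frames          # mixture spatial covariance
r_xs = x @ s.conj() / frames           # mixture/target cross-correlation
w = np.linalg.solve(R_x, r_xs)

s_hat = w.conj() @ x  # filtered output; a second model would refine this

# The filter reduces error relative to a single steered microphone.
err_filtered = np.mean(np.abs(s - s_hat) ** 2)
err_mic0 = np.mean(np.abs(s - x[0] / steer[0]) ** 2)
assert err_filtered < err_mic0
```

The point of the sandwich architecture is that the linear filter stays interpretable and cheap, while the surrounding networks supply the statistics and clean up residual distortion.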
Dealing with speech interference in a speech enhancement system requires either speaker separation or target speaker extraction. Speaker separation has multiple output streams with arbitrary assignments, while target speaker extraction requires additional cueing for speaker selection. Neither of these is suitable for a standalone speech enhancement system with one output stream. In this study, we…
Processing latency is a critical issue for active noise control (ANC) due to the causality constraint of ANC systems. This paper addresses low-latency ANC in the context of deep learning (i.e., deep ANC). A time-domain method using an attentive recurrent network (ARN) is employed to perform deep ANC with smaller frame sizes, thus reducing the algorithmic latency of deep ANC. In addition, we introduce a…
In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), wh…
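The triple-path idea amounts to letting each path's sequence model scan a different axis of the same tensor. A hedged NumPy sketch of the reshaping involved (the shapes and the `path_view` helper are my own illustration, not TPARN's actual layers):

```python
import numpy as np

def path_view(x, axis):
    """Expose one axis as the sequence dimension and fold the others
    into a batch, which is how each path's RNN would see the data.
    x: (channels, chunks, frames, features); axis in {0, 1, 2}."""
    seq = np.moveaxis(x, axis, -2)                     # (..., seq_len, features)
    return seq.reshape(-1, x.shape[axis], x.shape[-1])

# 4 microphones, 6 chunks, 10 frames per chunk, 32 features.
x = np.zeros((4, 6, 10, 32))

intra = path_view(x, 2)    # dual-path part 1: frames within a chunk
inter = path_view(x, 1)    # dual-path part 2: across chunks
spatial = path_view(x, 0)  # the added third path: across microphones

assert intra.shape == (4 * 6, 10, 32)
assert inter.shape == (4 * 10, 6, 32)
assert spatial.shape == (6 * 10, 4, 32)
```

Each path thus reuses the same sequence-modeling machinery; only the axis it treats as "time" changes.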
Deep neural networks are often coupled with traditional spatial filters, such as MVDR beamformers, for effectively exploiting spatial information. Even though single-stage end-to-end supervised models can obtain impressive enhancement, combining them with a traditional beamformer and a DNN-based post-filter in multistage processing provides additional improvements. In this work, we propose a two-…
Deep neural networks (DNNs) represent the mainstream methodology for supervised speech enhancement, primarily due to their capability to model complex functions using hierarchical representations. However, a recent study revealed that DNNs trained on a single corpus fail to generalize to untrained corpora, especially in low signal-to-noise ratio (SNR) conditions. Developing a noise, speaker, and c…
In this work, we exploit speech enhancement for improving a recurrent neural network transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent network (DCRN) for complex spectral mapping based speech enhancement, and find it helpful for ASR in two ways: as a data augmentation technique, and as a preprocessing frontend. In using it for ASR data augmentation, we exploit a KL divergen…
Speech enhancement in the time domain has become increasingly popular in recent years, due to its capability to jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder- and decoder-based architecture with skip connections. Each layer in the encoder and t…
In recent years, supervised approaches using deep neural networks (DNNs) have become the mainstream for speech enhancement. It has been established that DNNs generalize well to untrained noises and speakers if trained using a large number of noises and speakers. However, we find that DNNs fail to generalize to new speech corpora in low signal-to-noise ratio (SNR) conditions. In this work, we estab…
In this work, we propose a fully convolutional neural network for real-time speech enhancement in the time domain. The proposed network is an encoder-decoder based architecture with skip connections. The layers in the encoder and the decoder are followed by densely connected blocks comprising dilated and causal convolutions. The dilated convolutions help in context aggregation at different reso…
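To illustrate what dilated, causal convolutions buy here, a minimal NumPy sketch (the `causal_dilated_conv` helper is illustrative only, not the proposed network's layer):

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """1-D causal dilated convolution: output[t] uses only
    x[t], x[t-d], x[t-2d], ... by left-padding with zeros,
    so no future sample leaks into the output."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        # taps at t, t-d, ..., t-(k-1)d in the padded signal
        taps = xp[t + pad - np.arange(k) * dilation]
        y[t] = taps @ kernel
    return y

x = np.arange(10, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])

y1 = causal_dilated_conv(x, kernel, dilation=2)

# Causality check: perturbing a future sample leaves earlier outputs unchanged.
x2 = x.copy()
x2[9] += 100.0
y2 = causal_dilated_conv(x2, kernel, dilation=2)
assert np.allclose(y1[:9], y2[:9])
```

Stacking such layers with dilations 1, 2, 4, ... grows the receptive field exponentially with depth, which is the multi-resolution context aggregation the abstract refers to, while causality keeps the network usable in real time.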
This paper proposes a new learning mechanism for a fully convolutional neural network (CNN) to address speech enhancement in the time domain. The CNN takes as input the time frames of a noisy utterance and outputs the time frames of the enhanced utterance. At training time, we add an extra operation that converts the time domain to the frequency domain. This conversion corresponds to simple matr…
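The time-to-frequency conversion mentioned can be expressed as a plain matrix multiplication, which is what makes a frequency-domain loss differentiable through a time-domain network. A small NumPy check of that equivalence (names are mine, not the paper's):

```python
import numpy as np

def dft_matrices(frame_len):
    """Real and imaginary DFT bases as ordinary matrices, so the
    time-to-frequency conversion is just a matmul that gradients
    can flow through."""
    n = np.arange(frame_len)
    k = n[:, None]  # frequency index as a column
    angle = 2.0 * np.pi * k * n / frame_len
    return np.cos(angle), -np.sin(angle)  # real basis, imaginary basis

frame_len = 8
cos_mat, sin_mat = dft_matrices(frame_len)

rng = np.random.default_rng(0)
frame = rng.standard_normal(frame_len)  # one time frame of network output

# Frequency-domain representation via matrix multiplication ...
real = cos_mat @ frame
imag = sin_mat @ frame

# ... matches the FFT, so a spectral loss on (real, imag) equals one
# computed with np.fft while remaining a fixed linear layer.
ref = np.fft.fft(frame)
assert np.allclose(real, ref.real)
assert np.allclose(imag, ref.imag)
```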
A recent study has demonstrated the effectiveness of complex-valued deep neural networks (CDNNs) using newly developed tools such as complex batch normalization and complex residual blocks. Motivated by the fact that CDNNs are well suited for the processing of complex-domain representations, we explore CDNNs for speech enhancement. In particular, we train a CDNN that learns to map the complex-valu…
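One common way such complex-domain processing is realized is as complex multiplication built from real-valued arrays, e.g. when applying a complex-valued mask to a noisy spectrogram. A minimal generic sketch (not necessarily the exact mapping this CDNN learns):

```python
import numpy as np

def apply_complex_mask(spec_r, spec_i, mask_r, mask_i):
    """Complex multiplication written with real-valued arrays, the
    form in which complex-valued layers are typically implemented:
    (a + jb)(c + jd) = (ac - bd) + j(ad + bc)."""
    out_r = spec_r * mask_r - spec_i * mask_i
    out_i = spec_r * mask_i + spec_i * mask_r
    return out_r, out_i

rng = np.random.default_rng(1)
noisy = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
mask = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))

est_r, est_i = apply_complex_mask(noisy.real, noisy.imag,
                                  mask.real, mask.imag)

# Matches native complex arithmetic: the real-valued formulation
# modifies magnitude and phase jointly.
assert np.allclose(est_r + 1j * est_i, noisy * mask)
```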
This work proposes a fully convolutional neural network (CNN) for real-time speech enhancement in the time domain. The proposed CNN is an encoder-decoder based architecture with an additional temporal convolutional module (TCM) inserted between the encoder and the decoder. We call this architecture a Temporal Convolutional Neural Network (TCNN). The encoder in the TCNN creates a low dimensional re…
Generative adversarial networks (GANs) are becoming increasingly popular for image processing tasks. Researchers have started using GANs for speech enhancement, but the advantage of using the GAN framework has not been established for speech enhancement. For example, a recent study reports encouraging enhancement results, but we find that the architecture of the generator used in the GAN gives be…