TFDense-GAN: a Generative Adversarial Network for Single-Channel Speech Enhancement

Authors

Haoxiang Chen, Jinxiu Zhang, Yaogang Fu, Xintong Zhou, Ruilong Wang, Yanyan Xu, Dengfeng Ke

Abstract

Research indicates that operating on the spectrum in the time-frequency domain plays a crucial role in speech enhancement, as it extracts audio features more effectively and reduces computational cost. Among time-frequency domain methods, the introduction of attention mechanisms and the application of DenseBlock have yielded promising results. In particular, the Unet architecture, which comprises an encoder, a decoder, and a bottleneck, employs DenseBlock in both the encoder and the decoder to achieve powerful feature fusion with fewer parameters. To build on the advantages of these methods, we propose a Unet-based time-frequency domain denoising model called TFDense-Net. It uses our improved DenseBlock for feature extraction in both the encoder and the decoder, and an attention mechanism in the bottleneck for feature fusion and denoising. The model performs strongly on speech enhancement tasks, achieving significant improvements in the Si-SDR metric over other state-of-the-art models. To further improve denoising performance and enlarge the receptive field of the model, we introduce a multi-spectrogram discriminator based on multiple STFTs. Because the discriminator loss can capture correlations between spectra that traditional loss functions cannot detect, we train TFDense-Net as a generator against the multi-spectrogram discriminator, which significantly improves denoising performance; we name this enhanced model TFDense-GAN. We evaluate TFDense-Net and TFDense-GAN on two public datasets: the VCTK + DEMAND dataset and the Interspeech Deep Noise Suppression Challenge dataset. Experimental results show that TFDense-GAN outperforms most existing models in terms of STOI, PESQ, and Si-SDR, achieving state-of-the-art results.
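For readers unfamiliar with the Si-SDR metric mentioned in the abstract, the following is a minimal sketch of the commonly used scale-invariant signal-to-distortion ratio. It is not the authors' evaluation code: the function name `si_sdr`, the zero-mean normalization, the `eps` stabilizer, and the synthetic test signals are all assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample sequences.

    Illustrative sketch only; `eps` and the mean removal are assumptions,
    not details taken from the paper.
    """
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")

    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the optimal scaling.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)

    # Energy of the scaled target and of everything left over.
    target_energy = scale * scale * ref_energy
    residual_energy = sum((e - scale * r) ** 2 for e, r in zip(est, ref))

    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Synthetic signals (hypothetical, for demonstration only).
reference = [math.sin(0.05 * n) for n in range(2000)]
noise = [math.cos(0.37 * n) for n in range(2000)]
mild = [r + 0.1 * v for r, v in zip(reference, noise)]
heavy = [r + 0.5 * v for r, v in zip(reference, noise)]

mild_score = si_sdr(mild, reference)
heavy_score = si_sdr(heavy, reference)
rescaled_score = si_sdr([3.0 * x for x in mild], reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, while adding more noise lowers it. Higher values indicate better enhancement.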

Speech enhancement demo (on the Interspeech Deep Noise Suppression Challenge dataset)

fileid_14

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
GaGNet Output GaGNet Spectrogram
MFNet Output MFNet Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

fileid_48

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
GaGNet Output GaGNet Spectrogram
MFNet Output MFNet Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

fileid_88

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
GaGNet Output GaGNet Spectrogram
MFNet Output MFNet Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

fileid_112

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
GaGNet Output GaGNet Spectrogram
MFNet Output MFNet Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

fileid_156

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
GaGNet Output GaGNet Spectrogram
MFNet Output MFNet Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

fileid_285

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
GaGNet Output GaGNet Spectrogram
MFNet Output MFNet Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

Speech enhancement demo (on the VCTK + DEMAND dataset)

p232_055

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

p232_068

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

p232_221

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

p232_265

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

p232_266

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram

p232_320

Method Audio Spectrogram
Noisy Input Noisy Spectrogram
Clean Target Clean Spectrogram
FRCRN Output FRCRN Spectrogram
TFDense-GAN Output TFDenseGAN Spectrogram
TFDense-Net Output TFDenseNet Spectrogram