GlobalSIP 2014 Example Audio Files

Learning a concatenative resynthesis system for noise suppression [PDF]

Michael I Mandel, Young Suk Cho, and Yuxuan Wang

Abstract: This paper introduces a new approach to dictionary-based source separation employing a learned non-linear metric. In contrast to existing parametric source separation systems, this model is able to utilize a rich dictionary of speech signals. In contrast to previous dictionary-based source separation systems, the system can utilize perceptually relevant non-linear features of the noisy and clean audio. This approach utilizes a deep neural network (DNN) to predict whether a noisy chunk of audio contains a given clean chunk. Speaker-dependent experiments on the CHiME2-GRID corpus show that this model is able to accurately resynthesize clean speech from noisy observations. Preliminary listening tests show that the system's output has much higher audio quality than existing parametric systems trained on the same data, achieving noise suppression levels close to those of the original clean speech.
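For concreteness, here is a minimal sketch of the selection step described above, assuming the DNN outputs have already been computed: scores[t, d] is the network's estimate that noisy chunk t contains clean dictionary chunk d, and trans[d, d2] is a transition score between consecutive dictionary chunks. The variable names, additive scoring, and Viterbi-style decoding are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def select_chunks(scores, trans):
        """Viterbi-style choice of one clean dictionary chunk per noisy chunk.

        scores: (T, D) similarity of each noisy chunk to each dictionary chunk.
        trans:  (D, D) transition score from one dictionary chunk to the next.
        """
        T, D = scores.shape
        best = np.full((T, D), -np.inf)     # best cumulative score ending in chunk d at time t
        back = np.zeros((T, D), dtype=int)  # backpointers for the backtrace
        best[0] = scores[0]
        for t in range(1, T):
            cand = best[t - 1][:, None] + trans  # cand[i, j]: end at j via previous chunk i
            back[t] = cand.argmax(axis=0)
            best[t] = cand.max(axis=0) + scores[t]
        path = [int(best[-1].argmax())]          # backtrace the highest-scoring path
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    # Toy usage: 10 noisy chunks scored against a 50-chunk dictionary.
    rng = np.random.default_rng(0)
    print(select_chunks(rng.standard_normal((10, 50)), rng.standard_normal((50, 50))))

The "Concat No-trans" condition below roughly corresponds to dropping the trans term and taking the per-chunk argmax of scores alone.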

Audio files

WAV files from the CHiME2-GRID development set, evaluated in intelligibility and Multiple Stimuli with Hidden Reference and Anchor (MUSHRA) listening tests.
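As a rough illustration of how MUSHRA ratings of these files might be summarized (not the analysis used for the paper), the sketch below assumes a long-format CSV with columns listener, file, condition, and rating on the 0-100 MUSHRA scale; the file name and column names are hypothetical.

    import pandas as pd

    # Hypothetical long-format ratings file: one row per (listener, file, condition).
    ratings = pd.read_csv("mushra_ratings.csv")

    # Average each listener's ratings within a condition first, then across listeners.
    per_listener = ratings.groupby(["condition", "listener"])["rating"].mean()
    print(per_listener.groupby(level="condition").mean().sort_values(ascending=False))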

The listening tests compare four systems:

  • Concat: the proposed concatenative resynthesis system
  • Concat No-trans: the proposed system using only the learned metric with no transition probabilities
  • Ideal ratio mask NN: a deep neural network trained to predict the ideal ratio mask from the noisy input (see the sketch after this list)
  • Noisy-to-clean NN: a deep neural network trained to predict the clean signal directly from the noisy signal
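For reference, here is a minimal sketch of the ideal-ratio-mask training target mentioned for the third baseline, using one common IRM definition; the STFT parameters, mask exponent, and input features of the actual baseline are assumptions here, not details given on this page.

    import numpy as np

    def magnitude_spectrogram(x, n_fft=512, hop=128):
        """Naive Hann-windowed magnitude STFT, enough for illustration."""
        win = np.hanning(n_fft)
        frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
        return np.abs(np.fft.rfft(np.array(frames), axis=1))

    def ideal_ratio_mask(clean, noise, beta=0.5, eps=1e-8):
        """One common IRM definition: (|S|^2 / (|S|^2 + |N|^2))^beta per time-frequency bin."""
        S2 = magnitude_spectrogram(clean) ** 2
        N2 = magnitude_spectrogram(noise) ** 2
        return (S2 / (S2 + N2 + eps)) ** beta

In this kind of baseline, the network is trained to regress the mask from features of the noisy mixture; at test time the predicted mask scales the noisy magnitude spectrogram before resynthesis, whereas the noisy-to-clean network regresses the clean spectrogram directly.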
Each row lists one utterance and its mixing SNR; for each row, audio is provided in six conditions: Clean, Noisy, Concat, Concat No-trans, Ideal ratio mask NN, and Noisy-to-clean NN.

File      SNR
bbafzn     0 dB
brbm4n     6 dB
bwam8p    -6 dB
lbwk6n    -3 dB
pbwc7s    -6 dB
sgwdzp    -6 dB
bbizzp    -3 dB
brbs8n     0 dB
bwis6n     0 dB
lbwr3a     3 dB
pgwe7a     0 dB
srbo2p     6 dB
bgwb5a    -3 dB
briz6p     9 dB
lbaq2n    -6 dB
lrwe8n     6 dB
prbc9s     0 dB
srin2n     3 dB
brbg1s    -6 dB
bwam7s    -3 dB
lbip9s     6 dB
lwik9a     9 dB
prbxzn    -3 dB
swih5s     6 dB
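The six-character file names above are GRID utterance IDs, one character per word of the spoken sentence. A small helper to expand them, assuming the standard GRID naming convention (the digit zero is coded as "z"); this decoder is illustrative and not part of the corpus tooling.

    # Word inventories for each position of a GRID sentence:
    # command, color, preposition, letter, digit, adverb.
    GRID_WORDS = [
        {"b": "bin", "l": "lay", "p": "place", "s": "set"},
        {"b": "blue", "g": "green", "r": "red", "w": "white"},
        {"a": "at", "b": "by", "i": "in", "w": "with"},
        {},                      # letter position: kept as-is
        {"z": "zero"},           # digits 1-9 kept as-is, "z" means zero
        {"a": "again", "n": "now", "p": "please", "s": "soon"},
    ]

    def decode_utterance(utt_id):
        """Expand a 6-character GRID ID like 'bbafzn' to 'bin blue at f zero now'."""
        return " ".join(table.get(c, c) for c, table in zip(utt_id, GRID_WORDS))

    print(decode_utterance("bbafzn"))   # bin blue at f zero now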