GlobalSIP 2014 Example Audio Files

Learning a concatenative resynthesis system for noise suppression [PDF]

Michael I Mandel, Young Suk Cho, and Yuxuan Wang

Abstract: This paper introduces a new approach to dictionary-based source separation employing a learned non-linear metric. In contrast to existing parametric source separation systems, this model is able to utilize a rich dictionary of speech signals. In contrast to previous dictionary-based source separation systems, the system can utilize perceptually relevant non-linear features of the noisy and clean audio. This approach utilizes a deep neural network (DNN) to predict whether a noisy chunk of audio contains a given clean chunk. Speaker-dependent experiments on the CHiME2-GRID corpus show that this model is able to accurately resynthesize clean speech from noisy observations. Preliminary listening tests show that the system's output has much higher audio quality than existing parametric systems trained on the same data, achieving noise suppression levels close to those of the original clean speech.

Audio files

Wav files from the CHiME2-GRID devel corpus evaluated in intelligibility and MUltiple Stimuli with Hidden Reference and Anchor(MUSHRA) listening tests.

Compares four systems:

Concat: the proposed concatenative resynthesis system
Concat No-trans: the proposed system using only the learned metric with no transition probabilities
Ideal ratio mask NN: a deep neural network trained to predict the ideal ratio mask from the noise input
Noise-to-clean NN: a deep neural network trained to predict the clean signal from the noise signal

File	SNR	Clean	Noisy	Concat	Concat No-trans	Ideal ratio mask NN	Noisy-to-clean NN
bbafzn	0dB
brbm4n	6dB
bwam8p	m6dB
lbwk6n	m3dB
pbwc7s	m6dB
sgwdzp	m6dB
bbizzp	m3dB
brbs8n	0dB
bwis6n	0dB
lbwr3a	3dB
pgwe7a	0dB
srbo2p	6dB
bgwb5a	m3dB
briz6p	9dB
lbaq2n	m6dB
lrwe8n	6dB
prbc9s	0dB
srin2n	3dB
brbg1s	m6dB
bwam7s	m3dB
lbip9s	6dB
lwik9a	9dB
prbxzn	m3dB
swih5s	6dB