WASPAA 2019 results

Parametric resynthesis with neural vocoders [pdf]

Soumi Maiti and Michael Mandel

Noise suppression systems generally produce output speech with compromised quality. We propose to use the high-quality speech generation capability of neural vocoders for noise suppression. We use a neural network to predict clean mel-spectrogram features from noisy speech and then compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model Chimera++. Both also achieve significantly better subjective quality ratings than the oracle Wiener mask. Between the two vocoders, WaveNet achieves the higher subjective quality scores, although at the cost of much slower waveform generation.
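For readers unfamiliar with the oracle Wiener mask baseline named above, the following is a minimal sketch of one common definition: for each time-frequency bin, the mask is the clean speech power divided by the sum of speech and noise power. It is called "oracle" because it requires the true clean speech and noise, which are unavailable in practice. The function names, the `eps` constant, and the toy values are illustrative assumptions, not taken from the paper.

```python
def oracle_wiener_mask(speech_power, noise_power, eps=1e-12):
    """Per-bin mask S / (S + N) for two equally shaped power spectrograms.

    Inputs are lists of frames, each frame a list of per-bin powers.
    eps (an assumption of this sketch) avoids division by zero in silent bins.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        mask.append([s / (s + n + eps) for s, n in zip(s_row, n_row)])
    return mask


def apply_mask(mask, mixture_magnitude):
    """Scale each mixture magnitude bin by the corresponding mask value."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_magnitude)
    ]


# Toy example with 2 frames and 3 frequency bins (hypothetical values).
speech_power = [[4.0, 1.0, 0.0], [9.0, 0.0, 1.0]]
noise_power = [[1.0, 1.0, 1.0], [1.0, 0.0, 3.0]]
mask = oracle_wiener_mask(speech_power, noise_power)
```

Mask values lie between 0 (bin dominated by noise) and 1 (bin dominated by speech). In a complete system the masked magnitudes would be combined with the mixture phase and inverted back to a waveform, a step this sketch omits.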

Audio files

Wav files from the LJ corpus mixed with noise from the CHiME-3 noise recordings
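A mixture of this kind is typically built by scaling the noise so that the speech-to-noise power ratio hits a target SNR, then adding it to the speech. The sketch below shows that arithmetic on plain Python lists. The function name `mix_at_snr` and the `snr_db` parameter are hypothetical, and this page does not state which SNRs were used for the samples.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling the noise to a target SNR in dB.

    speech and noise are sequences of samples; noise must be at least as
    long as speech and is truncated to the speech length.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise is silent; cannot scale to a target SNR")
    # Choose gain so that speech_power / (gain**2 * noise_power) = 10**(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals (hypothetical values): speech power 1.0, noise power 0.25.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed = mix_at_snr(speech, noise, snr_db=0.0)
```

At 0 dB the scaled noise has the same average power as the speech; lower values of `snr_db` give noisier mixtures.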

Neural parametric resynthesis systems

File | Noisy | PR-WaveNet | PR-WaveGlow | PR-WaveNet-Joint | PR-WaveGlow-Joint | Clean
LJ050-0276
LJ050-0273
LJ050-0253
LJ050-0251
LJ050-0246
LJ050-0242
LJ050-0239
LJ050-0235
LJ050-0223
LJ050-0206
LJ050-0196
LJ001-0008

Baseline systems

File | Noisy | PR-World | Chimera++ | Oracle Wiener | Clean
LJ050-0276
LJ050-0273
LJ050-0253
LJ050-0251
LJ050-0246
LJ050-0242
LJ050-0239
LJ050-0235
LJ050-0223
LJ050-0206
LJ050-0196
LJ001-0008