WASPAA 2015 example audio files and listening test results
Audio super-resolution using concatenative resynthesis [PDF]
Michael I Mandel and Young Suk Cho
Abstract: This paper utilizes a recently introduced non-linear dictionary-based denoising system in another voice mapping task, that of transforming low-bandwidth, low-bitrate speech into high-bandwidth, high-quality speech. The system uses a deep neural network as a learned non-linear comparison function to drive unit selection in a concatenative synthesizer based on clean recordings. This neural network is trained to predict whether a given clean audio segment from the dictionary could be transformed into a given segment of the degraded observation. Speaker-dependent experiments on the small-vocabulary CHiME2-GRID corpus show that this model is able to resynthesize high-quality clean speech from degraded observations. Preliminary listening tests show that the system is able to improve subjective speech quality evaluations by up to 50 percentage points, while a similar system based on non-negative matrix factorization and trained on the same data produces no significant improvement.
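The unit-selection idea described above can be sketched as follows. This is a toy illustration, not the paper's system: the `match_score` function below stands in for the trained DNN comparison function, here faked as a distance computed only on the observable low-band features, and all names, dimensions, and data are hypothetical.

```python
import numpy as np

# Stand-in for the learned comparison function: scores how well a clean
# dictionary unit could explain a degraded (low-bandwidth) observation.
# Only the low-band features are compared, mimicking that the high band
# is missing from the degraded input. (Toy substitute for the DNN.)
def match_score(clean_seg, degraded_seg, low_band=4):
    return -np.sum((clean_seg[:low_band] - degraded_seg[:low_band]) ** 2)

def concatenative_resynthesis(degraded, dictionary):
    """For each degraded segment, select the best-matching clean unit
    from the dictionary and concatenate the selected units."""
    selected = []
    for seg in degraded:
        scores = [match_score(unit, seg) for unit in dictionary]
        selected.append(dictionary[int(np.argmax(scores))])
    return np.stack(selected)

# Illustrative data: 50 clean units with 8-dim features; the degraded
# input is built from 5 of them with the high band zeroed out.
rng = np.random.default_rng(0)
dictionary = rng.normal(size=(50, 8))
truth_idx = rng.integers(0, 50, size=5)
degraded = dictionary[truth_idx].copy()
degraded[:, 4:] = 0.0  # simulate bandwidth loss
resynth = concatenative_resynthesis(degraded, dictionary)
```

Because selection is driven entirely by the low band, the output units carry their full-bandwidth features, which is the mechanism by which concatenative resynthesis restores high-quality speech from a degraded observation.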
Wav files from the CHiME2-GRID development (devel) set, evaluated for intelligibility (Intel) and quality (Qual).
Compares 2 systems:
Concat: the proposed concatenative resynthesis system
NMF: a comparison system based on non-negative matrix factorization, trained on the same data