WASPAA 2015 example audio files and listening test results

Audio super-resolution using concatenative resynthesis [PDF]

Michael I Mandel and Young Suk Cho

Abstract: This paper utilizes a recently introduced non-linear dictionary-based denoising system in another voice mapping task, that of transforming low-bandwidth, low-bitrate speech into high-bandwidth, high-quality speech. The system uses a deep neural network as a learned non-linear comparison function to drive unit selection in a concatenative synthesizer based on clean recordings. This neural network is trained to predict whether a given clean audio segment from the dictionary could be transformed into a given segment of the degraded observation. Speaker-dependent experiments on the small-vocabulary CHiME2-GRID corpus show that this model is able to resynthesize high quality clean speech from degraded observations. Preliminary listening tests show that the system is able to improve subjective speech quality evaluations by up to 50 percentage points, while a similar system based on non-negative matrix factorization and trained on the same data produces no significant improvement.
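The unit-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `similarity` function below is a hand-written stand-in for the trained neural network, the two-dimensional "chunks" are made-up toy features, and the greedy per-chunk choice ignores any smoothness or transition constraint a real concatenative synthesizer may apply. All names are hypothetical.

```python
import math


def similarity(clean_chunk, degraded_chunk):
    """Stand-in for the learned comparison function (a DNN in the paper).

    Assumption for illustration only: the score decays with the squared
    Euclidean distance between the two feature vectors.
    """
    dist2 = sum((c - d) ** 2 for c, d in zip(clean_chunk, degraded_chunk))
    return math.exp(-dist2)


def select_units(degraded_chunks, dictionary):
    """For each degraded chunk, return the index of the best-scoring clean chunk."""
    selected = []
    for degraded in degraded_chunks:
        scores = [similarity(clean, degraded) for clean in dictionary]
        best = max(range(len(dictionary)), key=scores.__getitem__)
        selected.append(best)
    return selected


# Toy dictionary of clean chunks and a toy degraded observation.
dictionary = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
observation = [[0.9, 1.2], [0.1, -0.1], [1.8, 0.4]]

indices = select_units(observation, dictionary)
# The output is built only from clean dictionary entries, in selected order.
resynthesis = [dictionary[i] for i in indices]
print(indices)
```

The point of the design is that the output contains only clean dictionary material; the degraded signal is used solely to decide which clean units to concatenate.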

Audio files

WAV files from the CHiME2-GRID devel corpus, each evaluated for intelligibility (Intel) and quality (Qual).

Two systems are compared:

  • Concat: the proposed concatenative resynthesis system
  • NMF: NMF-based bandwidth enhancement from [11]

Reference

        Clean             Rev               Rev8k
File    Intel  Qual       Intel  Qual       Intel  Qual
bbafzn    100  96.3         100  70.9         100  66.4
bgwb5a    100  94.9         100  74.5         100  63.1
brbm4n     96  97.0          92  79.1          90  75.0
brbs8n    100  97.1         100  74.3          98  67.1
lbip9s    100  97.6          98  80.1         100  63.4
lbwr3a    100  96.3         100  72.8         100  63.0
lrwe8n    100  96.6         100  73.5         100  65.3
lwik9a    100  96.6         100  71.4         100  57.5
pgwe7a    100  97.1         100  75.6         100  66.0
prbxzn    100  94.5         100  74.5          98  66.3
sgwdzp     96  95.9         100  74.0         100  59.6
srin2n    100  95.9         100  73.3         100  61.9

Reverberant, Opus-encoded, 20% packet loss

        RevOpusL20        RevOpusL20Nmf     RevOpusL20Concat
File    Intel  Qual       Intel  Qual       Intel  Qual
bbafzn     92  22.5          83  24.5          98  68.8
bgwb5a     92  22.6          94  30.1          67  76.0
brbm4n     90  26.4          88  33.8          83  70.9
brbs8n     98  24.3          90  25.4          67  75.6
lbip9s     92  26.0          88  35.1          81  70.4
lbwr3a     98  24.3         100  27.0          96  84.0
lrwe8n     98  19.8         100  23.5          90  69.0
lwik9a     98  22.6          98  31.6          90  69.5
pgwe7a     96  26.1          92  31.4          83  69.8
prbxzn    100  22.6         100  25.1          85  75.3
sgwdzp     88  26.4          83  29.0          94  79.6
srin2n     94  23.3          98  25.6          94  71.5
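As a rough cross-check of the quality gain quoted in the abstract, the sketch below averages the Qual scores of the twelve files listed for this condition. It assumes a plain unweighted mean over these example files; the statistic reported in the paper may be computed differently (for instance over more files or per listener), so the result is only indicative. The variable names are illustrative.

```python
# Qual scores for the reverberant, Opus-encoded, 20% packet loss condition,
# in the file order of the table (bbafzn ... srin2n).
qual = {
    "RevOpusL20": [22.5, 22.6, 26.4, 24.3, 26.0, 24.3,
                   19.8, 22.6, 26.1, 22.6, 26.4, 23.3],
    "RevOpusL20Nmf": [24.5, 30.1, 33.8, 25.4, 35.1, 27.0,
                      23.5, 31.6, 31.4, 25.1, 29.0, 25.6],
    "RevOpusL20Concat": [68.8, 76.0, 70.9, 75.6, 70.4, 84.0,
                         69.0, 69.5, 69.8, 75.3, 79.6, 71.5],
}

# Unweighted mean quality per system (an assumption, see the text above).
mean_qual = {name: sum(scores) / len(scores) for name, scores in qual.items()}

# Gain of each enhancement system over the unprocessed degraded input.
baseline = mean_qual["RevOpusL20"]
gain = {name: mean_qual[name] - baseline
        for name in ("RevOpusL20Nmf", "RevOpusL20Concat")}

for name, value in gain.items():
    print(f"{name}: {value:+.1f} points")
```

On these twelve files the mean quality gain is about 49.5 points for Concat and about 4.6 points for NMF.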

Reverberant, AMR-encoded

        RevAmr            RevAmrNmf         RevAmrConcat
File    Intel  Qual       Intel  Qual       Intel  Qual
bbafzn     90  49.6          92  52.1         100  79.0
bgwb5a    100  53.8         100  52.3         100  79.4
brbm4n     83  54.3          85  48.0          75  67.6
brbs8n     98  55.0          90  54.6         100  77.6
lbip9s    100  52.4         100  48.1         100  71.3
lbwr3a    100  49.6          98  50.4          98  75.6
lrwe8n    100  51.9          98  43.6          90  83.4
lwik9a     94  52.4          98  47.3          98  72.5
pgwe7a     94  49.9          96  46.3         100  79.8
prbxzn    100  44.8          98  47.9          75  81.1
sgwdzp     96  49.0          96  48.8          83  73.9
srin2n    100  51.9         100  49.8          88  78.3

Clean, AMR-encoded

        CleanAmr          CleanAmrNmf       CleanAmrConcat
File    Intel  Qual       Intel  Qual       Intel  Qual
bbafzn     96  58.3         100  60.6          98  68.4
bgwb5a    100  67.9          98  62.3          83  65.3
brbm4n     90  64.1          96  69.3          79  61.4
brbs8n    100  64.0          96  67.8          83  59.3
lbip9s    100  57.1         100  65.8          81  58.0
lbwr3a    100  64.1          96  58.8          92  62.4
lrwe8n    100  65.1         100  65.8          79  59.5
lwik9a     98  64.3         100  59.0          56  61.0
pgwe7a    100  66.4         100  63.3          81  56.9
prbxzn    100  58.6         100  62.0          83  67.5
sgwdzp     92  67.8          92  60.9          83  56.8
srin2n    100  55.8         100  72.1          65  57.4