WASPAA 2015 example audio files and listening test results

Audio super-resolution using concatenative resynthesis [PDF]

Michael I Mandel and Young Suk Cho

Abstract: This paper utilizes a recently introduced non-linear dictionary-based denoising system in another voice mapping task, that of transforming low-bandwidth, low-bitrate speech into high-bandwidth, high-quality speech. The system uses a deep neural network as a learned non-linear comparison function to drive unit selection in a concatenative synthesizer based on clean recordings. This neural network is trained to predict whether a given clean audio segment from the dictionary could be transformed into a given segment of the degraded observation. Speaker-dependent experiments on the small-vocabulary CHiME2-GRID corpus show that this model is able to resynthesize high quality clean speech from degraded observations. Preliminary listening tests show that the system is able to improve subjective speech quality evaluations by up to 50 percentage points, while a similar system based on non-negative matrix factorization and trained on the same data produces no significant improvement.

Audio files

WAV files from the CHiME2-GRID development set, each evaluated for intelligibility (Intel) and quality (Qual).

Two systems are compared:

Concat: the proposed concatenative resynthesis system
NMF: NMF-based bandwidth enhancement from [11]

Reference

	Clean		Rev		Rev8k
File	Intel	Qual	Intel	Qual	Intel	Qual
bbafzn	100	96.3	100	70.9	100	66.4
bgwb5a	100	94.9	100	74.5	100	63.1
brbm4n	96	97.0	92	79.1	90	75.0
brbs8n	100	97.1	100	74.3	98	67.1
lbip9s	100	97.6	98	80.1	100	63.4
lbwr3a	100	96.3	100	72.8	100	63.0
lrwe8n	100	96.6	100	73.5	100	65.3
lwik9a	100	96.6	100	71.4	100	57.5
pgwe7a	100	97.1	100	75.6	100	66.0
prbxzn	100	94.5	100	74.5	98	66.3
sgwdzp	96	95.9	100	74.0	100	59.6
srin2n	100	95.9	100	73.3	100	61.9

Reverberant, Opus-encoded, 20% packet loss

	RevOpusL20		RevOpusL20Nmf		RevOpusL20Concat
File	Intel	Qual	Intel	Qual	Intel	Qual
bbafzn	92	22.5	83	24.5	98	68.8
bgwb5a	92	22.6	94	30.1	67	76.0
brbm4n	90	26.4	88	33.8	83	70.9
brbs8n	98	24.3	90	25.4	67	75.6
lbip9s	92	26.0	88	35.1	81	70.4
lbwr3a	98	24.3	100	27.0	96	84.0
lrwe8n	98	19.8	100	23.5	90	69.0
lwik9a	98	22.6	98	31.6	90	69.5
pgwe7a	96	26.1	92	31.4	83	69.8
prbxzn	100	22.6	100	25.1	85	75.3
sgwdzp	88	26.4	83	29.0	94	79.6
srin2n	94	23.3	98	25.6	94	71.5
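
The per-file scores above can be summarized with a simple average per condition. The following is a minimal sketch, not the authors' analysis: the function name `mean_scores` and the variable names are hypothetical, and it assumes an unweighted mean over the twelve files listed in this table.

```python
from statistics import mean

# Condition names, in the column order of the table above.
CONDITIONS = ["RevOpusL20", "RevOpusL20Nmf", "RevOpusL20Concat"]

# Rows copied from the table above: file, then (Intel, Qual) per condition.
ROWS = """
bbafzn 92 22.5 83 24.5 98 68.8
bgwb5a 92 22.6 94 30.1 67 76.0
brbm4n 90 26.4 88 33.8 83 70.9
brbs8n 98 24.3 90 25.4 67 75.6
lbip9s 92 26.0 88 35.1 81 70.4
lbwr3a 98 24.3 100 27.0 96 84.0
lrwe8n 98 19.8 100 23.5 90 69.0
lwik9a 98 22.6 98 31.6 90 69.5
pgwe7a 96 26.1 92 31.4 83 69.8
prbxzn 100 22.6 100 25.1 85 75.3
sgwdzp 88 26.4 83 29.0 94 79.6
srin2n 94 23.3 98 25.6 94 71.5
"""


def mean_scores(rows_text, conditions):
    """Return {condition: (mean Intel, mean Qual)} for a whitespace-separated table."""
    intel = {c: [] for c in conditions}
    qual = {c: [] for c in conditions}
    for line in rows_text.strip().splitlines():
        fields = line.split()
        values = [float(v) for v in fields[1:]]  # skip the file name
        for i, c in enumerate(conditions):
            intel[c].append(values[2 * i])       # Intel column of condition i
            qual[c].append(values[2 * i + 1])    # Qual column of condition i
    return {c: (mean(intel[c]), mean(qual[c])) for c in conditions}


summary = mean_scores(ROWS, CONDITIONS)
for name, (avg_intel, avg_qual) in summary.items():
    print(f"{name}: Intel {avg_intel:.1f}, Qual {avg_qual:.1f}")

# Quality gain of the concatenative system over the unprocessed degraded input.
gain = summary["RevOpusL20Concat"][1] - summary["RevOpusL20"][1]
print(f"Concat quality gain: {gain:.1f} points")
```

On these twelve files the mean quality rises from about 23.9 to about 73.4 with Concat, a gain of roughly 49.5 points, while NMF averages about 28.5. The same function can be applied to the other tables by swapping in their rows and condition names.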

Reverberant, AMR-encoded

	RevAmr		RevAmrNmf		RevAmrConcat
File	Intel	Qual	Intel	Qual	Intel	Qual
bbafzn	90	49.6	92	52.1	100	79.0
bgwb5a	100	53.8	100	52.3	100	79.4
brbm4n	83	54.3	85	48.0	75	67.6
brbs8n	98	55.0	90	54.6	100	77.6
lbip9s	100	52.4	100	48.1	100	71.3
lbwr3a	100	49.6	98	50.4	98	75.6
lrwe8n	100	51.9	98	43.6	90	83.4
lwik9a	94	52.4	98	47.3	98	72.5
pgwe7a	94	49.9	96	46.3	100	79.8
prbxzn	100	44.8	98	47.9	75	81.1
sgwdzp	96	49.0	96	48.8	83	73.9
srin2n	100	51.9	100	49.8	88	78.3

Clean, AMR-encoded

	CleanAmr		CleanAmrNmf		CleanAmrConcat
File	Intel	Qual	Intel	Qual	Intel	Qual
bbafzn	96	58.3	100	60.6	98	68.4
bgwb5a	100	67.9	98	62.3	83	65.3
brbm4n	90	64.1	96	69.3	79	61.4
brbs8n	100	64.0	96	67.8	83	59.3
lbip9s	100	57.1	100	65.8	81	58.0
lbwr3a	100	64.1	96	58.8	92	62.4
lrwe8n	100	65.1	100	65.8	79	59.5
lwik9a	98	64.3	100	59.0	56	61.0
pgwe7a	100	66.4	100	63.3	81	56.9
prbxzn	100	58.6	100	62.0	83	67.5
sgwdzp	92	67.8	92	60.9	83	56.8
srin2n	100	55.8	100	72.1	65	57.4