This is the course material for CSC 83060: Speech and Audio Understanding at the CUNY Graduate Center, as taught by Michael Mandel in Fall 2016.

See the course announcements below.

Topics (syllabus)

Note that this schedule might change, so check back frequently!

Date	Topic	Assignments	Readings
2016/08/26	Introduction
	Part I: Fundamentals
2016/09/02	Digital signal processing		Review: Oppenheim & Schafer, 2010 Discrete time signal processing, §2.1-2.9; Eric: Dolson, 1986 The phase vocoder: a tutorial; Michael: Puckette, 1995 The phase-locked vocoder
2016/09/09	[No class: Interspeech]
2016/09/16	Acoustics		Review: Schroeder, 1980 Acoustics in Human Communication; Soumi: Allen & Berkley, 1978 Image method for simulating room acoustics; Links: from lecture
2016/09/23	Auditory perception		Review: Evans, 1972 Properties of guinea pig auditory nerve fibers; Arwa: Mesgarani et al, 2008 Phoneme representation in auditory cortex
2016/09/30	Machine learning and neural networks		Review: Deng and Yu, 2014 Deep learning (Chapter 7 only: applications to speech and audio); Zhaoheng: Lee et al, 2009 Unsupervised feature learning for audio classification
2016/10/07	Project proposal presentations
2016/10/14	[No class: Tuesday schedule]
	Part II: Core machine listening topics
2016/10/21	Speech models and speech synthesis		Review: Tokuda et al, 2013 Speech synthesis based on HMMs; Jiyoung: Hunt & Black, 1996 Unit selection in concatenative speech synthesis; Links: Synthesis demos
2016/10/28	Speech recognition front ends (features, acoustic modeling, noise robustness)		Review: Gales and Young, 2007 HMMs for ASR, Ch 1, 2, 4, 5 (Intro only), 6 (Intro only); Cong: Povey et al, 2011 The Kaldi ASR Toolkit
2016/11/04	Speech recognition back ends (language modeling, search, finite state transducers)		Review: Mohri et al, 2008 Speech Recognition with wFSTs; Anh: Mikolov et al, 2011 The RNNLM Toolkit; Links: WFST in ASR slides
2016/11/11	Music analysis and modeling		Review: Schedl et al, 2014 Music Information Retrieval, Ch 1, 2, 6, intros to other chapters; Contribution: van den Oord et al, 2013 Deep content-based music recommendation
2016/11/18	Source separation and spatial sound		Review: Kumatani et al, 2013 Microphone array processing for distant speech recognition; Raj: Weng et al, 2014 Single channel mixed speech recognition using DNNs
2016/11/25	[No class: Thanksgiving]
2016/12/02	Environmental sound analysis		Review: Stowell et al, 2015 Detection and Classification of Audio Scenes and Events; Contribution: McDermott and Simoncelli, 2011 Sound texture perception via synthesis
2016/12/09	Final project presentations	Final project assignment
2016/12/16	Final papers due (no class)	Final project assignment

Recommended textbooks

"Speech and Audio Signal Processing: Processing and Perception of Speech and Music" by Ben Gold, Nelson Morgan, and Daniel Ellis. Second edition. Wiley, 2011.
"Discrete-Time Signal Processing" by Alan V. Oppenheim and Ronald W. Schafer. Third edition. Pearson, 2010. See also its companion website.

Announcements

2016/11/03: See this article for instructions on viewing the detailed feedback I provide on your assignments in Blackboard.
2016/08/26: Welcome to class. The course website has been updated again with a tentative schedule but an incomplete readings list.
2016/08/04: Welcome to class. The course website has been updated for this semester.