This is the course material for CSC 83060: Speech and Audio Understanding at the CUNY Graduate Center, as taught by Michael Mandel in Spring 2019.

See the course announcements below.

Topics (syllabus)

Note that this schedule might change, so check back frequently!

Date	Room	Content	Assignments due	Useful links
2019/01/25	4422	Introduction and DSP	Lyon:Ch1, Lyon:Ch2	Links01
		Part I: Fundamentals
2019/02/01	4422	Time-frequency and Acoustics	Lyon:Ch6, Lyon:Ch7	Links02, Datasets, Implementations
2019/02/08	4422	Auditory perception	Lyon:Ch4, Yudan: Allen and Berkley 1978	Links03
2019/02/15	4422	The auditory brain	Lyon:Ch20, Lyon:Ch23, Abhinav: Mesgarani et al. 2008
2019/02/22	4422	Deep learning, Troubleshooting	Lyon:Ch24, Subhadarshi: Cho et al. 2015	Troubleshooting neural networks
2019/03/01	4422	Project proposal presentations	Project proposal presentations
		Part II: Core machine listening topics
2019/03/08	4422	Speech models and speech synthesis	Ling et al 2015, Claire: Wang et al 2017	Synthesis tutorial, Links05
2019/03/15	4422	Speech recognition front ends, Jyothi Lec1, Jyothi Lec7, Jyothi Lec8	Young 1996, Hassan: Chan et al 2016	Jyothi cs753
2019/03/22	4422	Speech recognition language modeling, Jyothi Lec10, Jyothi Lec11	Mohri et al 2002, Rafae: Mikolov et al 2010
2019/03/29	4422	Speech recognition graphs and search, Jyothi Lec2, Jyothi Lec3, Jyothi Lec4, Jyothi Lec15, Jyothi Lec16, Jyothi Lec17	Vincent et al 2016	Vertanen 2005, Viterbi example
2019/04/05	4422	Music analysis and modeling	Schedl et al 2014 ch1 ch2 ch6, Sean: Raffel and Ellis 2015	Links09
2019/04/12	4422	Source separation and spatial sound, ICASSP 16 tutorial Vincent et al	Kumatani et al 2013, Patrick: Hershey et al 2015	Links10
2019/04/19		[No class]
2019/04/26		[No class]
2019/05/03	4422	Environmental sound analysis	Soumik: Gemmeke et al 2017, Enis: Hershey et al 2017, Stowell et al 2015
2019/05/10	4422	Final project presentations	Final project presentations
2019/05/17		[No class]	Final papers due

Textbooks (bookstore link)

Required: Richard Lyon (2018), Human and Machine Hearing: Extracting meaning from sound. Cambridge University Press. Author's corrected manuscript.
Required: Mark Gales and Steve Young (2007), "The application of hidden Markov models in speech recognition". Foundations and Trends in Signal Processing. Vol. 1, No. 3, pp 195-304.
Required: Markus Schedl, Emilia Gómez and Julián Urbano (2014), "Music Information Retrieval: Recent Developments and Applications". Foundations and Trends in Information Retrieval. Vol. 8, No. 2-3, pp 127-261.
Optional: Tuomas Virtanen, Rita Singh, and Bhiksha Raj (2012), Techniques for Noise Robustness in Automatic Speech Recognition. Wiley.
Optional: Ben Gold, Nelson Morgan, and Daniel Ellis (2011), Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Second edition. Wiley.
Optional: Alan V. Oppenheim and Ronald W. Schafer (2010), Discrete-Time Signal Processing. Third edition. Pearson. See also its companion website.

Announcements

2019/02/01: I rearranged the first few classes and readings a little bit.
2019/01/24: Welcome to class. The course website has been updated for this semester.
2019/01/15: Welcome to class. The course website has almost been updated for this semester.

Resources

See this article for instructions on viewing the detailed feedback I provide on your assignments in Blackboard.
The course website for this course from Fall 2016
