This is the course material for CSC 83060: Speech and Audio Understanding at the CUNY Graduate Center, as taught by Michael Mandel in Spring 2019.

See the course announcements below.

Topics (syllabus)

Note that this schedule might change, so check back frequently!

Date Room Content Assignments Readings due
2019/01/25 TBA Introduction
Part I: Fundamentals
2019/02/01 TBA Digital signal processing
2019/02/08 TBA Acoustics
2019/02/15 TBA Auditory perception
2019/02/22 TBA Machine learning and neural networks Lyon:Ch24
2019/03/01 TBA Project proposal presentations Project proposal presentations
Part II: Core machine listening topics
2019/03/08 TBA Speech models and speech synthesis
  • Zen et al 2009,
  • Ling et al 2015,
  • Tacotron2
2019/03/15 TBA Speech recognition front ends
  • Gales and Young 2007,
  • Hinton et al 2012
2019/03/22 TBA Speech recognition back ends Mohri et al 2008
2019/03/29 TBA Speech recognition noise robustness Virtanen et al 2012
2019/04/05 TBA Music analysis and modeling Schedl et al 2014 ch1 ch2 ch6
2019/04/12 TBA Source separation and spatial sound
  • Kumatani et al 2013,
  • Hershey et al 2015
2019/04/19 [No class]
2019/04/26 [No class]
2019/05/03 TBA Environmental sound analysis
  • Stowell et al 2015,
  • McDermott and Simoncelli 2011
2019/05/12 TBA Final project presentations Final project presentations
2019/05/19 [No class] Final papers due

Textbooks

Required
Richard Lyon (2018), Human and Machine Hearing: Extracting meaning from sound. Cambridge University Press. Author's corrected manuscript.
Required
Mark Gales and Steve Young (2007), "The application of hidden Markov models in speech recognition". Foundations and Trends in Signal Processing. Vol. 1, No. 3, pp 195-304.
Required
Markus Schedl, Emilia Gómez and Julián Urbano (2014), "Music Information Retrieval: Recent Developments and Applications", Foundations and Trends in Information Retrieval: Vol. 8: No. 2-3, pp 127-261.
Optional
Tuomas Virtanen, Rita Singh, and Bhiksha Raj (2012), Techniques for Noise Robustness in Automatic Speech Recognition. Wiley.
Optional
Ben Gold, Nelson Morgan, and Daniel Ellis (2011), Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Second edition. Wiley.
Optional
Alan V. Oppenheim and Ronald W. Schafer (2010), "Discrete-Time Signal Processing". Third Edition. Pearson. See also its companion website.

Announcements

2019/01/15
Welcome to class! The course website has almost been updated for this semester.

Resources

  • See this article for instructions on viewing the detailed feedback I provide on your assignments in Blackboard
  • The website for this course from Fall 2016