This is the course material for CSC 83060: Speech and Audio Understanding at the CUNY Graduate Center, as taught by Michael Mandel in Spring 2019.

See the course announcements below.

Topics (syllabus)

Note that this schedule might change, so check back frequently!

Date Room Content Assignments due Useful
2019/01/25 4422 Introduction and DSP Links01
Part I: Fundamentals
2019/02/01 4422 Time-frequency and Acoustics
2019/02/08 4422 Auditory perception Links03
2019/02/15 4422 The auditory brain
2019/02/22 4422 Troubleshooting neural networks
2019/03/01 4422 Project proposal presentations Project proposal presentations
Part II: Core machine listening topics
2019/03/08 4422 Speech models and speech synthesis
2019/03/15 4422 Jyothi cs753
2019/03/22 4422 Speech recognition back ends
2019/03/29 4422 Speech recognition noise robustness Virtanen et al 2012
2019/04/05 4422 Music analysis and modeling Schedl et al 2014 ch1 ch2 ch6
2019/04/12 4422 Source separation and spatial sound
  • Kumatani et al 2013
  • Lyon:Ch22
  • Hershey et al 2015
2019/04/19 [No class]
2019/04/26 [No class]
2019/05/03 4422 Environmental sound analysis
  • Stowell et al 2015
  • McDermott and Simoncelli 2011
2019/05/12 4422 Final project presentations Final project presentations
2019/05/19 [No class] Final papers due

Textbooks (bookstore link)

Required
Richard Lyon (2018), Human and Machine Hearing: Extracting meaning from sound. Cambridge University Press. Author's corrected manuscript.
Required
Mark Gales and Steve Young (2007), "The application of hidden Markov models in speech recognition". Foundations and Trends in Signal Processing. Vol. 1, No. 3, pp 195-304.
Required
Markus Schedl, Emilia Gómez and Julián Urbano (2014), "Music Information Retrieval: Recent Developments and Applications". Foundations and Trends in Information Retrieval. Vol. 8, No. 2-3, pp 127-261.
Optional
Tuomas Virtanen, Rita Singh, and Bhiksha Raj (2012), Techniques for Noise Robustness in Automatic Speech Recognition. Wiley.
Optional
Ben Gold, Nelson Morgan, and Daniel Ellis (2011), Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Second edition. Wiley.
Optional
Alan V. Oppenheim and Ronald W. Schafer (2010), Discrete-Time Signal Processing. Third edition. Pearson. See also its companion website.

Announcements

2019/02/01
I rearranged the first few classes and readings a little bit.
2019/01/24
Welcome to class! The course website has been updated for this semester.
2019/01/15
Welcome to class! The course website has almost been updated for this semester.

Resources

  • See this article for instructions on viewing the detailed feedback I provide on your assignments in Blackboard.
  • The website for this course from Fall 2016