Theses, Chapters

[1] Michael Mandel, Justin Salamon, and Daniel P. W. Ellis, editors. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019). New York University, NY, USA, October 2019. [ bib | DOI ]
[2] Michael I Mandel, Shoko Araki, and Tomohiro Nakatani. Multichannel clustering and classification approaches. In Emmanuel Vincent, Tuomas Virtanen, and Sharon Gannot, editors, Audio Source Separation and Speech Enhancement, chapter 12. Wiley, 2018. [ bib ]
[3] Michael I Mandel and Jon P Barker. Multichannel spatial clustering using model-based source separation. In Shinji Watanabe, Marc Delcroix, Florian Metze, and John R. Hershey, editors, New Era for Robust Speech Recognition: Exploiting Deep Learning, chapter 3. Springer, 2017. [ bib | DOI ]
[4] Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Michael Mandel, Liang Lu, John R. Hershey, Michael L. Seltzer, Guoguo Chen, Yu Zhang, and Dong Yu. Discriminative beamforming with phase-aware neural networks for speech enhancement and recognition. In Shinji Watanabe, Marc Delcroix, Florian Metze, and John R. Hershey, editors, New Era for Robust Speech Recognition: Exploiting Deep Learning, chapter 4. Springer, 2017. [ bib | DOI ]
[5] Johanna Devaney, Michael I Mandel, Douglas Turnbull, and George Tzanetakis, editors. Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR). New York, 2016. [ bib | http ]
[6] Thierry Bertin-Mahieux, Douglas Eck, and Michael I. Mandel. Automatic tagging of audio: The state-of-the-art. In Wenwu Wang, editor, Machine Audition: Principles, Algorithms and Systems, chapter 14, pages 334--352. IGI Publishing, 2010. [ bib ]
[7] Michael I. Mandel. Binaural Model-Based Source Separation and Localization. PhD thesis, Columbia University, February 2010. [ bib | .pdf | Abstract ]

Journal

[1] Viet Anh Trinh and Michael I Mandel. Directly comparing the listening strategies of humans and machines. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:312--323, 2021. [ bib | DOI | Abstract ]
[2] Michael I Mandel, Vikas Grover, Mengxuan Zhao, Jiyoung Choi, and Valerie Shafer. The bubble-noise technique for speech perception research. Perspectives of the ASHA Special Interest Groups, 4(6):1653--1666, 2019. [ bib | DOI | Abstract ]
[3] Michael I Mandel, Sarah E Yoho, and Eric W Healy. Measuring time-frequency importance functions of speech with bubble noise. Journal of the Acoustical Society of America, 140:2542--2553, 2016. [ bib | DOI | Code | .pdf | Abstract ]
[4] Hugo Larochelle, Michael I Mandel, Razvan Pascanu, and Yoshua Bengio. Learning algorithms for the classification restricted Boltzmann machine. Journal of Machine Learning Research, 13:643--669, March 2012. [ bib | .pdf | Abstract ]
[5] Ron Weiss, Michael I. Mandel, and Daniel P. W. Ellis. Combining localization cues and source model constraints for binaural source separation. Speech Communication, 53(5):606--621, May 2011. [ bib | DOI | .pdf | Abstract ]
[6] Michael I. Mandel, Razvan Pascanu, Douglas Eck, Yoshua Bengio, Luca M. Aiello, Rossano Schifanella, and Filippo Menczer. Contextual tag inference. ACM Transactions on Multimedia Computing, Communications and Applications, 7S(1):32:1--32:18, October 2011. [ bib | DOI | .pdf | Abstract ]
[7] Johanna Devaney, Michael I. Mandel, Daniel P. W. Ellis, and Ichiro Fujinaga. Automatically extracting performance data from recordings of trained singers. Psychomusicology: Music, Mind & Brain, 21(1--2):108--136, 2012. [ bib | .pdf | Abstract ]
[8] Michael I. Mandel, Scott Bressler, Barbara Shinn-Cunningham, and Daniel P. W. Ellis. Evaluating source separation algorithms with reverberant speech. IEEE Transactions on Audio, Speech, and Language Processing, 18(7):1872--1883, 2010. [ bib | DOI | .pdf | Abstract ]
[9] Michael I. Mandel, Ron J. Weiss, and Daniel P. W. Ellis. Model-based expectation maximization source separation and localization. IEEE Transactions on Audio, Speech, and Language Processing, 18(2):382--394, February 2010. [ bib | DOI | .pdf | Abstract ]
[10] Michael I. Mandel and Daniel P. W. Ellis. A web-based game for collecting music metadata. Journal of New Music Research, 37(2):151--165, 2008. [ bib | DOI | .pdf | Abstract ]
[11] Thomas S. Huang, Charlie K. Dagli, Shyamsundar Rajaram, Edward Y. Chang, Michael I. Mandel, Graham E. Poliner, and Daniel P. W. Ellis. Active learning for interactive multimedia retrieval. Proceedings of the IEEE, 96(4):648--667, 2008. [ bib | DOI | Abstract ]
[12] Michael I. Mandel, Graham E. Poliner, and Daniel P. W. Ellis. Support vector machine active learning for music retrieval. Multimedia systems, 12(1):1--11, August 2006. [ bib | DOI | .pdf | Abstract ]

Conference

[1] Enis Berk Çoban, Megan Perra, and Michael I Mandel. Towards high resolution weather monitoring with sound data. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024. To appear. [ bib | Abstract ]
[2] Ali Raza Syed and Michael I Mandel. Estimating Shapley values of training utterances for automatic speech recognition models. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2023. [ bib ]
[3] Viet Anh Trinh, Hassan Salami Kavaki, and Michael I Mandel. ImportantAug: a data augmentation agent for speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2022. [ bib ]
[4] Enis Berk Çoban, Megan Perra, Dara Pir, and Michael I Mandel. EDANSA-2019: the ecoacoustic dataset from Arctic North Slope Alaska. In Workshop on the Detection and Classification of Acoustic Scenes and Events (DCASE), 2022. [ bib ]
[5] Enis Berk Çoban, Ali R Syed, Dara Pir, and Michael I Mandel. Towards large scale ecoacoustic monitoring with small amounts of labeled data. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021. [ bib ]
[6] Zhaoheng Ni, Yong Xu, Meng Yu, Bo Wu, Shixiong Zhang, Dong Yu, and Michael I Mandel. WPD++: an improved neural beamformer for simultaneous speech separation and dereverberation. In IEEE Spoken Language Technology Workshop, 2020. [ bib ]
[7] Hassan Salami Kavaki and Michael I Mandel. Identifying important time-frequency locations in continuous speech utterances. In Proceedings of Interspeech, pages 1639--1643, 2020. [ bib | DOI | .pdf | Abstract ]
[8] Viet Anh Trinh and Michael I. Mandel. Large scale evaluation of importance maps in automatic speech recognition. In Proceedings of Interspeech, pages 1166--1170, 2020. [ bib | DOI | .pdf | Abstract ]
[9] Hussein Ghaly and Michael I Mandel. Using prosody to improve dependency parsing. In Speech Prosody, 2020. [ bib ]
[10] Enis Berk Çoban, Dara Pir, Richard So, and Michael I Mandel. Transfer learning from YouTube soundtracks to tag Arctic ecoacoustic recordings. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 726--730, 2020. [ bib | DOI | .pdf | Abstract ]
[11] Soumi Maiti and Michael I Mandel. Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 206--210, 2020. [ bib | DOI | arXiv | Demo | Slides | .pdf | Abstract ]
[12] Zhaoheng Ni and Michael I Mandel. Mask-dependent phase estimation for monaural speaker separation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020. [ bib | arXiv | .pdf | Abstract ]
[13] Soumi Maiti and Michael I Mandel. Parametric resynthesis with neural vocoders. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 303--307, 2019. [ bib | DOI | arXiv | Demo | .pdf ]
[14] Soumi Maiti and Michael I Mandel. Speech denoising by parametric resynthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 6995--6999, 2019. [ bib | DOI | Demo | Poster | .pdf | Abstract ]
[15] Viet Anh Trinh, Brian McFee, and Michael I Mandel. Bubble cooperative networks for identifying important speech cues. In Proceedings of Interspeech, pages 1616--1620, 2018. [ bib | DOI | Poster | .pdf | Abstract ]
[16] Ali Raza Syed, Viet Anh Trinh, and Michael I. Mandel. Concatenative resynthesis with improved training signals for speech enhancement. In Proceedings of Interspeech, pages 1195--1199, 2018. [ bib | DOI | Poster | .pdf | Abstract ]
[17] Soumi Maiti, Joey Ching, and Michael I. Mandel. Large vocabulary concatenative resynthesis. In Proceedings of Interspeech, pages 1190--1194, 2018. [ bib | DOI | Poster | .pdf | Abstract ]
[18] Soumi Maiti and Michael I Mandel. Concatenative resynthesis using twin networks. In Proceedings of Interspeech, pages 3647--3651, 2017. [ bib | DOI | .pdf | Abstract ]
[19] Ali Syed, Andrew Rosenberg, and Michael I Mandel. Active learning for low-resource speech recognition: Impact of selection size and language modeling data. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2017. [ bib | .pdf | Abstract ]
[20] Johanna Devaney and Michael I Mandel. An evaluation of score-informed methods for estimating fundamental frequency and power from polyphonic audio. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2017. [ bib | .pdf | Abstract ]
[21] Michael I Mandel and Jon P Barker. Multichannel spatial clustering for robust far-field automatic speech recognition in mismatched conditions. In Proceedings of Interspeech, pages 1991--1995, 2016. [ bib | DOI | Slides | .pdf | Abstract ]
[22] Michael I Mandel. Directly comparing the listening strategies of humans and machines. In Proceedings of Interspeech, pages 660--664, 2016. [ bib | DOI | Poster | .pdf | Abstract ]
[23] Hakan Erdogan, John Hershey, Shinji Watanabe, Michael I Mandel, and Jonathan Le Roux. Improved MVDR beamforming using single-channel mask prediction networks. In Proceedings of Interspeech, pages 1981--1985, 2016. [ bib | DOI | .pdf | Abstract ]
[24] Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Liang Lu, John Hershey, Michael L Seltzer, Guoguo Chen, Yu Zhang, Michael Mandel, and Dong Yu. Deep beamforming networks for multi-channel speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 5745--5749. IEEE, March 2016. [ bib | DOI | .pdf | Abstract ]
[25] Deblin Bagchi, Michael I Mandel, Zhongqiu Wang, Yanzhang He, Andrew Plummer, and Eric Fosler-Lussier. Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, pages 496--503, 2015. [ bib | DOI | .pdf | Abstract ]
[26] Sreyas Srimath Tirumala and Michael I Mandel. Exciting estimated clean spectra for speech resynthesis. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015. [ bib | Poster | .pdf | Abstract ]
[27] Michael I Mandel and Young Suk Cho. Audio super-resolution using concatenative resynthesis. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015. [ bib | Demo | Slides | .pdf | Abstract ]
[28] Michael I Mandel and Nicoleta Roman. Enforcing consistency in spectral masks using Markov random fields. In Proceedings of EUSIPCO, pages 2028--2032, 2015. [ bib | .pdf | Abstract ]
[29] Michael I Mandel, Young-Suk Cho, and Yuxuan Wang. Learning a concatenative resynthesis system for noise suppression. In Proceedings of the IEEE GlobalSIP conference, 2014. [ bib | Demo | Poster | .pdf | Abstract ]
[30] Michael I Mandel, Sarah E Yoho, and Eric W Healy. Generalizing time-frequency importance functions across noises, talkers, and phonemes. In Proceedings of Interspeech, 2014. [ bib | Poster | .pdf | Abstract ]
[31] Michael I Mandel and Arun Narayanan. Analysis-by-synthesis feature estimation for robust automatic speech recognition using spectral masks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2014. [ bib | Poster | .pdf | Abstract ]
[32] Arnab Nandi, Lilong Jiang, and Michael I Mandel. Gestural query specification. In Proceedings of the International Conference on Very Large Data Bases, volume 7, 2014. [ bib | Slides | .pdf | Abstract ]
[33] Michael I. Mandel. Learning an intelligibility map of individual utterances. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013. [ bib | .pdf | Abstract ]
[34] Nicoleta Roman and Michael Mandel. Classification based binaural dereverberation. In Proceedings of Interspeech, 2013. [ bib | Abstract ]
[35] Johanna Devaney, Michael I. Mandel, and Ichiro Fujinaga. A study of intonation in three-part singing using the automatic music performance analysis and comparison toolkit (AMPACT). In Proceedings of the International Society for Music Information Retrieval conference, 2012. [ bib | .pdf | Abstract ]
[36] Johanna Devaney, Michael I. Mandel, and Ichiro Fujinaga. Characterizing singing voice fundamental frequency trajectories. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 73--76, October 2011. [ bib | Poster | .pdf | Abstract ]
[37] Michael I. Mandel, Douglas Eck, and Yoshua Bengio. Learning tags that vary within a song. In Proceedings of the International Society for Music Information Retrieval conference, pages 399--404, August 2010. [ bib | Slides | .pdf | Abstract ]
[38] James Bergstra, Michael I. Mandel, and Douglas Eck. Scalable genre and tag prediction with spectral covariance. In Proceedings of the International Society for Music Information Retrieval conference, pages 507--512, August 2010. [ bib | .pdf | Abstract ]
[39] Michael I. Mandel and Daniel P. W. Ellis. The ideal interaural parameter mask: a bound on binaural separation systems. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 85--88, October 2009. [ bib | DOI | Poster | .pdf | Abstract ]
[40] Johanna Devaney, Michael I. Mandel, and Daniel P. W. Ellis. Improving MIDI-audio alignment with acoustic features. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 45--48, October 2009. [ bib | DOI | .pdf | Abstract ]
[41] Edith Law, Kris West, Michael I Mandel, Mert Bay, and J. Stephen Downie. Evaluation of algorithms using games: the case of music annotation. In Proceedings of the International Society for Music Information Retrieval conference, pages 387--392, October 2009. [ bib | .pdf | Abstract ]
[42] Ron J. Weiss, Michael I. Mandel, and Daniel P. W. Ellis. Source separation based on binaural cues and source model constraints. In Proceedings of Interspeech, pages 419--422, September 2008. [ bib | Demo | .pdf | Abstract ]
[43] Michael I. Mandel and Daniel P. W. Ellis. Multiple-instance learning for music information retrieval. In Proceedings of the International Society for Music Information Retrieval conference, pages 577--582, September 2008. [ bib | Poster | .pdf | Abstract ]
[44] Daniel P. W. Ellis, Courtenay V. Cotton, and Michael I. Mandel. Cross-correlation of beat-synchronous representations for music similarity. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 57--60, April 2008. [ bib | DOI | .pdf | Abstract ]
[45] Michael I. Mandel and Daniel P. W. Ellis. EM localization and separation using interaural level and phase cues. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 275--278, October 2007. [ bib | DOI | Poster | .pdf | Abstract ]
[46] Michael I. Mandel and Daniel P. W. Ellis. A web-based game for collecting music metadata. In Simon Dixon, David Bainbridge, and Rainer Typke, editors, Proceedings of the International Society for Music Information Retrieval conference, pages 365--366, September 2007. [ bib | Poster | .pdf | Abstract ]
[47] Michael I. Mandel, Daniel P. W. Ellis, and Tony Jebara. An EM algorithm for localizing multiple sound sources in reverberant environments. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, pages 953--960. MIT Press, Cambridge, MA, 2007. [ bib | Poster | .pdf | Abstract ]
[48] Michael I. Mandel and Daniel P. W. Ellis. Song-level features and support vector machines for music classification. In Joshua D. Reiss and Geraint A. Wiggins, editors, Proceedings of the International Society for Music Information Retrieval conference, pages 594--599, September 2005. [ bib | Poster | .pdf | Abstract ]
[49] Erik B. Sudderth, Michael I. Mandel, William T. Freeman, and Alan S. Willsky. Distributed occlusion reasoning for tracking with nonparametric belief propagation. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems, pages 1369--1376. MIT Press, Cambridge, MA, 2005. [ bib | Demo | .pdf | Abstract ]

Other

[1] Eleanor Davol, Natalie Boelman, Todd Brinkman, Carissa Brown, Glen Liston, Michael Mandel, Enis Coban, Megan Perra, Kirsten Reid, Scott Leorna, et al. Automated soundscape analysis reveals strong influence of time since wildfire on boreal breeding birds. In AGU Fall Meeting Abstracts, volume 2021, abstract B23C-03, 2021. [ bib ]
[2] Zhaoheng Ni, Felix Grezes, Viet Anh Trinh, and Michael I Mandel. Improved MVDR beamforming using LSTM speech models to clean spatial clustering masks, 2020. [ bib | arXiv | .pdf ]
[3] Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, and Michael Mandel. Enhancement of spatial clustering-based time-frequency masks using LSTM neural networks, 2020. [ bib | arXiv | .pdf ]
[4] Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, and Michael Mandel. Combining spatial clustering with LSTM speech models for multichannel speech enhancement, 2020. [ bib | arXiv | .pdf ]
[5] Tian Cai, Michael I Mandel, and Di He. Music autotagging as captioning. In First Workshop on NLP for Music and Audio, 2020. [ bib | Poster | http | Abstract ]
[6] Shinji Watanabe, Michael I Mandel, Jon Barker, and Emmanuel Vincent. CHiME-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings, 2020. [ bib | arXiv | Abstract ]
[7] Lauren Mandel, Michael I. Mandel, and Chris Streb. Soundscape ecology: How listening to the environment can shape design and planning. In American Society of Landscape Architects Conference on Landscape Architecture, San Diego, CA, 2019. [ bib ]
[8] Zhaoheng Ni and Michael I Mandel. Onssen: an open-source speech separation and enhancement library. pages 7269--7273, 2020. [ bib | DOI | arXiv | Abstract ]
[9] Vikas Grover, Michael I Mandel, Valerie Shafer, Yusra Syed, and Austin Twine. Understanding acoustic cues non-native speakers use for identifying English /v/-/w/ using bubble noise method. In ASHA Convention, 2018. [ bib | http | Abstract ]
[10] Hussein Ghaly and Michael I Mandel. Analyzing human and machine performance in resolving ambiguous spoken sentences. In 1st Workshop on Speech-Centric Natural Language Processing (SCNLP), pages 18--26, 2017. [ bib | .pdf ]
[11] Jiyoung Choi and Michael I Mandel. Perception of Korean fricatives and affricates in 'bubble' noise by native and nonnative speakers. In International Circle of Korean Linguistics, 2017. [ bib ]
[12] Michael I Mandel and Nicoleta Roman. Integrating Markov random fields and model-based expectation maximization source separation and localization. In Acoustical Society of America Spring Meeting, 2015. [ bib | Slides ]
[13] Michael I Mandel, Sarah E Yoho, and Eric W Healy. Listener consistency in identifying speech mixed with particular “bubble” noise instances. In Acoustical Society of America Spring Meeting, 2015. [ bib | Poster ]
[14] Michael I Mandel and Song Hui Chon. Using auditory bubbles to determine spectro-temporal cues of timbre. In Cognitively Based Music Informatics Research (CogMIR), 2014. [ bib | Slides | Abstract ]
[15] Arnab Nandi and Michael I Mandel. The interactive join: Recognizing gestures for database queries. In CHI Works-In-Progress, 2013. [ bib | Poster | .pdf | Abstract ]
[16] Michael Mandel, Razvan Pascanu, Hugo Larochelle, and Yoshua Bengio. Autotagging music with conditional restricted Boltzmann machines, March 2011. [ bib | arXiv | http | Abstract ]
[17] Michael I. Mandel and Daniel P. W. Ellis. A probability model for interaural phase difference. In ISCA Workshop on Statistical and Perceptual Audio Processing (SAPA), pages 1--6, 2006. [ bib | Demo | Slides | .pdf | Abstract ]
[18] Erik B. Sudderth, Michael I. Mandel, William T. Freeman, and Alan S. Willsky. Visual hand tracking using nonparametric belief propagation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 189--197, 2004. [ bib | DOI | Demo | .pdf | Abstract ]