When: Tuesday, May 15th @10am
Title: “Supervised and Unsupervised Semantic Audio Representations”
Abstract: The Sound Understanding team at Google has been developing automatic sound classification tools with the ambition to cover all possible sounds – speech, music, and environmental. I will describe our application of vision-inspired deep neural networks to the classification of our new ‘AudioSet’ ontology of ~600 sound events. I’ll also talk about recent work using triplet loss to train semantic representations — where semantically ‘similar’ sounds end up close by in the representation — from unlabeled data.
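The triplet loss mentioned above can be sketched briefly. This is a generic illustration, not the talk's actual model: the embeddings, margin, and distance metric are illustrative assumptions, and real training would run over minibatches of audio embeddings with a learned network.

```python
# Minimal sketch of a triplet loss: the anchor is pulled toward the
# positive (a semantically similar sound) and pushed away from the
# negative. Values and margin here are illustrative, not from the talk.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: squared distance to the positive should be smaller
    than the distance to the negative by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: two "similar" sounds and one dissimilar sound.
anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([-1.0, 0.0])

print(triplet_loss(anchor, positive, negative))  # 0.0 — pair already well separated
```

Minimizing this loss over many (anchor, positive, negative) triples mined from unlabeled data is what lets semantically similar sounds end up close together in the learned representation.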
Bio: Dan Ellis joined Google in 2015 after 15 years as a faculty member in the Electrical Engineering department at Columbia University, where he headed the Laboratory for Recognition and Organization of Speech and Audio (LabROSA). He has over 150 publications in the areas of audio processing, speech recognition, and music information retrieval.
Joint work with Aren Jansen, Manoj Plakal, Ratheet Pandya, Shawn Hershey, Jiayang Liu, Channing Moore, Rif A. Saurous