These last weeks we have been disseminating our recent work, “A Wavenet for Speech Denoising”. To this end, I gave two talks in the San Francisco Bay Area: one at Dolby Laboratories and the other at Pandora Radio, where I am currently doing an internship.
Dario (coauthor of the paper) also gave a talk at the Technical University of Munich, and I am excited to share his slides with you, since they contain fantastic, very clarifying figures!
Hopefully, checking our complementary views will help folks better understand our work.
These last months have been very intense for us and, as a result, three papers were recently uploaded to arXiv. Two of them have been accepted for presentation at ISMIR; they are the result of a collaboration with Rong, an amazing PhD student (also advised by Xavier) working on Jingju music:
- Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks [code]
- Audio to score matching by combining phonetic and duration information [code]
The third paper was done in collaboration with Dario (an excellent master's student!), who was interested in using deep learning models that operate directly on the audio:
Our EUSIPCO 2017 paper, “Timbre Analysis of Music Audio Signals with Convolutional Neural Networks”, got accepted! It was done in collaboration with Olga Slizovskaia, Rong Gong, Emilia Gómez and Xavier Serra.
I have also been awarded one of the AI Grants, an initiative of Nat Friedman (cofounder/CEO of Xamarin) to support open-source AI projects, for creating a dataset of sounds from Freesound and using it in my research. The project I proposed is part of an MTG initiative to promote the use of Freesound.org for research. The goal is to create a large dataset of sounds, following the same principles as ImageNet, in order to make audio AI more accessible to everyone. The project will contribute to developing the infrastructure for a crowdsourcing tool that turns Freesound into a research dataset. The following video presents the project:
After digging into the AudioSet Ontology, we realized that a tree-like visualization would be very useful for understanding the proposed ontology. To this end, we adapted some code from https://bl.ocks.org/mbostock/4339083.
Project done in collaboration with Xavier Favory, Eduardo Fonseca and Frederic Font.
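For readers who want to try something similar: the AudioSet ontology ships as a flat JSON list of entries with `id`, `name`, and `child_ids` fields, while d3 tree layouts like the one linked above expect nested objects with `name` and `children`. Here is a minimal Python sketch of that conversion (this is an illustrative assumption-based example, not our actual project code, and the ids below are made up):

```python
# Hedged sketch: convert a flat AudioSet-style ontology (list of entries with
# "id", "name", "child_ids") into the nested {"name", "children"} structure
# that d3 tree layouts expect. Not the project's actual code.
def build_tree(entries, root_id):
    """Return a nested dict rooted at root_id, built from the flat entry list."""
    by_id = {entry["id"]: entry for entry in entries}

    def to_node(entry_id):
        entry = by_id[entry_id]
        node = {"name": entry["name"]}
        children = [to_node(cid) for cid in entry.get("child_ids", [])]
        if children:  # leaf nodes carry no "children" key
            node["children"] = children
        return node

    return to_node(root_id)


# Tiny illustrative example (made-up ids, not real AudioSet ids):
entries = [
    {"id": "/m/0", "name": "Sounds", "child_ids": ["/m/1", "/m/2"]},
    {"id": "/m/1", "name": "Music", "child_ids": []},
    {"id": "/m/2", "name": "Speech", "child_ids": []},
]
tree = build_tree(entries, "/m/0")
# tree == {"name": "Sounds", "children": [{"name": "Music"}, {"name": "Speech"}]}
```

The resulting nested dict can be dumped with `json.dumps` and fed straight into the d3 layout.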