This was my first ISMIR ever, and I am thrilled to be part of this amazing community. It was fun to put faces (and heights, and weights) to the names I respect so much!
All the awarded papers were amazing, and these are definitely on my list of highlights:
Choi et al. – every time I re-read this paper I am more impressed by the effort they put into assessing the generalization capabilities of deep learning models. This work sets a high evaluation standard for those working on deep auto-tagging models!
Bittner et al. propose a fully-convolutional model for tracking f0 contours in polyphonic music. The article has a brilliant introduction drawing parallels between their proposed fully-convolutional architecture and previous traditional models – making clear that it is worth building bridges between deep learning work and the earlier signal processing literature.
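To make the fully-convolutional idea concrete, here is a minimal PyTorch sketch of a salience-style estimator. The layer sizes and depths are my own assumptions, not the paper's architecture; the point is that with only convolutions (no pooling, no dense layers) the output keeps the input's time-frequency resolution, so every bin gets an f0 salience value.

```python
import torch
import torch.nn as nn

# Minimal sketch (illustrative, not Bittner et al.'s exact model): a
# fully-convolutional stack mapping a time-frequency input to a salience map.
class SalienceCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),  # 1x1 conv: one salience value per bin
        )

    def forward(self, x):                  # x: (batch, 1, freq_bins, time_frames)
        return torch.sigmoid(self.net(x))  # salience in [0, 1], same shape as input


salience = SalienceCNN()(torch.randn(1, 1, 360, 128))  # output shape equals input shape
```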
Oramas et al. – deep learning makes it easy to combine information from many sources, such as audio, text, or images. They do so by combining representations extracted from audio spectrograms, word embeddings, and ImageNet-based features. Moreover, they released a new dataset: MuMu, with 147,295 songs belonging to 31,471 albums.
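The fusion step itself can be very simple. Below is a minimal sketch of late fusion by concatenation, where each modality arrives as a fixed-size embedding; the dimensionalities and the single linear classifier are illustrative assumptions on my part, not the architecture from the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of multimodal late fusion: concatenate per-modality embeddings
# and classify. All sizes below are hypothetical placeholders.
class MultimodalFusion(nn.Module):
    def __init__(self, audio_dim=128, text_dim=300, image_dim=2048, n_labels=250):
        super().__init__()
        self.classifier = nn.Linear(audio_dim + text_dim + image_dim, n_labels)

    def forward(self, audio_feat, text_feat, image_feat):
        fused = torch.cat([audio_feat, text_feat, image_feat], dim=-1)  # fuse by concatenation
        return self.classifier(fused)  # multi-label logits


logits = MultimodalFusion()(torch.randn(4, 128), torch.randn(4, 300), torch.randn(4, 2048))
```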
Jansson et al.'s work proposes a U-Net model for singing voice separation. Adding skip connections between encoder and decoder layers at the same hierarchical level seems to be a good idea when reconstructing masked audio signals, since several papers have already reported good results with this setup.
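Here is a tiny PyTorch sketch of that skip-connection idea applied to spectrogram masking: the decoder layer receives the encoder activations from the same hierarchical level, and the network predicts a soft mask over the mixture. Layer sizes are illustrative assumptions, not the paper's model, and the input dimensions are assumed divisible by 4.

```python
import torch
import torch.nn as nn

# Minimal U-Net-style sketch for mask-based source separation (illustrative only).
class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(1, 16, 3, stride=2, padding=1)
        self.enc2 = nn.Conv2d(16, 32, 3, stride=2, padding=1)
        self.dec2 = nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1)  # 16 decoder + 16 skip channels in

    def forward(self, x):                    # x: (batch, 1, freq, time) mixture spectrogram
        e1 = torch.relu(self.enc1(x))        # encoder, level 1
        e2 = torch.relu(self.enc2(e1))       # encoder, level 2 (bottleneck here)
        d2 = torch.relu(self.dec2(e2))       # decoder, back to level 1's resolution
        d2 = torch.cat([d2, e1], dim=1)      # skip connection: same hierarchical level
        mask = torch.sigmoid(self.dec1(d2))  # soft mask in [0, 1]
        return mask * x                      # estimated vocal spectrogram


vocals = TinyUNet()(torch.randn(1, 1, 64, 128))
```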
These last weeks we have been disseminating our recent work, “A Wavenet for Speech Denoising”. To this end, I gave two talks in the San Francisco Bay Area: one at Dolby Laboratories and the other at Pandora Radio — where I am currently doing an internship.
These last months have been very intense for us – and, as a result, three papers were recently uploaded to arXiv. Two of them have been accepted for presentation at ISMIR, and are the result of a collaboration with Rong – an amazing PhD student (also advised by Xavier) working on Jingju music:
Our EUSIPCO 2017 paper got accepted! It is entitled “Timbre Analysis of Music Audio Signals with Convolutional Neural Networks” and was written in collaboration with Olga Slizovskaia, Rong Gong, Emilia Gómez, and Xavier Serra.
I have also been awarded one of the AI Grants for creating a dataset of sounds from Freesound and using it in my research. The AI Grants are an initiative of Nat Friedman, cofounder/CEO of Xamarin, to support open-source AI projects. The project I proposed is part of an MTG initiative to promote the use of Freesound.org for research. The goal is to create a large dataset of sounds, following the same principles as ImageNet – in order to make audio AI more accessible to everyone. The project will contribute to developing the infrastructure for a crowdsourcing tool that turns Freesound into a research dataset. The following video presents the project: