Here is the (comprehensive) list of papers on audio and speech that were presented this week at ICLR, together with some comments. At the end of the post, I also share my thoughts on the virtual aspect of the conference.

To facilitate my online teaching activities, I have collected on this website the educational material I have been preparing over the years.
I thought it could be useful to share! It is ready to use and includes a quiz and a lab. For now, it covers music/audio classification and the basics of deep learning for music/audio.
Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of TensorFlow-based algorithms in Essentia that allow inference with pre-trained deep learning models, some of which are based on musicnn!
Here is a link to the paper, and to a nice post on how to use it!
I’m happy to share the highlights of my first paper with Dolby! We will be presenting this work at ICASSP 2020, in Barcelona.
Several improvements have been proposed to Conv-TasNet, mostly focusing on the separator and leaving its encoder/decoder as a (shallow) linear operator. We propose a (deep) non-linear variant of it, based on a deep stack of small filters. With this change, we improve SI-SNRi by 0.6–0.9 dB.
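To give an intuition for the change (this is not the paper's actual architecture, just a minimal numpy sketch): one wide linear filter can be replaced by a stack of small filters with non-linearities in between, while covering a comparable receptive field over the waveform.

```python
import numpy as np

def conv1d(x, w):
    # valid 1-D cross-correlation of signal x with kernel w
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

rng = np.random.default_rng(0)
x = rng.standard_normal(64)  # a toy waveform

# (shallow) linear encoder: a single long filter, no non-linearity
shallow = conv1d(x, rng.standard_normal(16))

# (deep) non-linear variant: a stack of small (kernel-3) filters with ReLUs
deep = x
for _ in range(8):
    deep = np.maximum(conv1d(deep, rng.standard_normal(3)), 0.0)

# 8 layers of kernel 3 give a receptive field of 1 + 8 * (3 - 1) = 17 samples,
# comparable to the single kernel-16 filter above
print(len(shallow), len(deep))  # 49 48
```

The filter sizes, depth, and ReLU choice here are illustrative assumptions; see the paper for the actual encoder/decoder design.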
Although I’m now a researcher at Dolby Laboratories, I’m still collaborating with some universities in Barcelona, where I’ll keep teaching deep learning for music and audio. In this context, and given the importance of the vanishing/exploding gradient problem in deep neural networks, this week I’ll be teaching recurrent neural networks to the Master in Sound and Music Computing students of the Universitat Pompeu Fabra.
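As a quick taste of the vanishing/exploding gradient problem (a self-contained numpy illustration, not material from the lecture itself): backpropagating through T time steps of a linear RNN multiplies the gradient by the recurrent matrix W at every step, so its norm behaves like the largest singular value of W raised to the power T.

```python
import numpy as np

T = 50
g = np.ones(4)  # gradient arriving at the last time step
norms = {}

for scale in (0.5, 1.5):
    W = scale * np.eye(4)  # recurrent weights, all singular values = scale
    grad = g.copy()
    for _ in range(T):
        grad = W.T @ grad  # one step of backpropagation through time
    norms[scale] = np.linalg.norm(grad)

print(norms)  # {0.5: ~1.8e-15 (vanished), 1.5: ~1.3e+09 (exploded)}
```

With singular values below 1 the gradient vanishes exponentially; above 1 it explodes. This is the basic motivation for gated architectures such as LSTMs and GRUs.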