Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia that allow predictions with pre-trained deep learning models — and some of those are based on musicnn!Continue reading
I’m happy to share the highlights of my first paper with Dolby! We will be presenting this work at ICASSP 2020, in Barcelona.
Several improvements have been proposed to Conv-TasNet – that mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. We propose a (deep) non-linear variant of it, that is based on a deep stack of small filters. With this change, we can improve 0.6-0.9 dB SI-SNRi.
During the last summer, I have been a research intern at Telefónica Research (Barcelona). This article is the outcome of this short (but intense!) collaboration with Joan Serrà, where we explore how to train deep learning models with just 1, 2 or 10 audios per class. Check it out on arXiv, and reproduce our results running our code!
This last year I have been collaborating with Francesc Lluís. He is master student in our research group, who worked on “A Wavenet for Music Source Separation”. For more info about our investigation, you can read his thesis or our arXiv paper. Code, and some separations are also available for you!
1) Given that enough training data is available: waveform models (sampleCNN) > spectrogram models (musically motivated CNN).
2) But spectrogram models > waveform models when no sizable data are available.
3) Musically motivated CNNs achieve state-of-the-art results for the MTT & MSD datasets.