ISMIR article: End-to-end learning for music audio tagging at scale

Our accepted ISMIR paper on music auto-tagging at scale is now online – read it on arXiv, and listen to our demo!

TL;DR:
1) Given that enough training data is available: waveform models (sampleCNN) > spectrogram models (musically motivated CNN).
2) But spectrogram models > waveform models when no sizable data are available.
3) Musically motivated CNNs achieve state-of-the-art results for the MTT & MSD datasets.

Continue reading

arXiv paper & slides: Randomly weighted CNNs for (music) audio classification

A few weeks ago Olga Slizovskaya and I were invited to give a talk to the Centre for Digital Music (C4DM) @ Queen Mary Universtity of London – one of the most renowned music technology research institutions in Europe, and possibly in the world. It’s been an honor, and a pleasure to share our thoughts (and some beers) with you!

Download the slides!

The talk was centered in our recent work on music audio tagging, which is available on arXiv, where we study how non-trained (randomly weighted) convolutional neural networks perform as feature extractors for (music) audio classification tasks.

Deep learning architectures for audio classification: a personal (re)view

One can divide deep learning models into two parts: front-end and back-end – see Figure 1. The front-end is the part of the model that interacts with the input signal in order to map it into a latent-space, and the back-end predicts the output given the representation obtained by the front-end.

Figure 1 – Deep learning pipeline.

In the following, we discuss the different front- and back-ends we identified in the audio classification literature. Continue reading