Last summer I was a research intern at Telefónica Research (Barcelona). This article is the outcome of that short (but intense!) collaboration with Joan Serrà, where we explore how to train deep learning models with just 1, 2, or 10 audio examples per class. Check it out on arXiv, and reproduce our results by running our code!
This last year I have been collaborating with Francesc Lluís, a master's student in our research group who worked on “A Wavenet for Music Source Separation”. For more details about our work, you can read his thesis or our arXiv paper. Code and some separations are also available for you!
Many things have happened between the pioneering papers written by Lewis and Todd in the 80s and the current wave of GAN composers. Along that journey, connectionist work was forgotten during the AI winter, very influential names (like Schmidhuber or Ng) contributed seminal publications, and, in the meantime, researchers made tons of awesome progress.
I won’t be going through every single paper in the field of neural networks for music, nor diving into technicalities, but I’ll cover the milestones that helped shape the current state of music AI – this being a nice excuse to give credit to those wild researchers who decided to care about a signal that is nothing else but cool. Let’s start!
1) Given that enough training data are available: waveform models (sampleCNN) > spectrogram models (musically motivated CNNs) – see the sketch after this list.
2) But spectrogram models > waveform models when sizable training data are not available.
3) Musically motivated CNNs achieve state-of-the-art results on the MTT (MagnaTagATune) & MSD (Million Song Dataset) datasets.
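To make the waveform vs. spectrogram distinction concrete, here is a minimal Keras sketch of both front-end styles. Layer counts, filter sizes, and input shapes are illustrative assumptions, not the exact configurations from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def waveform_frontend(n_samples=59049):
    """SampleCNN-style front-end: small 1-D convolutions applied
    directly to raw audio samples. Depth and widths here are
    illustrative; the real model stacks many more such blocks."""
    inp = layers.Input(shape=(n_samples, 1))
    x = inp
    for _ in range(3):
        x = layers.Conv1D(64, kernel_size=3, strides=3,
                          padding='valid', activation='relu')(x)
    out = layers.GlobalMaxPooling1D()(x)
    return tf.keras.Model(inp, out)

def spectrogram_frontend(n_frames=187, n_mels=96):
    """Musically motivated front-end: 2-D filters shaped for timbral
    patterns (tall, spanning most of the frequency axis) and temporal
    patterns (wide, one frequency band) in a mel spectrogram."""
    inp = layers.Input(shape=(n_frames, n_mels, 1))
    timbral = layers.Conv2D(32, kernel_size=(7, int(0.9 * n_mels)),
                            activation='relu')(inp)
    temporal = layers.Conv2D(32, kernel_size=(32, 1),
                             activation='relu')(inp)
    out = layers.Concatenate()([layers.GlobalMaxPooling2D()(timbral),
                                layers.GlobalMaxPooling2D()(temporal)])
    return tf.keras.Model(inp, out)
```

The contrast mirrors the findings above: the 1-D front-end has to learn everything from raw samples (hence its hunger for data), while the shaped 2-D filters encode musical priors that help when data are scarce.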
This post aims to share our experience setting up our deep learning server – thanks to NVidia for the two Titan X Pascal GPUs, and thanks to the Maria de Maeztu Research Program for the machine! 🙂 The text is divided into two parts: bringing the pieces together, and installing TensorFlow. Let’s start!
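As a quick sanity check once everything is installed, you can ask TensorFlow which GPUs it sees; with the two Titan X Pascal cards in place, two devices should be listed. This sketch assumes a TensorFlow 2.x install (1.x versions exposed the same information via tf.test.is_gpu_available() or device_lib.list_local_devices()).

```python
import tensorflow as tf

# With both Titan X Pascal cards installed and the CUDA drivers in
# place, this should list two physical GPU devices.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    print(gpu)
print(f'{len(gpus)} GPU(s) visible to TensorFlow')
```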