Post written in collaboration with and sponsorship of Exxact (@Exxactcorp).
Many things have happened between the pioneering papers written by Lewis and Todd in the 80s and the current wave of GANs composers. Along that journey, connectionists’ work was forgotten during the AI winter, very influential names (like Schmidhuber or Ng) contributed seminal publications and, in the meantime, researchers have made tons of awesome progress.
I won’t be going through every single paper in the field of neural networks for music nor diving into technicalities, but I’ll cover what are the milestones that helped shaping the current state of music AI – this being a nice excuse to give credit to these wild researchers who decided to care about a signal that is nothing else but cool. Let’s start!
Here my first personal AMA interview! But wait, what’s an AMA interview? AMA stands for “Ask Me Anything” in Reddit jargon. After reading this interview you will know a bit more about my life and way of thinking 🙂 This interview is a dissemination effort done by the María de Maeztu program (who funds my PhD research), and the AI Grant (who supports our Freesound Datasets project). Let’s start!
One can divide deep learning models into two parts: front-end and back-end – see Figure 1. The front-end is the part of the model that interacts with the input signal in order to map it into a latent-space, and the back-end predicts the output given the representation obtained by the front-end.
Figure 1 – Deep learning pipeline.
In the following, we discuss the different front- and back-ends we identified in the audio classification literature. Continue reading
This has been my first ISMIR ever, and I am thrilled for being part of this amazing community. It was fun to put faces (and hight, and weight) to these names I respect so much!
All awarded papers were amazing, and these are definitely in my list of highlights:
- Choi et al. – every time I re-read this paper I am more impressed about the efforts they put in assessing the generalization capabilities of deep learning models. This work defines a high evaluation standard for those working in deep auto-tagging models!
- Bittner et al. proposes a fully-convolutional model for tracking f0 contours in polyphonic music. The article has a brilliant introduction drawing parallelisms between their proposed fully-convolutional architecture and previous traditional models – making clear that it is worth building bridges between deep learning works and previous signal processing literature.
- Oramas et al. – deep learning enables to easily combine information from many sources, such as: audio, text or images. They do so by combining representations extracted from audio-spectrograms, word-embeddings and ImageNet-based features. Moreover, they released a new dataset: MuMu, with 147,295 songs belonging to 31,471 albums.
- Jansson et al.‘s work proposes a U-net model for singing voice separation. It seems that adding connections between layers at the same hierarchical level in the encoder and decoder for reconstructing masked audio signals is a good idea since several papers already reported good results using this setup.
But there were many other inspiring papers.. Continue reading