Here is my first personal AMA interview! But wait, what’s an AMA interview? AMA stands for “Ask Me Anything” in Reddit jargon. After reading this interview you will know a bit more about my life and way of thinking 🙂 This interview is a dissemination effort by the María de Maeztu program (which funds my PhD research) and the AI Grant (which supports our Freesound Datasets project). Let’s start!
One can divide deep learning models into two parts: front-end and back-end – see Figure 1. The front-end is the part of the model that interacts with the input signal in order to map it into a latent space, and the back-end predicts the output given the representation obtained by the front-end.
In the following, we discuss the different front- and back-ends we identified in the audio classification literature.
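To make the front-end/back-end split concrete, here is a minimal NumPy sketch. Everything in it is an assumption made for illustration: the function names `front_end` and `back_end`, the random weights, and the particular choices (filters spanning all frequency bands, ReLU, temporal average pooling, softmax) are just one possible instance, not a model taken from the literature discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def front_end(spectrogram, filters):
    """Map a (time, freq) spectrogram into a (time', n_filters) latent space.

    Assumption: each filter covers `width` frames and all frequency bands.
    """
    n_filters, width, _ = filters.shape
    n_frames = spectrogram.shape[0] - width + 1
    latent = np.empty((n_frames, n_filters))
    for t in range(n_frames):
        patch = spectrogram[t:t + width]  # (width, n_freq)
        # Correlate every filter with the current patch.
        latent[t] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return np.maximum(latent, 0.0)  # ReLU

def back_end(latent, weights, bias):
    """Predict class probabilities from the front-end representation."""
    pooled = latent.mean(axis=0)  # temporal average pooling
    logits = pooled @ weights + bias
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Hypothetical input: 100 frames x 40 frequency bands, 8 filters, 3 classes.
spectrogram = rng.standard_normal((100, 40))
filters = rng.standard_normal((8, 5, 40)) * 0.1
weights = rng.standard_normal((8, 3)) * 0.1
bias = np.zeros(3)

latent = front_end(spectrogram, filters)
probs = back_end(latent, weights, bias)
```

The point of the split is that either half can be swapped independently: a different front-end (for instance, one operating on waveforms) only needs to produce a latent sequence that the back-end can summarize.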
- Choi et al. – every time I re-read this paper I am more impressed by the effort they put into assessing the generalization capabilities of deep learning models. This work defines a high evaluation standard for those working on deep auto-tagging models!
- Bittner et al. propose a fully-convolutional model for tracking f0 contours in polyphonic music. The article has a brilliant introduction drawing parallels between their proposed fully-convolutional architecture and previous traditional models – making clear that it is worth building bridges between deep learning works and previous signal processing literature.
- Oramas et al. – deep learning makes it easy to combine information from many sources, such as audio, text or images. They do so by combining representations extracted from audio spectrograms, word embeddings and ImageNet-based features. Moreover, they released a new dataset: MuMu, with 147,295 songs belonging to 31,471 albums.
- Jansson et al.’s work proposes a U-net model for singing voice separation. It seems that adding connections between layers at the same hierarchical level in the encoder and decoder for reconstructing masked audio signals is a good idea, since several papers have already reported good results using this setup.
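As a rough illustration of the skip-connection idea in the last item, here is a toy one-level encoder-decoder in NumPy. It is a sketch under strong assumptions and not Jansson et al.’s architecture: dense layers stand in for convolutions, the weights are random, and all names (`toy_unet_mask`, `w_enc`, `w_mid`, `w_dec`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def downsample(x):
    """Halve the time axis by averaging adjacent frames."""
    return 0.5 * (x[0::2] + x[1::2])

def upsample(x):
    """Double the time axis by repeating each frame."""
    return np.repeat(x, 2, axis=0)

def toy_unet_mask(mix, w_enc, w_mid, w_dec):
    enc = np.tanh(mix @ w_enc)               # encoder features, full resolution
    mid = np.tanh(downsample(enc) @ w_mid)   # bottleneck, half resolution
    up = upsample(mid)                       # decoder, back to full resolution
    # Skip connection: concatenate encoder features from the same level.
    merged = np.concatenate([up, enc], axis=1)
    return sigmoid(merged @ w_dec)           # soft mask with values in (0, 1)

# Hypothetical magnitude spectrogram of a mixture: 64 frames x 32 bins.
n_frames, n_bins, n_hidden = 64, 32, 16
mix = np.abs(rng.standard_normal((n_frames, n_bins)))
w_enc = rng.standard_normal((n_bins, n_hidden)) * 0.1
w_mid = rng.standard_normal((n_hidden, n_hidden)) * 0.1
w_dec = rng.standard_normal((2 * n_hidden, n_bins)) * 0.1

mask = toy_unet_mask(mix, w_enc, w_mid, w_dec)
vocals_estimate = mask * mix  # masked mixture as the separated source
```

The decoder sees both the coarse bottleneck features and the full-resolution encoder features, so fine time-frequency detail lost in downsampling is still available when the mask is predicted.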
But there were many other inspiring papers.
The Department of Information and Communication Technologies at UPF organizes an annual Doctoral Students Workshop, where I was awarded the Intelygenz Award for Best Poster in Machine Learning for my poster entitled “Towards a grounded deep learning paradigm for music modeling”. The award sponsor, Intelygenz, interviewed me, and here is the result!
The signal processing community is very into machine learning. Although I am not sure of the implications of this fact, this intersection has already produced very interesting results – such as Smaragdis et al.’s work. Lots of papers related to deep learning were presented. Although in many cases people were naively applying DNNs or LSTMs to a new problem, there was also (of course) amazing work with inspiring ideas – I highlight some:
- Koizumi et al. propose using reinforcement learning for source separation. This work introduces how to use reinforcement learning for audio signal processing.
- Ewert et al. propose using a variant of dropout that can be used to induce models to learn specific structures by using information from weak labels.
- Ting-Wei et al. propose doing frame-level predictions with a fully convolutional model that also uses Gaussian kernel filters (first introduced by them), trained with clip-level annotations in a weakly-supervised learning setup.
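The weakly-supervised setup in the last item, frame-level predictions trained against clip-level annotations, can be sketched as follows. This is only a generic illustration: max pooling over time is an assumed aggregation choice, the Gaussian kernel filters are not reproduced, and `clip_level_loss` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def clip_level_loss(frame_logits, clip_labels, eps=1e-7):
    """Binary cross-entropy between pooled frame predictions and clip labels."""
    frame_probs = sigmoid(frame_logits)      # (n_frames, n_tags)
    clip_probs = frame_probs.max(axis=0)     # assumed pooling: max over time
    clipped = np.clip(clip_probs, eps, 1.0 - eps)
    bce = -(clip_labels * np.log(clipped)
            + (1.0 - clip_labels) * np.log(1.0 - clipped))
    return frame_probs, clip_probs, bce.mean()

# Hypothetical model output for one clip: 50 frames x 4 tags.
frame_logits = rng.standard_normal((50, 4))
clip_labels = np.array([1.0, 0.0, 1.0, 0.0])  # only clip-level labels exist

frame_probs, clip_probs, loss = clip_level_loss(frame_logits, clip_labels)
```

Only `clip_labels` supervises training, yet `frame_probs` still gives a per-frame activation for each tag, which is what makes frame-level predictions possible without frame-level annotations.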