Currently, successful neural network audio classifiers use log-mel spectrograms as input. Given a mel-spectrogram matrix X, the logarithmic compression is computed as follows:
f(x) = log(α·X + β).
Common pairs of (α,β) are (1, eps) or (10000,1). In this post we investigate the possibility of learning (α,β). To this end, we study two log-mel spectrogram variants:
- Log-learn: The logarithmic compression of the mel spectrogram X is optimized via SGD together with the rest of the parameters of the model. We use exponential and softplus gates to control the pace of α and β, respectively. We set the initial pre-gate values to 7 and 1, what results in out-of-gate α and β initial values of 1096.63 and 1.31, respectively.
- Log-EPS: We set as baseline a log-mel spectrogram which does not learn the logarithmic compression. (α,β) are set to (1, eps). Note eps stands for “machine epsilon”, a very small number.
TL;DR: We are publishing a negative result,
log-learn did not improve our results! 🙂
During the last summer, I have been a research intern at Telefónica Research (Barcelona). This article is the outcome of this short (but intense!) collaboration with Joan Serrà, where we explore how to train deep learning models with just 1, 2 or 10 audios per class. Check it out on arXiv, and reproduce our results running our code!
This last year I have been collaborating with Francesc Lluís. He is master student in our research group, who worked on “A Wavenet for Music Source Separation”. For more info about our investigation, you can read his thesis or our arXiv paper. Code, and some separations are also available for you!
This was my second ISMIR, and I am super excited of being part of this amazing, diverse, and so inclusive community. It was fun to keep putting faces (and height, and weight) to these names I respect so much! This ISMIR has been very special for me, because I was returning to the city where I kicked off my academic career (5 years ago I was starting a research internship @ IRCAM!), and we won the best student paper award!
Our ISMIR paper on music auto-tagging at scale got the best student paper award – read it on arXiv, and listen to our demo!
That’s a great honor! Thank you everyone, specially to my co-authors for their support. Without them, this would have never been possible 🙂