This is the first ICASSP where I’ve felt that the conference has become a place where influential machine learning papers are presented. I’m happy to see that most of our community is not just applying ‘LSTMs to a new dataset’, but proposing novel and inspiring machine learning methods. Let’s see what happened in Brighton (UK)!
This was my second ISMIR, and I am super excited to be part of this amazing, diverse, and so inclusive community. It was fun to keep putting faces (and heights, and weights) to the names I respect so much! This ISMIR was very special for me, because I was returning to the city where I kicked off my academic career (5 years ago I was starting a research internship @ IRCAM!), and we won the best student paper award!
Our accepted ISMIR paper on music auto-tagging at scale is now online – read it on arXiv, and listen to our demo!
TL;DR: 1) Given that enough training data are available: waveform models (SampleCNN) > spectrogram models (musically motivated CNNs). 2) But spectrogram models > waveform models when sizable data are not available. 3) Musically motivated CNNs achieve state-of-the-art results on the MTT & MSD datasets.
After attending the Google Speech Summit 2018, I would venture to say that Google’s speech interests for the future are: (i) to continue improving their automatic speech recognition (w/ Listen, Attend and Spell, a seq2seq model) and speech synthesis (w/ Tacotron 2 + WaveNet/WaveRNN) systems so that a robust interface is available for their conversational agent; (ii) to keep simplifying pipelines – having fewer “separate” blocks in order to be end-to-end whenever possible; (iii) to study how to better control some aspects of their end-to-end models – for example, with style tokens they aim to control some Tacotron (synthesis) parameters; and (iv) to put a lot of effort into building the Google Assistant, a conversational agent that I guess will be the basis of their next generation of products.
The following lines aim to summarize (by topic) what I found relevant – and, ideally, to describe some details that are not in the papers.