Our accepted ISMIR paper on music auto-tagging at scale is now online – read it on arXiv, and listen to our demo!
TL;DR:
1) When enough training data are available: waveform models (sampleCNN) > spectrogram models (musically motivated CNN).
2) But spectrogram models > waveform models when sizable training data are not available.
3) Musically motivated CNNs achieve state-of-the-art results on the MagnaTagATune (MTT) and Million Song Dataset (MSD) datasets.