The musicnn library (pronounced as “musician”) employs deep convolutional neural networks to automatically tag songs, and the models that are included achieve the best scores in public evaluation benchmarks. These state-of-the-art models have been released as an open-source library that can be easily installed and used. For example, you can use musicnn to tag this emblematic song from Muddy Waters — and it will predominantly tag it as blues!
We present a didactic toolkit to rapidly prototype audio classifiers with pre-trained Tensorlow models and Scikit-learn. We use pre-trained Tensorflow models as audio feature extractors, and Scikit-learn classifiers are employed to rapidly prototype competent audio classifiers that can be trained on a CPU.
This material was prepared for teaching Tensorflow, Scikit-learn, and deep learning in general. Besides, due to the simplicity of Scikit-learn, this toolkit can be employed to easily build proof-of-concept models with your own data.
In this series of posts I have written a couple of articles discussing the pros & cons of spectrogram-based VGG architectures, to think about which is the role of the computer vision deep learning architectures in the audio field. Now is time to discuss what’s up with waveform-based VGGs!