Audio Transfer Learning with Scikit-learn and Tensorflow

We present a didactic toolkit to rapidly prototype audio classifiers with pre-trained Tensorlow models and Scikit-learn. We use pre-trained Tensorflow models as audio feature extractors, and Scikit-learn classifiers are employed to rapidly prototype competent audio classifiers that can be trained on a CPU.

Check it on Github!

This material was prepared for teaching Tensorflow, Scikit-learn, and deep learning in general. Besides, due to the simplicity of Scikit-learn, this toolkit can be employed to easily build proof-of-concept models with your own data.



What’s up with waveform-based VGGs?

In this series of posts I have written a couple of articles discussing the pros & cons of spectrogram-based VGG architectures, to think about which is the role of the computer vision deep learning architectures in the audio field. Now is time to discuss what’s up with waveform-based VGGs!

Continue reading

Learning the logarithmic compression of the mel spectrogram

Currently, successful neural network audio classifiers use log-mel spectrograms as input. Given a mel-spectrogram matrix X, the logarithmic compression is computed as follows:

f(x) = log(α·X + β).

Common pairs of (α,β) are (1, eps) or (10000,1). In this post we investigate the possibility of learning (α,β). To this end, we study two log-mel spectrogram variants:

  • Log-learn: The logarithmic compression of the mel spectrogram X is optimized via SGD together with the rest of the parameters of the model. We use exponential and softplus gates to control the pace of α and β, respectively. We set the initial pre-gate values to 7 and 1, what results in out-of-gate α and β initial values of 1096.63 and 1.31, respectively.
  • Log-EPS: We set as baseline a log-mel spectrogram which does not learn the logarithmic compression. (α,β) are set to (1, eps). Note eps stands for “machine epsilon”, a very small number.

TL;DR: We are publishing a negative result,
log-learn did not improve our results! 🙂

Continue reading