ICASSP article: An empirical study of Conv-TasNet

I’m happy to share the highlights of my first paper with Dolby! We will be presenting this work at ICASSP 2020, in Barcelona.

Link to arxiv!

Several improvements have been proposed to Conv-TasNet – that mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. We propose a (deep) non-linear variant of it, that is based on a deep stack of small filters. With this change, we can improve 0.6-0.9 dB SI-SNRi.

Continue reading

Learning the logarithmic compression of the mel spectrogram

Currently, successful neural network audio classifiers use log-mel spectrograms as input. Given a mel-spectrogram matrix X, the logarithmic compression is computed as follows:

f(x) = log(α·X + β).

Common pairs of (α,β) are (1, eps) or (10000,1). In this post we investigate the possibility of learning (α,β). To this end, we study two log-mel spectrogram variants:

  • Log-learn: The logarithmic compression of the mel spectrogram X is optimized via SGD together with the rest of the parameters of the model. We use exponential and softplus gates to control the pace of α and β, respectively. We set the initial pre-gate values to 7 and 1, what results in out-of-gate α and β initial values of 1096.63 and 1.31, respectively.
  • Log-EPS: We set as baseline a log-mel spectrogram which does not learn the logarithmic compression. (α,β) are set to (1, eps). Note eps stands for “machine epsilon”, a very small number.

TL;DR: We are publishing a negative result,
log-learn did not improve our results! 🙂

Continue reading

ISMIR article: End-to-end learning for music audio tagging at scale

Our accepted ISMIR paper on music auto-tagging at scale is now online – read it on arXiv, and listen to our demo!

1) Given that enough training data is available: waveform models (sampleCNN) > spectrogram models (musically motivated CNN).
2) But spectrogram models > waveform models when no sizable data are available.
3) Musically motivated CNNs achieve state-of-the-art results for the MTT & MSD datasets.

Continue reading