## What’s up with waveform-based VGGs?

In this series of posts I have written a couple of articles discussing the pros & cons of spectrogram-based VGG architectures, to think about which is the role of the computer vision deep learning architectures in the audio field. Now is time to discuss what’s up with waveform-based VGGs!

## Why do spectrogram-based VGGs rock?

Me: VGGs suck because they are computationally inefficient, and because they are a naive adoption of a computer vision architecture.

Random person on Internet: Jordi, you might be wrong. People use VGGs a lot!

## Why do spectrogram-based VGGs suck?

Me: VGGs suck because they are computationally inefficient and because they are a naive adoption of a computer vision architecture.

Random person on Internet: Jordi, you might be wrong. People use VGGs a lot!

## Learning the logarithmic compression of the mel spectrogram

Currently, successful neural network audio classifiers use log-mel spectrograms as input. Given a mel-spectrogram matrix X, the logarithmic compression is computed as follows:

f(x) = log(α·X + β).

Common pairs of (α,β) are (1, eps) or (10000,1). In this post we investigate the possibility of learning (α,β). To this end, we study two log-mel spectrogram variants:

• Log-learn: The logarithmic compression of the mel spectrogram X is optimized via SGD together with the rest of the parameters of the model. We use exponential and softplus gates to control the pace of α and β, respectively. We set the initial pre-gate values to 7 and 1, what results in out-of-gate α and β initial values of 1096.63 and 1.31, respectively.
• Log-EPS: We set as baseline a log-mel spectrogram which does not learn the logarithmic compression. (α,β) are set to (1, eps). Note eps stands for “machine epsilon”, a very small number.

TL;DR: We are publishing a negative result,
log-learn did not improve our results! 🙂

## arXiv article: End-to-end music source separation: is it possible in the waveform domain?

This last year I have been collaborating with Francesc Lluís. He is master student in our research group, who worked on “A Wavenet for Music Source Separation”. For more info about our investigation, you can read his thesis or our arXiv paper. Code, and some separations are also available for you!