What is the outcome of Artificial Intelligence (AI)? This is a recurring conversation topic among AI practitioners, specialized journalists, and brave politicians. Although some simple concepts are clearly conveyed to the general audience, others are not so widely known. In this post I'll focus on an important topic that is often overlooked: the economics behind AI.
Since AI is impacting our lives through products available in the marketplace, the goal of this post is to analyze what happens to AI systems when they are consumed via the free market. In other words, AI is developed and consumed in a market-driven fashion, and I would like to better understand what the consequences of that are. Hence, I'll focus on the economic side of AI to show that, to encourage the main AI actors to behave ethically, we had better act (directly) on the market.
In this series of posts I have written a couple of articles discussing the pros & cons of spectrogram-based VGG architectures, to think about what role computer vision deep learning architectures play in the audio field. Now it is time to discuss what's up with waveform-based VGGs!
Currently, successful neural network audio classifiers use log-mel spectrograms as input. Given a mel-spectrogram matrix X, the logarithmic compression is computed as follows:
f(X) = log(α·X + β).
Common pairs of (α, β) are (1, eps) or (10000, 1). In this post we investigate the possibility of learning (α, β). To this end, we study two log-mel spectrogram variants:
Log-learn: The logarithmic compression of the mel spectrogram X is optimized via SGD together with the rest of the parameters of the model. We use exponential and softplus gates to control the pace of α and β, respectively. We set the initial pre-gate values to 7 and 1, which results in out-of-gate initial values of 1096.63 for α and 1.31 for β.
Log-EPS: As a baseline, we use a log-mel spectrogram that does not learn the logarithmic compression: (α, β) are fixed to (1, eps). Note that eps stands for "machine epsilon", a very small number.
TL;DR: We are publishing a negative result: log-learn did not improve our results! 🙂