This year’s ICASSP keywords are: generative adversarial networks (GANs), WaveNet, speech enhancement, source separation, industry, music transcription, cover song identification, SampleCNN, monophonic pitch tracking, and gated/dilated CNNs. This time, passionate scientific discussions happened in random sports bars in downtown Calgary – next to melting piles of dirty snow.
Extreme Learning Machines (ELMs) are very controversial and very fast machine learning models that perform very well. Of course, very is in italics because what the word means depends on your background or application field. Still, this sentence gives an idea of what ELMs can deliver – and why they might be interesting for an audio community that rarely uses them.
This post aims to share our experience setting up our deep learning server – thanks, NVIDIA, for the two Titan X Pascal GPUs! 🙂 The text is divided into two parts: bringing the pieces together, and installing TensorFlow. Let’s start!
I was invited to give a talk at the Deep Learning for Speech and Language Winter Seminar at UPC in Barcelona. Since UPC is the university where I did my undergraduate studies, it was a great pleasure to speak there!
The talk centered on my recent work on music audio tagging, which is available on arXiv and is summarized in these previous posts: deep learning architectures for music audio classification, and deep end-to-end learning for music audio tagging at Pandora.
One can divide deep learning models into two parts: front-end and back-end – see Figure 1. The front-end is the part of the model that interacts with the input signal in order to map it into a latent space, and the back-end predicts the output given the representation obtained by the front-end.
In the following, we discuss the different front- and back-ends we identified in the audio classification literature.
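To make the front-end/back-end split concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the function names (`front_end`, `back_end`, `model`), the toy per-frame features, and the random weights are all hypothetical and are not taken from any model in the literature. In a real deep learning model both parts would be learned layers (e.g. CNNs); here hand-crafted features stand in for the front-end and a single linear layer with softmax stands in for the back-end.

```python
import math
import random


def front_end(signal, frame_len=4, hop=2):
    """Front-end: interacts with the input signal and maps it into a
    latent space (here, one toy feature vector per frame)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # Two illustrative features per frame: mean and energy.
    # A learned front-end would replace these hand-crafted features.
    return [[sum(f) / frame_len, sum(x * x for x in f) / frame_len]
            for f in frames]


def back_end(latent, weights, biases):
    """Back-end: predicts the output given the front-end representation."""
    n_frames = len(latent)
    n_dims = len(latent[0])
    # Temporal mean pooling summarizes the frames into a single vector.
    pooled = [sum(v[d] for v in latent) / n_frames for d in range(n_dims)]
    # Linear layer followed by a numerically stable softmax.
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def model(signal, weights, biases):
    """Full model: back-end applied to the front-end output."""
    return back_end(front_end(signal), weights, biases)


random.seed(0)
signal = [math.sin(0.3 * t) for t in range(16)]  # toy input signal
# Hypothetical, untrained parameters for 3 output classes.
weights = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
biases = [0.0, 0.0, 0.0]

latent = front_end(signal)              # latent representation
probs = model(signal, weights, biases)  # class probabilities
```

The point of the split is that either half can be swapped independently: a different `front_end` (say, one operating on spectrograms instead of waveforms) can feed the same `back_end`, and vice versa.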