Slides: Deep learning for music data processing – a personal (re)view

I was invited to give a talk at the Deep Learning for Speech and Language Winter Seminar @ UPC, Barcelona. Since UPC is the university where I did my undergraduate studies, it was a great pleasure to give an introductory talk about how our community is using deep learning to approach music technology problems.

Download the slides!

Overall, the talk centered on reviewing the state-of-the-art (1988-2016) in deep learning for music data processing in order to spark some discussion about current trends. Several key papers were chronologically listed and briefly described: pioneer papers using MLPs [1], RNNs [2], LSTMs [3] and CNNs [4] for music data processing; and pioneer papers using symbolic data [1], spectrograms [5] and waveforms [6] – among others.

Continue reading

Journal article: Remixing music using source separation algorithms to improve the musical experience of cochlear implant users

This journal article summarizes the most relevant results we found throughout my master thesis research – namely, the results related to popular Western music. However, in the thesis we also describe the first attempt at remixing orchestral music to improve CI users' classical music experience. Although the results for orchestral music are not conclusive, they provide useful intuition for designing future experiments and might be valuable for researchers interested in that topic.
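For intuition, here is a minimal sketch of the remixing idea, not the article's actual processing chain: once a source-separation algorithm has produced stems, they are re-mixed with per-stem gains before playback. The stem names and gain values below are illustrative assumptions.

```python
import numpy as np

def remix(stems, gains):
    """stems: {name: waveform}, all the same length; gains: {name: float}."""
    mix = np.zeros_like(next(iter(stems.values())))
    for name, waveform in stems.items():
        mix += gains.get(name, 1.0) * waveform
    return mix

# One second of noise stands in for separated stems at 44.1 kHz.
stems = {"vocals": np.random.randn(44100), "accompaniment": np.random.randn(44100)}
remixed = remix(stems, {"vocals": 2.0, "accompaniment": 0.5})
print(remixed.shape)  # (44100,)
```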

Continue reading

Conference paper: Designing efficient architectures for modeling temporal features with CNNs

Abstract – Many researchers use convolutional neural networks with small rectangular filters for music (spectrogram) classification. First, we discuss why there is no reason to use this filter setup by default; and second, we point out that more efficient architectures could be implemented if the characteristics of the music features are considered during the design process. Specifically, we propose a novel design strategy that might promote more expressive and intuitive deep learning architectures by efficiently exploiting the representational capacity of the first layer – using different filter shapes adapted to fit musical concepts within the first layer. The proposed architectures are assessed by measuring their accuracy in predicting the classes of the Ballroom dataset. We also make the code available (together with the audio data) so that this research is fully reproducible.
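As an illustration of the design strategy described in the abstract, here is a minimal sketch of a first layer combining filters that are wide in time with filters that are tall in frequency. It assumes PyTorch, a mel-spectrogram input, and arbitrary filter shapes and channel counts; it is not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiShapeFirstLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # Temporal filters: 1 bin tall, 60 frames wide, aimed at rhythmic cues.
        self.temporal = nn.Conv2d(1, 8, kernel_size=(1, 60))
        # Timbral filters: 32 bins tall, 1 frame wide, aimed at spectral-envelope cues.
        self.timbral = nn.Conv2d(1, 8, kernel_size=(32, 1))

    def forward(self, x):  # x: (batch, 1, freq_bins, time_frames)
        t = torch.relu(self.temporal(x))
        f = torch.relu(self.timbral(x))
        # Max-pool each branch over frequency and time so differently shaped
        # feature maps can be concatenated into one vector.
        t = torch.amax(t, dim=(2, 3))
        f = torch.amax(f, dim=(2, 3))
        return torch.cat([t, f], dim=1)

x = torch.randn(4, 1, 40, 200)  # batch of 40-band spectrograms, 200 frames
print(MultiShapeFirstLayer()(x).shape)  # torch.Size([4, 16])
```

A classifier can then be trained on the concatenated vector; the point is simply that filters matched to musical concepts (long in time for rhythm, tall in frequency for timbre) can coexist within a single first layer.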

Continue reading

Lack of annotated music data? Restrict the solution space!

Given that several relevant researchers in our field were in Barcelona to serve on the jury of Ajay's and Sankalp's PhD thesis defenses, the MTG hosted a very interesting seminar. Among other topics, the potential impact of deep learning in our field was discussed, and almost everyone agreed that end-to-end learning approaches seem unsuccessful because no large-scale (annotated) music collections are available for research benchmarking. And indeed, most successful deep learning approaches use such models as mere feature extractors or as hierarchical classifiers built on top of hand-crafted features.
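That "feature extractor plus shallow classifier" pattern fits in a few lines. A minimal sketch, assuming scikit-learn and with a fixed random projection standing in for a real pretrained deep model:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
W = rng.standard_normal((40, 128))  # stand-in for a "pretrained" deep embedding

def embed(spectrogram):  # spectrogram: (40 bands, n frames)
    # Summarize over time, project, rectify: a crude deep-feature surrogate.
    return np.maximum(spectrogram.mean(axis=1) @ W, 0.0)

X = np.stack([embed(rng.random((40, 200))) for _ in range(20)])  # 20 "tracks"
y = np.array([0, 1] * 10)                                        # 2 classes

# Shallow classifier trained on the small labeled set of embeddings.
clf = make_pipeline(StandardScaler(), SVC()).fit(X, y)
print(clf.score(X, y))
```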


Continue reading

Rationalizing the design of deep learning models for music signals

A brief review of the state of the art in music informatics research (MIR) and deep learning reveals that such models have achieved competitive results in a relatively short amount of time – most relevant papers were published during the last 5 years. Many researchers have successfully used deep learning for several tasks: onset detection, genre classification, chord estimation, auto-tagging or source separation. Some researchers even declare that it is time for a paradigm shift: from hand-crafted features and shallow classifiers to deep processing models. In fact, in the past, introducing machine learning for global modeling (i.e. classification) resulted in a significant state-of-the-art advance – no one doubts that. And now, some researchers think that another advance could be achieved by using data-driven feature extractors instead of hand-crafted features – meaning that these researchers propose to fully replace the current pipeline with machine learning. However, deep learning for MIR is still in its early days. Current systems are based on solutions proposed in the computer vision, natural language or speech research fields. Therefore, now is the time to understand these solutions and adapt them to the music case.
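To make the contrast concrete, here is a minimal sketch of the two pipelines, with made-up data and arbitrary dimensions: hand-crafted features feeding a shallow classifier versus a small network that learns its features end-to-end.

```python
import numpy as np
import torch
import torch.nn as nn

# (a) Hand-crafted pipeline: simple spectral statistics stand in for real
#     hand-crafted features (e.g. MFCCs); a shallow classifier would follow.
def handcrafted_features(spec):          # spec: (freq_bins, time_frames)
    return np.concatenate([spec.mean(axis=1), spec.std(axis=1)])

# (b) Data-driven pipeline: a small CNN learns its own features from the
#     input and is trained jointly with the classifier on top.
end_to_end = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),                   # 10 output classes, chosen arbitrarily
)

spec = np.random.rand(40, 200).astype(np.float32)
print(handcrafted_features(spec).shape)                      # (80,)
print(end_to_end(torch.from_numpy(spec)[None, None]).shape)  # torch.Size([1, 10])
```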

Continue reading