Rationalizing the design of deep learning models for music signals

A brief review of the state of the art in music informatics research (MIR) and deep learning reveals that deep learning models have achieved competitive results in a relatively short amount of time: most of the relevant papers were published during the last 5 years. Many researchers have successfully used deep learning for several tasks: onset detection, genre classification, chord estimation, auto-tagging and source separation. Some researchers even declare that it is time for a paradigm shift, from hand-crafted features and shallow classifiers to deep processing models. Indeed, introducing machine learning for global modeling (i.e., classification) significantly advanced the state of the art in the past, and no one doubts that. Now, some researchers think that another advance could be made by using data-driven feature extractors instead of hand-crafted features, which means that they propose to fully replace the current pipeline with machine learning. However, deep learning for MIR is still in its early days. Current systems are based on solutions proposed in the computer vision, natural language and speech research fields. Therefore, it is now time to understand these solutions and adapt them to the music case.
