This journal article summarizes the most relevant results of my master thesis research, namely those related to popular Western music. However, the thesis also describes the first attempt at remixing orchestral music to improve the classical music experience of cochlear implant (CI) users. Although the results for orchestral music are not conclusive, they provide useful intuitions for designing future experiments and might be valuable for researchers interested in that topic.
Abstract: Many researchers use convolutional neural networks (CNNs) with small rectangular filters to classify music spectrograms. First, we discuss why there is no reason to use this filter setup by default. Second, we point out that more efficient architectures could be implemented if the characteristics of music features are considered during the design process. Specifically, we propose a novel design strategy that might promote more expressive and intuitive deep learning architectures by efficiently exploiting the representational capacity of the first layer: the first layer uses filters of different shapes, each adapted to fit a musical concept. The proposed architectures are assessed by measuring their accuracy in predicting the classes of the Ballroom dataset. We also make the code (together with the audio data) available so that this research is fully reproducible.
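As a rough illustration of the design strategy, the following pure-Python sketch builds a single first layer that mixes filter shapes and applies it to a spectrogram. The specific shapes (1×32, 32×1, 3×3), the names `FIRST_LAYER_SHAPES`, `build_first_layer` and `first_layer_features`, the random initialization and the global max-pooling step are all hypothetical choices made for this example. They are not the exact architectures evaluated on the Ballroom dataset.

```python
import random

# A spectrogram is a list of rows: one row per frequency bin, one column per time frame.
Matrix = list[list[float]]


def conv2d_valid(x: Matrix, w: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation (what CNN layers compute): no padding, stride 1."""
    fh, fw = len(w), len(w[0])
    out_h, out_w = len(x) - fh + 1, len(x[0]) - fw + 1
    return [
        [
            sum(x[i + a][j + b] * w[a][b] for a in range(fh) for b in range(fw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def random_filter(shape: tuple[int, int], rng: random.Random) -> Matrix:
    """Small Gaussian initialization (an assumption; any initializer would do)."""
    h, w = shape
    return [[rng.gauss(0.0, 0.1) for _ in range(w)] for _ in range(h)]


# Hypothetical filter shapes for ONE first layer, each meant to fit a musical concept:
#   (1, 32): one frequency bin across many frames   -> temporal cues (e.g. rhythm)
#   (32, 1): many frequency bins within one frame   -> spectral cues (e.g. timbre)
#   (3, 3):  the small square default, kept for comparison
FIRST_LAYER_SHAPES = {"temporal": (1, 32), "frequential": (32, 1), "small": (3, 3)}


def build_first_layer(n_per_shape: int, seed: int = 0) -> dict[str, list[Matrix]]:
    """Create n_per_shape randomly initialized filters for every shape above."""
    rng = random.Random(seed)
    return {
        name: [random_filter(shape, rng) for _ in range(n_per_shape)]
        for name, shape in FIRST_LAYER_SHAPES.items()
    }


def first_layer_features(spec: Matrix, layer: dict[str, list[Matrix]]) -> list[float]:
    """Apply every filter, then ReLU and global max-pooling on each feature map.

    Filters of different shapes yield feature maps of different sizes, so each
    map is reduced to a single number; the results are concatenated into one
    vector that later layers (e.g. a dense classifier) could consume.
    """
    features = []
    for filters in layer.values():
        for w in filters:
            fmap = conv2d_valid(spec, w)
            features.append(max(0.0, max(max(row) for row in fmap)))
    return features


if __name__ == "__main__":
    rng = random.Random(1)
    # Random stand-in for a 40-band x 100-frame spectrogram (not real audio data).
    spec = [[rng.random() for _ in range(100)] for _ in range(40)]
    layer = build_first_layer(n_per_shape=4)
    print(len(first_layer_features(spec, layer)))  # 3 shapes x 4 filters = 12
```

Pooling each feature map to one value is only one simple way to let differently shaped filters coexist in the same layer; other pooling or concatenation schemes are equally plausible.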
These (preliminary) results indicate that CNN design for music informatics research (MIR) can be further optimized by considering the characteristics of music audio data. By designing musically motivated CNNs, one can obtain a much more interpretable and efficient model. These results are the logical culmination of the previously presented discussion, which we recommend reading first.