Journal article:

  • Jordi Pons, Jordi Janer, Thilo Rode & Waldo Nogueira (2016, December). Remixing music using source separation algorithms to improve the musical experience of cochlear implant users. Journal of the Acoustical Society of America, vol. 140, no. 6, pp. 4338–4349.
    [code1, code2, ASA]

Conference and workshop papers (peer-reviewed):

  • Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons & Xavier Serra (2020, May). TensorFlow audio models in Essentia. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020).
    [arXiv, code & demos]
  • Berkan Kadioglu, Michael Horgan, Xiaoyu Liu, Jordi Pons, Dan Darcy & Vivek Kumar (2020, May). An empirical study of Conv-TasNet. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020).
  • Jordi Pons & Xavier Serra (2019, November). musicnn: pre-trained convolutional neural networks for music audio tagging. In Late breaking/demo session of the 20th International Society for Music Information Retrieval Conference (LBD-ISMIR2019).
    [arXiv, code, demo]
  • Francesc Lluís, Jordi Pons & Xavier Serra (2019, September). End-to-end music source separation: is it possible in the waveform domain? In 20th Annual Conference of the International Speech Communication Association (INTERSPEECH2019).
    [arXiv, code, demo]
  • Jordi Pons, Joan Serrà & Xavier Serra (2018, October). Training neural audio classifiers with few data. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2019).
    [arXiv, code] – Oral presentation
  • Jordi Pons & Xavier Serra (2018, May). Randomly weighted CNNs for (music) audio classification. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2019).
    [arXiv, code, slides]
  • Dario Rethage, Jordi Pons & Xavier Serra (2018, April). A Wavenet for Speech Denoising. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2018).
    [arXiv, code, audioExamples] – Oral presentation
  • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik M. Schmidt, Andreas F. Ehmann & Xavier Serra (2017, December). End-to-end learning for music audio tagging at scale. Presented at the Workshop on Machine Learning for Audio Signal Processing (ML4Audio) at NIPS 2017, and in proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR2018).
    [code, demo, abstract, arXiv] – Best student paper award
  • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons & Xavier Serra (2018, October). General-purpose tagging of Freesound audio with AudioSet labels: Task description, dataset, and baseline. In Detection and Classification of Acoustic Scenes and Events Workshop (DCASE2018).
  • Jordi Pons, Rong Gong & Xavier Serra (2017, October). Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks. In 18th International Society for Music Information Retrieval Conference (ISMIR2017).
    [code, data, arXiv, ISMIR, MTG]
  • Rong Gong, Jordi Pons & Xavier Serra (2017, October). Audio to score matching by combining phonetic and duration information. In 18th International Society for Music Information Retrieval Conference (ISMIR2017).
    [code, data, arXiv, ISMIR, MTG]
  • Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter & Xavier Serra (2017, October). Freesound Datasets: A platform for the creation of open audio datasets. In 18th International Society for Music Information Retrieval Conference (ISMIR2017).
    [GitHub, platform, ISMIR, MTG]
  • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez & Xavier Serra (2017, September). Timbre Analysis of Music Audio Signals with Convolutional Neural Networks. In 25th European Signal Processing Conference (EUSIPCO2017).
    [code1, code2, code3, arXiv, MTG] – Oral presentation
  • Jordi Pons & Xavier Serra (2017, March). Designing efficient architectures for modeling temporal features with convolutional neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2017).
    [code, MTG, paper]
  • Jordi Pons, Thomas Lidy & Xavier Serra (2016, June). Experimenting with musically motivated convolutional neural networks. In 14th International Workshop on Content-Based Multimedia Indexing (CBMI2016).
    [code, MTG, paper, IEEE] – Best paper award
  • Axel Roebel, Jordi Pons, Marco Liuni & Mathieu Lagrange (2015, April). On automatic drum transcription using non-negative matrix deconvolution and Itakura-Saito divergence. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2015), pp. 414–418.

Theses:

  • Doctoral thesis (2019): Deep neural networks for music and audio tagging. Supervised by Xavier Serra (Music Technology Group – UPF).
    [PDF, open-source music tagging system]
  • Master thesis (2015): Music remixing using source separation to improve cochlear implant users' music perception. Supervised by: Waldo Nogueira (German Hearing Center – MHH) and Jordi Janer (Music Technology Group – UPF).
    [code1, code2, PDF, MTG]
  • Undergraduate thesis (2014): Automatic drum transcription for polyphonic music using Non-Negative Matrix Factor Deconvolution. Supervised by: Antonio Bonafonte (UPC) and Axel Roebel (IRCAM).