Work in progress:

  • Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra (October 2020). FSD50K: an open dataset of human-labeled sound events. arXiv preprint.
    [arXiv, dataset]

Conference and workshop papers (peer-reviewed):

  • Margarita Geleta, Cristina Punti, Kevin McGuinness, Jordi Pons, Cristian Canton, Xavier Giro-i-Nieto (June 2021). PixInWav: Residual Steganography for Hiding Pixels in Audio. In (ICASSP2022).
  • Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà. On Loss Functions and Evaluation Metrics for Music Source Separation. In (ICASSP2022).
    [Zenodo, arXiv]
  • Santiago Pascual, Joan Serrà, Jordi Pons (July 2021). Adversarial auto-encoding for packet loss concealment. In WASPAA 2021.
  • Xiaoyu Liu, Jordi Pons (June 2021). On permutation invariant training for speech source separation. In (ICASSP2021).
  • Daniel Arteaga, Jordi Pons (June 2021). Multichannel-based learning for audio object extraction. In (ICASSP2021).
  • Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà (June 2021). Upsampling artifacts in neural audio synthesis. In (ICASSP2021).
    [arXiv, code]
  • Christian J Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà (June 2021). Automatic multitrack mixing with a differentiable mixing console of neural audio effects. In (ICASSP2021).
    [arXiv, demo]
  • Joan Serrà, Jordi Pons, Santiago Pascual (June 2021). SESQA: semi-supervised learning for speech quality assessment. In (ICASSP2021).
  • Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons & Xavier Serra (May, 2020). TensorFlow audio models in Essentia. In (ICASSP2020).
    [arXiv, code & demos]
  • Berkan Kadioglu, Michael Horgan, Xiaoyu Liu, Jordi Pons, Dan Darcy, Vivek Kumar (May, 2020). An empirical study of Conv-TasNet. In (ICASSP2020).
  • Jordi Pons & Xavier Serra (November, 2019). musicnn: pre-trained convolutional neural networks for music audio tagging. In Late breaking/demo session of the 20th International Society for Music Information Retrieval Conference (LBD-ISMIR2019).
    [arXiv, code, demo]
  • Francesc Lluís, Jordi Pons & Xavier Serra (September, 2019). End-to-end music source separation: is it possible in the waveform domain? In 20th Annual Conference of the International Speech Communication Association (INTERSPEECH2019).
    [arXiv, code, demo]
  • Jordi Pons, Joan Serrà & Xavier Serra (October, 2018). Training neural audio classifiers with few data. In (ICASSP2019).
    [arXiv, code] – Oral presentation
  • Jordi Pons & Xavier Serra (May, 2018). Randomly weighted CNNs for (music) audio classification. In (ICASSP2019).
    [arXiv, code, slides]
  • Dario Rethage, Jordi Pons & Xavier Serra (2018, April). A Wavenet for Speech Denoising. In (ICASSP2018).
    [arXiv, code, audioExamples] – Oral presentation
  • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik M. Schmidt, Andreas F. Ehmann & Xavier Serra (2017, December). End-to-end learning for music audio tagging at scale. Presented at the Workshop on Machine Learning for Audio Signal Processing (ML4Audio) at NIPS 2017, and in proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR2018).
    [code, demo, abstract, arXiv] – Best student paper award
  • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel PW Ellis, Xavier Favory, Jordi Pons & Xavier Serra (2018, October). General-purpose tagging of Freesound audio with Audioset labels: Task description, dataset, and baseline. In Detection and Classification of Acoustic Scenes and Events Workshop (DCASE2018).
  • Jordi Pons, Rong Gong & Xavier Serra (2017, October). Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks. In 18th International Society for Music Information Retrieval Conference (ISMIR2017).
    [code, data, arXiv, ISMIR, MTG]
  • Rong Gong, Jordi Pons & Xavier Serra (2017, October). Audio to score matching by combining phonetic and duration information. In 18th International Society for Music Information Retrieval Conference (ISMIR2017).
    [code, data, arXiv, ISMIR, MTG]
  • Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter & Xavier Serra (2017, October). Freesound Datasets: A platform for the creation of open audio datasets. In 18th International Society for Music Information Retrieval Conference (ISMIR2017).
    [GitHub, platform, ISMIR, MTG]
  • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez & Xavier Serra (2017, September). Timbre Analysis of Music Audio Signals with Convolutional Neural Networks. In 25th European Signal Processing Conference (EUSIPCO2017).
    [code1, code2, code3, arXiv, MTG] – Oral presentation
  • Jordi Pons & Xavier Serra (2017, March). Designing efficient architectures for modeling temporal features with convolutional neural networks. In (ICASSP2017). Publisher: IEEE.
    [code, MTG, paper]
  • Jordi Pons, Thomas Lidy & Xavier Serra (2016, June). Experimenting with musically motivated convolutional neural networks. In 14th International Workshop on Content-Based Multimedia Indexing (CBMI2016).
    [code, MTG, paper, IEEE] – Best paper award
  • Axel Roebel, Jordi Pons, Marco Liuni & Mathieu Lagrange (2015, April). On automatic drum transcription using non-negative matrix deconvolution and Itakura Saito divergence. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2015), pp. 414-418.

Journal article:

  • Jordi Pons, Jordi Janer, Thilo Rode & Waldo Nogueira (2016, December). Remixing music using source separation algorithms to improve the musical experience of cochlear implant users. Journal of the Acoustical Society of America, vol. 140, no. 6, pp. 4338-4349.
    [code1, code2, ASA]


Theses:

  • Doctoral thesis (2019): Deep neural networks for music and audio tagging. Supervised by Xavier Serra (Music Technology Group – UPF).
    [PDF, open-source music tagging system]
  • Master's thesis (2015): Music remixing using source separation to improve cochlear implant users music perception. Supervised by: Waldo Nogueira (German Hearing Center – MHH) and Jordi Janer (Music Technology Group – UPF).
    [code1, code2, PDF, MTG]
  • Undergraduate thesis (2014): Automatic Drums Transcription for polyphonic music using Non-Negative Matrix Factor Deconvolution. Supervised by: Antonio Bonafonte (UPC) and Axel Roebel (IRCAM).