Conference and workshop papers:

  • Jordi Pons, Xiaoyu Liu, Santiago Pascual, Joan Serrà. GASS: Generalizing Audio Source Separation with Large-scale Data. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2024).
  • Joan Serrà, Davide Scaini, Santiago Pascual, Daniel Arteaga, Jordi Pons, Jeroen Breebaart, Giulio Cengarle. Mono-to-stereo through parametric stereo generation. 24th International Society for Music Information Retrieval Conference (ISMIR2023).
  • Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley. CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA2023).
    [arXiv, demo]
  • Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà. Full-band general audio synthesis with score-based diffusion. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2023).
    [arXiv, demo]
  • Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serrà. Adversarial Permutation Invariant Training for Universal Sound Separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2023).
    [arXiv, demo]
  • Jaume Ros, Margarita Geleta, Jordi Pons, Xavier Giro-i-Nieto. Towards robust image-in-audio deep steganography. In WiCV@CVPR2023.
  • Nicolás Schmidt, Jordi Pons, Marius Miron. PodcastMix: A dataset for separating music and speech in podcasts. In 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH2022).
    [arXiv, dataset]
  • Margarita Geleta, Cristina Punti, Kevin McGuinness, Jordi Pons, Cristian Canton, Xavier Giro-i-Nieto (June 2021). PixInWav: Residual Steganography for Hiding Pixels in Audio. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2022).
  • Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà. On Loss Functions and Evaluation Metrics for Music Source Separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2022).
    [Zenodo, arXiv]
  • Santiago Pascual, Joan Serrà, Jordi Pons (July 2021). Adversarial auto-encoding for packet loss concealment. In WASPAA 2021.
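  • Xiaoyu Liu, Jordi Pons (June 2021). On permutation invariant training for speech source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2021).
  • Daniel Arteaga, Jordi Pons (June 2021). Multichannel-based learning for audio object extraction. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2021).
  • Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà (June 2021). Upsampling artifacts in neural audio synthesis. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2021).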
    [arXiv, code]
  • Christian J Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà (June 2021). Automatic multitrack mixing with a differentiable mixing console of neural audio effects. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2021).
    [arXiv, demo]
  • Joan Serrà, Jordi Pons, Santiago Pascual (June 2021). SESQA: semi-supervised learning for speech quality assessment. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2021).
  • Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons & Xavier Serra (May, 2020). TensorFlow audio models in Essentia. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020).
    [arXiv, code & demos]
  • Berkan Kadioglu, Michael Horgan, Xiaoyu Liu, Jordi Pons, Dan Darcy, Vivek Kumar (May, 2020). An empirical study of Conv-TasNet. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020).
  • Jordi Pons & Xavier Serra (November, 2019). musicnn: pre-trained convolutional neural networks for music audio tagging. In Late breaking/demo session of the 20th International Society for Music Information Retrieval Conference (LBD-ISMIR2019).
    [arXiv, code, demo]
  • Francesc Lluís, Jordi Pons & Xavier Serra (September, 2019). End-to-end music source separation: is it possible in the waveform domain? In 20th Annual Conference of the International Speech Communication Association (INTERSPEECH2019).
    [arXiv, code, demo]
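  • Jordi Pons, Joan Serrà & Xavier Serra (October, 2018). Training neural audio classifiers with few data. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2019).
    [arXiv, code] – Oral presentation
  • Jordi Pons & Xavier Serra (May, 2018). Randomly weighted CNNs for (music) audio classification. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2019).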
    [arXiv, code, slides]
  • Dario Rethage, Jordi Pons & Xavier Serra (2018, April). A Wavenet for Speech Denoising. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2018).
    [arXiv, code, audioExamples] – Oral presentation
  • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik M. Schmidt, Andreas F. Ehmann & Xavier Serra (2017, December). End-to-end learning for music audio tagging at scale. Presented at the Workshop on Machine Learning for Audio Signal Processing (ML4Audio) at NIPS 2017, and in proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR2018).
    [code, demo, abstract, arXiv] – Best student paper award
  • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel PW Ellis, Xavier Favory, Jordi Pons & Xavier Serra (2018, October). General-purpose tagging of Freesound audio with Audioset labels: Task description, dataset, and baseline. In Detection and Classification of Acoustic Scenes and Events Workshop (DCASE2018).
  • Jordi Pons, Rong Gong & Xavier Serra (2017, October). Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks. In 18th International Society for Music Information Retrieval Conference (ISMIR2017).
    [code, data, arXiv, ISMIR, MTG]
  • Rong Gong, Jordi Pons & Xavier Serra (2017, October). Audio to score matching by combining phonetic and duration information. In 18th International Society for Music Information Retrieval Conference (ISMIR2017).
    [code, data, arXiv, ISMIR, MTG]
  • Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter & Xavier Serra (2017, October). Freesound Datasets: A platform for the creation of open audio datasets. In 18th International Society for Music Information Retrieval Conference (ISMIR2017).
    [GitHub, platform, ISMIR, MTG]
  • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez & Xavier Serra (2017, September). Timbre Analysis of Music Audio Signals with Convolutional Neural Networks. In 25th European Signal Processing Conference (EUSIPCO2017).
    [code1, code2, code3, arXiv, MTG] – Oral presentation
  • Jordi Pons & Xavier Serra (2017, March). Designing efficient architectures for modeling temporal features with convolutional neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2017). Publisher: IEEE.
    [code, MTG, paper]
  • Jordi Pons, Thomas Lidy & Xavier Serra (2016, June). Experimenting with musically motivated convolutional neural networks. In 14th International Workshop on Content-Based Multimedia Indexing (CBMI2016).
    [code, MTG, paper, IEEE] – Best paper award
  • Axel Roebel, Jordi Pons, Marco Liuni & Mathieu Lagrange (2015, April). On automatic drum transcription using non-negative matrix deconvolution and Itakura Saito divergence. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2015), pp. 414-418.

Journal articles:

  • Jordi Pons, Jordi Janer, Thilo Rode & Waldo Nogueira (2016, December). Remixing music using source separation algorithms to improve the musical experience of cochlear implant users. Journal of the Acoustical Society of America, vol. 140, no. 6, pp. 4338-4349.
    [code1, code2, ASA]
  • Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra (October 2020). FSD50K: an open dataset of human-labeled sound events. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 829-852.
    [arXiv, dataset]


Theses:

  • Doctoral thesis (2019): Deep neural networks for music and audio tagging. Supervised by Xavier Serra (Music Technology Group – UPF).
    [PDF, open-source music tagging system]
  • Master's thesis (2015): Music remixing using source separation to improve cochlear implant users music perception. Supervised by: Waldo Nogueira (German Hearing Center – MHH) and Jordi Janer (Music Technology Group – UPF).
    [code1, code2, PDF, MTG]
  • Undergraduate thesis (2014): Automatic Drums Transcription for polyphonic music using Non-Negative Matrix Factor Deconvolution. Supervised by: Antonio Bonafonte (UPC) and Axel Roebel (IRCAM).