Our ICASSP paper studying permutation ambiguity in speaker-independent source separation models is now available on arXiv.
These are the papers we will be presenting at ICASSP 2021:
- Xiaoyu Liu, Jordi Pons. On permutation invariant training for speech source separation. [arXiv]
- Daniel Arteaga, Jordi Pons. Multichannel-based learning for audio object extraction. [arXiv]
- Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà. Upsampling artifacts in neural audio synthesis. [arXiv, code]
- Christian J Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà. Automatic multitrack mixing with a differentiable mixing console of neural audio effects. [arXiv, demo]
- Joan Serrà, Jordi Pons, Santiago Pascual. SESQA: semi-supervised learning for speech quality assessment. [arXiv]
Infinite thanks to all my collaborators for the amazing work 🙂
Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of Essentia algorithms that use TensorFlow to run predictions with pre-trained deep learning models, and some of those models are based on musicnn!
I’m happy to share the highlights of my first paper with Dolby! We will be presenting this work at ICASSP 2020, in Barcelona.
Several improvements to Conv-TasNet have been proposed, but they mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. We propose a (deep) non-linear variant of the encoder/decoder, based on a deep stack of small filters. With this change, we improve SI-SNRi by 0.6-0.9 dB.
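For readers less familiar with the metric: SI-SNRi is the scale-invariant signal-to-noise ratio of the separated estimate minus that of the unprocessed mixture, both measured against the clean target. Below is a minimal pure-Python sketch, assuming the commonly used zero-mean SI-SNR definition. The function names `si_snr` and `si_snr_improvement` and the `eps` constant are illustrative choices, not taken from the paper's code.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR (in dB) between two equal-length 1-D signals."""
    n = len(target)
    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target: this is the "signal" part.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]
    # Whatever is left after the projection counts as "noise".
    e_noise = [e - s for e, s in zip(est, s_target)]
    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(x * x for x in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain (in dB) of the estimate over the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)


if __name__ == "__main__":
    # Toy signals (hypothetical values, for illustration only).
    target = [1.0, -1.0, 1.0, -1.0]
    mixture = [1.5, -0.5, 0.5, -1.5]   # target plus an interfering signal
    estimate = [1.1, -0.9, 0.8, -1.0]  # imperfect separation output
    print(f"SI-SNRi: {si_snr_improvement(estimate, mixture, target):.2f} dB")
```

Because the estimate is projected onto the target before the energies are compared, rescaling the estimate leaves the score unchanged, which is why the metric is called scale-invariant.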
Last summer, I was a research intern at Telefónica Research (Barcelona). This article is the outcome of this short (but intense!) collaboration with Joan Serrà, where we explore how to train deep learning models with just 1, 2 or 10 audio examples per class. Check it out on arXiv, and reproduce our results by running our code!