PLAAE (packet loss adversarial auto-encoder) is our proposal for packet loss concealment in a non-autoregressive fashion. Our goal is to reconstruct missing speech packets until a new (real) packet is received in a video call. Our end-to-end non-autoregressive adversarial auto-encoder especially shines at long-term predictions, beyond 60ms. The paper has been accepted for presentation at WASPAA 2021! Check out our arXiv pre-print.
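To give a feel for the problem setup (this is NOT the paper's model — just a toy illustration of the classic "packet repetition" baseline that learned, PLAAE-style concealment improves on, especially for gaps beyond 60ms): a stream arrives as fixed-size packets, some packets are lost, and the receiver must fill the gap from what it last heard. The packet size and fade factor below are illustrative assumptions, not values from the paper.

```python
import numpy as np

PACKET = 480  # assumed packet size: 10 ms at 48 kHz (illustrative, not from the paper)

def conceal_repeat(prev_packet: np.ndarray, n_lost: int) -> np.ndarray:
    """Classic PLC baseline: repeat the last received packet for each
    lost slot, attenuating successive repetitions to avoid buzzy artifacts."""
    out = []
    for i in range(n_lost):
        gain = 0.5 ** i  # halve the level on each repetition (illustrative choice)
        out.append(prev_packet * gain)
    return np.concatenate(out)

# Simulate a 220 Hz tone split into 10 ms packets, with a 60 ms burst loss.
sr = 48000
t = np.arange(sr // 10) / sr           # 100 ms of audio = 10 packets
stream = np.sin(2 * np.pi * 220 * t)
packets = stream.reshape(-1, PACKET)
filled = conceal_repeat(packets[2], n_lost=6)  # conceal 6 lost packets (60 ms)
```

The baseline degrades quickly over long gaps — exactly the regime where a non-autoregressive model that predicts the whole gap in one pass is attractive, since it avoids the error accumulation of sample-by-sample autoregressive generation.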
While I never had the chance to attend the Web Audio Conference (WAC), I have followed the recent developments of the Web Audio API with great interest. But this time I couldn't resist going – Dolby is the main sponsor of the event, and the conference is organised by friends in my city!
As a personal curiosity, the first WAC (back in 2015!) was organized by IRCAM when I was a research intern there – and I remember having this feeling that audio processing in the browser was going to be THE THING. Five years later, we are starting to see the social impact of the ideas that were introduced back then.
Actually, what I really need is fewer papers with “all you need” in the title – and to share a (non-virtual) beer with you folks! Here are some of the papers I enjoyed, together with the papers we presented. You’ll see that I don’t include classification/tagging papers — I guess I need a break from my PhD topic 🙂 Enjoy!
On Thursday 13th May, from 17:00 to 19:00 (CET), I’ll be part of the workshop ‘Exploring connections between AI and Music’. The live-streamed event is free to watch, and is the opening activity of the AI and Music Festival (more information here). To prepare for it, I reviewed previous works by music AI artists and researchers. This slide deck contains a summary of how I perceive the current music AI scene.
These are the papers we will be presenting at ICASSP 2021:
- Xiaoyu Liu, Jordi Pons. On permutation invariant training for speech source separation. [arXiv]
- Daniel Arteaga, Jordi Pons. Multichannel-based learning for audio object extraction. [arXiv]
- Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà. Upsampling artifacts in neural audio synthesis. [arXiv, code]
- Christian J Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà. Automatic multitrack mixing with a differentiable mixing console of neural audio effects. [arXiv, demo]
- Joan Serrà, Jordi Pons, Santiago Pascual. SESQA: semi-supervised learning for speech quality assessment. [arXiv]
Infinite thanks to all my collaborators for the amazing work 🙂