In this paper we present DAG: a full-band (48 kHz) waveform synthesizer built on diffusion-based generative modeling! And style transfer comes for free. Check out our demo! This is great work led by Santi.
I’m very proud of our recent work: by simply improving the loss (keeping the same model and dataset) we obtain an improvement of 1.4 dB SI-SNRi! A 1 dB gain in source separation is a lot, and is perceptually noticeable. This is great work led by Emilian, who worked with us as an intern during the summer of 2022.
In this work we propose to consider the task of speech enhancement as a holistic endeavor, and present a universal speech enhancement system that tackles 55 different distortions at the same time. Our approach consists of a generative model that employs score-based diffusion. We show that this approach significantly outperforms the state of the art in a subjective test performed by expert listeners.
My biggest lesson this year: I’LL NOT SURVIVE ANOTHER ONLINE CONFERENCE 💔 I really miss in-person discussions in exotic places! This year I attended ICASSP to present two papers:
- “On Loss Functions and Evaluation Metrics for Music Source Separation” by Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà [Zenodo, arXiv].
- “PixInWav: Residual Steganography for Hiding Pixels in Audio” by Margarita Geleta, Cristina Punti, Kevin McGuinness, Jordi Pons, Cristian Canton, Xavier Giro-i-Nieto [arXiv].