Preprint: “Universal speech enhancement with score-based diffusion”

In this work we propose to consider the task of speech enhancement as a holistic endeavor, and present a universal speech enhancement system that tackles 55 different distortions at the same time. Our approach consists of a generative model that employs score-based diffusion. We show that this approach significantly outperforms the state of the art in a subjective test performed by expert listeners.

Check out our project website and the paper on arXiv!

Continue reading

ICASSP 2022 – my learnings

My biggest learning this year: I’LL NOT SURVIVE ANOTHER ONLINE CONFERENCE 💔 I really miss in-person discussion in exotic places! This year I attended ICASSP to present two papers:

  • “On Loss Functions and Evaluation Metrics for Music Source Separation” by Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà [Zenodo, arXiv].
  • “PixInWav: Residual Steganography for Hiding Pixels in Audio” by Margarita Geleta, Cristina Punti, Kevin McGuinness, Jordi Pons, Cristian Canton, Xavier Giro-i-Nieto [arXiv].

Continue reading

ICASSP 2022 paper: “On loss functions and evaluation metrics for music source separation”

During his internship at Dolby, Enric ran an exhaustive evaluation of various loss functions for music source separation. After evaluating those losses objectively and subjectively, we recommend training with the following spectrogram-based losses: L2freq, SISDRfreq, LOGL2freq, or LOGL1freq, potentially combined with phase-sensitive objectives and adversarial regularizers.

Link to arXiv!
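To give a flavor of what "spectrogram-based losses" means, here is a minimal NumPy sketch of two of them. These are my own simplified interpretations (naive framed FFT, illustrative `n_fft`/`hop` values), not the exact definitions or code from the paper:

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude spectrogram via a naive framed FFT with a Hann window."""
    frames = [
        np.abs(np.fft.rfft(x[s:s + n_fft] * np.hanning(n_fft)))
        for s in range(0, len(x) - n_fft + 1, hop)
    ]
    return np.stack(frames)

def l2_freq(est, ref):
    """'L2freq'-style loss: mean squared error between magnitude spectrograms."""
    return float(np.mean((stft_mag(est) - stft_mag(ref)) ** 2))

def logl1_freq(est, ref, eps=1e-8):
    """'LOGL1freq'-style loss: mean absolute error between log-magnitude spectrograms."""
    return float(np.mean(np.abs(
        np.log(stft_mag(est) + eps) - np.log(stft_mag(ref) + eps))))
```

Both losses are zero when the estimate equals the reference and grow as the estimated magnitude spectrogram drifts away from the reference one; the log variants compress dynamic range, weighting quiet time-frequency bins more heavily.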

Preprint: Upsampling layers for music source separation

We investigated various upsampling layers to consolidate the ideas we introduced in our previous paper. We benchmarked a large set of upsampling layers for music source separation: different transposed and subpixel convolution setups, different interpolation upsamplers (including two novel layers based on stretch and sinc interpolation), and different wavelet-based upsamplers (including a novel learnable wavelet layer).

Check out our project website and the paper on arXiv!
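To illustrate two of the upsampler families compared in the benchmark, here is a toy NumPy sketch of an interpolation upsampler (nearest neighbour) and a 1-D transposed convolution. Function names and parameters are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def nearest_upsample(x, factor=2):
    """Interpolation upsampler: nearest neighbour, i.e. plain sample repetition."""
    return np.repeat(x, factor)

def transposed_conv1d(x, kernel, stride=2):
    """Toy 1-D transposed convolution: insert stride-1 zeros between samples
    ("zero-stuffing"), then convolve with the kernel. In a network the kernel
    would be learnable; a fixed triangular kernel yields linear interpolation."""
    up = np.zeros(len(x) * stride, dtype=float)
    up[::stride] = x
    return np.convolve(up, kernel, mode="same")
```

For example, `nearest_upsample(np.array([1.0, 2.0, 3.0]), 2)` repeats each sample, while `transposed_conv1d` with kernel `[0.5, 1.0, 0.5]` fills the zero-stuffed gaps with averages of the neighbouring samples. The choice of kernel is exactly what distinguishes the transposed, interpolation, sinc, and wavelet-based variants studied in the paper.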