PLAAE (packet loss adversarial auto-encoder) is our proposal for packet loss concealment in a non-autoregressive fashion. Our goal is to reconstruct missing speech packets until a new (real) packet is received in a video call. Our end-to-end non-autoregressive adversarial auto-encoder especially shines at long-term predictions, beyond 60ms. The paper has been accepted for presentation at WASPAA 2021! Check out our arXiv pre-print.
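To give a feel for the problem setup (this is NOT the paper's model — just a toy illustration of the classic "packet repetition" baseline that learned, PLAAE-style concealment improves on, especially for gaps beyond 60ms): a stream arrives as fixed-size packets, some packets are lost, and the receiver must fill the gap from what it last heard. The packet size and fade factor below are illustrative assumptions, not values from the paper.

```python
import numpy as np

PACKET = 480  # assumed packet size: 10 ms at 48 kHz (illustrative, not from the paper)

def conceal_repeat(prev_packet: np.ndarray, n_lost: int) -> np.ndarray:
    """Classic PLC baseline: repeat the last received packet for each
    lost slot, attenuating successive repetitions to avoid buzzy artifacts."""
    out = []
    for i in range(n_lost):
        gain = 0.5 ** i  # halve the level on each repetition (illustrative choice)
        out.append(prev_packet * gain)
    return np.concatenate(out)

# Simulate a 220 Hz tone split into 10 ms packets, with a 60 ms burst loss.
sr = 48000
t = np.arange(sr // 10) / sr           # 100 ms of audio = 10 packets
stream = np.sin(2 * np.pi * 220 * t)
packets = stream.reshape(-1, PACKET)
filled = conceal_repeat(packets[2], n_lost=6)  # conceal 6 lost packets (60 ms)
```

The baseline degrades quickly over long gaps — exactly the regime where a non-autoregressive model that predicts the whole gap in one pass is attractive, since it avoids the error accumulation of sample-by-sample autoregressive generation.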
While I never had the chance to attend the Web Audio Conference (WAC), I have followed the recent developments of the Web Audio API with great interest. But this time I couldn't resist going – Dolby is the main sponsor of the event, and the conference is organised by friends in my city!
As a personal curiosity, the first WAC (back in 2015!) was organized by IRCAM when I was a research intern there – and I remember having this feeling that audio processing in the browser was going to be THE THING. Five years later, we are starting to see the social impact of the ideas that were introduced back then.
Actually, what I really need is fewer papers with “all you need” in the title – and to share a (non-virtual) beer with you folks! Here are some of the papers I enjoyed, together with the papers we presented. You’ll see that I don’t include classification/tagging papers — I guess I need a break from my PhD topic 🙂 Enjoy!
On Thursday 13th May, from 17:00 to 19:00 (CET), I’ll be part of the workshop ‘Exploring connections between AI and Music’. The live-streamed event is free to watch, and is the opening activity of the AI and Music Festival (more information here). To prepare for it, I reviewed previous works by music AI artists and researchers. This slide deck contains a summary of how I perceive the current music AI scene.
These are the papers we will be presenting at ICASSP 2021:
- Xiaoyu Liu, Jordi Pons. On permutation invariant training for speech source separation. [arXiv]
- Daniel Arteaga, Jordi Pons. Multichannel-based learning for audio object extraction. [arXiv]
- Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà. Upsampling artifacts in neural audio synthesis. [arXiv, code]
- Christian J Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà. Automatic multitrack mixing with a differentiable mixing console of neural audio effects. [arXiv, demo]
- Joan Serrà, Jordi Pons, Santiago Pascual. SESQA: semi-supervised learning for speech quality assessment. [arXiv]
Infinite thanks to all my collaborators for the amazing work 🙂