This paper summarises the latest work I did at Dolby. We study a single general audio source separation (GASS) model trained to separate speech, music, and sound events in a supervised fashion with a large-scale dataset.
Stable Audio lets you create custom-length audio just by describing it. It is powered by a generative audio model based on diffusion. You can generate and download audio in 44.1 kHz stereo. It also has a nice interface, so there is no need to be a hacker! And the audio you create can be used in your commercial projects. I’ve been experimenting with it over the last few weeks, and here are some ideas on how to use it!
In this work led by Joan, we explore the use of generative models that operate on top of a parametric stereo domain to generate plausible stereo samples from mono audio signals.
Check it out on arXiv!