Preprint: “Fast Timing-Conditioned Latent Audio Diffusion”

The Stable Audio paper is finally out. In this work I focused mostly on evaluating the model. With the metrics from this work, you can now evaluate long-form, full-band, and variable-length music and audio generations. Previous work focused on evaluating short-form, 16kHz music and audio. The results of our perceptual study show that Stable Audio is competitive, especially in terms of audio quality. We also assessed musicality, stereo correctness, and musical structure. Stable Audio is able to consistently generate music with structure!

Check out the model and evaluation code.
Also check it out on arXiv and try its demo!

On Prompting Stable Audio

Stable Audio lets you create custom-length audio just by describing it. It is powered by a generative audio model based on diffusion. You can generate and download audio in 44.1 kHz stereo. It also has a nice interface, so there is no need to be a hacker! And the audio you create can be used in your commercial projects. I’ve been experimenting with it over the last few weeks, and here are some ideas on how to use it!
