Preprint: “Fast Timing-Conditioned Latent Audio Diffusion”

The Stable Audio paper is finally out. In this work I’ve mostly focused on evaluating the model. With the evaluation metrics from this work, you can now evaluate long-form, full-band, and variable-length music and audio generations, whereas previous work focused on evaluating short-form, 16kHz music and audio. The results of our perceptual study show that Stable Audio is competitive, especially in terms of audio quality. We also assessed musicality, stereo correctness, and musical structure. Stable Audio is able to consistently generate music with structure!

Check out the model and evaluation code.
Also check out the paper on arXiv and the demo!