Estimating pitch in polyphonic music

Any python libraries to estimate the main melody’s pitch from polyphonic music?

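A few options exist: Essentia ships PredominantPitchMelodia (an implementation of the Melodia algorithm) for extracting the predominant melody from polyphonic audio, and librosa's pyin tracks f0 for monophonic signals. As a dependency-light illustration of the underlying idea, here is a minimal autocorrelation-based monophonic pitch estimator in NumPy — the function name and parameters are my own, and note that true melody extraction from polyphonic mixtures needs a salience-based method like Melodia rather than this baseline:

```python
import numpy as np

def estimate_pitch_autocorr(y, sr, fmin=80.0, fmax=1000.0):
    """Crude monophonic F0 estimate via autocorrelation (illustrative sketch)."""
    y = y - y.mean()
    # Full autocorrelation; keep non-negative lags only.
    ac = np.correlate(y, y, mode="full")[len(y) - 1:]
    # Search for the strongest peak within the plausible pitch-period range.
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return sr / lag

# Sanity check on a synthetic 440 Hz tone.
sr = 22050
t = np.arange(4096) / sr
y = np.sin(2 * np.pi * 440.0 * t)
f0 = estimate_pitch_autocorr(y, sr)
```

For real polyphonic material, Essentia's PredominantPitchMelodia (or librosa's pyin on pre-separated vocals) is the more appropriate tool; the sketch above only shows the periodicity idea those methods build on.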

Preprint: “Stable Audio Open”

The Stable Audio Open paper is finally out. With it, we aim to further improve current best practices for open model releases, with an emphasis on evaluation, data transparency, and accessibility for artists and scholars. Stable Audio Open generates high-quality stereo audio at 44.1 kHz and runs on consumer-grade GPUs. The model was trained solely on Creative Commons-licensed audio, and the model weights, code, and data attributions are publicly available.

The preprint is on arXiv, and the model is on Hugging Face.

Presenting at +RAIN Film Festival and Sonar+D

During this year's Sonar week, we presented Stable Audio 2.0 and Stable Audio Open. At the +RAIN Film Festival, we showed the potential of Stability AI tools in film and cinema, and how they can enhance current sound-design and audiovisual production workflows. During Sonar+D, we engaged with musicians and the general audience, presenting our technology for music creation, from composition to sound design.

Stable Audio Open model release

The Stable Audio Open weights are out on Hugging Face! What's in there? A text-to-audio diffusion model with T5-based text conditioning and a diffusion transformer (DiT) architecture, generating up to 47-second stereo audio at 44.1 kHz. It was pre-trained on sound effects and samples from Freesound and the Free Music Archive (FMA), making it well suited for generating samples for your music and for fine-tuning.

The paper is on the way, but here's a sneak peek at the results: Stable Audio Open achieves the best FD (Fréchet Distance) among comparable models at generating sound effects. FD is commonly used by researchers to benchmark the realism of generated audio.
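For intuition, FD compares the statistics of embeddings of generated audio against embeddings of reference audio: each set is fitted with a Gaussian, and the distance between the two Gaussians is reported. Here is a minimal NumPy/SciPy sketch of the metric itself, assuming you already have two embedding matrices (in practice the embeddings come from a pretrained audio classifier, which is not shown here):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(emb_a, emb_b):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: (n_samples, dim) arrays of audio embeddings.
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    diff = mu_a - mu_b
    # Matrix square root of the covariance product; the result can carry a
    # tiny imaginary component from numerical error, which we discard.
    covmean = sqrtm(cov_a @ cov_b).real
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

# Toy check on synthetic "embeddings": identical sets score ~0,
# a mean-shifted copy scores clearly above 0.
rng = np.random.default_rng(0)
ref = rng.normal(size=(2000, 8))
same = frechet_distance(ref, ref)
shifted = frechet_distance(ref, ref + 1.0)
```

A lower FD means the generated set's embedding statistics sit closer to the reference set's, which is why it is read as a proxy for realism.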