On Prompting Stable Audio

Stable Audio lets you create custom-length audio just by describing it. It is powered by a diffusion-based generative audio model. You can generate and download audio in 44.1 kHz stereo. It also has a nice interface, so there is no need to be a hacker! And the audio you create can be used in your commercial projects. I’ve been experimenting with it over the last few weeks, and here are some ideas on how to use it!

Basic music generation

Follow this prompt structure: genre tags, instrument tags, mood tags, BPM. And set the duration between 1’ and 1’30”. Here is an example:

  • Prompt: Tropical House, Electric Guitar, Bass Guitar, Keyboards, Congas, Feel-Good, Uplifting, Beach Vibes, Danceable, Groovy. 130 BPM. (20” duration)
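If you prefer to assemble prompts in a script before pasting them into the interface, the structure above can be sketched in a few lines of Python. This is only a text-building helper: `build_music_prompt` is a hypothetical name of my own, not part of Stable Audio, and it makes no call to any service.

```python
def build_music_prompt(genres, instruments, moods, bpm):
    """Join tags in the order genre, instrument, mood, then append the BPM.

    Hypothetical helper for illustration only; it just builds the text
    you would type into the prompt box.
    """
    tags = [*genres, *instruments, *moods]
    return f"{', '.join(tags)}. {bpm} BPM."


prompt = build_music_prompt(
    genres=["Tropical House"],
    instruments=["Electric Guitar", "Bass Guitar", "Keyboards", "Congas"],
    moods=["Feel-Good", "Uplifting", "Beach Vibes", "Danceable", "Groovy"],
    bpm=130,
)
print(prompt)
```

Keeping the tags in separate lists makes it easy to swap one genre or mood and compare the results.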

You can also prompt with free-form text that does not follow the structure above:

  • Prompt: Cinematic. In the dimly lit forest, a hauntingly beautiful melody from a solo violin weaves through the silence, as leaves rustle and distant whispers add an eerie sense of mystery to the scene. (1’30” duration)

Music generation: workflow ideas and tips

Let’s start with a simple idea.

  • Prompt: Blues, with horns. (20” duration)

Because part of the dataset contains MIDI instrumentation, some generations can sound like MIDI instruments.

Try appending Live or Band to get more organic sounds. Another option is to simply generate another song with the same prompt, until you get something you like!

  • Prompt: Blues, with horns. Live. (20” duration)

I really love the craziness of this hectic live recording that never happened.

  • Or try: Blues, with horns. Band. (20” duration)

Sometimes, generated songs lack musical direction. Including Solo is an interesting way to bring musicality in.

  • Prompt: Blues, with horns. Band. Horns solo. (20” duration)

We can also control the length of the generation (up to 1’30” with the professional subscription). Allowing longer generations provides more time to express musicality, resulting in more melodic generations:

  • Prompt: Blues, with horns. Band. Horns solo. (45” duration)

And with even longer generations we can get structure: intro, development, and outro.

  • Prompt: Blues, with horns. Band. Horns solo. (1’30” duration)

This last one introduced a melodic idea, and jams around it.

Don’t be shy, and generate a few examples for each prompt! Part of your work as a prompt artist is to curate. Generate some, select the best, and enjoy the process 🙂
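To keep track of what to try while curating, you could list the prompt variants and durations up front. The sketch below simply enumerates the combinations used in this section. `with_modifiers` is a hypothetical helper, and nothing here talks to Stable Audio: you would still paste each prompt into the interface yourself.

```python
from itertools import product


def with_modifiers(base, modifiers):
    """Append each modifier as its own short sentence after the base idea."""
    parts = [base.rstrip(".")] + [m.rstrip(".") for m in modifiers]
    return ". ".join(parts) + "."


base = "Blues, with horns"
modifier_sets = [[], ["Live"], ["Band"], ["Band", "Horns solo"]]
durations_s = [20, 45, 90]  # 20”, 45” and 1’30”

# Every prompt variant paired with every duration: a checklist to work through.
batch = [
    (with_modifiers(base, mods), seconds)
    for mods, seconds in product(modifier_sets, durations_s)
]

for text, seconds in batch:
    print(f"{seconds:>3}s  {text}")
```

Generating a few takes for each entry and keeping the best one is the curation step described above.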

Basic sound effects generation

For sound effects generation, I recommend exploring short durations.

  • Prompt: Glass shatter. (4” duration)
  • Prompt: One shot. Gun. Long reverb. (3” duration)
  • Prompt: Walking on broken glass. (15” duration)
  • Prompt: Walking in long grass. (15” duration)
  • Prompt: Leaves rustling. (10” duration)

Sound effects: another workflow and tips

For sound effects, one might want to generate shorter audio clips:

  • Prompt: Fire crackling. (10” duration)

Because a large portion of the dataset is at low sampling rates, some generations can sound lo-fi. To improve audio fidelity, try adding 44.1kHz or High-quality to the prompt.

  • Prompt: Fire crackling. 44.1kHz. (10” duration)
  • Prompt: Fire crackling. High-quality. (10” duration)

Also consider using stereo to improve the spatial image.

  • Prompt: Motorbike passing by. High-quality. Stereo. (10” duration)
  • Prompt: Sports car passing by. High-quality. Stereo. (4” duration)

Combining audio and music ideas

Prompt with a sound plus musical information, such as a tempo:

  • Prompt: Hammering wood at 120 BPM. (1’30” duration)
  • Prompt: Hammering concrete at 120 BPM. (10” duration)
  • Prompt: Mouse clicking at 130 BPM. (1’30” duration)

The generations above could be an interesting starting point for an urban/industrial music loop.
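If you plan to cut one of these generations into a loop, it helps to know how many bars fit in a given duration. Here is a small arithmetic sketch, assuming 4/4 time and assuming the generation actually holds the requested tempo, which is not guaranteed. The function names are my own.

```python
def bars_in_clip(duration_s, bpm, beats_per_bar=4):
    """Number of bars in a clip, assuming a steady tempo (4/4 by default)."""
    beats = duration_s * bpm / 60
    return beats / beats_per_bar


def loop_duration_s(bars, bpm, beats_per_bar=4):
    """Length in seconds of a loop with a whole number of bars."""
    return bars * beats_per_bar * 60 / bpm


print(bars_in_clip(10, 120))    # 10” at 120 BPM
print(bars_in_clip(90, 130))    # 1’30” at 130 BPM
print(loop_duration_s(8, 120))  # seconds needed for an 8-bar loop at 120 BPM
```

At 120 BPM a 10” generation holds exactly 5 bars, while 1’30” at 130 BPM ends partway through a bar (48.75 bars), so you would trim it to a whole number of bars before looping.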