Meta has open sourced its text-to-music generative AI, AudioCraft, for researchers and practitioners to train their own models and help advance the state of the art.
AudioCraft consists of three models: MusicGen, which generates music from textual prompts; AudioGen, which generates environmental sounds; and EnCodec, an AI-powered encoder/quantizer/decoder.
Today, we’re excited to release an improved version of our EnCodec decoder, which allows for higher quality music generation with fewer artifacts; our pre-trained AudioGen model, which lets you generate environmental sounds and sound effects like a dog barking, cars honking, or footsteps on a wooden floor; and all of the AudioCraft model weights and code.
According to Meta, AudioCraft is able to generate high-quality audio using a natural interface. Furthermore, they say, it simplifies the state-of-the-art design in the audio generation field through a novel approach.
In particular, they explain, AudioCraft uses the EnCodec neural audio codec to learn audio tokens from the raw signal. This step builds a fixed "vocabulary" of audio tokens, which are then fed to an autoregressive language model. That language model is trained to exploit the tokens' internal structure and capture their long-term dependencies, which is critical for music generation. Finally, the trained model generates new tokens from a textual description, and these are fed back to EnCodec's decoder to synthesize sounds and music.
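The tokenize, model, generate, decode loop described above can be illustrated with a toy sketch. This is not AudioCraft's code or API: uniform quantization stands in for the EnCodec encoder and decoder, a bigram counter stands in for the autoregressive language model, and text conditioning is left out entirely. All function names and the vocabulary size are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

LEVELS = 16  # size of the toy token "vocabulary" (assumed value)

def encode(signal, levels=LEVELS):
    """Quantize samples in [-1, 1] into integer tokens (toy stand-in for a codec encoder)."""
    tokens = []
    for x in signal:
        x = max(-1.0, min(1.0, x))  # clip to the valid range
        tokens.append(min(levels - 1, int((x + 1.0) / 2.0 * levels)))
    return tokens

def decode(tokens, levels=LEVELS):
    """Map each token back to the centre of its quantization bin (toy stand-in for a decoder)."""
    return [(t + 0.5) / levels * 2.0 - 1.0 for t in tokens]

def train_bigram(tokens):
    """Count token-to-token transitions: a tiny autoregressive model over tokens."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(model, start, length, seed=0):
    """Sample tokens one at a time, each conditioned on the previous token."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        options = model.get(out[-1])
        if not options:  # no observed continuation for this token
            break
        candidates = list(options.keys())
        weights = list(options.values())
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return out

# "Raw audio": five periods of a sine wave, 200 samples.
signal = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]

tokens = encode(signal)                       # 1. signal -> discrete tokens
lm = train_bigram(tokens)                     # 2. fit a model over the tokens
new_tokens = generate(lm, tokens[0], 50)      # 3. sample a new token sequence
audio = decode(new_tokens)                    # 4. tokens -> waveform samples
```

In the real system each stage is a neural network and the generation step is conditioned on the text prompt; the sketch only shows how the stages hand data to one another.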
Generating high-fidelity audio of any kind requires modeling complex signals and patterns at varying scales. Music is arguably the most challenging type of audio to generate because it’s composed of local and long-range patterns, from a suite of notes to a global musical structure with multiple instruments.
As mentioned, AudioCraft is open source, which Meta hopes can help the research community to further build on it:
Having a solid open source foundation will foster innovation and complement the way we produce and listen to audio and music in the future: think rich bedtime story readings with sound effects and epic music. With even more controls, we think MusicGen can turn into a new type of instrument — just like synthesizers when they first appeared.
While most of AudioCraft is open source, the license Meta chose for the model weights, CC-BY-NC, is restrictive enough that the weights do not qualify as fully open source, as Hacker News commenters pointed out.
Specifically, the non-commercial use clause conflicts with point six of the Open Source Initiative's Open Source Definition, which rules out discrimination against fields of endeavor. The restriction is likely explained by the fact that Meta used Meta-owned and specifically licensed music to train the models. The rest of the components are instead released under the MIT license.