Amphion Text-to-Audio (TTA)
Overview
Amphion's Text-to-Audio system generates realistic audio from textual descriptions. It can create various sounds, from simple effects to complex soundscapes.
Features
- Natural sound generation
- Music generation
- Sound effect synthesis
- Conditional generation
- Length control
Quick Start
We provide a beginner recipe that demonstrates how to train a cutting-edge TTA model. The model is a latent diffusion model, in the style of AudioLDM, Make-an-Audio, and AUDIT.
Model Architecture
So far, Amphion supports one latent-diffusion-based text-to-audio model.
Similar to AUDIT, it is trained in two stages:
- Training the VAE, which is called AutoencoderKL in Amphion
- Training the conditional latent diffusion model, which is called AudioLDM in Amphion
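To make the second stage more concrete, here is a minimal sketch of the standard DDPM forward (noising) process applied to a latent vector. A latent diffusion model is typically trained to predict the added noise from the noised latent, the timestep, and the text condition. This is a generic illustration, not Amphion's implementation: the function names, the linear beta schedule, and the 16-dimensional stand-in latent are all assumptions.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(z0, t, alpha_bars, rng):
    """Forward process: z_t = sqrt(a_bar_t) * z0 + sqrt(1 - a_bar_t) * eps."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [signal_scale * z + noise_scale * e for z, e in zip(z0, eps)]
    return zt, eps

rng = random.Random(0)
# Stand-in for a flattened latent produced by the stage-1 VAE encoder (hypothetical).
z0 = [rng.gauss(0.0, 1.0) for _ in range(16)]
alpha_bars = make_alpha_bars()
# Noise the latent at timestep 500; `eps` is what the stage-2 model would learn to predict.
zt, eps = add_noise(z0, t=500, alpha_bars=alpha_bars, rng=rng)
```

Because the diffusion model only ever sees latents, the VAE has to be trained first; the second stage then reuses its frozen encoder and decoder.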
Basic Usage
    from amphion import TextToAudio

    # Initialize the generator
    generator = TextToAudio()

    # Generate audio from a text description
    audio = generator.generate(
        "A calm forest ambience with birds chirping"
    )
    audio.save("forest.wav")
Advanced Features
Conditional Generation
    # Generate with style control
    audio = generator.generate(
        text="A melodic piano piece",
        style="classical",
        tempo=120
    )

    # Generate with multiple conditions
    audio = generator.generate(
        text="Electronic dance music",
        style="edm",
        bpm=128,
        key="C minor"
    )
Length and Quality Control
    # Control generation length
    audio = generator.generate(
        text="Background music",
        duration=30.0,
        sample_rate=44100
    )

    # Control quality settings
    audio = generator.generate(
        text="High-quality orchestral music",
        quality="high",
        enhance=True
    )
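The duration and sample rate together determine how much audio data a request produces, which is worth checking before asking for long clips. The sketch below is plain arithmetic and does not call Amphion; the helper names and the 16-bit PCM assumption are illustrative.

```python
def num_samples(duration_s: float, sample_rate: int) -> int:
    """Number of samples per channel for a clip of the given length."""
    return round(duration_s * sample_rate)

def pcm_bytes(duration_s: float, sample_rate: int,
              channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Uncompressed size in bytes, assuming 16-bit PCM by default."""
    return num_samples(duration_s, sample_rate) * channels * bytes_per_sample

# A 30-second clip at 44.1 kHz, as in the example above.
samples = num_samples(30.0, 44100)               # 1,323,000 samples per channel
stereo_size = pcm_bytes(30.0, 44100, channels=2)  # 5,292,000 bytes (about 5.3 MB)
```

Halving the sample rate or switching from stereo to mono halves the output size, which is often the easiest way to keep long generations manageable.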
Configuration
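    from amphion import TextToAudio
    from amphion.config import Config

    # Describe the model and output format
    config = Config(
        model_name="audioldm",
        sample_rate=44100,
        channels=2,
        audio_length=10.0
    )

    # Build a generator from the configuration
    generator = TextToAudio(config)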
Best Practices
- Text Descriptions
  - Be specific and detailed
  - Use clear language
  - Include key audio characteristics
- Quality Control
  - Set an appropriate sample rate
  - Consider stereo vs. mono
  - Monitor generation length
- Performance
  - Use batch processing
  - Use GPU acceleration
  - Manage memory usage
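As one way to apply the batch-processing advice, prompts can be grouped into fixed-size chunks so that each chunk is handled in a single pass and peak memory stays bounded. The helper below is a generic sketch in plain Python; `batched` and the batch size of 2 are illustrative choices, and how a batch is actually passed to the model depends on the generation interface in use.

```python
from typing import Iterator, List, Sequence

def batched(prompts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` prompts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        yield list(prompts[start:start + batch_size])

prompts = [
    "A calm forest ambience with birds chirping",
    "Rain falling on a tin roof",
    "A busy city street at rush hour",
    "Waves breaking on a pebble beach",
    "A crackling campfire at night",
]

# Five prompts with a batch size of 2 give chunks of 2, 2, and 1.
batches = list(batched(prompts, batch_size=2))
```

A smaller batch size lowers peak memory at the cost of more passes, so it is usually tuned to the available GPU memory.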