Configuration

Overview

Amphion can be configured through environment variables, Python code, and YAML configuration files. This guide walks through the available options for each.

Basic Configuration

Environment Variables

    # Set CUDA device
    export AMPHION_DEVICE="cuda:0"

    # Set model directory
    export AMPHION_MODEL_DIR="/path/to/models"

    # Set cache directory
    export AMPHION_CACHE_DIR="/path/to/cache"
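To check what a process actually sees for these variables, they can be read from Python with the standard library. The following is a minimal sketch, not Amphion's own lookup logic: the helper `read_amphion_env` and the fallback values in `FALLBACKS` are assumptions made for illustration, since this guide does not state Amphion's real defaults.

```python
import os

# Hypothetical fallback values; the guide does not state Amphion's real defaults.
FALLBACKS = {
    "AMPHION_DEVICE": "cpu",
    "AMPHION_MODEL_DIR": "./models",
    "AMPHION_CACHE_DIR": "./cache",
}


def read_amphion_env(environ=None):
    """Return the three AMPHION_* settings, using FALLBACKS for unset ones.

    `environ` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if environ is None else environ
    return {
        "device": env.get("AMPHION_DEVICE", FALLBACKS["AMPHION_DEVICE"]),
        "model_dir": env.get("AMPHION_MODEL_DIR", FALLBACKS["AMPHION_MODEL_DIR"]),
        "cache_dir": env.get("AMPHION_CACHE_DIR", FALLBACKS["AMPHION_CACHE_DIR"]),
    }


if __name__ == "__main__":
    # Print whatever the current shell exported (or the fallbacks).
    for key, value in read_amphion_env().items():
        print(f"{key} = {value}")
```

Running this in the same shell where the `export` lines were executed is a quick way to confirm the variables were set as intended.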

Python Configuration

    from amphion.config import Config

    config = Config(
        device="cuda:0",
        model_dir="/path/to/models",
        cache_dir="/path/to/cache",
        sample_rate=44100,
        batch_size=32
    )

Model Configuration

Text-to-Speech

    tts_config = Config(
        model_name="fastspeech2",
        vocoder="hifigan",
        speaker_embedding=True,
        prosody_modeling=True
    )

Voice Conversion

    vc_config = Config(
        model_name="contentvec",
        speaker_embedding="dvec",
        conversion_mode="zero-shot"
    )

Text-to-Audio

    t2a_config = Config(
        model_name="musicgen",
        sampling_rate=44100,
        audio_length=10.0
    )

Advanced Configuration

Custom Model Paths

    config = Config(
        model_paths={
            "tts": "/path/to/tts/model",
            "vocoder": "/path/to/vocoder",
            "speaker_encoder": "/path/to/speaker/encoder"
        }
    )

Performance Tuning

    config = Config(
        batch_size=32,
        num_workers=4,
        pin_memory=True,
        mixed_precision=True
    )

Logging Configuration

    config = Config(
        log_level="INFO",
        log_file="/path/to/log/file.log",
        enable_tensorboard=True
    )
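For readers unfamiliar with what `log_level` and `log_file` usually control, the sketch below shows how equivalent settings map onto Python's standard `logging` module. It is an illustration only: how Amphion wires these options internally is not described in this guide, and `build_logger` and the logger name `amphion_sketch` are hypothetical.

```python
import logging


def build_logger(log_level="INFO", log_file=None, name="amphion_sketch"):
    """Create a logger from a level name and an optional file path.

    Hypothetical helper: it mirrors the log_level / log_file options above
    using only the standard library, and is not Amphion's implementation.
    """
    logger = logging.getLogger(name)
    logger.handlers.clear()            # avoid duplicate handlers on repeated calls
    logger.setLevel(log_level.upper()) # accepts names such as "DEBUG" or "INFO"

    # Write to the given file if one is set, otherwise to the console (stderr).
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger(log_level="INFO")
    log.info("shown at INFO level")
    log.debug("suppressed, because DEBUG is below INFO")
```

With `log_level="INFO"`, messages at INFO and above are emitted and DEBUG messages are dropped, which is the usual trade-off between verbosity and noise.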

Configuration File

You can also use a YAML configuration file:

    # config.yaml
    device: cuda:0
    model_dir: /path/to/models
    cache_dir: /path/to/cache

    tts:
      model_name: fastspeech2
      vocoder: hifigan
      speaker_embedding: true

    voice_conversion:
      model_name: contentvec
      conversion_mode: zero-shot

    logging:
      level: INFO
      file: /path/to/log/file.log

Load the configuration file in Python:

    from amphion.config import Config

    config = Config.from_yaml("config.yaml")
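The YAML file groups options into nested sections (`tts`, `voice_conversion`, `logging`). As a rough illustration of that structure, the sketch below writes out by hand the nested mapping a YAML parser would typically produce for `config.yaml`, and adds a small lookup helper. The helper `get_option` is hypothetical and is not part of Amphion; how `Config.from_yaml` exposes nested values is not covered in this guide.

```python
# The nested mapping a YAML parser would typically return for config.yaml,
# written out by hand so the sketch needs no third-party packages.
parsed = {
    "device": "cuda:0",
    "model_dir": "/path/to/models",
    "cache_dir": "/path/to/cache",
    "tts": {
        "model_name": "fastspeech2",
        "vocoder": "hifigan",
        "speaker_embedding": True,
    },
    "voice_conversion": {
        "model_name": "contentvec",
        "conversion_mode": "zero-shot",
    },
    "logging": {
        "level": "INFO",
        "file": "/path/to/log/file.log",
    },
}


def get_option(cfg, dotted_key, default=None):
    """Look up a nested value by a dotted path such as "tts.vocoder".

    Hypothetical helper for illustration; returns `default` when any
    segment of the path is missing.
    """
    node = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


if __name__ == "__main__":
    print(get_option(parsed, "tts.vocoder"))
    print(get_option(parsed, "logging.level"))
    print(get_option(parsed, "tts.sample_rate", 44100))  # missing key, falls back
```

Keeping model-specific options under their own section, as the file above does, lets top-level settings such as `device` be shared while each task keeps its own `model_name`.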