Configuration

Overview

Amphion can be configured through environment variables, Python code, and YAML configuration files. This guide covers the options available for each.

Basic Configuration

Environment Variables

# Set CUDA device
export AMPHION_DEVICE="cuda:0"

# Set model directory
export AMPHION_MODEL_DIR="/path/to/models"

# Set cache directory
export AMPHION_CACHE_DIR="/path/to/cache"
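To check what a Python process will actually see, the variables above can be read with the standard library. This is only a minimal sketch: the helper name `read_amphion_env` and the fallback values are illustrative assumptions, not documented Amphion defaults.

```python
import os

def read_amphion_env(environ=None):
    """Collect the AMPHION_* variables described above into a dict.

    The fallback values are placeholders chosen for this sketch only.
    """
    env = os.environ if environ is None else environ
    return {
        "device": env.get("AMPHION_DEVICE", "cpu"),
        "model_dir": env.get("AMPHION_MODEL_DIR", "./models"),
        "cache_dir": env.get("AMPHION_CACHE_DIR", "./cache"),
    }

# Pass a plain dict to try it without touching the real environment.
settings = read_amphion_env({"AMPHION_DEVICE": "cuda:0"})
print(settings["device"])     # cuda:0
print(settings["model_dir"])  # ./models (fallback, variable not set)
```

Calling `read_amphion_env()` with no argument reads from `os.environ` instead.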

Python Configuration

from amphion.config import Config

config = Config(
    device="cuda:0",
    model_dir="/path/to/models",
    cache_dir="/path/to/cache",
    sample_rate=44100,
    batch_size=32
)

Model Configuration

Text-to-Speech

from amphion.config import Config

tts_config = Config(
    model_name="fastspeech2",
    vocoder="hifigan",
    speaker_embedding=True,
    prosody_modeling=True
)

Voice Conversion

from amphion.config import Config

vc_config = Config(
    model_name="contentvec",
    speaker_embedding="dvec",
    conversion_mode="zero-shot"
)

Text-to-Audio

from amphion.config import Config

t2a_config = Config(
    model_name="musicgen",
    sampling_rate=44100,
    audio_length=10.0
)
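As a quick sanity check on the values above: assuming `sampling_rate` is in Hz and `audio_length` is in seconds (the units are not stated here), the number of samples in one generated clip is their product. The helper `num_samples` is hypothetical and not part of Amphion.

```python
def num_samples(sampling_rate: int, audio_length: float) -> int:
    """Samples per channel for one clip, assuming Hz and seconds."""
    return int(round(sampling_rate * audio_length))

print(num_samples(44100, 10.0))  # 441000
```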

Advanced Configuration

Custom Model Paths

from amphion.config import Config

config = Config(
    model_paths={
        "tts": "/path/to/tts/model",
        "vocoder": "/path/to/vocoder",
        "speaker_encoder": "/path/to/speaker/encoder"
    }
)
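Since the paths above are placeholders, it can help to confirm that each one exists before building the configuration. The following is a small standard-library sketch; `missing_model_paths` is a hypothetical helper, and whether `Config` performs a similar check itself is not documented here.

```python
from pathlib import Path

def missing_model_paths(model_paths):
    """Return the keys whose path does not exist on disk."""
    return [name for name, path in model_paths.items()
            if not Path(path).exists()]

model_paths = {
    "tts": "/path/to/tts/model",
    "vocoder": "/path/to/vocoder",
    "speaker_encoder": "/path/to/speaker/encoder",
}

missing = missing_model_paths(model_paths)
if missing:
    print("Missing model paths:", ", ".join(missing))
```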

Performance Tuning

from amphion.config import Config

config = Config(
    batch_size=32,
    num_workers=4,
    pin_memory=True,
    mixed_precision=True
)
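To get a feel for what `batch_size` implies, the sketch below computes how many batches a dataset splits into and how large the last one is. It assumes the common convention that a final partial batch is kept rather than dropped. The dataset size of 1000 is made up for illustration, and `batch_layout` is a hypothetical helper.

```python
import math

def batch_layout(num_items: int, batch_size: int):
    """Return (number of batches, size of the last batch)."""
    if num_items == 0:
        return 0, 0
    num_batches = math.ceil(num_items / batch_size)
    last_batch = num_items - (num_batches - 1) * batch_size
    return num_batches, last_batch

print(batch_layout(1000, 32))  # (32, 8)
```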

Logging Configuration

from amphion.config import Config

config = Config(
    log_level="INFO",
    log_file="/path/to/log/file.log",
    enable_tensorboard=True
)
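For comparison, here is roughly what `log_level` and `log_file` would correspond to in Python's standard `logging` module. This is a sketch under that assumption, not a description of how Amphion wires up logging internally. The `make_logger` helper and the logger name are hypothetical.

```python
import logging

def make_logger(log_level="INFO", log_file=None, name="amphion_example"):
    """Build a logger from a level name and an optional log file path."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, log_level.upper()))
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger

logger = make_logger("INFO")
logger.info("configuration loaded")        # emitted
logger.debug("suppressed at INFO level")   # not emitted
```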

Configuration File

You can also use a YAML configuration file:

# config.yaml
device: cuda:0
model_dir: /path/to/models
cache_dir: /path/to/cache

tts:
  model_name: fastspeech2
  vocoder: hifigan
  speaker_embedding: true

voice_conversion:
  model_name: contentvec
  conversion_mode: zero-shot

logging:
  level: INFO
  file: /path/to/log/file.log
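For orientation, a standard YAML parser reads the file above into nested mappings. The sketch below writes that structure out as a plain Python dictionary and adds a hypothetical helper, `lookup`, for dotted keys. How `Config` itself exposes the nested sections is not specified here.

```python
# The dictionary a standard YAML parser would produce for config.yaml.
parsed = {
    "device": "cuda:0",
    "model_dir": "/path/to/models",
    "cache_dir": "/path/to/cache",
    "tts": {
        "model_name": "fastspeech2",
        "vocoder": "hifigan",
        "speaker_embedding": True,
    },
    "voice_conversion": {
        "model_name": "contentvec",
        "conversion_mode": "zero-shot",
    },
    "logging": {
        "level": "INFO",
        "file": "/path/to/log/file.log",
    },
}

def lookup(mapping, dotted_key):
    """Walk nested dicts using a key such as 'tts.vocoder'."""
    value = mapping
    for part in dotted_key.split("."):
        value = value[part]
    return value

print(lookup(parsed, "tts.vocoder"))    # hifigan
print(lookup(parsed, "logging.level"))  # INFO
```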

Load the configuration file in Python:

from amphion.config import Config

config = Config.from_yaml("config.yaml")