Vocoders

Overview

Neural vocoders generate audible waveforms from acoustic representations such as mel-spectrograms, which makes them a key component of current audio generation systems. Amphion supports several widely used vocoders:

GAN-based Vocoders

MelGAN: Fast and lightweight vocoder using generative adversarial networks
HiFi-GAN: High-fidelity speech synthesis with adversarial learning
NSF-HiFiGAN: Neural source-filter model combined with HiFi-GAN
BigVGAN: Universal GAN vocoder with periodic activations and anti-aliased upsampling, trained at large scale
APNet: All-frame-level vocoder that directly predicts amplitude and phase spectra

Flow-based Vocoders

WaveGlow: Flow-based network capable of generating high-quality speech
More flow-based models coming soon...

Diffusion-based Vocoders

DiffWave: High-quality vocoder using diffusion probabilistic models
More diffusion-based models in development...

Auto-regressive Vocoders

WaveNet: Deep generative model for raw audio waveforms
WaveRNN: Efficient neural autoregressive vocoder
Additional models under development...

Usage Example

from amphion.vocoders import HiFiGAN

# Initialize vocoder
vocoder = HiFiGAN(
    checkpoint="path/to/checkpoint",
    device="cuda"
)

# Generate waveform from mel-spectrogram
# (mel_spectrogram is assumed to have been computed already, e.g. by an upstream acoustic model)
waveform = vocoder.generate(mel_spectrogram)
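
As a rough sanity check on the generated audio, a mel-conditioned vocoder of this kind typically produces about one hop of samples per input frame. The sketch below assumes that relation and a hop length of 256 samples (the value used in the configuration on this page). `expected_num_samples` and `duration_seconds` are hypothetical helpers for illustration, not part of Amphion.

```python
def expected_num_samples(num_frames: int, hop_length: int = 256) -> int:
    """Approximate waveform length for a mel-spectrogram with `num_frames` frames.

    Assumes the vocoder emits `hop_length` samples per frame; real models may
    differ by a few samples because of padding.
    """
    if num_frames < 0 or hop_length <= 0:
        raise ValueError("num_frames must be >= 0 and hop_length must be > 0")
    return num_frames * hop_length


def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Convert a sample count to a duration in seconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be > 0")
    return num_samples / sample_rate


if __name__ == "__main__":
    samples = expected_num_samples(num_frames=100)
    print(samples, "samples,", duration_seconds(samples, 44100), "seconds")
```

Comparing this estimate against the length of the returned waveform is a quick way to spot a mismatch between the hop length used for feature extraction and the one the checkpoint was trained with.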

Model Configuration

from amphion.config import Config

config = Config(
    model_type="hifigan",
    sample_rate=44100,
    hop_length=256,
    # Model-specific configurations
    # (the product of upsample_rates, 8 * 8 * 2 * 2 = 256, matches hop_length)
    upsample_rates=[8, 8, 2, 2],
    upsample_kernel_sizes=[16, 16, 4, 4],
    upsample_initial_channel=512,
    resblock_kernel_sizes=[3, 7, 11],
    resblock_dilation_sizes=[[1, 3, 5], [1, 3, 5], [1, 3, 5]]
)
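
In HiFi-GAN-style generators, the upsampling layers together expand each input frame into `hop_length` output samples, so the product of `upsample_rates` should equal `hop_length`. The following is a minimal sketch of that consistency check, using the values from the configuration above. `check_upsampling` is a hypothetical helper, not an Amphion function.

```python
from math import prod


def check_upsampling(hop_length, upsample_rates, upsample_kernel_sizes):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    # Each upsampling layer needs its own kernel size.
    if len(upsample_rates) != len(upsample_kernel_sizes):
        problems.append(
            f"{len(upsample_rates)} upsample rates but "
            f"{len(upsample_kernel_sizes)} kernel sizes"
        )
    # The total upsampling factor must turn one frame into hop_length samples.
    total = prod(upsample_rates)
    if total != hop_length:
        problems.append(
            f"product of upsample_rates is {total}, expected hop_length={hop_length}"
        )
    return problems


if __name__ == "__main__":
    issues = check_upsampling(
        hop_length=256,
        upsample_rates=[8, 8, 2, 2],
        upsample_kernel_sizes=[16, 16, 4, 4],
    )
    print(issues if issues else "upsampling configuration is consistent")
```

Running a check like this before training catches the common mistake of changing `hop_length` without adjusting the upsampling factors.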

Training Your Own Vocoder

from amphion.vocoders import VocoderTrainer

trainer = VocoderTrainer(
    model_type="hifigan",
    config=config,
    training_data="path/to/data",
    validation_data="path/to/val_data"
)

trainer.train(
    epochs=1000,
    batch_size=16,
    save_dir="path/to/save"
)
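
To get a feel for how long such a run is, the number of training iterations can be estimated from the dataset size. This is a small sketch that assumes one full pass over the data per epoch; the dataset size of 1000 items is a made-up value for illustration, and `total_iterations` is a hypothetical helper, not part of Amphion.

```python
import math


def total_iterations(num_items, batch_size, epochs, drop_last=False):
    """Estimate the number of training iterations for a run.

    Assumes every epoch iterates once over all `num_items` training items.
    With drop_last=True the final incomplete batch of each epoch is skipped.
    """
    if num_items < 0 or batch_size <= 0 or epochs < 0:
        raise ValueError("invalid num_items, batch_size, or epochs")
    if drop_last:
        per_epoch = num_items // batch_size
    else:
        per_epoch = math.ceil(num_items / batch_size)
    return per_epoch * epochs


if __name__ == "__main__":
    # Hypothetical dataset of 1000 items with the settings used above.
    print(total_iterations(num_items=1000, batch_size=16, epochs=1000))
```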
