Vocoders
Overview
Neural vocoders generate audible waveforms from acoustic representations such as mel-spectrograms, and they are a key component of current audio generation systems. Amphion supports a range of widely used vocoders, grouped below by architecture:
GAN-based Vocoders
- MelGAN: Fast and lightweight vocoder using generative adversarial networks
- HiFi-GAN: High-fidelity speech synthesis with adversarial learning
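- NSF-HiFiGAN: HiFi-GAN combined with a neural source-filter (NSF) excitation driven by the fundamental frequency (F0)
- BigVGAN: Universal vocoder that extends the HiFi-GAN design with periodic (Snake) activations and anti-aliased upsampling, improving quality on unseen speakers and recording conditions
- APNet: All-frame-level vocoder that directly predicts amplitude and phase spectra and reconstructs the waveform with an inverse STFT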
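Most of the GAN vocoders above are trained with an adversarial loss plus auxiliary reconstruction terms. As a rough illustration, the NumPy sketch below writes out the least-squares GAN objectives described in the HiFi-GAN paper, with feature-matching and mel-spectrogram L1 terms weighted 2 and 45. Discriminator scores and feature maps are plain arrays here, and all function names are hypothetical rather than Amphion's; MelGAN's original paper uses a hinge loss instead, and other models add their own terms.

```python
import numpy as np

# Least-squares GAN losses in the style of HiFi-GAN (hypothetical helper names).
# Each list element is the output of one discriminator (or one feature map).

def discriminator_loss(real_scores, fake_scores):
    """Push D(x) toward 1 for real audio and D(G(s)) toward 0 for generated audio."""
    return sum(np.mean((1.0 - r) ** 2) + np.mean(f ** 2)
               for r, f in zip(real_scores, fake_scores))

def generator_adv_loss(fake_scores):
    """Push D(G(s)) toward 1 so the discriminators accept generated audio."""
    return sum(np.mean((1.0 - f) ** 2) for f in fake_scores)

def feature_matching_loss(real_feats, fake_feats):
    """Mean L1 distance between intermediate discriminator features, summed over layers."""
    return sum(np.mean(np.abs(r - f)) for r, f in zip(real_feats, fake_feats))

def mel_l1_loss(real_mel, fake_mel):
    """L1 distance between mel-spectrograms of real and generated audio."""
    return np.mean(np.abs(real_mel - fake_mel))

def generator_total_loss(fake_scores, real_feats, fake_feats, real_mel, fake_mel,
                         lambda_fm=2.0, lambda_mel=45.0):
    # Weights follow the values reported in the HiFi-GAN paper.
    return (generator_adv_loss(fake_scores)
            + lambda_fm * feature_matching_loss(real_feats, fake_feats)
            + lambda_mel * mel_l1_loss(real_mel, fake_mel))
```

In a real training loop these are computed on framework tensors, and the discriminator and generator losses are minimized in alternating optimizer steps.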
Flow-based Vocoders
- WaveGlow: Flow-based network that generates high-quality speech
- More flow-based models coming soon...
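Flow-based vocoders such as WaveGlow are built from invertible layers, so the same network maps audio to Gaussian noise for likelihood training and maps noise back to audio at synthesis time. The sketch below shows a toy affine coupling layer, the core invertible block of Glow-style flows. The fixed random linear map standing in for the learned network, the class name, and the omission of mel conditioning and WaveGlow's invertible 1x1 convolutions are all simplifying assumptions.

```python
import numpy as np

class AffineCoupling:
    """Toy affine coupling layer (hypothetical; no mel conditioning)."""

    def __init__(self, channels, seed=0):
        assert channels % 2 == 0, "channels must split evenly in half"
        half = channels // 2
        rng = np.random.default_rng(seed)
        # Stand-in for the learned network: maps the first half of the
        # channels to (log_s, t) for the second half.
        self.w = rng.normal(scale=0.1, size=(2 * half, half))

    def _params(self, xa):
        h = self.w @ xa                      # (2 * half, time)
        log_s, t = np.split(h, 2, axis=0)    # each (half, time)
        return log_s, t

    def forward(self, x):
        """x: (channels, time) -> (z, log|det J|) for the flow likelihood."""
        xa, xb = np.split(x, 2, axis=0)
        log_s, t = self._params(xa)
        zb = np.exp(log_s) * xb + t          # xa passes through unchanged
        return np.concatenate([xa, zb], axis=0), np.sum(log_s)

    def inverse(self, z):
        """Exact inverse used at synthesis time."""
        za, zb = np.split(z, 2, axis=0)
        log_s, t = self._params(za)          # za == xa, so parameters match
        xb = (zb - t) * np.exp(-log_s)
        return np.concatenate([za, xb], axis=0)
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift, which is what makes the layer exactly invertible regardless of how complex the inner network is.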
Diffusion-based Vocoders
- DiffWave: High-quality vocoder based on denoising diffusion probabilistic models
- More diffusion-based models in development...
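Diffusion vocoders such as DiffWave learn to reverse a fixed noising process. The NumPy sketch below shows the DDPM-style forward process and a single reverse step; the noise predictor is omitted, so `eps_hat` stands in for the output of the mel-conditioned network. The 50-step linear schedule from 1e-4 to 0.05 roughly follows the DiffWave paper, but treat the values and function names as illustrative assumptions, not Amphion's configuration.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule; alpha_bars[t] is the cumulative product of alphas."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def p_step(x_t, t, eps_hat, betas, alphas, alpha_bars, rng):
    """One ancestral reverse step x_t -> x_{t-1} given predicted noise eps_hat."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean                           # no noise added at the last step
    var = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
    return mean + np.sqrt(var) * rng.standard_normal(x_t.shape)
```

At synthesis time one would start from standard Gaussian noise and call `p_step` for t = T-1 down to 0, feeding in the trained network's noise prediction at each step.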
Auto-regressive Vocoders
- WaveNet: Deep generative model for raw audio waveforms
- WaveRNN: Efficient neural autoregressive vocoder
- Additional models under development...
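Autoregressive vocoders predict one sample at a time, and WaveNet frames this as a classification over quantized amplitudes using 8-bit mu-law companding (mu = 255). Many autoregressive vocoder implementations use a similar scheme, though the exact bit depth varies. The sketch below shows the encode/decode pair in plain Python; the function names are hypothetical.

```python
import math

MU = 255  # 8-bit mu-law: 256 classes, indices 0..255

def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))                      # clip to the valid range
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((y + 1.0) / 2.0 * mu + 0.5)          # round to the nearest class

def mu_law_decode(q, mu=MU):
    """Map an integer class in [0, mu] back to a sample in [-1, 1]."""
    y = 2.0 * q / mu - 1.0
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)
```

The logarithmic companding spends more of the 256 classes on quiet amplitudes, where human hearing is most sensitive, so quantization error is smallest near zero.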
Usage Example
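```python
from amphion.vocoders import HiFiGAN

# Initialize the vocoder from a pretrained checkpoint
vocoder = HiFiGAN(
    checkpoint="path/to/checkpoint",
    device="cuda",
)

# Generate a waveform from a mel-spectrogram.
# mel_spectrogram must be computed beforehand with the same feature settings
# (sample rate, hop length, number of mel bins) used to train the checkpoint.
waveform = vocoder.generate(mel_spectrogram)
```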
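The usage example expects a precomputed `mel_spectrogram`. A minimal NumPy sketch of a log-mel front end is below, assuming a mono float waveform, a periodic Hann window, centered frames, and an HTK-style mel scale. These settings are assumptions: official HiFi-GAN recipes, for example, use librosa's Slaney-style filterbank and different padding, so features from this sketch will not exactly match a pretrained checkpoint. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters on the HTK mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fmax = fmax if fmax is not None else sr / 2.0
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wav, sr=44100, n_fft=1024, hop_length=256, n_mels=80):
    """Return a (n_mels, frames) log-mel matrix from a mono float waveform."""
    pad = n_fft // 2
    wav = np.pad(wav, (pad, pad), mode="reflect")       # center the frames
    n_frames = 1 + (len(wav) - n_fft) // hop_length
    window = np.hanning(n_fft + 1)[:-1]                  # periodic Hann window
    frames = np.stack([wav[i * hop_length:i * hop_length + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))            # (frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ mag.T      # (n_mels, frames)
    return np.log(np.maximum(mel, 1e-5))                 # clamp before the log
```

With centered frames, a clip of N samples yields 1 + N // hop_length frames, which is why the hop length here must match the vocoder's total upsampling factor.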
Model Configuration
```python
from amphion.config import Config

config = Config(
    model_type="hifigan",
    sample_rate=44100,
    # Must equal the product of upsample_rates (8 * 8 * 2 * 2 = 256)
    hop_length=256,
    # Model-specific configurations
    upsample_rates=[8, 8, 2, 2],
    upsample_kernel_sizes=[16, 16, 4, 4],
    upsample_initial_channel=512,
    resblock_kernel_sizes=[3, 7, 11],
    resblock_dilation_sizes=[[1, 3, 5], [1, 3, 5], [1, 3, 5]],
)
```
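In a HiFi-GAN-style generator, each mel frame must expand to exactly `hop_length` samples, so the upsampling rates have to multiply to the hop length. The plain-Python sketch below checks that constraint and reports the per-stage channel counts, assuming the original HiFi-GAN layout, in which each transposed convolution halves the channel count and uses padding `(k - u) // 2`. `describe_generator` is a hypothetical helper, not part of Amphion's API.

```python
import math

def describe_generator(hop_length, upsample_rates, upsample_kernel_sizes,
                       upsample_initial_channel):
    """Validate a HiFi-GAN-style upsampling config; return per-stage channels."""
    if len(upsample_rates) != len(upsample_kernel_sizes):
        raise ValueError("need one kernel size per upsampling stage")
    # Each mel frame must expand to exactly hop_length waveform samples.
    total = math.prod(upsample_rates)
    if total != hop_length:
        raise ValueError(f"prod(upsample_rates) = {total}, expected {hop_length}")
    channels = []
    for i, (u, k) in enumerate(zip(upsample_rates, upsample_kernel_sizes)):
        # With padding (k - u) // 2, a transposed conv of stride u and kernel k
        # maps L frames to exactly L * u samples only if k >= u and k - u is even.
        if k < u or (k - u) % 2:
            raise ValueError(f"stage {i}: kernel {k} incompatible with stride {u}")
        # Channels halve at every upsampling stage.
        channels.append(upsample_initial_channel // 2 ** (i + 1))
    return channels
```

For the configuration above, `describe_generator(256, [8, 8, 2, 2], [16, 16, 4, 4], 512)` returns `[256, 128, 64, 32]`.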
Training Your Own Vocoder
```python
from amphion.vocoders import VocoderTrainer

# Reuses the `config` object from the Model Configuration example
trainer = VocoderTrainer(
    model_type="hifigan",
    config=config,
    training_data="path/to/data",
    validation_data="path/to/val_data",
)

trainer.train(
    epochs=1000,
    batch_size=16,
    save_dir="path/to/save",
)
```
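GAN vocoders are usually trained on short random excerpts rather than full utterances (HiFi-GAN's published configs use 8192-sample segments), with the mel frames cropped at the matching position. The NumPy sketch below shows one way to draw such an aligned pair, assuming mel frame i covers samples [i * hop_length, (i + 1) * hop_length). `random_segment` and `segment_size` are hypothetical and are not parameters of `VocoderTrainer`.

```python
import numpy as np

def random_segment(wav, mel, segment_size, hop_length, rng):
    """Crop an aligned (waveform, mel) training pair.

    wav: (num_samples,) array; mel: (n_mels, num_frames) array.
    """
    if segment_size % hop_length:
        raise ValueError("segment_size must be a multiple of hop_length")
    seg_frames = segment_size // hop_length
    # Only use frames whose full hop of audio is present in wav.
    max_frames = min(mel.shape[1], len(wav) // hop_length)
    if max_frames < seg_frames:
        raise ValueError("utterance is shorter than segment_size")
    start = int(rng.integers(0, max_frames - seg_frames + 1))  # upper bound exclusive
    mel_seg = mel[:, start:start + seg_frames]
    wav_seg = wav[start * hop_length:(start + seg_frames) * hop_length]
    return wav_seg, mel_seg
```

Cropping in whole frames keeps the generator's output length (frames × hop_length) identical to the target audio length, so losses can be computed sample by sample.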