Vocoders

Overview

Neural vocoders generate audible waveforms from acoustic representations such as mel-spectrograms, which makes them a key component of current audio generation systems. Amphion supports several widely used vocoders:

GAN-based Vocoders

  • MelGAN: Fast and lightweight vocoder using generative adversarial networks
  • HiFi-GAN: High-fidelity speech synthesis with adversarial learning
  • NSF-HiFiGAN: Neural source-filter model combined with HiFi-GAN
  • BigVGAN: Universal vocoder trained at large scale, using periodic activations and anti-aliased representations
  • APNet: All-frame-level vocoder that directly predicts amplitude and phase spectra

Flow-based Vocoders

  • WaveGlow: Flow-based network capable of generating high-quality speech
  • More flow-based models coming soon...

Diffusion-based Vocoders

  • DiffWave: High-quality vocoder using diffusion probabilistic models
  • More diffusion-based models in development...

Autoregressive Vocoders

  • WaveNet: Deep generative model for raw audio waveforms
  • WaveRNN: Efficient neural autoregressive vocoder
  • Additional models under development...

Usage Example

    from amphion.vocoders import HiFiGAN

    # Initialize vocoder
    vocoder = HiFiGAN(
        checkpoint="path/to/checkpoint",
        device="cuda"
    )

    # Generate waveform from mel-spectrogram
    waveform = vocoder.generate(mel_spectrogram)
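The size of the generated waveform can be estimated from the number of mel frames before calling `generate`. The following is a rough sketch, assuming the vocoder emits exactly `hop_length` samples per mel frame; this is the usual behaviour of upsampling vocoders such as HiFi-GAN, but it is not confirmed here for Amphion's implementation. The helper names and the frame count are hypothetical, while `hop_length=256` and `sample_rate=44100` are taken from the configuration example in this document.

```python
def expected_waveform_length(num_frames: int, hop_length: int) -> int:
    """Samples produced if every mel frame expands to hop_length samples."""
    if num_frames < 0 or hop_length <= 0:
        raise ValueError("num_frames must be >= 0 and hop_length must be > 0")
    return num_frames * hop_length


def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Playback duration of num_samples at the given sample rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be > 0")
    return num_samples / sample_rate


# Hypothetical input: a mel-spectrogram with 400 frames.
num_frames = 400
num_samples = expected_waveform_length(num_frames, hop_length=256)
print(num_samples)  # 102400
print(round(duration_seconds(num_samples, sample_rate=44100), 3))  # 2.322
```

A mismatch between this estimate and the actual output length usually points to a hop length that differs between feature extraction and the vocoder checkpoint.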

Model Configuration

    from amphion.config import Config

    config = Config(
        model_type="hifigan",
        sample_rate=44100,
        hop_length=256,
        # Model specific configurations
        upsample_rates=[8, 8, 2, 2],
        upsample_kernel_sizes=[16, 16, 4, 4],
        upsample_initial_channel=512,
        resblock_kernel_sizes=[3, 7, 11],
        resblock_dilation_sizes=[[1, 3, 5], [1, 3, 5], [1, 3, 5]]
    )
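In HiFi-GAN-style generators, the transposed-convolution stages upsample each mel frame by the product of `upsample_rates`, so that product should equal `hop_length` (here 8 × 8 × 2 × 2 = 256). A minimal sketch of such a consistency check follows. It works on a plain dictionary that mirrors the values above rather than on Amphion's `Config` object, and the function name `check_hifigan_config` is hypothetical.

```python
import math
from typing import Dict, List


def check_hifigan_config(cfg: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []

    # Total upsampling factor of the generator must match the hop length.
    factor = math.prod(cfg["upsample_rates"])
    if factor != cfg["hop_length"]:
        problems.append(
            f"product of upsample_rates is {factor}, "
            f"but hop_length is {cfg['hop_length']}"
        )

    # One kernel size is needed per upsampling stage.
    if len(cfg["upsample_kernel_sizes"]) != len(cfg["upsample_rates"]):
        problems.append("upsample_kernel_sizes and upsample_rates differ in length")

    # One dilation list is needed per residual-block kernel size.
    if len(cfg["resblock_dilation_sizes"]) != len(cfg["resblock_kernel_sizes"]):
        problems.append("resblock_dilation_sizes and resblock_kernel_sizes differ in length")

    return problems


# Values copied from the configuration example above.
hifigan_cfg = {
    "hop_length": 256,
    "upsample_rates": [8, 8, 2, 2],
    "upsample_kernel_sizes": [16, 16, 4, 4],
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
}
print(check_hifigan_config(hifigan_cfg))  # []
```

Running a check like this before training catches the most common mistake when changing `hop_length`: forgetting to adjust `upsample_rates` to match.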

Training Your Own Vocoder

    from amphion.vocoders import VocoderTrainer

    trainer = VocoderTrainer(
        model_type="hifigan",
        config=config,
        training_data="path/to/data",
        validation_data="path/to/val_data"
    )

    trainer.train(
        epochs=1000,
        batch_size=16,
        save_dir="path/to/save"
    )
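To plan checkpointing and estimate run time, it can help to translate `epochs` and `batch_size` into a total number of training steps. The sketch below does that arithmetic under stated assumptions: the dataset size of 13,100 clips is hypothetical, the function name `total_training_steps` is not part of Amphion, and whether `VocoderTrainer` drops an incomplete final batch is not stated in this document, so both cases are covered.

```python
import math


def total_training_steps(num_examples: int, batch_size: int, epochs: int,
                         drop_last: bool = False) -> int:
    """Number of batches processed over the whole run."""
    if num_examples < 0 or batch_size <= 0 or epochs < 0:
        raise ValueError("num_examples and epochs must be >= 0, batch_size must be > 0")
    if drop_last:
        # An incomplete final batch is discarded each epoch.
        steps_per_epoch = num_examples // batch_size
    else:
        # An incomplete final batch still counts as one step.
        steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# epochs=1000 and batch_size=16 come from the example above;
# 13_100 training clips is a hypothetical dataset size.
print(total_training_steps(13_100, batch_size=16, epochs=1000))                  # 819000
print(total_training_steps(13_100, batch_size=16, epochs=1000, drop_last=True))  # 818000
```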