Amphion: An Open-Source Audio, Music, and Speech Generation Toolkit

Introduction

Amphion (/æmˈfaɪən/) is a toolkit for audio, music, and speech generation. It aims to support reproducible research and to help junior researchers and engineers get started in audio, music, and speech generation research and development.

Core Features

Text-to-Speech (TTS)

  • FastSpeech2
  • VITS
  • VALL-E
  • NaturalSpeech2
  • Jets
  • MaskGCT

Singing Voice Conversion (SVC)

  • Multiple content-based features
  • State-of-the-art architectures
  • Diffusion-based models

Text-to-Audio (TTA)

  • Latent diffusion models
  • High-quality audio generation
  • Text-conditional synthesis

Vocoders

  • GAN-based: MelGAN, HiFi-GAN, NSF-HiFiGAN, BigVGAN, APNet
  • Flow-based: WaveGlow
  • Diffusion-based: DiffWave
  • Auto-regressive: WaveNet, WaveRNN

Latest News

  • 2024/10/19: Release of MaskGCT
  • 2024/09/01: Amphion, Emilia, and DSFF-SVC accepted by IEEE SLT 2024
  • 2024/08/28: Join our Discord community
  • 2024/08/27: Emilia dataset now publicly available
  • 2024/08/20: SingVisio accepted by Computers & Graphics

Documentation

Community

License

Amphion is released under the MIT License.