Amphion: An Open-Source Audio, Music, and Speech Generation Toolkit
Introduction
Amphion (/æmˈfaɪən/) is a toolkit for audio, music, and speech generation. Its purpose is to support reproducible research and to help junior researchers and engineers get started with research and development in this field.
Core Features
Text-to-Speech (TTS)
- FastSpeech2
- VITS
- VALL-E
- NaturalSpeech2
- Jets
- MaskGCT
Singing Voice Conversion (SVC)
- Multiple content-based features from pretrained models such as WeNet, Whisper, and ContentVec
- State-of-the-art architectures
- Diffusion-based models
Text-to-Audio (TTA)
- Latent diffusion models
- High-quality audio generation
- Text-conditional synthesis
Vocoders
- GAN-based: MelGAN, HiFi-GAN, NSF-HiFiGAN, BigVGAN, APNet
- Flow-based: WaveGlow
- Diffusion-based: DiffWave
- Auto-regressive: WaveNet, WaveRNN
Latest News
- 2024/10/19: Release of MaskGCT
- 2024/09/01: Amphion, Emilia, and DSFF-SVC accepted by IEEE SLT 2024
- 2024/08/28: Join our Discord community
- 2024/08/27: Emilia dataset now publicly available
- 2024/08/20: SingVisio accepted by Computers & Graphics
Documentation
Community
License
Amphion is released under the MIT License.