Amphion: An Open-Source Audio, Music, and Speech Generation Toolkit

Introduction

Amphion (/æmˈfaɪən/) is an open-source toolkit for audio, music, and speech generation. It aims to support reproducible research and to help junior researchers and engineers get started with research and development in these areas.

Core Features

Text-to-Speech (TTS)

FastSpeech2
VITS
VALL-E
NaturalSpeech2
Jets
MaskGCT

Singing Voice Conversion (SVC)

Multiple content-based features
State-of-the-art architectures
Diffusion-based models

Text-to-Audio (TTA)

Latent diffusion models
High-quality audio generation
Text-conditional synthesis

Vocoders

GAN-based: MelGAN, HiFi-GAN, NSF-HiFiGAN, BigVGAN, APNet
Flow-based: WaveGlow
Diffusion-based: DiffWave
Auto-regressive: WaveNet, WaveRNN

Latest News

2024/10/19: Release of MaskGCT
2024/09/01: Amphion, Emilia and DSFF-SVC accepted by IEEE SLT 2024
2024/08/28: Join our Discord community
2024/08/27: Emilia dataset now publicly available
2024/08/20: SingVisio accepted by Computers & Graphics
