Datasets
Overview
Amphion unifies the data preprocessing for various open-source datasets and provides tools for managing and preparing training data.
Supported Datasets
Speech Datasets
- LibriTTS
  - Multi-speaker English speech dataset
  - 585 hours of speech data
  - 2,456 speakers
- LJSpeech
  - Single-speaker English speech dataset
  - 24 hours of speech data
  - Professional female voice
- VCTK
  - Multi-speaker English speech dataset
  - 44 hours of speech data
  - 109 speakers
Singing Datasets
- M4Singer
  - Mandarin singing voice dataset
  - Professional singers
  - Multiple singing styles
- Opencpop
  - Chinese popular music dataset
  - Professional recordings
  - Phoneme-level alignments
- OpenSinger
  - Open-source singing voice dataset
  - Multiple languages
  - Various singing styles
Audio Datasets
- AudioCaps
  - Audio captioning dataset
  - 46K audio clips
  - Natural language descriptions
Emilia Dataset
The Emilia dataset is a large-scale multilingual speech dataset specifically designed for speech generation:
- 101K hours of speech data
- Multiple languages
- In-the-wild recordings
- High-quality annotations
Accessing Emilia
    from amphion.data import EmiliaDataset

    dataset = EmiliaDataset(
        root="path/to/emilia",
        split="train"
    )
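Once records are loaded, a common first step with a multilingual corpus is checking how the hours are distributed across languages. The following is a minimal sketch that does not use Amphion at all: it assumes each record is a dict with `language` and `duration` (in seconds) keys, which is an assumption about the metadata layout rather than a confirmed schema, and `hours_per_language` is a hypothetical helper name.

```python
from collections import defaultdict


def hours_per_language(records):
    """Sum clip durations (in seconds) per language and return hours."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["language"]] += rec["duration"]
    return {lang: secs / 3600.0 for lang, secs in totals.items()}


# Hypothetical records; the field names are assumptions, not a confirmed schema.
records = [
    {"id": "clip-0", "language": "en", "duration": 1800.0},
    {"id": "clip-1", "language": "en", "duration": 1800.0},
    {"id": "clip-2", "language": "zh", "duration": 900.0},
]

summary = hours_per_language(records)
print(summary)
```

The same function works on any iterable of records, so it can be applied to a streamed subset before committing to a full download.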
Data Preprocessing
    from amphion.data import preprocess_dataset

    # Preprocess a supported dataset
    preprocess_dataset(
        dataset="ljspeech",
        input_dir="path/to/raw",
        output_dir="path/to/processed"
    )
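To make the idea of preprocessing concrete, here is a standalone sketch of one step such a pipeline might perform for LJSpeech: turning the pipe-separated `metadata.csv` that ships with the dataset into a JSON manifest. It uses only the standard library and is not Amphion's implementation; the function name `build_manifest`, the output file name `manifest.json`, and the manifest keys are all assumptions made for illustration.

```python
import csv
import json
from pathlib import Path


def build_manifest(input_dir, output_dir):
    """Convert LJSpeech's metadata.csv into a JSON manifest (illustrative only)."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    entries = []
    with open(input_dir / "metadata.csv", encoding="utf-8", newline="") as f:
        # LJSpeech rows are pipe-separated: id|transcription|normalized transcription.
        # QUOTE_NONE keeps quotation marks inside transcripts intact.
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            utt_id, text = row[0], row[-1]  # prefer the normalized transcript
            entries.append({
                "id": utt_id,
                "wav": str(input_dir / "wavs" / f"{utt_id}.wav"),
                "text": text,
            })

    # "manifest.json" is a hypothetical output name chosen for this sketch.
    manifest_path = output_dir / "manifest.json"
    manifest_path.write_text(
        json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return entries
```

Calling `build_manifest("path/to/raw", "path/to/processed")` mirrors the `input_dir` and `output_dir` arguments shown above, although the real preprocessing may produce different files.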
Custom Dataset
    from amphion.data import AudioDataset


    class CustomDataset(AudioDataset):
        def __init__(self, root, split):
            super().__init__(root, split)
            # Custom initialization

        def __getitem__(self, index):
            # Custom data loading logic: build the sample for `index` here
            item = {}  # placeholder; fill with the loaded audio and metadata
            return item
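For readers who want to see what the "custom data loading logic" could look like, the sketch below fills in the same `__init__` / `__getitem__` shape without depending on Amphion. It assumes a hypothetical layout of `root/split/*.wav` and reads uncompressed PCM files with the standard-library `wave` module; the class name `WavFolderDataset` and the returned keys are illustrative choices, not part of any library API.

```python
import wave
from pathlib import Path


class WavFolderDataset:
    """Map-style dataset over root/split/*.wav (hypothetical layout)."""

    def __init__(self, root, split):
        # Sort so that indices are stable across runs.
        self.files = sorted((Path(root) / split).glob("*.wav"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        path = self.files[index]
        with wave.open(str(path), "rb") as wf:
            sample_rate = wf.getframerate()
            num_frames = wf.getnframes()
            audio = wf.readframes(num_frames)  # raw PCM bytes, not decoded
        return {
            "id": path.stem,
            "audio": audio,
            "sample_rate": sample_rate,
            "duration": num_frames / sample_rate,  # seconds
        }
```

Returning a plain dict keeps the example framework-agnostic; in practice the raw bytes would usually be decoded into an array or tensor before batching.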