text-to-speech - a MerlinLi Collection

updated Apr 26, 2025

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Paper • 2306.15687 • Published Jun 23, 2023 • 1
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 37
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11
Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

Paper • 2307.07218 • Published Jul 14, 2023 • 28
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

Paper • 2306.03509 • Published Jun 6, 2023 • 5
parler-tts/dac_44khZ_8kbps

76.7M • Updated Apr 10, 2024 • 586 • 19
parler-tts/parler_tts_mini_v0.1

Text-to-Speech • 0.6B • Updated Apr 30, 2024 • 2.99k • 358
Wenetspeech4TTS/WenetSpeech4TTS

Updated Jul 25, 2024 • 2.49k • 86
liuhuadai/AudioLCM

Text-to-Audio • Updated Jun 6, 2024 • 9
kyutai/mimi

Feature Extraction • 96.2M • Updated Jul 2, 2025 • 2.32M • 302
hexgrad/Kokoro-82M

Text-to-Speech • Updated Apr 10, 2025 • 11.7M • 6.21k
HKUSTAudio/Llasa-3B

Text-to-Speech • 4B • Updated May 10, 2025 • 243 • 526
Zyphra/Zonos-v0.1-hybrid

Text-to-Speech • 2B • Updated Jun 3, 2025 • 2.3k • 1.11k
stepfun-ai/Step-Audio-TTS-3B

Text-to-Speech • 4B • Updated Feb 17, 2025 • 74 • 197
ByteDance/MegaTTS3

Text-to-Speech • Updated Apr 4, 2025 • 143 • 417
nari-labs/Dia-1.6B

Text-to-Speech • 2B • Updated Jun 1, 2025 • 6.57k • 2.86k