r/LocalLLaMA · May 26, 2026 · 1 min read

OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

MOSS-TTS-v1.5

MOSS-TTS-v1.5 is trained further from MOSS-TTS 1.0 and preserves its main capabilities: zero-shot voice cloning, long-form speech generation, token-level duration control, Pinyin/IPA pronunciation control, multilingual synthesis, and code-switching. For the full 1.0 feature walkthrough, input schema, decoding hyperparameters, and evaluation tables, refer to the MOSS-TTS 1.0 README.

Compared with MOSS-TTS 1.0, v1.5 focuses on the following improvements:

Stronger multilingual synthesis with language tags: when the language field is omitted, v1.5 may improve on some languages and regress slightly on others relative to 1.0. When the language is specified, v1.5 outperforms 1.0 on almost all supported languages. Set the tag when building the user message, for example processor.build_user_message(text=text_fr, language="French").
More stable voice cloning: v1.5 improves speaker similarity and reduces cloning variance, making repeated generations more consistent.
Better long-reference, short-text cloning: v1.5 handles scenarios where the reference audio is much longer than the target text more reliably than 1.0.
More stable punctuation-following prosody: v1.5 follows punctuation-driven pauses more closely, especially in long sentences.
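Explicit pause control: v1.5 supports inline pause markers such as "[pause 3.2s]". For example, 我今天学习了一首中国的古诗，它的名字是[pause 3.2s]静夜思！ ("Today I learned a classical Chinese poem; its title is [pause 3.2s] Jing Ye Si (Quiet Night Thoughts)!") inserts an explicit 3.2 s pause before 静夜思.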
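The model interprets pause markers itself. The sketch below is only a hypothetical client-side helper for inspecting which pauses a prompt requests before it is sent. It assumes the marker syntax is exactly "[pause <seconds>s]", as in the example above. The names PAUSE_RE, split_pauses, total_pause_seconds, and pause_marker are illustrative and not part of MOSS-TTS.

```python
import re
from typing import List, Union

# Assumed marker syntax (inferred from the example "[pause 3.2s]"): "[pause <seconds>s]".
PAUSE_RE = re.compile(r"\[pause\s+(\d+(?:\.\d+)?)s\]")

# A segment is either a chunk of text (str) or a pause duration in seconds (float).
Segment = Union[str, float]


def pause_marker(seconds: float) -> str:
    """Format a pause marker, e.g. 3.2 -> "[pause 3.2s]"."""
    return f"[pause {seconds:g}s]"


def split_pauses(text: str) -> List[Segment]:
    """Split text into text chunks and pause durations, in their original order."""
    segments: List[Segment] = []
    pos = 0
    for match in PAUSE_RE.finditer(text):
        if match.start() > pos:
            segments.append(text[pos:match.start()])
        segments.append(float(match.group(1)))
        pos = match.end()
    if pos < len(text):
        segments.append(text[pos:])
    return segments


def total_pause_seconds(text: str) -> float:
    """Sum of all explicit pause durations requested in the text."""
    return sum(seg for seg in split_pauses(text) if isinstance(seg, float))


if __name__ == "__main__":
    example = "我今天学习了一首中国的古诗，它的名字是" + pause_marker(3.2) + "静夜思！"
    print(split_pauses(example))
    # → ['我今天学习了一首中国的古诗，它的名字是', 3.2, '静夜思！']
```

Keeping the parsed result as an ordered list of strings and floats makes it easy to check, for instance, that no single pause exceeds a budget before synthesis.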

Supported Languages

MOSS-TTS-v1.5 currently supports 31 languages. It keeps the 20 languages supported by MOSS-TTS 1.0 and extends multilingual continued training to 11 additional languages: Cantonese, Dutch, Finnish, Hindi, Macedonian, Malay, Romanian, Swahili, Tagalog, Thai, and Vietnamese.
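Since v1.5 performs best when the language is set explicitly, a small wrapper can refuse to build a message without a tag. This is a minimal sketch under stated assumptions. The only call taken from the model card is processor.build_user_message(text=..., language=...). StubProcessor and build_tagged_messages are hypothetical, added so the example runs without the model. Passing names such as "Vietnamese" as the language value is assumed by analogy with language="French".

```python
from typing import Any, Dict, List, Optional, Tuple

# The 11 languages listed as new in v1.5; 20 (from 1.0) + 11 = 31 in total.
NEW_IN_V1_5 = {
    "Cantonese", "Dutch", "Finnish", "Hindi", "Macedonian", "Malay",
    "Romanian", "Swahili", "Tagalog", "Thai", "Vietnamese",
}


class StubProcessor:
    """Hypothetical stand-in for the real MOSS-TTS processor.
    It simply echoes its arguments back as a dict."""

    def build_user_message(self, text: str, language: Optional[str] = None) -> Dict[str, Any]:
        return {"text": text, "language": language}


def build_tagged_messages(processor: Any, items: List[Tuple[str, str]]) -> List[Any]:
    """Build one user message per (text, language) pair, requiring a tag for each."""
    messages = []
    for text, language in items:
        if not language:
            # Omitting the tag is allowed by the model but gives weaker results in v1.5.
            raise ValueError(f"missing language tag for {text!r}")
        messages.append(processor.build_user_message(text=text, language=language))
    return messages


if __name__ == "__main__":
    msgs = build_tagged_messages(StubProcessor(), [
        ("Bonjour tout le monde.", "French"),
        ("Xin chào mọi người.", "Vietnamese"),
    ])
    for m in msgs:
        print(m)
```

With a real processor, the stub would be replaced by the loaded MOSS-TTS processor. The wrapper's only job is to make a missing tag fail loudly instead of silently falling back to untagged synthesis.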
