MiMo-V2.5 Voice

Bilingual ASR for dialects, code-switching, and songs

MiMo-V2.5-ASR is an open-source speech recognition model from Xiaomi with 8 billion parameters. It transcribes Mandarin, English, eight Chinese dialects, code-switched speech, and song lyrics. It is aimed at machine learning engineers, researchers, and developers building real-world voice applications that need broad linguistic coverage.

Categories: API

Launch Date: April 28, 2026

Product Info: https://platform.xiaomimimo.com/docs/usage-guide/speech-synthesis-v2.5