r/LocalLLaMA · 1 min read

Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Just saw Xiaomi MiMo announce MiMo-V2.5-Pro UltraSpeed, claiming they broke the 1,000 tokens/sec output barrier on a 1-trillion-parameter MoE model. According to them, they’re doing it on a single standard 8-GPU node, not on custom wafer-scale hardware like Cerebras or SRAM-heavy hardware like Groq.

Crazy if true.

submitted by /u/No-Selection2972