r/LocalLLaMA · June 1, 2026 · 1 min read

100 Trillion+ Pretraining data??? This is the largest data I've seen a model being trained on.

#model-release

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.



https://preview.redd.it/oss7g2gnll4h1.png?width=894&format=png&auto=webp&s=5d4295707a700ed7541c274b8be8ad75bbd0903d

Edit: This is about Minimax-M3. I just realised I didn't mention it lol

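Usually we see 27-50 trillion tokens for most models (Kimi, Mimo, Deepseek). Minimax seems to have at least doubled the pretraining data compared to that range. Minimax-M2.5 was like 27T tokens, so 100T+ would be roughly a 4x jump over their own previous model.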

If we look at Mimo and Deepseek, they have done:

- 27T for Mimo-v2.5-Pro (1 trillion parameters)

- 48T for the smaller Mimo-v2.5 model, which is multimodal

- 32T for Deepseek V4 Flash and Pro

I find it difficult to believe this model will be much bigger than the previous M2 series models. The training data scale is already huge, and training a much bigger model on that much data would require way more resources.

M3 seems likely to be under 500B params.
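
To put rough numbers on the resource argument, here is a minimal sketch using the common rule of thumb that training compute is about 6 × (parameters active per token) × (training tokens). The rule is only an approximation, and `ASSUMED_ACTIVE_PARAMS` is a made-up placeholder, not a known figure for any Minimax model. Only the 27T and 100T token counts come from the post, and the ratios do not depend on the placeholder value.

```python
# Back-of-the-envelope training compute using the common approximation
#   FLOPs ~= 6 * N * D
# where N = parameters active per token and D = training tokens.
# This is a rule of thumb, not an exact cost model.

TRILLION = 1e12
BILLION = 1e9

# Hypothetical placeholder, NOT a published number for any model.
ASSUMED_ACTIVE_PARAMS = 10 * BILLION


def training_flops(active_params: float, tokens: float) -> float:
    """Approximate training FLOPs for a given model size and token count."""
    return 6.0 * active_params * tokens


# Token counts taken from the post: ~27T for M2.5, 100T+ claimed for M3.
baseline = training_flops(ASSUMED_ACTIVE_PARAMS, 27 * TRILLION)

# Same model size, more data.
more_data = training_flops(ASSUMED_ACTIVE_PARAMS, 100 * TRILLION)

# Twice the active parameters AND more data.
more_data_and_params = training_flops(2 * ASSUMED_ACTIVE_PARAMS, 100 * TRILLION)

data_ratio = more_data / baseline
both_ratio = more_data_and_params / baseline

print(f"More data only:        {data_ratio:.2f}x the baseline compute")
print(f"More data + 2x params: {both_ratio:.2f}x the baseline compute")
```

Under this approximation, going from 27T to 100T tokens alone costs about 3.7x the compute at the same model size, and doubling the active parameters on top of that pushes it to about 7.4x, which is why a much bigger M3 looks unlikely to me.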

submitted by /u/True_Requirement_891