DeepSeek v4 Pro is too big for such "midrange" performance, or am I missing something?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hi.
DeepSeek v4 Pro has 1.6T parameters, which probably makes it the largest open model, or at least one of the largest.
Yet it's not the best or most performant open model under a wide variety of definitions of "best". Indeed, in most cases it is not the second, third, or fourth best either.
GLM 5.1, at 750B parameters, is less than half its size, but many consider it "an Opus" among open models. So is Kimi K2.6, with 1T parameters, still far less than the 1.6T of DSv4 Pro. Now we have K2.7 and GLM 5.2, apparently the same size as their predecessors, but improving performance even further.
We also have MiniMax M3, recently revealed to have roughly 450 billion parameters, with better performance in many benchmarks and use cases. And finally there is MiMo v2.5 Pro, which also ranks higher than DSv4 Pro in benchmarks, yet cloud providers charge the same price for it, and it is also in the 1T parameter range.
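To put the sizes side by side, here is a quick sketch that uses only the parameter counts quoted above. The numbers are the ones from this post, not verified figures, and `size_ratios` is just an illustrative helper name. It compares sizes only and says nothing about quality.

```python
# Parameter counts in billions, as quoted in this post (unverified).
PARAMS_B = {
    "DeepSeek v4 Pro": 1600,
    "Kimi K2.6": 1000,
    "MiMo v2.5 Pro": 1000,  # the post only says "1T parameter range"
    "GLM 5.1": 750,
    "MiniMax M3": 450,      # the post says roughly 450B
}


def size_ratios(params, baseline="DeepSeek v4 Pro"):
    """Return each model's parameter count as a fraction of the baseline's."""
    base = params[baseline]
    return {name: count / base for name, count in params.items()}


ratios = size_ratios(PARAMS_B)

# Print largest first so the gap to DSv4 Pro is easy to read.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {ratio:.0%} of DSv4 Pro")
```

By these numbers GLM 5.1 comes out a bit under half the size of DSv4 Pro, and MiniMax M3 under a third.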
So, what am I missing? Is DeepSeek v4 Pro really "living up to the hype", or can we say it's indeed too big for "just okay"/mediocre performance? Or maybe it's because it's a "preview" and we should wait longer? Or, as many say (and I fully agree), is it the Huawei-based inference that matters this time, not the model scores? Anything else?
Thanks.
P.S. My point is not about DSv4 Flash at all! It is indeed much slimmer and gives quite impressive "performance per weight".