r/LocalLLaMA

Rollin' MiMo-2.5 on two Halo Strixeses

Mirrored from r/LocalLLaMA.

'Twas a very high-effort build: two 128GB machines with Radeon 8060S graphics, Proxmox with containers, a usb4net secondary link, and a ROCm llama.cpp built with a crowbar and a lot of swearing options. Not to mention the hair-pulling while trying to build the other backends. So far I get 356 t/s prompt processing (pp) and 15 t/s token generation (tg), as long as the context is only about 1% full (roughly 10k tokens). Dis good? What do? Am I considered aristocracy here?
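For a rough sense of what the pp/tg numbers mean in wall-clock time, here is a back-of-envelope sketch. It assumes throughput stays constant at the quoted 356 t/s prefill and 15 t/s generation (in practice both usually drop as the context fills up), and the 500-token reply length is a hypothetical example, not something from the post.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp_tps: float = 356.0, tg_tps: float = 15.0) -> tuple[float, float]:
    """Return (prefill_seconds, generation_seconds), assuming constant throughput.

    pp_tps and tg_tps default to the rates quoted in the post; real rates
    vary with context length, so treat the result as a rough estimate.
    """
    prefill = prompt_tokens / pp_tps      # time to process the prompt
    generation = output_tokens / tg_tps   # time to generate the reply
    return prefill, generation


# Hypothetical request: a 10k-token prompt and a 500-token reply.
prefill, gen = estimate_seconds(10_000, 500)
print(f"prefill ~{prefill:.1f} s, generation ~{gen:.1f} s, total ~{prefill + gen:.1f} s")
```

Under these assumptions a 10k-token prompt takes about 28 s before the first token appears, and a 500-token answer takes another 33 s or so, which is why prefill speed matters as much as tg for long-context use.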
As for the other backends, has anyone had any actual luck building and serving models with vLLM or SGLang on this hardware? My experience so far is "it's always something" with the former and "it's really for datacenter, not consumer, hardware" with the latter. As far as I understood, I need one of them to run something like DeepSeek v4 Flash in its original form.

submitted by /u/Rude_Ambassador_6270