For users with 4x-8x 6000 PROs, how is your experience with bigger models lately? (GLM 5.2, Kimi 2.7, DeepSeek V4 Pro)
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hello guys, hoping you're doing fine!
I was wondering, for users with 4x-8x 6000 PROs (so between 384 and 768GB VRAM), how are bigger models working for you?
I am planning to move from my current system to either 4 or 8 of them, and want to hear about recent experiences with these models.
In theory you can run GLM 5.2 at 4 bits, but not 8 bits, right? Same with Kimi 2.7 or DeepSeek V4 Pro. There is a ton of info at https://github.com/local-inference-lab/rtx6kpro/blob/master/benchmarks/results.md but it is missing some of the latest models.
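A rough way to sanity-check the "4 bits fits, 8 bits doesn't" question is plain arithmetic: weight memory is roughly parameters × bits / 8. The sketch below assumes 96GB per card (which is what 384GB across 4 cards implies) and reserves a guessed 15% of VRAM for KV cache, activations, and runtime overhead. The 700B parameter count is a hypothetical placeholder, not the real size of any model named here, so plug in the actual figure from the model card. Real 4-bit formats also store scales, so effective bits per weight tend to land somewhat above 4.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1e9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, n_gpus: int,
         gb_per_gpu: float = 96.0, reserve_fraction: float = 0.15) -> bool:
    """True if the weights fit in total VRAM after holding back a reserve.

    reserve_fraction is an assumed allowance for KV cache, activations and
    runtime overhead; the right value depends on context length and backend.
    """
    usable_gb = n_gpus * gb_per_gpu * (1 - reserve_fraction)
    return weight_gb(params_billion, bits_per_weight) <= usable_gb


# Hypothetical placeholder size, NOT the real parameter count of any model
# mentioned in this post. Replace with the number from the model card.
HYPOTHETICAL_PARAMS_B = 700.0

for n_gpus in (4, 8):
    for bits in (4, 8):
        need = weight_gb(HYPOTHETICAL_PARAMS_B, bits)
        ok = fits(HYPOTHETICAL_PARAMS_B, bits, n_gpus)
        print(f"{n_gpus} GPUs, {bits}-bit: ~{need:.0f} GB of weights, fits={ok}")
```

This only covers weights. Long contexts or many concurrent requests can push KV cache well past the reserve assumed here, so treat a "fits" result as a lower bound on what you need.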
Is the hit to agentic or programming performance too big when using less than 8 bits? I ask mostly because I have read that the 4-bit hit for agentic or programming work is much higher than at 8-bit, but I'm not sure how that holds for bigger models.
Are you running these on vLLM/SGLang or another backend?
Many thanks!