Luce DFlash + PFlash on AMD Strix Halo: Qwen3.6-27B at 2.23x decode and 3.05x prefill vs llama.cpp HIP
Hey fellow Llamas, keeping it short. We just shipped DFlash and PFlash support for the AMD Ryzen AI MAX+ 395 iGPU (gfx1151, Strix Halo, 128 GiB unified memory). It is the same Luce DFlash stack from the RTX 3090 post a couple of weeks back, now running on the consumer AMD APU class.

Repo: https://github.com/Luce-Org/lucebox-hub (MIT)

**TL;DR**

End-to-end on Qwen3.6-27B Q4_K_M with the Luce Q8_0 DFlash drafter: 26.85 tok/s decode and 20.2 s prefill at 16K context. That is 2.23x faster decode and 3.05x faster prefill than llama.cpp HIP on the same silicon. At a 16K prompt + 1K generation workload, total wall clock drops from 147 s to 58 s, 2.5x faster end to end. The same 128 GiB box hosts checkpoints up to ~100 GiB, a class of models a 24 GiB consumer GPU cannot touch (Qwen3.5-122B-A10B, MiniMax-M2.7-REAP 139B-A10B, full BF16 27B).

**The numbers**

- Hardware: Ryzen AI MAX+ 395, Radeon 8060S iGPU (gfx1151), 128 GiB LPDDR5X-8000, ROCm 7.2.2
- Target: Qwen3.6-27B Q4_K_M (15.65 GiB)
- Drafter: Luce DFlash drafter, Q8_0
- Decode (Qwen3.6-27B Q4_K_M): 26.85 tok/s, 2.23x vs llama.cpp HIP
- Prefill (Qwen3.6-27B, 16K tokens): 20.2 s, 3.05x vs llama.cpp HIP
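As a sanity check on how the TL;DR wall-clock figure decomposes, here is a back-of-the-envelope sketch. It assumes generation runs at the steady-state decode rate and derives the llama.cpp HIP baseline from the reported speedup factors, so the absolute baseline numbers are implied rather than independently measured:

```python
# Back-of-the-envelope check of the 16K prompt + 1K generation wall-clock claim.
# Assumes generation runs at the steady-state decode rate; the baseline figures
# are derived from the reported 3.05x / 2.23x speedups, not measured directly.

GEN_TOKENS = 1_000

# Measured with Luce DFlash + PFlash on gfx1151 (16K-token prompt)
luce_prefill_s = 20.2          # PFlash prefill of the 16K prompt
luce_decode_tps = 26.85        # DFlash-assisted decode rate

# llama.cpp HIP baseline implied by the reported speedups
base_prefill_s = luce_prefill_s * 3.05
base_decode_tps = luce_decode_tps / 2.23

def wall_clock(prefill_s: float, decode_tps: float) -> float:
    """Total time = prefill + generation at the decode rate."""
    return prefill_s + GEN_TOKENS / decode_tps

luce_total = wall_clock(luce_prefill_s, luce_decode_tps)
base_total = wall_clock(base_prefill_s, base_decode_tps)

print(f"Luce total:     {luce_total:6.1f} s")   # ~57 s, matches the ~58 s claim
print(f"Baseline total: {base_total:6.1f} s")   # ~145 s, close to the 147 s claim
print(f"End-to-end speedup: {base_total / luce_total:.2f}x")  # ~2.5x
```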
Speedup grows with context: PFlash compress is O(S), while autoregressive prefill is O(S^2), so the gap widens as the prompt gets longer (see the cost-model sketch below). NIAH retrieval still passes at 16K.
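To make the "speedup grows with context" point concrete, a toy cost model: if compress does O(S) work and autoregressive prefill does O(S^2), the ratio between them grows roughly linearly in prompt length S. The constants below are arbitrary placeholders, purely illustrative, not measured values:

```python
# Toy cost model for the "speedup grows with context" claim.
# AR prefill attention does O(S^2) work over the prompt; PFlash compress does O(S).
# c_ar and c_pf are arbitrary placeholders, NOT measured values; only the shape
# of the ratio (linear growth in S) is the point.

c_ar = 1.0e-6   # per-token-pair cost of autoregressive prefill (illustrative)
c_pf = 4.0e-3   # per-token cost of PFlash compress (illustrative)

for S in (2_048, 4_096, 8_192, 16_384, 32_768):
    ar_cost = c_ar * S * S    # quadratic in prompt length
    pf_cost = c_pf * S        # linear in prompt length
    print(f"S={S:6d}  AR/PFlash cost ratio: {ar_cost / pf_cost:6.1f}x")

# The ratio doubles every time S doubles, which is why the prefill advantage
# keeps widening at longer contexts.
```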
**Tuning note**

**Reproduce**

**What is still missing**
**Constraints**

ROCm 7.2.2+, gfx1151 tuned (gfx1100 also supported with arch-aware defaults), greedy verify only, and no Vulkan / Metal / multi-GPU on this path yet. We're working hard on this, but we know we need to improve in many areas. Feedback is more than welcome :)
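For anyone wondering what "greedy verify only" means in practice, here is a minimal, generic sketch of draft-and-verify speculative decoding with greedy acceptance. This is illustrative pseudocode of the general technique, not the Luce code path; `draft_next` and `target_greedy_batch` are hypothetical stand-ins for whatever callables your stack exposes:

```python
from typing import Callable, List

def speculative_decode_greedy(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],                       # small drafter: greedy next token
    target_greedy_batch: Callable[[List[int], int], List[int]],   # target: greedy token at each of the last k+1 positions
    k: int = 4,
    max_new_tokens: int = 64,
) -> List[int]:
    """Generic greedy draft-and-verify loop (illustrative, not the Luce implementation)."""
    tokens = list(prompt)
    produced = 0
    while produced < max_new_tokens:
        # 1. Drafter proposes k tokens autoregressively (cheap).
        draft = []
        ctx = list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)

        # 2. Target scores prompt + draft in ONE forward pass and returns its own
        #    greedy choice at each drafted position, plus one bonus position.
        target_choices = target_greedy_batch(tokens + draft, k + 1)

        # 3. Greedy verify: accept the longest prefix where drafter == target.
        n_accept = 0
        for d, t in zip(draft, target_choices):
            if d == t:
                n_accept += 1
            else:
                break

        # Accepted draft tokens are "free"; the target's token at the first
        # mismatch (or the bonus token if everything matched) is always kept,
        # so every iteration emits at least one token.
        tokens.extend(draft[:n_accept])
        tokens.append(target_choices[n_accept])
        produced += n_accept + 1
    return tokens
```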