Big News for AMD / Strix Halo+ Owners
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Admittedly this is news for me, but I'm hoping it could be of some use to others here as well!
So, THE NPU IS USABLE!!
I've owned an AMD Ryzen 395 Max AI+ (or whatever the naming is lol) for about a year now and have relied solely on GGUFs and Vulkan. I acknowledge that the AMD Ryzen AI team has been working hard to get their ROCm software up to speed w/ their hardware.
https://kyuz0.github.io/amd-strix-halo-toolboxes/
This database did NOT look so ROCm friendly 6 months ago.
Why should I care?
If you own a device w/ both an NPU and a iGPU (like the strix halo series) then you WANT hybrid models. The NPU is CRAZY FAST at PromptProcessing, and can run parallel to gpu firing.Okay, What is Hybrid Mode?
So, LLMs can run through the NPU only. If they're built for it. Check out "FastFlowLM NPU" models for examples that do that. BUT HYBRID mode combines the best of both, and FINALLY utilizes the hardware purchased nearly a year go (for some, more than that).What can i do to test this?
Download Lemonade! Thanks to their efforts that focus primarily on Ryzen AI and working directly w AMD, I've FINALLY got my machine working in ways it couldn't a year ago and Lemonade made it happen. It's GUI is ultra bare-bones and I wouldn't recommend it for any actual agentic/chat/harness usage BUT being able to sanity-test software without investing days or weeks into it?
10/10
Here's the link: lemonade-server.ai
Speaking of links, read more about Hybrid Mode and making your own Hybrid Models here: https://ryzenai.docs.amd.com/en/latest/llm/overview.htmlhttps://ryzenai.docs.amd.com/en/latest/llm/overview.html
---
So, that's it. Just wanted to share. REALLY EXCITED that my year old computer is still advancing in the software science of it all.
I have a single wishlist/request now: MTP-supported Hybrid Models. Qwen 3.6 has that speedup tech introduced by Unsloth, and AMD has a guide for "new processor shapes" since 3.6 GGUF can't simply be "converted to ONNX". Here's that guide: https://ryzenai.docs.amd.com/en/latest/oga_op_prepare.html
If anyone attempts it, please share on huggingface!
This was all written by hand btw, no llm assistance, just passionate dev obsessed w "new shiny".
[link] [comments]
More from r/LocalLLaMA
-
Why Dario is on fire: lesson from dotcom bubble.
Jun 30
-
Been running Qwen3.6-27B through a 3-critic harness. The harness matters more than I thought
Jun 30
-
I Hate Dario Amodei, and everything he stands for.
Jun 29
-
Introducing LongCat-2.0 - , a large-scale MoE language model with 1.6 trillion total parameters and ~48 billion activated per token. This was the stealth model that was on Openrouter under the name 'owl-alpha'.
Jun 29
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.