club-rdna16: practical 16GB AMD/Radeon local LLM testing repo
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Following on from club-5060ti, I’ve been doing some testing with my desktop AMD GPU and wanted to make a similar repo for 16GB Radeon cards.
Repo:
https://github.com/5p00kyy/club-rdna16
Pages/results:
https://5p00kyy.github.io/club-rdna16/
The first test machine is an RX 6900 XT 16GB running llama.cpp with ROCm/HIP. I’ve mainly been testing Qwen3.6 27B and Qwen3.6 35B-A3B with the Unsloth MTP GGUFs, currently at the UD-IQ3_XXS quant with a q8 KV cache.
The repo is meant to be practical rather than a synthetic leaderboard. I’m trying to capture the stuff that actually matters when someone wants to run a model locally:
- exact llama.cpp launch profiles
- context length that actually fits
- KV cache settings
- short prompt throughput
- long-context retrieval checks
- AMD power profile notes
- ROCm/HIP setup details
- result templates for other Radeon users
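A launch profile of the kind listed above could be captured as data and turned into a command line. This is only a sketch, not the repo's actual scripts: the `LaunchProfile` class, the model path, and the chosen values are hypothetical. The flags used (`--model`, `--ctx-size`, `--n-gpu-layers`, `--cache-type-k`, `--cache-type-v`) are standard `llama-server` options, but flash-attention and MTP options vary between llama.cpp builds and are deliberately left out.

```python
from dataclasses import dataclass
import shlex


@dataclass(frozen=True)
class LaunchProfile:
    """Hypothetical container for one llama-server launch profile."""
    model_path: str          # path to the GGUF file (placeholder below)
    ctx_size: int            # context length in tokens
    kv_cache_type: str       # e.g. "q8_0" or "f16"
    n_gpu_layers: int = 99   # large value = offload every layer that exists

    def to_argv(self) -> list[str]:
        """Build the argument list for llama-server."""
        return [
            "llama-server",
            "--model", self.model_path,
            "--ctx-size", str(self.ctx_size),
            "--n-gpu-layers", str(self.n_gpu_layers),
            "--cache-type-k", self.kv_cache_type,
            "--cache-type-v", self.kv_cache_type,
        ]


# Placeholder path and values, chosen only for illustration.
profile = LaunchProfile(
    model_path="models/example-UD-IQ3_XXS.gguf",
    ctx_size=131072,
    kv_cache_type="q8_0",
)
print(shlex.join(profile.to_argv()))
```

Keeping the profile as data rather than a hand-typed command makes it easier to record the exact settings next to each result.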
A few early findings from the RX 6900 XT:
- Qwen3.6 35B-A3B has been the strongest practical result so far on this card.
- 131k context with q8 KV works well as a stable non-MTP profile.
- 100k context with q8 KV and MTP also works, but needs careful settings.
- Some profiles that answer short prompts fine still fail or become impractical on longer prompts.
- The AMD compute power profile made a real difference for long-context prefill.
- Qwen3.6 27B runs, but so far the 35B-A3B profile has been more useful in my testing.
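The context-length findings above largely come down to how much VRAM the KV cache takes. As a rough sketch, the cache size for a plain transformer with full attention in every layer can be estimated as below. The layer count, KV head count, and head dimension are made-up placeholders, not the real Qwen3.6 values, and models with hybrid or sliding-window attention will need less than this estimate. The bytes-per-element figures follow llama.cpp's block layouts: f16 uses 2 bytes per value, and q8_0 stores 32 values in 34 bytes.

```python
# Bytes per cached element for two llama.cpp cache types.
# q8_0 packs 32 int8 values plus one fp16 scale into a 34-byte block.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
}


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_size: int, cache_type: str) -> float:
    """Estimate KV cache size, assuming full attention in every layer."""
    elements_per_token = 2 * n_layers * n_kv_heads * head_dim  # K and V
    return elements_per_token * ctx_size * BYTES_PER_ELEMENT[cache_type]


# Hypothetical model dimensions, for illustration only.
dims = dict(n_layers=48, n_kv_heads=4, head_dim=128)

for cache_type in ("f16", "q8_0"):
    size_bytes = kv_cache_bytes(ctx_size=131072, cache_type=cache_type, **dims)
    print(f"{cache_type}: {size_bytes / 2**30:.2f} GiB")
```

Whatever the real dimensions are, q8_0 comes to a little over half the f16 size, which is the headroom that makes longer contexts fit on a 16GB card.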
I’d like this to become useful for people with RX 6900 XT, RX 6800 XT, RX 7800 XT, RX 7900 GRE, RX 9070 XT, and similar 16GB AMD cards.
If anyone has a 16GB Radeon card and wants to run the same scripts, result submissions are welcome. The most useful reports include the GPU, ROCm/driver version, backend, power profile, model, model quant, KV cache type, context length, and whether the long-context retrieval test passed.
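The report fields listed above map naturally onto a small structured record. The sketch below is one possible shape, not the repo's actual result template: the `ResultReport` class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass
class ResultReport:
    """Hypothetical result record; field names are illustrative."""
    gpu: str
    rocm_version: str
    backend: str
    power_profile: str
    model: str
    quant: str
    kv_cache_type: str
    ctx_size: int
    retrieval_passed: Optional[bool] = None  # None = test not run yet


# Sample values are placeholders, not measured results.
report = ResultReport(
    gpu="RX 6900 XT 16GB",
    rocm_version="<your ROCm/driver version>",
    backend="llama.cpp ROCm/HIP",
    power_profile="compute",
    model="Qwen3.6 35B-A3B",
    quant="UD-IQ3_XXS",
    kv_cache_type="q8",
    ctx_size=131072,
)
print(json.dumps(asdict(report), indent=2))
```

Serializing to JSON keeps submissions easy to diff and compare across cards.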
Still early, but I figured it was worth pushing publicly so AMD users have somewhere to compare reproducible llama.cpp/ROCm results instead of piecing everything together from scattered comments.