r/LocalLLaMA

Gemma4 26b a4b Apex quant is quite good


I tried mudler's APEX quant of Gemma 4 26B A4B and it was amazing! I got 38 t/s at 90,000 context with no looping and, surprisingly, no quality degradation. I used mudler/gemma-4-26B-A4B-it-APEX-GGUF / APEX-I-Compact (15 GB) on my RX 9060 XT 16 GB with llama.cpp Vulkan.

For comparison, my previous quant, unsloth's UD-Q5_K_XL of Gemma 4 26B A4B (21.2 GB), looped in a similar long-context test at 50k context.
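If you want something less subjective than eyeballing the output for loops, here's a rough sketch in Python. It is not what I used for the test above; `find_loop` and its thresholds are made up for illustration, and it only catches the simplest case, where the output ends in the exact same chunk repeated back to back.

```python
from typing import Optional


def find_loop(text: str, min_len: int = 40, min_repeats: int = 3) -> Optional[str]:
    """Return the repeating chunk if `text` ends in a back-to-back loop, else None.

    Hypothetical helper: min_len and min_repeats are arbitrary thresholds,
    and only exact character-for-character repetition is detected.
    """
    n = len(text)
    # The chunk can be at most n // min_repeats characters long.
    for size in range(min_len, n // min_repeats + 1):
        unit = text[n - size:]
        # A loop means the tail of the text is `unit` repeated min_repeats times.
        if text.endswith(unit * min_repeats):
            return unit
    return None


if __name__ == "__main__":
    looping = "intro " + "abc def ghi. " * 5
    print(find_loop(looping, min_len=5))
    print(find_loop("The quick brown fox jumps over the lazy dog.", min_len=5))
```

Paste the model's output in as `text`. Loops that change slightly on each pass (a counter going up, different whitespace) will slip through, so treat a `None` as "no obvious loop" rather than proof.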

I'm not claiming it's a universally better quant, but it's worth giving it a go IMO.

submitted by /u/Any-Chipmunk5480