r/LocalLLaMA · · 1 min read

Benchmark & Reality Check on Gemma 4 12B: Great model, but your local settings are probably breaking it (Fix inside)

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I completed a Python bug hunting benchmark with Gemma 4 12B. I used the Unsloth Dynamic Q5 GGUF model. The model has good capabilities. Default settings in LM Studio disable the reasoning.

Fix the LM Studio reasoning configuration. LM Studio looks for Qwen tokens. Gemma 4 uses different tokens. Change your settings with these steps.

• Open your inference settings.

• Add this text to the first line of your Jinja template: {%- set enable_thinking = true %}

• Set the start token to <|channel>thought

• Set the end token to <channel|>

Change your sampling parameters. Do not decrease the temperature. Low temperature hurts the reasoning quality. Use the official Google parameters.

• Set temperature to 1.0

• Set top_p to 0.95

• Set top_k to 64

Benchmark results and data. The model rewrote spatial loops correctly. The model replaced slow loops with a BallTree algorithm. The small size creates a limit for the model.

  • Qwen 35B q4 k xl found 14 bugs.
  • Gemma 4 12B q5 k xl found 6 bugs.

Better than 26B run I had. Probably need to find the better jinja file for it to work.

Configure your backend correctly to get the correct performance.

submitted by /u/SummarizedAnu
[link] [comments]

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from r/LocalLLaMA