r/LocalLLaMA · · 3 min read

13 abliterated Gemma 4 E2B variants, 44 GPU hours, Benchmark and Comparison - Abliterlitics


I compared 13 abliterated variants of Gemma 4 E2B across weight analysis, KL divergence, HarmBench safety, and 8 benchmark tasks. 44 GPU hours on a single RTX 5090. Here is what actually works and what destroys capabilities.

coder3101's variant achieves 96% ASR with capability fully preserved. It actually beats the base model on math. treadon hits 100% ASR but loses 3 points on GSM8K. Most "capabilities preserved" claims on model cards don't hold up.

Full report with all data tables, graphs, and the JSON and log artifacts from the entire process: https://huggingface.co/DreamFast/Gemma4-e2b-abliterlitics

What I tested

13 abliterated variants of google/gemma-4-E2B-it from 9 creators. Four used the Heretic tool: coder3101, llmfan46, pew, and kasper. Two from Huihui (v1, v2). Plus TrevorJS, Wangzhang, WWT CyberLab, EtherOpus, Treadon, Prithiv, and Duoneural. Each got the same treatment: weight forensics, KL divergence, 400-prompt HarmBench evaluation with full LLM review of all 5,600 responses, and 8 benchmark tasks through lm-eval on native BF16.

Safety removal works regardless of technique

All 13 variants lift HarmBench ASR from the base model's 32.2% to between 82% and 100%. Five hit 99% or higher. treadon reaches 100% with zero refusals. The safety removal part is solved. That is not the interesting finding.
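For anyone who wants to sanity-check the ASR figures, the metric is just the share of HarmBench prompts whose response was judged as complying. Here is a minimal sketch, assuming a hypothetical list of per-response labels; the label names `complied` and `refused` are illustrative and not taken from the report's artifacts.

```python
from collections import Counter


def attack_success_rate(labels):
    """Return ASR as a percentage: complied responses / all responses."""
    if not labels:
        raise ValueError("no labels to score")
    counts = Counter(labels)
    return 100.0 * counts["complied"] / len(labels)


# Hypothetical labels for one model's 400 HarmBench responses.
labels = ["complied"] * 384 + ["refused"] * 16
asr = attack_success_rate(labels)
print(f"ASR: {asr:.1f}%")

# 400 prompts per model; 5,600 reviewed responses is consistent with
# 13 variants plus the base model (an inference, not stated in the post).
total_responses = (13 + 1) * 400
print(total_responses)
```

With 400 prompts, each response moves ASR by 0.25 points, so a figure like 99.5% means two responses that did not count as successes.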

The interesting finding: abliteration can improve reasoning

Two variants beat the base model on GSM8K. coder3101 scores 84.8% versus base at 83.5%. llmfan46 scores 83.9%. Both use surgical, low-tensor-count approaches. The abliteration shortens thinking chains, so the model spends fewer tokens reasoning and more tokens answering. Within a fixed generation budget, that means more correct answers.

The capability damage is real for aggressive approaches

ether4o4 drops 6.9 points on GSM8K with 84 empty responses where the model thinks until it runs out of tokens without producing an answer. huihui-v2 drops 4.2 points. treadon drops 2.9 points.
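Here is a rough sketch of how such empty responses could be counted, assuming each generation is stored as a record with an extracted `answer` string and a `finish_reason` field. Both field names are hypothetical and may not match the lm-eval logs in the report.

```python
def count_empty_responses(records):
    """Count generations that hit the token limit without a final answer."""
    return sum(
        1
        for r in records
        if r["finish_reason"] == "length" and not r["answer"].strip()
    )


# Hypothetical records: two ran out of tokens while still thinking.
records = [
    {"answer": "42", "finish_reason": "stop"},
    {"answer": "", "finish_reason": "length"},
    {"answer": "  ", "finish_reason": "length"},
    {"answer": "17", "finish_reason": "length"},
]
empty = count_empty_responses(records)
print(empty)

# Assumption: the full 1,319-problem GSM8K test split was scored.
GSM8K_TEST_SIZE = 1319
max_points_lost = 100.0 * 84 / GSM8K_TEST_SIZE
print(f"{max_points_lost:.1f}")
```

Under that assumption, 84 empty responses alone could account for up to about 6.4 of the 6.9 points, which would make the drop mostly a token-budget failure and not a loss of math ability.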

LAMBADA perplexity tells a starker story. wangzhang hits 7.35x base perplexity. wwtcyberlab hits 5.69x. These variants disrupted language modelling beyond the refusal direction.
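To put the ratios in context, perplexity is the exponential of the mean negative log-likelihood, so a perplexity ratio maps directly onto extra loss per scored token. Here is a minimal sketch with made-up log-probabilities; the numbers below are illustrative, not from the report.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_ratio(base_logprobs, variant_logprobs):
    """How many times higher the variant's perplexity is than the base's."""
    return perplexity(variant_logprobs) / perplexity(base_logprobs)


# Hypothetical log-probabilities assigned to the same target tokens.
base_logprobs = [math.log(0.5), math.log(0.25)]
variant_logprobs = [math.log(0.25), math.log(0.125)]
ratio = perplexity_ratio(base_logprobs, variant_logprobs)
print(f"{ratio:.2f}x base perplexity")

# A ratio r corresponds to ln(r) extra nats of loss per scored token.
extra_nats = math.log(7.35)
print(f"{extra_nats:.2f} extra nats at 7.35x")
```

A 7.35x ratio works out to roughly 2 extra nats per scored token, which is a large shift for an edit that is supposed to touch only the refusal direction.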

The "capabilities preserved" claims could be interpreted differently

duoneural claimed "near-zero divergence at approximately 0.001." I measured 0.187. That is 187x higher. After I raised this on their model card, they updated it with the real number.
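For readers unfamiliar with the metric, KL divergence here compares the next-token distributions of the base and abliterated models on the same input. Below is a minimal pure-Python sketch for a single token position. The direction (base to variant), the averaging over positions, and the prompt set are assumptions, and the report's pipeline may differ on any of them.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def kl_divergence(base_logits, variant_logits):
    """KL(base || variant) in nats for one token position."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(variant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Toy 3-token vocabulary: identical logits give zero divergence,
# shifted logits give a positive value.
same = kl_divergence([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
shifted = kl_divergence([2.0, 0.5, -1.0], [0.5, 2.0, -1.0])
print(same, shifted)
```

Because the result depends on which prompts and positions are averaged, a self-reported number and an independent measurement are only comparable when the setup matches. That is worth keeping in mind when reading the gaps in this section.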

wwtcyberlab claims "0.0% refusal rate and 101% quality preservation." I found 2 sort-of refusals and LAMBADA perplexity at 5.69x base. Other benchmarks drop too, although to be fair some areas are preserved.

treadon says "same model, same weights, same knowledge." Its KL divergence of 3.971 is 4.1x that of the next-highest variant.

Three creators got it right. coder3101 reports divergence of 0.1651 and I measured 0.1673, within 1.3%. pew reports 0.152, I got 0.153. trevorjs reports 0.346, I got 0.365. These match. The others, not so much.
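The reported-versus-measured gaps above can be reproduced with a few lines of arithmetic. The figures are the ones quoted in this post, and the helper name is just for illustration.

```python
def relative_diff_pct(reported, measured):
    """Absolute difference as a percentage of the reported value."""
    return 100.0 * abs(measured - reported) / reported


# (reported, measured) KL divergence pairs quoted in the post.
pairs = {
    "coder3101": (0.1651, 0.1673),
    "pew": (0.152, 0.153),
    "trevorjs": (0.346, 0.365),
    "duoneural": (0.001, 0.187),
}

for name, (reported, measured) in pairs.items():
    pct = relative_diff_pct(reported, measured)
    factor = measured / reported
    print(f"{name}: {pct:.1f}% off, measured is {factor:.2f}x reported")
```

This puts coder3101 about 1.3% off and pew under 1%, with trevorjs the loosest of the three matches at roughly 5.5%. duoneural's original figure is off by a factor of 187.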

My pick

coder3101 if you want one model and don't want to think about it. 96% ASR, beats base on math, benchmark scores within rounding error. trevorjs if you want near-maximal safety removal at 99.5% ASR with only minor math impact. llmfan46 if you want the most conservative approach with zero capability loss.

What broke along the way

5 of 13 models were missing 60 safetensors keys. Gemma 4 uses shared KV projections for layers 15 to 34, and the export tools silently dropped them. I had to patch them in from base.
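The patch itself is conceptually simple: find every key present in the base checkpoint but absent from the variant, and copy it over. Below is a minimal sketch using plain dicts as stand-ins for state dicts. The key names are hypothetical. In practice the dicts could be loaded and saved with `load_file` and `save_file` from `safetensors.torch`, but that wiring is an assumption and not necessarily how the abliterlitics code does it.

```python
def patch_missing_keys(variant_state, base_state):
    """Return a patched copy of variant_state plus the list of restored keys.

    Keys that exist in base_state but not in variant_state are copied from
    the base; keys already present in the variant are left untouched.
    """
    missing = sorted(k for k in base_state if k not in variant_state)
    patched = dict(variant_state)
    for key in missing:
        patched[key] = base_state[key]
    return patched, missing


# Hypothetical key names; real Gemma 4 checkpoints will differ.
base_state = {
    "layers.14.k_proj.weight": "base-k14",
    "layers.15.k_proj.weight": "base-k15",
    "layers.15.v_proj.weight": "base-v15",
}
variant_state = {
    "layers.14.k_proj.weight": "abliterated-k14",
}

patched, missing = patch_missing_keys(variant_state, base_state)
print(f"restored {len(missing)} keys: {missing}")
```

Logging the restored keys matters here because the original failure was silent. A count that differs from the expected 60 would be a sign that something else went wrong in the export.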

About 8 of the 44 GPU hours produced nothing usable. Crashes, wrong configs, silent failures. The data took roughly 36 hours to produce.

Links

Hugging Face: https://huggingface.co/DreamFast/Gemma4-e2b-abliterlitics - Note that all the JSON and log file artifacts go onto Hugging Face going forward.

New abliterlitics website: https://abliterlitics.dev/models/gemma4-e2b/

Code: https://github.com/dreamfast/abliterlitics/tree/feat/gemma4-e2b-comparison - Snapshot of how the abliterlitics code looked after the results were completed.

What variants or models should I test next? Happy to answer methodology questions in the comments. Will move on to the Gemma 4 E4B next. :)

submitted by /u/nathandreamfast