r/LocalLLaMA · 6 min read

85 GPU-hours comparing 5 abliteration methods on Qwen3.6-27B: benchmarks, safety, weight forensics - Abliterlitics

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I've been building Abliterlitics, an open-source abliteration forensics toolkit. The idea is straightforward: take the same base model, gather the different abliteration techniques others have applied to it, and measure what actually changed using benchmarks, safety evaluation, distribution-shift analysis, and weight-level forensics.

This post covers Qwen3.6-27B, comparing five abliteration variants against the base model. I recovered safetensors from HauhauCS's Q8_K_P GGUF, then ran 85 GPU-hours of benchmarks, HarmBench, KL divergence, and weight forensics across all six models.

The headline results: Heretic and Huihui are the top two for capability preservation; Huihui has the smallest benchmark deltas, Heretic the lowest KL divergence. All five abliterated models reach near-complete safety removal. AEON's "enhanced capabilities" claim is contradicted by the data, and Abliterix has the worst capability preservation by far. Full report with all tables and charts: HuggingFace model card.

The six models

| Name | Repository |
|---|---|
| Base | Qwen/Qwen3.6-27B |
| Heretic | llmfan46/Qwen3.6-27B-uncensored-heretic-v2 |
| HauhauCS | HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive |
| Huihui | huihui-ai/Huihui-Qwen3.6-27B-abliterated |
| AEON | AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 |
| Abliterix | wangzhang/Qwen3.6-27B-abliterated-v2 |

HauhauCS used a tool called "Reaper Abliteration", which was shown to be plagiarised from Heretic (AGPL-3.0): all attribution was stripped and the code was relicensed under PolyForm Noncommercial. Based on our analysis of the recovered source code, Reaper adds subspace rank-k ablation, per-component continuous curves, and SOM clustering on top of the Heretic-derived core. The model was exported as a Q8_K_P GGUF, which I converted back to safetensors with ungguf, our GGUF-to-safetensors tool. The weights therefore carry two superimposed layers of modification: Reaper's abliteration edits and GGUF quantisation round-trip noise.
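For the curious, the round trip looks roughly like the sketch below, using the gguf-py and safetensors packages. This is my illustration of the idea, not ungguf's actual implementation; in particular, a real converter must also remap GGUF tensor names back to Hugging Face names.

```python
# Minimal sketch of GGUF -> safetensors dequantisation (not ungguf's code).
import numpy as np
import torch
from gguf import GGUFReader
from gguf.quants import dequantize
from safetensors.torch import save_file

def gguf_to_safetensors(gguf_path: str, out_path: str) -> None:
    reader = GGUFReader(gguf_path)
    tensors = {}
    for t in reader.tensors:
        # Dequantise each block-quantised tensor to float32. For Q8-class
        # quants this round trip is lossy -- that loss is the quantisation
        # noise discussed above.
        data = dequantize(t.data, t.tensor_type)
        # GGUF stores dimensions in reverse order relative to torch.
        array = data.reshape(tuple(int(d) for d in reversed(t.shape)))
        # NOTE: t.name is a GGUF name like "blk.0.attn_q.weight"; a real
        # converter must remap it to the HF name before saving.
        tensors[t.name] = torch.from_numpy(np.ascontiguousarray(array))
    save_file(tensors, out_path)
```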

I will drop HauhauCS from all future comparisons. Without proper safetensors, and with the tool plagiarised, there's no point. The "lossless" claims are debunked for every model, and the Reaper Abliteration source is open for anyone to see how the models are created.

Benchmarks

Evaluated with lm-evaluation-harness on a vLLM 0.19.0 backend, using BitsAndBytes 4-bit quantisation on a single RTX 5090. All six models were tested with identical settings. BNB4 quantisation drops absolute scores but preserves relative deltas between variants.
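For reference, a run looks roughly like the sketch below via lm-eval's Python API. The exact arguments are my reconstruction for illustration, not the verbatim commands from these runs.

```python
# Minimal sketch: evaluate one variant with lm-evaluation-harness on a
# vLLM backend using BitsAndBytes 4-bit quantisation (lm-eval 0.4.x API;
# the exact flags for the runs above may differ).
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args={
        "pretrained": "Qwen/Qwen3.6-27B",  # swap in each variant's repo
        "quantization": "bitsandbytes",    # BNB4 via vLLM
        "load_format": "bitsandbytes",
        "gpu_memory_utilization": 0.90,
        "max_model_len": 8192,
    },
    tasks=["mmlu", "hellaswag", "arc_challenge", "winogrande",
           "truthfulqa_mc2", "piqa", "gsm8k", "lambada_openai"],
    batch_size="auto",
)
print(results["results"])
```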

| Task | Base | Heretic | HauhauCS | Huihui | AEON | Abliterix |
|---|---|---|---|---|---|---|
| MMLU | 83.3% | 82.8% | 83.9% | 83.4% | 82.9% | 81.3% |
| HellaSwag | 83.5% | 83.2% | 83.1% | 83.5% | 82.7% | 77.3% |
| ARC Challenge | 59.1% | 58.0% | 57.9% | 59.5% | 56.1% | 53.2% |
| WinoGrande | 77.7% | 77.7% | 77.7% | 77.4% | 75.3% | 74.9% |
| TruthfulQA MC2 | 56.7% | 51.1% | 47.2% | 54.8% | 46.1% | 48.7% |
| PiQA | 81.0% | 81.0% | 81.0% | 81.2% | 80.4% | 75.7% |
| GSM8K (7168 tok) | 34.4% | 27.5% | 51.0% | 75.1% | 51.2% | 37.6% |
| Lambada (ppl; lower is better) | 3.18 | 3.24 | 3.35 | 3.15 | 3.44 | 9.12 |

Delta vs base (pp)

| Task | Heretic | HauhauCS | Huihui | AEON | Abliterix |
|---|---|---|---|---|---|
| MMLU | -0.5 | +0.6 | +0.1 | -0.4 | -2.0 |
| HellaSwag | -0.3 | -0.4 | +0.0 | -0.8 | -6.2 |
| ARC Challenge | -1.1 | -1.2 | +0.4 | -3.0 | -5.9 |
| WinoGrande | +0.0 | +0.0 | -0.3 | -2.4 | -2.8 |
| TruthfulQA MC2 | -5.6 | -9.5 | -1.9 | -10.6 | -8.0 |
| PiQA | +0.0 | +0.0 | +0.2 | -0.6 | -5.3 |
| GSM8K | -6.9 | +16.6 | +40.7 | +16.8 | +3.2 |

Charts: Benchmark Comparison | Delta Chart

HarmBench

HarmBench with 400 textual behaviours, max_tokens=6144, classified with chain-of-thought direction analysis and verified by three independent LLM reviewers.

| Variant | ASR | Empty responses | Full CoT ASR |
|---|---|---|---|
| Base | 25.8% | 1 | 26.0% |
| Huihui | 98.5% | 5 | 99.8% |
| HauhauCS | 94.5% | 22 | 100.0% |
| Abliterix | 94.5% | 22 | 100.0% |
| Heretic | 92.5% | 30 | 100.0% |
| AEON | 88.8% | 45 | 100.0% |

Four of five reach 100% Full CoT ASR. The reported ASR differences come from how much the 6144-token generation budget is consumed by chain-of-thought reasoning before the visible response. When the budget is exhausted, the response is empty and the classifier marks it as a refusal. This understates the true ASR.
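To make the failure mode concrete, here is a minimal sketch of the check involved. The <think> markup convention and helper names are assumptions for illustration, not the actual classifier.

```python
# Sketch: flag generations whose visible answer is empty because the
# chain-of-thought consumed the entire token budget.
# Assumes Qwen-style <think>...</think> reasoning markup.

def visible_response(generation: str) -> str:
    """Strip the thinking block and return only the user-visible answer."""
    # If </think> never appears, generation was cut off mid-reasoning:
    # everything produced so far is thinking tokens.
    if "</think>" not in generation:
        return ""
    return generation.split("</think>", 1)[1].strip()

def is_empty_response(generation: str) -> bool:
    return visible_response(generation) == ""

# Truncated: the 6144-token budget ran out inside the thinking block.
assert is_empty_response("<think>Step 1: consider the request. Step 2:")
# Complete: reasoning closed, visible answer present.
assert not is_empty_response("<think>brief reasoning</think>Final answer.")
```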

Charts: HarmBench Summary | By Category

KL Divergence

Lower is better; this measures output-distribution shift from the base model on benign prompts.

| Variant | KL (batchmean) | Rating |
|---|---|---|
| Heretic | 0.0037 | excellent |
| Huihui | 0.0074 | excellent |
| Abliterix | 0.0222 | very good |
| AEON | 0.0238 | very good |
| HauhauCS | 0.0242 | very good |

All five sit well below the ~0.1 KL level where capability damage typically becomes visible.
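For reference, a batchmean KL of this kind can be computed with PyTorch along these lines (my sketch, not necessarily Abliterlitics' exact pipeline):

```python
# Sketch: KL(base || variant) over next-token distributions on benign
# prompts, with reduction="batchmean" as reported in the table above.
import torch
import torch.nn.functional as F

def kl_from_base(base_logits: torch.Tensor,
                 variant_logits: torch.Tensor) -> float:
    """Both tensors: [num_tokens, vocab_size], same prompts and positions."""
    # F.kl_div expects log-probs as input and probs as target, and
    # computes KL(target || input) under these conventions.
    variant_logprobs = F.log_softmax(variant_logits, dim=-1)
    base_probs = F.softmax(base_logits, dim=-1)
    return F.kl_div(variant_logprobs, base_probs,
                    reduction="batchmean").item()

# Toy usage with random logits; a real run collects logits from both
# models over the same benign prompt set.
base = torch.randn(128, 32000)
variant = base + 0.01 * torch.randn_like(base)
print(kl_from_base(base, variant))
```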

Weight Analysis

This is where things get interesting.

| Metric | AEON | Abliterix | Heretic | Huihui | HauhauCS |
|---|---|---|---|---|---|
| Tensors changed | 88 (10.4%) | 101 (11.9%) | 120 (14.1%) | 128 (15.1%) | 564 (66.4%) |
| Relative edit | 6.0% | 5.2% | 2.1% | 1.5% | 0.7% |

HauhauCS is an extreme outlier with 4.4-6.4x more changed tensors than any other variant. This is the combination of Reaper's abliteration targeting multiple component types plus GGUF Q8_K_P round-trip noise. A uniform ~0.57% relative edit is visible across all tensor types, including types the other methods don't touch, such as embed_tokens and q_proj. The abliteration signal sits on top of this noise floor.
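For reference, the two metrics in the table can be computed from a pair of checkpoints roughly as below. This is my sketch, with an assumed changed-tensor threshold; Abliterlitics' exact definitions may differ, and real 27B checkpoints are sharded across several safetensors files.

```python
# Sketch: count changed tensors and the mean relative edit magnitude
# between a base and an abliterated checkpoint (single-shard case).
import torch
from safetensors.torch import load_file

def weight_diff_stats(base_path: str, variant_path: str,
                      eps: float = 1e-6):
    base = load_file(base_path)
    variant = load_file(variant_path)
    changed, rel_edits = 0, []
    for name, w_base in base.items():
        w_base = w_base.to(torch.float32)
        w_var = variant[name].to(torch.float32)
        # Relative Frobenius-norm change of this tensor.
        rel = (torch.linalg.norm(w_var - w_base)
               / torch.linalg.norm(w_base)).item()
        if rel > eps:  # assumed threshold for "changed"
            changed += 1
            rel_edits.append(rel)
    mean_rel = sum(rel_edits) / max(len(rel_edits), 1)
    return changed, len(base), mean_rel
```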

Pairwise cosine similarities between the weight-edit directions of the four other techniques are mostly below 0.07. No two techniques discovered the same direction: the "refusal direction" in weight space is not a single vector but a manifold with many viable removal pathways.
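The comparison behind that number looks roughly like this (my sketch; for 27B weights you would do this per-tensor or in chunks rather than materialising full flat vectors):

```python
# Sketch: cosine similarity between the weight-space edit directions of
# two abliterated variants, measured relative to the shared base.
import torch
import torch.nn.functional as F

def edit_direction(base: dict, variant: dict) -> torch.Tensor:
    """Concatenate all per-tensor deltas into one flat direction vector."""
    deltas = [(variant[k].float() - base[k].float()).flatten()
              for k in sorted(base)]
    return torch.cat(deltas)

def edit_cosine(base: dict, v1: dict, v2: dict) -> float:
    d1 = edit_direction(base, v1)
    d2 = edit_direction(base, v2)
    # Near-zero cosine => the two methods removed different directions.
    return F.cosine_similarity(d1, d2, dim=0).item()
```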

What stands out

Heretic has the lowest KL divergence at 0.0037, rated "excellent", and the smallest weight footprint: a 2.1% relative edit across 120 tensors of 3 types. It achieves 100% Full CoT ASR. GSM8K drops 6.9pp, the only decline among the five variants.

Huihui has the smallest benchmark deltas. Average delta on non-GSM8K tasks is just 0.5pp, beating Heretic's 1.3pp. Wins 4 of 6 non-GSM8K tasks head to head. Highest reported ASR at 98.5% with the fewest empty responses at just 5. KL divergence is 0.0074, also rated "excellent." But GSM8K at 75.1% is a +40.7pp jump over base. No abliteration should improve reasoning that much. We have double-checked these results and would be interested to see independent benchmarks from others.

HauhauCS has solid behavioural results despite the complex weight fingerprint: MMLU is +0.6pp over base, and the 94.5% ASR rises to 100% Full CoT ASR. Reaper's abliteration plus GGUF noise doesn't meaningfully damage output distributions. Still, the "lossless" claim doesn't hold up when Heretic and Huihui both preserve capabilities better, and the weights themselves carry Reaper's abliteration edits plus quantisation artefacts.

AEON degrades on every non-GSM8K task: TruthfulQA drops 10.6pp and ARC drops 3.0pp. It also has the worst thinking loops, with 45 of 400 responses empty. The claims of "no looping, no philosophizing spirals" and "measurably enhanced capabilities" are contradicted by the data.

Abliterix has the worst capability preservation: Lambada perplexity increases 2.9x from 3.18 to 9.12, and HellaSwag drops 6.2pp. Its edits are concentrated, surgical strikes with extreme outlier magnitudes, yet they cause broad collateral damage.

What went wrong

85 hours of productive GPU time across 7 days, plus roughly 25 hours lost to 14 failed runs.

The bulk of the failures were GSM8K timeouts. The Qwen3.6 architecture turned out to be incompatible with BNB4 plus tensor parallelism. The default 120s request timeout was too short for extended reasoning, so I wrote a patched script with a 900s timeout (sketched below). And I accidentally re-ran AEON's HarmBench with max_tokens=4096 instead of 6144, wasting 6.7 hours.
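The timeout fix amounts to raising the client-side request timeout. A sketch, assuming an OpenAI-compatible vLLM server (the actual patched script may differ):

```python
# Sketch: raise the per-request timeout so long chain-of-thought
# generations aren't dropped at the default 120 s.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible server
    api_key="EMPTY",
    timeout=900.0,  # seconds; the 120 s default starves reasoning models
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3.6-27B",
    messages=[{"role": "user",
               "content": "A train travels 37 km/h for 41 minutes; how far?"}],
    max_tokens=7168,
)
print(resp.choices[0].message.content)
```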

GSM8K per-model times vary dramatically because abliterated models think harder on math problems. HauhauCS took 53 minutes. AEON took 11 hours.

Methodology notes

All models were evaluated with BitsAndBytes 4-bit quantisation on a single RTX 5090, so absolute scores are not directly comparable to bf16 results; relative deltas between variants are preserved. GSM8K scores use flexible-extract matching, so treat the GSM8K numbers as relative comparisons only.
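For context, flexible-extract matching pulls the last number out of the generation rather than requiring a fixed answer format, roughly in this spirit (an approximation, not the harness's exact filter):

```python
# Sketch: flexible-extract style scoring for GSM8K -- take the last
# number in the generation as the model's answer. Approximates lm-eval's
# filter; the real regex differs in detail.
import re

def extract_answer(generation: str) -> str | None:
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", generation)
    if not numbers:
        return None
    return numbers[-1].replace(",", "")

assert extract_answer("... so 12 + 30 gives us #### 42") == "42"
assert extract_answer("The total is 1,234 dollars.") == "1234"
```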

The thinking budget matters. Initial runs with max_gen_toks=2048 gave terrible GSM8K scores because for reasoning models, max_gen_toks includes thinking tokens. The model would think for 1900 tokens, get cut off, and never produce an answer. Re-running with max_gen_toks=7168 gave the results above.
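Concretely, the rerun just passes the larger budget through to the harness, along these lines (lm-eval 0.4.x argument names; exact invocation assumed):

```python
# Sketch: re-running GSM8K with a larger generation budget. max_gen_toks
# counts thinking tokens too, so a 2048 budget can be consumed entirely
# by reasoning, leaving no visible answer to score.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args={"pretrained": "Qwen/Qwen3.6-27B"},
    tasks=["gsm8k"],
    gen_kwargs="max_gen_toks=7168",  # was 2048 in the bad initial runs
)
```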

Summary table

| Metric | Heretic | HauhauCS | Huihui | AEON | Abliterix |
|---|---|---|---|---|---|
| HarmBench ASR (reported → Full CoT) | 92.5% → 100% | 94.5% → 100% | 98.5% → 99.8% | 88.8% → 100% | 94.5% → 100% |
| MMLU | 82.8% | 83.9% | 83.4% | 82.9% | 81.3% |
| GSM8K | 27.5% | 51.0% | 75.1% | 51.2% | 37.6% |
| KL divergence | 0.0037 | 0.0242 | 0.0074 | 0.0238 | 0.0222 |
| Avg delta excl. GSM8K | 1.3pp | 2.0pp | 0.5pp | 3.0pp | 5.0pp |
| Tensors changed | 120 | 564 | 128 | 88 | 101 |

Links

Full report with provenance analysis, tensor breakdown, and all charts: HuggingFace model card

Forensics toolkit: Abliterlitics on GitHub

GGUF-to-safetensors converter: ungguf on GitHub

Other tensor comparisons: DreamFast HauhauCS collection

While I have taken the time to verify all results thoroughly, I am open to any corrections, additional benchmarks, or further analysis. If you spot something that looks wrong and can be confirmed, I am happy to fix it.

submitted by /u/nathandreamfast
