Our ICML paper on predictable hallucination (information-budget abstention gate), plus ntkMirror: a training-free open-weight implementation we're releasing today
Our paper, Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication, was accepted at ICML 2026. Paper: https://arxiv.org/abs/2509.11208
The idea: in evidence-grounded QA, the order in which you present exchangeable evidence changes the model's answer probability (permutation dispersion). We treat order as a nuisance variable, derive the Expectation-level Decompression Law (EDFL) relating expected information budget to achievable reliability, and turn it into a fixed ISR=1 answer/abstain gate whose boundary is set by theory, with no threshold tuning. When information is insufficient, the model abstains instead of guessing. In the paper's pre-specified held-out audit, the gate reaches 0.0–0.7% hallucination at ~24% abstention (80.5% accuracy on attempts).
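To make permutation dispersion concrete, here is a toy Python sketch. Everything in it is an assumption for illustration: `toy_score` is a hypothetical stand-in for a real model call (it simply weights earlier evidence more heavily), and max minus min is just one simple spread measure, which may not be how the paper defines dispersion.

```python
import itertools
import statistics


def toy_score(evidence_order):
    """Hypothetical stand-in for a model call: returns P(claim is true).

    A real scorer would run the LLM on a prompt built from this ordering.
    Here earlier positions get more weight, so the result depends on order.
    """
    weights = [0.5, 0.3, 0.2]  # position weights (assumed, sums to 1)
    return sum(w * item["support"] for w, item in zip(weights, evidence_order))


# Three exchangeable evidence items with made-up support values.
evidence = [
    {"id": "a", "support": 0.9},
    {"id": "b", "support": 0.6},
    {"id": "c", "support": 0.2},
]

# Score the same claim under every ordering of the same evidence set.
probs = [toy_score(order) for order in itertools.permutations(evidence)]

order_marginal = statistics.fmean(probs)  # average over orderings
dispersion = max(probs) - min(probs)      # simple spread across orderings

print(f"per-order probabilities: {[round(p, 2) for p in probs]}")
print(f"order-marginal: {order_marginal:.3f}, dispersion: {dispersion:.2f}")
```

The evidence set never changes, yet the score moves between 0.46 and 0.67 purely because of ordering. Averaging over orderings is what treating order as a nuisance variable amounts to in this toy setup.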
What we're releasing today (ntkMirror): a training-free implementation of that gate for local open-weight models. It scores each claim under multiple evidence orderings (order-marginal verifier, exact tied-branch scoring), computes ISR from the per-permutation probabilities, and gates answer/abstain. No fine-tuning, no second model, runs on your own weights offline. We also ship a fused kernel that batches the permutation forwards: bit-identical to the naive loop at fp32, 2.6–10× faster.
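The shape of such a gate can be sketched in a few lines of Python. This is not ntkMirror's API, and the post does not spell out the ISR formula: the ratio below (a clipped log-probability gain over a no-evidence prior, divided by a Bernoulli-KL requirement) is a placeholder assumption chosen for illustration. `prior_prob` and `target_error` are hypothetical parameters. See the paper and the repo for the real definitions.

```python
import math

EPS = 1e-12


def bernoulli_kl(p, q):
    """KL(Ber(p) || Ber(q)) in nats, clamped to avoid log(0)."""
    p = min(max(p, EPS), 1 - EPS)
    q = min(max(q, EPS), 1 - EPS)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def isr_gate(perm_probs, prior_prob, target_error=0.05):
    """Toy answer/abstain gate over per-permutation probabilities.

    ASSUMPTIONS (illustrative, not taken from the post):
      - prior_prob: P(claim) with the evidence withheld.
      - budget: mean clipped log-gain of each permutation over the prior.
      - required: Bernoulli KL from the target reliability to the prior.
    The only detail taken from the post is the fixed boundary at ISR = 1.
    """
    prior = min(max(prior_prob, EPS), 1 - EPS)
    gains = [max(0.0, math.log(max(p, EPS) / prior)) for p in perm_probs]
    budget = sum(gains) / len(gains)
    required = bernoulli_kl(1 - target_error, prior)
    isr = budget / required
    decision = "answer" if isr >= 1.0 else "abstain"
    return decision, isr


# Evidence that consistently lifts the claim well above the prior.
print(isr_gate([0.95, 0.93, 0.90], prior_prob=0.5))
# Evidence that barely moves the claim, and inconsistently across orderings.
print(isr_gate([0.70, 0.55, 0.60], prior_prob=0.5))
```

In this toy version the first call answers and the second abstains. The point is the structure: per-permutation scores go in, a single ratio comes out, and the decision boundary stays at 1 instead of being tuned per dataset.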
New results (not in the paper): AUROC of the gate run as a hallucination detector on small local models, on VitaminC / BoolQ / SciFact:
| Model | VitaminC | BoolQ | SciFact |
|---|---|---|---|
| Qwen2.5-0.5B | 0.78 | 0.69 | 0.80 |
| Qwen2.5-1.5B | 0.69 | 0.78 | 0.91 |
| Gemma E4B | 0.88 | 0.84 | 0.96 |
| Qwen2.5-7B | 0.90 | 0.87 | 0.94 |
Separation generally improves with model size and is strongest on SciFact and the larger models, though not monotonically: Qwen2.5-0.5B beats Qwen2.5-1.5B on VitaminC (0.78 vs 0.69), and Gemma E4B edges out Qwen2.5-7B on SciFact (0.96 vs 0.94). Used as a gate on balanced data, the grounded fraction of accepted claims rises from 50% to roughly 75–90% depending on model/dataset, at the cost of dropping ~10–20% of valid claims. The kernel doesn't affect accuracy (AUROC gap ≤0.008); it just makes the gate cheap.
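The balanced-data figures above can be sanity-checked with a little arithmetic. The sketch below uses hypothetical helper names and pairs the endpoints of the quoted ranges purely for illustration (the post does not say which drop rate goes with which grounded fraction). It shows how many ungrounded claims must still pass the gate for those numbers to hold.

```python
def accepted_precision(keep_valid, keep_invalid, base_rate=0.5):
    """Grounded fraction among accepted claims.

    keep_valid:   fraction of grounded claims the gate accepts.
    keep_invalid: fraction of ungrounded claims the gate accepts.
    base_rate:    fraction of grounded claims before gating (0.5 = balanced).
    """
    true_pos = base_rate * keep_valid
    false_pos = (1 - base_rate) * keep_invalid
    return true_pos / (true_pos + false_pos)


def implied_keep_invalid(precision, keep_valid, base_rate=0.5):
    """Invert the formula: ungrounded pass rate implied by a given precision."""
    true_pos = base_rate * keep_valid
    return true_pos * (1 - precision) / (precision * (1 - base_rate))


# Illustrative pairing: drop 10% of valid claims, reach 75% grounded.
print(round(implied_keep_invalid(0.75, keep_valid=0.90), 3))  # 0.3
# Illustrative pairing: drop 20% of valid claims, reach 90% grounded.
print(round(implied_keep_invalid(0.90, keep_valid=0.80), 3))  # 0.089
```

Under these pairings the gate rejects roughly 70% to 91% of ungrounded claims while keeping 80% to 90% of grounded ones. On unbalanced data the same pass rates would give a different grounded fraction, which is why `base_rate` is a parameter.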
Please let me know if you find it useful: https://github.com/leochlon/ntkmirror