r/LocalLLaMA · May 25, 2026 · 1 min read

OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

https://huggingface.co/Zhongzhu/OSCAR-RotationZoo

OSCAR RotationZoo

Precomputed K/V rotation matrices for OSCAR INT2 KV-cache quantization.

This repository contains the artifacts for the paper "OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization" by Zhongzhu Zhou, Donglin Zhuang, Jisen Li, Ziyan Chen, Shuaiwen Leon Song, Ben Athiwaratkun, and Xiaoxia Wu.

📄 Paper — arXiv:2605.17757
🌐 Website — https://oscar-quantize.github.io/
💻 Code — https://github.com/FutureMLS-Lab/OSCAR

OSCAR captures Q/K/V activations on a small calibration set, estimates attention-aware K/V covariance offline, and derives per-layer orthogonal rotations that align INT2 quantization with the directions attention actually consumes. The result is ~7× compression of the KV-cache memory footprint with a single-digit percentage-point (pp) accuracy drop on GPQA for dense reasoning models.
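To make the rotate-then-quantize idea concrete, here is a minimal NumPy sketch. It is not the OSCAR implementation: it uses a plain eigendecomposition of the key covariance on synthetic data instead of the paper's attention-aware covariance, and the helper names (`covariance_rotation`, `quantize_int2`, `dequantize_int2`) are hypothetical. It only illustrates the mechanics: an orthogonal rotation leaves attention scores unchanged before quantization, and 2-bit codes are then stored per group. The storage arithmetic at the end assumes one FP16 scale and one FP16 offset per group of 128 values, which is an assumption about the layout rather than something stated in the README.

```python
import numpy as np

def covariance_rotation(acts):
    """Orthogonal matrix built from eigenvectors of the activation covariance.

    acts has shape (num_tokens, head_dim). This is a plain PCA-style rotation,
    used here as a stand-in for OSCAR's attention-aware rotation.
    """
    cov = acts.T @ acts / acts.shape[0]   # uncentered second-moment matrix
    _, eigvecs = np.linalg.eigh(cov)      # symmetric input -> orthonormal eigenvectors
    return eigvecs                        # shape (head_dim, head_dim)

def quantize_int2(x, group=128):
    """Asymmetric 2-bit quantization with one scale/offset per group of values."""
    g = x.reshape(-1, group)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / 3.0   # 2 bits -> 4 levels (codes 0..3)
    codes = np.clip(np.round((g - lo) / scale), 0, 3).astype(np.uint8)
    return codes, scale, lo

def dequantize_int2(codes, scale, lo, shape):
    return (codes * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
head_dim = 128
channel_scale = np.linspace(0.1, 4.0, head_dim)   # synthetic per-channel spread

# "Offline" step: derive the rotation from synthetic calibration keys.
calib_k = rng.standard_normal((2048, head_dim)) * channel_scale
R = covariance_rotation(calib_k)

# "Online" step: rotate queries and keys, quantize only the cached keys.
q = rng.standard_normal((4, head_dim))
k = rng.standard_normal((16, head_dim)) * channel_scale

scores_ref = q @ k.T
scores_rot = (q @ R) @ (k @ R).T          # equal to scores_ref because R @ R.T = I

codes, scale, lo = quantize_int2(k @ R, group=head_dim)
k_hat = dequantize_int2(codes, scale, lo, k.shape)   # stays in the rotated basis
scores_q = (q @ R) @ k_hat.T              # approximate scores from the INT2 cache

# Storage estimate under the stated assumption (FP16 scale + FP16 offset per group).
bits_per_value = 2 + (16 + 16) / 128      # 2.25 bits per cached value
ratio = 16 / bits_per_value               # versus a 16-bit BF16 cache
print(f"compression vs BF16: {ratio:.2f}x")
```

Under that metadata assumption the ratio works out to roughly 7.1×, which is in line with the ~7× figure quoted above. The sketch makes no claim about quantization error; how much a rotation helps depends on how it is chosen, which is the part the paper addresses.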

This repo packages the rotations as drop-in .pt files so you don't need to re-run the Q/K/V dump and eigendecomposition yourself.

Available rotations

Model	Calibration	GPQA (BF16)	GPQA (OSCAR INT2)
`Qwen/Qwen3-4B-Thinking-2507`	`seq20000_prompt83_group128`	67.27	67.17
`Qwen/Qwen3-4B-Thinking-2507`	`seq20000_prompt85_group128` (fresh re-dump)	67.27	—
`Qwen/Qwen3-8B`	`seq20000_prompt83_group128`	56.67	55.56
`Qwen/Qwen3-32B`	`seq16000_prompt69_group128`	58.49	60.40
`zai-org/GLM-4.7-FP8`	`seq10000_prompt43_group128`	73.23	73.57
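
For a quick read of the table as accuracy deltas, the short sketch below subtracts the BF16 score from the OSCAR INT2 score for each row that reports both. The numbers are copied from the table above; the row without an INT2 score is skipped, and the variable names are illustrative only.

```python
# (model, GPQA BF16, GPQA OSCAR INT2), copied from the table above
rows = [
    ("Qwen/Qwen3-4B-Thinking-2507", 67.27, 67.17),
    ("Qwen/Qwen3-8B", 56.67, 55.56),
    ("Qwen/Qwen3-32B", 58.49, 60.40),
    ("zai-org/GLM-4.7-FP8", 73.23, 73.57),
]

# Positive delta means the INT2 run scored higher than the BF16 baseline.
deltas = {model: round(int2 - bf16, 2) for model, bf16, int2 in rows}

for model, delta in deltas.items():
    print(f"{model}: {delta:+.2f} pp")
# Qwen/Qwen3-4B-Thinking-2507: -0.10 pp
# Qwen/Qwen3-8B: -1.11 pp
# Qwen/Qwen3-32B: +1.91 pp
# zai-org/GLM-4.7-FP8: +0.34 pp
```

The largest drop among the listed rows is about 1.1 pp, and two rows score slightly higher with INT2 than with BF16.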

From time to time we get stuff like this, and I keep updating this thread with it. Hopefully I can run medium-size (30-40B) MoE models (and 10-20B dense models) better and faster with 8GB VRAM by the end of this year.

Would be awesome to have this on llama.cpp.

submitted by /u/pmttyji
