Hugging Face Daily Papers · 3 min read

KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.03458

Published on Jun 2 · Submitted by Philippe Bich on Jun 3
Authors: Lorenz K. Muller, Philippe Bich, Chiara Boretti, Hyun-Min Chang, Jiawei Zhuang, Lukas Cavigelli

Abstract

KVarN is a calibration-free KV-cache quantizer that uses Hadamard rotation and dual-scaling variance normalization to reduce error accumulation during autoregressive decoding in large language models.

Test-time scaling is a powerful approach to obtain better reasoning in large language models, but it becomes memory-bottlenecked during long-horizon decoding as the KV-cache grows. KV-cache quantization can alleviate this, but current methods are evaluated under prefill-like settings, and errors behave differently under autoregressive decoding. We show that in the latter regime, quantization errors accumulate across timesteps, driven primarily by incorrect token scales. We introduce KVarN, a calibration-free KV-cache quantizer that applies a Hadamard rotation followed by a dual-scaling variance normalization across both axes of the K and V matrices. We find that this combination fixes outlying token-scale errors and substantially reduces error accumulation compared with existing baselines. KVarN establishes a new state-of-the-art for KV-cache quantization on generative benchmarks, including MATH500, AIME24 and HumanEval, at 2-bit precision. A vLLM implementation of the KVarN method is available at https://github.com/huawei-csl/KVarN
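
The abstract describes the pipeline only at a high level, so the following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation (that lives in the linked vLLM repository). It rotates each token vector with an orthonormal Walsh-Hadamard transform, divides by a per-token RMS scale and then a per-channel RMS scale, and rounds to a uniform 4-level (2-bit) grid. The function names (`fwht`, `rms`, `quantize_block`, `dequantize_block`), the single row-then-column scaling pass, the step size of 1.0 and the uniform grid are all illustrative assumptions that do not come from the paper.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).

    With the 1/sqrt(n) scaling the transform is its own inverse.
    """
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def rms(values):
    """Root-mean-square of a sequence; falls back to 1.0 for an all-zero input."""
    return math.sqrt(sum(x * x for x in values) / len(values)) or 1.0


def quantize_block(block, step=1.0):
    """Quantize a tokens x channels block to 2-bit codes (assumed scheme)."""
    rotated = [fwht(row) for row in block]                    # Hadamard rotation
    row_scale = [rms(row) for row in rotated]                 # per-token scale
    normed = [[x / s for x in row] for row, s in zip(rotated, row_scale)]
    col_scale = [rms(col) for col in zip(*normed)]            # per-channel scale
    normed = [[x / c for x, c in zip(row, col_scale)] for row in normed]
    # Uniform 4-level grid: codes 0..3 stand for (-1.5, -0.5, 0.5, 1.5) * step.
    # A real kernel would pack four such codes into one byte.
    codes = [[min(3, max(0, math.floor(x / step) + 2)) for x in row]
             for row in normed]
    return codes, row_scale, col_scale


def dequantize_block(codes, row_scale, col_scale, step=1.0):
    """Invert quantize_block: undo the grid, both scalings, then the rotation."""
    restored = []
    for code_row, r in zip(codes, row_scale):
        rotated = [(c - 1.5) * step * cs * r for c, cs in zip(code_row, col_scale)]
        restored.append(fwht(rotated))
    return restored


def rel_error(original, approx):
    """Relative L2 error between two vectors."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(original, approx)))
    den = math.sqrt(sum(x * x for x in original))
    return num / den


# Synthetic data (an assumption, not real K/V activations): Gaussian entries
# with one token whose scale is 50x larger than the rest.
random.seed(0)
tokens, channels = 32, 64
block = [[random.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(tokens)]
block[5] = [50.0 * x for x in block[5]]

codes, row_scale, col_scale = quantize_block(block)
restored = dequantize_block(codes, row_scale, col_scale)
mean_rel_err = sum(rel_error(o, r) for o, r in zip(block, restored)) / tokens
print(f"mean per-token relative error: {mean_rel_err:.3f}")
```

In this toy setup the per-token scale absorbs the outlying token, so its relative reconstruction error stays in line with the other tokens. Whether the paper uses RMS statistics, a single scaling pass, or this particular grid is not stated in the abstract.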

Community

Philippe Bich (paper submitter), about 6 hours ago

Hi! KVarN is finally here!

Happy to chat about our paper :)

Urro

Very cool and useful.

Get this paper in your agent:

hf papers read 2606.03458
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
