Hugging Face Daily Papers · 11 min read

Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Papers
arxiv:2606.21906


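Published on Jun 20 · Submitted by XUANMING ZHANG on Jun 23
Authors: Xuanming Zhang, Sining Zhoubian, Yuxuan Chen, Tianyi Tang, An Yang, Sean Du, Chujie Zheng, Fei Huang, Dayiheng Liu, Gao Huang, Jingren Zhou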

Abstract

Autoregressive generation in large language models traditionally uses the final layer for token prediction, but a new decoding strategy dynamically selects more reliable intermediate layers based on entropy-guided search, improving reasoning performance with minimal computational overhead.

Autoregressive generation in large language models (LLMs) conventionally decodes from the final layer, assuming that deeper representations yield more reliable next-token predictions. We revisit this assumption by revealing a recurring Guess-Refine-Perturb dynamic: early layers form coarse guesses, intermediate layers refine reasoning-relevant semantics, and final layers can perturb these refined predictions toward generic or alignment-preferred tokens. We introduce Confident Decoding, a training-free decoding strategy that dynamically selects the most reliable near-final layer through entropy-guided conservative backward search. We further provide a theoretical formulation of layer selection as an optimal stopping problem, showing that under bounded projection noise and dominant late-stage alignment perturbation, our search rule filters perturbation while bounding the loss relative to the oracle refinement layer. Experiments across dense and Mixture-of-Experts LLMs demonstrate consistent gains on challenging reasoning benchmarks, including GPQA-Diamond, Omni-MATH, and HLE, with zero memory overhead and less than 2% latency increase. These results suggest dynamically bypassing final-layer perturbations can unlock stronger reasoning behavior from aligned LLMs.
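
The abstract describes the search only at a high level, so the following is a minimal sketch of what an entropy-guided conservative backward search could look like, not the paper's actual rule. It assumes each near-final layer already has a next-token distribution whose entropy is known. The function names and the `window` and `margin` parameters are hypothetical and chosen for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_layer(layer_entropies, window=4, margin=0.05):
    """Pick a decoding layer by walking backward from the final layer.

    layer_entropies[i] is the entropy of the next-token distribution read
    out at layer i. `window` (how far back to look) and `margin` (how much
    the entropy must drop) are hypothetical knobs, not values from the paper.
    """
    best = len(layer_entropies) - 1          # default: the final layer
    lowest = max(0, best - window)           # never search below this index
    for i in range(best - 1, lowest - 1, -1):
        # Conservative step: move earlier only on a clear entropy drop,
        # and stop at the first layer that fails to improve.
        if layer_entropies[i] < layer_entropies[best] - margin:
            best = i
        else:
            break
    return best


# The final layer (index 4) is less confident than the layer just before it.
print(select_layer([2.0, 1.5, 0.4, 0.3, 0.9]))    # 3
# A negligible difference keeps the default final layer.
print(select_layer([2.0, 1.0, 0.5, 0.2, 0.21]))   # 4
```

Falling back to the final layer unless an earlier layer is clearly more confident is one way to read "conservative" here: ordinary final-layer decoding stays the default, and the search deviates from it only on a clear signal.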

Community

💡 Deeper is Not Always Better: Bypassing the "Alignment Tax" in LLMs
Standard practice assumes that the deeper a layer sits in an autoregressive LLM, the more accurate its token representation becomes. In our latest collaborative research with the Qwen Team, we show this isn't always true.
Through an information-theoretic analysis of residual streams, we identified a recurring Guess-Refine-Perturb phase structure in aligned models. Intermediate layers crystallize accurate logical and semantic reasoning, but dense post-training alignment (e.g., RLHF or DPO) introduces low-rank steering perturbations in the final layers. On complex scientific or mathematical problems, this produces an "Alignment Tax": refined reasoning is dragged back toward generic, high-frequency filler words.
To solve this without retraining, we present Confident Decoding:

  • Entropy Valley Tracking: Uses an entropy-guided, conservative backward search to decode each token at the layer where model confidence peaks, before late-stage steering conflicts arise.
  • Universal Efficacy: Tested across dense and MoE families (Qwen3.5, Gemma-4, gpt-oss), with gains on frontier benchmarks of up to +22.4% on the Omni-MATH Level 4 category, and absolute improvements of +9.4% and +6.5% on LiveCodeBench and GPQA-Diamond, respectively.
  • Production Viability: Requires no modification to the core forward pass or KV cache, and runs natively inside high-throughput engines like vLLM with less than 2% wall-clock latency overhead.

Optimizing where to stop inside the network opens up a new vertical paradigm for test-time compute (TTC).

Paper: https://arxiv.org/pdf/2606.21906
Project: https://github.com/QwenLM/Confident-Decoding
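
To make the "entropy valley" idea in the post above concrete, here is a toy sketch of how a per-layer entropy profile might be computed. It assumes that intermediate hidden states are read out through the same unembedding matrix as the final layer (a logit-lens style readout); the post does not say how the method reads out intermediate layers, so this is an assumption. The function names, the 3-token vocabulary, and the hidden states are all made up for illustration, and the global minimum taken at the end is a simplification, not the conservative backward search the post describes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def readout(hidden, unembed):
    """Project one hidden state through an unembedding matrix (vocab x dim)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]


def entropy_profile(hidden_states, unembed):
    """Entropy (in nats) of the next-token distribution at every layer."""
    profile = []
    for h in hidden_states:
        probs = softmax(readout(h, unembed))
        profile.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return profile


# Toy setup: identity unembedding, one hidden state per layer.
unembed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hidden_states = [
    [0.0, 0.0, 0.0],  # early layer: uniform guess
    [2.0, 0.0, 0.0],  # refinement begins
    [5.0, 0.0, 0.0],  # confident on token 0
    [1.0, 1.0, 0.0],  # final layer: pulled toward a second token
]

profile = entropy_profile(hidden_states, unembed)
valley = profile.index(min(profile))        # layer with the lowest entropy
probs = softmax(readout(hidden_states[valley], unembed))
token = probs.index(max(probs))             # token decoded at the valley
print(valley, token)                        # 2 0
```

In this toy profile the entropy falls through the middle layers and rises again at the final layer, which is the shape the Guess-Refine-Perturb description would predict.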

Qwen3.7-Max/Plus is already live as a closed API — any plans for open-weight releases of the 3.7 family? (like 3.6-35B-A3B / 3.6-27B alongside 3.6-Max)

Would love to run it locally via llama.cpp / GGUF.


Absolutely will do.

The 'Guess-Refine-Perturb' dynamic in Confident Decoding is a refreshing take on the alignment tax. Most of us just accept that the final layer is the 'truth', but the idea that alignment often manifests as a late-stage perturbation toward generic tokens is a critical insight for anyone trying to squeeze more raw reasoning out of a model.

From an engineering perspective, a training-free decoding strategy that uses entropy to pick the layer is a huge win—it's the kind of low-overhead tweak that actually moves the needle on deployability without needing a full retraining cycle. I'm curious to see how this holds up across different model architectures (e.g., MoE vs Dense) where the layer dynamics might differ. Definitely worth testing on local weights to see if we can recover 'lost' capabilities without breaking the safety guardrails.

Great point!

@XUANMINGZHANG, here is how I see it; I hope you can convince the management at Alibaba.

Right now Qwen models have the best capacity-to-parameter ratio, but if you lock your models down to hosted-only access, almost 99% of Western companies will not use them, both for political reasons and because traffic would be routed through Chinese servers.

Why not release extremely capable models at 70B dense or 122B MoE to demonstrate your technical superiority, then keep the best Opus-class model in your cloud? That way, employees at Western companies will become your inside advocates, which also boosts your company's technical profile (and hence its valuation).

Don't keep the people who support your team and Alibaba in the dark. I think you have a real shot at winning this AI war. Western companies like Anthropic, OpenAI, or Google are just smoke and mirrors; their models need to be 2x to 3x the size, and are highly inefficient, to achieve the same capabilities.


Get this paper in your agent:

hf papers read 2606.21906
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
