Hugging Face Daily Papers · June 12, 2026 · 3 min read

VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.12243

Published on Jun 10 · Submitted by Yunqiu Xu on Jun 12

Authors: Yuchen Xian, Yang He, Yunqiu Xu, Yi Yang

Abstract

VIA-SD introduces a multi-tier speculative decoding framework that uses intra-model routing to reduce verification costs by employing slim submodels for medium-confidence token validation, achieving significant speedups over traditional approaches.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Speculative decoding (SD) addresses the high inference costs of LLMs by having lightweight drafters generate candidates for large verifiers to validate in parallel. Existing draft-verify methods use binary decisions: accept or fully recompute. Yet we find that many rejected tokens can be verified correctly by a slim submodel derived from the full verifier via intra-model routing, instead of the full verifier. This motivates our slim-verifier to handle tokens requiring moderate verification resources, reducing expensive large-model calls.

We propose Verification via Intra-Model Routing for Speculative Decoding (VIA-SD), a multi-tier framework using a routed slim-verifier. Draft tokens are processed hierarchically: direct acceptance for high-confidence cases, slim-verifier regeneration for medium-confidence cases, and full-model verification for uncertain cases. Across four representative tasks and multiple model families, VIA-SD reduces rejection rates by 0.10-0.22 and delivers 10-20% speedups over strong SD baselines, while achieving 2.5-3x acceleration over non-drafting decoding. Moreover, VIA-SD is compatible with existing SD frameworks without modifying their training procedures. Our results suggest multi-tier SD as a general paradigm for scalable and efficient LLM inference.

Project page: https://zju-xyc.github.io/VIA-SD-Project-Page/
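The three-tier handling of draft tokens described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how confidence is measured or where the tiers are split, so the scalar confidence score, the `RoutingThresholds` values (0.9 and 0.5), and all function names below are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tier(Enum):
    ACCEPT = "accept"        # high confidence: keep the draft token as-is
    SLIM_VERIFY = "slim"     # medium confidence: hand off to the slim-verifier
    FULL_VERIFY = "full"     # low confidence: fall back to the full verifier


@dataclass(frozen=True)
class RoutingThresholds:
    # Hypothetical cut-offs; the abstract does not state the actual values
    # or the confidence signal they would be applied to.
    high: float = 0.9
    low: float = 0.5


def route_token(confidence: float,
                thresholds: RoutingThresholds = RoutingThresholds()) -> Tier:
    """Map one draft token's confidence score to a verification tier."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence >= thresholds.high:
        return Tier.ACCEPT
    if confidence >= thresholds.low:
        return Tier.SLIM_VERIFY
    return Tier.FULL_VERIFY


def route_draft(confidences: List[float],
                thresholds: RoutingThresholds = RoutingThresholds()) -> List[Tier]:
    """Route every token of a drafted block independently."""
    return [route_token(c, thresholds) for c in confidences]


def full_verifier_calls(tiers: List[Tier]) -> int:
    """Count the tokens that still need the expensive full-model pass."""
    return sum(1 for tier in tiers if tier is Tier.FULL_VERIFY)


if __name__ == "__main__":
    tiers = route_draft([0.97, 0.72, 0.31, 0.90])
    print([t.value for t in tiers])
    print("full-verifier tokens:", full_verifier_calls(tiers))
```

Under these assumed thresholds, only the token scored 0.31 reaches the full verifier. A binary accept-or-recompute scheme would send every non-accepted token there, which is the cost the middle tier is meant to reduce.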

Community

Yunqiu

Paper submitter about 1 hour ago

Project page: https://zju-xyc.github.io/VIA-SD-Project-Page/
