Thinking Before Constraining: A Unified Decoding Framework for Large Language Models
Authors: Ngoc Trinh Hung Nguyen, Alonso Silva, Laith Zumot, Liubov Tupikina, Armen Aghasaryan, Mehwish Alam
Organization: Nokia
Published: 2026-05-28
AI-generated summary
A hybrid approach called In-Writing is proposed that combines free-form reasoning with structured generation by delaying constraint application until after a trigger token is generated, improving accuracy in classification and reasoning tasks.
Abstract
Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing constraints too early in the generation process. We propose a hybrid approach, In-Writing, that combines free-form reasoning and structured generation in a single call. The model first reasons without constraints and applies structured decoding only after a trigger token is generated, explicitly decoupling reasoning from formatting. We show that our trigger-token strategies virtually eliminate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms the state of the art, achieving accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.
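The gating idea in the abstract (free-form generation until a trigger token, constrained decoding afterwards) can be illustrated with a toy greedy decoder. This is a minimal sketch under stated assumptions, not the authors' implementation: the trigger string `<answer>`, the function names `gated_decode` and `toy_scores`, the scripted stand-in for a language model, and the single-token answer format are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Set

TRIGGER = "<answer>"  # hypothetical trigger token, not taken from the paper


def gated_decode(
    score_fn: Callable[[List[str]], Dict[str, float]],
    allowed: Set[str],
    trigger: str = TRIGGER,
    max_steps: int = 32,
) -> List[str]:
    """Greedy decoding that is free-form until `trigger`, then constrained."""
    tokens: List[str] = []
    for _ in range(max_steps):
        scores = score_fn(tokens)
        if tokens and tokens[-1] == trigger:
            # Constrained phase: keep only tokens the output format permits.
            scores = {t: s for t, s in scores.items() if t in allowed}
            if not scores:
                raise ValueError("no allowed token has a score")
            tokens.append(max(scores, key=scores.get))
            break  # this toy format expects a single-token answer
        # Free-form phase: any token, including the trigger, may be chosen.
        tokens.append(max(scores, key=scores.get))
    return tokens


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a language model: scripted reasoning, then the trigger."""
    script = ["the", "review", "sounds", "happy", TRIGGER]
    if len(prefix) < len(script):
        return {script[len(prefix)]: 1.0, "positive": 0.1, "negative": 0.05}
    # After the trigger the toy model prefers an off-format token,
    # which the constraint mask removes.
    return {"probably": 0.9, "positive": 0.6, "negative": 0.2}


result = gated_decode(toy_scores, allowed={"positive", "negative"})
print(result)
```

In this sketch the reasoning tokens are never masked, so the constraint cannot cut the reasoning short; the mask applies only to the step that follows the trigger. A real system would replace `toy_scores` with model logits and the label set with a grammar or schema constraint.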
arXiv: arxiv.org/abs/2601.07525