Hugging Face Daily Papers · June 16, 2026 · 3 min read

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2602.21103

Published on Jun 2 · Submitted by Sanket B on Jun 16 · Google

Authors: Sanket Badhe, Deep Shah

Abstract

Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model's System Prompt. Evaluated using Gemma-3 4B, PLD improved Macro F1 scores on StereoSet (57% to 90.0%) and Contract-NLI (67% to 83%), while increasing LogiQA accuracy to 70%. Similar results on Mistral Small 3.1 demonstrate cross-architecture generalizability, enabling these compact models to match frontier performance with negligible latency overhead. These expressive instructions render the decision-making process transparent, allowing for full human verification of logic, making this approach ideal for regulated industries such as law, finance, and content moderation, as well as high-volume use cases and edge devices.
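To make the central idea concrete, here is a minimal sketch of the final step the abstract describes: organizing already-extracted reasoning patterns into a structured instruction list for a Student model's System Prompt. It is an illustration under stated assumptions, not the authors' implementation. The `Instruction` shape, the `dedupe` and `build_system_prompt` helpers, the prompt wording, and the example rules are all hypothetical, and the extraction of patterns from the Teacher model is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """One reasoning pattern taken from teacher output (hypothetical shape)."""
    condition: str  # when the rule applies
    action: str     # what the student should conclude


def dedupe(instructions):
    """Drop repeated rules (case- and whitespace-insensitive), keeping first-seen order."""
    seen = set()
    unique = []
    for ins in instructions:
        key = (ins.condition.strip().lower(), ins.action.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(ins)
    return unique


def build_system_prompt(task, instructions):
    """Render the rules as a numbered list that a human reviewer can audit line by line."""
    lines = [f"Task: {task}", "Follow these rules in order:"]
    for i, ins in enumerate(dedupe(instructions), start=1):
        lines.append(f"{i}. If {ins.condition}, then {ins.action}.")
    lines.append("Answer with the label only.")
    return "\n".join(lines)


# Illustrative rules only; these were written for this sketch and are not from the paper.
example_rules = [
    Instruction("the contract explicitly permits the stated action", "answer Entailment"),
    Instruction("the contract explicitly forbids the stated action", "answer Contradiction"),
    Instruction("The contract explicitly permits the stated action ", "answer Entailment"),
    Instruction("the contract is silent on the stated action", "answer NotMentioned"),
]

print(build_system_prompt("Classify the hypothesis against the contract.", example_rules))
```

Because the distilled knowledge lives in plain text rather than in model weights, each rule can be read, edited, or removed directly, which is the transparency property the abstract emphasizes.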

Community

sanketbadhe (paper author, paper submitter) · about 3 hours ago

Framework that compresses the complex reasoning capabilities of large, resource-heavy models into a structured, highly expressive System Prompt for smaller models.

Get this paper in your agent:

hf papers read 2602.21103

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
