Hugging Face Daily Papers · · 3 min read

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Framework that compresses the complex reasoning capabilities of large, resource-heavy models into a structured, highly expressive System Prompt for smaller models.</p>\n","updatedAt":"2026-06-16T10:29:36.613Z","author":{"_id":"687b0a88a565d4670d30dd44","avatarUrl":"/avatars/fe181c08304363e98ff07f8fa459a46d.svg","fullname":"Sanket B","name":"sanketbadhe","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8876485824584961},"editors":["sanketbadhe"],"editorAvatarUrls":["/avatars/fe181c08304363e98ff07f8fa459a46d.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2602.21103","authors":[{"_id":"6a2c99b8a0d4daae4285f0b4","user":{"_id":"687b0a88a565d4670d30dd44","avatarUrl":"/avatars/fe181c08304363e98ff07f8fa459a46d.svg","isPro":false,"fullname":"Sanket B","user":"sanketbadhe","type":"user","name":"sanketbadhe"},"name":"Sanket Badhe","status":"claimed_verified","statusLastChangedAt":"2026-06-15T12:21:37.193Z","hidden":false},{"_id":"6a2c99b8a0d4daae4285f0b5","name":"Deep Shah","hidden":false}],"publishedAt":"2026-06-02T00:00:00.000Z","submittedOnDailyAt":"2026-06-16T00:00:00.000Z","title":"Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning","submittedOnDailyBy":{"_id":"687b0a88a565d4670d30dd44","avatarUrl":"/avatars/fe181c08304363e98ff07f8fa459a46d.svg","isPro":false,"fullname":"Sanket B","user":"sanketbadhe","type":"user","name":"sanketbadhe"},"summary":"Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model's System Prompt. Evaluated using Gemma-3 4B, PLD improved Macro F1 scores on StereoSet (57\\% to 90.0\\%) and Contract-NLI (67\\% to 83\\%), while increasing LogiQA accuracy to 70\\%. Similar results on Mistral Small 3.1 demonstrate cross-architecture generalizability, enabling these compact models to match frontier performance with negligible latency overhead. These expressive instructions render the decision-making process transparent, allowing for full human verification of logic, making this approach ideal for regulated industries such as law, finance, and content moderation, as well as high-volume use cases and edge devices.","upvotes":1,"discussionId":"6a2c99b8a0d4daae4285f0b6","ai_summary":"Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency.","ai_keywords":["Chain-of-Thought prompting","fine-tuning","Prompt-Level Distillation","teacher model","student model","System Prompt","Macro F1 scores","LogiQA","cross-architecture generalizability"],"ai_summary_model":"Qwen/Qwen2.5-Coder-32B-Instruct","organization":{"_id":"5e6aca39878b8b2bf9806447","name":"google","fullname":"Google","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/5dd96eb166059660ed1ee413/WtA3YYitedOr9n02eHfJe.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"62cfe10d6a61a88ea0cf88bf","avatarUrl":"/avatars/50a3a270f91a45d92e454d350e6f9af4.svg","isPro":false,"fullname":"Marco Togni","user":"KRLLRZz","type":"user"}],"acceptLanguages":["en"],"dailyPaperRank":0,"organization":{"_id":"5e6aca39878b8b2bf9806447","name":"google","fullname":"Google","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/5dd96eb166059660ed1ee413/WtA3YYitedOr9n02eHfJe.png"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2602/2602.21103.md","query":{}}">
Papers
arxiv:2602.21103

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Published on Jun 2
· Submitted by
Sanket B
on Jun 16
Authors:

Abstract

Prompt-Level Distillation extracts reasoning patterns from teacher models to enhance student model performance while maintaining interpretability and reducing latency.

Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model's System Prompt. Evaluated using Gemma-3 4B, PLD improved Macro F1 scores on StereoSet (57\% to 90.0\%) and Contract-NLI (67\% to 83\%), while increasing LogiQA accuracy to 70\%. Similar results on Mistral Small 3.1 demonstrate cross-architecture generalizability, enabling these compact models to match frontier performance with negligible latency overhead. These expressive instructions render the decision-making process transparent, allowing for full human verification of logic, making this approach ideal for regulated industries such as law, finance, and content moderation, as well as high-volume use cases and edge devices.

Community

Paper author Paper submitter about 3 hours ago

Framework that compresses the complex reasoning capabilities of large, resource-heavy models into a structured, highly expressive System Prompt for smaller models.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.
Tap or paste here to upload images

· Sign up or log in to comment

Get this paper in your agent:

hf papers read 2602.21103
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.21103 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.21103 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.21103 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from Hugging Face Daily Papers