Hugging Face Daily Papers · June 9, 2026 · 4 min read

Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Like Read original ↗

We propose a pipeline for automatically generating step-by-step linguistic reasoning traces from Universal Dependencies treebanks, dictionaries, and grammar-rule banks. We evaluate these traces in three settings: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT), on Xibe and Chintang as test cases.</p>\n","updatedAt":"2026-06-09T11:10:27.725Z","author":{"_id":"617a92e16f37340367d5d791","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/617a92e16f37340367d5d791/omgyzmaF90KBLa3YgFxhS.png","fullname":"Shaoxiong","name":"jisx","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":12,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8116663694381714},"editors":["jisx"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/617a92e16f37340367d5d791/omgyzmaF90KBLa3YgFxhS.png"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2606.03782","authors":[{"_id":"6a2172523490a593e87b0eca","name":"Renhao Pei","hidden":false},{"_id":"6a2172523490a593e87b0ecb","name":"Yihong Liu","hidden":false},{"_id":"6a2172523490a593e87b0ecc","name":"Sampo Pyysalo","hidden":false},{"_id":"6a2172523490a593e87b0ecd","name":"Hinrich Schütze","hidden":false},{"_id":"6a2172523490a593e87b0ece","name":"Shaoxiong Ji","hidden":false}],"publishedAt":"2026-06-02T00:00:00.000Z","submittedOnDailyAt":"2026-06-09T00:00:00.000Z","title":"Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?","submittedOnDailyBy":{"_id":"617a92e16f37340367d5d791","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/617a92e16f37340367d5d791/omgyzmaF90KBLa3YgFxhS.png","isPro":false,"fullname":"Shaoxiong","user":"jisx","type":"user","name":"jisx"},"summary":"Large language models (LLMs) offer a promising approach to machine translation (MT) for extremely low-resource languages by incorporating linguistic resources through in-context learning. However, LLMs often struggle to apply grammatical information effectively during translation. Inspired by recent progress in chain-of-thought reasoning, we investigate whether low-resource MT can benefit from structured intermediate steps of linguistic analysis and grammatical reasoning. We propose a pipeline for automatically generating step-by-step linguistic reasoning traces from Universal Dependencies treebanks, dictionaries, and grammar-rule banks. We evaluate these traces in three settings: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT), on Xibe and Chintang as test cases. Our results show that linguistic reasoning traces are most effective as inference-time guidance: in ICL, reliable sentence-specific traces substantially improve translation performance across most models, languages, and metrics. In contrast, using the linguistic reasoning traces as training data yields smaller and less consistent gains, as models learn the trace format but often generate erroneous content. These findings suggest that LLMs can leverage grammatical information for low-resource MT when given reliable linguistic analyses, while learning to generate such analyses remains a major bottleneck.","upvotes":0,"discussionId":"6a2172533490a593e87b0ecf","ai_summary":"Large language models can improve translation for low-resource languages through structured linguistic reasoning traces, with the most significant benefits occurring during inference rather than training.","ai_keywords":["large language models","machine translation","in-context learning","supervised fine-tuning","reinforcement fine-tuning","Universal Dependencies treebanks","linguistic reasoning traces","low-resource languages","chain-of-thought reasoning"],"ai_summary_model":"Qwen/Qwen2.5-Coder-32B-Instruct","organization":{"_id":"6a1ffda6310029dbc8ac9adc","name":"OLAResearchX","fullname":"Omni Language AI Research","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/617a92e16f37340367d5d791/dMU_WZq7clwnvUTMragYe.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[],"acceptLanguages":["en"],"organization":{"_id":"6a1ffda6310029dbc8ac9adc","name":"OLAResearchX","fullname":"Omni Language AI Research","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/617a92e16f37340367d5d791/dMU_WZq7clwnvUTMragYe.png"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2606/2606.03782.md"}">

Papers

arxiv:2606.03782

Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?

Published on Jun 2

· Submitted by

Shaoxiong on Jun 9

Omni Language AI Research

Upvote

Authors:

Abstract

Large language models can improve translation for low-resource languages through structured linguistic reasoning traces, with the most significant benefits occurring during inference rather than training.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Large language models (LLMs) offer a promising approach to machine translation (MT) for extremely low-resource languages by incorporating linguistic resources through in-context learning. However, LLMs often struggle to apply grammatical information effectively during translation. Inspired by recent progress in chain-of-thought reasoning, we investigate whether low-resource MT can benefit from structured intermediate steps of linguistic analysis and grammatical reasoning. We propose a pipeline for automatically generating step-by-step linguistic reasoning traces from Universal Dependencies treebanks, dictionaries, and grammar-rule banks. We evaluate these traces in three settings: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT), on Xibe and Chintang as test cases. Our results show that linguistic reasoning traces are most effective as inference-time guidance: in ICL, reliable sentence-specific traces substantially improve translation performance across most models, languages, and metrics. In contrast, using the linguistic reasoning traces as training data yields smaller and less consistent gains, as models learn the trace format but often generate erroneous content. These findings suggest that LLMs can leverage grammatical information for low-resource MT when given reliable linguistic analyses, while learning to generate such analyses remains a major bottleneck.

View arXiv page View PDF Add to collection

Community

jisx

Paper submitter about 8 hours ago

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2606.03782

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2606.03782 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2606.03782 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

Discussion (0)

No comments yet. Sign in and be the first to say something.

Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?

Abstract

Community

Models citing this paper 0

Datasets citing this paper 1

Spaces citing this paper 0

Collections including this paper 0

Discussion (0)

More from Hugging Face Daily Papers