Hugging Face Daily Papers · 3 min read

Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.09156

Published on May 26 · Submitted by Ahan Chatterjee on Jun 2
Authors: Ahan Chatterjee, Matthias Schöffel, Matthias Aßenmacher, Marinus Wiedner, Esteban Garces Arias

Abstract

The paper develops a deep learning framework to analyze how the grammatical gender system evolved from Latin to the Romance languages, examining both lexical and contextual factors in a low-resource historical setting.

The diachronic evolution from Latin to the Romance languages involved a restructuring of the grammatical gender system from a tripartite configuration (masculine, feminine, neuter) to a bipartite one (masculine, feminine) in most Romance languages. In this work, we introduce an interpretable deep learning framework to investigate this phenomenon at both lexical and contextual levels. First, we show that conventional tokenization strategies are insufficiently robust for this low-resource historical setting, and that our proposed tokenizer improves performance over these baselines. At the lexical level, we evaluate the contribution of morphological features to gender prediction. At the contextual level, we quantify the contributions of different part-of-speech categories to grammatical gender prediction. Together, these analyses characterize the distribution of gender information between the lemma and its sentential context. We make our codebase, datasets, and results publicly available at https://github.com/ahan-2000/Lost-in-Translation-.
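
To make the lexical-level question concrete, here is a minimal sketch of predicting grammatical gender from a lemma's word-final characters alone. It is not the paper's framework or tokenizer: the toy word list, the function names (suffix_features, train, predict), and the suffix back-off rule are all assumptions chosen for illustration, and a simple count-based lookup stands in for a deep learning model.

```python
from collections import Counter, defaultdict

# Toy Latin lemmas with their textbook genders.
# Illustrative only: this is NOT the paper's dataset.
TRAIN = [
    ("rosa", "f"), ("porta", "f"), ("terra", "f"), ("aqua", "f"),
    ("murus", "m"), ("amicus", "m"), ("dominus", "m"), ("annus", "m"),
    ("templum", "n"), ("bellum", "n"), ("donum", "n"), ("verbum", "n"),
]


def suffix_features(lemma, max_len=3):
    """Return the word-final character n-grams of length 1..max_len."""
    return [lemma[-n:] for n in range(1, max_len + 1) if len(lemma) >= n]


def train(pairs, max_len=3):
    """Count how often each suffix co-occurs with each gender."""
    counts = defaultdict(Counter)
    for lemma, gender in pairs:
        for suffix in suffix_features(lemma, max_len):
            counts[suffix][gender] += 1
    return counts


def predict(counts, lemma, max_len=3):
    """Back off from the longest known suffix to the shortest.

    Returns None when no suffix of the lemma was seen in training.
    """
    for suffix in reversed(suffix_features(lemma, max_len)):
        if suffix in counts:
            return counts[suffix].most_common(1)[0][0]
    return None


model = train(TRAIN)
for lemma in ["stella", "servus", "regnum"]:
    print(lemma, predict(model, lemma))
```

A lookup like this captures only what the lemma's ending reveals. It would label manus as masculine because of its -us ending, although the noun is feminine, which is the kind of case where information from the sentential context has to carry the gender signal.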

Get this paper in your agent:

hf papers read 2605.09156
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
