Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan
Authors: Ahan Chatterjee, Matthias Schöffel, Matthias Aßenmacher, Marinus Wiedner, Esteban Garces Arias
Organization: Ludwig Maximilian University of Munich (LMU)
Published: 2026-05-26
arXiv: 2605.09156
Abstract
A deep learning framework is developed to analyze the evolution of the grammatical gender system from Latin to the Romance languages, examining both lexical and contextual factors in a low-resource historical setting.
The diachronic evolution from Latin to the Romance languages involved a restructuring of the grammatical gender system from a tripartite configuration (masculine, feminine, neuter) to a bipartite one (masculine, feminine) in most Romance languages. In this work, we introduce an interpretable deep learning framework to investigate this phenomenon at both lexical and contextual levels. First, we show that conventional tokenization strategies are insufficiently robust for this low-resource historical setting, and that our proposed tokenizer improves performance over these baselines. At the lexical level, we evaluate the contribution of morphological features to gender prediction. At the contextual level, we quantify the contributions of different part-of-speech categories to grammatical gender prediction. Together, these analyses characterize the distribution of gender information between the lemma and its sentential context. We make our codebase, datasets, and results publicly available at https://github.com/ahan-2000/Lost-in-Translation-.
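To make the idea of lexical-level gender prediction concrete, the following is a minimal sketch that guesses a Latin noun's gender from its final characters alone. It is not the authors' framework: the suffix-counting approach, the function names (`fit`, `predict`), and the tiny hand-picked word list are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Toy list of Latin nominative-singular nouns with their standard genders.
# Hand-picked for illustration; this is NOT the paper's dataset.
TRAIN = [
    ("rosa", "f"), ("puella", "f"), ("porta", "f"),
    ("dominus", "m"), ("servus", "m"), ("amicus", "m"),
    ("templum", "n"), ("bellum", "n"), ("donum", "n"),
]

SUFFIX_LENGTHS = (2, 1)  # try the longer suffix first, then back off


def fit(pairs):
    """Count how often each word-final suffix co-occurs with each gender."""
    counts = defaultdict(Counter)
    for lemma, gender in pairs:
        for k in SUFFIX_LENGTHS:
            counts[lemma[-k:]][gender] += 1
    return counts


def predict(counts, lemma):
    """Return the most frequent gender for the longest known suffix, else None."""
    for k in SUFFIX_LENGTHS:
        seen = counts.get(lemma[-k:])
        if seen:
            return seen.most_common(1)[0][0]
    return None


model = fit(TRAIN)
for word in ("vinum", "filius", "aqua", "manus", "rex"):
    print(word, predict(model, word))
```

In this sketch, `manus` is assigned masculine because of its `-us` ending, although the noun is feminine in Latin. Cases like this are one reason to look beyond the lemma to the sentential context, which is the distinction the abstract draws between the lexical and contextual levels.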