Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay
Authors: Joanito Agili Lopo, Yunita Sari, Guntur Budi Herwanto
Published: 2026-06-10 (arXiv: 2606.11786)
Abstract
Continual Instruction Tuning enables effective fine-tuning of large language models for low-resource language translation, achieving superior performance compared to standard instruction tuning and multilingual models.
Large Language Models (LLMs) offer new potential for translation tasks, but their performance often degrades on low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach designs a set of instructions that leverage explicit lexical and semantic features from a bilingual dictionary, and introduces Continual Instruction Tuning (CIT), a training paradigm that enables iterative instruction-based training. Experimental results show that our model, named Lius, outperforms standard instruction-tuned models by 4-6 points and surpasses both Neural Machine Translation (NMT) and multilingual LLM baselines by 10-13 points on several evaluation metrics. These findings highlight the potential of our approach to reduce the reliance on large-scale parallel data in low-resource language translation.
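As a rough illustration of how lexical features from a bilingual dictionary could be folded into an instruction, here is a minimal Python sketch. It is not the authors' implementation: the dictionary entries, the prompt wording, and the helper names `lexical_hints` and `build_instruction` are all assumptions made for this example.

```python
# Toy Indonesian -> Kupang Malay entries, for illustration only.
# They are NOT taken from the dictionary used in the paper.
TOY_DICTIONARY = {
    "saya": "beta",
    "tidak": "sonde",
    "kamu": "lu",
}


def lexical_hints(sentence, dictionary):
    """Return (source_word, target_word) pairs for words found in the dictionary."""
    hints = []
    seen = set()
    for token in sentence.lower().split():
        word = token.strip(".,!?;:\"'")  # drop surrounding punctuation
        if word in dictionary and word not in seen:
            seen.add(word)
            hints.append((word, dictionary[word]))
    return hints


def build_instruction(sentence, dictionary):
    """Assemble a translation prompt, adding dictionary hints when any apply."""
    lines = ["Translate the following Indonesian sentence into Kupang Malay."]
    hints = lexical_hints(sentence, dictionary)
    if hints:
        mapping = ", ".join(f"{src} -> {tgt}" for src, tgt in hints)
        lines.append(f"Dictionary hints: {mapping}")
    lines.append(f"Indonesian: {sentence}")
    lines.append("Kupang Malay:")
    return "\n".join(lines)


prompt = build_instruction("Saya tidak tahu.", TOY_DICTIONARY)
print(prompt)
```

The hint line is omitted when no dictionary word matches, so the same function also produces a plain translation instruction.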
Community
We introduce Lius, an Indonesian → Kupang Malay translation model designed for low-resource machine translation.
Kupang Malay is a Malay-based creole spoken in East Nusa Tenggara, Indonesia, but it remains underrepresented in current NLP resources and commercial MT systems. In this work, we propose Instructional Linguistic, a linguistically informed instruction design strategy, and Continual Instruction Tuning (CIT), where the model is trained iteratively with multiple instruction types for the same translation target.
Our approach uses four instruction families: context-based, semantic mapping-based, phonetic-based, and list-group-label-based prompts. We train three Cendol-mT5 variants: small, base, and large. The best model, Lius-Large-MT, improves over standard instruction tuning and outperforms several multilingual LLM and NMT baselines on Indonesian → Kupang Malay translation.
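One possible reading of the description above is a staged loop: the same sentence pairs are re-wrapped in each instruction family in turn, and the same model keeps training from stage to stage. The Python sketch below shows only that scheduling idea. The family tags, the prompt wording, the placeholder target, and the `train_stage` callback are hypothetical; the actual prompts and training code are in the linked repository.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]      # (Indonesian source, Kupang Malay target)
Example = Dict[str, str]

# The four instruction families named in the text, used here only as tags.
FAMILIES = ["context", "semantic_mapping", "phonetic", "list_group_label"]


def build_stage(family: str, pairs: List[Pair]) -> List[Example]:
    """Wrap every pair in a (hypothetical) prompt for one instruction family."""
    prompt = "[{family}] Translate from Indonesian to Kupang Malay: {src}"
    return [
        {"family": family, "input": prompt.format(family=family, src=src), "target": tgt}
        for src, tgt in pairs
    ]


def continual_instruction_tuning(
    pairs: List[Pair],
    families: List[str],
    train_stage: Callable[[List[Example]], None],
) -> List[str]:
    """Run one training stage per family, in order, over the same pairs."""
    completed = []
    for family in families:
        # train_stage is expected to keep updating one and the same model.
        train_stage(build_stage(family, pairs))
        completed.append(family)
    return completed


# Stand-in for a real fine-tuning step: it only records the examples it receives.
seen: List[Example] = []
toy_pairs = [("Saya tidak tahu.", "<reference translation>")]
order = continual_instruction_tuning(toy_pairs, FAMILIES, seen.extend)
print(order)
```

In a real run, `train_stage` would be replaced by a fine-tuning step on a Cendol-mT5 checkpoint. The translation target stays fixed across stages and only the instruction wrapping changes.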
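Models are available on Hugging Face:
https://huggingface.co/joanitolopo/lius-cendol-large-inst-mt
https://huggingface.co/joanitolopo/lius-cendol-base-inst-mt
https://huggingface.co/joanitolopo/lius-cendol-small-inst-mt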
Code:
https://github.com/joanitolopo/instructional-linguistic-llm