Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay
Abstract: Large Language Models (LLMs) offer new potential for translation tasks, but their performance often degrades on low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach has two parts: designing a set of instructions that leverage explicit lexical and semantic features from a bilingual dictionary, and introducing Continual Instruction Tuning (CIT), a training paradigm that enables iterative instruction-based training. Experimental results show that our model, named Lius, outperforms standard instruction-tuned models by 4-6 points and surpasses both Neural Machine Translation (NMT) models and multilingual LLMs by 10-13 points on several evaluation metrics. These findings highlight the potential of our approach to reduce the reliance on large-scale parallel data in low-resource language translation.
| Comments: | This paper is the result of the Master Thesis in Master of Artificial Intelligence at Universitas Gadjah Mada |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2606.11786 [cs.CL] (or arXiv:2606.11786v1 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2606.11786 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Joanito Agili Lopo
[v1] Wed, 10 Jun 2026 08:20:09 UTC (1,809 KB)