Speculative Decoding Across Languages
Mirrored from arXiv — NLP / Computation & Language for archival readability. Support the source by reading on the original site.
Computer Science > Computation and Language
Title:Speculative Decoding Across Languages
Abstract:Speculative decoding has become a crucial component of large language model (LLM) inference, enabling faster generation by drafting multiple tokens and verifying them in parallel. However, small draft models tend to suffer from disproportionately poor multilingual capabilities. Thus, when generating text in a non-English language, speculative decoding is far less effective.
We compare three strategies to improve speculative decoding efficiency for eleven languages: finetuning the draft model on task-specific data (translation); finetuning the draft model on unlabeled monolingual corpora; and training simple n-gram draft models on the same monolingual corpora. We evaluate efficiency on translation (from English into the target language) and the held-out task of story generation. We find that while task-specific distillation can significantly improve efficiency, distilled models generalize poorly to a new task. Meanwhile, n-gram draft models, despite lower acceptance rates, consistently provide large speed-ups due to much faster draft generation.
| Comments: | 10 pages, 11 figures, submitted to ACL ARR May 2026 |
| Subjects: | Computation and Language (cs.CL); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.30580 [cs.CL] |
| (or arXiv:2605.30580v1 [cs.CL] for this version) | |
| https://doi.org/10.48550/arXiv.2605.30580
arXiv-issued DOI via DataCite (pending registration)
|
Access Paper:
- View PDF
- HTML (experimental)
- TeX Source
References & Citations
Bibliographic and Citation Tools
Code, Data and Media Associated with this Article
Demos
Recommenders and Search Tools
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
More from arXiv — NLP / Computation & Language
-
Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow
Jun 1
-
Exploring Autonomous Agentic Data Engineering for Model Specialization
Jun 1
-
Domain Adaptation and Reasoning Frameworks in Language Models: A Controlled Experiment with Historical Cosmology
Jun 1
-
Cross-Lingual Steering for Figurative Language Generation
Jun 1
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.