r/LocalLLaMA · June 6, 2026 · 1 min read

Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

#model-release #paper #inference

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Up to 5.8x throughput speedup on Qwen3

Paper: https://arxiv.org/abs/2605.29707
Code: https://github.com/jianuo-huang/Domino
Models: https://huggingface.co/Huang2020

submitted by /u/pmttyji
