r/LocalLLaMA · May 13, 2026 · 1 min read

Efficient pretraining with token superposition by Nous Research

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

submitted by /u/de4dee