r/LocalLLaMA · May 27, 2026 · 1 min read

Is there any use case for large models with very slow token output for batch processing?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Maybe I'm influenced by the sci-fi story "The Last Question" by Isaac Asimov, but I've always been tickled by the idea of a huge model like Kimi running off, say, disk. Even at 0.001 tok/sec, you could ask a complex question and get an answer in a week.

Is there any use or community focused on this?
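
For a rough sense of scale, here is a back-of-the-envelope sketch of what the 0.001 tok/sec figure above buys in a week. The 2000-token answer length is a made-up example value, not a measurement, and the function names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def tokens_generated(tok_per_sec: float, seconds: float) -> float:
    """Tokens produced at a constant decode rate over a time window."""
    return tok_per_sec * seconds


def seconds_needed(num_tokens: float, tok_per_sec: float) -> float:
    """Wall-clock seconds to produce num_tokens at a constant decode rate."""
    return num_tokens / tok_per_sec


# Rate taken from the post; everything below follows from plain arithmetic.
rate = 0.001  # tok/sec

week_budget = tokens_generated(rate, SECONDS_PER_WEEK)
print(f"Tokens in one week at {rate} tok/sec: {week_budget:.1f}")

# Hypothetical answer length, chosen only for illustration.
answer_tokens = 2000
days_for_answer = seconds_needed(answer_tokens, rate) / SECONDS_PER_DAY
print(f"Days for a {answer_tokens}-token answer: {days_for_answer:.1f}")
```

So a week at that rate is roughly 600 output tokens, and this ignores prompt processing time entirely, which would add to the wait.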

submitted by /u/Last_Bad_2687
