r/LocalLLaMA

Is there any use case for large models with very slow token output for batch processing?

Maybe I'm influenced by the sci-fi story "The Last Question" by Isaac Asimov, but I've always gotten a tickle out of imagining a huge model like Kimi running from, say, disk. Even if it only manages 0.001 tok/sec, you could ask a complex question and get an answer in a week.
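For a rough sense of scale, here is a back-of-the-envelope sketch of what that rate buys. It assumes a constant generation rate and ignores prompt processing time; the function names are just made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def tokens_generated(tok_per_sec: float, days: float) -> float:
    """Tokens produced at a constant rate over the given number of days."""
    return tok_per_sec * days * SECONDS_PER_DAY


def days_to_generate(num_tokens: int, tok_per_sec: float) -> float:
    """Days needed to produce num_tokens at a constant rate."""
    return num_tokens / tok_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    # One week at 0.001 tok/sec
    print(f"{tokens_generated(0.001, 7):.1f} tokens in a week")
    # A 1,000-token answer at the same rate
    print(f"{days_to_generate(1000, 0.001):.1f} days for 1,000 tokens")
```

So a week at 0.001 tok/sec works out to roughly 600 tokens, which is a few paragraphs. Anything longer, or anything with a long reasoning trace, would need either more patience or a faster rate.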

Is there any practical use for this, or a community focused on it?

submitted by /u/Last_Bad_2687