Is there any use case for large models with very slow token output for batch processing?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Maybe I'm influenced by the sci-fi story "The Last Question" by Issac Assimov but I've always got a tickle imagining a huge model like Kimi running on, say, disk. Even if it is 0.001 tok/sec to ask complex questions and get an answer in a week
Is there any use or community focused on this?
[link] [comments]
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.