FYI llamacpp server can hot swap models now-a-days in under 30sec
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| See this question at least a handful of times when browsing new and in the comments, llamacpp has one of the cleaner model hotswap apis now that just works with openwebui and hermes. Bonus: the 2nd model gemma went derp as i was recording this, but the time spent swapping has gotten stupid fast... I remember starting a load and talking a walk while pytorch did its thing just a few months back [link] [comments] |
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.