What is your current go-to stack for running a fully local AI agent?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Curious to know which quantization format (GGUF or EXL2) and level you find best balances speed and smarts for daily use.