How many of you tried BeeLlama.cpp? How's it? Agentic coding possible with 8GB VRAM?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
We'll be getting those features (check the link at the bottom) on mainline sooner or later anyway. But for now this fork could be useful to see the full potential of our poor GPUs (and also of big, large GPUs).
Any 8GB VRAM (and 32GB RAM) folks already doing agentic coding with models (at Q4 at least) like Qwen3.6-35B-A3B, Qwen3.6-27B, Gemma-4-31B, or Gemma-4-26B-A4B? I would love to see some t/s stats, full commands, and more details on that. I'm not expecting any miracle with 8GB VRAM, but I still want to do something decent within these constraints. Though I'm getting a new rig this month, I want to use my current laptop (8GB VRAM) for agentic coding too.
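For anyone sharing commands: on mainline llama.cpp, the usual way to run a ~30B MoE model on 8GB VRAM + 32GB RAM is to offload the expert tensors to system RAM while keeping attention and dense layers on the GPU. A rough sketch is below; the GGUF filename is hypothetical, and exact flag spellings can differ between llama.cpp versions and this fork.

```shell
# Sketch of a llama.cpp-style MoE launch for an 8GB GPU.
# Assumptions: the Q4_K_M GGUF file exists locally; flag names
# follow recent mainline llama-server and may vary by build.
llama-server \
  -m Qwen3.6-35B-A3B-Q4_K_M.gguf \  # hypothetical filename, Q4 quant
  --n-gpu-layers 99 \               # try to offload all layers to GPU...
  --n-cpu-moe 99 \                  # ...but keep MoE expert weights in RAM
  -c 32768 \                        # long context helps agentic coding
  --flash-attn                      # trims KV-cache VRAM use
```

The idea is that only the small active-parameter path (A3B means ~3B active) needs fast GPU access per token, so pushing the bulk of the expert weights to RAM keeps VRAM use within 8GB at the cost of some token-generation speed. Lower `--n-cpu-moe` gradually if you have VRAM to spare.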
Others (who have more than 8GB VRAM), please share your stats, full commands, and a comparison with mainline.
Below is a related thread by the creator. I hope the creator keeps adding more features.