Inference provider tiers by cache-hit rates, using OpenRouter data
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Submitted by /u/Comfortable-Rock-498