r/LocalLLaMA · May 23, 2026 · 1 min read

Inference provider tiers by Cache-hit rates, using openrouter data

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Like Read original ↗

Inference provider tiers by Cache-hit rates, using openrouter data

submitted by /u/Comfortable-Rock-498
[link] [comments]

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

No comments yet. Sign in and be the first to say something.

More from r/LocalLLaMA

Does GPU spacing matter if we’re undervolting anyways?

May 23
Run Chrome’s tiny Gemma4 (aka Gemini Nano) directly on PC without GPU

May 23
Did a 30 runs of llama-bench to find optimal settings for my use case (Frigate and HomeAssistant) on my MI60 32gb VRAM GPU - two models tested Gemma4 and Qwen3.6 - Figured I'd share in case it helps anyone else

May 23
Any reason to run dense over MOE for RAGs?

May 23