I compiled LLM inference pricing across 7 providers: the caching numbers are surprising (spreadsheet included) [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
I've been comparing GPU/LLM providers for a side project and ended up with way too many browser tabs and spreadsheets. So I decided to pull the public pricing data into one sheet and compare it side by side. A quick disclaimer: this is not benchmark data. I didn't run latency tests or throughput measurements. Everything comes from public pricing pages and APIs (OpenRouter, DeepSeek, Together AI, Fireworks, Groq, etc.). The spreadsheet currently tracks:
The thing that surprised me most was caching. For example, when looking at DeepSeek V4 Pro pricing across providers, cached input costs vary dramatically. In some cases a cache hit is tens of times cheaper than a cache miss. That made me realize that if you're running:
...the "headline" token price can be a lot less important than the caching policy. A few other interesting things I noticed:
A few things I haven't figured out how to compare yet:
I'm curious how others evaluate providers. When you're choosing between OpenRouter, Together, Fireworks, Groq, DeepSeek, etc., what metrics actually matter to you beyond token pricing? Am I missing any important data points that should be included in a v2?
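To make the caching point concrete, here is a rough sketch of the arithmetic I have in mind: the blended input price is just the hit rate times the cache-hit price plus the rest at the cache-miss price. The provider names and prices below are made up for illustration (they are NOT taken from the spreadsheet or from any real pricing page), and `effective_input_price` and `cheapest` are hypothetical helper names.

```python
def effective_input_price(miss_price, hit_price, hit_rate):
    """Blended input price per 1M tokens for a given cache hit rate (0..1)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Hypothetical prices in USD per 1M input tokens. Illustrative only, not real quotes.
providers = {
    "provider_a": {"miss": 1.00, "hit": 0.05},  # higher headline price, 20x cache discount
    "provider_b": {"miss": 0.80, "hit": 0.40},  # lower headline price, 2x cache discount
}


def cheapest(providers, hit_rate):
    """Name of the provider with the lowest blended input price at this hit rate."""
    return min(
        providers,
        key=lambda name: effective_input_price(
            providers[name]["miss"], providers[name]["hit"], hit_rate
        ),
    )


if __name__ == "__main__":
    for hit_rate in (0.0, 0.5, 0.9):
        prices = {
            name: round(effective_input_price(p["miss"], p["hit"], hit_rate), 4)
            for name, p in providers.items()
        }
        print(hit_rate, prices, "cheapest:", cheapest(providers, hit_rate))
```

With these made-up numbers, the provider with the lower headline price wins when nothing is cached, but the one with the deeper cache discount wins once a large share of input tokens are cache hits. This ignores output tokens, cache write fees, and cache expiry rules, all of which vary by provider and would need their own columns.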