r/MachineLearning · June 24, 2026 · 1 min read

I compiled LLM inference pricing across 7 providers — the caching numbers are surprising (spreadsheet included) [R]

#benchmark #pricing #gpu

Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.

I've been comparing GPU/LLM providers for a side project and ended up with way too many browser tabs and spreadsheets.

So I decided to pull the public pricing data into one sheet and compare it side by side.

A quick disclaimer: this is not benchmark data. I didn't run latency tests or throughput measurements. Everything comes from public pricing pages and APIs (OpenRouter, DeepSeek, Together AI, Fireworks, Groq, etc.).

The spreadsheet currently tracks:

Input/output token pricing
Context windows
Cached input pricing (where available)
Supported models
Provider-specific pricing differences

The thing that surprised me most was caching.

For example, when looking at DeepSeek V4 Pro pricing across providers, cached input costs vary dramatically. In some cases a cache hit is tens of times cheaper than a cache miss.

That made me realize that if you're running:

Agents with large system prompts
RAG pipelines with reusable context
Multi-turn conversations
Repeated prompt templates

...the "headline" token price can be a lot less important than the caching policy.
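To make that concrete, here is a rough sketch of how a blended input price could be estimated from a cache hit rate. The two providers and all prices below are made up for illustration and are not taken from the spreadsheet; the function name `blended_input_price` is likewise hypothetical, and the sketch assumes cached and uncached tokens are billed at flat per-token rates with no cache write or storage fees.

```python
def blended_input_price(miss_price: float, hit_price: float, hit_rate: float) -> float:
    """Average USD per 1M input tokens, given the fraction of tokens served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Hypothetical prices in USD per 1M input tokens (illustrative only).
provider_a = {"miss": 1.00, "hit": 0.10}  # higher headline price, deep cache discount
provider_b = {"miss": 0.80, "hit": 0.80}  # lower headline price, no cache discount

for hit_rate in (0.0, 0.5, 0.9):
    a = blended_input_price(provider_a["miss"], provider_a["hit"], hit_rate)
    b = blended_input_price(provider_b["miss"], provider_b["hit"], hit_rate)
    print(f"hit rate {hit_rate:.0%}: A=${a:.2f}  B=${b:.2f}")
```

With these made-up numbers, provider A is more expensive on the headline price but becomes cheaper once roughly 22% of input tokens are cache hits, which is easy to reach with a large reused system prompt.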

A few other interesting things I noticed:

The same model's price can differ severalfold depending on the provider.
Some providers expose caching clearly, while others barely document it.
Model availability and context windows aren't always consistent across providers.
It's surprisingly hard to find all of this information in one place.

A few things I haven't figured out how to compare yet:

Real throughput (tokens/sec)
Cold-start / queue times
Whether providers are serving FP16, FP8, quantized variants, etc.
Egress/network costs
Reliability/uptime

I'm curious how others evaluate providers.

When you're choosing between OpenRouter, Together, Fireworks, Groq, DeepSeek, etc., what metrics actually matter to you beyond token pricing?

https://preview.redd.it/4vj50mvhu79h1.png?width=1615&format=png&auto=webp&s=6c6c084927f83bfdadb5ed8e4378f520a1da6766

Am I missing any important data points that should be included in a v2?

submitted by /u/Technomadlyf