r/LocalLLaMA · May 31, 2026 · 1 min read

GPU Prices. Buy now, or buy later?

If the community could sound off on this, I'd be grateful.

Do you think GPU prices are going to stop skyrocketing? Are FOMO and hype driving the adoption of local inference, or is this mass-market adoption a long-term trend that will last for years? If I wait 6 months, will I regret it because prices keep climbing? I don't know about RAM pricing either... is that temporary?

Backstory:

I bought an M3 Max MacBook Pro in Nov 2023 (128GB RAM, 4TB SSD, 16-core CPU / 40-core GPU).
I use it as a desktop, with 20TB of external storage.

I have 5 different production workflows running about a dozen daily cron jobs (everything from BERT models to 30B LLMs in prod, with rsLoRA adapters I've trained for specific tasks).

I also run 3 different agent harnesses (2 custom and Hermes). I still hit OpenRouter (glm-5.1/minimax) for orchestration, and even Anthropic for heavy coding tasks.

I'm on the fence about buying a 1x 5090 rig that is expandable to 3 GPUs and plug-and-play with a Pro 6000. But $10k is hard to swallow.
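One way to frame the $10k question is a rough break-even estimate against API spend. The sketch below is only an illustration: the monthly API spend and power cost are made-up placeholders (nothing in this post gives those figures), and it ignores resale value, future price changes, and any API usage that continues after the purchase.

```python
import math


def break_even_months(hardware_cost: float,
                      monthly_api_spend: float,
                      monthly_power_cost: float) -> float:
    """Months until the rig pays for itself through avoided API spend.

    Returns math.inf when running the rig costs as much as (or more than)
    the API spend it replaces, since it then never breaks even.
    """
    monthly_savings = monthly_api_spend - monthly_power_cost
    if monthly_savings <= 0:
        return math.inf
    return hardware_cost / monthly_savings


# Hypothetical inputs: only the $10k build cost comes from the post.
months = break_even_months(hardware_cost=10_000,
                           monthly_api_spend=500,   # placeholder
                           monthly_power_cost=100)  # placeholder
print(f"Break-even after about {months:.0f} months")
```

Plugging in real monthly numbers makes it easier to see whether waiting 6 months at a higher price would change the answer much.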

This would allow me to run Qwen3.6-35B-A3B-4bit and 27b-4bit in production for sub-agent delegation (4 concurrent sub-agents with sufficient KV cache).
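Whether 4 concurrent sub-agents fit in 32GB can be sanity-checked with back-of-the-envelope arithmetic. This is a minimal sketch, assuming standard grouped-query attention in every layer and an FP16 KV cache. The layer count, KV head count, head dimension, and context length below are hypothetical placeholders, not the real Qwen configuration (the actual values live in each model's `config.json`), and runtime overhead is ignored.

```python
GIB = 2**30


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GiB.

    Ignores quantization scales/zero-points and runtime overhead.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB for one sequence of `tokens` tokens.

    The leading 2 accounts for storing both keys and values.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / GIB


# Placeholder architecture values, NOT taken from any real model card.
per_agent = kv_cache_gib(layers=32, kv_heads=8, head_dim=128, tokens=32_768)
weights = weight_gib(params_billion=35, bits_per_weight=4)
total = weights + 4 * per_agent  # one model, 4 concurrent sub-agents

print(f"weights: {weights:.1f} GiB, KV per agent: {per_agent:.1f} GiB, "
      f"total: {total:.1f} GiB")
```

If the total lands near or above the card's 32GB, the usual levers are a shorter context per sub-agent, a quantized KV cache, or a second GPU.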

Plan to run this headless as an inference server:

Build: ~$10k

AMD Ryzen 9 9950X 4.3GHz 16 Core 170W

64GB (2x DDR5 32GB)

NVIDIA GeForce RTX 5090 32GB

2TB NVMe PCIe Gen5 M.2 SSD

Fractal Design Define 7 XL case

Super Flower LEADEX Titanium 1700W

Asetek 624S-M2 240mm CPU Cooler

Case Fans Upgrade Kit (PWM Ramping)

Be kind. lol

submitted by /u/knob-0u812