r/LocalLLaMA · 3 min read

I have (even faster) DeepSeek V4 Pro at home

Mirrored from r/LocalLLaMA for archival readability.

A few days ago I posted about my DeepSeek V4 Pro at home - now it's time for an update. Yesterday I finally managed to run this model in ktransformers (sglang + kt-kernel). I followed the tutorial for DeepSeek V4 Flash and tweaked some options (NUMA, cores) for my hardware (Epyc 9374F + RTX PRO 6000 Max-Q). Then I ran llama-benchy with increasing context depth to check the performance. Results:

Depth 0:

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:----------------------------|-------:|-------------:|------------:|----------------:|----------------:|----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 | 39.76 ± 0.00 | | 12878.44 ± 0.00 | 12877.59 ± 0.00 | 12878.44 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro | tg32 | 7.54 ± 0.00 | 8.00 ± 0.00 | | | |

Depth 2048:

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:----------------------------|--------------:|-------------:|------------:|----------------:|----------------:|----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 @ d2048 | 45.13 ± 0.00 | | 56726.85 ± 0.00 | 56725.93 ± 0.00 | 56726.85 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro | tg32 @ d2048 | 7.32 ± 0.00 | 8.00 ± 0.00 | | | |

Depth 4096:

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:----------------------------|--------------:|-------------:|------------:|-----------------:|-----------------:|-----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 @ d4096 | 45.75 ± 0.00 | | 100729.28 ± 0.00 | 100728.46 ± 0.00 | 100729.28 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro | tg32 @ d4096 | 7.29 ± 0.00 | 8.00 ± 0.00 | | | |

Depth 8192:

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:----------------------------|--------------:|-------------:|------------:|-----------------:|-----------------:|-----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 @ d8192 | 45.97 ± 0.00 | | 189354.94 ± 0.00 | 189354.03 ± 0.00 | 189354.94 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro | tg32 @ d8192 | 7.25 ± 0.00 | 8.00 ± 0.00 | | | |

Depth 16384:

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:----------------------------|---------------:|-------------:|------------:|-----------------:|-----------------:|-----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 @ d16384 | 46.16 ± 0.00 | | 365997.22 ± 0.00 | 365996.26 ± 0.00 | 365997.22 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro | tg32 @ d16384 | 7.17 ± 0.00 | 8.00 ± 0.00 | | | |

Depth 32768:

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:----------------------------|---------------:|-------------:|------------:|-----------------:|-----------------:|-----------------:|
| deepseek-ai/DeepSeek-V4-Pro | pp512 @ d32768 | 46.18 ± 0.00 | | 720687.13 ± 0.00 | 720685.67 ± 0.00 | 720687.13 ± 0.00 |
| deepseek-ai/DeepSeek-V4-Pro | tg32 @ d32768 | 7.07 ± 0.00 | 8.00 ± 0.00 | | | |
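The numbers above are internally consistent: the TTFT at depth d should roughly equal the time to process d + 512 prompt tokens at the reported pp rate. A quick cross-check (all figures copied from the tables above):

```python
# TTFT (ms) and reported pp512 t/s at each context depth, from the tables above
depths_ttft = {0: 12878.44, 2048: 56726.85, 4096: 100729.28,
               8192: 189354.94, 16384: 365997.22, 32768: 720687.13}
reported_pp = {0: 39.76, 2048: 45.13, 4096: 45.75,
               8192: 45.97, 16384: 46.16, 32768: 46.18}

for depth, ttft_ms in depths_ttft.items():
    tokens = depth + 512                      # context depth plus the pp512 prompt
    implied = tokens / (ttft_ms / 1000.0)     # tokens processed per second of TTFT
    print(f"d{depth}: implied {implied:.2f} t/s vs reported {reported_pp[depth]:.2f}")
```

Every depth agrees to within a few hundredths of a token per second, so TTFT here is essentially pure prefill time.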

During the 64k test (which took over 20 minutes) llama-benchy did not report a result even though sglang finished processing the request, so I aborted the test. Maybe there is some kind of timeout happening.
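The ~20-minute runtime lines up with a simple extrapolation: prompt processing is nearly flat at ~46 t/s from d16384 onward, so the 64k prefill alone should take roughly:

```python
# Extrapolate prefill time for the aborted 64k test, assuming the
# ~46 t/s pp rate from d32768 (the deepest completed run) still holds
pp_rate = 46.18                    # t/s at d32768, from the last table
tokens_64k = 65536 + 512           # context depth plus the pp512 prompt
prefill_s = tokens_64k / pp_rate
print(f"expected prefill: ~{prefill_s:.0f} s (~{prefill_s / 60:.0f} min)")
```

That works out to roughly 24 minutes of prefill before generation even starts, consistent with the observed runtime.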

This is all running the original model files, no need for conversion.

  • GPU VRAM usage: 90815MiB / 97887MiB
  • GPU power usage: ~100W during PP, ~150W during TG
  • RAM usage: 907.5GB / 1152GB
  • CPU+MB power usage: ~400W
submitted by /u/fairydreaming
