r/LocalLLaMA

Small comparison of full compute performance (Anima) of the 5090 (600, 475 and 400W) vs the 6000 PRO MaxQ (325W) and the 6000 PRO WS/SE (600W).

Hello guys, hoping you're doing fine!

After selling some cards, I got a 6000 PRO MaxQ, whose power limit ranges from 250W to 325W.

I still have a 5090, whose power limit ranges from 400W to 600W.

Since I have both, and I like to run diffusion workloads (txt2img, txt2video, img2img, etc.), I wanted to compare them.

I also rented a 6000 PRO WS edition on runpod, whose power limit ranges from 150W to 600W (yes, its minimum is lower than the MaxQ's).
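If you want to check the power limit range of your own card, here is a minimal sketch that reads it through `nvidia-smi`. It assumes `nvidia-smi` is on your PATH; the helper names `parse_power_limits` and `query_power_limits` are made up for this example, and this is not how the numbers in this post were collected (I used LACT).

```python
import csv
import io
import subprocess

# nvidia-smi query fields for the enforceable power limit range and the current limit.
QUERY = "name,power.min_limit,power.max_limit,power.limit"


def parse_power_limits(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        name, min_w, max_w, current_w = (field.strip() for field in row)
        # Assumption: all three values are numeric. Some cards report "[N/A]",
        # which would raise ValueError here.
        gpus.append({
            "name": name,
            "min_w": float(min_w),
            "max_w": float(max_w),
            "current_w": float(current_w),
        })
    return gpus


def query_power_limits():
    """Run nvidia-smi and return the parsed limits (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_power_limits(result.stdout)
```

Calling `query_power_limits()` on a machine with the driver installed returns one entry per GPU with its minimum, maximum and current power limit in watts.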

Important note: I did undervolt+overclock the 5090 and the 6000 PRO MaxQ. I can't modify the clocks or power on the rented GPUs on runpod.

So for this test, I ran these settings for the software:

I ran these settings for the samplers and steps:

On text:

  • EXP Heun 2 x0 SDE for the first 25 steps
  • ER SDE for the 10 hires pass steps
  • Upscale by 1.5x
  • 896x1088 resolution
  • Batch size 4
  • CFG 5
  • Shift 3
  • Denoise Strength: 0.2
  • Upscaler: NVIDIA Ultra
  • Seed: 999999999
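To give an idea of the workload size, here is a small sketch of the resolution after the hires pass. It assumes the 1.5x upscale applies to both width and height; the variable names are only for illustration.

```python
# Settings taken from the list above.
base_w, base_h = 896, 1088
upscale = 1.5
batch_size = 4

# Assumption: the upscale factor is applied to each dimension.
hires_w = int(base_w * upscale)
hires_h = int(base_h * upscale)

base_pixels = base_w * base_h
hires_pixels = hires_w * hires_h

print(f"base pass:  {base_w}x{base_h} ({base_pixels} px per image)")
print(f"hires pass: {hires_w}x{hires_h} ({hires_pixels} px per image)")
print(f"pixel ratio hires/base: {hires_pixels / base_pixels:.2f}")
print(f"images per run: {batch_size}")
```

So each run produces 4 images at 1344x1632, and the hires pass works on 2.25x the pixels of the base pass.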

Prompt used was:

Positive:

masterpiece, high quality, score_7, '@' \(orange maru\), sfw, 1girl, solo, fully clothed, cynthia \(sygna suit\) \(aura\) \(pokemon\), pokemon masters ex, blonde hair, long hair, ponytail, hair over one eye, grey eyes, :|, full body, blurry background 

Negative:

worst quality, low quality, bad anatomy, (jpeg artifacts:0.8), watermark, sketch, no pupils, 

For the hardware, I ran them headless (with LACT):

  • RTX 5090:
    • 2930MHz max core clock
    • 1000MHz core clock offset
    • +4400MHz on VRAM (total 16000MHz)
    • 400, 475 and 600W
  • RTX 6000 PRO MaxQ:
    • 550MHz core clock offset
    • No max core clock
    • +5270MHz on VRAM (total 16000MHz)
    • 325W
  • RTX 6000 PRO WS:
    • Stock
    • 600W

With all this data, I have these results:

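GPU                | Power | Notes              | Time | Vs baseline
-------------------|-------|--------------------|------|------------
RTX 5090           | 600W  | Baseline (OC + UV) | 36s  | -
RTX 6000 PRO WS/SE | 600W  | No tuning          | 39s  | -8.3%
RTX 5090           | 475W  | UV+OC              | 42s  | -16.7%
RTX 6000 PRO MaxQ  | 325W  | OC                 | 48s  | -33.3%
RTX 5090           | 400W  | UV+OC              | 48s  | -33.3%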

Or, using the 5090 at 400W as the baseline:

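GPU                | Power | Notes              | Time | Faster vs baseline
-------------------|-------|--------------------|------|-------------------
RTX 5090           | 400W  | Baseline (OC + UV) | 48s  | -
RTX 6000 PRO MaxQ  | 325W  | OC                 | 48s  | 0%
RTX 5090           | 475W  | UV+OC              | 42s  | +12.5%
RTX 6000 PRO WS/SE | 600W  | No tuning          | 39s  | +18.8%
RTX 5090           | 600W  | UV+OC              | 36s  | +25.0%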
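For anyone who wants to redo the math, here is a minimal sketch of how the percentage columns can be reproduced from the times. The percentages in both tables are differences in run time relative to the baseline time, which is not the same as the gain in throughput, so the sketch prints both. The function names are just for this example.

```python
# Batch times in seconds, taken from the tables above.
times = {
    "RTX 5090 600W": 36,
    "RTX 6000 PRO WS/SE 600W": 39,
    "RTX 5090 475W": 42,
    "RTX 6000 PRO MaxQ 325W": 48,
    "RTX 5090 400W": 48,
}


def time_saved_pct(t, baseline):
    """Run time saved relative to the baseline, in percent (negative = slower)."""
    return (baseline - t) / baseline * 100


def throughput_gain_pct(t, baseline):
    """Extra images per second relative to the baseline, in percent."""
    return (baseline / t - 1) * 100


for baseline_name in ("RTX 5090 600W", "RTX 5090 400W"):
    baseline = times[baseline_name]
    print(f"Baseline: {baseline_name} ({baseline}s)")
    for name, t in times.items():
        print(
            f"  {name:<24} {t}s  "
            f"time {time_saved_pct(t, baseline):+.1f}%  "
            f"throughput {throughput_gain_pct(t, baseline):+.1f}%"
        )
```

With the 400W 5090 as baseline, the 600W 5090 saves 25% of the run time, which works out to about 33% more throughput.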

While running this task, the cards hovered around these core clocks:

  • 5090 600W: ~2500MHz core clock
  • 5090 475W: ~2100MHz core clock
  • 6000 PRO WS/SE 600W: ~2200MHz core clock
  • 5090 400W: ~1800MHz core clock
  • 6000 PRO MaxQ 325W: 1400-1500MHz core clock

So, as you can see, the 5090 at 600W finishes in 25% less time than the 6000 PRO MaxQ (about 33% more throughput), but with a power limit about 85% higher (600W vs 325W).

At the same time, the untuned 6000 PRO WS/SE finishes in 18.8% less time than the MaxQ, also with a power limit about 85% higher. In theory though, if you undervolt + overclock the WS/SE, it would be faster than the 5090.

And lastly, the 6000 PRO MaxQ performs the same as the 5090 at 400W while using about 81% of the power (325W vs 400W), which is quite impressive for how power limited it is.
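As a rough efficiency check, here is a sketch that turns the power limits and times into energy per run. The big assumption is that each card draws exactly its power limit for the whole run. I did not measure actual draw, so treat these as upper bounds and not as measurements.

```python
# (name, power limit in watts, batch time in seconds), from the results above.
runs = [
    ("RTX 5090", 600, 36),
    ("RTX 6000 PRO WS/SE", 600, 39),
    ("RTX 5090", 475, 42),
    ("RTX 6000 PRO MaxQ", 325, 48),
    ("RTX 5090", 400, 48),
]

BATCH_SIZE = 4  # images per run, from the sampler settings


def energy_wh(power_w, seconds):
    """Energy in watt-hours, assuming constant draw at the power limit."""
    return power_w * seconds / 3600


for name, power_w, seconds in runs:
    wh = energy_wh(power_w, seconds)
    print(f"{name:<19} {power_w}W  {seconds}s  "
          f"{wh:.2f} Wh per run  {wh / BATCH_SIZE:.2f} Wh per image")

# Power limit ratios behind the comparisons in the text.
print(f"600W vs 325W: {600 / 325:.3f}x")
print(f"325W vs 400W: {325 / 400:.4f}x")
```

Under that assumption the MaxQ needs the least energy per run (about 4.3 Wh, against 6.0 Wh for the 5090 at 600W).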

If anyone with a tuned 6000 PRO/WS can do the test, let me know!

submitted by /u/panchovix