r/LocalLLaMA · · 1 min read

Finally finished my LLM server: EPYC 9575F, 4× RTX 3090 (96GB VRAM), 768GB ECC RAM

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Finally finished my LLM server: EPYC 9575F, 4× RTX 3090 (96GB VRAM), 768GB ECC RAM

Took a while, but Nalthis is finally up and assembled.

Specs:

  • Supermicro H13SSL-N
  • AMD EPYC 9575F (64C/128T Zen 5)
  • 768GB DDR5-5600 ECC RDIMM
  • 4× RTX 3090 (96GB VRAM total)
  • 1× 2TB NVMe OS
  • 2× 3.94TB NVMe data
  • 2050W ATX 3.1 PSU
  • Corsair 9000D

Planned use:

  • vLLM - high throughput small models
  • llamacpp - larger reasoning models

I have been making a space simulation and finally ready to integrate AI into how the NPCs doing planning, hoping to get decent throughput on smaller models with lots of requests

The original plan involved a lot more MCIO risers and custom mounting, but I was able to fit two of the 3090s directly on the motherboard and front-mount the other two.

Planning to run all four cards power-limited to 250W since this box is primarily for LLM inference.

The 9000D has been surprisingly good for a 4×3090 build. I also used these fan mounts for additional airflow:

https://www.thingiverse.com/thing:2804306

Still need to finish thermal testing, but the hardware side is finally done.

Head of Cluster Operations: Stannis leading from the couch as well

submitted by /u/C0smo777
[link] [comments]

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from r/LocalLLaMA