r/LocalLLaMA · 12 min read

Project Blackwell: It Will Work, Eventually — Making an RTX Pro 6000 Run in a Dell R730 at 650K Context

# Project Blackwell-R730: It Will Work, Eventually

How a 2016-era Dell PowerEdge R730, an RTX Pro 6000 Blackwell, firmware archaeology, SlimSAS chaos, and unreasonable persistence turned into a 650k-context local AI box.

**AI was also used extensively during the debugging process, because at some point 580+ tabs stops being a research method and starts becoming distributed cognition. Also, I suck at Reddit. Never mind, it isn't me, it's the f'n Reddit gallery. Sorry, no captions; use your imagination for the stages.**

This is not a recommended build guide.

This is a documentary of the nonsense required to make something work that absolutely did not want to work.

https://preview.redd.it/3uwwzjm3h84h1.jpg?width=1542&format=pjpg&auto=webp&s=0524e32aad763d3ff087e195ea0b395410a322fb

Photo 1 is the final rack context: garage datacenter, host R730, external GPU chassis, and surrounding heat generators.

## The Innocent Assumption

The first assumption was simple: big server, big power supplies, big PCIe slots, lots of airflow, expensive GPU. Therefore, the GPU should work.

That assumption was reasonable. It was also wrong.

The R730 is physically capable in many ways, but it belongs to a different era of PCIe assumptions. The RTX Pro 6000 belongs to a world of large BARs, modern firmware expectations, aggressive power behavior, and workstation-class physical dimensions. The R730 looked at that and more or less said: no.

https://preview.redd.it/rzsury35h84h1.jpg?width=1542&format=pjpg&auto=webp&s=0fc13669c7fb124c00086f38ea2e09dc434f7a76

https://preview.redd.it/q2esteg5h84h1.jpg?width=1542&format=pjpg&auto=webp&s=8964cfe7987afa5d075571337f2aa473535a31f1

Photos 2 and 3 show the baseline system and early internal accelerator/topology experiments.

## Mechanical Fitment: The Fan Shroud War

The first real fight was physical.

Slot 4 was initially attractive. It had better topology characteristics and looked like the cleaner slot. But installing the card there forced Riser 3 out of the way, which eliminated a power source I needed.

Slot 6 allowed Riser 2 and Riser 3 to be present so I could feed the RTX from both risers. Unfortunately, slot 6 put the GPU in conflict with the fan shroud and airflow plastic.

So the shroud became negotiable.

The goal was not to randomly destroy Dell airflow engineering. The goal was to remove only the geometry preventing the card from physically fitting while preserving as much useful airflow behavior as possible.

Eventually the card physically seated. That moment was the first major morale boost: holy shit, finally it is in there.

https://preview.redd.it/as7sr947h84h1.jpg?width=1542&format=pjpg&auto=webp&s=8d87ad39b95325065ada611f3b4f030acea20648

https://preview.redd.it/3kt4dbm7h84h1.jpg?width=1542&format=pjpg&auto=webp&s=128af03b1c1a5a8b4a0b196ebb5f39a22b632353

Photos 4 and 5 are the fan-shroud negotiation phase and the eventual internal fitment.

But physical installation was not victory. It was only permission to discover the next failure.

## Power Topology: Enough to Fail Correctly

With the card seated in slot 6 and powered from both risers, the system could at least get far enough to encounter the real blocker: PCIe resource allocation.

The slot and riser power arrangement gave the GPU enough to try. That mattered, because before this point every symptom was ambiguous. Was it physical? Was it power? Was it firmware? Was it Dell refusing to cooperate? Was it the card?

Once the system reached BAR failure consistently, the problem became more technical and less mystical.

That was progress.

## The BAR War

This was the longest, most technical, and most exhausting phase.

At this point I had not seen the NVIDIA GPU in `lspci` at all. Nothing useful. No compute device. No partial victory. Just failure.

The RTX Pro 6000 needed PCIe BAR resources that the R730 firmware did not want to allocate correctly. Above 4G decoding and large MMIO handling were either hidden, insufficient, or constrained by Dell's platform assumptions.
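
For anyone wanting to see what the platform actually handed a device, Linux exposes each PCI function's regions in `/sys/bus/pci/devices/<bdf>/resource` as one `start end flags` line per region. The following is a minimal sketch, not the tooling used in this build: the function names and the sample addresses are made up for illustration.

```python
from pathlib import Path


def parse_resource_table(text: str) -> list[tuple[int, int, int]]:
    """Return (region_index, start, size_in_bytes) for each populated region."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # region absent or never assigned by firmware/kernel
        regions.append((index, start, end - start + 1))
    return regions


def read_bars(bdf: str) -> list[tuple[int, int, int]]:
    """Read the live table for a device such as '0000:82:00.0' (hypothetical address)."""
    path = Path("/sys/bus/pci/devices") / bdf / "resource"
    return parse_resource_table(path.read_text())


# Hypothetical sample: a 64 MiB region, a 128 GiB region, and an unassigned one.
SAMPLE = """\
0x00000000f0000000 0x00000000f3ffffff 0x0000000000040200
0x0000380000000000 0x0000381fffffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""

bars = parse_resource_table(SAMPLE)
for index, start, size in bars:
    print(f"region {index}: base {start:#x}, size {size / 2**20:.0f} MiB")
```

A device whose large region stays all zeros here is one the platform never found room for.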

This turned into BIOS extraction, IFR inspection, hidden setting hunting, ACPI / DSDT investigation, `_CRS` resource descriptor analysis, MMIO aperture math, hex arithmetic, changing something, rebooting, breaking something else, undoing that, and trying again.

The fun part about firmware work is that you can absolutely fix one address range by stomping on something else. Then you do not have a GPU problem anymore. You have a different bus problem, a RAID controller problem, or a boot problem.

So the BAR war was not one breakthrough. It was a long grind of discovering exactly how the R730 described and allocated PCIe memory resources, then trying to convince it to behave like a newer platform.

https://preview.redd.it/73ej8o9ah84h1.png?width=879&format=png&auto=webp&s=dee1c86a4a88ba66e18dcb8c376b8a5d2200c2e8

Photo 6 is the kind of ACPI / BAR resource work that became the technical center of the war.

If BAR did not work, nothing else mattered.

## Kernel Flag Roulette

After the firmware work came the Linux boot argument phase.

This was the try-every-PCIe-flag-until-something-different-happens stage.

Some combinations helped resource allocation. Some combinations broke unrelated parts of the system. A few disabled the RAID controller path badly enough that the machine could not find what it needed.

The pattern became: change kernel flags, reboot, observe failure, compare logs, repeat.
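
The compare-logs step is easier when timestamps and unrelated noise are stripped first. This is a rough sketch of that idea using the standard library. The log lines are invented samples, and the keyword filter is an assumption about which lines are worth reading.

```python
import difflib
import re

TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
PCI_HINT = re.compile(r"\b(pci|pcieport|BAR|nvidia)\b", re.IGNORECASE)


def pci_lines(log_text: str) -> list[str]:
    """Drop dmesg timestamps and keep only lines that mention PCI-related keywords."""
    kept = []
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw)
        if PCI_HINT.search(line):
            kept.append(line)
    return kept


def compare_boots(before: str, after: str) -> list[str]:
    """Return only the added/removed PCI lines between two boot logs."""
    diff = difflib.unified_diff(
        pci_lines(before), pci_lines(after), "before", "after", lineterm="", n=0
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Invented sample logs; real kernel messages vary by version.
BEFORE = """\
[    0.512345] pci 0000:82:00.0: BAR 1: no space for [mem size 0x2000000000 64bit pref]
[    0.600000] usb 1-1: new high-speed USB device
[    0.700000] pci 0000:03:00.0: enabling device
"""
AFTER = """\
[    0.498765] pci 0000:82:00.0: BAR 1: assigned [mem 0x380000000000-0x381fffffffff 64bit pref]
[    0.610000] usb 1-1: new high-speed USB device
[    0.690000] pci 0000:03:00.0: enabling device
"""

changes = compare_boots(BEFORE, AFTER)
print("\n".join(changes))
```

Saving `dmesg` output once per boot, labeled with the flags that were active, turns the roulette into something closer to a controlled experiment.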

This was not elegant engineering. This was trench warfare. But it slowly narrowed the problem.

## iDRAC Had Opinions

Somewhere in this process, another unpleasant reality became obvious: dual 1100W PSUs in an R730 do not simply mean 2200W of arbitrary GPU-friendly power.

Both PSUs were installed. That did not matter. iDRAC had already decided reality was approximately 650W and would not be accepting appeals.

That cap was easy to misread, because power symptoms could look like PCIe symptoms, BAR symptoms, or driver symptoms. At that point, continuing to fight the Dell power domain stopped making sense.

https://preview.redd.it/q723akach84h1.jpg?width=2048&format=pjpg&auto=webp&s=76827b4ea5f3b162ff7dca70942d750a9690e60d

Photo 7 shows the power-monitoring phase that helped force the externalization decision.

The R730 would remain the host: CPUs, RAM, storage, PCIe root complex, management.

The GPU would become its own appliance.

That was the externalization pivot.

## The Abandoned Antec Case Enters the Story

There was an old early-2000s Antec case sitting around. Heavy steel. Ugly in the right way. Built like it expected to survive being thrown out of a truck.

At first it was just a convenient empty box. Then it became obvious it was actually close to ideal: huge internal volume, strong steel chassis, real front filtration, large airflow paths, normal ATX PSU support, rear PCI slots, enough room to mount a PCIe slot adapter, and enough space to route cables without tight bends.

Modern cases often optimize around glass, RGB, and aesthetics. This thing optimized around surviving the Pentium 4 era.

Perfect.

The motherboard came out. The old cable chaos came out. The case became a dedicated GPU airflow and power appliance.

https://preview.redd.it/w52wd05eh84h1.jpg?width=1542&format=pjpg&auto=webp&s=48b3c177bf64d4b2563543b8b5154b9d1ffbc55f

https://preview.redd.it/8mey24leh84h1.jpg?width=1542&format=pjpg&auto=webp&s=0cbe6e91aba45ac4faad509f47625e12b25f1efc

https://preview.redd.it/felvr3zeh84h1.jpg?width=1542&format=pjpg&auto=webp&s=7d608fcc725c1fe8a20fd5fbde77da3924ae0556

Photos 8 through 10 show the externalization pivot and the RM1200e power-domain plan.

## SlimSAS, Retimers, and the External PCIe Plan

The final architecture became:

```text
Dell R730 PCIe slot
-> host-side PCIe/SlimSAS retimer card
-> SlimSAS cables
-> GPU-side SlimSAS-to-PCIe x16 slot adapter
-> RTX Pro 6000
```

This avoided raw ribbon risers and gave the build a cleaner, more mechanically serviceable path.

The key design choices were externalized GPU power, retimer-assisted PCIe transport, SlimSAS cabling, rigid GPU-side mounting, same rack/PDU grounding, and controlled startup order.
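
One way to sanity-check a retimer and cable path like this is to read the negotiated link from sysfs (`current_link_speed` and `current_link_width` under the device directory) and convert it to usable bandwidth. The sketch below assumes the R730's PCIe 3.0 slots, so 8 GT/s per lane is the expected ceiling on the host side. The function names are illustrative, and no measured values from this build are implied.

```python
import re

# Line-encoding efficiency per raw lane rate in GT/s (Gen1/2: 8b/10b, Gen3-5: 128b/130b).
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130, 16.0: 128 / 130, 32.0: 128 / 130}


def parse_speed(text: str) -> float:
    """Parse sysfs strings such as '8.0 GT/s PCIe' or '8 GT/s' into a float."""
    match = re.match(r"\s*([\d.]+)\s*GT/s", text)
    if match is None:
        raise ValueError(f"unrecognized link speed: {text!r}")
    return float(match.group(1))


def link_bandwidth_gbytes(speed_text: str, width_text: str) -> float:
    """Theoretical one-direction bandwidth in GB/s for the negotiated link."""
    rate = parse_speed(speed_text)
    lanes = int(width_text.strip())
    return rate * ENCODING[rate] * lanes / 8  # bits to bytes


full_link = link_bandwidth_gbytes("8.0 GT/s PCIe", "16")
degraded = link_bandwidth_gbytes("8.0 GT/s PCIe", "8")
print(f"Gen3 x16: {full_link:.2f} GB/s, Gen3 x8: {degraded:.2f} GB/s")
```

A link that trains at a narrower width or lower speed than expected is a hint that a cable or connector is marginal, even when the device enumerates.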

The GPU-side adapter created another mechanical problem: the case had no standoff points where the adapter needed them. The holes were marked, drilled, and threaded.

After the GPU was seated and the second 1m cable finally arrived, the bottom-left adapter screw had to be carefully removed because the cable could not physically clear it with the GPU installed. At that point, removing a seemingly functional GPU/SlimSAS assembly felt more dangerous than modifying the mount around it.

The host-side retimer also contributed a surprise: one SlimSAS socket appeared bent from shipment. The cable that had taken weeks to arrive would not seat properly until the connector geometry was carefully corrected.

https://preview.redd.it/n2kp12tgh84h1.jpg?width=1542&format=pjpg&auto=webp&s=05b8965cda13b06d6613d0dac22ad2e1fe43c48b

https://preview.redd.it/f6kkl2tgh84h1.jpg?width=1542&format=pjpg&auto=webp&s=73d17d6a15d0be7eb36e8f8324d490f374af326c

https://preview.redd.it/it9ek1tgh84h1.jpg?width=1542&format=pjpg&auto=webp&s=2899f56bba9a49670062c9c1c6f68578f68ab23c

https://preview.redd.it/o4fgd2tgh84h1.jpg?width=1542&format=pjpg&auto=webp&s=32d0a3bfabfb62a5ecf97463a33696c432bcb0c3

https://preview.redd.it/i3p1x1tgh84h1.jpg?width=1542&format=pjpg&auto=webp&s=fec95c49ab19e4368e71d8b71e580e2e7a2f9e4b

https://preview.redd.it/l03y52tgh84h1.jpg?width=1542&format=pjpg&auto=webp&s=616e84b7baf9900af074f2366a89118997204408

https://preview.redd.it/nfffv1tgh84h1.png?width=494&format=png&auto=webp&s=401aed43b372f2a11d0776b9fff2a95f113d6277

Photos 11 through 17 show the adapter, drilling, retimer, external build, and the bent SlimSAS connector.

## Cable Hell

Parts arrived piecewise. Internationally. Slowly. In the wrong order.

The first cables were 0.5m. On paper that should have worked because the systems were directly adjacent in the rack. Reality disagreed.

The white-front chassis became the eGPU. Directly above it sat the host R730, heavy because of the drive loadout. Even with the two chassis right next to each other, 0.5m cables created bad routing and bad strain. The cable path needed 1m.

One 1m cable arrived quickly. The second cable failed delivery multiple times. The entire project stalled because one piece of cable did not want to exist in my garage.

Eventually I sent a sufficiently motivated email to the right place, and the second cable appeared within roughly 16 hours.

That part of the story was not technically difficult. It was just psychologically damaging.

Also, the R730 is heavy. Moving it around repeatedly while debugging this was its own workout program. I am around 150 pounds; the server feels like it is trying to become my personal trainer.

https://preview.redd.it/hoo97nmih84h1.jpg?width=1542&format=pjpg&auto=webp&s=343d87adb6a33bcfe3c230b42fb21b268ddd4d79

Photo 18 is the 1m SlimSAS cable reality check.

## The First Real Breadcrumb: NVIDIA HD Audio

After the BAR work and the external PCIe path started coming together, something finally appeared in `lspci`.

Not the GPU.

NVIDIA HD Audio.

That sounds small, but it was huge. It meant the platform was no longer completely rejecting the card. The PCIe path existed. The GPU package was partially visible. The system had moved from nothing to something.

At this stage the build was still not working through `nvidia-smi`, but the appearance of HD Audio was the first proof that the BAR/resource work had moved the platform into the correct neighborhood.
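
The "audio showed up but the GPU did not" state can be checked mechanically from `lspci -nn` output, which prints the class code and `vendor:device` ID in brackets. This sketch relies on the standard PCI class codes (0300 VGA, 0302 3D controller, 0403 audio) and NVIDIA's vendor ID `10de`. The device IDs and bus addresses in the sample are placeholders, not values from this machine.

```python
import re

LINE = re.compile(
    r"^(?P<bdf>\S+)\s+.*?\[(?P<cls>[0-9a-f]{4})\]:"
    r".*\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)
NVIDIA_VENDOR = "10de"
DISPLAY_CLASSES = {"0300", "0302"}  # VGA controller, 3D controller
AUDIO_CLASS = "0403"


def nvidia_functions(lspci_nn_output: str) -> dict[str, list[str]]:
    """Group NVIDIA PCI functions by what kind of device they claim to be."""
    found: dict[str, list[str]] = {"display": [], "audio": [], "other": []}
    for line in lspci_nn_output.splitlines():
        match = LINE.match(line)
        if match is None or match["vendor"] != NVIDIA_VENDOR:
            continue
        if match["cls"] in DISPLAY_CLASSES:
            kind = "display"
        elif match["cls"] == AUDIO_CLASS:
            kind = "audio"
        else:
            kind = "other"
        found[kind].append(match["bdf"])
    return found


# Placeholder sample: only the audio function enumerated.
SAMPLE = """\
03:00.0 RAID bus controller [0104]: Example RAID Controller [1000:005d] (rev 02)
82:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:bbbb] (rev a1)
"""

state = nvidia_functions(SAMPLE)
print(state)
```

An audio entry with an empty display list matches the breadcrumb stage described here: the link is alive, but the function with the large BARs is still missing.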

That was the so-you-are-saying-there-is-a-chance moment.

## Firmware Conflict: Stop Helping, Linux

The next issue looked like a driver/firmware conflict.

The kernel/driver stack appeared to be trying to load firmware that did not match what the card actually needed. In practical terms, the system seemed to be stepping on itself.

After enough trial and error, the fix became almost philosophical: stop forcing the firmware load. Use what is already there.

At this point, after DSDT hacking, BAR work, kernel flags, and partial PCIe enumeration, Linux was effectively running half the BIOS anyway.

I am the captain now.

And then it happened.

`nvidia-smi` worked.

The RTX Pro 6000 Blackwell appeared. 96GB VRAM visible.

Beer time.

https://preview.redd.it/uasxb96kh84h1.png?width=554&format=png&auto=webp&s=071af883fd365171049a7aa9153b0285e2c57c80

Photo 19 is the victory screenshot.

## Large-Context Validation

Once the system was actually working, the next question was: does it do the thing?

The model used for the test was Llama 4 Scout 17B Q5_K_M. I configured llama-server for the new GPU setup, moved away from my old CPU-only assumptions, and set context to 650k.

For a deliberately ugly test, I concatenated an entire repo into a single file using BEGIN FILEPATH / END FILEPATH markers. It was not a perfect RAG system. It was not carefully indexed. It was not elegant.

It was a brute-force here-is-the-repo-now-answer-this-known-bug-question test.
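
The concatenation step can be sketched in a few lines. The exact marker layout used in the test is not given, so the `BEGIN FILEPATH <path>` / `END FILEPATH <path>` format below is an assumption, as are the function name and the suffix filter.

```python
import tempfile
from pathlib import Path


def bundle_repo(root: Path, suffixes: tuple[str, ...] = (".py", ".md", ".txt")) -> str:
    """Concatenate text files under root, wrapping each in path markers."""
    parts = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if ".git" in relative.parts:
            continue  # skip repository metadata
        name = relative.as_posix()
        body = path.read_text(encoding="utf-8", errors="replace")
        # Assumed marker layout; adjust to whatever the prompt expects.
        parts.append(f"BEGIN FILEPATH {name}\n{body}\nEND FILEPATH {name}\n")
    return "".join(parts)


# Tiny throwaway repo to show the output shape.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "src").mkdir()
    (demo_root / "src" / "main.py").write_text("print('hi')\n")
    (demo_root / "README.md").write_text("# demo\n")
    bundle = bundle_repo(demo_root)

print(bundle)
```

Repeating the path in the closing marker costs a few tokens per file but makes it easier for the model to tell which file a given chunk belongs to.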

The prompt loaded roughly 630k tokens. Initial prefill was around 800-900 tokens/sec. Total ingest took roughly 40 minutes.
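
Those three numbers are worth reconciling. A quick calculation from the figures above shows that the initial rate cannot have held for the whole run, which is consistent with prefill typically slowing as the context grows. The helper names are illustrative.

```python
def average_rate(tokens: int, minutes: float) -> float:
    """Average tokens per second over the whole ingest."""
    return tokens / (minutes * 60)


def minutes_at_rate(tokens: int, tokens_per_second: float) -> float:
    """How long the ingest would take if a rate were sustained throughout."""
    return tokens / tokens_per_second / 60


PROMPT_TOKENS = 630_000   # "roughly 630k tokens"
INGEST_MINUTES = 40       # "roughly 40 minutes"

observed_average = average_rate(PROMPT_TOKENS, INGEST_MINUTES)
if_sustained_800 = minutes_at_rate(PROMPT_TOKENS, 800)
if_sustained_900 = minutes_at_rate(PROMPT_TOKENS, 900)

print(f"average over the run: {observed_average:.1f} tokens/sec")
print(f"at a sustained 800-900 tokens/sec: "
      f"{if_sustained_900:.1f} to {if_sustained_800:.1f} minutes")
```

The run averaged about 262 tokens/sec, roughly a third of the starting rate, so most of the 40 minutes was spent in the later, slower part of the prompt.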

Partway through the run, the garage was around 90F and the GPU temperature got high. This was not really a system failure; it was the absurdity of running a rack, a 2016 server, and a very expensive GPU appliance in a garage.

I dropped the power limit with `nvidia-smi`. The GPU temperature fell hard. The prefill rate barely changed.
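
For reference, the power cap and the readings around it can be scripted. The sketch below uses `nvidia-smi -pl` to set a limit and `--query-gpu` with CSV output to read temperature and power. The 400W figure is a hypothetical example (the actual limit used is not stated), and changing the limit normally needs root.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def parse_query(output: str) -> list[dict[str, float]]:
    """Turn 'temp, draw, limit' CSV rows into one dict per GPU."""
    rows = []
    for line in output.strip().splitlines():
        temp, draw, limit = (float(value) for value in line.split(","))
        rows.append({"temp_c": temp, "draw_w": draw, "limit_w": limit})
    return rows


def read_gpus() -> list[dict[str, float]]:
    """Query the live GPUs; requires the NVIDIA driver to be loaded."""
    return parse_query(subprocess.check_output(QUERY, text=True))


def set_power_limit_cmd(watts: int, gpu_index: int = 0) -> list[str]:
    """Build (but do not run) the command that caps board power."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


# Invented reading, followed by a hypothetical 400W cap.
sample = parse_query("83, 598.12, 600.00\n")
command = set_power_limit_cmd(400)
print(sample, " ".join(command))
```

Logging a reading before and after the cap, alongside the prefill rate, is enough to confirm the trade described above on a given setup.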

Near the end of prefill, the model began producing the answer before the load had fully completed, which was unexpected but welcome.

It answered the test correctly.

After prefill completed, follow-up interaction was fast — roughly ChatGPT-like interactive speed, around the 80-90 tokens/sec class depending on settings and prompt state.

That was the point where the whole project stopped being theoretical.

## What Actually Mattered

Some symptoms and lessons mattered more than expected:

- BAR allocation was the central blocker. Without that, nothing else mattered.

- NVIDIA HD Audio appearing in `lspci` was a major milestone.

- Kernel flags were useful but dangerous.

- Firmware/driver behavior can look like hardware failure.

- Dell power behavior on 110V was not going to cooperate with the original plan.

- Externalizing the GPU was not defeat; it was better architecture.

- The workstation card was not the mistake it seemed to be. A server card would not have magically solved BAR, power, or platform assumptions either.

- The old Antec case became useful because it obeyed physics better than the R730 did.

- 0.5m cables were too optimistic even with adjacent chassis. 1m was the practical routing answer.

- The project required equal parts firmware debugging, mechanical adaptation, logistics rage, and stubbornness.

==== Below is a bunch of AI rambling distilled from what was an encyclopedia of conversations. It kind of paraphrased, and both GPT and Gemini turned into assholes (personality markers, I guess) after so many conversations and so much rage.

## Final Architecture

The final system is best understood as two cooperating machines:

```text
Dell R730:
- control plane
- CPUs
- RAM
- storage
- PCIe root complex
- host OS
- llama-server

External GPU chassis:
- RTX Pro 6000
- independent 1200W PSU
- PCIe slot adapter
- retimer/SlimSAS path
- dedicated airflow
- dedicated power domain

Rack reality:
- eGPU chassis directly below host R730
- AI box above
- multiple heat sources stacked in a garage
- High Performance Rack Serenity Device pending/installed for thermal and emotional support
```

This is not just a GPU in a server.

It is closer to a garage-built GPU appliance attached to an enterprise server.

And somehow, it works.

## Closing Thought

This project started because I thought a big server should accept a big GPU.

It did not.

Then it became a firmware problem. Then a power problem. Then a mechanical problem. Then a cable problem. Then a Linux problem. Then a delivery problem.

Then, finally, a working local AI system capable of loading roughly 630k tokens of repo context and answering questions about it.

The lesson is not that everyone should do this.

The lesson is that unsupported is not the same thing as impossible.

Sometimes it means nobody sane has suffered through the dependency chain yet.

This time, the chain ended in `nvidia-smi`.

submitted by /u/tacticalhat