Cannot get NCCL test to run in docker with 2 x 6000 Pro connected x8 to AM4 CPU
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
nvidia-smi topo -m is showing the both GPU as PHB (i.e. via CPU) connected as expected but I cannot get NCCL all_reduce_perf to run at all, it always hangs after starting up. It seems that vllm won't work with TP=2 until I can fix this.
Is there any reason why this setup would not work (it's X570 based)?
TIA
[link] [comments]
More from r/LocalLLaMA
-
How small can the orchestration model in an agent be? (separating it from code-gen — that obviously wants a big model)
May 22
-
BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.
May 22
-
trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser
May 22
-
ByteShape Qwen3.6-35B-A3B: 30% faster than Unsloth IQ on 6GB VRAM laptop
May 22
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.