Qwen 3.6-27B on vLLM with dual RTX 3090s: looking for launch parameters
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hi everyone. Please share your working launch commands for running Qwen 3.6-27B via vLLM on dual RTX 3090s (both running at PCIe 4.0 x8). I'm interested in setups both with and without an NVLink bridge.
I'm familiar with the club-3090 repo, but their ready-to-use vLLM recipes are focused on 4-bit models. With 48GB of total VRAM, I'd rather not compress the model that much. I want to use a larger quant to retain as much generation quality as possible.
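For context, here is the rough arithmetic behind that preference, as a minimal sketch. It counts only the weights (parameter count times bits per weight) and ignores KV cache, activations, CUDA context, and any per-format overhead such as quantization scales, so real headroom will be smaller. The function names and the 16/8/4-bit choices are just illustrative, and I'm treating the 48GB total as 48 GiB.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Two 24 GiB cards; assumption: the full 48 GiB is usable, which it is not in practice.
TOTAL_VRAM_GIB = 48.0


def estimate(params_billion: float = 27.0) -> dict:
    """Map each bit width to (weight size, VRAM left over), both in GiB."""
    rows = {}
    for name, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        weights = weight_gib(params_billion, bits)
        rows[name] = (weights, TOTAL_VRAM_GIB - weights)
    return rows


for name, (weights, left) in estimate().items():
    print(f"{name}: weights ~{weights:.1f} GiB, ~{left:.1f} GiB left for KV cache and overhead")
```

By this estimate, unquantized 16-bit weights (about 54 GB) don't fit in 48GB at all, while an 8-bit quant (about 27 GB) leaves a fair amount of room for context. That is why I'm asking about something larger than 4-bit.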
Questions for anyone running this model on similar hardware:
- Which specific quantization of Qwen 3.6-27B are you using?
- What exact commands/parameters are you using to launch vLLM?
I'd appreciate any configs or launch advice you can share.