r/LocalLLaMA · May 29, 2026 · 1 min read

vLLM PR adding native HIP W4A16 kernel was merged


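The performance increase from this PR is awesome: going by the numbers below, the new kernel is well over 2x faster than the Triton W4A16 path at max-num-seqs=8. Makes my ROCm rig a lot more useful.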

Numbers from the PR:

| Kernel | dtype | max-num-seqs=8 | max-num-seqs=32 |
|---|---|---|---|
| Triton W4A16 | bf16 | 82.4 tk/s | - |
| Triton W4A16 | fp16 | 83.2 tk/s | - |
| ExLlama (no bf16) | fp16 | 255.0 tk/s | 382.5 tk/s |
| RDNA3 W4A16 (this PR) | bf16 | 205.3 tk/s | 382.5 tk/s |
| RDNA3 W4A16 (this PR) | fp16 | 270.2 tk/s | 445.7 tk/s |

EDIT: The numbers are for Qwen3.6-27B-GPTQ-W4A16-G32.
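
For anyone who wants the relative gains rather than raw tk/s, here is a quick sketch that turns the table into speedup ratios. The throughput values are copied from the table above; the variable names and the `speedup` helper are just illustrative, not anything from the PR.

```python
# Throughput in tokens/s, copied from the table in the post.
# Names and structure here are illustrative, not taken from the PR.
triton_seqs8 = {"bf16": 82.4, "fp16": 83.2}      # Triton W4A16, max-num-seqs=8
rdna3_seqs8 = {"bf16": 205.3, "fp16": 270.2}     # RDNA3 W4A16 (this PR), max-num-seqs=8
exllama_fp16 = {8: 255.0, 32: 382.5}             # ExLlama, fp16 only
rdna3_fp16 = {8: 270.2, 32: 445.7}               # RDNA3 W4A16 (this PR), fp16


def speedup(new_tps: float, old_tps: float) -> float:
    """Ratio of new throughput to old throughput, rounded to 2 decimals."""
    return round(new_tps / old_tps, 2)


# New kernel vs. Triton at max-num-seqs=8, per dtype.
vs_triton = {dtype: speedup(rdna3_seqs8[dtype], triton_seqs8[dtype])
             for dtype in triton_seqs8}

# New kernel vs. ExLlama (fp16), per max-num-seqs setting.
vs_exllama = {seqs: speedup(rdna3_fp16[seqs], exllama_fp16[seqs])
              for seqs in exllama_fp16}

print("vs Triton (max-num-seqs=8):", vs_triton)
print("vs ExLlama (fp16):", vs_exllama)
```

That works out to about 2.49x (bf16) and 3.25x (fp16) over Triton at max-num-seqs=8, and about 1.06x and 1.17x over ExLlama in fp16 at max-num-seqs=8 and 32 respectively.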

See more here: PR link
