vLLM releases
27 articles archived
-
vLLM releases dev-tools 3d ago
v0.24.0
[CI] Raise gsm8k startup timeout for MoE Refactor Qwen3 NVFP4 configs…
23 -
vLLM releases dev-tools 4d ago
v0.24.0rc2: Fix P/D with DP Supervisor (#46628)
Signed-off-by: Robert Shaw (cherry picked from commit c5e3c40)
7 -
vLLM releases dev-tools 15d ago
v0.23.1rc0: [Bugfix][CI] Update Dockerfile dependency graph PNG (#45602)
Signed-off-by: sfeng33
37 -
vLLM releases dev-tools 26d ago
v0.22.1rc2: fix: resolve CUTLASS fmin compatibility for DeepSeek-V4 init
Signed-off-by: khluu
9 -
vLLM releases dev-tools 26d ago
v0.22.1: fix: resolve CUTLASS fmin compatibility for DeepSeek-V4 init
Signed-off-by: khluu
28 -
vLLM releases dev-tools 27d ago
v0.22.1rc1: [docker] Stop using extra-index-url for flashinfer-jit-cache (#44366)
Signed-off-by: Kevin H. Luu
34 -
vLLM releases dev-tools 1mo ago
v0.22.0rc2: Fix early CUDA init (#43791)
Signed-off-by: Harry Mellor (cherry picked from commit 41688e2)
11 -
vLLM releases dev-tools 1mo ago
v0.21.1rc0: [ROCm][CI] Stage B gating (#42025)
Signed-off-by: Andreas Karatzas
17 -
vLLM releases dev-tools 1mo ago
v0.21.0
Highlights: This release features 367 commits from 202 contributors (49 new)! Transformers v4 deprecated: This release formally deprecates transformers v4 support (#40389). Users should migrate to transformers v5. C++20 build requirement: vLLM now requires a C++20-compatible…
23 -
vLLM releases dev-tools 1mo ago
v0.21.0rc3
[MLA Attention Backend] Add TOKENSPEED_MLA backend for DSR1/Kimi K25 …
28 -
vLLM releases dev-tools 1mo ago
v0.21.0rc2
[Bugfix] Install nvidia-cutlass-dsl[cu13] extra on CUDA 13 platforms …
16 -
vLLM releases dev-tools 1mo ago
v0.21.0rc1
[Build] Build bundled DeepGEMM _C per-Python so the wheel imports o…
7 -
vLLM releases dev-tools 1mo ago
v0.20.2
vLLM v0.20.2 Highlights: This release features 6 commits from 6 contributors (0 new)! This is a small patch release with bug fixes for DeepSeek V4, gpt-oss, and Qwen3-VL. Bug Fixes: DeepSeek V4 sparse attention: Re-enable the persistent topk path on Hopper and ensure the memset…
11 -
vLLM releases dev-tools 1mo ago
v0.20.1
vLLM v0.20.1: This is a patch release on top of v0.20.0 primarily focused on DeepSeek V4 stabilization and performance improvements, along with several important bug fixes. DeepSeek V4 Base model support (#41006). Multi-stream pre-attention GEMM (#41061), configurable…
37 -
vLLM releases dev-tools 1mo ago
v0.20.2rc0: [MRV2] Add shutdown() method (#41297)
Signed-off-by: Woosuk Kwon
35 -
vLLM releases dev-tools 2mo ago
v0.20.0
vLLM v0.20.0 Highlights: This release features 752 commits from 320 contributors (123 new)! DeepSeek V4: Initial DeepSeek V4 support landed (#40860), with DSML token-leakage fix in DSV4/3.2 (#40806), DSA + MTP IMA fix (#40772), and a silu clamp limit on the shared expert (…
33 -
vLLM releases dev-tools 2mo ago
v0.20.1rc0: Add system_fingerprint field to OpenAI-compatible API responses (#40537)
Co-authored-by: Claude
6 -
vLLM releases dev-tools 2mo ago
v0.20.0rc1
Revert "[Misc] Move pyav and soundfile to common requirements" (#…
25 -
vLLM releases dev-tools 2mo ago
v0.19.1
This is a patch release on top of v0.19.0 with Transformers v5.5.3 upgrade and bug fixes for Gemma4: Update to transformers v5 (#30566); [Bugfix] Fix invalid JSON in Gemma 4 streaming tool calls by stripping partial delimiters (#38992); [Bugfix][Frontend] Fix Gemma4 streaming…
17 -
vLLM releases dev-tools 2mo ago
v0.19.2rc0: [Bugfix] Fix k_proj's bias for GLM-ASR (#40160)
Signed-off-by: Rishapveer Singh
4