b9254
Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.
Programmatic Dependent Launch (PDL) for more performance on newer NVIDIA GPUs (Hopper+) (#22522)
-
Adds initial PDL setup.
-
Adds PDL barriers based on a simple heuristic: place the "sync" before the first access through an input pointer, and the "launch" after the last write (e.g. to tensors like dst).
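A minimal CUDA sketch of this heuristic, assuming the CUDA runtime's PDL API: `cudaGridDependencySynchronize()` as the "sync" and `cudaTriggerProgrammaticLaunchCompletion()` as the "launch" on the device, plus the `cudaLaunchAttributeProgrammaticStreamSerialization` attribute on the host. Every name defined here (`pdl_wait`, `pdl_launch_dependents`, `scale_kernel`, `add_one_kernel`, `run_pdl_demo`) is hypothetical and is not the ggml implementation. The device calls exist only for sm_90 and newer, so they are guarded by `__CUDA_ARCH__` and compile to nothing elsewhere; the sketch assumes it is built with `-arch=sm_90` or newer when run on Hopper, and it only requests PDL on devices with compute capability 9.x or higher.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helpers: no-ops unless compiled for sm_90+, where the PDL calls exist.
static __device__ __forceinline__ void pdl_wait() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    cudaGridDependencySynchronize();            // "sync": wait for the previous grid's results
#endif
}

static __device__ __forceinline__ void pdl_launch_dependents() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    cudaTriggerProgrammaticLaunchCompletion();  // "launch": let the next grid start early
#endif
}

// Producer: dst = x * s.
__global__ void scale_kernel(const float * x, float * dst, float s, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;  // needs no input data
    pdl_wait();                 // before the first access through an input pointer
    if (i < n) {
        dst[i] = x[i] * s;      // last write to dst
    }
    pdl_launch_dependents();    // after the last write
}

// Consumer: reads the producer's output, so it syncs before touching it.
__global__ void add_one_kernel(const float * x, float * dst, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    pdl_wait();
    if (i < n) {
        dst[i] = x[i] + 1.0f;
    }
    pdl_launch_dependents();
}

// Runs producer then consumer; returns true if every output equals 2 * 3 + 1.
static bool run_pdl_demo(int n) {
    int dev = 0, cc_major = 0;
    cudaError_t err = cudaGetDevice(&dev);
    if (err == cudaSuccess) err = cudaDeviceGetAttribute(&cc_major, cudaDevAttrComputeCapabilityMajor, dev);
    if (err != cudaSuccess) return false;

    const size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr, *c = nullptr;
    std::vector<float> h(n, 2.0f);
    cudaStream_t stream = nullptr;
    err = cudaStreamCreate(&stream);
    if (err == cudaSuccess) err = cudaMalloc(&a, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&b, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&c, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(a, h.data(), bytes, cudaMemcpyHostToDevice);

    const dim3 block(256), grid((n + 255) / 256);
    if (err == cudaSuccess) {
        scale_kernel<<<grid, block, 0, stream>>>(a, b, 3.0f, n);  // primary: regular launch
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        cudaLaunchAttribute attr[1];
        attr[0].id = cudaLaunchAttributeProgrammaticStreamSerialization;
        attr[0].val.programmaticStreamSerializationAllowed = 1;
        cudaLaunchConfig_t cfg = {};
        cfg.gridDim = grid;
        cfg.blockDim = block;
        cfg.dynamicSmemBytes = 0;
        cfg.stream = stream;
        cfg.attrs = attr;
        cfg.numAttrs = cc_major >= 9 ? 1 : 0;  // request PDL on Hopper+ only
        err = cudaLaunchKernelEx(&cfg, add_one_kernel, b, c, n);
    }
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);
    if (err == cudaSuccess) err = cudaMemcpy(h.data(), c, bytes, cudaMemcpyDeviceToHost);

    bool ok = err == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = h[i] == 7.0f;

    if (stream) cudaStreamDestroy(stream);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return ok;
}
```

Only the consumer is launched with the attribute, so it may start while the producer is still running; the work placed before `pdl_wait()` (here just the index computation) is what can overlap. In the CUDA runtime's model, the wait in the consumer is what orders its reads after the producer's results, while the position of the "launch" trigger mainly decides how early the dependent grid may start.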
-
Further optimization pass over the first half of the kernels
-
Optimized PDL barriers for the second batch of kernels
-
Further refinements after rebase.
-
Moves PDL logic to a separate function; removes some whitespace
-
Strips post-hoc PDL logic
-
Adds stream capture PDL setup. Enrolls quantize_q8_1 into PDL so that it can overlap execution with previous kernels.
-
Enrolls mul_mat_vec_q, rms_norm_f32 and k_bin_bcast (partly) into PDL
-
Enrolls mmvf, rope, set-rows and topk kernels for gpt-oss into PDL
-
Introduces ggml_cuda_kernel_launch to abstract away cudaLaunchKernelEx and enable HIP/MUSA compatibility.
-
Enrolls cpy_scalar_contiguous, k_get_rows_float and rms_norm_f32
-
Enrolls flash_attn_combine_results
-
Fix: Drops a needless and broken CUDA arch check for PDL. PDL either works or has no effect.
-
Enrolls flash-attention kernels into PDL
-
Fix: Inlines ggml_cuda_kernel_launch and uses perfect forwarding for kernel args. This fixes PDL.
-
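The inlined, perfectly forwarding launch wrapper mentioned above can be illustrated with a minimal sketch. It assumes only the CUDA runtime's `cudaLaunchKernelEx` C++ template; the names `launch_kernel`, `iota_kernel`, `run_wrapper_demo` and the `use_pdl` flag are hypothetical and do not reproduce the signature of ggml_cuda_kernel_launch. Arguments are passed on with `std::forward`, so their value categories reach `cudaLaunchKernelEx` unchanged, and that template then converts them to the kernel's parameter types. The non-PDL path uses a regular `<<<...>>>` launch followed by `cudaGetLastError()`.

```cuda
#include <cassert>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical wrapper: one inline entry point for plain and PDL launches.
template <typename... KernelArgs, typename... Args>
static inline cudaError_t launch_kernel(void (*kernel)(KernelArgs...), dim3 grid, dim3 block,
                                        size_t smem, cudaStream_t stream, bool use_pdl,
                                        Args &&... args) {
    if (!use_pdl) {
        kernel<<<grid, block, smem, stream>>>(std::forward<Args>(args)...);
        return cudaGetLastError();  // surface launch-configuration errors
    }
    cudaLaunchAttribute attr[1];
    attr[0].id = cudaLaunchAttributeProgrammaticStreamSerialization;
    attr[0].val.programmaticStreamSerializationAllowed = 1;

    cudaLaunchConfig_t cfg = {};
    cfg.gridDim = grid;
    cfg.blockDim = block;
    cfg.dynamicSmemBytes = smem;
    cfg.stream = stream;
    cfg.attrs = attr;
    cfg.numAttrs = 1;
    // Arguments keep their value category on the way into cudaLaunchKernelEx.
    return cudaLaunchKernelEx(&cfg, kernel, std::forward<Args>(args)...);
}

__global__ void iota_kernel(int * out, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

// Launches iota_kernel through both paths into separate halves of one buffer.
static bool run_wrapper_demo(int n) {
    const size_t bytes = n * sizeof(int);
    int * d = nullptr;
    cudaStream_t stream = nullptr;
    cudaError_t err = cudaStreamCreate(&stream);
    if (err == cudaSuccess) err = cudaMalloc(&d, 2 * bytes);

    const dim3 block(128), grid((n + 127) / 128);
    if (err == cudaSuccess) err = launch_kernel(iota_kernel, grid, block, 0, stream, false, d, n);
    if (err == cudaSuccess) err = launch_kernel(iota_kernel, grid, block, 0, stream, true, d + n, n);
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

    std::vector<int> h(2 * n);
    if (err == cudaSuccess) err = cudaMemcpy(h.data(), d, 2 * bytes, cudaMemcpyDeviceToHost);
    bool ok = err == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = h[i] == i && h[n + i] == i;

    if (stream) cudaStreamDestroy(stream);
    cudaFree(d);
    return ok;
}
```

The second launch writes a different half of the buffer and does not read the first, so letting it overlap via PDL is safe even though `iota_kernel` contains no sync barrier; a kernel that reads a predecessor's output would need one.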
Perf: Enrolls the k_bin_bcast variadic template invocation into PDL via a template alias and template expansion.
-
Enrolls all remaining kernels for qwen3-coder-next into PDL
-
Remove all PDL LC calls to create a baseline
-
Added LC according to internal guidance and tested kernel performance.
-
Passively enrolls missing qwen3.5 kernels into PDL.
-
Kernel optimizations (LC signals) for qwen3.5
-
Enrolls ssm-scan kernels into PDL
-
Adds GGML_CUDA_PDL command line option to toggle PDL.
-
Fix: Ada and lower compilation by guarding PDL calls correctly
-
Cleanup: Removes commented out GGML_CUDA_PDL_LC
-
Cleanup: Removes experimental comments
-
Adds 90-virtual to build script so that Hopper GPUs can leverage PDL.
-
Adds stricter checks to enable PDL, adds env-check to disable it, and removes now superfluous compile option to enable PDL.
-
Fix: Corrects PDL enabling/disabling based on a device-side arch check; a host-side check is UB. This required moving from macros to inlined functions.
-
Fix: default-disable PDL. Enable by setting GGML_CUDA_ENABLE_PDL=1
-
Enable PDL by default for Hopper+ devices
-
Enrolls softcap_f32 and two flash_attn kernels into PDL.
-
Improves flash attn PDL barrier placement
-
Fix: Perf regression on Ada; excludes Ada and below from PDL launches
-
Improves some sync barrier placements
-
Drops superfluous constructor
-
Adds #endif guard comments
-
Reverts an experimental change to top-k-moe.cu that moved expensive allocations in front of the PDL barrier; it had no meaningful impact.
-
Replaces GGML_CUDA_DISABLE_PDL with GGML_CUDA_PDL. PDL is disabled if and only if GGML_CUDA_PDL=0.
-
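The resulting enablement rule can be sketched on the host side, assuming the policy stated in these notes: PDL is requested only on compute capability 9.x (Hopper) and newer, and it is turned off if and only if `GGML_CUDA_PDL` is set to exactly `0`. The function names `pdl_policy` and `pdl_enabled_for_device` are hypothetical, and reading the capability with `cudaDeviceGetAttribute` is an assumption about how such a check could be done, not a description of ggml's code.

```cuda
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical policy: Hopper (cc 9.x) and newer only; GGML_CUDA_PDL=0 opts out.
static bool pdl_policy(int cc_major, const char * env_value) {
    if (cc_major < 9) {
        return false;  // Ada (cc 8.9) and older are excluded
    }
    if (env_value != nullptr && std::strcmp(env_value, "0") == 0) {
        return false;  // explicit opt-out
    }
    return true;       // unset or any other value keeps PDL on
}

// Applies the policy to a real device and the current environment.
static bool pdl_enabled_for_device(int device) {
    int cc_major = 0;
    if (cudaDeviceGetAttribute(&cc_major, cudaDevAttrComputeCapabilityMajor, device) != cudaSuccess) {
        return false;  // no usable device: fall back to regular launches
    }
    return pdl_policy(cc_major, std::getenv("GGML_CUDA_PDL"));
}
```

Keeping the decision in a pure function of (capability, environment value) makes it easy to test without a GPU; a real integration would likely evaluate it once per device rather than on every launch.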
Revert "Drops superfluous constructor". Adds const to the remaining arguments.
This reverts commit 12b1d25.
-
Cleanup: Removes and fixes some comments and whitespace
-
Clarifies the comment on the sync-barrier position
-
Relocates and refactors PDL launch functions and accessories
-
Adds error checking to the regular kernel launch path
-
Drops "auto" in favor of "ggml_cuda_kernel_params"
-
Adds "const" to ggml_cuda_kernel_launch_params
-
[Whitespace] Adds final newline to common.cuh to make editorconfig CI job happy
macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
- Ubuntu x64 (SYCL FP32)
- Ubuntu x64 (SYCL FP16)
Android:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: