llama.cpp releases

b9413

Mirrored from llama.cpp releases for archival readability.

CUDA: Check PTX version on host side to guard PDL dispatch (#23530)

  • CUDA: Check PTX version on host side to guard PDL dispatch

Checking __CUDA_ARCH_LIST__ alone is insufficient for JIT, because this
variable does not distinguish between compiling for, say, sm_90, sm_90a,
or sm_90f (i.e., forward-JIT-compatible PTX vs. arch- or family-specific PTX).

Thus, a bug can occur when compiling with
-DCMAKE_CUDA_ARCHITECTURES="89;90a": the current code would wrongly
dispatch to PDL (Programmatic Dependent Launch) on sm_90/sm_120 in
forward-JIT mode.

This PR fixes the issue by checking cudaFuncAttributes::ptxVersion of
the incoming kernel at runtime. Checking ptxVersion alone is
sufficient, because the device code is always built for an
architecture >= ptxVersion (any violation of this would be a severe
bug in CUDA/nvcc); see:
https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-code-code-code

  • Implement MurmurHash3 mixer for better hash distribution

Magic constants were taken from boost:
https://github.com/boostorg/container_hash/blob/2698b43803c012601e6bb1a6116e83767b97986c/include/boost/container_hash/detail/hash_mix.hpp#L19-L65

  • Update ggml/src/ggml-cuda/common.cuh

Co-authored-by: Johannes Gäßler

  • Address review comments, make seed non-zero

  • Apply code-formatting

  • Replace std::size_t -> size_t for consistency

Co-authored-by: Johannes Gäßler
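The host-side guard described in the first commit bullet can be sketched in plain C++, so the decision logic compiles without a GPU. This is a minimal sketch, not llama.cpp's code: the helper `pdl_allowed` and the constant `kMinPdlPtxVersion` are hypothetical, and the threshold of 90 assumes PDL requires code built for compute capability 9.0 or newer. The only real API referenced is `cudaFuncGetAttributes`, which fills `cudaFuncAttributes::ptxVersion` (documented as major * 10 + minor of the virtual architecture).

```cpp
#include <cassert>

// Hypothetical helper (not the actual llama.cpp implementation).
// In real CUDA host code the input would come from the runtime, e.g.:
//   cudaFuncAttributes attr;
//   cudaFuncGetAttributes(&attr, kernel);  // fills attr.ptxVersion
//   bool use_pdl = pdl_allowed(attr.ptxVersion);
// ptxVersion is encoded as major * 10 + minor, e.g. 89 for compute_89.

// Assumption: PDL needs PTX for compute capability 9.0 (Hopper) or newer.
constexpr int kMinPdlPtxVersion = 90;

bool pdl_allowed(int ptx_version) {
    // Checking the PTX version alone suffices: the device code that runs is
    // always built for an architecture >= the PTX it came from, so a
    // ptxVersion of 90+ implies the kernel is running on an sm_90+ device.
    return ptx_version >= kMinPdlPtxVersion;
}
```

With -DCMAKE_CUDA_ARCHITECTURES="89;90a", a device that must JIT the compute_89 PTX would report a ptxVersion of 89, so `pdl_allowed` returns false even though `__CUDA_ARCH_LIST__` contains 900. The decision is made per kernel at runtime rather than from the compile-time architecture list.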
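The MurmurHash3 mixer from the second commit bullet can be illustrated with the well-known 64-bit MurmurHash3 finalizer (`fmix64`). Note that this is a swapped-in reference: the commit took its magic constants from Boost's `hash_mix.hpp`, which this sketch does not reproduce, and `hash_combine` / `hash_keys` are hypothetical helpers. Because `fmix64` maps 0 to 0, an all-zero input state stays zero after mixing; that is one plausible motivation for a non-zero seed, though the commit does not state its reason.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// MurmurHash3 64-bit finalizer with the constants from Austin Appleby's
// reference implementation (NOT the Boost constants used in the commit).
// It is a bijection on 64-bit values, so distinct inputs never collide.
uint64_t fmix64(uint64_t k) {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical combine step: folds one value into the running seed, then
// re-mixes. 0x9e3779b97f4a7c15 is the 64-bit golden-ratio constant.
uint64_t hash_combine(uint64_t seed, uint64_t value) {
    return fmix64(seed + 0x9e3779b97f4a7c15ULL + value);
}

// Hypothetical: hashes n keys, starting from a non-zero default seed.
uint64_t hash_keys(const uint64_t * keys, size_t n,
                   uint64_t seed = 0x9e3779b97f4a7c15ULL) {
    for (size_t i = 0; i < n; ++i) {
        seed = hash_combine(seed, keys[i]);
    }
    return seed;
}
```

The finalizer spreads every input bit across the whole output word, which is what improves bucket distribution compared to using raw keys or a plain XOR of them.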

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)
