llama.cpp releases · May 30, 2026

b9426: llama : do not skip iGPU when only RPC devices are present (#23868)

Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.

After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device
selection logic dropped the local iGPU whenever any RPC server was added,
because RPC devices made model->devices non-empty. On systems where the
"iGPU" is the main compute device (e.g. Strix Halo with 128 GiB of unified
memory), this caused all tensors to be allocated on the RPC peer alone and
model loading to fail.

Gate the iGPU inclusion on gpus.empty() instead, so RPC peers no longer
suppress the local iGPU.
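
The effect of the changed condition can be illustrated with a small, self-contained model of the default device selection. This is a sketch under stated assumptions, not the llama.cpp source: Device, DeviceKind, select_devices and the gate_on_gpus flag are hypothetical names invented for the example, and the ordering of RPC peers ahead of discrete GPUs is assumed. Only the two conditions being compared come from the description above (the final device list being empty versus gpus.empty()).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not llama.cpp types.
enum class DeviceKind { Gpu, IntegratedGpu, Rpc };

struct Device {
    std::string name;
    DeviceKind  kind;
};

// Simplified default device selection.
//   gate_on_gpus == false models the old check: add iGPUs only if the
//                         device list built so far is empty.
//   gate_on_gpus == true  models the fixed check: add iGPUs whenever no
//                         discrete GPU was found, regardless of RPC peers.
std::vector<Device> select_devices(const std::vector<Device> & available,
                                   bool gate_on_gpus) {
    std::vector<Device> gpus;
    std::vector<Device> igpus;
    std::vector<Device> rpc_servers;

    // Sort the available devices into buckets by kind.
    for (const Device & dev : available) {
        switch (dev.kind) {
            case DeviceKind::Gpu:           gpus.push_back(dev);        break;
            case DeviceKind::IntegratedGpu: igpus.push_back(dev);       break;
            case DeviceKind::Rpc:           rpc_servers.push_back(dev); break;
        }
    }

    // Assumed ordering: RPC peers first, then discrete GPUs.
    std::vector<Device> selected;
    selected.insert(selected.end(), rpc_servers.begin(), rpc_servers.end());
    selected.insert(selected.end(), gpus.begin(), gpus.end());

    // The condition that the fix changes.
    const bool include_igpus = gate_on_gpus ? gpus.empty() : selected.empty();
    if (include_igpus) {
        selected.insert(selected.end(), igpus.begin(), igpus.end());
    }
    return selected;
}
```

With one iGPU and one RPC peer available, select_devices(available, false) returns only the RPC peer, which matches the failure described above, while select_devices(available, true) returns both devices. When a discrete GPU is present, both variants leave the iGPU out.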

closes: #23858
