b9426: llama : do not skip iGPU when only RPC devices are present (#23868)
Mirrored from llama.cpp releases for archival readability. Support the source by reading on the original site.
After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device
selection logic dropped the local iGPU whenever any RPC server was added,
because RPC devices made model->devices non-empty. On systems where the
"iGPU" is the main compute device (e.g. Strix Halo with 128 GiB of unified
memory), this caused all tensors to be allocated on the RPC peer alone and
model loading to fail.
Gate the iGPU inclusion on gpus.empty() instead, so RPC peers no longer
suppress the local iGPU.
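
The change in the gate can be illustrated with a minimal, self-contained sketch. This is not the llama.cpp source: `DevType`, `Device`, `select_devices` and the `fixed` flag are hypothetical names introduced here, and the ordering of RPC devices ahead of local GPUs is an assumption made for illustration. Only the two conditions being compared (`model->devices` being empty versus `gpus.empty()`) come from the description above.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the real backend device types.
enum class DevType { GPU, IGPU, RPC };

struct Device {
    std::string name;
    DevType     type;
};

// Build a default device list from the available devices.
// fixed == false: old gate, iGPUs are added only if nothing was selected yet.
// fixed == true:  new gate, iGPUs are added whenever no discrete GPU exists.
std::vector<Device> select_devices(const std::vector<Device> & available, bool fixed) {
    std::vector<Device> gpus, igpus, rpc_servers;
    for (const Device & d : available) {
        switch (d.type) {
            case DevType::GPU:  gpus.push_back(d);        break;
            case DevType::IGPU: igpus.push_back(d);       break;
            case DevType::RPC:  rpc_servers.push_back(d); break;
        }
    }

    // Assumed ordering for this sketch: RPC peers first, then discrete GPUs.
    std::vector<Device> devices;
    devices.insert(devices.end(), rpc_servers.begin(), rpc_servers.end());
    devices.insert(devices.end(), gpus.begin(), gpus.end());

    // The old gate is false as soon as an RPC peer is present, so the
    // local iGPU is never added. The new gate ignores RPC peers.
    const bool add_igpus = fixed ? gpus.empty() : devices.empty();
    if (add_igpus) {
        devices.insert(devices.end(), igpus.begin(), igpus.end());
    }
    return devices;
}
```

With one iGPU and one RPC peer as input, the old gate yields a list containing only the RPC peer, while the new gate yields both devices. When a discrete GPU is present, both gates leave the iGPU out.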
closes: #23858