v0.30.0-rc32: llama-server followups (#16353)
Mirrored from Ollama releases for archival readability. Support the source by reading on the original site.
- llama-server followups
  Misc fixes for #16031
  - Add back dropped ROCm build flag for multi-GPU support on Windows
  - Fix amdhip64_*.dll version detection for "latest" selection
  - Fix embeddings API so normalization behavior is consistent with prior versions
- ci: set up for automated llama.cpp update testing
- reduce batch size when flash attention (fa) is disabled or VRAM is constrained
- mlx: fix v3 load bug on M5
  Imagegen was incorrectly loading v3 first. This DRYs out the loading code so imagegen gets the same new v4/v3 selection logic.
- fix reload bug on embedding models
- bump version
- steer users on how to enable the iGPU when it is disabled
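
The embeddings fix listed above does not spell out what the normalization behavior is. As a rough sketch, assuming it means scaling each embedding vector to unit L2 length (an assumption, not something these notes confirm), a client could check or re-normalize vectors on its side as follows. The helper names `l2_normalize` and `is_unit_length` are hypothetical and are not part of any Ollama API.

```python
import math


def l2_normalize(vector):
    """Return a copy of `vector` scaled to unit L2 length.

    Assumption: "normalize" in the release notes refers to L2 normalization.
    A zero vector is returned unchanged to avoid dividing by zero.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def is_unit_length(vector, tol=1e-6):
    """Report whether `vector` already has L2 norm 1 within `tol`."""
    norm = math.sqrt(sum(x * x for x in vector))
    return math.isclose(norm, 1.0, abs_tol=tol)


if __name__ == "__main__":
    # Made-up vector standing in for an embedding returned by a server.
    raw = [3.0, 4.0]
    print(is_unit_length(raw))                # False: the norm is 5
    print(is_unit_length(l2_normalize(raw)))  # True after re-normalizing
```

Clients that compare embeddings across versions with a dot product may find a check like this useful, since dot products match cosine similarity only when both vectors have unit length.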