Benchmarked inference engines on an M1 Max 64GB: results & analysis
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I'm a hobbyist on a budget, running local inference on an M1 Max MacBook Pro with Hermes Agent. I've spent a lot of time researching which inference engine to use, and there's probably no single right answer.
This caught my attention today: https://www.reddit.com/r/LocalLLM/comments/1ts3how/i_built_mlxchronos_a_community_benchmark/
I ran the dev's mlx-chronos (github.com/igurss/mlx-chronos) across rapid-mlx, omlx, mlx-lm, and ollama, using Qwen3.5-4B on an M1 Max 64GB. I submitted the results to the mlx-chronos community leaderboard.
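For anyone wondering what a speed number in a comparison like this usually means, here is a minimal sketch of timing a streamed generation. It is not how mlx-chronos measures anything (I haven't checked its internals). `measure_throughput` and the `generate` callable are hypothetical names made up for illustration, and you would need to wrap each engine's own streaming call yourself.

```python
import time
from typing import Callable, Iterable


def measure_throughput(generate: Callable[[], Iterable[str]],
                       clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time one streamed generation.

    `generate` is a hypothetical zero-argument callable that yields tokens
    one at a time; it stands in for whatever streaming API an engine exposes.
    """
    start = clock()
    first = None
    count = 0
    for _ in generate():
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        count += 1
    end = clock()

    if count == 0:
        raise ValueError("generator produced no tokens")

    decode_time = end - first
    # The first token's latency includes prompt processing, so the decode
    # rate is computed over the remaining tokens only.
    rate = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return {
        "tokens": count,
        "ttft_s": first - start,       # time to first token
        "decode_tok_per_s": rate,      # steady-state generation speed
        "total_s": end - start,
    }
```

Separating time to first token from decode rate matters because an engine can be quick at prompt processing and slow at generation, or the other way around, and a single tokens-per-second figure hides that.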
Full write-up with charts: https://bright-lotus-8q5y.here.now . Credit to Claude Code for the webpage and analysis.
Short version: rapid-mlx leads on speed and memory efficiency. I'm using it to serve Qwen 35b-A3b.
Thanks to u/igor__004 for his fine work.