
Benchmarked inference engines for M1 Max 64GB: results & analysis

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I'm a hobbyist on a budget, using an M1 Max MacBook Pro for local inference with Hermes Agent. I've endlessly researched which inference engine to use, and there's probably no single right answer.

This caught my attention today: https://www.reddit.com/r/LocalLLM/comments/1ts3how/i_built_mlxchronos_a_community_benchmark/

I ran the dev's mlx-chronos (github.com/igurss/mlx-chronos) against rapid-mlx, omlx, mlx-lm, and ollama, using Qwen3.5-4B on an M1 Max 64GB. I submitted the results to the mlx-chronos community leaderboard.

Full write-up with charts: https://bright-lotus-8q5y.here.now . Credit to Claude Code for the webpage and analysis.

Short version: rapid-mlx leads on speed and memory efficiency. I'm using it to serve Qwen 35b-A3b.

Thanks to u/igor__004 for his fine work.

submitted by /u/jarec707