I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) [P]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Hey! I'm a CS student and I got tired of not being able to compare MLX inference engines properly — every benchmark out there is either made by the engine's own developers, runs on an M3 Ultra nobody has, or just shows tok/s with zero context.
So I built mlx-Chronos — a small open source CLI tool that runs a standardized benchmark protocol on your Mac and lets you submit your results to a shared community leaderboard.
What it measures:
- Cold and cached TTFT (Time to First Token), with a proper methodology — unique prompts per trial, cache priming, no interleaved phases
- Throughput (tok/s), with mean/stddev/min/max across repeated trials
- Engine process RSS and system RAM peak, sampled continuously during inference
- Thermal state and hardware info
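To make the timing metrics above concrete, here is a minimal sketch of how TTFT and throughput could be taken from a streamed response and summarized across trials. This is not mlx-Chronos's actual code: `time_one_trial`, `summarize`, and `cold_prompt` are hypothetical names, the token stream stands in for whatever an engine's streaming API yields, and the random prompt prefix assumes the engine's cache matches prompts from the start.

```python
import statistics
import time
import uuid
from typing import Callable, Dict, Iterable, List


def cold_prompt(base: str) -> str:
    """Return a prompt that is unique per call (hypothetical helper).

    Assumption: the engine caches by prompt prefix, so a random prefix
    keeps a cold trial from reusing work done in an earlier trial.
    """
    return f"[{uuid.uuid4().hex}] {base}"


def time_one_trial(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Consume a token stream and return TTFT (seconds) and decode tok/s."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in stream:
        last = clock()
        if first is None:
            first = last  # arrival time of the first token
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    # Throughput covers the tokens generated after the first one,
    # so prompt processing time does not leak into tok/s.
    tok_s = (count - 1) / decode_time if decode_time > 0 else 0.0
    return {"ttft_s": first - start, "tok_s": tok_s}


def summarize(samples: List[float]) -> Dict[str, float]:
    """Mean, sample stddev, min, and max across repeated trials."""
    return {
        "mean": statistics.fmean(samples),
        "stddev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }
```

For a cached-TTFT trial under the same assumptions, you would send one priming request with a fixed prompt first, then time repeated requests with that same prompt instead of calling `cold_prompt`.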
Supported engines: oMLX, Rapid-MLX, mlx-lm, Ollama (MLX backend)
The leaderboard is basically empty right now since the only machine I have is an M2 with 8GB of RAM. Would love results from M3 Max, M4, M4 Ultra, or anything with more RAM, since that's where things actually get interesting.
→ Leaderboard: https://igurss.github.io/mlx-chronos
→ GitHub: https://github.com/igurss/mlx-chronos
→ Install: pip install mlx-chronos
It's early, the methodology is documented (there's a methodology.md if you want to pick it apart), and I'm 100% open to feedback, contributions, and getting told what I'm doing wrong. The goal is just to have one place where you can compare engines on your specific hardware instead of trusting someone else's numbers.