r/MachineLearning · May 31, 2026 · 1 min read

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama) [P]


Hey! I'm a CS student and I got tired of not being able to compare MLX inference engines properly — every benchmark out there is either made by the engine's own developers, runs on an M3 Ultra nobody has, or just shows tok/s with zero context.

So I built mlx-Chronos — a small open source CLI tool that runs a standardized benchmark protocol on your Mac and lets you submit your results to a shared community leaderboard.

What it measures:

Cold and cached TTFT (Time to First Token), with a proper methodology — unique prompts per trial, cache priming, no interleaved phases
Throughput (tok/s), with mean/stddev/min/max across repeated trials
Engine process RSS and peak system RAM usage, sampled continuously during inference
Thermal state and hardware info
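For anyone wondering what "cold vs cached TTFT" and the per-trial statistics look like in practice, here's a minimal Python sketch of that style of protocol. It is not mlx-Chronos's actual code. `time_stream`, `summarize`, `run_protocol`, and `fake_engine` are hypothetical names, `fake_engine` stands in for a real engine's streaming API, and tok/s is assumed here to mean decode throughput (tokens after the first one, divided by the time elapsed after the first token). Memory sampling is left out.

```python
import statistics
import time
import uuid
from typing import Callable, Iterator

Engine = Callable[[str], Iterator[str]]  # assumed interface: prompt -> token stream


def time_stream(generate: Engine, prompt: str) -> tuple[float, float, int]:
    """Time one streamed completion: (ttft_seconds, decode_tok_per_s, n_tokens)."""
    start = time.perf_counter()  # start before calling, so request setup counts toward TTFT
    first = None
    last = start
    n = 0
    for _ in generate(prompt):
        now = time.perf_counter()
        if first is None:
            first = now
        last = now
        n += 1
    if first is None:
        raise ValueError("engine produced no tokens")
    decode_time = last - first
    # Assumed definition: decode throughput excludes the first token (prefill).
    tps = (n - 1) / decode_time if n > 1 and decode_time > 0 else 0.0
    return first - start, tps, n


def summarize(values: list[float]) -> dict[str, float]:
    """Mean/stddev/min/max across repeated trials."""
    return {
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def run_protocol(generate: Engine, trials: int = 3) -> dict[str, dict[str, float]]:
    # Phase 1 (cold): a unique prompt per trial, so no prefix cache can be reused.
    cold, tps = [], []
    for i in range(trials):
        prompt = f"[cold {i} {uuid.uuid4().hex}] Explain unified memory."
        ttft, rate, _ = time_stream(generate, prompt)
        cold.append(ttft)
        tps.append(rate)
    # Phase 2 (cached): runs only after every cold trial is done (no interleaving).
    # Each prompt is primed once untimed, then re-run and timed.
    cached = []
    for i in range(trials):
        prompt = f"[cached {i} {uuid.uuid4().hex}] Explain unified memory."
        for _ in generate(prompt):  # priming pass, output discarded
            pass
        ttft, _, _ = time_stream(generate, prompt)
        cached.append(ttft)
    return {
        "cold_ttft_s": summarize(cold),
        "cached_ttft_s": summarize(cached),
        "tok_per_s": summarize(tps),
    }


# Hypothetical stand-in engine: slow prefill for unseen prompts, fast for cached ones.
_seen: set[str] = set()


def fake_engine(prompt: str) -> Iterator[str]:
    time.sleep(0.005 if prompt in _seen else 0.05)  # simulated prefill
    _seen.add(prompt)
    for _ in range(20):
        time.sleep(0.002)  # simulated per-token decode
        yield "tok"


if __name__ == "__main__":
    for metric, stats in run_protocol(fake_engine).items():
        print(metric, {k: round(v, 4) for k, v in stats.items()})
```

Running every cold trial before any cached trial means a warm cache from one phase can't leak into the other. Unique prompts give each cold trial a fresh prefill, even on engines that keep a prefix cache between requests.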

Supported engines: oMLX, Rapid-MLX, mlx-lm, Ollama (MLX backend)

The leaderboard is basically empty right now since I only have an M2 8GB. Would love results from M3 Max, M4, M4 Ultra, or anything with more RAM. That's where things actually get interesting.

→ Leaderboard: https://igurss.github.io/mlx-chronos
→ GitHub: https://github.com/igurss/mlx-chronos
→ Install: pip install mlx-chronos

It's early, the methodology is documented (there's a methodology.md if you want to pick it apart), and I'm 100% open to feedback, contributions, and getting told what I'm doing wrong. The goal is just to have one place where you can compare engines on your specific hardware instead of trusting someone else's numbers.

submitted by /u/igor__004