You can now convert EXL3 quants on Apple Silicon Macs
Mirrored from r/LocalLLaMA.
Hi, I'm back with an update, and this time it's bigger news for local LLMs. Access to high-fidelity quants like EXL3 has normally been gated on CUDA, and getting 96GB-128GB of VRAM out of RTX cards means specialized, expensive hardware. Apple Silicon Macs with 64GB+ of unified memory, on the other hand, are easy to find: they don't come cheap, but ordinary people can buy them. On macOS and Apple Silicon you can now run inference with EXL3 models and even convert them yourself. I've done it with MiniCPM5 and Qwen3.6-27B. The mean KLD of my MiniCPM5 conversion is on par with a model converted on an RTX card, and Qwen3.6-27B is just a tiny bit behind.
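For anyone unfamiliar with the KLD comparison: the post doesn't say exactly how it was measured, but a common approach is to run the original model and the quant over the same text and average, over every token position, the KL divergence between their next-token distributions (lower means the quant behaves more like the original). Below is a minimal pure-Python sketch of that calculation, assuming you already have per-position logits from both models. The function names and the KL(reference || quant) direction are my own illustrative choices, not anything from PonyExl3 or exllamav3.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(P || Q) = sum_i p_i * log(p_i / q_i); positions where p_i == 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(ref_logits, quant_logits):
    # ref_logits / quant_logits: one list of vocabulary logits per token position,
    # from the reference (unquantized) model and the quantized model respectively.
    if len(ref_logits) != len(quant_logits):
        raise ValueError("both models must be scored on the same token positions")
    kls = [kl_divergence(softmax(r), softmax(q))
           for r, q in zip(ref_logits, quant_logits)]
    return sum(kls) / len(kls)
```

A quant that reproduces the reference logits exactly scores 0.0; the further its distributions drift, the larger the mean KLD. Real evaluations do the same thing on GPU/accelerator tensors over many thousands of tokens, but the arithmetic is identical.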
If you don't know EXL3, it's wonderful work from turboderp and co., with the best quality-to-size ratio you can get on a consumer machine. In general it's roughly half a bit per weight better than MLX quants.
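To put "half a bit per weight" in perspective, here is a back-of-the-envelope Python sketch of weight storage at a given bits-per-weight (bpw). It assumes the parameter count is the nominal "27B" from the model name and counts quantized weights only; the 4.0 and 4.5 bpw values are arbitrary examples, not settings from the post.

```python
def weight_gb(num_params, bits_per_weight):
    # Quantized weight storage only: params * bits / 8 bits per byte, reported in GB (1e9 bytes).
    # Ignores higher-precision layers, quantization scales, KV cache and runtime overhead.
    return num_params * bits_per_weight / 8 / 1e9

# Nominal parameter count taken from the "27B" in the model name (an approximation).
params = 27e9
for bpw in (4.0, 4.5):
    print(f"{bpw} bpw -> {weight_gb(params, bpw):.1f} GB")
print(f"0.5 bpw difference -> {weight_gb(params, 0.5):.1f} GB")
```

For a nominal 27B model, each half bit per weight is roughly 1.7 GB of weights, so matching quality at 0.5 bpw less frees that much memory (or lets you spend it on a higher-bpw quant).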
Grab it here (Apache 2.0): https://github.com/beamivalice/PonyExl3
Cheers,
Beam