r/LocalLLaMA · June 9, 2026 · 1 min read

Apple announced a new on-device inference engine for Apple Silicon

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

This news seems to have flown under the radar. Apple announced CoreAI at WWDC, which is basically a future replacement for CoreML and an alternative to MLX/llama.cpp/torch for optimized on-device inference, especially on phones and tablets.

The model weights need to be converted via a Python script, similar to CoreML. Atm the list of supported models is mostly from mid-2025, though: https://github.com/apple/coreai-models/tree/main/models . For anyone wondering how this is anything new: out of the box, CoreML didn't even support models beyond a few billion params and had a very limited pool of supported operations. This implies a big update to ANE ops too.

There's nothing on performance yet, and it is very likely inferior to pure MLX on GPU atm. The only other interesting thing is that they boast of a 20B model deployed on device for foundation models https://machinelearning.apple.com/research/introducing-third-generation-of-apple-foundation-models , which looks to be a lazily loaded MoE, so perhaps CoreAI will allow deploying larger models with apps as well.
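For a rough sense of why lazy loading would matter at that size, here is a back-of-the-envelope sketch of the weight footprint of a 20B-parameter model. The bit widths are assumptions picked for illustration; nothing here reflects the quantization Apple actually uses, and `weight_footprint_gib` is a made-up helper, not part of CoreAI.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (no KV cache, no activations)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


TOTAL_PARAMS = 20e9  # the 20B figure mentioned in the post

# Assumed bit widths, for illustration only.
for bits in (16, 8, 4, 2):
    size = weight_footprint_gib(TOTAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:.1f} GiB")
```

Even at 4 bits the full set of weights comes to roughly 9 GiB, more than most phones have in total RAM, so keeping only part of an MoE resident at a time would be the obvious way to make it fit. What fraction stays resident is not stated anywhere in the announcement.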

submitted by /u/bakawolf123