r/LocalLLaMA · June 9, 2026 · 1 min read

Releasing Cohere North Mini Code

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Hi folks! Jay here from Cohere. we just officially launched North Mini Code after getting some great feedback from you guys this weekend on the unreleased version. I wanted to come here and answer some of the questions you asked and provide some extra detail about the model itself.

You can download the weights on Hugging Face (fp8 here) or try it on OpenCode for free. if you want to read more about what I mentioned in the video, feel free to look at our technical blog post on HuggingFace as well as the announcement post!

If you're deploying with vllm, please use vLLM main for North Mini Code until a new release is available, and accurate response parsing also requires installing Cohere’s melody library.

uv pip install "git+https://github.com/vllm-project/vllm.git" uv pip install cohere_melody>=0.9.0

Then the vllm server can be started with the following command:

vllm serve CohereLabs/North-Mini-Code-1.0 \ -tp 2 \ --max-model-len 320000 \ --tool-call-parser cohere_command4 \ --reasoning-parser cohere_command4 \ --enable-auto-tool-choice

A couple of PRs were pushed to make this work better based on your feedback.

Useful tidbits from the previous post:

/u/germangrower69 points out a 3rd party MLX version here
We hear you on quantization and llama.cpp and we're flagging that internally.

if you have any questions or feedback, don't hesitate. We're really interested in seeing your builds and any problems you run into so we can build even better models for devs in the future. Really excited to hear what you think! Thanks again for all your help on this.

submitted by /u/jayalammar
[link] [comments]

Discussion (0)

No comments yet. Sign in and be the first to say something.

Discussion (0)

More from r/LocalLLaMA