r/LocalLLaMA · June 29, 2026 · 1 min read

Mellum2 local deployments

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Hey local community,

I work at JetBrains with the team that trained the Mellum2 models, which are 12B-2.5A LLMs. The models are trained completely from scratch and target fast inference: our primary goal was production deployment on H100/H200s, but local deployments work well too. We open-sourced a few checkpoints on HF earlier this month and also published the full technical report on arXiv.

Our benchmarks show that the models perform on par with other small language models (SLMs) while providing significantly higher throughput under concurrent load (pic attached).

Various GGUFs are now available on Ollama and HF as well, and we would really like to hear your feedback. What works well for you, and what doesn't? What do you expect from models this small, and do we meet those expectations? What's your hardware setup, and is this model useful for you?

https://preview.redd.it/6j02yvpc68ah1.png?width=1080&format=png&auto=webp&s=c95f9fb12ec8df3533ced68cd6bcbf81bdefc9ba

submitted by /u/topshik59