Mellum2 local deployments
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hey local community, I work at JetBrains with the team that trained the Mellum2 models (12B-2.5A LLMs). These models are trained completely from scratch and target fast inference: our primary goal was production deployment on H100/H200 GPUs, but they work well for local deployments too.

We open-sourced a few checkpoints on HF earlier this month and also published the full technical report on arXiv. Our benchmarks show that the models perform as well as other small language models (SLMs) while providing significantly higher throughput under concurrent load (pic attached).

Various GGUFs are now available on ollama and HF as well, and we would really like to hear your feedback. What works well for you, and what doesn't? What are your expectations from such small models, and do we meet them? What's your hardware setup, and is this model useful for you?