MAI-Code-1-Flash

Mirrored from Hacker News — AI on Front Page for archival readability. Support the source by reading on the original site.

Introducing MAI-Code-1-Flash 

Superintelligence team
June 2, 2026

Today we’re introducing MAI-Code-1-Flash, a new Microsoft coding model built for fast, efficient assistance in everyday developer workflows. It is built end-to-end by Microsoft using clean and appropriately licensed data. The model is rolling out to GitHub Copilot individual users in Visual Studio Code, where it appears in the model picker and can be selected by the default Auto picker.

Features and capabilities

  • Agentic coding in real developer environments: trained and designed with the GitHub Copilot harness so that the two work better together.
  • Adaptive thinking: stays concise for simple requests and spends more reasoning budget on complex tasks.
  • Strong instruction-following across single-turn and multi-turn scenarios.

MAI-Code-1-Flash is designed around a simple goal: delivering high-quality coding help with better efficiency. It outperforms Claude Haiku 4.5 across coding benchmarks, with better price-to-performance.

A scatter plot compares coding models on pass rate vs. average token usage. MAI-Code-1-Flash (green) outperforms Claude Haiku 4.5 (orange) across benchmarks, with higher pass rates and lower token use in the highlighted “Ideal Zone.”

Built for developers, not benchmarks

Coding models are most useful when they perform well in the same environment developers use every day. That is why we built MAI-Code-1-Flash with production workflows at the center, rather than optimizing only for benchmarks. The model was trained directly with GitHub Copilot harnesses used in production. This allows it to learn how to interact with surrounding tools and systems in agentic coding tasks, making it uniquely well suited to real-world Copilot workflows compared to other available models.

During training, we evaluated checkpoints across core software engineering tasks, repository question answering, refactoring, and telemetry-grounded tasks adapted from real GitHub Copilot usage. This alignment between training, evaluation, and production helps offline improvements translate into real-world developer quality.

Designed to maximize value per token

MAI-Code-1-Flash was trained with adaptive solution length control, which helps the model adjust the depth of its response to the task. It can stay concise for simpler requests and spend more reasoning budget when a problem requires deeper analysis or broader code changes. In practice, this means developers start seeing useful output sooner. In our evaluations, MAI-Code-1-Flash solves harder problems with up to 60% fewer tokens than Claude Haiku 4.5 on SWE-Bench Verified. This helps reduce latency, lower cost, improve the return on each token, and make interactive workflows feel smoother.

Benchmark results in the production harness

To understand both quality and efficiency, we evaluated MAI-Code-1-Flash against Claude Haiku 4.5 on SWE-Bench Verified, SWE-Bench Pro, SWE-Bench Multilingual, and Terminal Bench 2 using the same production harness that developers use for their everyday coding tasks. We measured task success and the average number of solution tokens required to complete each task.
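The two measurements described above can be sketched as follows. This is a minimal illustration, not the evaluation code used for the post: the `TaskResult` record, its field names, and the demo values are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One benchmark task outcome (hypothetical record shape)."""
    task_id: str
    passed: bool
    solution_tokens: int


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Return the pass rate (percent) and mean solution tokens per task."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    pass_rate = 100.0 * sum(r.passed for r in results) / n
    avg_tokens = sum(r.solution_tokens for r in results) / n
    return {"pass_rate": pass_rate, "avg_tokens": avg_tokens}


# Made-up results for illustration only; not data from the post.
demo = [
    TaskResult("task-a", True, 12_000),
    TaskResult("task-b", False, 30_000),
    TaskResult("task-c", True, 18_000),
    TaskResult("task-d", True, 20_000),
]
summary = summarize(demo)
print(summary)
```

Averaging tokens over every attempted task, rather than over passing tasks only, is one possible convention; the post does not say which one was used.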

MAI-Code-1-Flash outperforms Claude Haiku 4.5 across all core coding benchmarks tested, with higher pass rates on all four evaluations, including a 16-point lead on the diverse, real-world tasks of SWE-Bench Pro (51.2% vs. 35.2%). It’s not just smarter; it’s leaner, solving harder problems with up to 60% fewer tokens on SWE-Bench Verified, which shows that higher accuracy and greater efficiency need not be a trade-off.
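To make the arithmetic behind these figures explicit, here is a small sketch that separates a percentage-point lead from a relative lead and shows how a "percent fewer tokens" figure is computed. The pass rates are the SWE-Bench Pro numbers quoted above; the token counts are hypothetical placeholders chosen only to illustrate a 60% reduction, and the helper names are ours.

```python
def point_lead(rate_a: float, rate_b: float) -> float:
    """Absolute gap between two pass rates, in percentage points."""
    return round(rate_a - rate_b, 1)


def relative_lead(rate_a: float, rate_b: float) -> float:
    """Gap between two pass rates as a percent of the baseline rate_b."""
    return round(100.0 * (rate_a - rate_b) / rate_b, 1)


def token_reduction(tokens_a: float, tokens_b: float) -> float:
    """Percent fewer tokens used by A relative to baseline B."""
    return round(100.0 * (1.0 - tokens_a / tokens_b), 1)


# Pass rates quoted in the post for SWE-Bench Pro.
lead_points = point_lead(51.2, 35.2)
lead_relative = relative_lead(51.2, 35.2)

# Hypothetical average token counts, for illustration only.
reduction = token_reduction(4_000, 10_000)

print(lead_points, lead_relative, reduction)
```

The 16-point figure is an absolute difference in pass rate; expressed relative to the 35.2% baseline, the same gap is roughly 45%.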

A comparison table of coding benchmarks for MAI-Code-1-Flash and Claude Haiku 4.5, showing pass rates and average token usage for four benchmarks, with MAI-Code-1-Flash outperforming in all categories.

Math, science, instruction following, and agentic coding tasks

Bar chart comparing four benchmark scores (IF Bench, Advanced IF, Robust IF, τ²-Bench) for MAI-Code-1-Flash and Claude Haiku 4.5, with MAI-Code-1-Flash consistently scoring higher in all categories.

MAI-Code-1-Flash comes out ahead on every benchmark shown, with the widest margin on IF Bench, which tests precise instruction following (+28.9 points), and the narrowest on the rubric-based Advanced IF (+14.5 points). This strong instruction following carries over to agentic tool use.

MAI-Code-1-Flash also outperforms Claude Haiku 4.5 on core reasoning capabilities in math, science, and visual generation coding.

A comparison table shows benchmarks for MAI-Code-1-Flash and Claude Haiku 4.5, listing accuracy and average token usage (K) for tasks like math, science, text reasoning, and coding. MAI-Code-1-Flash leads in all benchmarks.

Standard benchmarks reward memorization as much as reasoning. For example, a model that has seen the Monty Hall problem will answer it correctly, but invert the prizes and it fails. We built a 186-question, 34-category benchmark around adversarial traps like inverted classics, impossible tasks, and underdetermined scenarios to see whether models are actually reasoning or just pattern-matching. MAI-Code-1-Flash surpasses Claude Haiku 4.5 overall, reaching 85.8% adjusted accuracy, with especially strong performance in reasoning, instruction following, and recognizing impossible problems. We also see room for the model to grow, since core adversarial categories like Einstellung traps remained below 50% accuracy.
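A per-category breakdown like the one described can be sketched as below. This is an illustrative tally under stated assumptions, not the benchmark's scoring code: the category labels and graded answers are made up, and because the post does not define "adjusted accuracy", the sketch computes plain accuracy only.

```python
from collections import defaultdict


def accuracy_by_category(graded):
    """Map category -> fraction correct from (category, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, is_correct in graded:
        totals[category] += 1
        correct[category] += int(is_correct)
    return {c: correct[c] / totals[c] for c in totals}


def weak_categories(per_category, threshold=0.5):
    """Return categories whose accuracy falls strictly below the threshold."""
    return sorted(c for c, acc in per_category.items() if acc < threshold)


# Hypothetical graded answers; labels and outcomes are illustrative only.
graded = [
    ("inverted_classic", True),
    ("inverted_classic", True),
    ("impossible_task", True),
    ("impossible_task", False),
    ("einstellung_trap", False),
    ("einstellung_trap", False),
    ("einstellung_trap", True),
]
per_category = accuracy_by_category(graded)
weak = weak_categories(per_category)
print(per_category, weak)
```

Reporting accuracy per category rather than only overall is what makes a weak spot visible, since a category scoring under 50% can be hidden by a high aggregate score.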

Try it out

MAI-Code-1-Flash is now rolling out to VS Code GitHub Copilot individual users. No additional setup is required. As the rollout progresses, you may see GitHub Copilot route tasks to MAI-Code-1-Flash through the Auto picker, or see the model available directly in the model picker.

Here are a few fun sample apps we built with MAI-Code-1-Flash in VS Code:

We would love to hear from you! Please join the GitHub Community to share your feedback.

Build the Future With Us

We’re a lean, fast-moving lab made up of some of the world’s most talented minds. We have an exciting roadmap of compute at MAI, with our next-generation GB200 cluster now operational. And we have an ambitious mission we truly believe in. We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!
