r/LocalLLaMA

OpenMythos benchmarks

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Hey everyone! The OpenMythos benchmarks are finally here. Sorry it took about a week to post these.

The delay was mainly because my SWE-bench results weren't matching the official Qwen 3.6 27B numbers. Turns out Qwen used a different eval harness and also refined/filtered the benchmark problems. Even their previous 3.5 version's benchmark score (72.4 on SWE Verified) doesn't match the numbers published with 3.6 (75 on SWE Verified).

https://preview.redd.it/n1hoj90rw29h1.png?width=1351&format=png&auto=webp&s=fb03ba37f908b8b5cc1c170434084dc47cd3ced9

Anyway, here are the results across SWE-bench Pro, CyberGym, and Cybench.
OpenMythos holds up pretty well for a small cybersecurity-focused model! But it has the capability to do better, so I'll train it further.

Also, huge thanks to u/giveen for the GGUF version: https://huggingface.co/jabbatheduck/OpenMythos-GGUF

Demo: https://huggingface.co/spaces/build-small-hackathon/OpenMythos

Model: https://huggingface.co/build-small-hackathon/OpenMythos

submitted by /u/RealKingNish