r/LocalLLaMA · May 31, 2026

DeepSWE benchmarks indicate that DeepSeek v4 Pro passes only 8% of tasks


Is this accurate? I use DS v4 in OpenCode and find it nearly on par with Sonnet 4.6, so I'm surprised the score is so low.
