r/LocalLLaMA · 1 min read

DeepSWE benchmarks indicate that DeepSeek v4 Pro only passes 8% of tasks

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Is this accurate? I use DeepSeek v4 in OpenCode and find it nearly on par with Sonnet 4.6, so I'm surprised the score is so low.

https://preview.redd.it/u9ccy5h8hg4h1.png?width=2042&format=png&auto=webp&s=1a7ccb98d449a07c87621703d1af2851fdbd4afe

https://deepswe.datacurve.ai/

submitted by /u/Federal_Spend2412