GLM-5.2 is on DeepSWE
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
| Side note, why does this sub dislike DeepSWE? I want to know more and did some research and found this post which has since been retracted by the original author (highly respect them as they handled the correction well and admitted bias) Another criticism was Opus 4.6 being low, which is true, but Opus 4.6 also dropped in swe-rebench since February, as I assume it's being deprecated. I'm interested in other opinions and what you think is a good benchmark. One thing that is true is that DeepSeek scores were done before the 75% discount on the v1 bench. They should be ~4-5x cheaper. [link] [comments] |
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.