New DeepSWE benchmark finds Claude Opus cheats
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.