SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations

(surgehq.ai)

20 points | by landonxi 13 hours ago

1 comment