Top model scores may be skewed by Git history leaks in SWE-bench

(github.com)

359 points | by mustaphah 12 hours ago

114 comments