LLM Speedrunner: Eval for frontier models to reproduce scientific findings

(github.com)

1 point | by zerojames 6 hours ago