Terminal-Bench Challenges: long-horizon, token-intensive, single-task benchmarks

(tbench.ai)

2 points | by matt_d 8 hours ago

No comments yet.