Show HN: LLM Debate Benchmark

(github.com)

7 points | by zone411 9 hours ago

3 comments

valentinconan 9 hours ago

Nice work! Do you plan to run tests with Opus 4.6 using Max reasoning? I'm curious to see whether there's really a difference compared to “High” mode.