I post-trained Qwen3-Coder to effectively use a real debugger harness (stepping through code, inspecting runtime state, and reasoning from execution traces) instead of reasoning only over the static code.
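For readers unfamiliar with what "inspecting runtime state" gives a model, here is a minimal sketch using Python's `sys.settrace`. It is not the harness used in this work, which the post does not describe. The names `trace_locals` and `buggy_mean` are hypothetical and exist only for illustration.

```python
import sys

def trace_locals(func, *args):
    """Run func(*args), recording (relative line, locals snapshot) for each executed line."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # only trace the target function's own frame
        if event == "line":
            rel_line = frame.f_lineno - code.co_firstlineno
            steps.append((rel_line, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the trace hook
    return result, steps

def buggy_mean(xs):
    total = 0
    for x in xs:
        total += x
    return total / (len(xs) - 1)  # bug: should divide by len(xs)

result, steps = trace_locals(buggy_mean, [2, 4, 6])
for rel_line, local_vars in steps:
    print(rel_line, local_vars)
print("result:", result)
```

The trace shows `total` reaching 12 correctly before the return line, which points at the divisor rather than the loop. That is the kind of evidence static reading alone does not surface.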
Results on 27 held-out bug tasks:
- Base model + debugger harness (no RL): 67% solve rate, only ~5% fewer turns
- After post-training with RL: 70% → 89% solve rate, 46 → 19 median turns (−59%)
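As a quick sanity check on the numbers above, the sketch below recomputes the relative change in median turns and converts each solve rate to the nearest task count out of 27. The conversion assumes each percentage is a rounded single-run count over the 27 tasks, which the post does not state; the helper names are hypothetical.

```python
TASKS = 27  # held-out bug tasks, from the post

def relative_change(before, after):
    """Signed fractional change from before to after."""
    return (after - before) / before

def nearest_count(rate, total=TASKS):
    """Nearest whole number of solved tasks for a reported rate (assumes a single-run count)."""
    return round(rate * total)

# Median turns: 46 -> 19
print(f"{relative_change(46, 19):.0%}")  # -59%

# Solve rates as task counts under the single-run assumption
for label, rate in [("base + harness", 0.67), ("before RL", 0.70), ("after RL", 0.89)]:
    solved = nearest_count(rate)
    print(f"{label}: {solved}/{TASKS} = {solved / TASKS:.1%}")
```

Under that assumption, 67% versus 70% is a difference of one task (18 versus 19 of 27), while 89% corresponds to 24 of 27.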
Access to the harness alone did not help the base model. The gains came from training the model to use the debugger effectively.
More detailed results and a full blog post coming soon. Happy to answer questions in the meantime!