DeepSeek: Inference-Time Scaling for Generalist Reward Modeling

(arxiv.org)

158 points | by tim_sw 3 days ago

34 comments