Batched reward model inference and Best-of-N sampling

(raw.sh)

33 points | by rawsh 4 days ago

No comments yet.