VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

(arxiv.org)

333 points | by timhigins 16 hours ago

175 comments