Probes trace an emergent jailbreak in OLMo 2 to mislabeled training data

(lesswrong.com)

1 point | by aranguri 9 hours ago

No comments yet.