Latent Introspection: Models Can Detect Prior Concept Injections

(arxiv.org)

2 points | by tosh 2 days ago

No comments yet.