Kimi introduces Attention Residuals: 1.25x compute performance at <2% overhead

(arxiv.org)

7 points | by nekofneko 6 hours ago

No comments yet.