Long-Context Attention from Kernel Efficiency to Distributed Context Parallelism

(arxiv.org)

1 point | by PaulHoule 10 hours ago

No comments yet.