Scalable Training of Mixture-of-Experts Models with Megatron Core

(arxiv.org)

2 points | by matt_d 10 hours ago

No comments yet.