ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math

(firethering.com)

17 points | by steveharing1 2 hours ago

14 comments