Q8 KV cache lets a 30B model fit 100K context on a 24 GB RTX 5090

(buraak.com)

2 points | by bozdemir 7 hours ago

No comments yet.