Tokasaurus: An LLM inference engine for high-throughput workloads

(scalingintelligence.stanford.edu)

213 points | by rsehrlich a day ago

31 comments