vLLM large scale serving: DeepSeek 2.2k tok/s/h200 with wide-ep

(blog.vllm.ai)

147 points | by robertnishihara 2 months ago

60 comments