DSpark: Speculative decoding accelerates LLM inference [pdf]

(github.com)

653 points | by aurenvale 9 hours ago

245 comments