Lossless LLM compression for efficient GPU inference via dynamic-length float

(arxiv.org)

411 points | by CharlesW 6 days ago

121 comments