FP8 is ~100 tflops faster when the kernel name has "cutlass" in it

(twitter.com)

192 points | by limoce 7 hours ago

81 comments