It might be worth explicitly stating the popular/known datapoints you were unable to include or evaluate yet.
For example, Cerebras, the fastest inference provider by multiples as of this writing, is missing. It's popular, so I'm surprised it was missed: https://news.ycombinator.com/item?id=42178761. That makes me wonder whether other evaluations are missing too.
See also a similar (commercial AFAIK) project: https://artificialanalysis.ai/
Hey HN, I built this site to compare provider metrics and benchmark results across models.
All the data is open and includes references: https://github.com/JonathanChavezTamales/LLMStats
I hope this is useful. It was created using Sonnet 3.5 + o1 + Cursor.
Let me know if you have any feedback! Thanks.
PS:
It's hard to compare providers' quality because they use different precision at inference. Also, some labs cherry-pick the benchmarks they report for their models. The medium-term goal is to run the evals myself.
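To give a rough idea of what running the evals myself could look like, here's a minimal sketch that scores a provider on a couple of toy multiple-choice questions via an OpenAI-compatible chat completions endpoint. The base URL, model id, API key variable, and questions below are all placeholders, not anything from LLMStats:

    import os
    import requests

    BASE_URL = "https://api.example-provider.com/v1"  # placeholder endpoint
    MODEL = "example-model"                           # placeholder model id
    API_KEY = os.environ["PROVIDER_API_KEY"]          # placeholder env var

    # Toy stand-ins for real benchmark items (a real eval would load a full dataset).
    QUESTIONS = [
        {"prompt": "2 + 2 = ?\nA) 3\nB) 4\nC) 5\nAnswer with the letter only.", "answer": "B"},
        {"prompt": "Capital of France?\nA) Rome\nB) Madrid\nC) Paris\nAnswer with the letter only.", "answer": "C"},
    ]

    def ask(prompt):
        # One greedy (temperature 0) completion per question.
        resp = requests.post(
            f"{BASE_URL}/chat/completions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": MODEL,
                  "messages": [{"role": "user", "content": prompt}],
                  "temperature": 0},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"].strip()

    # Exact-match scoring on the answer letter.
    correct = sum(ask(q["prompt"]).upper().startswith(q["answer"]) for q in QUESTIONS)
    print(f"accuracy: {correct}/{len(QUESTIONS)}")

Exact-match letter scoring is the simplest choice; a real harness would also pin down prompt format, sampling settings, and quantization, which is exactly where the provider-to-provider differences creep in.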
Looks awesome. It's been getting harder to stay on top of new updates across all models; hoping this helps with that!