Fun concept. Unsurprisingly, a lot of these are common jokes 'plagiarised' directly from the training data. I wonder if there's a way to integrate a search tool to rule out those with near-exact matches on the web.
It would be hard to do well, but something basic could probably at least make a dent in forcing creativity out of the models.
Edit: better yet, manually curate a few dozen unseen 'first halves' of jokes and have the models complete them.
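For what it's worth, here is a rough sketch of the "something basic" version of the near-exact-match filter. It does not do a web search at all: it assumes you already have a local list of well-known jokes (`known_jokes` is hypothetical, as are the function names and the 0.9 threshold) and just does fuzzy string matching with Python's standard `difflib`. A real version would swap the local list for search results.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def is_near_duplicate(joke: str, known_jokes: list[str], threshold: float = 0.9) -> bool:
    """Return True if the joke is a near-exact match of any known joke.

    The threshold is an arbitrary assumption and would need tuning.
    """
    candidate = normalize(joke)
    return any(
        SequenceMatcher(None, candidate, normalize(known)).ratio() >= threshold
        for known in known_jokes
    )


def filter_novel(jokes: list[str], known_jokes: list[str], threshold: float = 0.9) -> list[str]:
    """Keep only the jokes that do not closely match a known joke."""
    return [j for j in jokes if not is_near_duplicate(j, known_jokes, threshold)]
```

Usage would be something like `filter_novel(model_outputs, known_jokes)` before the jokes go up for voting. This only catches near-verbatim copies; a reworded version of the same joke would slip through, which is part of why doing this well is hard.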
Hey HN, I made FunnyBench, a not-so-serious benchmark that lets you vote on which model tells the funniest jokes.
Many of these models are uncreative when it comes to joke diversity (often telling the same joke over and over again), though there are a couple of gems in there.
Benchmark details can be found at the bottom of the page. Let me know what you think :)