I got 19x. When they say "curious about", it's always a good sign that it's AI; same with the "X, not Y" construction, saying "genuinely", using things like "absolutely slaps" and other millennial slang, and being overly positive: generally sounding like the transcript of an Instagram food review. When they're trying to be casual, they seem to default to some kind of 2017 millennial stereotype. Typos and "edit:" are always a good sign that it's human, so I'm sure people will start adding those into AI-generated text to seem more real.
Thank you for taking a look! Yeah, there are definitely a few tells that are noticeable if you look for them :)
I keep accidentally clicking on the human one because my brain wants to treat it as “find the human content”.
FWIW, I found the “medium” ones hardest. Most of the “hard” ones have dead giveaways in the form of either punctuation or common AI text rhythms.
Thank you for trying! I first built it as a 'detect the human' response, but that was counter to the 'slop or not' framing. Yeah, I'm observing the same thing based on the first few hundred people's results. The harder models seem to write almost too well, and that's generally not how humans write on the internet unless it's a blog post/essay. The easier models seem to be the ones tripping people up the most.
Yeah the UX itself is clearly slop
Was able to get an 8x streak. The question that made me lose it was really hard; I basically took a guess.
Some were hard but spottable after re-reading the answers a good 10 times... ahah.
Thank you so much for taking a look :) Yeah, you'd be surprised how difficult it can get to spot the nuances sometimes. Sometimes there isn't any nuance at all, and the AI is just as good at writing about a topic it only pretends to know.
Nice idea! Em dashes were giveaways for AI and typos for humans, at least in the ones I did, so those are trivial tells. You might have to do some filtering for those at least.
Some were hard though, yeah (at least if not looking longer than 5-10 seconds). Btw, it seemed more logical to me to just see a green/red card when you click, i.e. right choice or wrong choice. Getting red for the correct answer confused me a bit (but this might just be me).
Also, for example, this one has a giveaway for the human case: "There are lots of great people here at /r/personalfinance" (actually, not sure if that is a giveaway; that was my guess, but it depends on how the model was prompted, I guess). And human ones often seem to have two spaces instead of one, idk why. If you want to get a serious dataset, maybe you could use this one to find all the flaws and perfect it, and then try to get a real dataset from the next one? People will be more eager to help too if they've seen that you designed it all very carefully. (Or you could filter the results from this one to make it a good dataset if you get lots of responses.)
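Something like this could work as a first pass at filtering out the mechanical giveaways (a rough, untested Python sketch; the record shape is just a guess, and typos would need something smarter than a regex):

    import re

    # Purely mechanical tells mentioned above (guessed list, not exhaustive)
    OBVIOUS_TELLS = [
        re.compile(r"\u2014"),   # em dash
        re.compile(r"  +"),      # two or more consecutive spaces
    ]

    def has_obvious_tell(text):
        return any(p.search(text) for p in OBVIOUS_TELLS)

    # samples assumed to look like {"text": "...", "source": "human" or "ai"}
    def filter_samples(samples):
        return [s for s in samples if not has_obvious_tell(s["text"])]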
You'd be surprised at the nuances we tend to miss :)
This time around I prompted the models not necessarily to be adversarial - I didn't ask them to try and fool the reader. But I gave them contextual info - something to the effect of "you're a user posting on hacker news".
True, if you look for all the "obvious patterns" and filter those out of the dataset, probably not much will be left. Maybe the best approach then is to just publish as complete a dataset as possible: all questions, all user answers, for each user the number of questions they did, the time for each question, etc. Then people using that dataset can draw their own conclusions.
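Just to make the "as complete as possible" idea concrete, one answer-level record might look roughly like this (all field names invented, obviously):

    # One hypothetical record per answered question
    answer_record = {
        "question_id": "q_0042",
        "texts_shown": {"a": "...", "b": "..."},  # the two candidate snippets
        "ai_option": "b",                         # which snippet was model-written
        "model": "whatever-model-was-used",
        "difficulty": "medium",
        "user_id": "anon_123",
        "user_choice": "a",
        "correct": False,
        "time_taken_ms": 7400,
        "questions_answered_so_far": 12,
    }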
Thanks for checking it out! The color signal is useful feedback. Let me think about it and rework!
Yeah, there are some very obvious tells, but the most capable models are very good at writing like a human.
Especially when the human responses for Reddit or HN prompts were presumably made after reading the content of the article or the post, while the model is simply going off of the title.
The coloring is a fair point. I was sometimes confused about whether I got the right or the wrong one XD
By playing this game I'm helping to train AI how to be less detectable?
Ha :) I'm not building models, nor am I affiliated with any big labs. The idea is to use this to educate people on how to spot the tells of AI writing. Although, like any data that's made open, this could be used to train future models as well, I suppose.
Hey, congratulations on the final product. It even feels fun. Some are really hard, but some feel blatantly obvious. I don't know why, though. I guess it's just because the way we communicate feels off compared to AI sometimes.
Thanks for checking it out! The obvious ones are (hopefully) the weaker models :) but yes, my experience has been that unless you're consistently engaging with human-written content, the line really blurs easily.