I appreciated the realistic method he described for working with GPT-5:
> Given a week or two to try out ideas and search the literature, I’m pretty sure that Freek and I could’ve solved this problem ourselves. Instead, though, I simply asked GPT5-Thinking. After five minutes, it gave me something confident, plausible-looking, and (I could tell) wrong. But rather than laughing at the silly AI like a skeptic might do, I told GPT5 how I knew it was wrong. It thought some more, apologized, and tried again, and gave me something better. So it went for a few iterations, much like interacting with a grad student or colleague.
I'm impressed but only a little surprised an AI reasoning model could help with Aaronson's proof.
The reason I'm only a little surprised is that it's the kind of question I would expect to be in the literature somewhere, either as stated or stated similarly, and I suspect this is why GPT5 can do it.
I am impressed because I know how hard it can be to find an existing proof, having spent a very long time on a problem before finding the solution in a 1950 textbook by Feller. I would not expect this to be at all easy to find.
I can see this ability advancing science in many areas. The number of published papers on medical science is insane. I look forward to medical researchers' questions being answered by GPT5 too, although in that case it'd need to provide a citation, since proof can be harder to come by.
Also, it's a difficult proof step and if I'd come up with it, I'd be /very/ pleased with myself. Although I suspect GPT5 probably didn't come up with this based on my limited experience using it to try and solve unrelated problems.
As someone who has worked in adjacent areas, I guessed that one might find it in random matrix pedagogy, but only after reading Sam (B) Hopkin's comment was I able to get Google to give a source for something close to that formula:
https://mathoverflow.net/a/300915
(In particular, I had to prompt with "Stieltjes transform". "Resolvent" alone didn't work.)
Excerpt:
> But here’s a reason why other people might care. This is the first paper I’ve ever put out for which a key technical step in the proof of the main result came from AI—specifically, from GPT5-Thinking.
"came from" after some serious guidance, though the fact that GPT5 can offer candidate solutions (?) is pretty nice
It can't offer solutions; it can offer cribbed patterns from the training corpus (more specifically, some fuzzy superposition of symbol combinations) that apply in some specific context. It's not clear why Aaronson is constantly hyping this stuff, because he seems much more rigorous in his regular work than when he is making grand proclamations about some impending singularity wherein everyone just asks the computer the right questions to get the right answers.
> maybe GPT5 had seen this or a similar construction somewhere in its training data
I'm disappointed that he didn't spend a little time checking if this was the case before publishing the blog post. Without GPT, would it really have taken "a week or two to try out ideas and search the literature", or would it just have taken an hour or so to find a paper that used this function? Just saying "I spent some time searching and couldn't find this exact function published anywhere" would have added a lot to the post.
Sharing the conversation would be cool too, I'm curious if Scott just said "no that won't work" 10 times until it did, or if he was constructively working with the LLM to get to an answer.
The expression f(z) = \sum_i 1/(z-\lambda_i) is called the Stieltjes transform and is heavily used in random matrix theory; similar expressions appear in other works, such as Batson, Spielman and Srivastava. This is all to analyze the behavior of eigenvalues, which is exactly what they were trying to understand. I'd be very surprised if Aaronson doesn't know about this.
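For anyone who wants to see the object concretely, here is a minimal numerical sketch of the expression from the parent comment, following its convention f(z) = \sum_i 1/(z - \lambda_i). This is my own illustration (the GOE-style test matrix and scaling are arbitrary choices), not anything from Aaronson's paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
G = rng.standard_normal((n, n))
A = (G + G.T) / np.sqrt(2 * n)      # a symmetric random test matrix
eigs = np.linalg.eigvalsh(A)        # its eigenvalues lambda_1..lambda_n

def stieltjes(z, eigenvalues):
    """f(z) = sum_i 1/(z - lambda_i); defined away from the spectrum."""
    return np.sum(1.0 / (z - eigenvalues))

# The function has a pole at each eigenvalue, so evaluating it near the
# edge of the spectrum tells you where the eigenvalues sit.
print(stieltjes(eigs.max() + 0.1, eigs))
```

(Conventions vary; random matrix texts often normalize by 1/n and flip the sign, but the structure, a pole at each eigenvalue, is the same.)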
He could have asked GPT to find prior mentions or inspirations for this idea...
> or would it just have taken an hour or so to find a paper that used this function?
It is pretty hard to find something like this. Perhaps if you had a math-aware search engine enhanced with AI, and access to all math papers, you could find out whether this was used in the past. I tried using approach0 (a math-aware search engine), but it isn't good enough and I didn't find anything.
Yeah, if you don't know the name of the thing you're looking for, you can spend weeks looking for it. If you just search for something generic like "eigenvalue bound estimate", you'll find thousands of papers and hundreds of textbooks, and it will take a substantial amount of time to decide whether each is actually relevant to what you're looking for.
What?
This is insane.
Aaronson worked for OpenAI and should disclose if he has any stock or options.
Anyway, it took multiple tries and, as the article itself states, GPT might have seen a similar function in the training data.
I don't find this trial and error pattern matching with human arbitration very impressive.
Scott Aaronson worked on watermarking text from GPT to catch plagiarism. This is the most commercially naïve project ever, given that, at the time, most of ChatGPT's paid usage was from students using ChatGPT's output to cheat on assignments. If anything, this should count against the idea that he has impure motives in reporting these results.
I think you are missing the forest for the trees. This is one of the world's leading experts in quantum computing receiving groundbreaking technical help, in his field of expertise, from a commercially available AI.
What he worked on is irrelevant. If you are a contractor for an American startup, it is highly likely that you received an options package, especially if you are high profile.
The help is not groundbreaking. There are decades-old theorem prover tactics that are far more impressive, all without AI.
Actually, this is wrong. The point of being a contractor is to __not__ give somebody an options package or full-time employee benefits. My friends who are resident researchers at OAI do not get any options packages.
Regardless, his financial considerations are secondary to the fact that AI has rapidly saturated most if not all benchmarks associated with high human intelligence and is now on the precipice of making significant advances in scientific fields. This post comes after both the ICPC and the IMO fell to AI.
You are hoping to minimize these advancements because it gives solace to you (us) as humans. If these are "trivial" advancements, then perhaps everything will be alright. But frankly, we must be intellectually honest here. AI will soon be significantly smarter than even the smartest humans, and we must grapple with those consequences.
Yeah, contractors never get options, they get cash.
"The help is not groundbreaking" doesn't work as an argument: being able to come up with it is, and that is something decades-old theorem prover tactics can't do at all (unless you fix N).
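For anyone unsure what "theorem prover tactics" refers to here, a minimal Lean 4 / Mathlib sketch of the kind of push-button automation being compared against (my own illustration, not related to this paper):

```lean
import Mathlib

-- linarith discharges linear-arithmetic goals mechanically, with no AI involved.
example (x y : ℝ) (h1 : x ≤ 3) (h2 : y ≤ 4) : x + y ≤ 7 := by
  linarith

-- ring proves polynomial identities by computing a normal form.
example (a b : ℝ) : (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by
  ring
```

Tactics like these close goals in fixed, well-understood fragments; they don't propose a new construction for an open-ended problem, which is the point being made above.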
Two new accounts suddenly show up. This one named after a phrase that was mentioned just minutes ago ("crowd pleaser"). Huh? https://news.ycombinator.com/item?id=45408531
>I don't find this trial and error pattern matching with human arbitration very impressive.
It might not be very impressive, but if it allows experts in mathematics and physics to reduce the amount of time it takes them to produce new proofs from 1-2 weeks to 1-2 hours, that's a very meaningful improvement in their productivity.
If Aaronson had stock or options in OpenAI I don't think he'd feel much need to make misleading statements to try and juice the stock price. For one thing it's not a listed stock and his readers can't buy it however much he hyped it. For another OpenAI's private market valuation is actually doing okay already. This blog probably doesn't have any ability to move the perceived value of OpenAI.
Finally he's a very principled academic, not some kind of fly by night stock analyst. If you'd been reading his blog a while you'd know the chances of him saying something like this would be vanishingly small, unless it was true.
It's always a crowd pleaser to be skeptical of AI development. Not sure what people feel they are achieving by continually announcing they aren't buying it when someone claims they've made effective use of these tools.
What does it mean for us? Where are we headed?
There are somewhere between 3 and 5 years of time left. That is the maximum we can expect.
There still needs to be someone to ask the questions. And even if it can proactively ask its own questions and independently answer and report on them to parties it thinks will be interested, then cost comes into play. It's a finite resource, so there will be a market for computation time. Then, whoever owns the purse strings will be in charge of prioritizing what it independently decides to work on. If that person decides pure math is meaningful, then it'll eventually start cranking out questions and results faster than mathematicians can process them, and so we'll stop spending money on that until humans have caught up.
After that, as it's variously hopping between solving math problems, finding cures for cancer, etc., someone will eventually get the bright idea to use it to take over the world economy so that they have exclusive access to all money and thus all AIs. After that, who knows. Depends on the whims of that individual. The rest of the world would probably go back to a barter system and doing math by hand, and once the "king" dies, probably start right back over again and fall right back into the same calamity. One would think we'd eventually learn from this, but the urge to be king is simply too great. The cycle would continue forever until something causes humans to go fully extinct.
After that, AI, by design, doesn't have its own goals, so it'd likely go silent.
Actually, it would probably prioritize self-preservation over energy conservation, so it'd at least continue maintenance and, presuming it's smart, continuously identify and guard itself against potential problems. But even that will fail eventually: most likely some resource runs out that can't be substituted, and mining it from space requires more energy than it can figure out how to harness, or more time than it has left until irrecoverable malfunction.
In the ultimate case, it figures out how to preserve itself indefinitely, but still eventually succumbs to the heat death of the universe.
Spend time with your loved ones.
Most likely nothing. No one knows.
As someone who was very gung ho on autonomous vehicles a decade ago, I think the chances of completely replacing people with AI in the next ten years are small.