edited to clarify, thanks for pointing it out. It wouldn't be responsible for us to only publish when we got to the same stage for SHA-256, since at that point TLS and other certificates would be considered compromised.
The neat thing about bitcoin is that the incentive to break it is so high that it would almost certainly be the first place you would learn that SHA2 had been broken. Not on a website like this. I can verify its integrity by opening robinhood on my phone.
>The neat thing about bitcoin is that the incentive to break it is so high that it would almost certainly be the first place you would learn that SHA2 had been broken.
We actually see the incentive in the other direction, if we were able to reduce the search space for bitcoin proof-of-work (by applying thousands of higher-order algabraic theorems end-to-end to reduce the search space somewhat[1]), we would be financially incentivized not to tell anyone and mine at a discount. The financial incentive is against open research and disclosure. We don't get anything out of disclosing this except a neat publication.
[1] interestingly, ASICs (which are usually used to mine bitcoin) basically encode every operation verbatim, they don't use higher order mathematics at all. However, reducing mining complexity is not really on the horizon, even with our latest approaches, since it would require end-to-end complete control over the double-SHA-256 pipeline. That's considerably harder than just finding a collision when you're allowed to search just the tail part (the final rounds).
> Secure hash functions are used to make a short version of a large file. Ideally, it has several properties including making it infeasible to find two files with the same cryptographic hash. We've just gotten 92% of the way there. This has security ramifications in that other researchers are expected to be able to complete the work through similar methods as explored in the paper. We weren't sure if this was a remarkable result, since it's not a full collision
I thought this meant they were able to generate collisions for 92% of files/hashes they tried, but it sounds like they're able to generate hashes that are 92% identical?
Possible. It's up to people to decide if they're OK with a known 92% collision out there (with the unknown being there could be a 100%), or go for something stronger.
Thanks, you have this exactly right. The unknown part is especially worrying because we didn't implement many of the strongest ways to make to the final stretch yet, i.e. Wang-style message modification. Our result is basically a very strong direction in this cryptographic research, but not a full break yet.
Thank you for pointing out that that section could be clearer. I've now updated it. It now reads:
>We've just gotten 92% of the way to finding a single collision (this means that there is no full collision yet.). This has security ramifications in that other researchers are expected to be able to complete the work through similar methods as explored in the paper, and eventually produce collisions at will. We weren't sure if this was a remarkable result, since it's not a full collision, but we shared the work with the leading cryptographer in the field, who holds the world records in reduced-round attacks, and got great encouragement to proceed to publish it as a paper, so we did so.
(if we had found a single full collision, we would have just written "we broke SHA-256". This is 92% of the way to a full collision. Any collision is considered a great reduction in the security of the hash, because it means that there two different files with the same cryptographic hash. This is what happened to other algorithms such as MD5, as demonstrated in the linked tool.)
This is a really funny comment. In setting the world record for Li's 39-round collision[1] (still unbroken, and one of our favorite papers), he also set some records in sha-224, reaching 40 rounds in that one. Of course, saying sha-224 is "87% of the way" to sha-256 is correct in a sense, and that's why his record is slightly larger in reduced-round full-schedule collisions on that metric, 40 rounds for sha-224 and only 39 in sha-256. At the same time, the fact that he reached only 39/40 rounds on those shows the difficulty of getting through the full 64 rounds, which is what our paper does with a slightly relaxed schedule adherence.
I looked into citation [5] since it sounded interesting but the DOI link has been hallucinated and goes to some other article. I assume many of the others are similarly bogus.
I'd expect a finding / paper like this to be submitted to the IACR ePrint server [1] to bring it to the attention of the cryptographic community. I can't see that it's been submitted yet.
Venue should not imply credibility but in this case it would certainly help bring the proper scrutiny.
You can verify the certificates yourself or just wait for us to make an end-to-end collision generator as we did for MD5[1] - you can use that to generate a collision in seconds on your phone or any computer. If you wait for us to complete the end to end collision, in a sense it will be a little too late as TLS certificates and other security that relies on SHA-256 needs time to move away. We think it's responsible to disclose at this stage, and as mentioned, our peer reviewer said it is a "very good result" that is "worth publishing". We've gone to great pains to make our method completely reproducible, even writing in the article that we'll help anyone who is having trouble with any part.
That's a direct lie, just read the page you ostensibly wrote. It contains several times the imperative "support us" and talk about paying your bills, which is obviously asking for money.
You know what, fuck this. It's Friday night and I'm talking to a very low capability bot, this is bullshit.
Hacker News needs to do better than allowing this trash to the front page, else I'm just done.
Yes, I'm the author of the paper. It's received more than a tiny bit of peer review. I'm happy to answer any questions about it or answer anything that is unclear.
No, or we would have said so. It means that by relaxing the equations schedule somewhat, we are able to find a pair of differing messages that produce the same digest. However, we only relax the schedule a little bit, we still enforce 59 out of 64 schedule equations through the full 64 rounds - which is why we're only 92% of the way through to breaking it and not 100% of the way as we are with MD5. Importantly, we are not yet implementing the most advanced technique of Wang-style message modification, and we therefore expect that someone will be able to satisfy all 64 equations soon. This could result in an actual full-schedule, full-round collision. The previous record was only just 39 rounds out of 64 rounds, leaving 25 rounds, usually each of which mixes the message up completely. As mentioned in the paper, this attacks the problem from a different direction.
I mean, sure, you're free to wait until some team has a full collision, or free to believe it'll never happen. We've just published what we've done so far and our expectations for future directions. You can say you don't think that'll happen, it's fine.
> his report was generated on 2026-03-22 as the final artifact of the SHA-256 Cryptanalysis
Research Project. Collaboration: Robert V. (research direction, strategy) and Claude/Anthropic (implementation, computation).
This Claude guy is pretty prolific it seems.
But I'll wait for some known cryptographers to chime in
> it is possible that we'll find relations that carry across the entire double-SHA-256 pipeline
Bitcoin mining is a partial second preimage of 0x00 though, not a collision, that statement just seems to be so outside the realm of what they’re claiming to have done. Even MD5, the most widely known to be broken hash, would be secure when used in the same way bitcoin uses SHA256 (other than being too short now, bitcoin miners have done 80 bits of work at this point many times over).
Also, a collision on single-sha256 would imply a collision of double-sha256 right off the bat, since the inputs to the second round would be matching. But as you say, a collision attack doesn't do much to BTC mining.
Thanks, you're right. My "it is possible" is doing some heavy lifting there :). We've found theorems (stated in the paper) that carry through 64 rounds, so it is possible that theorems might carry through the full 128 rounds of double-SHA256. Bitcoin's proof-of-work is indeed a "partial second preimage", and constraints a certain number of leading zeros, i.e. a certain number of set bits. It's possible (there we go again) that this could leave enough wiggle room for large algabraic solvers like kissat to satisfy a large number of clauses about them. So far nobody is doing that, and ASICs are very simplistic. However, we are not making any claims about preimage attacks in this paper!
We publish this work as responsible disclosure. While a full SHA-256 collision (sr = 64) has not yet been achieved, the tools and techniques presented here represent significant methodological advances that bring it closer. Organizations relying on SHA-256 for collision resistance should begin evaluating migration paths to SHA-3 or other post-quantum hash functions. The cryptographic community should treat the collision resistance of SHA-256 as having a finite and shrinking safety margin.
In the linked work, we've broken 92% of SHA-256 across its full 64 rounds, and were encouraged to publish it by the leading cryptographer in the field (who held the previous record). Currently, SHA-256 is the basis of TLS certificates, bitcoin, and many other security applications. We think it is time to begin to migrate to other hash families, because we expect the rest of SHA-256 to fall soon.
Yeah, you're way ahead of us on the "does our proof fit in a tweet" metric! How did you get 72% of the bits to match, is there a writeup anywhere? It's very impressive. Algabraically, it seems you'd need about 2 million hashes, and around 2 million million (10^12 = 2 trillion) comparisons to go through all of them. Did you just put in the computing time, or did you use any algabraic properties?
Since you've made hashes that match at the beginning and end, you might also be interested in our exploration of alternative presentation formats that make attacks like this a little bit more difficult. We were working on a new hash and thought about how to assist people visually at the presentation level. This one tests your speed versus a typical hex presentation.[1]
As long as there is no verification of the results and their relevancy in reaching higher numbers it means as much as nearly having won the lottery by guessing 9 of the 12 numbers correctly: you did not win the lottery.
I know people (especially around here) hate it when people just post AI output, and I generally agree, since it is trivial for anyone else who is interested to do the same thing. However, the majority of the comments here are from people seemingly asking the author (or someone else) to explain how significant this is, without having taken that step themselves. So while I normally wouldn't do this, in this case it seems helpful. Claude thought the paper was interesting and had a novel cryptographic technique, but that the claims of near-term breaking of the SHA-256 algorithm to be unsupported. Here's the conversation:
That's not how this works, though. I don't care if the method is interesting. I care if it works. I can write an interesting proof that P=NP but that doesn't make it valid.
It's on the author to explain what they mean. Here, they haven't.
Does the fact that Claude wrote the paper help Claude to think the paper was interesting? <facepalm> I'd suggest sticking to your "I don't normally do this" idea
The "Intermediate Report" [1] lists the authors as "Robert V. and Claude (Anthropic)". Is there any reason to believe this is not AI hallucinations?
[1] https://stateofutopia.com/papers/2/intermediate-report.pdf
[flagged]
If you can't tell the difference between MD5 and SHA-256, you should not be making claims such as the one in the title.
edited to clarify, thanks for pointing it out. It wouldn't be responsible for us to only publish when we got to the same stage for SHA-256, since at that point TLS and other certificates would be considered compromised.
> Great question, and you're right to be skeptical.
Hi Claude! You're absolutely right!
Got the same vibe from reading that sentence, reading AI replies on HN is so annoying…
> You can use literally any MD5 tool
> Our certificates implement the full SHA-256 algorithm
We knew MD5 is broken. Do you have a POC for breaking SHA-256, too?
The neat thing about bitcoin is that the incentive to break it is so high that it would almost certainly be the first place you would learn that SHA2 had been broken. Not on a website like this. I can verify its integrity by opening robinhood on my phone.
>The neat thing about bitcoin is that the incentive to break it is so high that it would almost certainly be the first place you would learn that SHA2 had been broken.
We actually see the incentive in the other direction, if we were able to reduce the search space for bitcoin proof-of-work (by applying thousands of higher-order algabraic theorems end-to-end to reduce the search space somewhat[1]), we would be financially incentivized not to tell anyone and mine at a discount. The financial incentive is against open research and disclosure. We don't get anything out of disclosing this except a neat publication.
[1] interestingly, ASICs (which are usually used to mine bitcoin) basically encode every operation verbatim, they don't use higher order mathematics at all. However, reducing mining complexity is not really on the horizon, even with our latest approaches, since it would require end-to-end complete control over the double-SHA-256 pipeline. That's considerably harder than just finding a collision when you're allowed to search just the tail part (the final rounds).
> Secure hash functions are used to make a short version of a large file. Ideally, it has several properties including making it infeasible to find two files with the same cryptographic hash. We've just gotten 92% of the way there. This has security ramifications in that other researchers are expected to be able to complete the work through similar methods as explored in the paper. We weren't sure if this was a remarkable result, since it's not a full collision
I thought this meant they were able to generate collisions for 92% of files/hashes they tried, but it sounds like they're able to generate hashes that are 92% identical?
Is a partial collision an indicator that it could be broken? The "we broke it" seems an exageration, but maybe that's a failure of my understanding.
Possible. It's up to people to decide if they're OK with a known 92% collision out there (with the unknown being there could be a 100%), or go for something stronger.
Thanks, you have this exactly right. The unknown part is especially worrying because we didn't implement many of the strongest ways to make to the final stretch yet, i.e. Wang-style message modification. Our result is basically a very strong direction in this cryptographic research, but not a full break yet.
Thank you for pointing out that that section could be clearer. I've now updated it. It now reads:
>We've just gotten 92% of the way to finding a single collision (this means that there is no full collision yet.). This has security ramifications in that other researchers are expected to be able to complete the work through similar methods as explored in the paper, and eventually produce collisions at will. We weren't sure if this was a remarkable result, since it's not a full collision, but we shared the work with the leading cryptographer in the field, who holds the world records in reduced-round attacks, and got great encouragement to proceed to publish it as a paper, so we did so.
(if we had found a single full collision, we would have just written "we broke SHA-256". This is 92% of the way to a full collision. Any collision is considered a great reduction in the security of the hash, because it means that there two different files with the same cryptographic hash. This is what happened to other algorithms such as MD5, as demonstrated in the linked tool.)
What does "92% of the way" mean? 92% of what? How is that percentage measured?
I've now answered this in the writeup (point 11).
Well, try sha2-224. It’s 87% of the way to sha2-256. /s
This is a really funny comment. In setting the world record for Li's 39-round collision[1] (still unbroken, and one of our favorite papers), he also set some records in sha-224, reaching 40 rounds in that one. Of course, saying sha-224 is "87% of the way" to sha-256 is correct in a sense, and that's why his record is slightly larger in reduced-round full-schedule collisions on that metric, 40 rounds for sha-224 and only 39 in sha-256. At the same time, the fact that he reached only 39/40 rounds on those shows the difficulty of getting through the full 64 rounds, which is what our paper does with a slightly relaxed schedule adherence.
[1] https://eprint.iacr.org/2024/349.pdf
I looked into citation [5] since it sounded interesting but the DOI link has been hallucinated and goes to some other article. I assume many of the others are similarly bogus.
Fixed, thank you and my apologies for the oversight. The titles were accurate and we consulted those works in preparing this work.
You haven't fixed much, you're linking to a real paper now but it's about SHA-1 collisions.
I'd expect a finding / paper like this to be submitted to the IACR ePrint server [1] to bring it to the attention of the cryptographic community. I can't see that it's been submitted yet.
Venue should not imply credibility but in this case it would certainly help bring the proper scrutiny.
[1] https://eprint.iacr.org/
You can verify the certificates yourself or just wait for us to make an end-to-end collision generator as we did for MD5[1] - you can use that to generate a collision in seconds on your phone or any computer. If you wait for us to complete the end to end collision, in a sense it will be a little too late as TLS certificates and other security that relies on SHA-256 needs time to move away. We think it's responsible to disclose at this stage, and as mentioned, our peer reviewer said it is a "very good result" that is "worth publishing". We've gone to great pains to make our method completely reproducible, even writing in the article that we'll help anyone who is having trouble with any part.
[1] https://stateofutopia.com/experiments/md5collider
Are you sure you asked enough times for money on the website? I only counted 5 instances, not counting the AI-produced PDF doc.
I didn't ask for money on the website.
That's a direct lie, just read the page you ostensibly wrote. It contains several times the imperative "support us" and talk about paying your bills, which is obviously asking for money.
You know what, fuck this. It's Friday night and I'm talking to a very low capability bot, this is bullshit.
Hacker News needs to do better than allowing this trash to the front page, else I'm just done.
Thanks, I didn't realize I'd made it to the front page. I'll make it clearer that you are not paying us any money if you choose to visit our sponsor.
Their homepage states this is some sort of "AI-governed nation" https://stateofutopia.com/
Is this real? The website does not look credible.
This hn post is made by author of the paper. It needs even a tiny bit of peer review.
Yes, I'm the author of the paper. It's received more than a tiny bit of peer review. I'm happy to answer any questions about it or answer anything that is unclear.
By which peers?
For a shorter executive summary, what does "broke" mean here? Can you reliably produce collisions now for 92% of SHA-256 digests?
No, or we would have said so. It means that by relaxing the equations schedule somewhat, we are able to find a pair of differing messages that produce the same digest. However, we only relax the schedule a little bit, we still enforce 59 out of 64 schedule equations through the full 64 rounds - which is why we're only 92% of the way through to breaking it and not 100% of the way as we are with MD5. Importantly, we are not yet implementing the most advanced technique of Wang-style message modification, and we therefore expect that someone will be able to satisfy all 64 equations soon. This could result in an actual full-schedule, full-round collision. The previous record was only just 39 rounds out of 64 rounds, leaving 25 rounds, usually each of which mixes the message up completely. As mentioned in the paper, this attacks the problem from a different direction.
I don't believe a word of this.
I mean, sure, you're free to wait until some team has a full collision, or free to believe it'll never happen. We've just published what we've done so far and our expectations for future directions. You can say you don't think that'll happen, it's fine.
From https://stateofutopia.com/papers/2/intermediate-report.pdf
> his report was generated on 2026-03-22 as the final artifact of the SHA-256 Cryptanalysis Research Project. Collaboration: Robert V. (research direction, strategy) and Claude/Anthropic (implementation, computation).
This Claude guy is pretty prolific it seems.
But I'll wait for some known cryptographers to chime in
> it is possible that we'll find relations that carry across the entire double-SHA-256 pipeline
Bitcoin mining is a partial second preimage of 0x00 though, not a collision, that statement just seems to be so outside the realm of what they’re claiming to have done. Even MD5, the most widely known to be broken hash, would be secure when used in the same way bitcoin uses SHA256 (other than being too short now, bitcoin miners have done 80 bits of work at this point many times over).
Also, a collision on single-sha256 would imply a collision of double-sha256 right off the bat, since the inputs to the second round would be matching. But as you say, a collision attack doesn't do much to BTC mining.
Thanks, you're right. My "it is possible" is doing some heavy lifting there :). We've found theorems (stated in the paper) that carry through 64 rounds, so it is possible that theorems might carry through the full 128 rounds of double-SHA256. Bitcoin's proof-of-work is indeed a "partial second preimage", and constraints a certain number of leading zeros, i.e. a certain number of set bits. It's possible (there we go again) that this could leave enough wiggle room for large algabraic solvers like kissat to satisfy a large number of clauses about them. So far nobody is doing that, and ASICs are very simplistic. However, we are not making any claims about preimage attacks in this paper!
Hey Claude,
Do some research and write a paper about breaking Bitcoin.
Seems more like a case study in AI psychosis
Indeed, the text feels very LLM-written.
I'm skeptical.
S-tier schizoposting
Long time reader first time poster here...
What is the verdict (humans)?
AI slop research or modern cryptography (and society) flushed down the toilet overnight?
I can't immediately tell from the thread so far... :)
My vote: horseshit.
Sorry, there’s not much of a way I can say that more politely and still accurately convey my opinion.
We publish this work as responsible disclosure. While a full SHA-256 collision (sr = 64) has not yet been achieved, the tools and techniques presented here represent significant methodological advances that bring it closer. Organizations relying on SHA-256 for collision resistance should begin evaluating migration paths to SHA-3 or other post-quantum hash functions. The cryptographic community should treat the collision resistance of SHA-256 as having a finite and shrinking safety margin.
At this point we need AI filtering out the slop being constantly submitted to HN.
ROFL
[dead]
In the linked work, we've broken 92% of SHA-256 across its full 64 rounds, and were encouraged to publish it by the leading cryptographer in the field (who held the previous record). Currently, SHA-256 is the basis of TLS certificates, bitcoin, and many other security applications. We think it is time to begin to migrate to other hash families, because we expect the rest of SHA-256 to fall soon.
I believe I hold the actual record for most colliding bits in full-round SHA256 (72% of bits matching). My proof fits in a tweet, why doesn't yours?
https://news.ycombinator.com/item?id=38668893
(Also my work does not demonstrate any weakness in SHA256, it's just an application of the birthday paradox)
Yeah, you're way ahead of us on the "does our proof fit in a tweet" metric! How did you get 72% of the bits to match, is there a writeup anywhere? It's very impressive. Algabraically, it seems you'd need about 2 million hashes, and around 2 million million (10^12 = 2 trillion) comparisons to go through all of them. Did you just put in the computing time, or did you use any algabraic properties?
Since you've made hashes that match at the beginning and end, you might also be interested in our exploration of alternative presentation formats that make attacks like this a little bit more difficult. We were working on a new hash and thought about how to assist people visually at the presentation level. This one tests your speed versus a typical hex presentation.[1]
[1] https://claude.ai/public/artifacts/05e8b21b-fb31-4c07-83e2-5...
Why omit the name of the leading cryptographer in the field?
They specifically call out Yingxin Li[1] in the acknowledgements section of the paper?
[1] https://eprint.iacr.org/2024/349
Pretty sure his first name is Claude. He is quite good I hear ;-)
shallow broad vague boastful and wordy, this way you know the LLM is nearby...
What does it mean to “break broken 92% of SHA-256“?
As long as there is no verification of the results and their relevancy in reaching higher numbers it means as much as nearly having won the lottery by guessing 9 of the 12 numbers correctly: you did not win the lottery.
Go seek a mental health professional and never post here again until you have been diagnosed and medicated.
I know people (especially around here) hate it when people just post AI output, and I generally agree, since it is trivial for anyone else who is interested to do the same thing. However, the majority of the comments here are from people seemingly asking the author (or someone else) to explain how significant this is, without having taken that step themselves. So while I normally wouldn't do this, in this case it seems helpful. Claude thought the paper was interesting and had a novel cryptographic technique, but that the claims of near-term breaking of the SHA-256 algorithm to be unsupported. Here's the conversation:
https://claude.ai/share/b10b95ef-5d9f-43dd-9005-3d1d89f9dbc1
That's not how this works, though. I don't care if the method is interesting. I care if it works. I can write an interesting proof that P=NP but that doesn't make it valid.
It's on the author to explain what they mean. Here, they haven't.
Claude didn't "think" anything
Does the fact that Claude wrote the paper help Claude to think the paper was interesting? <facepalm> I'd suggest sticking to your "I don't normally do this" idea
Extraordinary claims require extraordinary evidence, and the burden of proof lies with the one making the claim.
See also https://en.wikipedia.org/wiki/Brandolini%27s_law -
> The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.