There are a little over 256 unicode Combining Marks that have a 2-byte UTF-8 encoding. I picked a set of them, defining an encoding I call zalgo256:
https://gist.github.com/DavidBuchanan314/07da147445a90f7a049...
Since an arbitrarily tall stack of combining characters still counts as one grapheme cluster, if some application limits string length by counting grapheme clusters then you can stuff an unlimited amount of data in there, with "only" 2x overhead in the byte representation.
Unfortunately HN filters some of the codepoints so I can't demonstrate here. Since I chose "A" as the base character which the diacritics are stacked on, it has a similar aesthetic to the SCREAM cipher although a little more zalgo-y.
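Here's a minimal Python sketch of the idea. It assumes a mark table built from the first 256 Unicode combining marks (categories Mn/Me) below U+0800, so each one takes 2 bytes in UTF-8. That isn't necessarily the set the gist picks, and `encode`, `decode` and `stack_height` are hypothetical names (the last is loosely modelled on the gist's STACK_HEIGHT idea):

```python
import unicodedata

# Hypothetical mark table: the first 256 combining marks (Mn or Me) below
# U+0800. Every code point in U+0080..U+07FF takes exactly 2 bytes in UTF-8.
MARKS = [
    chr(cp)
    for cp in range(0x0300, 0x0800)
    if unicodedata.category(chr(cp)) in ("Mn", "Me")
][:256]
assert len(MARKS) == 256, "Unicode database has fewer marks than expected"
MARK_TO_BYTE = {mark: value for value, mark in enumerate(MARKS)}

BASE = "A"  # base letter the marks are stacked on


def encode(data: bytes, stack_height: int = 16) -> str:
    """Encode each byte as one combining mark, stacking up to
    `stack_height` marks on each copy of the base letter."""
    clusters = []
    for start in range(0, len(data), stack_height):
        chunk = data[start:start + stack_height]
        clusters.append(BASE + "".join(MARKS[b] for b in chunk))
    return "".join(clusters)


def decode(text: str) -> bytes:
    """Skip base letters and map every mark back to its byte, so the
    result does not depend on the stack height used when encoding."""
    return bytes(MARK_TO_BYTE[ch] for ch in text if ch != BASE)
```

Each payload byte costs 2 bytes of UTF-8, plus 1 byte per base letter. Because `decode` just skips the base letters, changing `stack_height` doesn't affect decoding.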
A demonstration as a comment on the gist would probably work! I'd love to see that
Good point, added
Interesting, I actually expected it to encode a single letter with infinitely long combining marks such that 'highlighting' it was just highlighting one character.
You can do that too, if you increase the STACK_HEIGHT constant (btw, the decoder still works the same, so changing this doesn't break compatibility)
Oh neat! Thanks :)
Most of the characters appear as boxes on my phone.
That's curious, because the only character is just the letter A. But I suppose if the font doesn't support a particular combining mark, it gives up on the whole grapheme?
HN filters some combining characters? That's weird, compared to the symbol/emoji blocking.
Also I'm reminded that the unicode normalization annex suggests that legitimate grapheme clusters will be 31 code points or less. "The value of 30 is chosen to be significantly beyond what is required for any linguistic or technical usage."
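A rough Python check for that limit. It assumes a "non-starter" is any code point with a nonzero canonical combining class as reported by `unicodedata.combining`; the annex's actual Stream-Safe definition counts non-starters after compatibility decomposition, so treat this as an approximation. `max_nonstarter_run` and `looks_stream_safe` are hypothetical names:

```python
import unicodedata

MAX_NONSTARTERS = 30  # limit quoted from the normalization annex


def max_nonstarter_run(text: str) -> int:
    """Length of the longest run of code points whose canonical
    combining class is nonzero (an approximation of 'non-starters')."""
    longest = current = 0
    for ch in text:
        if unicodedata.combining(ch):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def looks_stream_safe(text: str) -> bool:
    return max_nonstarter_run(text) <= MAX_NONSTARTERS
```

With this, `"A"` followed by 30 copies of U+0301 passes and 31 copies fails, so a tall zalgo stack would be flagged.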
If I had to guess, they probably filtered the ones that could be used to break page layouts by creating very-tall glyphs.
I guess that's one way to do it. Pretty far from ideal though.
Are you sure this doesn't summon The One by accident?
I have been using ROT13, but I’ve been looking for a post-quantum replacement so definitely I’m going to convert to SCREAM. It’s generally understood that qubits are unable to represent or even discern the little squiggly bits above normal Latin letters.
Thank you for this important contribution to cryptography!
The important part about applying ROT13 is the number of iterative applications. The security of even-numbered applications is undeniable. Odd-numbered is even better than that.
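For the record, the joke works because ROT13 is its own inverse: with 26 letters, rotating by 13 twice is a full rotation. A quick check using Python's built-in `rot13` text codec (`rot13_n` is just a hypothetical helper):

```python
import codecs


def rot13_n(text: str, times: int) -> str:
    """Apply ROT13 `times` times in a row."""
    for _ in range(times):
        text = codecs.encode(text, "rot13")
    return text


# An even number of applications gives back the plaintext;
# an odd number is the same as a single application.
assert rot13_n("Attack at dawn", 2) == "Attack at dawn"
assert rot13_n("Attack at dawn", 3) == rot13_n("Attack at dawn", 1)
```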
I’m currently building an implementation with fractional rotation. Of course I will post a Show HN when it’s ready.
oh so that's where æ comes from!
I wonder if ChatGPT can decrypt all of them just by analyzing vowel frequency, and then trying to find the algo on the internet.
In my tests, ChatGPT 5 Thinking can handle a monoalphabetic substitution cipher if you prompt it a couple times to keep going.
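For anyone curious what the frequency-analysis part looks like, here's a deliberately naive Python sketch: rank ciphertext symbols by frequency and line them up with a commonly quoted English letter-frequency order. The order string and the `guess_mapping`/`apply_guess` names are assumptions for illustration; on short texts this gets many letters wrong, which is why a solver has to keep iterating.

```python
from collections import Counter

# A commonly quoted ordering of English letters by frequency (approximate).
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def guess_mapping(ciphertext: str) -> dict[str, str]:
    """Map each cipher symbol to a plaintext guess by frequency rank."""
    counts = Counter(ch.lower() for ch in ciphertext if ch.isalpha())
    # Sort by descending count, then by symbol so ties are deterministic.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {sym: plain for sym, plain in zip(ranked, ENGLISH_ORDER)}


def apply_guess(ciphertext: str, mapping: dict[str, str]) -> str:
    """Substitute guessed letters, leaving unmapped characters alone."""
    return "".join(mapping.get(ch.lower(), ch) for ch in ciphertext)
```

This works on scream text too, since precomposed letters like "ǎ" count as alphabetic in Python.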
Not perfectly. I grabbed a random encoded line from these comments, and asked ChatGPT to decode it[1]. It determined the plaintext was:
> Immediately thought of Moby, infact a quick search for this title... coincidental, but I would mention it in the page if I were you.
and noted that it had "preserved punctuation and capitalization from the ciphertext". The actual plaintext should be:
> Immediately thought of XKCD, infact a quick search for this title gives me XKCD, it could be coincidental, but I would mention it in the page if I were you.
I've hit my free usage limit so can't currently prompt it further about its mistake.
[1] https://chatgpt.com/share/68cf17a6-8478-8011-a44e-64d43ad8a4...
I pushed it a bit and it didn’t do so hot.
https://chatgpt.com/share/68cf3b9f-decc-8007-8a5d-cc7b583d0e...
It's not necessary to write the ciphering logic.
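One way to skip the hand-written loop in Python is `str.maketrans` plus `str.translate`. A sketch, assuming an illustrative table of 26 precomposed "a" variants; this is not the article's actual `CIPHER` mapping, and `scream`/`unscream` are hypothetical names:

```python
import string
import unicodedata

# Illustrative table only (not the article's mapping): 26 distinct "a"
# variants. NFC makes sure each one is a single precomposed code point.
SCREAM = unicodedata.normalize("NFC", "aáăǎâäàāąåãȁȃȧạảấầẩẫậắằẳẵặ")
LETTERS = string.ascii_lowercase
assert len(SCREAM) == len(set(SCREAM)) == len(LETTERS)

# str.maketrans builds the lookup tables; str.translate does the work.
TO_SCREAM = str.maketrans(LETTERS + LETTERS.upper(), SCREAM + SCREAM.upper())
FROM_SCREAM = str.maketrans(SCREAM + SCREAM.upper(), LETTERS + LETTERS.upper())


def scream(text: str) -> str:
    return text.translate(TO_SCREAM)


def unscream(text: str) -> str:
    return text.translate(FROM_SCREAM)
```

Characters not in the table (spaces, punctuation, digits) pass through unchanged.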
NICE!!
It's truly an honour to have been able to teach the PSF's Security Developer-in-Residence something about the implementation of a simple substitution cipher in Python. ;) (In all seriousness, thanks for all your excellent work. The many projects you help out with — and advocate for — in the Python ecosystem, including CPython itself, are all far better off for it.)
<3 Thanks for the kind words!! :)
I did something similar a while back, but using all the invisible characters to encode extra data into Telegram messages for metadata storage
https://github.com/sixhobbits/unisteg
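For a flavour of how invisible-character steganography can work, here's a Python sketch that hides bits as zero-width characters appended to a visible message. It assumes U+200B (ZERO WIDTH SPACE) for 0 and U+200C (ZERO WIDTH NON-JOINER) for 1; I haven't checked which characters unisteg actually uses, and `hide`/`reveal` are hypothetical names. Some apps strip these characters, so treat it as a toy.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE encodes a 0 bit
ONE = "\u200c"   # ZERO WIDTH NON-JOINER encodes a 1 bit


def hide(cover: str, secret: bytes) -> str:
    """Append the secret's bits (MSB first) as invisible characters."""
    bits = "".join(f"{byte:08b}" for byte in secret)
    return cover + "".join(ONE if bit == "1" else ZERO for bit in bits)


def reveal(text: str) -> bytes:
    """Collect the invisible characters and pack them back into bytes."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

This assumes the cover text doesn't already contain either zero-width character.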
I had fun writing a Racket version:
Threading is done with the wave character ~ in Racket? I can't decide if I hate it or not (am used to Clojure's ->). I think my pinky finger doesn't like ~.
I was very confused why this would be useful for Telegram messages, but the Why? part of the readme makes perfect sense. Great workaround for a stupid limitation!
Ǎầầặắǎaạặậā ạẵẫȁẳẵạ ẫằ ȂẤĂẮ, ǎẩằaăạ a ǟȁǎăấ ǡặaȧăẵ ằẫȧ ạẵǎǡ ạǎạậặ ẳǎàặǡ ầặ ȂẤĂẮ, ǎạ ăẫȁậắ áặ ăẫǎẩăǎắặẩạaậ, áȁạ Ǎ ảẫȁậắ ầặẩạǎẫẩ ǎạ ǎẩ ạẵặ äaẳặ ǎằ Ǎ ảặȧặ āẫȁ.
https://xkcd.com/3054/
Oh god, now we're gonna have two different standards for a scream cypher https://xkcd.com/927/
Ah, it's from XKCD (Feb 2025); a bit odd of the OP not to mention that.
I think this might be more coincidental than derivative.
scream ciphers. a bit like back when we invented fire :p
OP here, I either didn't know or completely forgot this XKCD existed and it resurfaced as a good idea haha! Time to update the post lol
https://www.dcode.fr/scream-cipher-xkcd
I got nerd sniped by this xkcd and was happily working my way through an implementation and realized that the accent combiners work with any character. It is trivial to add bad steganography to your bad encryption.
r̊e̝q̝ůěs̔t͞ p̊e̝a͞c̍e̊ t̠a̗lks
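One way to do the "bad steganography" part in Python, assuming a hypothetical scheme where each hidden letter becomes one combining mark (U+0300 plus the letter's index) attached to the next letter of the cover text. I don't know whether the example above was made this way, and the scheme breaks if anything NFC-normalizes the text along the way (e.g. "e" + U+0304 would become "ē"):

```python
import string

# Hypothetical scheme: hidden letter i (a=0 ... z=25) becomes U+0300 + i.
MARKS = [chr(0x0300 + i) for i in range(26)]
MARK_TO_LETTER = dict(zip(MARKS, string.ascii_lowercase))


def hide(cover: str, secret: str) -> str:
    """Attach one mark per secret letter to successive cover letters."""
    letters = [c for c in secret.lower() if c in string.ascii_lowercase]
    out = []
    for ch in cover:
        out.append(ch)
        if ch.isalpha() and letters:
            out.append(MARKS[ord(letters.pop(0)) - ord("a")])
    if letters:
        raise ValueError("cover text has too few letters")
    return "".join(out)


def reveal(text: str) -> str:
    """Read the marks back in order, ignoring everything else."""
    return "".join(MARK_TO_LETTER.get(ch, "") for ch in text)
```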
It's hilarious that Stream Ciphers are the closest thing to the One-Time-Pad (which provides "Perfect Secrecy") and this thing is a Monoalphabetic Substitution Cipher which provides no security whatsoever.
It feels like you're trying to express that stream ciphers are especially secure compared to block ciphers (which is what most of them are built out of), which isn't the case.
Do this with variants of O and the ghosts will be happy.
Ảặậậ ạẵaạǡ ȧặaậậā ǎẩạặȧặǡạǎẩẳ, a áǎạ ầẫȧặ ằȁẩ ạẫ ȁǡặ ạẵaẩ ȦẪẠ13
I bet that last word is ROT13! We can crack it now! And maybe the second to last is “like”.
wait is it rot13 on the screamphabet or the alphabet?
Ǎ ăẫẩǡǎắặȧ ǎạ a ăẵaậậặẩẳặ áặằẫȧặ ạẵặ ảẵẫậặ ẵȁầaẩ ȧaăặ! Aẩắ Ǎ aǎẩ'ạ ẳẫẩẩa ậẫǡặ.....
> > "Hope remains strewn asunder, I weep holy tears oh great one, Paul:16"
> "I belong to a secret group of panda bear hunters! Eat a meaty flesh chunk...."
For anyone wondering..
Previous attempt at cracking this cipher:
https://www.youtube.com/watch?v=ZlIz0q8aWpA
Most of these are short/long vowel markings, except the last one, which is (probably) implosion. The rest are front/back and wide/narrow A.
But Swedish "Å" is just a stupid "O", because they started pronouncing "O" as "U" and "U" as "Y".
-- Can you pronounce these screams?
I thought this was gonna be about the actual Scream stream cipher: https://eprint.iacr.org/2002/019
I was rather confused by the dictionary comprehension syntax used there, because I wasn't aware that you could write one without the ":" to delimit the key: value pair. Turns out you can, but it just creates a dict with no values stored, just the keys! This works here because the returned dict is an iterable that returns the keys on iteration, and "update" accepts an iterable of (key, value) tuples - and the keys are just that in this case. So the effect is the same as if it was a list comprehension! Just slightly more confusing.
Not precisely. {x} is a set literal; {x for y in z} is a set comprehension.
Ah, my bad! I did not know these were a thing, but that makes more sense! Teaches me a thing about only quickly trying things in an online REPL on mobile and jumping to conclusions - I forgot curly braces were also a way to denote a set
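A quick REPL-style check of what's going on: `{x for ...}` with no colon is a set comprehension, and `dict.update` accepts any iterable of key/value pairs, so a set of 2-tuples works just as a list or generator would. The names and data here are made up for illustration:

```python
pairs = {(letter, letter.upper()) for letter in "abc"}  # set comprehension
assert isinstance(pairs, set)

table = {}
table.update(pairs)  # any iterable of (key, value) pairs is accepted
assert table == {"a": "A", "b": "B", "c": "C"}

# The equivalent dict comprehension uses a colon instead:
assert {letter: letter.upper() for letter in "abc"} == table
```

One side effect of going through a set is that the update order isn't defined, which is harmless as long as every key appears only once.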
Finally we can talk to the bomb dudes in Serious Sam
Or teach it as the only righteous alphabet to our children.
here's a JS one liner that handles scream and unscream in one function
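The one-liner itself isn't included here, but one way to get a single scream/unscream function in Python is a translation table that maps both directions at once. Because the plain letters and the scream letters only share "a"/"A" (which map to themselves), applying the function twice returns the original. The 26-variant table is illustrative, not the article's mapping, and `toggle` is a hypothetical name:

```python
import string
import unicodedata

# Illustrative table only: 26 distinct precomposed "a" variants.
SCREAM = unicodedata.normalize("NFC", "aáăǎâäàāąåãȁȃȧạảấầẩẫậắằẳẵặ")
LETTERS = string.ascii_lowercase
PLAIN = LETTERS + LETTERS.upper()
SCREAMS = SCREAM + SCREAM.upper()

# Map plain -> scream and scream -> plain in one table. The only shared
# characters are "a"/"A", which map to themselves in both directions.
TOGGLE = str.maketrans(PLAIN + SCREAMS, SCREAMS + PLAIN)


def toggle(text: str) -> str:
    """Scream plain text, or unscream screamed text."""
    return text.translate(TOGGLE)
```

Mixed input gets flipped character by character, so this only makes sense on text that is entirely one or the other.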
One could use emojis instead, then the message could be hidden in plain sight in places where emoji-spam is common.
For fun, there are other variants of base64 in a similar spirit [1][2], plus a full-Unicode one [3]; see the other links in that repo. Not a stream cipher, just an encoding, but it could be used in conjunction with a stream cipher to add compression. It could go turtles all the way down.
[1] - https://github.com/qntm/base2048
[2] - https://github.com/qntm/base32768
[3] - https://github.com/qntm/base65536
Emojis have a high overhead: a single emoji is typically 4 bytes but may be up to 35 bytes.
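To put numbers on that in Python: an emoji outside the Basic Multilingual Plane is 4 bytes of UTF-8, and ZWJ sequences add up quickly. The family sequence below is just one example: four 4-byte emoji joined by three 3-byte U+200D ZERO WIDTH JOINERs.

```python
single = "\U0001F600"  # GRINNING FACE
# FAMILY: MAN, WOMAN, GIRL, BOY as a ZWJ sequence
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

for label, emoji in [("single", single), ("family", family)]:
    code_points = len(emoji)
    utf8_bytes = len(emoji.encode("utf-8"))
    print(f"{label}: {code_points} code point(s), {utf8_bytes} bytes")
# single: 1 code point(s), 4 bytes
# family: 7 code point(s), 25 bytes
```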
Yikes! Imagine if people were to start sending photos and videos to each other!
Oh, my bad - I wasn't aware that we're doing serious engineering here :p
it had to be done https://chatgpt.com/g/g-68ce9419c7d4819190f82744d6e2741e-url...
How sand people talk
Ạẵặǡặ Äȧặạąặậǡ Aȧặ Ầaấǎẩẳ Ầặ Ạẵǎȧǡạā!
Here's another implementation I made a few months ago:
https://ethmarks.github.io/posts/screamcipher
ẰǍȦǠẠ ÄẪǠẠ
Ằậaẳẳặắ
Now I need a TTS to read a scream.
Artosis' channel on Twitch has got that one covered.
... in the same sense that ROT13 or base64 would be a cipher.
ROT13 is a cipher. It's a substitution cipher, and more specifically a shift cipher or Caesar cipher. It's not a secure cipher, but it is one.
Base64 is an encoding. It's an algorithm, no attempt at secrecy, thus not a cipher.
And thus we arrive at SCREAM64 encoding, base64 in scream cipher.
Sweet Lord Jesus.
such a great idea that we ought to call it based64 encoding
If you use base64 with the intention of hiding the encoded information, surely it’s as much a cipher as rot13 is, right?
And what do you think is the algorithm from the article? Looks awfully similar to base64 to me, except it's lacking the bit-shifts. Both use a lookup table like that.
I think a lot of this depends on if you read the article as the scream cipher being specifically the exact listed substitutions or just any substitution with forms of As. Also depends on how you define encoding, cipher and the overlaps between the two. Plus questions on the relevance of intent, transformation of data, plus changing of meaning and definitions over the years. Some people say morse code is a cipher, but braille isn't - definitions can depend on way more than the black and white logical "but it does this" you're using.
You'd do better debating this with a real life friend over a pint, rather than wasting your time trying to argue with multiple people here.
The original Caesar cipher supposedly also had a constant offset, yet it's still considered a cipher.
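For comparison, here's a Caesar-style shift cipher in Python where the offset is an explicit parameter (`caesar` and `brute_force` are hypothetical helpers). With `shift=3` it is the classic Caesar cipher and with `shift=13` it is ROT13, so the key is just one number out of 26 possibilities, which is also why it's trivial to brute-force:

```python
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase


def caesar(text: str, shift: int) -> str:
    """Rotate ASCII letters by `shift` places; leave everything else alone."""
    shift %= 26
    table = str.maketrans(
        LOWER + UPPER,
        LOWER[shift:] + LOWER[:shift] + UPPER[shift:] + UPPER[:shift],
    )
    return text.translate(table)


def brute_force(ciphertext: str) -> list[str]:
    """Try every possible key; a human (or a word list) picks the winner."""
    return [caesar(ciphertext, -k) for k in range(26)]
```

For example, `caesar("Veni, vidi", 3)` gives `"Yhql, ylgl"`, and shifting by -3 undoes it.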
A bad substitution cipher is still a cipher. Just one you shouldn't use for anything important.
Yes https://en.m.wikipedia.org/wiki/Substitution_cipher
… and no, since neither the enciphering nor the deciphering do a 1:1 mapping for all possible input code points.
That's not a requirement. Pigpen is a substitution cipher.
You will find that the pigpen cipher has a 1:1 mapping between its input alphabet and its output alphabet, and that a 1:1 mapping is a necessity for full invertibility.
First sentence:
> with the help of a key
So, where is the key?
In the code in this article, the key is the mapping stored in ‘CIPHER’.
The key is the data table, representing what each character encodes to or from.
First, second, and third statements of the provided source code.
Like I said, by these measurements, base64 would also be a cipher.
And people are telling you yes, they (rot13 and base64) are indeed ciphers. What's the confusion?
What people in this thread call a "key" is not auxiliary input data, as a real key would be; it's hard-coded into the program. We are looking at encodings.
Maybe this differentiation is not popular or well accepted, but it was surely part of my cryptography curriculum and the following exam. I'd rather believe my prof than strangers on the internet.
Key can mean different things in different contexts. In a substitution cipher, the key is the mapping. In modern ciphers, the key would be some set of secret bytes. Everyone agrees that this cipher would be a bad way to encrypt/encode something. But using the word cipher like this has real historical meaning, and that is the meaning that is being used in the project.
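To make "the key is the mapping" concrete: a general monoalphabetic substitution cipher takes a permutation of the alphabet as its key, so you can generate a fresh one instead of hard-coding it. A sketch in Python using `secrets` for the shuffle; `make_key`, `encrypt` and `decrypt` are hypothetical names, and the result is still trivially broken by frequency analysis:

```python
import secrets
import string

ALPHABET = string.ascii_lowercase


def make_key() -> str:
    """Return a random permutation of the alphabet (26! possibilities)."""
    letters = list(ALPHABET)
    secrets.SystemRandom().shuffle(letters)
    return "".join(letters)


def encrypt(text: str, key: str) -> str:
    # The key must be a 1:1 mapping, otherwise decryption is ambiguous.
    if sorted(key) != sorted(ALPHABET):
        raise ValueError("key must be a permutation of the alphabet")
    return text.lower().translate(str.maketrans(ALPHABET, key))


def decrypt(text: str, key: str) -> str:
    return text.translate(str.maketrans(key, ALPHABET))
```

This sketch lowercases the input before encrypting, so case is not preserved.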
Ha ha ha ha ha ha! You want the key?
https://www.youtube.com/watch?v=vsb9-wPYpxI
Ăặȧạaǎẩậā ȧẫạ13, áaǡặ64, aẩắ ạẵǎǡ ẩặả ǡăȧặaầ ăǎäẵặȧ aȧặ aậậ ǎẩǡặăȁȧặ, áȁạ ạẵặā ắẫ ầặặạ ạẵặ ạặăẵẩǎăaậ ắặằǎẩǎạǎẫẩ ẫằ a ăǎäẵặȧ.
Now pack even more info in each character with Zalgo text.
zalgo256: https://gist.github.com/DavidBuchanan314/07da147445a90f7a049...
Hey, cool little rabbit hole there. I had totally missed all that.
https://en.m.wikipedia.org/wiki/Zalgo_text
am I unusual in not really seeing the "creepiness" of zalgo text?
Maybe you missed this piece of the internet history: https://stackoverflow.com/a/1732454
Think of it as representing something like the letters actively 'creeping' and giving off tendrils of darkness. Does this help?
I understand the idea, it just doesn't impact me
It's not inherently creepy but often symbolic of corruption or someone talking in a raspy/synthetic "evil overlord" kind of voice.