In the '90s, MS-Windows by default used its own 8-bit character set CP-1252 that was a superset of ISO 8859-1 with a few additional characters, including left and right single and double quotation marks in 8859-1's unused code positions.
Microsoft Word used to "auto-correct" the ASCII codes 0x22 and 0x27 to those.
Also, ISO 8859-1 was the default character set for the web, and MS Word was in common use for making simple web pages... but without narrowing from CP-1252 to ISO 8859-1.
This had the effect that when you browsed one of those pages in a browser on another operating system, the quotation marks rendered as empty boxes ( = illegal character).
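For anyone who wants to see the mismatch concretely, here is a minimal Python sketch. It assumes Python's built-in "cp1252" and "latin-1" codecs are a fair stand-in for what Windows and a non-Windows browser did back then; note that the "latin-1" codec maps bytes 0x80-0x9F to the C1 control characters rather than rejecting them.

```python
# The four quotation-mark bytes that CP-1252 places in the 0x80-0x9F range.
raw = bytes([0x91, 0x92, 0x93, 0x94])

as_cp1252 = raw.decode("cp1252")   # what Word meant: curly quotes
as_latin1 = raw.decode("latin-1")  # what an ISO 8859-1 reader gets: C1 controls

for byte, win, iso in zip(raw, as_cp1252, as_latin1):
    print(f"0x{byte:02X}  cp1252=U+{ord(win):04X}  latin-1=U+{ord(iso):04X}")
```

The same byte comes out as U+2018..U+201D under CP-1252 and as an unprintable control code under ISO 8859-1, which is roughly why those pages showed boxes.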
I was wondering why `this hideous quotation style' is used in so many places. Good historical window.
The explanation seems to be that it looked good in some old fonts. But I think it was always some kind of abuse. On old typewriters the accent keys were usually used for accents (é è). They didn't advance the carriage, so using them for apostrophes wasn't that comfortable and interrupted the writing flow. Accent + space looks a bit like a quotation mark, but the right place for an accent is usually on top of a letter.
Special quotation marks sometimes end up in filenames (usually when I have saved a web page) and I hate trying to tab-complete or write globs for those. Of course it is equally annoying with other random unicode characters not present on my keyboard, but it mostly happens with quotation marks. (Yes, the solution is to copy-paste those characters in the terminal, but that is the annoying part, having to do that instead of just typing the next character).
This problem was solved by Plan 9 (roughly 1990) where there was a compose key to turn sequences into Unicode characters. Say compose-f-a to get ∀. This was all configurable in /lib/keyboard.
On so-called modern X11 or Wayland based systems (Linux or *BSD), there is a similar feature called XCompose. Worse syntax, but still functional.
Being able to configure your system to type the characters really doesn't solve the problem. In particular, if you get data (including metadata such as filenames) from someone else, you need to recognize the characters, both to do the configuration and then actually type them. And characters are not glyphs. There are all kinds of cases where simply looking at something doesn't and can't tell you what characters are in it.
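One way to at least find out which characters you are dealing with is to ask the Unicode database instead of trusting your eyes. A rough sketch in Python; the `describe` helper and the sample filename are made up for illustration:

```python
import unicodedata

def describe(text):
    """Return (char, code point, Unicode name) for every non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 0x7F
    ]

# Hypothetical filename, as it might be saved from a web page.
filename = "It\u2019s a \u201ctest\u201d.html"
for ch, code, name in describe(filename):
    print(code, name)
```

This only tells you what the code points are. It does not make them any easier to type, so it helps with the recognition half of the problem and nothing more.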
Some language settings (on Windows) will automatically replace the '' and "" pairs with ʻʼ and “”, as that is the correct spelling in the set language. There are also low quotes that can be used, but it seems that usually a normal comma (U+002C) stands in for the single one and U+201E is used for the double one: ,’ „”.
This really messed me up when I started programming, since those quotes do not work in a programming language that expects the same character to open and close a string, even though they may be drawn with the same glyph. This is one of the many reasons I have my systems set to English.
I agree that for a normal writing environment it may be advantageous to have it auto replace since it is also just easier to hit the same key twice and have it auto open/close.
I have never encountered that behaviour outside of Microsoft Word and its alternatives; I've always had this happen application-side. Is this an IME thing? Or a non-Unicode-compatible code page? Because I don't think there's any other Windows-side automatic replacement of that type.
Many blog engines online will also try to be helpful and replace quotes with smart quotes, which makes copy-pasting source code from tutorials quite a pain.
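If you hit this often, a small filter can undo the damage before you paste. A minimal sketch in Python, assuming the only offenders are the six common typographic quotes; `straighten_quotes` is just an illustrative name:

```python
# Map common typographic quotes back to the ASCII quotes that code expects.
STRAIGHTEN = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201A": "'",   # single: left, right, low
    "\u201C": '"', "\u201D": '"', "\u201E": '"',   # double: left, right, low
})

def straighten_quotes(source):
    """Return source with typographic quotes replaced by ASCII ones."""
    return source.translate(STRAIGHTEN)

pasted = "print(\u201chello\u201d)"
print(straighten_quotes(pasted))
```

Be aware that this also rewrites curly quotes that were deliberately placed inside string literals, so it is a blunt tool.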
From what I remember (it was a while back), it happened in Notepad, Notepad++ and Geany. I was probably using Windows XP or 7 at the time. I remember the only way I could fix it was to change the language and keyboard settings to international English.
I can't seem to replicate the behavior now on Windows 11, even with the same language and keyboard layout (system language set to English), so perhaps I'm misremembering? It seems that this keyboard layout does enable typing ¨ (U+00A8), so perhaps I am confusing it with that, and with the fact that some editors (Word etc.) do the opening/closing replacement.
An IME is not used, though interestingly, Asian languages use different quotes again, for example Japanese:「x」and『x』
But how can I enter ’ (U+2019) on macOS (US layout) without going through some magic incantation? It’s impossible!
Option + ] will produce ‘, Option+Shift+] will produce ’. Similarly, “” can be produced with Option(+Shift)+[. Alternatively, Option+Shift+e will produce ´.
Ugh, that is such a bad arrangement of the four combos. It should obviously have been [ for left and ] for right (just as [ and ] are a pair), and Shift to turn single into double (just as Shift turns ' into ").
I’ve looked at the ASCII tables to try to figure out ¿what were they thinking? and suspect it has to do with option \ and | being « and » (and euro key for ` and ´ being one key’s base and shift).
See the ASCII table in the article, for example. I've considered the thing done wrong is the ] and { should've swapped. But then < and > and ( and ) beg me to differ.
Thanks. This is better than the alternatives that I was aware of.
Just enable automatic substitutions from the Edit menu? You can do this on a per-app basis.
macOS, iPadOS, iOS on physical keyboards:
“ Option [
” Option {
‘ Option ]
’ Option }
… Option ;
- just -
– Option -
— Option _
• Option 8
You may be able to use the "ctrl (⌃) + cmd (⌘) + space" (character viewer) menu to select it. There may also be another shortcut for it.
Kinda OT: can anybody recognize the keyboard in the German keyboard shot? I have been looking for a similar ISO keyboard (with international layout) for a while
It looks like a Cherry MX 3000 (G80-3000) [1], of the older "winkeyless" variety in German layout. The later variant with windows keys and cheaper build is still being produced.
I bought several second-hand on German eBay, for scavenging parts for DIY mechanical keyboards back in the early '10s before the mechanical keyboard scene matured. The build is cheap and dated compared to modern mechanical keyboards, and newly produced keyboards are very overpriced. [2]
Most keyboard models are produced in ISO layouts, but you might have to look abroad to find the right one, or buy it from an online store that sells internationally.
[1] https://wiki.themk.org/index.php/Cherry_G80-3000
[2] https://www.cherry.de/en-gb/product/g80-3000
In the UK, "winkey" is a childish name for a penis, so careful now ...
Just a guess, but it looks like an IBM Model M but with a German layout, or at least something from that era.