This is a curated list of Unicode characters that have interesting (and maybe not widely known) features or are awesome in some other way.
╭───────╮
│Unicode│
│rules! │
╰┬─────┬╯
U+2E2E REVERSED QUESTION MARK - the “irony mark” to express irony/sarcasm. A useful character⸮
U+FEFF ZERO WIDTH NO-BREAK SPACE - its name suggests that it can be used like U+2060 WORD JOINER, and in fact the latter was introduced to inherit its semantics. This is because U+FEFF had become a special beacon called the byte order mark (BOM), which is placed at the beginning of some UTF-8 files. In complying software (including many text editors) this character is stripped from the start of a file and handled as metadata. In non-complying software (like the PHP interpreter) this leads to all sorts of fun behaviour.
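To make the "stripped as metadata" behaviour concrete, here is a minimal Python sketch. It assumes Python's built-in `utf-8` and `utf-8-sig` codecs as stand-ins for non-complying and complying software respectively; the sample text is made up.

```python
import codecs

# A UTF-8 byte string that starts with the byte order mark (EF BB BF).
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# The plain codec keeps the BOM as an ordinary U+FEFF character.
plain = raw.decode("utf-8")

# The "-sig" variant treats a leading BOM as metadata and drops it.
stripped = raw.decode("utf-8-sig")

print(repr(plain))     # '\ufeffhello'
print(repr(stripped))  # 'hello'
```

The leftover U+FEFF in `plain` is invisible when printed, which is what makes these bugs hard to spot.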
U+FFFD REPLACEMENT CHARACTER - when a character cannot be decoded or displayed (e.g., when decoding an erroneous UTF-8 sequence), this code point steps into the breach.
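As an illustration, the following sketch decodes bytes that are not valid UTF-8. It assumes Python's `errors="replace"` handler, which substitutes U+FFFD for undecodable input; the Latin-1 sample string is only an example.

```python
# "café" encoded as Latin-1 ends in the single byte 0xE9,
# which is an incomplete sequence when read as UTF-8.
latin1_bytes = "café".encode("latin-1")

# errors="replace" substitutes U+FFFD instead of raising UnicodeDecodeError.
decoded = latin1_bytes.decode("utf-8", errors="replace")

print(repr(decoded))  # 'caf\ufffd'
```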
U+FE0E VARIATION SELECTOR-15 - force black-&-white emoji. If this code point follows an emoji, an explicit monochrome rendering of the emoji is requested (if the client supports it).
U+FE0F VARIATION SELECTOR-16 - force colorful emoji. If this code point follows an emoji, an explicit colorful rendering of the emoji is requested (if the client supports it).
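A small sketch of how the two selectors are attached in practice. The base character U+2764 HEAVY BLACK HEART is an arbitrary choice for illustration, and whether the rendering actually changes depends on the font and client.

```python
import unicodedata

HEART = "\u2764"  # HEAVY BLACK HEART, chosen only as an example base character

text_style = HEART + "\ufe0e"   # request monochrome (text) presentation
emoji_style = HEART + "\ufe0f"  # request colorful (emoji) presentation

for sequence in (text_style, emoji_style):
    # Each sequence is two code points: the base and the selector.
    print(sequence, [unicodedata.name(ch) for ch in sequence])
```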
Diacritics and combining marks: There is a host of characters that add diacritics to the character before them. Those are called Combining Marks. Unicode provides a handy FAQ on the details, but in a nutshell: If you add one after a character, it is placed on top of that previous one. So, a + ̊ = å. This may lead to all kinds of funny problems, because for some combinations there are pre-composed characters. Our little å here can also be encoded as U+00E5. You might note that while this has a length of one code point, the combination of a and the combining ring (U+030A) has a length of two code points.
Of course, one can also do fun things with those characters like this answer on StackOverflow.
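The two encodings of å can be compared directly. This sketch assumes Python's standard `unicodedata` module, whose NFC and NFD normalization forms convert between the pre-composed and the combining representation.

```python
import unicodedata

decomposed = "a\u030a"  # LATIN SMALL LETTER A + COMBINING RING ABOVE
precomposed = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE

# Python's len() counts code points, so the two strings differ in length.
print(len(decomposed), len(precomposed))  # 2 1

# They look identical but do not compare equal as raw strings.
print(decomposed == precomposed)  # False

# Normalizing to a common form makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

Normalizing both sides before comparing or hashing user input is one way to avoid the "funny problems" mentioned above.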
The Regional Indicator Symbols U+1F1E6 to U+1F1FF correspond to the 26 Latin letters A to Z. They are used to create flag emoji. Since the Unicode Consortium didn’t feel like getting on board with international politics, the solution for flags is to combine two of these 26 characters into the respective two-letter ISO code for a country. Examples:
|Country||ISO Code||Code Points||Emoji (if supported)|
|USA||US||U+1F1FA + U+1F1F8||🇺🇸|
|Germany||DE||U+1F1E9 + U+1F1EA||🇩🇪|
|China||CN||U+1F1E8 + U+1F1F3||🇨🇳|
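The mapping from a two-letter code to its pair of Regional Indicator Symbols is a fixed offset from the letter A. A minimal sketch follows; the helper name `flag` is hypothetical, and the function does not check that the code belongs to a real country.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag(iso_code: str) -> str:
    """Map a two-letter code such as "DE" to its Regional Indicator pair."""
    code = iso_code.upper()
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError(f"expected two letters A-Z, got {iso_code!r}")
    # Shift each letter by its distance from "A" into the indicator block.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code)


for country in ("US", "DE", "CN"):
    symbols = flag(country)
    code_points = " + ".join(f"U+{ord(ch):04X}" for ch in symbols)
    print(country, code_points, symbols)
```

Whether the pair is drawn as a flag or as two boxed letters is up to the font and platform.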
Skin color of emoji: There are five code points that control the skin color of emoji, U+1F3FB to U+1F3FF. They are called “Emoji Modifier Fitzpatrick Type” 1-2 to 6, with 1-2 the palest and 6 the darkest (types 1 and 2 of the Fitzpatrick scale share one modifier). If one of these characters follows an emoji, that emoji is meant to be rendered in the appropriate skin color of the Fitzpatrick scale. If no such modifier is added, the skin tone should be unnatural, e.g., bright yellow. Fun fact: Since the Fitzpatrick modifiers are normal code points, emoji with such skin colors have a length of 2 code points, which Twitter users noticed first. Here is a comparison chart directly from the specification:
|U+1F3FB||EMOJI MODIFIER FITZPATRICK TYPE-1-2|
|U+1F3FC||EMOJI MODIFIER FITZPATRICK TYPE-3|
|U+1F3FD||EMOJI MODIFIER FITZPATRICK TYPE-4|
|U+1F3FE||EMOJI MODIFIER FITZPATRICK TYPE-5|
|U+1F3FF||EMOJI MODIFIER FITZPATRICK TYPE-6|
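The length effect can be reproduced with a short sketch. The base emoji U+1F44B WAVING HAND SIGN is an arbitrary example, and the character names come from whatever Unicode data ships with the Python version in use.

```python
import unicodedata

WAVING_HAND = "\U0001F44B"  # example base emoji that accepts skin tone modifiers

# The five Fitzpatrick modifiers, U+1F3FB through U+1F3FF.
MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]

for modifier in MODIFIERS:
    combined = WAVING_HAND + modifier
    # len() counts code points in Python, so each combination has length 2.
    print(f"U+{ord(modifier):04X}", unicodedata.name(modifier), combined, len(combined))
```

In languages whose strings count UTF-16 code units, the same sequence reports a larger length, because every code point here lies outside the Basic Multilingual Plane.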
U+00AD SOFT HYPHEN (&shy;) - like ZERO WIDTH SPACE, but shows a hyphen if (and only if) a break occurs.
For better comparison of which code point has which effect, consult this table:
Smashing Magazine featured a comprehensive article on the different types of whitespace.
1 + 2 === 3.
For plain-text gaming, Unicode is well equipped with several complete sets:
See the contribution guide for details.