Unicode

Text

Text is a sequence of glyphs. Glyphs are individual marks that contribute to the meaning of what's written.

Glyphs ⇨ Characters ⇨ Code Points ⇨ Binary Encoding
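
The last two arrows in the chain above can be illustrated with a minimal Python sketch. The helper name `describe` is made up for this example and is not part of any library. It assumes its argument is a single code point; how that character is drawn as a glyph is a matter for the font and renderer and is not modelled here.

```python
def describe(ch):
    """Return (code point, UTF-8 bytes) for a single-code-point string.

    `describe` is a hypothetical helper used only for illustration.
    """
    # Character -> code point: ord() gives the code point as an integer.
    code_point = f"U+{ord(ch):04X}"
    # Code point -> binary encoding: here UTF-8, shown as hex bytes.
    encoded = ch.encode("utf-8").hex(" ").upper()
    return code_point, encoded


for ch in "Aé€":
    print(ch, *describe(ch))
```

Each line of output pairs one character with its code point and the bytes that UTF-8 uses to store it.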

ASCII

Unicode Code Points

Organization

Planes

UTF-8

1-Byte Encoding

Multi-Byte Encoding

UTF-8 Encoding ➡︎ Unicode Code Point

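  1. Convert the bytes to binary
  2. Determine the size (1 to 4 bytes) from the leading bits of the first byte
  3. Strip the leading size bits from the first byte and the 10 marker from each continuation byte to get the payload
  4. Group the payload bits in fours from the right
  5. Convert each group to a hex digit
  6. U+{hex}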
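
The steps above can be sketched in Python as follows. This is an illustrative sketch rather than a full decoder: the function name `decode_utf8_char` is made up for this example, it handles exactly one encoded character, and it does not reject overlong encodings or surrogate code points the way a conforming decoder would.

```python
def decode_utf8_char(data):
    """Turn the UTF-8 bytes of ONE character into a 'U+XXXX' string.

    Hypothetical helper for illustration; it skips the overlong and
    surrogate checks that a real decoder needs.
    """
    # 1. Convert each byte to an 8-bit binary string.
    bits = [format(b, "08b") for b in data]

    # 2. Determine the size from the leading bits of the first byte.
    first = bits[0]
    if first.startswith("0"):
        size, prefix_len = 1, 1
    elif first.startswith("110"):
        size, prefix_len = 2, 3
    elif first.startswith("1110"):
        size, prefix_len = 3, 4
    elif first.startswith("11110"):
        size, prefix_len = 4, 5
    else:
        raise ValueError("invalid leading byte")
    if len(data) != size:
        raise ValueError("byte count does not match the size prefix")

    # 3. Strip the size prefix and the '10' continuation markers.
    payload = first[prefix_len:]
    for cont in bits[1:]:
        if not cont.startswith("10"):
            raise ValueError("invalid continuation byte")
        payload += cont[2:]

    # 4. Group in fours from the right (pad on the left with zeros).
    padded_len = -(-len(payload) // 4) * 4
    payload = payload.zfill(padded_len)
    groups = [payload[i:i + 4] for i in range(0, len(payload), 4)]

    # 5. Convert each group of four bits to one hex digit.
    hex_digits = "".join(format(int(g, 2), "X") for g in groups)

    # 6. Write as U+{hex}, using at least four hex digits.
    return "U+" + hex_digits.lstrip("0").rjust(4, "0")


print(decode_utf8_char(bytes([0xE2, 0x82, 0xAC])))  # the bytes of "€"
```

For the three bytes E2 82 AC, the payload is 0010 0000 1010 1100, which reads as 20AC in hex.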

Unicode Code Point ➡︎ UTF-8 Encoding

Other Encodings

UTF-X・UTF-16・UTF-32

Endian-ness