ASCII, Unicode, and Character Codes

By Mickael Gomes · Last updated: 2026-06-19

A character code is the number a computer uses to represent a letter, digit, or symbol. ASCII assigns codes 0 through 127 to the basic English letters, digits, punctuation, and a set of control characters, while Unicode extends the same idea to cover virtually every writing system on Earth. Understanding character codes explains how text becomes numbers, why the capital letter A is 65, and how emoji and accented letters fit into the same scheme.

The ASCII table

ASCII, the American Standard Code for Information Interchange, fixes a code for each of 128 characters. Codes 0 to 31, together with code 127 (delete), are non-printing control characters; the first group includes tab (9), line feed or newline (10), and carriage return (13). Code 32 is the space. The digits 0 to 9 occupy codes 48 to 57, the uppercase letters A to Z run from 65 to 90, and the lowercase letters a to z run from 97 to 122.
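The ranges above can be checked with a short Python sketch. It relies only on the built-in ord() function; the helper name code_range is a hypothetical name chosen for illustration.

```python
# Look up the code ranges described above with Python's built-in ord().
# code_range is a hypothetical helper name, not part of any library.

def code_range(first, last):
    """Return the (start, end) character codes for an inclusive character range."""
    return ord(first), ord(last)

ranges = {
    "digits": ("0", "9"),
    "uppercase": ("A", "Z"),
    "lowercase": ("a", "z"),
}

for name, (first, last) in ranges.items():
    start, end = code_range(first, last)
    print(f"{name}: {start} to {end}")

# Space and three common control characters: tab, line feed, carriage return.
print("space:", ord(" "))
print("controls:", ord("\t"), ord("\n"), ord("\r"))
```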

A neat consequence of this layout is that the uppercase and lowercase versions of a letter differ by exactly 32. A is 65 and a is 97, and because 32 is a single power of two (binary 0100000), toggling that one bit switches a letter's case. Because ASCII fits within seven bits, every ASCII character occupies a single 8-bit byte with one bit to spare, which kept early text files small and simple.
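As a rough illustration of the single-bit case difference, the sketch below toggles the bit worth 32 with XOR. The names CASE_BIT and toggle_case are hypothetical, and the guard that leaves non-letters untouched is an added assumption, since the trick only makes sense for A to Z and a to z.

```python
CASE_BIT = 0x20  # decimal 32, the one bit that separates 'A' (65) from 'a' (97)

def toggle_case(ch):
    """Flip the case of a single ASCII letter by XOR-ing its code with 32.

    Anything that is not an ASCII letter is returned unchanged.
    """
    if not (ch.isascii() and ch.isalpha()):
        return ch
    return chr(ord(ch) ^ CASE_BIT)

print(toggle_case("A"), toggle_case("a"))
print(format(ord("A"), "07b"), format(ord("a"), "07b"))
```

Printing the two codes in binary makes it visible that only one bit position differs between them.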

From ASCII to Unicode

ASCII was designed around English and has no room for other scripts, or even for accented Latin letters, so as computing spread the world needed a universal scheme. Unicode assigns a unique number, called a code point, to each character it covers, across scripts from Latin and Cyrillic to Chinese and Arabic, as well as symbols such as emoji. The first 128 Unicode code points are deliberately identical to ASCII, so every ASCII character keeps the same number in Unicode. Code points range from U+0000 to U+10FFFF and are usually written in hexadecimal with a U+ prefix, such as U+0041 for A or U+1F600 for a grinning face.
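The U+ notation is just the code point written in hexadecimal, padded to at least four digits. A minimal sketch of converting in both directions follows; the function names to_u_notation and from_u_notation are hypothetical.

```python
def to_u_notation(ch):
    """Format a single character's code point as U+XXXX (at least four hex digits)."""
    return f"U+{ord(ch):04X}"

def from_u_notation(text):
    """Turn a string such as 'U+0041' back into the character it names."""
    return chr(int(text[2:], 16))

print(to_u_notation("A"))
print(to_u_notation("\U0001F600"))  # the grinning face, written as an escape
print(from_u_notation("U+0041"))
```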

Encoding: how code points become bytes

A code point is an abstract number; an encoding decides how to store it as bytes. UTF-8, defined in RFC 3629, is the dominant encoding on the web. It uses one byte for the original ASCII range, keeping English text compact, and two to four bytes for higher code points. This backward compatibility is a major reason UTF-8 became so widespread: an old ASCII file is already valid UTF-8, while the same encoding can still represent any character Unicode defines.
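To see the one-to-four-byte behaviour concretely, the sketch below encodes four sample characters with Python's built-in str.encode. The samples and the helper name utf8_bytes are chosen for illustration only.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of a string as a bytes object."""
    return ch.encode("utf-8")

# Sample characters: A, e with acute accent, euro sign, grinning face.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = utf8_bytes(ch)
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s): {hex_bytes}")

# Pure ASCII text produces the same bytes under both encodings.
print("hello".encode("ascii") == utf8_bytes("hello"))
```

The final comparison is the backward compatibility described above: for text limited to codes 0 to 127, the ASCII bytes and the UTF-8 bytes are identical.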

When you convert a decimal code like 65 to its character, you are reading one entry from this enormous table. Small numbers land in familiar ASCII territory, while larger ones reach into the full Unicode range. A character-code converter lets you move between a number and the symbol it represents without consulting the table by hand.
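A converter of this kind can be sketched in a few lines. The version below is only an illustration under stated assumptions: the names code_to_char, char_to_code, and MAX_CODE_POINT are hypothetical, and the range check reflects the Unicode limit of U+10FFFF rather than any particular tool's behaviour.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode allows

def code_to_char(code):
    """Return the character for a decimal code, e.g. 65 gives 'A'."""
    if not 0 <= code <= MAX_CODE_POINT:
        raise ValueError(f"code {code} is outside the Unicode range")
    return chr(code)

def char_to_code(ch):
    """Return the decimal code for a single character, e.g. 'A' gives 65."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return ord(ch)

print(code_to_char(65))
print(char_to_code("a"))
print(code_to_char(128512) == "\U0001F600")  # 128512 is 0x1F600 in decimal
```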
