Go: What is a rune?

Unicode code points

A rune is a type meant to represent a Unicode code point.

The rune type is an alias for int32, and is used to emphasize that an integer represents a code point.
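As a minimal sketch of what the alias means in practice, the program below assigns a rune to an int32 variable without any conversion and prints the same value as a character, a code point, and a number. The helper name describe is made up for this illustration.

```go
package main

import "fmt"

// describe is a hypothetical helper for this example. It formats r as a
// character, in U+XXXX notation, and as a decimal integer.
func describe(r rune) string {
	return fmt.Sprintf("%c %U %d", r, r, r)
}

func main() {
	var r rune = 'é'
	var i int32 = r // no conversion needed: rune and int32 are the same type

	fmt.Printf("%T %T\n", r, i) // int32 int32
	fmt.Println(describe(r))    // é U+00E9 233
}
```

Since the two names denote the same type, choosing rune over int32 is purely a signal to the reader that the value is a code point.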

ASCII defines 128 characters, identified by the code points 0–127. It covers English letters, the digits 0–9, punctuation, and a number of control characters.

Unicode, which is a superset of ASCII, defines a codespace of 1,114,112 code points. Unicode version 10.0 covers 139 modern and historic scripts, as well as multiple symbol sets.
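The two ranges above can be checked against constants in the standard unicode package. This is a small sketch; the function names isASCII and codespaceSize are invented for the example, while unicode.MaxASCII and unicode.MaxRune are the library's own constants.

```go
package main

import (
	"fmt"
	"unicode"
)

// isASCII reports whether r lies in the ASCII range 0–127.
func isASCII(r rune) bool {
	return r >= 0 && r <= unicode.MaxASCII
}

// codespaceSize returns the number of code points from 0 through
// unicode.MaxRune (U+10FFFF), inclusive.
func codespaceSize() int {
	return int(unicode.MaxRune) + 1
}

func main() {
	fmt.Println(isASCII('A'), isASCII('é')) // true false
	fmt.Println(codespaceSize())            // 1114112
}
```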

Strings and UTF-8

A Go string is an immutable sequence of bytes. It typically contains text encoded as UTF-8.

Note that a string is a sequence of bytes, not runes.

However, strings often contain Unicode text encoded in UTF-8, which encodes every Unicode code point using one to four bytes. Go source code is always encoded in UTF-8. This encoding was in fact designed by Ken Thompson and Rob Pike, two of the main creators of Go.
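The difference between bytes and runes shows up as soon as a string contains non-ASCII text. The sketch below uses a sample string chosen so that its four characters take one, two, three, and four bytes; byteAndRuneCount is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount is a hypothetical helper for this example. It returns
// the length of s in bytes and in Unicode code points.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "aé世😀" // characters of 1, 2, 3, and 4 bytes in UTF-8

	b, n := byteAndRuneCount(s)
	fmt.Println(b, n) // 10 4

	// A range loop decodes UTF-8: i is the byte offset where each
	// code point starts, and r is the code point itself.
	for i, r := range s {
		fmt.Printf("%d: %c (%d bytes)\n", i, r, utf8.RuneLen(r))
	}
}
```

Indexing with s[i] returns a single byte, so the range loop (or a conversion to []rune) is the usual way to walk a string one code point at a time.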

The String handling cheat sheet covers the principal ways to handle strings and runes in Go.
