ANSI Control Sequences


Introduction

ANSI control sequences are series of characters embedded within text data that, instead of being printed as text, are interpreted by terminals and terminal emulators as functions or commands.

For example, most *nix terminals will print shell prompts in a different color than shell input and output. This is almost always accomplished with ANSI control sequences (which are a subset of ANSI escape sequences).

However, there are a wide variety of use cases for control sequences beyond text formatting. Nearly every interactive or dynamic terminal-based application requires ANSI control sequences to function.

Text Encodings

Using ANSI control sequences requires a basic understanding of text encodings.

It's fairly common knowledge that computer systems store all data (video games, pictures, ebooks, raw text, etc.) as 1's and 0's, or bits and bytes. In order to input, store, and display this data, software designers had to agree amongst themselves how to store different sorts of data: what combination of 1's and 0's would represent an 'A', a 'B', and so on. An agreed-upon set of rules for translating data into 1's and 0's is called an "encoding."

(There's the additional nuance of code points vs encodings, but more on that later.)

ASCII is the most well-known text encoding. It defines 128 characters using 7 bits each (in practice, each character is stored in an 8-bit byte) and can represent the letters of the English alphabet, the digits, and common punctuation marks.

An "ASCII Table":

An expanded version of this table is available here

Non-Printable Characters

While ASCII designates decimal values 32 through 126 (the space character through '~') as regular or 'printable' characters, the values 0-31 and 127 are 'non-printable' or 'control' characters.

Most of the ASCII control characters are rarely used today, with some obvious exceptions (line feed, backspace, etc.). For ANSI escape codes, we'll primarily be interested in the escape character (decimal 27).

Code Points vs Encodings

There are two pieces to an encoding:

Mapping a character to a number, or "code point"
Defining how code points will be encoded into 1's and 0's

The majority of text-based data today is stored in a Unicode-based encoding. Unicode defines "code points," or assigns a unique number to each character. UTF-8, UTF-16, and UTF-32 all use those code points but differ in their methods for translating those numbers into bits and bytes on disk.

Because Unicode defines so many characters, more than one byte is required to represent many of the numbers assigned to them. To provide efficiency, UTF-8 is a "variable-length" encoding, meaning some characters are represented with a single byte, while others require up to four. UTF-16 uses a minimum of two bytes per character, but also may go up to four, while UTF-32 is a fixed-length encoding that always uses four bytes per character. UTF-32 is less memory-efficient, as much of the data will consist of "leading zeroes."
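To make the size differences concrete, here is a minimal sketch in POSIX shell that reports how many bytes a single code point occupies in each encoding. The function names (utf8_len, utf16_len, utf32_len) are made up for this illustration; the thresholds are the standard range boundaries of each encoding.

```shell
# Bytes needed to store one Unicode code point in each UTF encoding.
# $1 is the code point in decimal or 0x-prefixed hex.

utf8_len() {
    cp=$(($1))
    if   [ "$cp" -le 127 ];   then echo 1   # U+0000..U+007F
    elif [ "$cp" -le 2047 ];  then echo 2   # U+0080..U+07FF
    elif [ "$cp" -le 65535 ]; then echo 3   # U+0800..U+FFFF
    else                           echo 4   # U+10000..U+10FFFF
    fi
}

utf16_len() {
    cp=$(($1))
    # Code points above the Basic Multilingual Plane need a surrogate pair.
    if [ "$cp" -le 65535 ]; then echo 2; else echo 4; fi
}

utf32_len() {
    echo 4   # fixed width, regardless of the code point
}

# 'A', e-acute, euro sign, smiling face
for cp in 0x41 0xE9 0x20AC 0x1F60A; do
    printf 'U+%04X  UTF-8: %s  UTF-16: %s  UTF-32: %s\n' \
        "$cp" "$(utf8_len "$cp")" "$(utf16_len "$cp")" "$(utf32_len)"
done
```

For plain English text, every character falls in the first UTF-8 range, which is why UTF-8 output is byte-for-byte identical to ASCII for that content.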

History of Escape Sequences

The ASCII text encoding was first standardized in 1963 in the ANSI X3.4 standard. ECMA-6 and ISO 646 followed soon after (1965 and 1967, respectively). The version of ASCII defined in ISO 646 is nearly identical to modern ASCII and defined the 'C0' control character set (more on these later).

In the 1970s, video terminals started to become popular. The most well-known examples today were created by the Digital Equipment Corporation (DEC):

1970: VT05
1975: VT52
1978: VT100
1983: VT200

Video terminals unlocked many capabilities not possible with teletypewriters/teleprinters. For example:

Text can be displayed in any number of colors, styles, and formats
The cursor can easily be moved to any arbitrary location, including next to or over previously written content
Text previously written to the terminal can be modified or erased
A program like vim can wipe the terminal's display, display its own content, then restore the previous content when complete

Initially, terminal vendors used proprietary or vendor-specific escape sequences to perform these types of operations. ECMA-48 (1976), ANSI X3.64 (1979), and ISO 6429 (1983) standardized most escape sequences, though many terminals continued to support additional escape sequences beyond these. The VT100 was one of the first terminals to be "ANSI-compliant."

In the 1980s, the xterm program was developed for *nix systems, and it remains one of the most popular and influential terminal emulators. It was designed to emulate the DEC VT series of terminals and therefore supports both standardized ANSI escape sequences and DEC private use functions.

Further reading on xterm

With the Windows 10 version 1511 update (released in November 2015), Microsoft added support for ANSI escape sequences to the Windows console host (conhost); Windows Terminal, released later, supports them as well. Microsoft based its support for ANSI escape sequences on the xterm program and consequently also supports both standardized ANSI escape sequences and DEC private use functions.

Further reading on Windows Virtual Terminal Sequences

Definitions

The phrase "ANSI escape sequences" is often used synonymously with "ANSI control sequences." Technically, control sequences are a subset of escape sequences. For clarity, here are the definitions from the ECMA-48 standard:

Control character: A control function the coded representation of which consists of a single bit combination
Control function: An element of a character set that affects the recording, processing, transmission, or interpretation of data, and that has a coded representation consisting of one or more bit combinations
Control sequence: A string of bit combinations starting with the control function CONTROL SEQUENCE INTRODUCER (CSI), and used for the coded representation of control functions with or without parameters
Escape sequence: A string of bit combinations that is used for control purposes in code extension procedures. The first of these bit combinations represents the control function ESCAPE
Private use: The means of representing a non-standardized control function or mode in a manner compatible with this Standard

Note: See the C1 section under Control Characters for a definition of the Control Sequence Introducer (CSI).

Observations:

All control characters are (or invoke) control functions, but not all control functions are control characters
All control sequences are (or invoke) control functions
All control sequences are escape sequences, but not all escape sequences are control sequences

Control Characters

There are two sets of control characters: C0 and C1. Some standards define methods for switching between standard and non-standard C0 and C1 character sets, but these seem to be rarely used - in modern scenarios, at least. This article will only describe the standard control character sets.

Note: Control characters are sometimes referred to with the shorthand Cc.

C0

The C0 character set comprises the 32 non-printable characters at the start of the ASCII table. It was originally defined in ISO 646, though ISO 6429/ECMA-48 subsequently renamed some codes. These characters are still included in most text encodings (though many are rarely, if ever, used).

C1

The default C1 character set was first defined in ECMA-48/ISO 6429. It defined an additional 32 control characters, each with both a 7-bit and an 8-bit encoding. The 8-bit C1 set encodes each control character in a single byte and spans the range of decimal values 128 to 159. To make these control characters available to 7-bit systems (which cannot encode decimal values 128 or above in a single unit), a multi-unit version of each control character was defined by combining the ESC character with one of the characters between decimal 64 and 95.

(The two-unit versions of the C1 control characters are arguably control functions - since they are defined by the combination of two characters.)
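The relationship between the two forms is a fixed offset: subtracting 64 (hex 40) from the 8-bit value gives the second character of the 7-bit form. The sketch below illustrates that arithmetic in POSIX shell; c1_to_7bit is a hypothetical helper name, not part of any standard tool.

```shell
# Print the 7-bit (two-character) form of an 8-bit C1 control character.
# $1 is the 8-bit value in decimal or 0x-prefixed hex, e.g. 0x9B for CSI.
c1_to_7bit() {
    c1=$(($1))
    if [ "$c1" -lt 128 ] || [ "$c1" -gt 159 ]; then
        echo "not a C1 control character: $1" >&2
        return 1
    fi
    # Subtracting 64 maps 128..159 onto 64..95 ('@' through '_').
    second=$((c1 - 64))
    # Build an octal escape such as \0133 and let printf's %b turn it into '['.
    printf 'ESC %b\n' "\\0$(printf '%o' "$second")"
}

c1_to_7bit 0x9B   # CSI: prints "ESC ["
c1_to_7bit 0x9D   # OSC: prints "ESC ]"
```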

Note that Microsoft has intentionally disabled support for the 8-bit C1 control characters by default (source). Support can be enabled using the DEC private use escape sequence S8C1R (ESC SP 7).

(Documentation on the original DEC S8C1R function is available here)

Control Sequences

Structure

Paraphrasing ECMA-48:

A control sequence has the structure CSI P...P I...I F, where

CSI is the 7-bit or 8-bit control sequence introducer (bytes 1b 5b or 9b, respectively; all values in this list are hexadecimal)
P...P are Parameter Bytes, which, if present, have code points between 30 and 3f
I...I are Intermediate Bytes, which, if present, have code points between 20 and 2f
F is the Final Byte, has a code point between 40 and 7e, and - together with the Intermediate Bytes, if present - identifies the control function
- Final Bytes 70 through 7e are reserved for private use

End paraphrase

In most cases, this looks like:

\x1b[ <zero or more numbers, separated by ";"> <a letter>
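As a rough illustration of that common shape, the following POSIX shell sketch splits a 7-bit control sequence into its parameters and final byte. It assumes there are no Intermediate Bytes, and describe_csi is a name invented for this example.

```shell
# Split a 7-bit control sequence of the form ESC [ <params> <final>.
# Sequences with Intermediate Bytes are not handled by this sketch.
ESC=$(printf '\033')

describe_csi() {
    input=$1
    case $input in
        "$ESC["?*) ;;   # starts with ESC [ and has at least a final byte
        *) echo "not a control sequence" >&2; return 1 ;;
    esac
    body=${input#"$ESC["}      # drop the introducer
    params=${body%?}           # everything except the last character
    final=${body#"$params"}    # the last character
    echo "final byte: $final"
    # Walk the ';'-separated parameter list.
    rest=$params
    while [ -n "$rest" ]; do
        echo "parameter: ${rest%%;*}"
        case $rest in
            *\;*) rest=${rest#*;} ;;
            *)    rest= ;;
        esac
    done
}

describe_csi "$(printf '\033[5;10H')"
```

The function only prints a description; it never writes the sequence itself to the terminal, so it is safe to run anywhere.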

As an example, the control sequence \x1b[1;3;4;35m can be read as:

Characters	Meaning
`\x1b[`	CSI
`1;3;4;35`	Function Arguments
`m`	Function

In this case:

"m" invokes the "Select Graphic Rendition" (SGR) function
"1" sets text to bold
"3" sets text to italics
"4" sets text to underlined
"35" sets the text color to magenta

Selected ANSI Control Functions

In the signatures below, Pn denotes a numeric parameter.

Acronym	Name	Signature	Description
CUU	Cursor Up	CSI Pn A	Move cursor up by n (default n is 1)
CUD	Cursor Down	CSI Pn B	Move cursor down by n (default n is 1)
CUF	Cursor Forward	CSI Pn C	Move cursor forward by n (default n is 1)
CUB	Cursor Backward	CSI Pn D	Move cursor backward by n (default n is 1)
CNL	Cursor Next Line	CSI Pn E	Move cursor to the beginning of the line n lines down (default n is 1)
CPL	Cursor Previous Line	CSI Pn F	Move cursor to the beginning of the line n lines up (default n is 1)
CUP	Cursor Position	CSI Pn1;Pn2 H	Move cursor to the Pn1th row and Pn2th column
ED	Erase in Display	CSI Pn J	Pn=0: current position to end of display; Pn=1, beginning of the display to current position; Pn=2, erase full display
EL	Erase in Line	CSI Pn K	Pn=0: current position to end of line; Pn=1, beginning of the line to current position; Pn=2, erase full line
SU	Scroll Up	CSI Pn S	Scroll text up by n. New lines fill in from bottom
SD	Scroll Down	CSI Pn T	Scroll text down by n. New lines fill in from top
ICH	Insert Character	CSI Pn @	Insert n spaces at current position, shifting existing text to right
DCH	Delete Character	CSI Pn P	Delete n characters at current position, shifting space characters in from right edge
ECH	Erase Character	CSI Pn X	Overwrite n characters from the current position with a space character
IL	Insert Line	CSI Pn L	Insert n lines at the current position
DL	Delete Line	CSI Pn M	Delete n lines from the current position
SGR	Select Graphic Rendition	CSI Pn m	Set format of screen and text (many available parameters, see dedicated sub-section)
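A few of the functions in the table above can be wrapped in small shell helpers. This is only a sketch: the helper names are invented for illustration, and the defaults mirror the ones listed in the table.

```shell
# Helpers that emit some of the control functions from the table above.
CSI=$(printf '\033[')

cursor_up()     { printf '%s%sA' "$CSI" "${1:-1}"; }       # Cursor Up, default 1
cursor_to()     { printf '%s%s;%sH' "$CSI" "$1" "$2"; }    # Cursor Position: row, column
erase_line()    { printf '%s%sK' "$CSI" "${1:-0}"; }       # Erase in Line
erase_display() { printf '%s%sJ' "$CSI" "${1:-0}"; }       # Erase in Display

# Usage: replace the line that was just printed with a new message.
echo "working..."
cursor_up 1; erase_line 2
echo "done"
```

Run in a terminal that supports these sequences, the usage lines should leave only "done" visible where "working..." had been.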

Select Graphic Rendition

The SGR function can be used with an arbitrary number of parameters from the table below, separated by semicolons.

Table (mostly) from Microsoft

Value	Description	Behavior
0	Default	Returns all attributes to the default state prior to modification
1	Bold/Bright	Applies brightness/intensity flag to foreground color
22	No bold/bright	Removes brightness/intensity flag from foreground color
4	Underline	Adds underline
24	No underline	Removes underline
7	Negative	Swaps foreground and background colors
27	Positive (No negative)	Returns foreground/background to normal
3	Italics	Adds italic formatting
30	Foreground Black	Applies non-bold/bright black to foreground
31	Foreground Red	Applies non-bold/bright red to foreground
32	Foreground Green	Applies non-bold/bright green to foreground
33	Foreground Yellow	Applies non-bold/bright yellow to foreground
34	Foreground Blue	Applies non-bold/bright blue to foreground
35	Foreground Magenta	Applies non-bold/bright magenta to foreground
36	Foreground Cyan	Applies non-bold/bright cyan to foreground
37	Foreground White	Applies non-bold/bright white to foreground
38	Foreground Extended	Applies extended color value to the foreground (see details below)
39	Foreground Default	Applies only the foreground portion of the defaults (see 0)
40	Background Black	Applies non-bold/bright black to background
41	Background Red	Applies non-bold/bright red to background
42	Background Green	Applies non-bold/bright green to background
43	Background Yellow	Applies non-bold/bright yellow to background
44	Background Blue	Applies non-bold/bright blue to background
45	Background Magenta	Applies non-bold/bright magenta to background
46	Background Cyan	Applies non-bold/bright cyan to background
47	Background White	Applies non-bold/bright white to background
48	Background Extended	Applies extended color value to the background (see details below)
49	Background Default	Applies only the background portion of the defaults (see 0)
90	Bright Foreground Black	Applies bold/bright black to foreground
91	Bright Foreground Red	Applies bold/bright red to foreground
92	Bright Foreground Green	Applies bold/bright green to foreground
93	Bright Foreground Yellow	Applies bold/bright yellow to foreground
94	Bright Foreground Blue	Applies bold/bright blue to foreground
95	Bright Foreground Magenta	Applies bold/bright magenta to foreground
96	Bright Foreground Cyan	Applies bold/bright cyan to foreground
97	Bright Foreground White	Applies bold/bright white to foreground
100	Bright Background Black	Applies bold/bright black to background
101	Bright Background Red	Applies bold/bright red to background
102	Bright Background Green	Applies bold/bright green to background
103	Bright Background Yellow	Applies bold/bright yellow to background
104	Bright Background Blue	Applies bold/bright blue to background
105	Bright Background Magenta	Applies bold/bright magenta to background
106	Bright Background Cyan	Applies bold/bright cyan to background
107	Bright Background White	Applies bold/bright white to background

Extended Color Subsequences:

SGR Subsequence	Description
38;2;r;g;b	Set foreground color to RGB value specified in r, g, b parameters*
48;2;r;g;b	Set background color to RGB value specified in r, g, b parameters*
38;5;s	Set foreground color to s index in 88 or 256 color table*
48;5;s	Set background color to s index in 88 or 256 color table*

*You can find an example of the referenced color table here
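The extended color subsequences can be generated with a few printf wrappers. The sketch below assumes a terminal that supports both the RGB and the indexed forms; the function names are hypothetical.

```shell
# Build SGR sequences for the extended color forms described above.
ESC=$(printf '\033')

fg_rgb()    { printf '%s[38;2;%d;%d;%dm' "$ESC" "$1" "$2" "$3"; }  # foreground, RGB
bg_rgb()    { printf '%s[48;2;%d;%d;%dm' "$ESC" "$1" "$2" "$3"; }  # background, RGB
fg_256()    { printf '%s[38;5;%dm' "$ESC" "$1"; }                  # foreground, table index
sgr_reset() { printf '%s[0m' "$ESC"; }                             # back to defaults

printf '%sorange text%s\n' "$(fg_rgb 255 128 0)" "$(sgr_reset)"
printf '%sindexed color 82%s\n' "$(fg_256 82)" "$(sgr_reset)"
```

Ending each line with the reset sequence keeps the color from leaking into whatever the shell prints next.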

Selected DEC Control Functions

Acronym	Name	Signature	Description
DECSC	Save Cursor	ESC 7	Save cursor position in memory
DECRC	Restore Cursor	ESC 8	Restore cursor position from memory
DECSET	DEC Private Mode Set	CSI ? Pn h	25: show cursor; 1049: use alternate screen buffer; and many others
DECRST	DEC Private Mode Reset	CSI ? Pn l	Unset corresponding DECSET settings

How to Use

To make use of control characters and functions in a modern terminal emulator, you need to know:

How to input non-printable characters to your terminal
The text encoding of your terminal
How to represent the desired character(s) in said encoding
Whether your terminal supports the targeted control character(s) and/or function(s)

1. Input Non-Printable Characters

Both Bash and PowerShell support various methods of inputting characters by their code points and/or byte representations.

Bash

printf will recognize the byte-representation of characters as octal in the format \ddd or hex in the format \xdd.

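echo (when used with the -e flag) will recognize hex in the same \xdd format, octal in the format \0ddd, and common \ shorthands like \e and \n. (Bash's built-in printf also accepts \e, but POSIX does not require it, so \033 is the more portable spelling.)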

ESC=$(printf '\033')
echo "${ESC}[35mhello"

# OR for one-liners
printf '\033[35mhello\n'
echo -e '\x1b[35mhello'

Important: Unlike PowerShell, both of these options will output exactly the bytes you input. If those bytes are not properly encoded according to your terminal's encoding, they will not work as expected. For example, if your terminal is using UTF-8 (quite likely), the raw, 8-bit CSI will not work if input as \x9b, because a lone 9b byte is not valid UTF-8. You must instead input the UTF-8 encoding of that code point, \xc2\x9b.

Further Reading: Second answer on this Stack Overflow question

PowerShell

PowerShell does not recognize any of the formats supported by printf but does support multiple methods of inputting the byte-representation of characters.

A char instance can be declared by:

casting the decimal value of any Unicode code point in the "Basic Multilingual Plane" (BMP), i.e. 0 through 65535, to char
casting the hexadecimal value (0x0000 through 0xFFFF) of any such code point to char
calling the ConvertFromUtf32 function with any Unicode code point, including those outside the BMP (this returns a string rather than a char)

# Method One
Write-Host "$([char]27)[35mhello"

# Method Two
Write-Host "The trademark symbol is $([char]0x2122)"

# Method Three
Write-Host "A smiley face can be printed with $([char]::ConvertFromUTF32(0x1F60A))"

Bonus: The ESC character can be input with the keyboard combination CTRL+[ (which is what the '^[' notation stands for).

Write-Host "^[[35mhello" # '^[' must be typed with CTRL+[

Important: PowerShell will write your characters to the terminal in whatever encoding the $OutputEncoding preference variable is set to. To change your output encoding, use $OutputEncoding = [System.Text.Encoding]::<encoding>. The default for PowerShell Core is UTF-8, and the default for Windows PowerShell 5.1 and below is ASCII.

2. Terminal Encoding

On Windows, the Console API can be used to query and set the input and output encodings of the console itself (distinct from the output encoding of PowerShell). In PowerShell, you can accomplish this with the following commands:

# Query
[System.Console]::InputEncoding
[System.Console]::OutputEncoding

# Set
[System.Console]::InputEncoding = [System.Text.Encoding]::<encoding>
[System.Console]::OutputEncoding = [System.Text.Encoding]::<encoding>

Unfortunately, there is no universal equivalent in the *nix world. There are relevant environment variables and the locale command, but these do not tell you the encoding of the actual terminal. The best method to determine a *nix terminal's encoding is trial and error. (The most common terminal encoding seems to be UTF-8.)
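For what it is worth, the locale variables can at least be inspected quickly. The sketch below extracts the codeset from them using the usual LC_ALL, LC_CTYPE, LANG precedence; locale_codeset is a name invented here. As noted above, the result is only what programs are told to assume, not proof of what the terminal emulator actually decodes.

```shell
# Report the character encoding named by the locale environment variables.
locale_codeset() {
    # LC_ALL overrides LC_CTYPE, which overrides LANG.
    loc=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
    case $loc in
        *.*) cs=${loc#*.}          # en_US.UTF-8@euro -> UTF-8@euro
             echo "${cs%%@*}" ;;   # strip any @modifier
        *)   echo "unknown" ;;     # e.g. "C", "POSIX", or nothing set
    esac
}

echo "Locale claims: $(locale_codeset)"
```

The standard `locale charmap` command reports similar information and carries the same caveat.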

3. Character Representation

As briefly discussed in the "Input Non-Printable Characters" section, PowerShell will correctly encode the output of all characters according to the $OutputEncoding variable; just make sure that this variable matches the input encoding of the terminal it is running within.

However, *nix tools for outputting non-printable characters will output exactly the bytes you enter in hex or octal representation. This means you need to know the encoding of the terminal you are working in and encode your characters accordingly.

If the terminal's encoding is UTF-8 (quite common), all characters with a code point between 00 and 7F can be input as a single byte. For larger code points, refer to UTF-8 documentation (or the Stack Overflow answer here).
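For larger code points, the UTF-8 bytes can also be computed by hand. The following sketch implements the standard UTF-8 bit layout in POSIX shell arithmetic and prints the result as octal escapes suitable for printf; utf8_escapes is a hypothetical helper name, and no validation (for example, of surrogate code points) is attempted.

```shell
# Print the UTF-8 bytes of a code point as printf-style octal escapes.
# $1 is the code point in decimal or 0x-prefixed hex.
utf8_escapes() {
    cp=$(($1))
    if [ "$cp" -le 127 ]; then
        # 0xxxxxxx
        printf '\\%03o' "$cp"
    elif [ "$cp" -le 2047 ]; then
        # 110xxxxx 10xxxxxx
        printf '\\%03o\\%03o' \
            $((192 | (cp >> 6))) \
            $((128 | (cp & 63)))
    elif [ "$cp" -le 65535 ]; then
        # 1110xxxx 10xxxxxx 10xxxxxx
        printf '\\%03o\\%03o\\%03o' \
            $((224 | (cp >> 12))) \
            $((128 | ((cp >> 6) & 63))) \
            $((128 | (cp & 63)))
    else
        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '\\%03o\\%03o\\%03o\\%03o' \
            $((240 | (cp >> 18))) \
            $((128 | ((cp >> 12) & 63))) \
            $((128 | ((cp >> 6) & 63))) \
            $((128 | (cp & 63)))
    fi
    echo
}

utf8_escapes 0x9B     # 8-bit CSI: prints \302\233 (hex c2 9b)
utf8_escapes 0x2122   # trademark sign: prints \342\204\242
```

The output can be pasted into a printf format string, for example `printf '\302\233'`, to emit the encoded character.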

4. Sequence Support

Most modern terminal emulators will support all or most of the standard ANSI escape sequences, documented in ECMA-48 (and Wikipedia).

Many terminal emulators support additional escape sequences. The two terminal emulators I've discussed in this article (xterm and the Windows Console) both support many of the DEC private use functions.

Microsoft documents their escape sequence support here.

xterm's escape sequence support is documented here.

Examples

Text Formatting

Format a title using both standard and private use sequences.

function Write-Title {
	param(
		[Parameter(mandatory=$true)]
		[string]$Title
	)
	$ESC = [char]27
	Write-Host ("$ESC#3" + "$ESC(0" + "$ESC[38;5;21m" + "l" + "q" * ($Title.Length + 2) + "k")
	Write-Host ("$ESC#3" + "$ESC[38;5;21m" + "x " + "$ESC(B" + "$ESC[38;5;82m" + $Title + "$ESC[38;5;21m" + "$ESC(0 x")
	Write-Host ("$ESC#4" + "$ESC[38;5;21m" + "x " + "$ESC(B" + "$ESC[38;5;82m" + $Title + "$ESC[38;5;21m" + "$ESC(0 x")
	Write-Host ("$ESC#4" + "$ESC[38;5;21m" + "m" + "q" * ($Title.Length + 2) + "j")
	Write-Host ("$ESC(B" + "$ESC[0m")
}

Note: this function makes use of two features not discussed above:

DECDHL
Switching character sets (specifically, to and from the DEC Line Drawing mode)

Screen Buffers

Switch to alternate screen buffer. Print text. Switch back to primary buffer on user input.

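function Demo-ScreenBuffers {
	$ESC = [char]27
	# Switch to the alternate buffer, clear it, and hide the cursor
	$sequence = "$ESC[?1049h" + "$ESC[2J" + "$ESC[?25l"
	Write-Host -NoNewline $sequence
	# Parentheses make PowerShell concatenate the strings before passing them to Write-Host
	Write-Host ("You are now in the alternate screen buffer. " +
		"Press ENTER to return to the main screen buffer")
	# Discard the input so the function does not return it
	$null = Read-Host
	# Reset the scrolling margins, show the cursor, and return to the main buffer
	$sequence = "$ESC[r" + "$ESC[?25h" + "$ESC[?1049l"
	Write-Host -NoNewline $sequence
}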

Cursor Control

Example 1

Insert text 5 lines above the current cursor position:

^[7^[[5A^[[L<YOUR TEXT HERE>^[8

Example 2

Print a border of 'a's around a PowerShell session, or around the CLI prompt

Session

$w = [System.Console]::WindowWidth; $h = [System.Console]::WindowHeight
$ESC = [char]27
$cmds = `
	@("", "7", "[1;1H", "[2M", "[$($w)X]", "[$h;1H", "[L") + `
	@(for ($i=1; $i -le $h; $i++) { "[$i;1H", "[2@a", "[$i;$($w)Ha" }) + `
	@(for ($i=3; $i -le $w; $i+=2) { "[1;$($i)Ha", "[$($h);$($i)Ha" }) + `
	@("8>")
Write-Host $([string]::Join($ESC, $cmds))

Prompt

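function prompt {
	$w = [System.Console]::WindowWidth
	$ESC = [char]27
	# Round up so the string is at least $w characters long, even for odd widths
	$aLine = ("a " * [math]::Ceiling($w / 2)).Substring(0, $w)
	# $regPrompt is not defined in this article; if unset, it expands to an empty string
	"`n`n`n$aLine" + "$ESC[2F" + $aLine + "$ESC[E" + $regPrompt + "PS: $PWD > "
}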

Filename Trickery

In addition to expanded sets of printable characters, Unicode defines additional non-printable characters, including the "right-to-left override" (RLO, U+202E).

(See its Wikipedia page here.)

The RLO forces the text that follows it to be displayed from right to left, as required by some languages. However, in the context of an English environment, this can be used for obfuscation or misdirection. The command below will create an executable scr file with an RLO at the start of the filename, causing File Explorer to display the name reversed so that it appears to be a doc file.

New-Item -ItemType File -Path "$([char]0x202E)cod.tset.scr"
explorer.exe .

References

Standards

ASCII:

ANSI X3.4 (FIPS version)
ECMA-6
ISO 646

Escape Sequences:

ECMA-48
ANSI X3.64 (FIPS version)
ISO 6429

Additional Reading

Wikipedia's ANSI Escape Code Page
Microsoft's Supported Virtual Terminal Sequences
Everything you never wanted to know about ANSI escape codes
All Control Sequences supported by XTerm
