Enter the fun world of control characters. The five-bit ITA2 code contained a bunch of them, all of which got equivalents in ASCII:
- Carriage return, which moved the carriage of the typewriter (the part that held the paper) back to its rightmost position. On old typewriters, the carriage moved to the left as you typed, so that the next character position on the paper came under the typing hammers. On later printers, the print head moved to the right as you typed, while the paper stayed in place. A carriage return moved the print head back to the leftmost position.
- Line feed, which advanced the paper vertically by one line.
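- Letters and figures shift, which selected whether the following codes were interpreted as letters or as figures (digits and punctuation), since five bits allow only 32 codes. ASCII has no letters and figures shift as such, but it has the SO (Shift Out) and SI (Shift In) control characters, which switch to and from an alternate character set. Teletypes could, for instance, switch between Greek and Latin this way.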
- The null character, all zero bits, which did nothing.
- The Bell character, which sounded a bell.
- The WRU (Who Are You) character, which caused the receiving teletype to send back a response. This way you could be sure that a teletype was running at the other end of the line before you typed a message. You could not tell whether it had paper and ink, but at least you knew it was connected and switched on. ASCII has the ENQ control character for this purpose.
The ability to send a carriage return and a line feed independently allowed you to return to the start of the line without advancing the paper. This way you could overprint a line with different characters, for example to add accents to letters or to underline some words. ASCII's Backspace character made this simpler, because it backs up a single position instead of the whole line. Overprinting worked on printing teletypes, but not on CRT terminals, where a new character simply replaces the old one on the screen.
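To make the overprinting idea concrete, here is a toy model of a printing teletype in Python. It assumes the simplest possible machine: every character struck at a position stays on the paper, CR moves the head to column 0, LF advances the paper without changing the column, BS moves one column left, and a space moves right without striking anything. The function name print_teletype is hypothetical.

```python
def print_teletype(stream: str) -> list[list[str]]:
    """Toy printing teletype (hypothetical model): nothing is ever erased.

    Returns one list per printed line; each entry holds every character
    struck at that column, in order, or " " if nothing was struck there.
    """
    lines: list[list[str]] = [[]]
    col = 0
    for ch in stream:
        if ch == "\r":        # CR: print head back to the left margin
            col = 0
        elif ch == "\n":      # LF: advance the paper, column unchanged
            lines.append([])
        elif ch == "\b":      # BS: back up one position
            col = max(col - 1, 0)
        elif ch == " ":       # space: move right without striking
            col += 1
        else:                 # printable: strike it at the current column
            line = lines[-1]
            while len(line) <= col:
                line.append("")
            line[col] += ch
            col += 1
    return [[cell or " " for cell in line] for line in lines]


# Underline a word with CR, or add an accent with BS.
print(print_teletype("ASCII text\r_____"))
print(print_teletype("e\b'"))
```

Each cell that ends up with two characters, such as "A_" or "e'", is a position where the paper received two impressions. On a CRT terminal only the last character of each cell would remain visible.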
ASCII added many more control characters; some are still widely used, while others are all but forgotten. See https://en.wikipedia.org/wiki/C0_and_C1_control_codes. The widely used ones are:
- BEL, Bell (0x07). Sounds a bell or a beep, or gives a visual attention signal.
- BS, Backspace (0x08). Moves the print head one position to the left.
- HT, Horizontal Tab (0x09). Moves the print head to the next tab stop.
- VT, Vertical Tab (0x0B). Advances the paper to the next vertical tab position. It is not actually widely used, but its purpose is still recognised. It was used with pre-printed forms on which some fields had to be filled in.
- FF, Form Feed (0x0C). Advances the paper to the next page. On CRT terminals it was sometimes used to clear the screen.
- DC1 and DC3 (0x11 and 0x13, Ctrl-Q and Ctrl-S). These were originally used to control a paper tape reader. A program receiving data from a paper tape would send DC3 to pause the reader and DC1 to resume it, as a means of flow control. This scheme, known as XON/XOFF flow control, was widely used with CRT terminals to pause and resume terminal output, and it is still implemented today.
- ESC, Escape (0x1B). Today this is mostly used as a prefix for more advanced terminal commands, for example to position the cursor or to insert and delete lines on the screen. ANSI escape sequences are used almost universally by terminal-based programs.
- DEL, Delete (0x7F). This control character, alone at the top end of the 7-bit ASCII range, was originally intended for use on paper tape. One could wind the tape back a few character positions and then punch DEL over those characters to erase them. Because DEL has all seven bits set, punching it over any other character punches all the holes and turns that character into DEL. When the tape was later read back, the DEL characters were ignored, just like NUL characters.
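The paper tape trick works because punching can only add holes, never remove them, so punching a code over an existing one amounts to a bitwise OR. A minimal sketch under that assumption; the helper names punch, erase_last and read_tape are hypothetical:

```python
NUL = 0x00   # blank tape: no data holes punched
DEL = 0x7F   # all seven data holes punched

def punch(tape: bytearray, pos: int, code: int) -> None:
    """Punch code at pos. Holes can only be added, so this is a bitwise OR."""
    tape[pos] |= code

def erase_last(tape: bytearray, count: int) -> None:
    """Wind back count positions and punch DEL over them."""
    for pos in range(len(tape) - count, len(tape)):
        punch(tape, pos, DEL)

def read_tape(tape: bytes) -> bytes:
    """Read the tape back, ignoring NUL and DEL as a tape reader would."""
    return bytes(b for b in tape if b not in (NUL, DEL))


tape = bytearray(b"\x00\x00HELLP")   # blank leader, then a typo
erase_last(tape, 1)                  # rub out the 'P'
tape += b"O"                         # carry on punching after the rubout
print(read_tape(tape))               # b'HELLO'

# Any 7-bit code OR-ed with DEL gives DEL, which is why the trick always works.
assert all((c | DEL) == DEL for c in range(128))
```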
Later, another range of control characters was introduced: the C1 range, 0x80..0x9F. These are rarely used. The range even contains an unambiguous newline character, NEL (Next Line, 0x85), but it never caught on. Unicode added dedicated line and paragraph separators (U+2028 and U+2029), but these are not widely used in plain text files either.
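One place where these characters are still honoured is Python's str.splitlines(), which treats NEL, the Unicode line separator and the paragraph separator as line boundaries. Code that splits only on "\n", as much software does, sees no line breaks at all. A small demonstration:

```python
NEL = "\x85"     # C1 Next Line
LS = "\u2028"    # Unicode LINE SEPARATOR
PS = "\u2029"    # Unicode PARAGRAPH SEPARATOR

text = f"one{NEL}two{LS}three{PS}four\nfive"

# str.splitlines() recognises NEL, LS and PS as line boundaries...
print(text.splitlines())      # ['one', 'two', 'three', 'four', 'five']

# ...but splitting on "\n" alone finds only the one LF in the text.
print(len(text.split("\n")))  # 2
```

Note that bytes.splitlines() only recognises LF, CR and CR-LF, so the same text behaves differently once it is encoded.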
Modern Usage
Nearly all modern terminals (which are typically implemented in software on a computer) allow you to type the control characters 0x01 to 0x1A (Control-A to Control-Z). The Tab key outputs Control-I (0x09), the Return key outputs Control-M (0x0D), and the Backspace key outputs either Control-H (0x08) or DEL (0x7F), depending on which side of the holy war you are on. There is a dedicated Esc key (0x1B), and some less obvious key combinations will get you NUL and the characters in the range 0x1C..0x1F. Control keys are often used in ways totally unrelated to their original meaning in ASCII.
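The relation between a key and its control code is simple: under the traditional convention, holding Control keeps only the low five bits of the key's ASCII code. That is why Ctrl-I is Tab, Ctrl-M is CR, Ctrl-[ is ESC and Ctrl-@ is NUL. A sketch of that rule, assuming it is applied only to letters and the characters @ [ \ ] ^ _ (what other keys produce varies between terminals); the function name ctrl is hypothetical:

```python
def ctrl(key: str) -> int:
    """Traditional code sent for Control+key: keep only the low five bits.

    Only defined here for letters and the characters @ [ \\ ] ^ _
    (ASCII 0x40..0x5F after upper-casing).
    """
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no traditional control code for {key!r}")
    return code & 0x1F


for key, name in [("A", "SOH"), ("C", "ETX"), ("H", "BS"), ("I", "HT"),
                  ("M", "CR"), ("[", "ESC"), ("@", "NUL")]:
    print(f"Ctrl-{key} = 0x{ctrl(key):02X} {name}")
```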
- On Unix, the terminal driver by default uses Ctrl-C (ETX) to interrupt (and usually terminate) a running program, Ctrl-D (EOT) to indicate the end of input, and Ctrl-Z (SUB) to suspend the currently running program.
- CP/M stored file lengths as a number of 128-byte records. When a text file was not a multiple of 128 bytes in size, the last record was padded with SUB (Control-Z) characters. Even if the file was a multiple of 128 bytes, a whole record of Ctrl-Z was still added, so that Ctrl-Z would be a reliable end-of-file indicator. This habit carried over to MS-DOS, even though MS-DOS did store exact file sizes: a single Ctrl-Z was typically appended to each text file. Some programs choked on the Ctrl-Z, others choked when it was not present. It was a mess.
- WordStar was an early word processor under CP/M. It was ported to MS-DOS, and many other editors copied its control key layout. It used Ctrl-S for cursor left, Ctrl-D for cursor right, Ctrl-E for cursor up and Ctrl-X for cursor down. The choice of these control codes has nothing to do with their meanings in ASCII, but everything to do with the positions of these keys on the keyboard: they form a diamond under the left hand. WordStar was developed at a time when many computer terminals did not have cursor keys.
- Many Unix editors, such as nano and Emacs, make extensive use of control keys in ways totally unrelated to their meaning in ASCII.
- GUI applications use Ctrl-Z for Undo, Ctrl-X for Cut, Ctrl-C for Copy and Ctrl-V for Paste.
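The CP/M padding scheme described above is easy to express in code. A minimal sketch, assuming the variant where a full record of Ctrl-Z is added even when the text already fills its last record, and a reader that treats the first Ctrl-Z as end of file; pad_cpm and read_cpm_text are hypothetical names:

```python
RECORD = 128   # CP/M record size in bytes
SUB = 0x1A     # Ctrl-Z, the text end-of-file marker

def pad_cpm(text: bytes) -> bytes:
    """Pad text to whole 128-byte records with Ctrl-Z.

    Always adds at least one Ctrl-Z: a text that exactly fills its
    last record gets a whole extra record of Ctrl-Z.
    """
    padding = RECORD - len(text) % RECORD   # 1..128, never 0
    return text + bytes([SUB]) * padding

def read_cpm_text(data: bytes) -> bytes:
    """Treat the first Ctrl-Z as end of file and ignore everything after it."""
    end = data.find(bytes([SUB]))
    return data if end < 0 else data[:end]


stored = pad_cpm(b"HELLO\r\n")
print(len(stored))              # 128
print(read_cpm_text(stored))    # b'HELLO\r\n'
```

The "mess" on MS-DOS follows directly from this: a reader written like read_cpm_text silently truncates any file that legitimately contains a 0x1A byte, while a reader that ignores the marker shows a stray Ctrl-Z at the end of the text.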
Holy Wars
How should text files be split into lines? This has never really been settled.
There are three aspects to this:
- Should there be a line terminator at the end of each line you see, or only one at the end of each paragraph? For source code, every line should have a line terminator, but for running text this is not so obvious. Some authors prefer to put an entire paragraph on one long line and expect their text editor to wrap it to the width of the screen. Others put a line terminator at the end of each visual line and a blank line (two line terminators) between paragraphs.
- Should the last line of a text file always end in a line terminator/separator?
- What should the line terminator be?
  - CP/M and MS-DOS settled on the sequence CR-LF, which is exactly what a printer needs: return the print head, then advance the paper. This convention was carried over to Windows.
  - Apple (in the classic Mac OS) and a number of 8-bit systems settled on just CR at the end of each line. Apple later adopted the Unix convention with Mac OS X.
  - Unix settled on just LF at the end of each line.
Today it is common wisdom that programs should at least accept both LF-only and CR-LF text files when reading, since both conventions are here to stay. Neither the Unicode line separators nor the NEL character (0x85) ever caught on.
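A tolerant reader can be sketched in a few lines. This version counts which conventions a file uses and splits on any of them; accepting a bare CR as well (for old Mac files) is an extra assumption beyond the LF/CR-LF minimum. The function names are hypothetical. Python's own bytes.splitlines() applies the same rule; the explicit regular expression just makes it visible.

```python
import re

def line_ending_counts(data: bytes) -> dict[str, int]:
    """Count CR-LF, bare LF and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,   # every CR-LF also contains one LF
        "CR": data.count(b"\r") - crlf,   # ...and one CR
    }

def split_lines(data: bytes) -> list[bytes]:
    """Split on CR-LF, LF or bare CR, dropping the terminators.

    CR-LF is tried first so it counts as one terminator, not two.
    A final terminator does not produce an extra empty line.
    """
    lines = re.split(rb"\r\n|\r|\n", data)
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


data = b"unix\nwindows\r\nold mac\rlast"
print(line_ending_counts(data))   # {'CRLF': 1, 'LF': 1, 'CR': 1}
print(split_lines(data))          # [b'unix', b'windows', b'old mac', b'last']
```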
Another source of heated debate is the use of tab characters in source files.
- Some authors prefer their source files to be free of any tab characters. Any indenting is done with spaces.
- Other authors prefer indenting with Tab characters instead. The configuration of tab stops then becomes another point of discussion: some programmers insist on tab stops every four columns, others want them every eight columns.
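The reason tab width matters is that a tab is not "N spaces": it moves to the next tab stop, so the number of spaces it stands for depends on the current column. A sketch of that rule for a single line (the name expand_tabs is hypothetical; for single lines it matches Python's built-in str.expandtabs):

```python
def expand_tabs(line: str, tab_size: int = 8) -> str:
    """Replace each tab with spaces up to the next tab stop.

    Tab stops sit at every multiple of tab_size columns, so one tab
    expands to anywhere from 1 to tab_size spaces.
    """
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            spaces = tab_size - col % tab_size
            out.append(" " * spaces)
            col += spaces
        else:
            out.append(ch)
            col += 1
    return "".join(out)


line = "\tif (x)\t// check"
print(repr(expand_tabs(line, 4)))   # '    if (x)  // check'
print(repr(expand_tabs(line, 8)))   # '        if (x)  // check'
```

The same file therefore shows different indentation depending on the reader's tab setting, while the second tab, which only aligns the comment, happens to expand to two spaces in both cases.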
Finally, there is the question of which character code the Backspace key (the big left arrow to the right of the ‘=’ key) should emit, in particular on Unix systems.
- Unix purists insist on Backspace = Backspace (0x08 = Ctrl-H).
- Others insist on Backspace = DEL (0x7F).
Terminal programs can be configured to emit either character for Backspace, and the “erase” character used by the Unix terminal driver's line input can be set to any character (for example with the stty command). Many programs accept either convention. But it does get super annoying when not everything on the same system is configured the same way, for example when some terminal programs on your desktop emit BS, others emit DEL, and your shell is not configured correctly for some of them.
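Accepting either convention is trivial for a program that does its own line editing: treat both BS and DEL as "erase the previous character". A minimal sketch of such a line editor (the name edit_line and its erase_chars parameter are hypothetical), which also shows what happens when only one convention is recognised:

```python
BS = "\x08"    # Backspace, Ctrl-H
DEL = "\x7f"   # Delete, emitted by many Backspace keys

def edit_line(keys: str, erase_chars: str = BS + DEL) -> str:
    """Apply simple line editing to a sequence of typed characters.

    Each character in erase_chars removes the previous character, if any.
    By default both BS and DEL count as erase, the tolerant approach.
    """
    buf: list[str] = []
    for ch in keys:
        if ch in erase_chars:
            if buf:
                buf.pop()
        else:
            buf.append(ch)
    return "".join(buf)


print(repr(edit_line("lss\x08")))                  # 'ls'
print(repr(edit_line("lss\x7f")))                  # 'ls'
# Misconfigured: only BS counts as erase, so the DEL ends up in the input.
print(repr(edit_line("lss\x7f", erase_chars=BS)))  # 'lss\x7f'
```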