FORTH, the minimalist programming language

FORTH is a programming language, invented in 1970 by Charles Moore. It is very simple to implement and it can work with very small memories. Unlike BASIC, for example, it is a very extensible language, reaching beyond the extensibility of languages like C, almost into the realm of LISP.

Like LISP, FORTH is one of very few programming languages that does not use infix expressions. Instead, expressions are written in Reverse Polish Notation, like so:

12 23 * 44 + .

This is equivalent to the expression 12*23+44 in traditional languages. The fun thing is that FORTH does not need a parser in the traditional sense. When a number (like 12) is encountered, it is pushed onto the stack; when anything else is encountered, like ‘*’, the corresponding function is executed. For ‘*’, the executed code pops two numbers from the stack, multiplies them and pushes the result back onto the stack. The word ‘.’ pops the result of the expression and prints it (320 in this case).
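
To make the evaluation order concrete, the same expression can be traced one token at a time. This is only an illustrative sketch: the comments show the stack after each step (top of stack on the right), and it assumes a FORTH system that supports `\` line comments. Putting one token per line changes nothing, since FORTH only looks for whitespace between tokens.

```forth
12      \ number: pushed                              stack: 12
23      \ number: pushed                              stack: 12 23
*       \ word: pops 23 and 12, pushes their product  stack: 276
44      \ number: pushed                              stack: 276 44
+       \ word: pops 44 and 276, pushes their sum     stack: 320
.       \ word: pops 320 and prints it                stack: (empty)
```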

There are FORTH primitives to manipulate the stack, such as DROP, DUP, OVER and SWAP. Traditional FORTH code tends to avoid local variables and instead juggles multiple values on the stack. This stack juggling is sometimes hard to debug and it takes a lot of practice to master. FORTH is simple, but not easy.
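
A small sketch of what this stack juggling looks like in practice. The word names SQUARE and SUM-OF-SQUARES are made up for this illustration; the stack comments in parentheses follow the usual ( before -- after ) convention, with the top of the stack on the right.

```forth
\ DUP  ( a -- a a )      DROP ( a -- )
\ SWAP ( a b -- b a )    OVER ( a b -- a b a )

: SQUARE ( n -- n*n )
  DUP * ;

: SUM-OF-SQUARES ( a b -- a*a+b*b )
  SQUARE          \ stack: a b*b
  SWAP SQUARE     \ stack: b*b a*a
  + ;

3 4 SUM-OF-SQUARES .    \ prints 25
5 6 DROP .              \ prints 5
2 10 OVER + . .         \ prints 12, then 2
```

Even in this tiny example you have to keep track of which value is on top at every step, which is exactly where the debugging effort tends to go.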

Like the B programming language, FORTH has only one data type: the machine word, typically at least 16 bits, even on 8-bit machines. It can represent a signed integer, an unsigned integer or an address. Two adjacent words on the stack can form a double-length integer. If a FORTH system has floating point, floating point numbers are usually stored on a separate stack and they are a distinct data type.
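
The following lines sketch how one and the same cell can be read in different ways. The exact numbers printed depend on the cell size of the system, so the values quoted in the comments assume 16-bit cells where noted. COUNTER is a hypothetical variable name, and D. comes from the double-number word set, which is assumed to be present.

```forth
\ The same bit pattern, read in two ways.
-1 .              \ prints -1: the cell read as a signed integer
-1 U.             \ the cell read as unsigned: all bits set (65535 with 16-bit cells)

VARIABLE COUNTER  \ COUNTER pushes the address of one cell of storage
42 COUNTER !      \ store 42 at that address
COUNTER @ .       \ fetch it back: prints 42
COUNTER .         \ the address itself prints like any other number

\ Two cells form a double-length integer, most significant cell on top.
1 0 D.            \ prints 1
0 1 D.            \ prints 65536 with 16-bit cells
```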

The Compiler

Now look at a function definition:

: PRINT-NUMBERS
  101 1 DO I . LOOP ;

The word ‘:’ is just another word that gets executed. When it is executed, the interpreter is switched to the compile state and a new definition (named PRINT-NUMBERS) is added to the dictionary. When in compile state, the interpreter does not execute the words it encounters, but instead adds the corresponding instructions to the newly compiled function. For numbers, it adds instructions that push the number onto the stack when the newly compiled function is run. Some words, like ‘;’, DO and LOOP, have a special “immediate” flag and are executed even in the compile state. The word ‘;’ adds a “return” instruction to the newly compiled function and leaves compile state. The words DO and LOOP take care to add the correct jump offsets, so that LOOP can jump back to the corresponding DO. There are also IF .. ELSE .. THEN and BEGIN .. WHILE .. REPEAT. These constructs can be nested by, you guessed it, putting the branch origins and targets onto the stack while the compiler executes these words. No real parser is involved; the stack does all the work.
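
The control structures and the immediate flag mentioned above can be tried out with a few short definitions. This is a sketch with made-up word names (CLASSIFY, COUNTDOWN, SAY-HI, DEMO), assuming a reasonably standard FORTH system.

```forth
\ IF .. ELSE .. THEN nested inside DO .. LOOP
: CLASSIFY ( -- )
  6 1 DO
    I .
    I 2 MOD IF ." odd " ELSE ." even " THEN
  LOOP ;

\ BEGIN .. WHILE .. REPEAT: loop as long as the counter is positive
: COUNTDOWN ( n -- )
  BEGIN DUP 0> WHILE
    DUP . 1-
  REPEAT DROP ;

CLASSIFY        \ prints 1 odd 2 even 3 odd 4 even 5 odd
3 COUNTDOWN     \ prints 3 2 1

\ A user-defined immediate word is executed while DEMO is being compiled.
: SAY-HI ( -- ) ." compiling now " ; IMMEDIATE
: DEMO ( -- ) SAY-HI ." running now " ;   \ prints "compiling now" at this point
DEMO                                      \ prints only "running now"
```

SAY-HI leaves no trace in the compiled body of DEMO. It simply ran during compilation, which is the same mechanism that lets DO, LOOP, IF and THEN compute their jump offsets.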

Threaded Code

Especially in early FORTH implementations, the compiled code does not contain machine instructions, but addresses of functions to execute, mixed in with some literals and/or branch targets. This is called threaded code. Each FORTH primitive ends by loading the address and executing the next primitive in the threaded code. A non-primitive (a call to a compiled function) starts with a simple handler that pushes the threaded instruction pointer onto the return stack. This is a separate stack from the data stack, where values are stored.
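
The idea of code as a list of addresses can be modelled in FORTH itself. The sketch below builds such a list by hand and runs it with a small loop. THREAD and RUN-THREAD are hypothetical names, and this is only a model: in a real threaded system the fetch-and-jump step is a few machine instructions at the end of each primitive, not a FORTH loop, and on some modern systems an execution token is not necessarily a plain code address.

```forth
\ A hand-built "thread": three execution tokens stored one after another.
CREATE THREAD  ' DUP ,  ' * ,  ' . ,

\ Fetch each token in turn and execute it. The DO loop keeps its
\ index on the return stack, so the data stack stays free for the
\ words being executed.
: RUN-THREAD ( i*x addr n -- j*x )
  CELLS OVER + SWAP ?DO
    I @ EXECUTE
  1 CELLS +LOOP ;

7 THREAD 3 RUN-THREAD    \ behaves like  7 DUP * .  and prints 49
```

Note that the loop index plays the role of the threaded instruction pointer, and that it lives on the return stack, just as described above.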

Even though threaded code is slower than compiled machine code, it was much faster than interpreted BASIC, or even the P-code that many Pascal compilers on 8-bit microcomputers compiled to.

The 16 kB IDE

An interactive disk-based FORTH system could work on machines with as little as 16 kB of RAM. Fig-Forth was just over 8 kB in size and required a few kilobytes for disk buffers. It did not run under an operating system; it was the operating system. The traditional FORTH operating system did not use disk files, but numbered disk blocks of 1 kB each. Blocks containing FORTH source code consisted of 16 lines of 64 characters each.

You had a line-oriented editor for such blocks, so you had a complete interactive system within those 16 kilobytes.

Later FORTH systems, such as F83 (by Laxen and Perry), ran under CP/M and used regular CP/M disk files. These disk files, however, consisted of the same fixed-format 1 kB source code blocks as before, and the same type of line editor was used. Still later systems, for example F-PC under MS-DOS, stored source code in traditional text files, which is now by far the most common approach.

You needed maybe 48 kB of RAM to do a full recompilation of the FORTH system. FORTH is one of very few programming language systems that can recompile itself from source code on an 8-bit machine with 48 kB or less.

Most FORTH systems have an assembler, but this is usually a postfix assembler, making the assembler instructions look backwards compared to what you are used to. Using the stack, the assembler can work without a parser in the traditional sense. Each opcode word collects the operands from the stack and stores the bytes of that instruction into memory.
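
A toy version shows the principle. The sketch below defines three Z80 opcode words that take their operand from the stack and store the instruction bytes into memory. The word names (LD-A,# and so on) and the buffer name SNIPPET are invented for this illustration; real FORTH assemblers use their own, more general syntax. The bytes are only stored here, not executed.

```forth
\ Each opcode word lays down the opcode byte followed by its operand.
HEX
: LD-A,#  ( n -- )  3E C, C, ;   \ LD A,n   assembles to 3E nn
: ADD-A,# ( n -- )  C6 C, C, ;   \ ADD A,n  assembles to C6 nn
: RET,    ( -- )    C9 C, ;      \ RET      assembles to C9
DECIMAL

CREATE SNIPPET        \ the bytes are laid down right after this header
  5 LD-A,#            \ operand first, then the opcode word
  3 ADD-A,#
  RET,

SNIPPET 5 DUMP        \ shows the bytes 3E 05 C6 03 C9 (layout varies by system)
```

Because the operand is already on the stack when the opcode word runs, no instruction parsing is needed at all. The operand could just as well be the result of a calculation or of another FORTH word.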

The Jupiter Ace

In 1983, there was a very small Z80-based computer with hardware very similar to the ZX-81: the Jupiter Ace. It had just 3 kB of RAM, of which 1 kB was used only for character bitmaps (you could write them but not read them back) and 768 bytes were allocated to video RAM. So you had just 1 kB of free memory to do useful things. As with the ZX-81, you pretty much needed a 16 kB RAM expansion to do more serious things. FORTH itself was stored in 8 kB of ROM. It did not store source code in the traditional way; instead, the editor would decompile the threaded code of a compiled function so that you could edit it. For this to work, the compiler had to store any comments inside the threaded code as well. Jupiter Ace FORTH crammed an incredible amount of functionality into that 8 kB ROM, including floating-point operations, which were far from standard. The Jupiter Ace is probably the only home computer with FORTH built in.

Early Use and Decline

FORTH was widely used on minicomputers in the 1970s, primarily for embedded control. Minicomputers usually had a 16-bit address space and 64 kB of RAM or less. It was not unusual for these systems to be multi-tasking or even multi-user. Having the compiler on the computer itself made development and debugging easier. If you had written a small function to perform a specific operation, you could just test it interactively from the command line, without any need to write special test programs.

FORTH was very much at home on early microcomputers as well, even though this market was dominated by BASIC. Compiled FORTH was much faster than interpreted BASIC, and the language was much more extensible. As FORTH combined the editor, compiler and program execution in a single program, the turn-around time was usually much shorter than for traditional compiled languages. FORTH was a minimalistic IDE. With FORTH you did not have the long load times of traditional editors, compilers, assemblers and linkers.

FORTH was a pioneering language on many new computer systems: it was often the first programming language that could run on them.

Even though it was never a real mainstream computer language, FORTH did have a large niche market well into the 1990s, especially for embedded applications. Some games and desktop applications were also written in it. On some RISC CPU architectures, such as PowerPC and SPARC, FORTH was the basis for the boot firmware. Every PowerPC Macintosh or Sun SPARC workstation contained FORTH in its ROM.

Even though FORTH is still available for nearly all modern microcontrollers, it is very much a niche language today. Interactive debugging of C programs on microcontrollers has improved greatly over the last few decades. Cross compilers for C and C++ are everywhere and they are freely available. Every new CPU architecture has GCC and LLVM ported to it before the first silicon is available, so FORTH’s role as a pioneering language is largely a thing of the past.
