Macro Systems for Programming

The C preprocessor may be the worst macro system for any programming language. It is complex enough to seriously obfuscate programs, but it is not Turing complete, so you cannot meaningfully implement loops in it. The preprocessor supports conditional compilation using #if and #ifdef, but these conditionals cannot appear in the expansion of a macro. As the preprocessor is a separate pass from the C compiler itself, preprocessor conditionals cannot contain values that are only known to the compiler (and not to the preprocessor), such as the size of a data item, enum values or the value of an object declared ‘const’. So for instance you cannot do something like this:

#if sizeof(struct1) != sizeof(struct2)
#error "struct1 must be the same size as struct2"
#endif

Some tricks exist to do a similar check at compile time, like:

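int unused_array[-(sizeof(struct1) != sizeof(struct2))];

This declares an unused array of size 0 if the sizes are equal, but of size -1 (a compile time error) if the sizes are not equal. Note that a size of 0 is only permitted as a compiler extension (GCC and Clang accept it); strict ISO C rejects it.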
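Since C11 (`_Static_assert`) and C++11 (`static_assert`), the compiler itself can perform this kind of check, so the array trick is rarely needed any more. Below is a minimal sketch in C++. `Struct1` and `Struct2` are hypothetical stand-ins for the two structs in the text, and the typedef shows one portable variant of the array trick that avoids the zero-size array:

```cpp
#include <cstdint>

// Hypothetical stand-ins for struct1 and struct2 from the text.
struct Struct1 { std::uint32_t id; std::uint32_t flags; };
struct Struct2 { std::uint16_t kind; std::uint16_t len; std::uint32_t value; };

// Evaluated by the compiler proper, so sizeof is available here.
static_assert(sizeof(Struct1) == sizeof(Struct2),
              "struct1 must be the same size as struct2");

// Portable variant of the array trick: the array type has size 1 when the
// sizes match and size -1 (a compile-time error) when they do not.
typedef char struct_size_check[(sizeof(Struct1) == sizeof(Struct2)) ? 1 : -1];
```

If the two sizes ever diverge, the build fails with the message given in the `static_assert`, which is much easier to read than a complaint about a negative array size.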

Header files are not real “module interface” files; they are simply textually included by the preprocessor. We need the “include guard” pattern to prevent them from being included more than once, like this:

#ifndef MYHEADER_H_
#define MYHEADER_H_
...

#endif

If you define a macro expression that looks like a function call, you need to put parentheses around any parameters and around the expression itself. If the macro expands to a sequence of statements, you have to wrap it in a do ... while(0) construct. This is annoying, and once in a thousand times you make a mistake and another nasty bug is born.
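A small sketch of both pitfalls may make this concrete. The macro names and the helper function `macro_pitfalls_demo` are hypothetical, chosen only for illustration:

```cpp
#include <cassert>

// Unsafe: neither the parameter nor the whole expression is parenthesised.
#define SQUARE_BAD(x) x * x
// Safe against precedence surprises (but x is still evaluated twice).
#define SQUARE(x) ((x) * (x))

// Unsafe multi-statement macro: only the first statement belongs to an `if`.
#define RESET_BAD(a, b) a = 0; b = 0
// The do ... while (0) wrapper makes the expansion a single statement
// that still requires the usual trailing semicolon.
#define RESET(a, b) do { (a) = 0; (b) = 0; } while (0)

// Hypothetical helper that exercises both pitfalls.
inline void macro_pitfalls_demo() {
    assert(SQUARE_BAD(1 + 2) == 5);  // expands to 1 + 2 * 1 + 2
    assert(SQUARE(1 + 2) == 9);      // expands to ((1 + 2) * (1 + 2))

    int a = 1, b = 2;
    if (false) RESET_BAD(a, b);      // b = 0 runs even though the test is false
    assert(a == 1 && b == 0);

    a = 1; b = 2;
    if (false) RESET(a, b); else b = 3;  // macro is skipped, the else still parses
    assert(a == 1 && b == 3);
}
```

Even the careful versions have limits: `SQUARE(i++)` still evaluates its argument twice, which is one reason an inline function is often the safer choice.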

If macros can’t do what you want, you could use a more powerful macro processor, such as m4, or you could use a dedicated program to generate the C files for you. Think for example of the ‘yacc’ program to generate parsers. Apparently there is no standard way to convert a binary file into a C header file containing an initialised constant array with all the bytes of that file. I have written scripts to do that far too often. If these arrays are large, the memory consumption of the C compiler is usually terrible.
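The C23 standard adds an `#embed` directive for exactly this purpose, but with older compilers a generator is still needed. The following is a minimal sketch of the core of such a generator, written in C++. The function name `bytes_to_c_array` and the layout of 12 bytes per line are assumptions made for this example; reading the input file and writing the header are left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: renders `data` as the text of a C array definition
// named `name`, with 12 bytes per line.
inline std::string bytes_to_c_array(const std::vector<unsigned char>& data,
                                    const std::string& name) {
    assert(!data.empty());  // a zero-length array is not valid ISO C
    std::string out = "const unsigned char " + name + "[" +
                      std::to_string(data.size()) + "] = {";
    for (std::size_t i = 0; i < data.size(); ++i) {
        char buf[8];
        std::snprintf(buf, sizeof buf, "0x%02x,",
                      static_cast<unsigned>(data[i]));
        out += (i % 12 == 0) ? "\n    " : " ";  // start a new line every 12 bytes
        out += buf;
    }
    out += "\n};\n";
    return out;
}
```

The generated text ends each line with a trailing comma, which C allows inside an initialiser list, so no special case is needed for the last byte.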

Generics

In C++ you have templates. These are useful for generating data structures that may contain different data types as their payload. So you can instantiate a hash table that contains integers or one that contains “struct foo” items. In fact the std::map template is generic over two types: one for the keys and one for the stored values. You can declare a map like this:

std::map<std::string, u32> mytable;

The parameters are in angle brackets and they are all handled at compile time. Templates perform some tasks that macros could do, but they are part of the compiler itself, not of some separate preprocessor. In C++, it is pretty easy to make use of pre-defined templates, but it is usually very hard to create your own templates.
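As a minimal sketch of what defining a template looks like in the simple case, here are a function template and a class template. The names `smaller`, `Counter` and `templates_demo` are hypothetical; the difficulty mentioned above tends to show up in more advanced uses than this one:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical function template: works for any type that supports operator<.
template <typename T>
const T& smaller(const T& a, const T& b) {
    return (b < a) ? b : a;
}

// Hypothetical class template, generic over its payload type.
template <typename T>
struct Counter {
    std::map<T, unsigned> counts;

    void add(const T& item) { ++counts[item]; }

    unsigned count(const T& item) const {
        auto it = counts.find(item);
        return it == counts.end() ? 0u : it->second;
    }
};

// Usage: the compiler generates a separate instantiation for each type.
inline void templates_demo() {
    assert(smaller(3, 7) == 3);

    Counter<std::string> words;
    words.add("macro");
    words.add("macro");
    assert(words.count("macro") == 2);
    assert(words.count("template") == 0);
}
```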

Other languages, like Ada, also have generics. The Ada programming language has no macro preprocessor, but it does have powerful generics.

Other Languages

Many programming languages do not have any macro system or preprocessor at all, not even generics. Languages like Pascal, FORTRAN and BASIC fall into this category.

Python has no macros or generics either, but as it is dynamically typed, you do not need generics to implement generic container classes. You can use operators like ‘+’ and ‘<’ in a function and they will automatically just work for any data types that support these operators, making the function a generic one.

Systems like LISP support macros that allow you to build arbitrary LISP programs under program control. This way you can for example add totally new control structures to the language.

Some programming systems have a clear distinction between “compile time” and “execution time”. When the C compiler runs, it translates the C program to assembly language, but it cannot itself run C code. The compiler can run on a system different from where the compiled code is to run. You can run a C compiler on an x86 PC and let it generate code that will run on a Raspberry Pi Pico (ARM Cortex-M0+ core). With interpreted languages, like Python, you can run Python code at load time. When you import a Python module, it will usually add new functions and/or classes to your program, but it could also run code immediately when it loads. It could build a complex data structure when it loads. The Python interpreter can always run Python code.

FORTH has an interesting combination of features. When it compiles new functions (colon definitions), it converts FORTH source code into threaded code, which can later be interpreted. When not compiling new functions, it just runs your FORTH code. In FORTH, you can extend the compiler, so you can define new control structures. This gives you the same flexibility as LISP macros. Extending the compiler is done by marking some FORTH functions IMMEDIATE, so they will execute, even while a new function is being compiled. By default, the compiler would just add the function’s address to the threaded code.

Rust has macros. For example println! is a macro. Using those macros is fairly easy, but defining new ones is a hard job, requiring you to learn much about compiler internals.

Assemblers have had macros for a long time. A macro could expand to a sequence of instructions. Depending on the exact assembler you were using, you could give the macro parameters (for example specifying a register to be used in some of the instructions), the macro expansion could contain conditional assembly or even loops, and it could contain local labels, so that the labels inside different instances of the macro would not conflict with one another.

Zig

Zig is the latest development in macro technology. First of all, Zig is a true compiler. It converts your source code to compiled machine code, to be executed later, possibly on a completely different machine. Therefore it has clearly distinct compile time and run time phases. But even at compile time, the compiler can run Zig code in your source files. Zig code replaces the dedicated macro language that other programming languages may have. The build script is Zig code, and generics are implemented using Zig code. This can work because data types are values that can be parameters and returned results of compile time Zig functions. Zig code can generate arbitrarily complex data tables at compile time.
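For readers who do not know Zig, a rough analogue in C++ may help to picture compile time table generation. This is not Zig code, and C++ cannot pass types around as ordinary values the way Zig does, so it only illustrates the narrower idea of running ordinary code inside the compiler. The names `SquareTable`, `make_square_table` and `kSquares` are hypothetical:

```cpp
#include <cstddef>

// Hypothetical table type; a plain array keeps the code valid in C++14.
struct SquareTable {
    unsigned values[16];
};

// Ordinary-looking C++ code that the compiler can run at compile time.
constexpr SquareTable make_square_table() {
    SquareTable t{};
    for (std::size_t i = 0; i < 16; ++i) {
        t.values[i] = static_cast<unsigned>(i * i);
    }
    return t;
}

// The table is computed by the compiler, not at program start-up.
constexpr SquareTable kSquares = make_square_table();

// Proof that the values exist at compile time: this is checked by the compiler.
static_assert(kSquares.values[5] == 25, "table is built at compile time");
```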

One example of compile time behaviour in Zig: the format strings for formatted printing are parsed at compile time and calls to specific output functions are generated, depending on the desired format and the types of the parameters passed to that function.
