Category: Uncategorized

  • To garbage collect or not to garbage collect

    Many computer programs require more memory as the size of the data increases. For example, a spreadsheet uses more RAM if the sheet contains more columns and rows. If you delete a row or column of the spreadsheet, the RAM used by it should be freed, so it can be reused later if new rows or columns get added to it. If you delete parts of a spreadsheet and the RAM does not get freed, the program will keep using more and more RAM as you delete cells and later add new cells, even if the total size of the spreadsheet does not increase. This is called a memory leak. The longer you use the program, the more RAM it will use, until it requires more RAM than is available and the program crashes.

    Many programs require the capability to dynamically allocate memory. Support in programming language varies. Most versions of BASIC do not allow you to allocate entirely new objects and arrays cannot be resized. However, strings can be of variable length, so you use way more memory if you have long strings in a string array, compared to the situation where all strings are null strings. In Python on the other hand, you can dynamically resize lists and even store new lists inside lists.

    Allocation

    Memory is allocated on what we mostly call the “heap”. A memory address range is reserved for the heap. When a program needs more RAM, it calls an operating system function to allocate a large chunk of memory in that address range. When that memory fills up and still more memory is needed, another system call is made to add more memory to the heap. Within that address range, the runtime library manages the blocks that are free and that are allocated. Free blocks are on a free list. When an allocation function is called, a free block is taken from the free list. When the free block was (much) larger than the amount of memory requested, the block is split. The used part is allocated and the unused part is put back on the free list. When an allocated block is freed, it is put on the free list again. If the freed block is next to another free block, these two blocks are usually merged into a single larger free block.

    Even if we do everything right and free each block as soon as it is no longer in use, the heap may get fragmented. The total amount of free space may be large enough, but the free space may consist of many small blocks with allocated blocks in between.

    Also an allocation may take much longer if a long free list must be traversed before a suitable block is found.

    Some languages, like Zig allow you to specify which allocator you want to use in which situation, so you can have an allocator that is more suitable for a specific application.

    Garbage Collection.

    If you delete an object (for example a list) in Python, the Python interpreter itself takes care to free that memory. Take the following code fragment:

    a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    b = a
    a = None
    b = None

    The first line allocates memory for a list of 10 numbers. The variable a stores a reference to that list. The second line causes variable b to store a reference to the same list. The list is not copied: it’s the same list in the same memory. The third line causes the variable a to no longer refer to the list. But the list is still accessible through variable b, so it cannot be freed. The fourth line causes variable b to no longer refer to that list. Now it becomes unreferenced and it can be freed.

    Earlier versions of Python used a reference count in each object. Line 1 would give the list reference count 1 (only one reference from variable a). Line 2 would increase the reference count to 2, line 3 would decrement it to 1 and line 4 would decrement it to 0, in which case it was time to free the memory. This is comparatively simple, but it would not work in case cyclic references exists. Lists can store references to other list (or even to the same list). Take for example the following Python fragment:

    a = [1, 2, 3, 4, 5]
    b = [10, a]
    a[4] = b
    a = None
    b = None
    

    The first three lines create a pair of lists that contain references to each other. List b has two elements: the number 10 and (a reference to) list a. List a has five elements: the numbers 1 through 4 and a reference to list b. Even when the variables are reassigned, the references inside the two lists remain valid and the reference counts will not decrease to 0. Both lists will be inaccessible though, as there are no other references from outside. Therefore, modern versions have a true garbage collector (in addition to the reference counting), to free up memory in situations like this.

    A garbage collector performs the following job:

    • It starts from all variables that contain references to objects. Those objects are marked.
    • For all marked objects, check which other objects are referenced from them. If they are not already marked, mark them and repeat the operation for the newly marked objects.
    • Free all allocated objects that are not marked.
    • Remove the marking from the marked objects.

    The job of a garbage collector is very complex. It uses much CPU power and while it is marking all dynamic objects that are still reachable, all operations of the program must be paused. This is unacceptable for hard real-time systems. If the program is supposed to react within 5 ms, we cannot accept that the program is frozen for 500 ms during garbage collection. In general you have no control when the garbage collector will kick in and when a large chunk of memory will actually be freed.

    Languages like Python, Lisp, Java and Go have a garbage collector. Of all these, Go is a truly compiled language and its garbage collector is state of the art and aims to reduce the duration of any pauses in the program.

    Manual Free

    C is on the other extreme. See the following code snippet:

    int *a = malloc(10 * sizeof(int));
    int *b = a;
    if (a == NULL) return -1;
    ...
    free(a);
    

    In the first line, we call the malloc function to allocate a chunk of memory for an array of 10 integers. We have to calculate the size manually: multiply 10 (the number of integers we want) by the size of one integer. Of course we have to check that the returned pointer is non-null.

    The variable b points to the exact same array.

    At the end of the program, we free the memory again. If we forget to free, we create a memory leak, if we free too soon, some pointers may still point to it, while the data in that memory range is no longer valid; the runtime system may have reused it for completely different purposes. If we call free(a), we may set a to the NULL pointer to prevent it from being used to access the array, but we may have forgotten about pointer b, which still points to the same memory.

    Manual free leaves room for a lot of bugs that would be prevented by using garbage collection.

    C,Pascal and Zig use manual free. C++ has this too, but it also has managed data types that take care of allocation and freeing internally.

    Rust

    And then of course we have Rust. Rust has no garbage collector, but it severely restricts how pointers (references) may be passed around. A dynamically allocated object always has one “owner”, which will be responsible for freeing it. References to objects can be “borrowed” by other functions, in which case they are temporarily unavailable to the owner. This is all checked by the compiler. The compiler knows when an object can be freed and it takes care to call free when appropriate, so you don’t have to do this explicitly.

    Due to the restrictions that Rust puts on copying of pointers, it is not really possible to have data structures with cyclic references either.

    The advantages of Rust:

    • You don’t have a garbage collector with the runtime overhead and nondeterministic timing.
    • You get memory safety with respect to freed dynamic memory. No use after free or double free and no memory leaks.
    • Rust also makes sure that in a multithreaded environment, only one thread gets access to any given object at the same time (multiple readers are okay, but a writer needs truly exclusive access)..

    Disadvantages of Rust:

    • Some data structures, consisting of nodes linked by pointers, are impossible to create.
    • Semantics of ownership and of the borrow checker are extremely complex. Getting a program compiled at all, can be a frustrating experience for beginners.

  • Why your 3-way hi-fi speakers struggle to sound as good as a portable radio

    I am sometimes amazed how good a portable radio can sound, with just a single speaker and an audio power of just 2 watts. I know, there is not much bass and certainly no deep bass, but the midrange, where all the vocals are, gets reproduced almost perfectly. It’s nice to have super low bass and super high treble, but the midrange is what matters most. That’s where most notes from most instruments are and also the human voices.

    I do own a good pair of stereo speakers and I do enjoy listening to them. But it’s just not always necessary to get the full concert hall experience to enjoy music. So I sometimes listen to portable radios or PC speakers at relatively low volumes.

    Wide-range Drivers

    When you need relatively low power, full-range speakers can sound great. Think about half decent portable radios and Bluetooth speakers. Nearly all headphones use full range drivers, that’s perfectly fine at those low power levels.

    You have simple speakers, that are in portable radios or in television sets and you have specially constructed full-range drivers to get more bass and more treble than normal speakers could output. Some have a tweeter dome and a tiny cone in the centre to get more treble output.

    Intermodulation distortion is a problem in wide range speakers, especially at higher volumes.

    Two-way Speakers

    The next step up from a wide-range driver is a two-way system with a bass-midrange driver and a tweeter. These setups can be simple and sound very good.

    A two-way speaker system can have a very simple crossover filter: a coil in series with the woofer and a capacitor in series with the tweeter. You might not even really need that coil: the woofer could take all the signal and the tweeter could take just the treble. If the crossover point between the woofer and the tweeter is above the speech range (for example at 5 kHz), you can have a very clean midrange. The tweeter is just for adding that extra spark of brilliance to the sound.

    Many European tube radios of the 1950s and early 1960s had a big bass-midrange driver at the front plus two (electrostatic) tweeters on the sides. Amplifier power was often just 2 watts RMS and they sounded great.

    Three-way Speakers

    If you want deep bass or high-volume bass, you will need a dedicated woofer. Together with the midrange driver and the tweeter, you have a three-way system.

    To get good bass, you need to pick a good design for your speaker enclosure. Two widely used solutions:

    • A bass-reflex system (they have a port at the back of the speaker cabinet). These are relatively efficient but they do not have a very flat frequency response. If the system is well designed and the speaker cabinet is large, the can work well. This is the most widely used solution also for two-way systems.
    • A completely sealed off speaker cabinet. The inside of the cabinet is padded with damping material. to prevent any sound inside the cabinet from bouncing around. As the enclosure is completely sealed, pushing the speaker cone inward or outward will cause a pressure difference, severely resisting the movement of the cone. In other words: you need excessive force to push the cone inward or outward, hence you need lots of amplifier power and the acoustic efficiency is very poor.

    Some disadvantages of a three-way system are:

    • Three way systems need a complex cross-over filter that should match the driver units (the actual speakers) and the enclosure design (the cabinet). Special care must be taken that the drivers are in-phase at the crossover points. This can all be done and good speaker brands get their act together, but some cheaper brands might not be so well designed.
    • The crossover filters need some chunky capacitors and coils, especially at the lower crossover point. These parts need to handle some serious power. If there are resistors in the crossover network, they can get quite hot.
    • If bipolar electrolytic capacitors are used in the crossover filter, they may change in value after a few years and go bad after a few decades. Nobody thinks about replacing these caps. People sometimes repair speaker cones or the rubber seals around them, but replacing capacitors?
    • If there is a crossover point in the middle of the speech range, it affects sound quality where it matters most. There may not be an abrupt change in frequency response when the speaker is brand new, but their might be when components wear. It may be better to have crossover points <300 Hz and >3 kHz for this reason.\

    You should probably never buy a speaker set from the catalogue alone, without listening to it. And if you do listen, do not just get carried away by the impressive bass and treble, but most importantly listen to midrange-heavy material, which should include vocals and preferably also speech. If you have a stereo recording of an on-stage dialogue, that should be perfect. If you play an instrument, listen to recordings of an instrument you know.

    If you listen to other types of music, that are very heavy on bass and or treble, your priorities may differ of course.

    Active Speakers

    It helps if you put the power amplifiers inside the speaker cabinets and every single speaker gets its own power amplifier stage. When I talk about active speakers, I talk about a setup like this. Advantages of active speakers are:

    • Each power amplifier can be coupled directly to a single driver unit, without long cables or passive crossover networks, therefore it can exercise good control over the speakers. Philips even had a transducer on the cone of the woofer, driving a closed loop control system to make the woofer behave itself. This was called MFB or Motional Feedback.
    • As it now comes before the power stages, the crossover network can be an active filter with much tighter tolerances than a passive network. Components in the active filter do not have to handle any power.
    • You can limit the drive strength of the tweeter amplifier so it will not burn the tweeter when there is clipping.

    Of course, active speakers are expensive to build and complex to install. You need a power socket for each speaker and you still need cables for your signals and possibly for control. Of course you could go for wireless, but that it its own can of worms. Unless you have physical on-off switches on each separate speaker, they will draw standby current when not in use. When used with a generic preamp, those speakers will typically go into standby when there is silence for some time (for example 10 to 30 minutes) and they will wake up again when they get a strong enough signal. The need to wake them up before the music starts playing, can be annoying.

    Surround Systems

    In the early 2000s, the craze was surround sound. Suddenly everybody wanted to have a surround sound system to use with their DVD player and TV-set, turning the whole thing into a home cinema set.

    I have seen some ultra cheap surround sets that made me laugh and cry at the same time. It came with 6 speakers: one so-called subwoofer that put out as much bass as a portable radio and five tiny speakers that sounded very tinny indeed. Their owners had tossed them all in the same corner, negating any surround effect they might provide. Needless to say: you are better off with a single soundbar or a pair of stereo speakers.

    When done properly, surround sound can sound fantastic though. Good surround amplifiers come with measuring microphones to calibrate the signals to the speakers and adjust them for room acoustics and speaker placement. You will need good speakers in the front and centre positions and you do not want to rely on the subwoofer for all the bass.

    For me it’s just not worth the expense (not just in terms of money but also in terms of speakers you need to place in your living room). SACD is dead now, so there are very few if any music releases on physical media that have surround sound. I seldom watch movies and when I do, a good stereo pair is all I need.

  • What to do with the FM-band after analogue switch-off

    According to current planing, The Netherlands will have analogue FM until 20235. At least the commercial radio stations are licensed until that year. Our national and regional stations, both public and commercial, are already on DAB+ and essentially every such station now on analogue FM is already on DAB+. DAB+ coverage has improved too during the last few years. Is analogue FM useless or will it be useless soon? Not at all.

    Disaster stations.

    The regional public radio stations are designated radio stations to provide essential information during disasters. Older technology is often more reliable. It’s a great idea to have the FM network available during a disaster, just as a backup. For one thing, DAB+ will require GPS to be available to get multiple transmitters synchronised to the nanosecond. The FM transmitters for the regional network transmit on different frequencies and are not required to be synchronised accurately. Plus more people have battery powered FM radios in their emergency kits than DAB+ radios. All DAB+ radios can receive FM, but not all FM radios can receive DAB+.

    Local stations

    DAB+ multiplexes are designed to have 12 to 18 programmes per multiplex. A local station usually carries one programme and, unless we have large cities with dozens of stations, local stations will never fill a multiplex. DAB+ makes no sense for local stations. Just keep them on the FM band.A single studio close to a single transmitting site, that’s what local stations should be. We could have both commercial and public local stations.

    Special purpose stations

    When the FM band gets unused for the most part, we can finally have some room for stations that do not want or need to be on the air all the time, like campus radio, special event stations, churches or concert halls,. The same frequency could be used during school hours by a campus radio station, during the evenings by a concert hall and on Sunday morning by a church.

    Radio Amateurs

    In the early days, radio amateurs were allowed to transmit music on shortwave. This was before the widespread use of SSB, and speech was transmitted in AM mode. Ordinary home radios with shortwave could just receive these transmissions. While it is still legal to transmit AM on shortwave, transmitting music is not. It is technically even legal to transmit in FM stereo on the UHF bands, but also on these bands, music is taboo.

    It would be great if music was allowed again on the amateur bands. You could add restrictions on the duration of music transmissions or let amateurs pay some form of copyright fee if they want to play music. Only when music is transmitted, you can really judge the audio quality of a transmission. If amateurs are on a section of the the FM band, ordinary people would occasionally listen in and they could be drawn to the hobby.

    Unregulated Broadcasters

    The Netherlands is one of the few countries where small stations, even hobbyists, can get a license to transmit on mediumwave (low power AM). You need to obtain several licenses and you need to pay copyright fees for music, but in principle it can be done. The same kind of license could be extended to the FM band, when that gets mostly abandoned by regular broadcasters.

    Of course, radio pirates do not want to be legalised. For one thing, the copyright collecting societies decided long ago, not to go after pirate stations, because they were “criminal organisations” and they did not want to collect money from criminals. So if you are a pirate, you risk a big fine for transmitting illegally, but at least you are safe from the copyright collectors. And it is part of the culture to do things without a license. The fun would be gone if it was legalised.

    It would probably help if the law was rewritten such that pirate stations are no longer considered “criminal organisations”, they just fail to pay some license fees. In that case the copyright collectors can go after them when they play music without paying a fee. Unregulated broadcasters would need to comply with power limits, they would need to use type-approved transmitters (as opposed to radio amateurs who can build their own) and they would have to stay within their section of the FM broadcast band. If they do all of this, they should be considered legal in principle, they just need to pay the required license fees. Criminal law should only be invoked when pirate stations cause real harm.

  • There’s a lot wrong with C, but what are the alternatives?

    C was developed in 1972 by Brian Kernighan and Dennis Ritchie as a systems programming language to implement the Unix operating system. It was based on B, which was in turn based on BCPL, which was in tern based in CPL, a very full-featured programming language. B was the most minimalist of the series: it had only one data type, serving both as integer and as pointer (and as a sequence of characters) C added new data types to it, like separate char, short, int and long, plus pointers. Structures ware added very soon..C became very popular in the 1980s, eventually overtaking Pascal.

    What’s Wrong

    There are a lot of things wrong with C. They have been discussed at nauseam already, so I won’t go deep here. Here are a few of my favourites:

    • Syntactic pitfalls, for example a stray semicolon after if or while, making the next statement separate from the conditional statement, even though it looks like it is controlled by it. There is also fall-trhough in switch statements and the dreaded single ‘=’ in a conditional where ‘==’ was intended. Even if you know to be careful with these, one in a thousand times your attention slips and a new bug is born.
    • Null-terminated strings might have been a good idea in the 1970s, when every byte counted, nowadays they are a bug magnet. It is not easy to find the length of a string, therefore it is not easy to ensure that a string will fit the buffer allocated to it. For example, before uyou call a function like ‘strcat’, you need two calls to ‘strlen’ to determine the lengths of the two inputs. You need to add them and add one for the terminating null byte, before you can compare it to the size of the destination buffer.
    • When an array is passed as a parameter to a function, no information about its length is automatically passed with it. From the point of view of the called function, it is just a pointer. We say that arrays decay into pointers. If you want to pass the array length to a function, it has to be done in a separate parameter. Both the caller and the called function have to do the right thing to make it work,
    • Macros are very error-prone, in particular function-like macros. You always need to surround the parameters with parentheses and put the resulting expression in parentheses as well. The resulting expression has to look like a single (parenthesised) expression or as a do-while statement. C macros are not Turing-complete (which may be a good thing after all) as conditional evaluation cannot occur during expansion of a macro. There’s always m4 if you want this kind of flexibility.
    • Header files are there only by convention, the language itself has no clear idea about module interfaces and modules. We end up using external tools to sort out the dependencies between object files and header files or editing Makefiles manually.

    And we are lucky that in C89 (the first ANSI standard), the full parameter list is part of the function prototype (its declaration in the header). In earlier versions this was not the case. Your header only specified the name and the return type of a function. For example, a function that took one integer as a a parameter, could be called with two floats and a pointer as parameters instead. The compiler had no way of knowing that the function was called in the wrong way, as it only looked at one source file at a time. A separate program called ‘lint’ was there to find inconsistencies like that.

    What’s Good

    There are a lot of good things in C:

    • C lets you do low level things, hat standard Pascal does not allow. C lets you cast any integer to a pointer and then access memory via that pointer. C lets you do pointer arithmetic. C lets you do bitwise ‘and’ and ‘or’, which is not standard in Pascal. C lets you implement a function like ‘malloc’ in C itself.
    • C does not restrict you to use arrays of all the same size if you want to pass them as parameters to a function.
    • As opposed to early standards of Pascal, C lets you compile parts of your program separately and it lets you reference functions in other parts via header files. It’s lower level than true modules, but it can be done.
    • Any C function that you can use, can be implemented in the language itself. Compare that to Pascal, where you cannot implement functions like ‘read’ and ‘write’ as they take a variable number of parameters. On top of that, write has the special formatting syntax using the colon character. The C standard library is clearly separate from the language itself and it can be completely implemented in C.
    • C is low level, so it does not require an extensive runtime library. C on an embedded system, only requires a small initialisation routine. You can run C programs (without most of the standard library functions) on a bare metal system with no operating system.
    • C control structures are flexible, compared to Pascal. You can terminate functions early with ‘return’ and terminate loops early with ‘break’. This helps avoid excessive nesting, stupid additional Booleans and ‘goto’ statements.
    • It is usually easy to inspect the machine code generated by the compiler and compare it with the C source code.
    • A lot of libraries are written in C.
    • C compilers are available for every platform under the sun.

    C Alternatives

    Many alternatives exist to C.

    First the big languages:

    • C++ is a superset of C. It has objects and inheritance, it has operator overloading and exceptions. Modern C++ adds smart pointers (that have a single owner at any given time, like in Rust) and it has convenient data types, defined by STL, that ‘just work’. It is a highly complex language. And because it is a superset of C, the dirty pitfalls are still there. You can still do manual allocation with ‘malloc’. Because of that, it can be harder to know what’s the right thing to do for any given data type. C++ does not have a garbage collector. If you leaned C++ in the 1990s, you will be amazed of what has been added to the language during the past decades.
    • Go does have a garbage collector and it also has parallelism built-in. It was originally developed by Google. See https://go.dev, t is the ideal language for multi-threaded servers. Go aims to be a safe language, where simple mistakes cannot lead to memory corruption.
    • Rust on the other hand avoids the garbage collector, but instead it uses an ownership model for each dynamically allocated object. At any given time, one piece of the program owns the object. References can be borrowed by other functions. Rust is not a truly object-oriented language with inheritance, but most of the benefits can be had with interfaces, that are called ‘traits’ in Rust. There are no exceptions, but the language helps you to handle error returns at each function call level. See https://rust-lang.org. Like Go, Rust aims to be a safe language.
    • D. is an extremely feature-rich language (including an optional garbage collector). It has many high-level constructs, but as opposed to Python, it is still statically typed and still truly compiled. And it is mostly C-like syntax. See https://dlang.org

    There are also smaller languages that want to stay closer to the true spirit of C. They want to fix some of the flaws of C, without introducing highly complex features like garbage collection, multiple inheritance, exceptions or parallel execution.. Some of these are single-developer projects, therefore they have no large communities around them. Some developers are very firm about features that will never be part of their languages, like inheritance, operator overloading, exceptions or macros. These languages do tend to have explicit allocation, a ‘defer’ statement (specify that something must be done whenever leaving a scope) and array slices. Some of these languages are:

    • Zig has no macros, but it has compile-time execution instead. Learn one language and program the build system, generics and everything else. Memory allocators are explicit, error handling is explicit and integer overflow is checked by default. It can directly include C headers to call C functions. See https://ziglang.org
    • Odin is another language at roughly the same level. Like in GO, there is no ‘while’ keyword and the ‘for’ loop allows you to just specify the terminating condition, so it behaves exactly like ‘while’ in C. Odin does not have any methods and a limited form of polymorphism. Map types (hash tables) are part of the language itself. See https:://odin-lang.org
    • C3 has operator overloading and it has a macro system that closely matches the desired use cases. It has explicit error handling using an ‘Optional’ type. It supports contracts (assertion checking). C3 has a very C-like syntax, but it has capitalisation rules to distinguish type names, constant names and other names (this to simplify parsing). See https://c3lang.org

    None of these smaller languages are going to displace C in the near future. Of the bigger languages mentioned, C++ is extremely widespread for large applications and Rust is taking over some of the code in the Linux kernel and system utilities, that were originally programmed in C.

  • What was wrong with Algol?

    When I started my study at the Eindhoven University of Technology, most computing was done on a Burroughs B7900 mainframe, that used Burroughs Extended Algol as its system programming language. We were taught Pascal though, but older students still wrote programs in Algol, that had fewer restrictions than Pascal and had a complex data type (complex numbers) too.

    Algol-60

    Algol started originally in 1958, mostly as a language to publish algorithms in (for example in scientific papers). Algol-60 became the version that was most popular and that everyone thinks of when you refer to Algol.

    Algol had block structures for IF-THEN-ELSE and FOR-loops (but no simple while loop) and it had procedures with local variables and parameters. These procedures could be recursive, as opposed to the ones in FORTRAN.

    Algol-60 had a few limitations though:

    • In had only three data types: INTEGER, REAL and BOOLEAN, plus arrays of these. There was no character data type and there were no records or pointers. There was something as a string data type, but the only thing you could do was pass literal strings around to a function that would print them. There was no way to manipulate character strings.
    • It defined no standard I/O functions, so this was not portable in any way.
    • Like Pascal, it offered no modularity. A program was a standalone entity. But the order in which you declared variables and procedures was less rigid than that of Pascal.
    • The set of control structures was very limited, keeping the evil GOTO statement necessary.
    • But the biggest drawback was the call-by-name convention. In most languages, such as Pascal, you either pass parameters to a procedure by value (it is an input parameter to the procedure) or by reference (the procedure is allowed to modify whatever variable you pass to it). In Algol-60 you had the choice between pass by value and pass by name. Pass by name meant that the expression passed as an actual parameter, had to be re-evaluated each time it was used inside the procedure. For simple variables this made no difference, but for an array element, the expression that specified the index could depend on variables that were modified inside the called procedure. This was inefficient to implement, requiring mini-subroutines for each of the pass-by-name parameters. These had to be provided by the caller, so the called procedure could call them back. It was also very hard to analyse programs that used it in a nontriviial way. You could do really clever things with it though. Call by name was more of an unintended ramification of a definition than the desired behaviour of the language.

    Burroughs Extended Algol was a full-fledged system programming language with all the data types you would want. The mainframe operating system was written in it and the language was very much extended compared to Algol-60. And as far as I know, they left out pass by name and implemented pass by reference instead.

    Algol-68

    Algol-68 was very much different from Algol-60. You cannot meaningfully consider these mere versions of the same language.

    For one thing, it ditched call-by-name and replaced it with pass-by-reference. It added many more data types:

    • Characters and strings.
    • structs and unions. The keywords struct and union were carried over to the C language, along with the keyword void, meaning no value.
    • References (which were pointers). There was dynamic allocation on the heap too.
    • Complex numbers
    • Flexible arrays.
    • semaphores, to be used with parallel statements.

    It had a very versatile slicing syntax for arrays and (unlike Python), it also supported multi-dimensional arrays.

    It had versatile control structures, including one for parallel execution, an extensive standard I/O library and special syntax for formatted I/O. And you could also overload operators, define entirely new operators and specify the priority of each operator. C++, Ada and Python also have operator overloading, but none of them allows you to change operator priorities.

    But Algol-68 still had no modules and separate compilation. A program was still a single source text.

    The syntax and semantics of Algol-68 were complex and very few implementations existed at the time. Full implementations were even rarer. At the university we had books about the language, but no working compiler. We only got an open-source implementation for Linux in 2005 with Algol-68 Genie https://jmvdveer.home.xs4all.nl/en.algol-68-genie.html

    Some features of the language were hard to implement. The language required garbage collection for objects allocated on the heap and parallel execution was its own can of worms.

    The biggest stumbling block, however, was the grammar of the language. Other languages are specified by context-free grammars, in particular in the Backus-Naur Form (https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form). These grammars are easy to comprehend and there are tools to help create parsers for them.

    The drawback of context-free grammars is that they do not accurately specify what programs are legal. The context-free grammar specifies fore example that an expression can contain an identifier (variable name) consisting of letters and digits, starting with a letter, but it does not specify that a variable with that name must be declared earlier in the program.

    The grammar of Algol-68 on the other hand, does specify exactly which programs are legal. The grammar contains two levels: at one level you specify what production rules you can create and at another level you specify what programs you can create using those production rules you just created. A context-free grammar has just one static set of production rules to create a program. Algol-68 has a customised set of production rules for every set of variables and procedures you declare.

    Algol-68 is not particularly hard to comprehend per se, nor is it particularly hard to parse (compared with other complex languages like C++ and Ada), but comprehending the specification and using the two-level grammar to base a parser on, that is very hard indeed.

    That stupid octal dump program in Unix, that’s the reason why in Bourne shell, the “do” loops are terminated with “done” instead of “od”. The “if”-statements are terminated with “fi” and “case” statements with “esac”. those keywords come directly from the Algol-68 control structures. “od” terminates loops in Algol-68, but that would conflict with the name of the octal dump program, so it was not practical to use that as a keyword in the shell.