Tag: Computing

  • There’s a lot wrong with C, but what are the alternatives?

    C was developed in 1972 by Dennis Ritchie as a systems programming language to implement the Unix operating system. It was based on B, which was in turn based on BCPL, which was in turn based on CPL, a very full-featured programming language. B was the most minimalist of the series: it had only one data type, serving both as integer and as pointer (and as a sequence of characters). C added new data types, like separate char, short, int and long, plus pointers. Structures were added very soon. C became very popular in the 1980s, eventually overtaking Pascal.

    What’s Wrong

    There are a lot of things wrong with C. They have been discussed ad nauseam already, so I won’t go into depth here. Here are a few of my favourites:

    • Syntactic pitfalls, for example a stray semicolon after if or while, which makes the next statement separate from the conditional statement, even though it looks like it is controlled by it. There is also fall-through in switch statements and the dreaded single ‘=’ in a conditional where ‘==’ was intended. Even if you know to be careful with these, one time in a thousand your attention slips and a new bug is born.
    • Null-terminated strings might have been a good idea in the 1970s, when every byte counted; nowadays they are a bug magnet. It is not easy to find the length of a string, therefore it is not easy to ensure that a string will fit the buffer allocated to it. For example, before you call a function like ‘strcat’, you need two calls to ‘strlen’ to determine the lengths of the two inputs. You need to add them, plus one for the terminating null byte, before you can compare the total to the size of the destination buffer.
    • When an array is passed as a parameter to a function, no information about its length is automatically passed with it. From the point of view of the called function, it is just a pointer. We say that arrays decay into pointers. If you want to pass the array length to a function, it has to be done in a separate parameter. Both the caller and the called function have to do the right thing to make it work.
    • Macros are very error-prone, in particular function-like macros. You always need to surround the parameters with parentheses and put the resulting expression in parentheses as well. The expansion has to look like a single (parenthesised) expression or like a do-while statement. C macros are not Turing-complete (which may be a good thing after all), as conditional evaluation cannot occur during expansion of a macro. There’s always m4 if you want that kind of flexibility.
    • Header files are there only by convention, the language itself has no clear idea about module interfaces and modules. We end up using external tools to sort out the dependencies between object files and header files or editing Makefiles manually.

    And we are lucky that in C89 (the first ANSI standard), the full parameter list is part of the function prototype (its declaration in the header). In earlier versions this was not the case: your header only specified the name and the return type of a function. For example, a function that took one integer as a parameter could be called with two floats and a pointer as parameters instead. The compiler had no way of knowing that the function was called in the wrong way, as it only looked at one source file at a time. A separate program called ‘lint’ was there to find inconsistencies like that.

    What’s Good

    There are a lot of good things in C:

    • C lets you do low-level things that standard Pascal does not allow. C lets you cast any integer to a pointer and then access memory via that pointer. C lets you do pointer arithmetic. C lets you do bitwise ‘and’ and ‘or’, which is not standard in Pascal. C lets you implement a function like ‘malloc’ in C itself.
    • C does not restrict you to use arrays of all the same size if you want to pass them as parameters to a function.
    • As opposed to early standards of Pascal, C lets you compile parts of your program separately and it lets you reference functions in other parts via header files. It’s lower level than true modules, but it can be done.
    • Any C function that you can use, can be implemented in the language itself. Compare that to Pascal, where you cannot implement functions like ‘read’ and ‘write’ as they take a variable number of parameters. On top of that, write has the special formatting syntax using the colon character. The C standard library is clearly separate from the language itself and it can be completely implemented in C.
    • C is low level, so it does not require an extensive runtime library. C on an embedded system only requires a small initialisation routine. You can run C programs (without most of the standard library functions) on a bare-metal system with no operating system.
    • C control structures are flexible, compared to Pascal. You can terminate functions early with ‘return’ and terminate loops early with ‘break’. This helps avoid excessive nesting, stupid additional Booleans and ‘goto’ statements.
    • It is usually easy to inspect the machine code generated by the compiler and compare it with the C source code.
    • A lot of libraries are written in C.
    • C compilers are available for every platform under the sun.

    C Alternatives

    Many alternatives exist to C.

    First the big languages:

    • C++ is a superset of C. It has objects and inheritance, it has operator overloading and exceptions. Modern C++ adds smart pointers (that have a single owner at any given time, like in Rust) and it has convenient data types, defined by the STL, that ‘just work’. It is a highly complex language. And because it is a superset of C, the dirty pitfalls are still there. You can still do manual allocation with ‘malloc’. Because of that, it can be harder to know what the right thing to do is for any given data type. C++ does not have a garbage collector. If you learned C++ in the 1990s, you will be amazed at what has been added to the language during the past decades.
    • Go does have a garbage collector and it also has parallelism built in. It was originally developed by Google. See https://go.dev. It is an ideal language for multi-threaded servers. Go aims to be a safe language, where simple mistakes cannot lead to memory corruption.
    • Rust on the other hand avoids the garbage collector; instead it uses an ownership model for each dynamically allocated object. At any given time, one piece of the program owns the object. References can be borrowed by other functions. Rust is not a truly object-oriented language with inheritance, but most of the benefits can be had with interfaces, called ‘traits’ in Rust. There are no exceptions, but the language helps you handle error returns at each function call level. See https://rust-lang.org. Like Go, Rust aims to be a safe language.
    • D is an extremely feature-rich language (including an optional garbage collector). It has many high-level constructs, but as opposed to Python, it is still statically typed and still truly compiled. And its syntax is mostly C-like. See https://dlang.org

    There are also smaller languages that want to stay closer to the true spirit of C. They want to fix some of the flaws of C, without introducing highly complex features like garbage collection, multiple inheritance, exceptions or parallel execution. Some of these are single-developer projects, so they have no large communities around them. Some developers are very firm about features that will never be part of their languages, like inheritance, operator overloading, exceptions or macros. These languages do tend to have explicit allocation, a ‘defer’ statement (specifying that something must be done whenever leaving a scope) and array slices. Some of these languages are:

    • Zig has no macros, but it has compile-time execution instead. You learn one language and use it to program the build system, generics and everything else. Memory allocators are explicit, error handling is explicit and integer overflow is checked by default. Zig can directly include C headers to call C functions. See https://ziglang.org
    • Odin is another language at roughly the same level. Like in Go, there is no ‘while’ keyword and the ‘for’ loop allows you to specify just the terminating condition, so it behaves exactly like ‘while’ in C. Odin has no methods and only a limited form of polymorphism. Map types (hash tables) are part of the language itself. See https://odin-lang.org
    • C3 has operator overloading and it has a macro system that closely matches the desired use cases. It has explicit error handling using an ‘Optional’ type. It supports contracts (assertion checking). C3 has a very C-like syntax, but it has capitalisation rules to distinguish type names, constant names and other names (this to simplify parsing). See https://c3lang.org

    None of these smaller languages are going to displace C in the near future. Of the bigger languages mentioned, C++ is extremely widespread for large applications, and Rust is taking over some of the code in the Linux kernel and system utilities that were originally programmed in C.

  • What was wrong with Algol?

    When I started my studies at the Eindhoven University of Technology, most computing was done on a Burroughs B7900 mainframe, which used Burroughs Extended Algol as its system programming language. We were taught Pascal, but older students still wrote programs in Algol, which had fewer restrictions than Pascal and had a complex data type (complex numbers) too.

    Algol-60

    Algol started originally in 1958, mostly as a language to publish algorithms in (for example in scientific papers). Algol-60 became the version that was most popular and that everyone thinks of when you refer to Algol.

    Algol had block structures for IF-THEN-ELSE and FOR-loops (but no simple while loop) and it had procedures with local variables and parameters. These procedures could be recursive, as opposed to the ones in FORTRAN.

    Algol-60 had a few limitations though:

    • It had only three data types: INTEGER, REAL and BOOLEAN, plus arrays of these. There was no character data type and there were no records or pointers. There was something like a string data type, but the only thing you could do was pass literal strings around to a function that would print them. There was no way to manipulate character strings.
    • It defined no standard I/O functions, so this was not portable in any way.
    • Like Pascal, it offered no modularity. A program was a standalone entity. But the order in which you declared variables and procedures was less rigid than that of Pascal.
    • The set of control structures was very limited, keeping the evil GOTO statement necessary.
    • But the biggest drawback was the call-by-name convention. In most languages, such as Pascal, you either pass parameters to a procedure by value (it is an input parameter to the procedure) or by reference (the procedure is allowed to modify whatever variable you pass to it). In Algol-60 you had the choice between pass by value and pass by name. Pass by name meant that the expression passed as an actual parameter had to be re-evaluated each time it was used inside the procedure. For simple variables this made no difference, but for an array element, the expression that specified the index could depend on variables that were modified inside the called procedure. This was inefficient to implement, requiring mini-subroutines for each of the pass-by-name parameters. These had to be provided by the caller, so the called procedure could call them back. It was also very hard to analyse programs that used it in a nontrivial way. You could do really clever things with it, though. Call by name was more of an unintended ramification of a definition than the desired behaviour of the language.

    Burroughs Extended Algol was a full-fledged system programming language with all the data types you would want. The mainframe operating system was written in it and the language was very much extended compared to Algol-60. And as far as I know, they left out pass by name and implemented pass by reference instead.

    Algol-68

    Algol-68 was very much different from Algol-60. You cannot meaningfully consider these mere versions of the same language.

    For one thing, it ditched call-by-name and replaced it with pass-by-reference. It added many more data types:

    • Characters and strings.
    • Structs and unions. The keywords struct and union were carried over to the C language, along with the keyword void, meaning no value.
    • References (which were pointers). There was dynamic allocation on the heap too.
    • Complex numbers.
    • Flexible arrays.
    • Semaphores, to be used with parallel statements.

    It had a very versatile slicing syntax for arrays and (unlike Python), it also supported multi-dimensional arrays.

    It had versatile control structures, including one for parallel execution, an extensive standard I/O library and special syntax for formatted I/O. And you could also overload operators, define entirely new operators and specify the priority of each operator. C++, Ada and Python also have operator overloading, but none of them allows you to change operator priorities.

    But Algol-68 still had no modules and separate compilation. A program was still a single source text.

    The syntax and semantics of Algol-68 were complex and very few implementations existed at the time. Full implementations were even rarer. At the university we had books about the language, but no working compiler. We only got an open-source implementation for Linux in 2005, with Algol-68 Genie (https://jmvdveer.home.xs4all.nl/en.algol-68-genie.html).

    Some features of the language were hard to implement. The language required garbage collection for objects allocated on the heap and parallel execution was its own can of worms.

    The biggest stumbling block, however, was the grammar of the language. Most other languages are specified by context-free grammars, in particular in Backus-Naur Form (https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form). These grammars are easy to comprehend and there are tools to help create parsers for them.

    The drawback of context-free grammars is that they do not accurately specify which programs are legal. A context-free grammar specifies, for example, that an expression can contain an identifier (a variable name) consisting of letters and digits, starting with a letter, but it does not specify that a variable with that name must be declared earlier in the program.

    The grammar of Algol-68, on the other hand, does specify exactly which programs are legal. The grammar contains two levels (it is a so-called van Wijngaarden grammar): at one level you specify what production rules you can create and at another level you specify what programs you can create using those production rules you just created. A context-free grammar has just one static set of production rules to create a program. Algol-68 has a customised set of production rules for every set of variables and procedures you declare.

    Algol-68 is not particularly hard to comprehend per se, nor is it particularly hard to parse (compared with other complex languages like C++ and Ada), but comprehending the specification and basing a parser on the two-level grammar, that is very hard indeed.

    The Unix octal dump program ‘od’ is the reason why, in the Bourne shell, “do” loops are terminated with “done” instead of “od”. The “if” statements are terminated with “fi” and “case” statements with “esac”. Those keywords come directly from the Algol-68 control structures: “od” terminates loops in Algol-68, but that would conflict with the name of the octal dump program, so it was not practical to use it as a keyword in the shell.

  • What was wrong with Pascal?

    Pascal was developed in 1970 by Niklaus Wirth. It was primarily an educational programming language that taught students structured programming and data structures.

    Pascal did have some limitations in its standard form that made it less suitable for some real-world applications. Later languages like Modula-2, Oberon and Ada were better for writing larger real-world programs, but they never had the success that Pascal once had.

    Pascal’s limitations were already pointed out by Brian Kernighan in a classic paper, “Why Pascal Is Not My Favorite Programming Language” (https://www.cs.virginia.edu/~evans/cs655/readings/bwk-on-pascal.html). More practical implementations of Pascal added features that mitigated those limitations, but these features were not standard. However, the situation was not half as bad as with BASIC, where even the basic syntax of procedure definitions differed wildly among dialects.

    Pascal on Microcomputers

    One of the first implementations of Pascal on microcomputers was UCSD Pascal. It was popular on the old Apple II and on many other machines. As it used P-code (comparable to Java byte code) instead of native compilation, it was not the fastest Pascal system on the planet. It did not implement the full standard, but it did add useful features, like separately compiled modules, usable file access and a usable string type.

    Turbo Pascal came out in 1983, both for CP/M and for MS-DOS. It combined an editor and a compiler in one large program, from which you could edit, compile and run your programs: a very basic IDE. The compiler was extremely fast for the time and it compiled to native machine code. Turbo Pascal had usable file access and a usable string type too, plus some more useful features, like bitwise operations. Modules (units) came in version 4 of Turbo Pascal.

    Pascal was very popular in the 1980s and much of the early Apple Macintosh software was written in Pascal. In the 1990s, Turbo Pascal evolved into Delphi, adding C++ style object-oriented programming and GUI development.

    But What Was Wrong with it?

    I should not repeat what Brian Kernighan already stated in his 1981 paper, but to sum a few things up (not all of them are mentioned in that paper):

    • No default clause in the CASE statement; if the selector matched none of the alternatives, the behaviour was undefined according to the standard. Most if not all implementations did add an ELSE or OTHERWISE clause to CASE.
    • An array’s length is part of its type, and if you want to implement a set of functions acting on arrays, each function can only act on arrays of one size. For example, a function to multiply matrices could not take matrices of different sizes. A later version of the Pascal standard added conformant array parameters, but few if any Pascal compilers implemented them.
    • What’s true for arrays in general, is also true for strings. In standard Pascal, a string array type has a predefined length and it is impossible to create functions that work on strings of any length. In many cases the programmer had to fill out string literals with spaces to give them all the same length. UCSD Pascal and Turbo Pascal did add a usable string type.
    • Only a limited number of operator precedence levels, so the AND and OR operators had a higher precedence than relational operators like ‘<=’ or ‘=’.
    • Variables of the main program had to be declared far away from the main program itself, with all procedures and functions in between.
    • No usable file access in standard Pascal. For example, a program could not specify the name of a file to open.
    • A program was a stand-alone entity. Pascal was not unique in this; Algol-68, for example, had the same flaw. UCSD Pascal and later versions of Turbo Pascal did add modules (which were called units).
    • Pascal has no exception mechanism. Certain errors simply cause the program to abort. Turbo Pascal came with a mini-spreadsheet program. The program checked against division by zero, but when it got a floating point overflow because you multiplied two huge numbers, it was game over for the program and you lost your work.
    • You could not write procedures or functions that took a variable number of parameters or parameters of different types. Yet the built-in procedures read and write could take parameters of different type and a variable number of parameters. You could not write your own procedures that had the same flexibility as the built-in ones.
    • And one of my pet peeves. Identifiers in Pascal consist of letters and digits only. No spaces or underscores are allowed. While in German it is normal to write composite words together, in English it is not. Therefore they ended up putting uppercase letters in the middle of names to mark the start of the constituent words, leading to the ugly camelCase convention. For centuries, words had only one capital letter at the start (if they were not written in all uppercase). Now this camelCase convention has proliferated into words and brand names like eBook and iPhone. I really prefer the snake_case convention, but Pascal does not support it. As a saving grace, Pascal is case-insensitive.

    And the Good Things

    Of course Pascal had some good things as well:

    • In 1557, the Welsh mathematician Robert Recorde invented the equals sign. Remember, the symbol ‘=’ means equals, not assignment. Pascal uses ‘:=’ for assignment and ‘=’ for equals, like Modula-2 and Ada, but unlike C, Java, Python and almost anything else introduced after 1990. The C syntax has led to countless bugs in C programs because ‘=’ was written instead of ‘==’ in a conditional expression.
    • The syntax is generally easier to read than that of C. It also has fewer pitfalls.
    • The use of GOTO was discouraged, as it should. Structured programming was the norm in Pascal.
    • Pascal had recursive procedures and functions, as any language should, while FORTRAN didn’t.
    • A true Boolean type (not in C until C99), real enumerated types (not fancy names for integer constants like in C) and a SET data type.
    • The string data type in Pascal (those dialects that had it) was a lot more memory-safe than C strings. A string variable recorded its maximum size and the actual string length of the currently stored string. Any function that altered a string variable, checked the maximum string size.
    • As opposed to C, Pascal procedures were aware of the sizes of any arrays passed to them, making them more memory-safe. Arrays ‘decaying’ into pointers when passed as a parameter was really a bad idea.
    • Compilers were comparatively fast and they required little memory (compared to compilers of other languages).

  • What was wrong with BASIC?

    The BASIC programming language was introduced in 1964 at Dartmouth College by John Kemeny and Thomas Kurtz. It ran on one of the first time-sharing systems, where students could sit behind a teletype, begin typing their programs and start running them immediately. In those days it was far more common that you had to type your program on punched cards, hand it over to the computer centre and get your printout back after a few hours.

    BASIC was easy to learn and it was useful to university students who were not computer specialists. It supported many mathematical functions and early on, it supported matrix operations, a feature that got lost when BASIC was later ported to microcomputers. And what else got lost too?

    No Structured Programming

    The earliest versions of Dartmouth BASIC did not support structured programming, other than the FOR..NEXT loop. BASIC used line numbers for two different purposes:

    • To support editing of programs. Terminals were teletypes that printed everything you typed on paper. There was no computer screen where you could move the cursor around and edit code anywhere in your program. The internal editor kept the program ordered by line number. If you wanted to insert a line into a program, you simply entered a line with an intermediate line number (therefore you started your program with line numbers that incremented by 10, so there were intermediate numbers left). Entering a line with an existing number replaced that line, and entering a line number by itself deleted that line.
    • As labels for GOTO and GOSUB.

    At least in FORTRAN, you had labels only at lines to which you wanted to jump. In BASIC, every line could be jumped to. This may be somewhat easier to learn, but it made programs harder to maintain.

    The first version of FORTRAN had no subroutines at all, but all versions after that had named subroutines, with local variables and named parameters. There was nothing of that in early BASIC: you had GOSUB to a line number. In assembly language you could at least give your subroutines meaningful names. This was not the case with early versions of BASIC (including early versions for microcomputers), which made BASIC feel like an even lower-level language than assembly in that respect. If you renumbered your program to make the line numbers less haphazard, your subroutines would change line numbers.

    The very earliest BASIC could only specify line numbers in an IF statement, for example: IF A<B THEN 30 ELSE 50

    All versions of Microsoft BASIC I know, did allow you to execute one or more statements conditionally, so you did not need line numbers and implied GOTO statements for all IF statements, for example IF A<B THEN PRINT "TOO LOW": C=C+1

    Not all versions of Microsoft BASIC had ELSE though. And ANSI Minimal BASIC only supported the variant with line numbers, as did TI BASIC.

    Too Much Variation

    Microsoft was the main developer of BASIC implementations for microcomputers. But Sinclair, Acorn and TI had their own implementations of BASIC. There was very little that the BASIC versions had in common. Some only supported variable names of a single letter (Sinclair, for anything other than numeric scalar variables), others allowed names of arbitrary length. Most had string variables that allowed strings of variable length, but some versions of BASIC had all strings in a string array the same length (Sinclair).

    Even within Microsoft itself, there was a wild difference between the commands and functions supported. For example, some BASIC versions supported user-defined functions, but only with numeric values and a single numeric parameter. Others supported both string and numeric functions with an arbitrary number of parameters of any type. Some BASIC versions did not have user-defined functions at all. In some versions of Microsoft BASIC you had long variable names, with all characters significant, but in other versions, only the first two characters of a variable name counted. So the variables DEBT and DECLARED would be the same. And if you dared to use a variable name like MORTGAGE, the parser would get confused and tokenise the letters OR inside it into a single byte (the OR operator). This happened only in some versions of Microsoft BASIC (for example MSX BASIC), but not in others (for example GW-BASIC on MS-DOS). On some computer systems, the disk BASIC was not even compatible with the cassette BASIC on the same machine.

    Acorn introduced BBC BASIC in 1981. It had structured loops like REPEAT..UNTIL, it had named procedures and named multi-line functions, while it was still largely compatible with the then current versions of Microsoft BASIC. Later versions of BBC BASIC added a full block-structured IF-THEN-ELSE-ENDIF, a proper WHILE loop and more. Procedures and functions did have parameters and local variables.

    Unfortunately, BBC BASIC completely ignored the ANSI standard for full BASIC. Microsoft did not. When they introduced QuickBASIC and later QBasic, they had named subroutines (with local variables and parameters) and the usual block-structured constructs, but they were completely different from those of BBC BASIC.

    You really can’t call two versions of BASIC the same language if the basic constructs for named subroutines or procedures are so different. You might as well call Ada and Pascal the same language.

    Specialised Syntax

    In BASIC, each command tends to have its own unique syntax. For example, the PRINT statement uses commas and semicolons between parameters to determine how the output will be formatted. A numeric parameter starting with # specifies a file to print to. Many BASIC versions have a USING parameter to allow formatted output. We have funny specialised syntaxes in Microsoft BASIC for OPEN (admittedly quite readable), like: OPEN "name" FOR INPUT AS #1

    And we have funny little syntaxes for line drawing, like LINE (100,100)-(200,150)

    User-defined subroutines always have to be called with CALL, like CALL MYSUB(10,20)

    They are not allowed to look like normal BASIC commands and they can certainly not have the same wild syntax variations as built-in commands.

    Pascal commits the same sin to a lesser degree. Pascal is very rigid in not allowing procedures with a variable number of parameters or parameters of different types. But the built-in procedures read and write can have a variable number of parameters and parameters of many different types. Add to this the weird formatting syntax like write(a:12:7); to specify that the number A must be printed with 12 characters and 7 digits after the decimal point. But other than the colon formatting syntax, calls to read and write look like ordinary procedure calls.

    C, on the other hand, has no specialised syntax for printf or fopen. These are just ordinary functions. The printf function does have a variable number of arguments, and it is somewhat of a challenge to implement such a function, but it can be done. Details of the I/O library are not baked into the syntax of the language itself.

    And the Good Things?

    Of course, BASIC did have some good things too:

    • Most versions of BASIC supported variable length strings and quite a few functions to manipulate strings. Standard Pascal lacked a variable length string type and the language went out of its way to make you suffer. You ended up counting out spaces to make all messages the same length, for example if they had to be passed as parameters. Turbo Pascal did have a usable string type, but with extensions like this, there is no standardisation among implementations.
    • Data embedded in the program. READ, DATA and RESTORE may look clumsy compared to initialised C arrays, but standard Pascal did not have a way to embed data in the program itself. You had to read it from file or you had to type endless lines of assignment statements to initialise the elements of an array.
    • Low memory usage. This was key in early microcomputers. A simple Microsoft BASIC fit into 8 kilobytes of ROM and programs were stored in RAM in such a way that every keyword occupied one byte. There was no separate storage of source code and parsed byte code. The LIST command showed you the program as you had typed it, with all byte tokens converted back to full keywords and all binary representations of line numbers printed in decimal. Interpretation was usually rather slow, but it got done with very little memory.
    • Floating point support and a full set of mathematical functions. Apart from some very stripped-down integer-only BASIC versions, you had full trigonometric and logarithmic functions.
  • Why the Dutch keyboard layout never caught on.

    The Netherlands is one of the few countries in Europe that uses the US-QWERTY keyboard layout for computers. Before there were personal computers, we had typewriters and nearly all typewriters sold in The Netherlands had two dead keys: one dead key contained the acute and grave accents and the other contained the diaeresis and circumflex accents. When you press a dead key on a typewriter, the accent is printed, but the carriage does not advance. The next letter you type gets printed under the accent you just typed. The positions of many symbols on the keyboard were not standardised: different brands of typewriters could have the question mark in a different location. The letters always followed the QWERTY layout. Some did have a special key for the letter ij, on others you had to type the letters I and J separately.

    Typewriters in the UK and the USA typically didn’t have dead keys. If you really, really wanted to type an accented letter, you could type the apostrophe, then backspace and then the letter. The same could be done with the double quote to get something that resembled a letter with an umlaut or diaeresis. In The Netherlands we used to type a comma, then backspace and then the letter C to get a C-cedilla (ç). Typewriters in Germany had dedicated keys for Ä, Ö, Ü and ß. Typewriters in Sweden had Ä, Ö and Å; in Norway and Denmark they had Æ, Ø and Å. Most of them had dead keys for accented letters too.

    France and Belgium had the weird AZERTY layout. French typewriters typically had a single dead key for circumflex and diaeresis; the few letters with acute and grave accents had their own keys, as did c-cedilla, but all only in lowercase.

    When personal computers were introduced, nearly all European countries standardised on a keyboard layout that was similar to the layout of the local typewriters, except for The Netherlands, which used the US layout, without any support for accented letters. In Dutch, accented letters are infrequent, but they are still important. The IBM-PC allowed you to type arbitrary symbols by holding down the Alt key and then typing the numeric code of the character on the numeric keypad. For example, the letter ë could be had by typing Alt-1-3-7. Oddly enough, this still works in modern Windows, even though it no longer uses that character encoding at all. Yes, the old CP-437 codes are translated to whatever Unicode character is applicable.
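    That translation is easy to check for yourself, since Python ships with a cp437 codec. A quick sketch:

```python
# Alt-1-3-7 produced byte 137 in the IBM PC's original CP-437 character
# set; decoding it shows the Unicode character it corresponds to.
ch = bytes([137]).decode("cp437")
print(ch)                  # ë
print(f"U+{ord(ch):04X}")  # U+00EB
```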

    National keyboard layouts other than US-QWERTY all have one additional key left of the leftmost letter on the lower row. On QWERTY, this means left of the Z key, but on QWERTZ it’s left of the Y and on AZERTY it’s left of the W. In most national layouts, this key has the less-than and greater-than symbols. As there is an extra key in that position, the left shift key is much smaller than on the US keyboard. Some US keyboards (ISO layout) have this additional key too, even though it is redundant for US-QWERTY. Many typists prefer the true ANSI layout with the wider shift key and the horizontal Enter key over the ISO layout with the narrow shift key and the vertical Enter key.

    National keyboard layouts other than US-QWERTY also use the right Alt key (marked AltGr) to access additional symbols. The ASCII characters @, [, ], { and } often require the AltGr key. Most national keyboard layouts use the keys right of the letters for national letters (like Ä, Æ or Ñ) and/or dead keys, leaving fewer keys available for symbols. Only the UK keyboard layout has no national letters or dead keys and is very similar to US-QWERTY. It swaps the @ and double quote, it puts £ instead of # above the 3, and it puts the # on the key normally used for the \ symbol, which moves to the additional key left of the Z.

    Programmers prefer the straight US-QWERTY layout over any national keyboard layout, because of the easy access to square brackets and curly braces, which are used frequently in C and similar programming languages.

    Interestingly enough, the Netherlands does have a national keyboard layout. See https://en.wikipedia.org/wiki/List_of_QWERTY_keyboard_language_variants under Dutch. This keyboard has dead keys, like the old typewriters, and shuffles the symbols around quite a bit. The square brackets end up on the key left of the Z. These proper Dutch keyboards are very rare. They may have been used by government institutions and Dutch publishers, but as 99% of the users have US-QWERTY at home, they are used to it and want to use it for work too.

    The reason why the Dutch layout never caught on may be a combination of programmer preference and cost awareness. US keyboards are produced in larger volumes, so they tend to be somewhat cheaper. Schools may have selected US-QWERTY because it is more practical for programming.

    In Belgium, even in the Dutch speaking part, AZERTY is still the norm. Some programmers do use QWERTY keyboards, but that’s always a special order.

    At least since the 1990s, Windows lets you configure the US-QWERTY keyboard as US International with dead keys. This changes the following:

    • The apostrophe and double quote key becomes a dead key for the acute accent and diaeresis. The same goes for the caret (circumflex on shift-6) and the grave and tilde key.
    • The right Alt key gives access to many additional symbols and accented letters. Unfortunately for Dutch users, the letters ë and ï are not accessible this way; they require the dead double quote key instead.
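    Conceptually, a dead key combines an accent with the next letter, which is exactly how Unicode combining marks work. A small sketch using Python’s standard unicodedata module:

```python
import unicodedata

# Like typing the dead " key followed by e: a base letter plus
# U+0308 COMBINING DIAERESIS, normalised to the precomposed ë.
sequence = "e\u0308"
composed = unicodedata.normalize("NFC", sequence)
print(composed)              # ë
print(composed == "\u00eb")  # True
```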

    The US International layout with dead keys is considered the Dutch keyboard layout, but this is not the same as the real Dutch keyboard layout. The big disadvantage is that you need to type an additional space whenever you type an apostrophe or double quote (or caret, left quote or tilde). This is super annoying for programmers, but it can also get in your way when typing plain text.

    Linux distributions come with another option: US International with AltGr dead keys. It differs from US international with dead keys in the following way:

    • The dead keys only become dead when you type them with AltGr (the right Alt key). When you type the apostrophe key normally, you just get the apostrophe. It only becomes the dead acute accent key when typed with AltGr.
    • The set of symbols accessible with AltGr (without dead keys) is changed somewhat. The important Dutch accented letters ë and ï are now in.

    But there are other layouts based on US-QWERTY as well; see for example https://altgr-weur.eu/

    In Linux you can also configure a Compose key. You can use the right Control key, the right “Windows” key or the Menu key for that purpose. The disadvantage is that it requires three keystrokes to get a composed character. For example, you type Compose, followed by /, followed by o to get ø. The advantage is that you get access to many more symbols than with just dead keys or AltGr combinations, and that these symbols are mostly logical and easy-to-remember combinations of ASCII characters.
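    At its core, the Compose mechanism is just a lookup table from short ASCII sequences to symbols. A minimal sketch (the pairs below are illustrative examples; the real X11 Compose file defines many more):

```python
# Tiny illustrative Compose table: two keystrokes map to one symbol.
compose = {
    ("/", "o"): "ø",   # the example from the text
    (",", "c"): "ç",   # like the old typewriter comma-backspace-c trick
    ('"', "e"): "ë",
}
print(compose[("/", "o")])  # ø
```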

    Finally, Linux allows you to type any arbitrary Unicode character by typing first Ctrl-Shift-u, then the hex code of the desired character and finally Enter. For example Ctrl-Shift-u, then the hex digits e and b (ë is U+00EB), then Enter gives you ë.
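    The hex code typed after Ctrl-Shift-u is simply the Unicode code point, so the example can be verified with one line of Python:

```python
# 'eb' names code point U+00EB, LATIN SMALL LETTER E WITH DIAERESIS.
print(chr(int("eb", 16)))  # ë
```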