Tag: Standards

  • When outside the USA, never use the American date format

    Too often I see websites or applications default to the American date format, like 3/21/2026 for March 21, 2026. Even the BIOS/UEFI setup screens in most PCs default to this format. Interestingly enough, BIOS setup screens in the 1990s used month names in letters, like Mar 21, 2026. So they moved to a worse date format.

    The problem is that for day numbers of 12 or less, the American date format is ambiguous. Most countries outside the USA write dates day first, like this: 21-3-2026. Now suppose it’s March 10, 2026 instead. If we write 3-10-2026 (or 3/10/2026), do we mean March 10 or October 3? Using slashes instead of dashes does not solve the problem either. Some European countries do write 10/3/2026 for March 10, 2026, which is indistinguishable from the American notation for October 3, 2026. If a document contains only dates with day numbers of 12 or less, it is unclear what the intended dates are. This is unacceptable.
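
    To make the ambiguity concrete, here is a minimal Python sketch. The helper name readings is made up for this illustration; it simply tries both the month-first and the day-first interpretation and collects every date that parses.

```python
from datetime import date, datetime

def readings(text: str) -> set[date]:
    """Return every date a slash-separated numeric string could mean.

    Hypothetical helper: tries month-first (American) and day-first
    (European) and keeps whatever parses.
    """
    found = set()
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            found.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # this interpretation is impossible, e.g. month 21
    return found

print(sorted(readings("3/10/2026")))  # two different dates: ambiguous
print(sorted(readings("3/21/2026")))  # only one reading survives
```

    Any string where both numbers are 12 or less and differ from each other yields two candidate dates, which is exactly the problem.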

    There is a good way to write dates in all-numeric form that is not ambiguous. This is the ISO 8601 format, which is YYYY-MM-DD. The date March 21, 2026 is now written as 2026-03-21. This is exactly the reverse of the customary European format, but it is unambiguous, provided the year is always written with 4 digits, which the ISO standard requires.
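
    As a small illustration, Python's standard library reads and writes this format directly. The sample dates below are arbitrary; the sorting at the end is a side benefit not mentioned above: because the most significant part comes first, plain text sorting puts ISO dates in chronological order.

```python
from datetime import date

d = date(2026, 3, 21)

iso = d.isoformat()              # always YYYY-MM-DD with a 4-digit year
print(iso)                       # 2026-03-21

back = date.fromisoformat(iso)   # parsing needs no guess about field order
print(back == d)                 # True

# Arbitrary sample dates: sorting the strings also sorts the dates.
stamps = ["2026-10-03", "2026-03-21", "2025-12-31"]
print(sorted(stamps))
```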

    Further, the USA is about the only country that still uses 12-hour notation for all-numeric times, for example in formal timetables. Most European countries use 12-hour times in spoken language and informal settings, for example: “Let’s have dinner at quarter past six”. But in an official schedule we will write 18:15 in 24-hour notation. Digital clocks should show times in 24-hour notation when not inside the USA. And we should use 24-hour notation in formal settings. This way we do not have to say AM or PM when ambiguity could arise: for example when there could be meetings both at 08:00 and at 20:00. In 24-hour notation the hour past midnight runs from 00:00 to 00:59. Midnight itself can be denoted both as 24:00 (the very last moment of the old day) and 00:00 (the very first moment of the new day). In 12-hour notation, 12AM and 12PM often get confused. One hour after 11AM is 12PM and not 12AM. 24-hour notation does not have that confusion.
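
    The 12AM/12PM trap is easy to see when the conversion is spelled out. This is only a sketch; the function name to_24_hour and its argument layout are invented for the example.

```python
def to_24_hour(hour12: int, minute: int, period: str) -> str:
    """Convert a 12-hour clock time to HH:MM in 24-hour notation.

    Hypothetical helper for illustration; period is "AM" or "PM".
    """
    period = period.upper()
    if not 1 <= hour12 <= 12 or not 0 <= minute <= 59:
        raise ValueError("hour must be 1..12 and minute 0..59")
    if period not in ("AM", "PM"):
        raise ValueError("period must be AM or PM")
    hour24 = hour12 % 12      # the 12 o'clock hour counts as hour 0
    if period == "PM":
        hour24 += 12          # afternoon and evening shift by 12 hours
    return f"{hour24:02d}:{minute:02d}"

print(to_24_hour(12, 0, "AM"))   # 00:00, midnight
print(to_24_hour(12, 0, "PM"))   # 12:00, noon
print(to_24_hour(6, 15, "PM"))   # 18:15
```

    The special case sits entirely in the 12 o'clock hour, which is where people get it wrong.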

    If an event crosses multiple time zones, times should be specified in UTC. But that concept seems to be alien to most American companies, who specify online meeting times in a lot of time zones, but not the one you are in. UTC seems to be well understood in the telecommunications world, but not in aviation.
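
    A rough sketch of why a single UTC time is enough: every participant can derive their own local time from it. The meeting time and the three fixed offsets below are arbitrary assumptions for the example; real time zones also have daylight saving rules, which this deliberately ignores.

```python
from datetime import datetime, timedelta, timezone

# Assumed example: one meeting, announced once, in UTC.
meeting_utc = datetime(2026, 3, 21, 15, 0, tzinfo=timezone.utc)

def local_time(moment: datetime, offset_hours: int) -> str:
    """Show a UTC moment on the clock of a fixed UTC offset (illustrative)."""
    zone = timezone(timedelta(hours=offset_hours))
    return moment.astimezone(zone).strftime("%Y-%m-%d %H:%M")

for offset in (-5, 1, 9):
    print(f"UTC{offset:+d}: {local_time(meeting_utc, offset)}")
```

    Note that at UTC+9 the meeting falls on the next calendar day, which is one more reason to write the date in ISO format next to the time.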

    English is a language that is widely used in Europe in international settings. I mostly write in English and I prefer to have my desktop operating system in English. There really should be an international English locale that defaults to internationally accepted standards like ISO date and time notation, A4 paper size, metric units and temperatures in degrees Celsius. The UK locale comes close, so I select that one when installing an operating system. Of course I need to correct my keyboard layout and time zone manually after that.

  • Ç, ü, é, the result of one crazy brainstorm session became a world standard

    Ever wonder where the original IBM PC character set came from and how it was designed? This character set is known as CP-437 and there is a very informative Wikipedia article about it: https://en.wikipedia.org/wiki/Code_page_437 At one time this character set was synonymous with Extended ASCII and plaintext files were encoded in it. Even on modern Windows, which uses Unicode, you can still type ALT codes with the numeric keypad and you get the characters that originally had this code in CP-437. For example, ALT-1-2-8 gives you capital C with cedilla. Millions of people still have muscle memory for at least some of these codes.

    There are a few sane things about it:

    • Codes 0x00..0x1F are fun symbols, such as smiley faces, card symbols, musical notes etc.
    • Code points 0x20..0x7E are ASCII. This may be a small miracle in its own right, given that it came from IBM, the inventors of EBCDIC.
    • Code 0x7F is the house symbol or capital delta; nobody knows exactly what it’s meant to be.
    • Codes 0x80..0xAF are accented letters for western European languages, plus other typographic symbols.
    • Codes 0xB0..0xDF are graphics characters, mostly for box drawing.
    • Codes 0xE0..0xFF are maths symbols and some selection of Greek letters.

    This looks logical enough. The Monochrome Display Adapter (one of the video cards you could have in the original IBM PC) supported no graphics, just those characters. Box drawing characters let you design fancy text-based user interfaces and the fun symbols came in handy for games. And a serious computer could use some serious maths symbols.

    There was support in the MDA hardware to extend the characters in the range 0xC0..0xDF into the ninth pixel column. All other characters had this column just blank (the character ROM was only 8 bits wide), but some of the characters needed to form continuous lines, hence the rightmost bit from the ROM was repeated in the next pixel. This was a very clever hack indeed.
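
    The rule is simple enough to mimic in a few lines. This is a software sketch of the behaviour described above, not a model of the actual circuitry; the function name expand_row and the assumption that bit 7 of a ROM byte is the leftmost pixel are mine.

```python
def expand_row(code: int, rom_row: int) -> str:
    """Render one 8-bit ROM row of a character as 9 pixels.

    Assumption for this sketch: bit 7 is the leftmost pixel, bit 0 the
    rightmost. '#' is a lit pixel, '.' a dark one.
    """
    bits = [(rom_row >> (7 - i)) & 1 for i in range(8)]
    if 0xC0 <= code <= 0xDF:
        ninth = bits[7]   # repeat the rightmost ROM pixel in column 9
    else:
        ninth = 0         # every other character gets a blank column 9
    return "".join("#" if b else "." for b in bits + [ninth])

print(expand_row(0xC4, 0xFF))  # horizontal line: continues into column 9
print(expand_row(0x41, 0xFF))  # letter range: column 9 stays blank
```

    With the ninth column copied, a row of horizontal line characters joins up without gaps between the character cells.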

    Apparently the character set was based on the character set of Wang word processors, so some of the quirks may have been inherited from that.

    I have always wondered why the maths characters include the less-than-or-equal and greater-than-or-equal symbols, but not the not-equal symbol. And why is the section sign § (fairly common in Germany) in the range of fun symbols? This makes no sense.

    The selection of international letters was poorly picked. Why do we have Å and Æ as used in Danish and Norwegian, but not Ø? Indeed the Greek lowercase letter Phi (code 0xED) was sometimes used as small letter ø and in many fonts it looked sufficiently like it. The German sharp S (ß) looks a lot like a Greek lowercase beta and indeed it shares the character code 0xE1 with it. Portuguese A and O with tilde are not included, while some fairly unusual currency symbols are there. Many accented letters exist only in lowercase version, but that’s to be expected when you also need code space for box drawing characters and maths symbols. But all in all, the Apple Macintosh character set looked a lot more reasonable in terms of character selection.

    Which brings us to the order of the international characters in the code page. Why on earth does this range start with Ç, ü, é? This makes no sense at all! This must be the result of a crazy brainstorm session, in which characters were haphazardly added. Later during the session, somebody discovered that some characters had been forgotten and that some that were already included had to be kicked out. Nobody had a list of characters that were essential for each language and which languages had to be included. Even worse, they were too lazy to reorder the set after the session, so that the characters that were eventually selected would at least be in a somewhat logical order.
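
    You can poke at the code page yourself, because Python ships a cp437 codec. The helper in_cp437 is invented for this sketch. One caveat: Python's codec maps 0x00..0x1F to control characters rather than the fun symbols, so the example stays in the upper half.

```python
# The international range really does open with these three.
print(bytes([0x80, 0x81, 0x82]).decode("cp437"))   # Çüé

# ALT-1-2-8 is decimal 128, which is 0x80.
print(bytes([128]).decode("cp437"))                # Ç

# The codec treats 0xE1 as sharp s and 0xED as lowercase phi.
print(bytes([0xE1, 0xED]).decode("cp437"))         # ßφ

def in_cp437(ch: str) -> bool:
    """True if the character has a code in CP-437 (hypothetical helper)."""
    try:
        ch.encode("cp437")
        return True
    except UnicodeEncodeError:
        return False

for ch in "ÅÆØ≤≥≠":
    print(ch, in_cp437(ch))
```

    The loop confirms the gaps complained about above: Å, Æ, ≤ and ≥ are present, while Ø and ≠ are not.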

    At least this character set is more reasonable than the one selected in 1985 for the Acorn BBC Master. They did include the full Greek alphabet, both uppercase and lowercase, but they left out some accented letters that were important for French and Dutch, like the i with diaeresis (ï). Fortunately this never caught on and the Acorn Archimedes switched to ISO Latin-1 (with some extensions).

  • Stop calling a mole 6160.5 particles!

    I see it way too often, even in serious newspapers: copying numbers that were originally in scientific notation, but without the superscript. For example: one mole is 6.022 × 1023 particles. This is just plain wrong! It hurts to see it! The number of particles in a mole is 6.022 × 10²³. The superscript 23 is the exponent of 10: 10²³ is a 1 followed by 23 zeros, or one hundred thousand times one billion times one billion. The number in question is therefore 602,200,000,000,000,000,000,000. Or more exactly: 602,214,076,000,000,000,000,000, as the number of particles in a mole is now exactly defined as this integer.

    Equally silly is writing down 1.5×10-5 where you really mean 1.5 × 10⁻⁵. Read literally, the former evaluates to 10 (15 minus 5), but you intend to write 0.000015.

    If you are restricted to ASCII and don’t have access to superscripts, please use the E notation for floating point numbers in programming languages: for example 6.022E23 or 1.5E-5. Or use the ^ notation like 6.022*10^23 or 1.5*10^-5.
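
    A quick Python check shows how far apart the misprinted and the intended numbers are. The variable names are just for this sketch; the E notation literals are the same ones suggested above.

```python
import math

# What the misprints literally say, evaluated as ordinary arithmetic.
misprint_mole = 6.022 * 1023
misprint_small = 1.5 * 10 - 5

print(round(misprint_mole, 1))   # 6160.5
print(misprint_small)            # 10.0

# What was meant, written in E notation and in power notation.
mole = 6.022e23
small = 1.5e-5
print(math.isclose(mole, 6.022 * 10**23))   # True
print(math.isclose(small, 1.5 * 10**-5))    # True

# The exact defined value, as an integer with thousands separators.
print(f"{602214076 * 10**15:,}")
```

    The last line prints 602,214,076,000,000,000,000,000, the exact integer quoted earlier.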

    And finally: if you explain a quantity in terms of football fields, Olympic swimming pools or the distance from London to Paris, always review that calculation or have it reviewed by someone else. I’ve seen it being off by a few orders of magnitude far too often.