Author: admin

  • The 8-bit byte

    Today a byte is always a unit of exactly eight bits. Eight-bit units (bytes, octets) have long been a key part of the TCP/IP protocols, file format specifications and other standards. Disk files have their sizes specified in bytes, not in bits or any other multiple of bits. Pretty much every general-purpose computer architecture invented since the 1970s addresses memory in 8-bit bytes. If you add one to an address, you get to the next byte in memory, not to the next full word. For the next full word, you have to add 4 or 8 to the address, depending on whether you are on a 32-bit or 64-bit system. In the past, we also had 16-bit systems, where you add 2 to an address for the next word. Because 8-bit bytes are assumed in so many standards, file formats and protocols, they are deeply ingrained in our culture and they are here to stay for the foreseeable future.
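
    To make this concrete, here is a minimal C sketch (assuming a typical modern system with 4-byte int); it is only an illustration, not anything taken from a standard:

        #include <stdio.h>

        int main(void)
        {
            int words[2] = {1, 2};
            char *bytes = (char *)words;   /* byte address of the first word */

            /* Adding 1 to a byte address moves one byte ahead; the next
               full word is sizeof(int) = 4 bytes further on. */
            printf("next byte: %p\n", (void *)(bytes + 1));
            printf("next word: %p\n", (void *)(bytes + sizeof(int)));
            return 0;
        }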

    In French, an 8-bit unit is called an “octet”, and this term is also used in some official standards documents. For all practical purposes, byte and octet are synonyms.

    Bytes have not always been 8 bits though. Most mainframes of the 1950s and 1960s had word sizes of 36 bits, but there were also machines with different word sizes, such as 40, 48 or 60 bits. Memory addresses selected full words, so to get to the next full word in memory, you always had to add exactly 1 to the address. Each word contained one number. Text data was stored as a fixed number of characters in each word. The size of a character was often 6 bits. This let you have 64 different characters: 26 (uppercase) letters, 10 decimal digits and a bunch of other symbols. A single 36-bit word could hold six characters. Instruction sets often contained instructions that helped you compose 36-bit words from single characters and extract single characters from 36-bit words.
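
    A rough sketch in C of that style of packing (using a 64-bit integer to stand in for the 36-bit word, since no common C type is 36 bits wide; the function names are just made up for illustration):

        #include <stdint.h>

        /* Pack six 6-bit character codes into one 36-bit "word". */
        static uint64_t pack6(const unsigned char c[6])
        {
            uint64_t word = 0;
            for (int i = 0; i < 6; i++)
                word = (word << 6) | (c[i] & 077);   /* keep 6 bits per character */
            return word;
        }

        /* Extract character number i (0 = leftmost) from such a word. */
        static unsigned extract6(uint64_t word, int i)
        {
            return (word >> (6 * (5 - i))) & 077;
        }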

    Not all computers were word-addressable. For example, the IBM 1401 addressed memory as single characters. Numbers consisted of a variable number of characters, each of which represented one decimal digit. These machines accessed memory one character at a time, so they were slow, just like the early 8-bit microcomputers. But they were suitable for business applications.

    In 1964, IBM decided to define one instruction set architecture for all its computer systems. This became System/360. The word size was 32 bits, the character size was 8 bits and memory was addressed in units of 8-bit bytes. This is exactly the same way modern 32-bit machines address memory. For the characters, IBM defined an 8-bit character set called EBCDIC. To make the hardware efficient, it is important that the number of bytes in a single machine word is a power of two. If it were not a power of two, you would have to perform a division to obtain the word address from a byte address. With a power of two, you can just ignore the last few bits of the address (and use those bits to select one byte within a memory word when doing byte access).
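
    A small sketch of why the power of two matters, assuming 4 bytes per word (just an illustration, not any particular machine's logic):

        #include <stdint.h>

        /* With 4 bytes per word (a power of two), the word address is just
           the byte address with the bottom two bits masked off, and those
           same two bits select the byte within the word. */
        uint32_t word_address(uint32_t byte_address) { return byte_address & ~3u; }
        uint32_t byte_in_word(uint32_t byte_address) { return byte_address & 3u; }

        /* With, say, 6 bytes per word you would need an actual division:
           word = byte_address / 6;  offset = byte_address % 6;  */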

    As the number of bits in a single byte is also a power of two, the number of bits in a whole machine word is a power of two too. This makes it very efficient to implement bitmaps. Bitmaps can implement sets (like in the Pascal programming language), they can represent monochrome graphics and they can keep track of free blocks in memory or on disk. Now that we have these benefits, nobody will ever move away from power-of-two word sizes.
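
    As an example of how cheap bitmap operations are with power-of-two word sizes, here is a minimal C sketch of a free-block bitmap (the names and sizes are made up for illustration):

        #include <stdint.h>

        #define NBLOCKS 4096
        static uint32_t bitmap[NBLOCKS / 32];   /* one bit per disk block */

        /* Because the word size is a power of two, the divisions and
           remainders below compile down to plain shifts and masks. */
        static void mark_used(int b) { bitmap[b / 32] |=  (1u << (b % 32)); }
        static void mark_free(int b) { bitmap[b / 32] &= ~(1u << (b % 32)); }
        static int  is_used(int b)   { return (bitmap[b / 32] >> (b % 32)) & 1; }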

    In 1970, DEC introduced the PDP-11, a very influential machine that had 8-bit bytes and 16-bit words. As this was one of the main machines that Unix was developed on, it helped Unix standardise on 8-bit bytes. The first microprocessors were 4-bit, but 8-bit microprocessors followed soon after. This also helped popularise the 8-bit byte.

    As octal numbers represent 3 bits per digit and 8 is not a multiple of 3, a byte is not a whole number of octal digits. If you write a 16-bit number in octal, the two constituent bytes have different octal digits from the 16-bit word as a whole. For example, the number 27125 is 064765 in octal, but if you break the number into two bytes, they become 0151 and 0365. This is a major pain in the butt. In hexadecimal the number is 0x69f5 and the separate bytes are 0x69 and 0xf5. This is why hexadecimal is vastly more popular than octal today: each hexadecimal digit represents 4 bits and 8 is a multiple of 4 (see the small C program after the list below). IBM knew this from the start and went all out on hexadecimal with System/360, but at DEC they were not so smart and they specified everything in octal. Granted, the PDP-11 instruction set contained many 3-bit fields, and those came out nicely when the 16-bit instruction words were written in octal. The Unix and C legacy still contains octal numbers in many places:

    • A leading zero on an integer literal in C denotes an octal number: 030 means 24, not 30.
    • Octal escapes in string literals in C, such as \033 for the escape character.
    • The mode parameter of the chmod command is traditionally written in octal.
    • The od (octal dump) command still displays octal by default.
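
    Coming back to the 27125 example above, a small C program makes the contrast between octal and hexadecimal visible:

        #include <stdio.h>

        int main(void)
        {
            unsigned n = 27125;
            /* In octal, the digits of the word and of its two bytes have
               nothing to do with each other; in hex, the bytes are simply
               the first two and the last two digits. */
            printf("%06o = %04o %04o\n", n, (n >> 8) & 0377, n & 0377);
            printf("%#06x = %#04x %#04x\n", n, (n >> 8) & 0xff, n & 0xff);
            return 0;
        }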

  • When outside the USA, never use the American date format

    I see it too often that websites or applications default to the American date format, like 3/21/2026 for March 21, 2026. Even the BIOS/UEFI setup screens in most PCs default to this format. Interestingly enough, BIOS setup screens in the 1990s used month names in letters, like Mar 21, 2026. So they actually moved to a worse date format.

    The problem is that for day numbers of 12 or less, the American date format is ambiguous. Most countries outside the USA write dates like this: 21-3-2026. Now suppose it’s March 10, 2026 instead. If we write 3-10-2026 (or 3/10/2026), do you mean March 10 or October 3? Switching between slashes and dashes does not solve the problem either. Some European countries do write 10/3/2026 for March 10, 2026, which is indistinguishable from the American notation for October 3, 2026. If a document contains only dates with day numbers of 12 or less, it is unclear what the intended dates are. This is unacceptable.

    There is a good way to write dates in all-numeric form that is not ambiguous: the ISO format, which is YYYY-MM-DD. The date March 21, 2026 is then written as 2026-03-21. This is exactly the reverse of the customary European format, but it is unambiguous, provided the year is always written with 4 digits, which the ISO standard requires.
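
    If you produce dates in software, the ISO form is also the easiest to generate; a minimal C sketch using the standard strftime function:

        #include <stdio.h>
        #include <time.h>

        int main(void)
        {
            char buf[11];                     /* "YYYY-MM-DD" plus the terminating NUL */
            time_t now = time(NULL);
            /* %Y-%m-%d gives the unambiguous ISO form, e.g. 2026-03-21 */
            strftime(buf, sizeof buf, "%Y-%m-%d", localtime(&now));
            puts(buf);
            return 0;
        }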

    Further, the USA is about the only country that still uses 12-hour notation for all-numeric times, for example in formal timetables. Most European countries use 12-hour times in spoken language and informal settings, for example: “Let’s have dinner at quarter past six”. But in an official schedule we write 18:15 in 24-hour notation. Digital clocks should show times in 24-hour notation when not inside the USA. And we should use 24-hour notation in formal settings. This way we do not have to say AM or PM when ambiguity could arise: for example when there could be meetings both at 08:00 and at 20:00. In 24-hour notation the hour past midnight runs from 00:00 to 00:59. Midnight itself can be denoted both as 24:00 (the very last moment of the old day) and 00:00 (the very first moment of the new day). In 12-hour notation, 12 AM and 12 PM often get confused: one hour after 11 AM is 12 PM, not 12 AM. 24-hour notation does not have that confusion.

    If an event crosses multiple time zones, times should be specified in UTC. But that concept seems to be alien to most American companies, who specify online meeting times in a lot of time zones, but not the one you are in. UTC is well understood in the telecommunications and aviation worlds, but apparently not in corporate meeting invitations.

    English is a language that is widely used in Europe in international settings. I mostly write in English, and I prefer to have my desktop operating system in English. There really should be an international English locale that defaults to internationally accepted standards like ISO date and time notation, A4 paper size, metric units and temperatures in degrees Celsius. The UK locale comes close, so I select that one when installing an operating system. Of course I then need to correct my keyboard layout and time zone manually.

  • Which sounds better: vinyl or CD?

    The short answer is: this will probably never be determined by the inherent limitations of either medium. There are dozens of factors that influence the sound quality you get from a recording. These factors include your own hearing, room acoustics, the quality of your speakers, the quality of your equipment, the quality of the mastering job, the quality of the recording and the quality of the performance.

    Frequency Response

    CDs have an absolute upper frequency limit of 22.05 kHz, half the 44.1 kHz sampling rate. For all practical purposes the frequency limit is 20 kHz, because the reconstruction filter needs some room to roll off. The frequency response up to that frequency can be nearly flat though.

    Vinyl, on the other hand, has no absolute frequency limit, and frequencies up to 30 kHz or even higher can be reproduced. In the past, there was even a quadraphonic system that used frequencies up to 45 kHz to modulate difference signals between the front and back channels. This required special cartridges for playback and direct metal mastering, but it could be done.

    What about your own ears? Current scientific consensus is that nobody can hear frequencies higher than 20 kHz and adults certainly can’t. People cannot distinguish signals that contain harmonics above 20 kHz from signals that don’t. Therefore CDs have a frequency response that is adequate for human hearing.

    Note that FM stereo is limited to 15 kHz, and many people consider it superior to digital radio standards. So apparently 15 kHz is good enough for all practical purposes.

    Dynamic Range

    CDs have a resolution of 16 bits. In terms of signal-to-noise ratio, this is about 96 dB. You have a hard time getting more than 70 dB out of vinyl. So even if you consider the hard upper limit of the CD dynamic range worse than the way vinyl behaves at its maximum signal level, and even if you consider quantisation noise worse than analogue noise, CD is still not worse than vinyl.
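
    The 96 dB figure follows directly from the word length: every extra bit doubles the number of quantisation levels, which adds about 6 dB. A quick back-of-the-envelope check in C:

        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            /* 16 bits give 2^16 = 65536 levels; expressed in dB that is
               20 * log10(65536) ~ 96.3 dB, roughly 6.02 dB per bit. */
            printf("%.1f dB\n", 20.0 * log10(65536.0));
            return 0;
        }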

    What about human hearing? Here the dynamic range is about 120 dB, more than can be had from either CD or vinyl. But even if a medium could support 120 dB of dynamic range, it would be totally impractical for any real-world listening. A realistic dynamic range for a symphony orchestra would be 60 dB, and commercial recordings are never mastered to come even close to that.

    Mastering

    There are a lot of CDs that have extreme compression and therefore sound really bad. This is by no means an inherent limitation of the medium, but a priority set by recording engineers. They want to win the loudness wars and prioritise this over sound quality. Compressed recordings are perceived as louder and therefore they stand out when played on the radio.

    Vinyl records are usually not compressed this way, and therefore they sound better than CDs that are. But there are also a lot of vinyl records around that sound a lot worse than they could have sounded if they had been properly mastered.

    It is also rather hard to find vinyl records and CDs that carry the identical recording and mix, so that you can really compare the sound quality of the two media. The same is true for normal CDs on one hand and Super Audio CDs on the other: they do not contain the same mix to begin with, so of course they sound different and listeners prefer one over the other.

    Tweaking

    There are people who simply enjoy music, even when it is heard on less-than-ideal equipment, and there are audiophiles, who constantly tweak their equipment and buy super expensive cables on a never-ending quest to improve sound quality. Record players offer far more potential for tweaking than CD players, which is why audiophiles tend to like them more.

  • Ç, ü, é: the result of one crazy brainstorm session became a world standard

    Ever wonder where the original IBM PC character set came from and how it was designed? This character set is known as CP-437 and there is a very informative Wikipedia article about it: https://en.wikipedia.org/wiki/Code_page_437. At one time this character set was synonymous with Extended ASCII and plain-text files were encoded in it. Even on modern Windows, which uses Unicode, you can still type ALT codes on the numeric keypad and you get the characters that originally had those codes in CP-437; for example ALT-1-2-8 gives you capital C with cedilla (Ç). Millions of people still have muscle memory for at least some of these codes.

    There are a few things sane about it:

    • Codes 0x00..0x1F are fun symbols, such as smiley faces, card symbols, musical notes etc.
    • Code points 0x20..0x7E are ASCII. This may be a small miracle in its own right, given that it came from IBM, the inventors of EBCDIC.
    • Code 0x7F is the house symbol or a capital delta; nobody knows exactly what it’s meant to be.
    • Codes 0x80..0xAF are accented letters for western European languages, plus other typographic symbols.
    • Codes 0xB0..0xDF are graphics characters, mostly for box drawing.
    • Codes 0xE0..0xFF are maths symbols and a selection of Greek letters.

    This looks logical enough. The Monochrome Display Adapter (one of the video cards you could have in the original IBM PC) supported no graphics, just those characters. Box drawing characters let you design fancy text-based user interfaces and the fun symbols came in handy for games. And a serious computer could use some serious maths symbols.

    There was support in the MDA hardware to extend the characters in the range 0xC0..0xDF into the ninth pixel column. All other characters had this column blank (the character ROM was only 8 bits wide), but some of the characters needed to form continuous lines, hence the rightmost bit from the ROM was repeated in the extra pixel column. This was a very clever hack indeed.
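
    A small C sketch of the idea (my own reconstruction of the rule, not IBM’s actual logic):

        #include <stdbool.h>
        #include <stdint.h>

        /* The character ROM holds 8 pixels per glyph row; the MDA displays 9.
           For codes 0xC0..0xDF (the box-drawing range) the ninth column
           repeats the rightmost ROM pixel so horizontal lines join up; for
           every other code the ninth column stays blank. */
        static bool ninth_column_lit(uint8_t code, uint8_t rom_row)
        {
            bool rightmost_pixel = rom_row & 1;   /* assuming bit 0 is the rightmost pixel */
            return code >= 0xC0 && code <= 0xDF && rightmost_pixel;
        }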

    Apparently the character set was based on the character set of Wang word processors, so some of the quirks may have been inherited from that.

    I have always wondered why the maths characters include the less-than-or-equal and greater-than-or-equal symbols, but not the not-equal symbol. And why is the section sign § (fairly common in Germany) in the range of fun symbols? This makes no sense.

    The selection of international letters was poor. Why do we have Å and Æ as used in Danish and Norwegian, but not Ø? Indeed, the Greek lowercase letter phi (code 0xED) was sometimes used as a small letter ø, and in many fonts it looked sufficiently like it. The German sharp S (ß) looks a lot like a Greek lowercase beta, and indeed it shares the character code 0xE1 with it. Portuguese A and O with tilde are not included, while some fairly unusual currency symbols are there. Many accented letters exist only in their lowercase version, but that is to be expected when you also need code space for box-drawing characters and maths symbols. All in all, the Apple Macintosh character set looked a lot more reasonable in terms of character selection.

    Which brings us to the order of the international characters in the code page. Why on earth does this range start with Ç, ü, é? This makes no sense at all! It must be the result of a crazy brainstorm session, in which characters were haphazardly added. Later during the session, somebody discovered that some characters had been forgotten and that some already included had to be kicked out. Nobody had a list of which characters were essential for each language and which languages had to be included. Even worse, they were too lazy to reorder the set after the session, so that the characters that were eventually selected could at least be in a somewhat logical order.

    At least this character set is more reasonable than the one selected in 1985 for the Acorn BBC Master. They did include the full Greek alphabet, both uppercase and lowercase, but they left out some accented letters that were important for French and Dutch, like the i with diaeresis (ï). Fortunately this set never caught on and the Acorn Archimedes switched to ISO Latin-1 (with some extensions).

  • The Tecsun PL-680, a poor man’s ICF-2001D

    The Tecsun PL-680 may be the last truly analogue world band receiver ever produced. It still has analogue IF filters and an analogue demodulator; newer radios like the PL-880 use the Silicon Labs Si473x DSP chip for IF filtering and demodulation instead. The PL-680 was introduced in 2016, almost 10 years ago. Its very similar predecessor, the PL-660, dates back to 2010.

    It shares some desirable features with the Sony ICF-2001D (or ICF-2010).

    • It covers the same bands, including the air traffic control band.
    • It has two IF bandwidths on longwave, mediumwave and shortwave.
    • It has a decent synchronous detector. Not nearly as good as Sony’s, but certainly very useful and better than the one that comes with some DSP radios.
    • It has a very smooth tuning knob, capable of tuning in 1 kHz steps.
    • It’s very easy to tap a frequency on the keypad.

    The only thing that’s really missing is the 32 dedicated preset buttons. Of course, the Tecsun has tons of presets, but you need to use the V/M button or page buttons to get to them.

    For a start: I really hate the variable tuning speed that is used by default. But hold down the STEP button for a second or so and the radio will stay in slow tuning mode, which means 1 kHz steps. It will stay that way until you change the batteries. I switched mediumwave to American mode, so I can monitor the band segment between 1620 and 1710 kHz, where some pirate stations may be heard. Now that manual tuning is in 1 kHz steps anyway, it does not matter that the fast steps are 10 kHz instead of 9 kHz. The PL-680 has completely mute-free manual tuning, something the less expensive DSP-based radios like the Tecsun PL-330 cannot offer.

    For SSB we have a 1 kHz tuning step plus an analogue clarifier knob (and it does have selectable LSB and USB). It’s a really nice radio for tuning around the ham bands. SSB has good audio, without any wobble and without the distortion at the onset of each word that many of the more modern DSP-based radios have.

    It performs very well on shortwave. Of course it does not play in the same league as the legendary ICF-2001D, but what radio does?

    The PL-680 runs on four regular AA batteries, which may be alkaline or NiMH; I prefer that to the modern 18650 Li-ion cells. All in all, a very nice radio to have around and to enjoy listening to.