x86-32 is Deprecated: The Path to Architectural Simplification
Windows 11 does what Arch Linux did back in 2017 and definitively kills support for x86-32 based CPUs. Was it worth it?
General considerations
- Steam Hardware & Software Survey: March 2025
- Fewer than 0.5% of PCs in Steam’s survey run a 32-bit OS,1 indicating near-total obsolescence in consumer hardware.
As someone who restores vintage electronics, I have experienced x86-32's descent into obsolescence first-hand. The first big blow to laptops built around chipsets that support only this architecture came from Microsoft, when it was rumored, and later confirmed, that Windows 11 would be the first version of Windows to completely abandon x86-32 compatibility in favor of x64-based processors alone. Support for ARM followed later, the product of a joint effort with Qualcomm to polish its Snapdragon X Elite, as part of a strategy to compete head-to-head with the hitherto unbeatable energy efficiency of Apple Silicon, unattainable in x86-based systems.
While it is true that x86-32 was doomed to lose on all fronts from that moment on, since companies and developers would no longer see the point in shipping software compatible with it, the truth is that even with a good graphics card of yesteryear, Windows 10 already performed very poorly on those processors, on laptops and desktop machines alike.
Microsoft, in fact, was quite forgiving, more so than the Linux developer community itself. While the first operating system to completely abandon x86-32 was Solaris 11, the first general-purpose operating system to do so was Arch Linux, with 2017.02.1 being the last dual-architecture ISO (both 32-bit and 64-bit). The Arch Linux 32 project does seek to keep the distribution alive on legacy hardware, but having tested it, I have to report that its performance is decidedly mediocre.
A death foretold
In October 1985, Intel launched the Intel 80386, the first genuinely 32-bit processor in its x86 family (which began with the Intel 8086). It arrived at a moment marked by competition from Motorola's 68000-series processors, in particular the 68020, which powered a wide range of machines such as Sun Microsystems' Sun-3 workstations and NeXT's NeXTcube, which Tim Berners-Lee used to host the first web server, sticking a note on it that read: "This machine is a server. DO NOT POWER IT DOWN!!".
Its predecessor, the Intel 80286, operating in real mode, was a 16-bit processor, backward-compatible with software written for the Intel 8086 or 8088. It integrated about 134,000 transistors and a 16-bit data bus, whose width in bytes can be expressed as w = n / 8, where n is the total number of bits on the data bus and 8 is the number of bits per byte, that is,

w = 16 bits / 8 bits per byte = 2 bytes

However, while the Intel 80286 in protected mode offered at most 24-bit memory addressing, equivalent to 2^24 = 16,777,216 bytes (16 MB), the entire Intel 80386 silicon was permeated by a 32-bit architecture, equivalent to 2^32 bytes, that is to say,

2^32 = 4,294,967,296 bytes = 4 GB
In summary, while the Intel 80286 could address a minuscule — at least by today's standards — 16 MB, the Intel 80386 could address 4 GB, an enormous amount for those years. In that context, with Intel on everyone's lips for such an innovation, machines based on the new 80386 began to launch, such as the Compaq Deskpro 386.
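The addressing arithmetic above is easy to verify. Here is a minimal sketch in plain Python (the function name is mine, purely illustrative) that derives both limits from the address width:

```python
# An n-bit address can select 2**n distinct byte locations.
def addressable_bytes(n_bits: int) -> int:
    return 2 ** n_bits

# Intel 80286 in protected mode: 24-bit addressing.
mem_80286 = addressable_bytes(24)
assert mem_80286 == 16_777_216           # 16 MB

# Intel 80386: full 32-bit addressing.
mem_80386 = addressable_bytes(32)
assert mem_80386 == 4_294_967_296        # 4 GB

print(mem_80286 // 2**20, "MB")          # 16 MB
print(mem_80386 // 2**30, "GB")          # 4 GB
```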
The maximum amount of memory that the 32-bit architecture introduced with the Intel 80386 can address is revealing. As you have probably noticed, almost no machine sold today has less memory than an x86-32-based system can address. More importantly, since at least the early 2010s, many computers have shipped by default with more RAM than a 32-bit system can address, making x86-32 a poor fit for them out of the box.
It is also an efficiency issue
Many so-called "experts" claim that x86-32's drawbacks end with its memory addressing limit, which, as mentioned, is 4,294,967,296 bytes, or 4 GB. They ignore that beyond this limitation, x86-32 is little more than a collection of problems.
Let’s start by noting that, with the exception of instructions — which in both 32-bit and 64-bit x86 architectures are limited to ≤ 15 bytes in length — most data handled at the register level, per execution cycle, in x86-32 cannot exceed 32 bits. Registers are a small memory inside the processor that stores instructions, data, and addresses used in the execution of machine code. A 32-bit x86 processor provides 8 general-purpose registers, each at most 32 bits wide, which also caps the width of the data processed per clock cycle.
If the data is wider, execution is partitioned into separate micro-operations (µops), which obviously requires more clock cycles for a task that would take fewer on a 64-bit processor. By today's standards this is especially scandalous, in an era when we seek processors that are not only more energy-efficient but also do more per clock cycle, quite apart from the already controversial limitations of designing processors around a complex instruction set.
But the unquestionable efficiency problems of x86-32 do not end there. It is often thought that the lowest-latency storage medium in a computer is random access memory (RAM), when in truth it is the processor's registers and caches, with access times as low as a few nanoseconds per operation. In the case of x86-32, not only is each register limited to 32 bits in length, but the number of available general-purpose registers is limited to 8, compared with the 16 offered by a 64-bit architecture.
```
EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
```

Independent of the architecture on which they are designed, processors are built on the assumption that certain data will be reused many times by a program. It is for this reason that registers are included in the processor: they are a very low-latency path, essential for retrieving frequently used data. The block above lists the 8 general-purpose registers offered by x86-32, which may look familiar given their close connection to those used in assembly language.
But even by 1985, when Intel released the 80386, 8 registers were insufficient for many tasks. The result is that the processor must go far more often through a higher-latency pathway, the system's random access memory, to retrieve information that cannot be kept resident for lack of registers.
Although the 80386 introduced 32-bit registers (EAX, EBX, ECX, EDX, and so on), these could not natively store values larger than 32 bits. Operations involving temporary storage of values wider than 32 bits must be partitioned into multiple instructions or micro-operations, often involving register pairs such as EDX:EAX, ergo resulting in more cycles per task than would be necessary on an x86-64 architecture, as well as complicating low-level programming.
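To make that partitioning concrete, here is a small Python sketch (the function name is mine, purely illustrative) that mimics how a 32-bit machine adds two 64-bit integers: each value is split into a low and a high 32-bit half, the low halves are added first, and the carry is propagated into the addition of the high halves, mirroring the ADD/ADC instruction pair a compiler typically emits over a register pair like EDX:EAX:

```python
MASK32 = 0xFFFF_FFFF

def add64_on_32bit(a: int, b: int) -> int:
    """Add two 64-bit values using only 32-bit operations,
    mirroring the ADD (low halves) + ADC (high halves + carry)
    sequence emitted for x86-32."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = a_lo + b_lo
    carry = lo >> 32                     # 1 if the low add overflowed 32 bits
    lo &= MASK32

    hi = (a_hi + b_hi + carry) & MASK32  # high halves absorb the carry
    return (hi << 32) | lo

# The split-and-carry result matches native 64-bit addition:
x, y = 0xFFFF_FFFF, 1                    # low-half overflow case
assert add64_on_32bit(x, y) == (x + y) & 0xFFFF_FFFF_FFFF_FFFF
```

One 64-bit addition thus costs two dependent integer operations plus the carry bookkeeping, which is exactly the "more cycles per task" penalty described above.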
x86-S ≠ x86-64: Intel's proposal for a simplified architecture
In 1989, just 4 years after Intel launched the 80386, Hewlett-Packard grew concerned that its reduced-instruction-set processors for high-performance computing could execute only one instruction per cycle.2 Its PA-RISC processors of the time had no superscalar capability, making it impossible to parallelize multiple instructions in a single clock cycle, and weak branch prediction, which made them handle programs with many changes in control flow, such as those making very frequent use of if-elseif-else statements, very inefficiently. That, coupled with the impossibility of adopting x86-32 given its aforementioned limitations in memory addressing and parallel execution, paved the way for a coalition between Intel and Hewlett-Packard to find a solution. The goal was to prepare a successor to both x86-32 (IA-32) and Hewlett-Packard's PA-RISC architecture.
That project, which built on Hewlett-Packard's efforts to hone the instruction-parallelization capabilities of its processors for high-performance enterprise workstations, servers, and mainframes, culminated in the EPIC (Explicitly Parallel Instruction Computing) architecture, which Hewlett-Packard would later call just an "outdated term" for Intel's ambitious Itanium.3 On May 29, 2001, Intel shipped the first Itanium processor, which introduced IA-64, the ISA implementation of the EPIC architecture.
But contrary to what one might think, IA-64 — with EPIC — is not related to x86 at all, to the point that the two are mutually incompatible. Developed from scratch, it made the compiler responsible for planning the execution of instructions in each micro-operation the processor performs. Moreover, it used not 8 registers like x86-32, nor the 16 later introduced by x86-64, but 128 registers, emphasizing their efficient use, as other architectures such as IBM's PowerPC did.
For this reason, Intel's Itanium processors were poorly received and unsuccessful, as they were incompatible with the already mature software catalogue available on x86. Even in the sector for which they were intended, enterprise computing, adoption was scarce. By 2005, Dell announced it would begin discontinuing its Itanium-based computers, with IBM following suit shortly after. Microsoft also showed disinterest in Itanium, to the point that even Visual Studio 2005 did not support development of IA-64-compatible software.
Despite targeting a different market, AMD's Opteron processors were emerging as an excellent alternative to the expensive and doomed Itanium, thanks in part to Sun Microsystems, whose Galaxy servers with up to 16 cores offered excellent performance and convenience in the face of Itanium's compatibility problems. Faced with this, Intel's only option was to retaliate by equipping its Xeon processors with much higher clock speeds and massive (for x86) L3 caches of up to 8 MB, making Itanium's existence even more pointless, as Xeon turned out to meet most of the demands of enterprise computing.
What is truly pertinent about Intel's Itanium is that it was among the first processors to bring a 64-bit architecture to the computers of its time. Before Itanium there were the IBM POWER2, Sun's UltraSPARC-I, and the MIPS R4000, the latter being the first commercially available microprocessor with a 64-bit architecture, but none were as ambitious as Itanium. Despite such novelty, we all know today that the winner of the race to bring 64-bit computing to the masses, by introducing it to the fairly mature x86 architecture, was AMD with AMD64 — AMD's 64-bit extension of x86 — with the launch of the Athlon 64 3200+ in 2003.
Despite the project's ambition, Intel discontinued Itanium on January 20, 2020, after 23 years of very little success and adoption, even in the sector for which it was designed. Intel has recently dusted off the idea of a 64-bit-only architecture with X86-S, in what appears to be an attempt to simplify x86. It comes at a time when Apple's entry into the laptop processor market with its ARM-based Silicon processors — built on a reduced instruction set — has set off alarm bells at both AMD and Intel, making many reconsider whether it is really convenient, or even necessary, to keep using x86 just because Intel and AMD have imposed it on us, when ARM seems to work just as well or better, at a much lower cost in energy consumption and, in certain cases, with better performance.
In view of the indubitable fact that x86-32 is obsolete, the x86-S (a.k.a. x86-Simplified) initiative emerged from Intel's efforts to de-bloat x86. With it, Intel recognizes — not directly — that x86 still preserves features of the primitive 8086 that are no longer useful and, as a result, no longer necessary, making them worthy of the label "bloat": they hurt the simplicity of the processor without any solid reason justifying their presence. On a page on its official website, Intel states that x86-64 is the standard to which processors for today's computers are designed, and concludes on that basis that certain legacy-execution and legacy-software-compatibility features have, in the present day, fallen into disuse.4
Key removals
In its draft, Intel proposes eliminating real mode. Real mode is the mode in which the primitive 8086 operated: it allowed addressing no more than 1 MB of memory and limited programs to working with 16 bits. There is no need to justify this decision, as it is clear why Intel wants to eliminate it: it is absolutely useless nowadays. X86-S also specifies the removal of any trace of protected mode. Protected mode was introduced with the equally primitive 80286 of 1982, extending memory addressing beyond 1 MB and isolating each program's memory so that one program could not overwrite another's.
But unlike real mode, protected mode deserves an explanation of why Intel wants to take a chainsaw to it: it proved more convenient to "containerize" — for lack of a better term — a program's memory through software rather than hardware, with inventions such as DEP (Data Execution Prevention) and ASLR (Address Space Layout Randomisation), assisted by the NX bit (No-eXecute bit), a security measure introduced to x86-64 by AMD with AMD64 in 2003 and later adopted by Intel in its Intel 64 processors under the name XD (eXecute Disable).
Intel has also slipped in its intention to eliminate another antediluvian element bequeathed by the primitive 8086 to modern x86-based processors: memory segmentation. Segmentation on the 8086 worked very differently from how it would be conceived today. It emerged as a patch with which Intel sought to let the 8086 address up to 1 MB of memory, an amount much larger than the 64 KB it could natively address. To implement this technology — innovative in those days but obsolete at the present time — Intel introduced segment registers holding a block start address, which marks the starting point of the block in memory, combined with an offset, a 16-bit value indicating how many bytes to shift from that starting point to reach a specific memory region, with a ceiling of 64 KB per segment. Once more, there is no need to explain why this is useless today.
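The segment:offset scheme reduces to simple arithmetic: the 16-bit segment value is shifted left by 4 bits (multiplied by 16) and the 16-bit offset is added, yielding a 20-bit physical address and hence the 1 MB (2^20) ceiling. A minimal sketch:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address translation:
    physical = segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF  # wraps at 1 MB

# The classic reset vector F000:FFF0 maps to 0xFFFF0,
# 16 bytes below the top of the 1 MB address space.
assert physical_address(0xF000, 0xFFF0) == 0xFFFF0

# FFFF:FFFF would be 0x10FFEF, which wraps around
# within the 20-bit space on the 8086.
assert physical_address(0xFFFF, 0xFFFF) == 0x0FFEF
```

Note that many segment:offset pairs alias the same physical address (for example 0x1230:0x0005 and 0x1000:0x2305), one of the quirks that made real-mode programming so error-prone.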
More importantly, Intel seeks to modernize and simplify SIPI (Startup Inter-Processor Interrupt). Today, when an x86-64 system boots, it starts in real mode and, through a call to a trampoline function, moves from real mode to protected mode and then to long mode, which runs in 64 bits. Intel itself acknowledges that removing these elements will require changes to the way operating systems boot, because operating systems for today's x86-64 platforms are not designed to boot directly into 64-bit long mode alone.
In addition to the aforementioned features, which Intel proposes to discontinue due to their limited use and practical utility, there are the protection rings, particularly those at the intermediate levels of the hierarchy. Protection rings debuted in Multics — a time-sharing operating system launched in 1969 and a pioneer of several modern cybersecurity concepts — implemented in software but assisted by the Honeywell 6180 hardware, designed down to the millimeter to work in perfect harmony with Multics.5
Multics defined a total of 64 protection rings, each demarcated by its permission level, starting with Ring 0 (the most privileged in the hierarchy) and decreasing in privilege with each successive ring.6 As a direct result of having so many rings with such fine nuances between them — many practically indistinguishable — coupled with the difficulty of deciding when to use one ring over another, only a few were used in practice, with the rest remaining in disuse. Intel refloated this security measure, introduced by Multics in 1969, when it brought protected mode to x86, reducing the protection rings from 64 levels to 4, with 2 of these being intermediate levels.
With the arrival of protected mode on the Intel 80286 in 1982, Intel introduced, along with memory segmentation, multi-level protection rings to x86, taking the approach from Multics and defining 4 privilege levels: Ring 0, Ring 1, Ring 2, and Ring 3. Ring 0 (kernel mode) is essential for anything that makes active use of low-level interaction with the hardware, such as the kernel and drivers, while Ring 3 (user mode) is where ordinary applications run day to day, with severe restrictions on access to the memory reserved for the kernel. When a process running in Ring 3 requires elevated privileges, it must cross into Ring 0, by means of system calls, to operate and interact at a low level with the memory, kernel, or processor of the system, still under certain restrictions on what it can do.
Lastly, there are the intermediate protection rings: Ring 1 and Ring 2. Ring 1 never had a very clear, well-defined purpose, but by allowing the operating system to be divided into layers it found a reason to exist in virtualization, with VMware, KVM, or Hyper-V running guest code with access not as privileged as Ring 0 yet not as restricted as Ring 3. Even that "utility" dissipated when Intel introduced its VT-x virtualization technology (previously codenamed "Vanderpool", exposed through the "vmx" CPU flag) in the Pentium 4 662 and 672 on November 14, 2005. AMD followed in Intel's footsteps and on May 23, 2006 introduced its AMD-V virtualization technology (codenamed "Pacifica") in the Athlon 64 ("Orleans") and Athlon 64 X2 ("Windsor"), the first AMD processors to support virtualization. Since then, Ring 1 has fallen into disuse in favor of hardware virtualization.
Ring 2, on the other hand, is an intermediate level that does not provide as much privilege as Ring 1, but enough to operate with fewer restrictions than Ring 3. What distinguishes it from Ring 1 is that it is subject to greater controls on access to and interaction with the hardware, with less access to memory segments and peripherals. Like Ring 1, it is rarely used, which is why Intel wants to get rid of both: they are neither useful nor anywhere near as widely used as Ring 0 and Ring 3, and their mere presence adds unnecessary burden to the design of x86-based processors, which are moving towards a more efficient architecture.
Intel has also hinted at its intention to overhaul the already primitive APIC — the controller used for interrupt management and multiprocessor coordination — which was introduced with the single-core Pentium Pro in 1995. This decision is explained by the ever-increasing demand for cores in high-performance workstations powered by Intel's Xeon processors, as well as by the need to prepare today's processors for a scenario in which Intel processors exceed 255 cores, the maximum number the original APIC scheme can address.
Anticipating such a scenario, Intel refined the APIC with x2APIC, introduced in the first generation (codenamed "Nehalem") of Core i7 processors presented in November 2008. This update widened the APIC ID from 8 bits to 32 bits, stretching the supported processor count from 255 to 2^32, equivalent to 4,294,967,296, far from the core count of even the most powerful mainframe or supercomputer today. Likewise, the memory-mapped input/output (MMIO) register interface was replaced by model-specific registers (MSRs), which allow the APIC's registers to be read and written through the RDMSR and WRMSR instructions, as opposed to going through a memory-mapped I/O system involving the memory bus and I/O devices, a higher-latency and lower-speed path.
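The jump in addressable processors follows directly from the width of the APIC ID field. A quick check, assuming the 8-bit xAPIC ID (where 0xFF is reserved for broadcast, leaving 255 usable IDs) and the 32-bit x2APIC ID:

```python
# xAPIC carries an 8-bit APIC ID; x2APIC widens it to 32 bits.
xapic_ids  = 2 ** 8           # 256 IDs, of which 0xFF is reserved
x2apic_ids = 2 ** 32          # addressable processors under x2APIC

assert xapic_ids - 1 == 255              # usable xAPIC IDs
assert x2apic_ids == 4_294_967_296
```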
Intel also aims to stop allowing Ring 3 access to input/output (I/O) instructions, access that the operating system currently grants by adjusting the I/O privilege level through calls such as iopl(), while removing instructions such as INS, OUTS, INSB, and OUTSW, which were once useful for ordering the processor to write information to the computer's I/O ports, such as floppy disk drives or the parallel port.
Conclusion
Perhaps the ambition of the x86-S initiative was such that it collapsed under its own weight, as Intel recently revealed in a statement. For Intel, at least, it is clear that this is not the moment for disruptive proposals, given the very vulnerable situation in which it finds itself, one in which it perhaps lacks the authority to impose on the rest of the industry a new way of designing software for x86-based systems, which constitute the vast majority of computers produced worldwide.
Still, the idea has been formulated, and it would be surprising if no one dared to revive it in the future. At a time when x86 sees its dominance of consumer computers threatened by the unbeatable energy efficiency of ARM — apparently at no sacrifice, as it seems to work as well as x86 — it is clear that Intel and AMD will redesign their processors as needed, and even tinker with the current x86 architecture, to justify their continued presence in the desktop market, since x86 is the reason both companies exist. The laptop market is another matter: there it is even harder to keep justifying x86, when the industry aspires to produce laptops with much longer battery life, guaranteeing the attribute that makes them desirable: portability.