How can a 32-bit CPU have an addressable memory size of 16TB?
It is important to decouple CPU “bitness” from the amount of memory the CPU can address or even use.
When a CPU is described as being n bits, that usually means that the CPU’s general-purpose registers are n bits wide, which is also the word size. In other words, that’s the size of data the CPU can deal with natively. Notice that this has no direct relation to memory…yet.
Fundamentally, in order for a particular memory location to be accessed, the memory chip itself needs to receive a set of address bits on the appropriate address pins. Take a look at the diagram below for example: it is the pin assignment of an SRAM memory chip on which I’ve outlined all the address pins:
As long as the chip is powered and receives the relevant address bits and control signals, the manner in which that address was generated or handled doesn’t matter: you could hook those pins up to DIP switches and use them to generate the appropriate address signals, no processor required.
One of the simplest ways to generate the address bits for a memory chip is to pipe the contents of a processor register to those address pins over a bus. Doing so requires no extra circuitry, and is probably the reason why people associate a processor’s “bitness” with the amount of memory it can address. Yes, this approach would net you a 4GiB address space using a 32bit processor. But it is not the only way to do it.
For starters, you could simply use two registers to hold a single address and pipe the contents of both directly to those address pins, either over a wider address bus or in two or more steps, placing the address chunks in a buffer before sending them to the memory chip. This is notably what the Z80 did, combining 8bit registers whenever it needed to do any kind of 16bit data/address manipulation. Using this approach, a 32bit CPU can access a 64bit address space.
This was, to an extent, also the approach behind Intel’s PAE extension for x86: page-table entries were widened to 64 bits (handled internally as 32bit chunks, as a consequence of the machine’s architecture), allowing a 32bit CPU to address up to 36 bits of physical memory.
But there are even more ways to do it, such as bank switching: the Commodore 64 used a 6502-based CPU (8bit data, 16bit addresses) capable of addressing 64KiB of memory in total. With a few tricks, Commodore managed to fit 64K of RAM and the required I/O within that address space (dedicated address-decoding logic determined whether an access went to I/O or RAM). With the introduction of the Commodore 128, Commodore needed to find a way to make the CPU access 2^17 bytes (128K) of address space. The solution was bank switching: the 128K was split into two 64K chunks (“banks”), each of which the CPU could access as normal. Whenever the CPU needed to read/write data in the other bank, the programmer would write to a specific control register on the C128 to swap the current bank for the other.
That approach was notably used by many old cartridge-based consoles such as the NES, enabling the use of fairly large cartridges whose contents would otherwise not fit within the region of the CPU’s address space allocated to cartridge ROM. Again, writing to specific registers would instruct separate circuitry to swap banks in and out of the CPU’s view.
image from Wikipedia
Intel’s 8086 and 8088 took an interesting (if seemingly infuriating) approach to solving the problem, combining aspects of bank switching with the approach of splitting addresses into register-sized chunks. Despite being 16bit processors, this scheme allowed them to access a 20bit (1MiB) address space. To access memory, a programmer would first load a 16bit segment register with the base of a so-called segment. Multiplied by 16 (a 4-position left shift), the contents of this segment register give the base address of a 64KiB window of sorts within the 20bit address space, which can then be addressed conventionally with a 16bit offset (the instruction pointer, for instance, supplies the offset for code fetches). It’s a lot like the bank switching scheme discussed previously, except that the beginning and end addresses of a bank are not fixed and can be relocated. This is called a segmented address space.
There are potentially even more schemes to look into, but I won’t cover every one of them. And of course, this all assumes a byte-addressable processor: seeing as a 32bit address can distinguish 2^32 different memory locations of arbitrary size, a word-addressable processor with a 4KiB word size would be able to access 16TB of memory without needing to do anything complex.