Design Decisions in the Win32Forth Kernel

Register Allocation

Next to the threading technique, the usage of the CPU's registers is the most crucial design decision. It's probably the most difficult. The availability of CPU registers can determine what threading technique can be used and how efficient the Forth will be. 

The Classical Forth Registers

The classical Forth model has five "virtual registers." These are abstract entities which are used in the primitive operations of Forth. NEXT, ENTER (called DOCOL in Win32Forth), and EXIT were defined earlier in terms of these abstract registers.

Each of these is one cell wide -- i.e., in a 32-bit Forth, these are the 32-bit registers. These may not all be CPU registers. I'll describe them in the order of their importance; i.e., the bottom of this list are the best candidates to be stored in memory.

W is the Working register. It is used for many things, including memory reference, so it should be an address register; i.e., you must be able to fetch and store memory using the contents of W as the address. You also need to be able to do arithmetic on W. (In DTC Forths, you must also be able to jump indirect using W.) W is used by the interpreter in every Forth word. In a CPU having only one register, you would use it for W and keep everything else in memory (and the system would be incredibly slow). Win32Forth use the EAX register for W.

IP is the Interpreter Pointer. This is used by every Forth word (through NEXT, ENTER, or EXIT). IP must be an address register. You also need to be able to increment IP. In Win32Forth, for reasons closely associated with the LODSD instruction and the use of EAX as W, this is the ESI register. LODSD is no longer used in Win32Forth; it's too slow.

PSP is the Parameter Stack (or "data stack") Pointer, sometimes called simply SP. I prefer PSP because SP is frequently the name of a CPU register, and they shouldn't be confused. Most CODE words use this. PSP must be a stack pointer, or an address register which can be incremented and decremented. It's also a plus if you can do indexed addressing from PSP. In Win32Forth, this is the ESP register. It's convenient to use this register as data is often passed to Windows. Windows uses the Pascal calling convention where parameters are passed on the stack, and the callee is responsible for clearing the stack before returning. Using ESP reduces the amount of data shuffling required.

RSP is the Return Stack Pointer, sometimes called simply RP. This is used by colon definitions in ITC a. RSP must be a stack pointer, or an address register which can be incremented and decremented. Win32Forth uses the EBP register.

UP is the User Pointer, holding the base address of the task's user area. UP is usually added to an offset, and used by high-level Forth code, so it can be just stored somewhere. But if the CPU can do indexed addressing from the UP register, CODE words can more easily and quickly access user variables. If you have a surplus of address registers, use one for UP. Single-task Forths don't need UP. Win32Forth supports multi-tasking, and uses the EDX register to address the User area.

X is a working register, not considered one of the "classical" Forth registers, even though the classical ITC Forths need it for the second indirection. In ITC you must be able to jump indirect using X. X may also be used by a few CODE words to do arithmetic and such. Sometimes another working register, Y, is also defined. In Win32Forth, the EAX register and the ECX register are available as working registers. (Why EAX is available for this task when it's used for W and why the EDI register isn't used at all will be explained later.)

Use of the Hardware Stack

Most CPUs have a stack pointer as part of their hardware, used by interrupts and subroutine calls. How does this map into the Forth registers? Should it be the PSP or the RSP?

The short answer is, it depends. It is said that the PSP is used more than the RSP in ITC and DTC Forths. If your CPU has few address registers, and PUSH and POP are faster than explicit reference, use the hardware stack as the Parameter Stack. On the other hand, if your CPU is rich in addressing modes -- and allows indexed addressing -- there's a plus in having the PSP as a general-purpose address register. In this case, use the hardware stack as the Return Stack.

The decision is simple in Win32Forth; there are so many Windows calls it makes sense to keep the data on the stack that Windows will use to avoid shifting data from one stack to another for a call. Windows parameters are pushed on the stack (it uses the Pascal calling convention) and that's addressed by register ESP. So the PSP is ESP.

Top-Of-Stack in Register

Forth's performance can be improved considerably by keeping the top element of the Parameter Stack in a register. Many Forth words (such as 0=) then don't need to use the stack. Other words still do the same number of pushes and pops, only in a different place in the code. Only a few Forth words (DROP and 2DROP) become more complicated, since you can no longer simply adjust the stack pointer -- you have to update the TOS register as well.

There are a few rules when writing CODE words:

A word which removes items from the stack must pop the "new" TOS into its register.

A word which adds items to the stack must push the "old" TOS onto the stack (unless, of course, it's consumed by the word).

What about buffering two stack elements in registers? When you keep the top of stack in a register, the total number of operations performed remains essentially the same. A push remains a push, regardless of whether it is before or after the operation you're performing. On the other hand, buffering two stack elements in registers adds a large number of instructions -- a push becomes a push followed by a move. Only dedicated Forth processors like the RTX2000 and fantastically clever optimizing compilers can benefit from buffering two stack elements in registers.

Win32Forth keeps TOS in the EBX register. (Register EBX is a historical hangover; in 16 bit Forths on 8086 machines, AX couldn't be used for indexed addressing modes. BX was a natural alternative, and this became EBX in Win32Forth.)

Document $Id: p-arch3.htm,v 1.1 2004/12/21 00:18:55 alex_mcdonald Exp $