Design Decisions in the Win32Forth Kernel

The Essentials

There are, of course, certain prerequisites when designing any Forth system. Tom Zimmer and Andrew McKewan developed Win32Forth to meet a particular need that they had for a project; the details are in The History of Win32Forth. What is certainly true is that Win32Forth was never conceived as anything other than a full 32bit forth for the Windows operating system on an Intel platform; that forced certain design decisions that can still sometimes be seen in the code today. Of particular interest are some of the difficulties faced in designing a Forth system for a C-centric environment like Windows. Some of these have been overcome; some remain.

CELL size

The word size used by Forth is not necessarily the same as that of the CPU. The smallest practical Forth is a 16-bit model; i.e., one which uses 16-bit integers and 16-bit addresses. The Forth community calls this the "cell" size, since "word" refers to a Forth definition. 8-bit CPUs almost invariably support 16-bit Forths. This usually requires explicit coding of double-byte arithmetic, although some 8-bit CPUs do have a few 16-bit operations. 16-bit CPUs commonly run 16-bit Forths, although the same double- precision techniques can be used to write a 32-bit Forth on a 16- bit CPU. At least one 32-bit Forth has been written for the 8086/8088.

32-bit CPUs normally run 32-bit Forths; which is the cell size of W32F. Based on Windows, W32F will run on whatever Intel processor Windows runs on; for practical purposes nowadays, that means Pentium or better. Although W32F originally ran under WIndows3.1 using the WIN32S interface, 3.1 was a 16bit operating system on a 32bit processor, an uncomfortable fit (and incredibly unstable to boot). We haven't supported 3.1 in a long time; here's a definitive list of what is supported.

The Threading Technique

"Threaded code" is the hallmark of Forth. A Forth "thread" is just a list of addresses of routines to be executed. You can think of this as a list of subroutine calls, with the CALL instructions removed. Over the years many threading variations have been devised, and which one is best depends upon the CPU and the application. To make a decision, you need to understand how they work, and their tradeoffs.

Indirect Threaded Code (ITC)

This is the classical Forth threading technique, used in fig-Forth, F83 and Win32Forth, and described in most books on Forth. All the other threading schemes are "variations" on this, so you need to understand ITC to appreciate the others.

Let's look at the definition of a Forth word SQUARE:

    : SQUARE  DUP * ;

In a typical ITC Forth this would appear in memory as shown in Figure 1. (The yellow header will be discussed in a future article; it holds housekeeping information used for compilation, and isn't involved in threading. Brown is code; blue and cyan are CFA pointers, explained below).

Fig.1 Indirect Threaded Code

Assume SQUARE is encountered while executing some other Forth word. Forth's Interpreter Pointer (IP) will be pointing to a cell in memory -- contained within that "other" word -- which contains the address of the word SQUARE. To be precise, that cell contains the address of SQUARE's Code Field, referred to in Win32Forth as the 'cfa' (code field address) or the 'xt' (execution token). The interpreter fetches that address, and then uses it to fetch the contents of SQUARE's Code Field. These contents are yet another address -- the address of a machine language subroutine which performs the word SQUARE. In pseudo-code, this is:

(IP) -> W

fetch memory pointed by IP into "W" register; W now holds address of the Code Field (the cfa)

IP+4 -> IP

advance IP, just like a program counter (assuming 4-byte addresses in the thread)

(W) ->  X

fetch memory pointed by W into "X" register; X now holds address of the machine code

JP (X)

jump to the address in the X register

This illustrates an important but rarely-elucidated principle: the address of the Forth word just entered is kept in W. CODE words don't need this information, but all other kinds of Forth words do.

If SQUARE were written in machine code, this would be the end of the story: that bit of machine code would be executed, and then jump back to the Forth interpreter -- which, since IP was incremented, is pointing to the next word to be executed. This is why the Forth interpreter is usually called NEXT.

But, SQUARE is a high-level "colon" definition -- it holds a "thread", a list of addresses. In order to perform this definition, the Forth interpreter must be re-started at a new location: the Parameter Field of SQUARE. Of course, the interpreter's old location must be saved, to resume the "other" Forth word once SQUARE is finished. This is just like a subroutine call! The machine language action of SQUARE is simply to push the old IP, set IP to a new location, run the interpreter, and when SQUARE is done pop the IP. (As you can see, the IP is the "program counter" of high-level Forth.) In W32F, this is called DOCOL:


onto the "return address stack"

W+4 -> IP

W still points to the Code Field, so W+4 is the address of the Body


to interpreter ("NEXT")

This identical code fragment is used by all high-level (i.e., threaded) Forth definitions! That's why a pointer to this code fragment, not the fragment itself, is included in the Forth definition. Over hundreds of definitions, the savings add up! And this is why it's called Indirect Threading.

The "return from subroutine" is the word EXIT, which gets compiled when Forth sees ';'. EXIT just executes a machine language routine which does the following:


from the "return address stack"


to interpreter

Walk through a couple of nested Forth definitions, just to assure yourself that this works.

Note the characteristics of ITC: every Forth word has a one-cell Code Field. Colon definitions compile one cell for each word used in the definition. And the Forth interpreter must actually perform a double indirection to get the address of the next machine code to run (first through IP, then through W).

ITC is neither the smallest nor the fastest threading technique. It may be the simplest; although DTC (described next) is really no more complex. So why are so many Forths indirect-threaded? Mainly because previous Forths, used as models, were indirect-threaded. These days, DTC is becoming more popular.

So when should ITC be used? Of the various techniques, ITC produces the cleanest and most elegant definitions -- nothing but addresses. If you're attuned to such considerations, ITC may appeal to you. If your code fiddles around with the insides of definitions, the simplicity and uniformity of the ITC representation may enhance portability. ITC is the classical Forth model, so it may be preferred for education. Finally, on the x86 family, CALL and RET are more expensive than the double indirection.

There are other techniques; DTC and STC are the common alternatives. For details, please see Brad Rodriguez'  "Moving Forth".

Document $Id: p-arch2.htm,v 1.1 2004/12/21 00:18:55 alex_mcdonald Exp $