This document was originally written by Andrew McKewan; the original copyright lies with him.
Win32Forth uses the MOPS metaphor for object-oriented programming. That is, you can define classes, which are used to create objects. A class defines data that is local to each individual object it creates, and methods that are available to all the objects it creates. Here is an example of a simple class:
:Class Disk <Super Object
    int cylinders
    int sectors
    int b/sec
    32 bytes disk-name
    :M ClassInit: ( -- ) 0 to cylinders 0 to sectors 0 to b/sec ;M
    :M !Attributes: ( cyl sec b/sec -- ) to b/sec to sectors to cylinders ;M
    :M Attributes: ( -- cyl sec b/sec ) cylinders sectors b/sec ;M
    :M FreeBytes: ( -- freebytes ) cylinders sectors * b/sec * ;M
;Class
Now that we have defined the class, we can create an object of the class as follows:
Disk myDisk1
1024 32 512 !Attributes: myDisk1
Here we have defined an object called "myDisk1" and given this new disk the attributes cylinders=1024, sectors=32, b/sec=512.
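As a short usage sketch, assuming the Disk class and the myDisk1 object from the text above have been loaded into Win32Forth, the two remaining methods might be exercised from the console like this:

```forth
Attributes: myDisk1     \ leaves cyl sec b/sec on the stack
.s                      \ show the three values without consuming them
drop 2drop              \ discard them again

FreeBytes: myDisk1 .    \ cylinders * sectors * b/sec = 1024 * 32 * 512 = 16777216
```

Sending a message always follows the same pattern: arguments first, then the selector, then the object.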
So ":Class" and ";Class" encompass a collection of data items and methods that define and identify an object and the way it behaves. The word "int" defines a local data item similar to a "value" that is local to each object created by a class. A second data type available for use within a class is "bytes", which was used to create a buffer to hold the disk name.
The "ClassInit:" method is special in the sense that it is automatically executed right after the object is created. So it is a perfect place to initialize an object to the common default values that are shared by all objects of this class.
Additional methods (their names always end with a ':' so they can be identified as methods) can be defined in your class to initialize, display, calculate, or perform whatever operation you need on the objects of this class.
So far we have discussed static objects, objects that reside in the application dictionary and always consume space. Another way to create objects is to create them dynamically in ALLOCATEd memory. Win32Forth provides a simple way to do this, as follows:
NEW> Disk ( -- a1 )
This creates an object of class Disk by allocating memory for it. Forth then executes the ClassInit: method on the object and returns a1, the address of the object on the Windows heap. The object's address can then be saved in a variable, value or array for later use in your program. To execute a method on the object, you need to put its address into a variable or value, like this:
0 value DiskObj
: makeObj ( -- ) NEW> Disk to DiskObj ;
makeObj \ make a Disk Object
FreeBytes: DiskObj \ -- freebytes
You can create as many dynamic objects as you want, within the constraints of your available memory. When you are done with an object, you can dispose of it like this:
DiskObj Dispose \ dispose of the object
0 to DiskObj \ clear the object holder
It is always a good idea to dispose of any dynamically created object once you are done using it. A special method ~: (tilde colon) is automatically executed before an object's memory is released, to give you a chance to do any needed cleanup before the object is destroyed. The ~: method is defined as a noop in class Object, but you can define it as whatever you need when you create a new class that will have objects created dynamically.
If you should happen to forget to dispose of any of the dynamic objects you create, before your program terminates, then Forth will automatically release their allocated memory at program termination.
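To make the role of ~: concrete, here is a minimal sketch of a class that owns a second block of allocated memory and frees it during cleanup. The names ScratchPad, scratch-buf, padObj and makePad are hypothetical and do not appear in Win32Forth itself; the sketch assumes the class syntax, NEW> and Dispose behave as described above.

```forth
\ Hypothetical class: owns a heap buffer and frees it in ~:
:Class ScratchPad  <Super Object

    int scratch-buf                 \ address of the ALLOCATEd block, 0 if none

    :M ClassInit:   ( -- )
            ClassInit: super
            256 allocate abort" ScratchPad: allocate failed"
            to scratch-buf ;M

    :M ~:           ( -- )          \ runs before the object's memory is released
            scratch-buf ?dup if  free drop  0 to scratch-buf  then ;M

;Class

0 value padObj
: makePad  ( -- )  NEW> ScratchPad to padObj ;

makePad                             \ ClassInit: runs here
padObj Dispose                      \ ~: runs here, then the object is released
0 to padObj                         \ clear the object holder
```

Without the ~: method, disposing of the object would release the object itself but leave the 256-byte buffer allocated until the program terminates.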
It is sometimes useful to define a series of data items inside a class that are laid down next to each other in memory. This is often needed when you are building a data structure that will be passed to a Windows procedure call. Normally Forth lays down data items in a class separated by an additional CELL that holds the class pointer for the object. This makes it easy to decompile and debug objects. If you don't mind limited debugging of a method that uses such a contiguous data structure, you can create one in Win32Forth with Record: and ;Record, as the following example shows.
Here is an example of a 'C' data structure:
typedef struct _WIN32_FIND_DATA {
    DWORD    dwFileAttributes;
    FILETIME ftCreationTime;
    FILETIME ftLastAccessTime;
    FILETIME ftLastWriteTime;
    DWORD    nFileSizeHigh;
    DWORD    nFileSizeLow;
    DWORD    dwReserved0;
    DWORD    dwReserved1;
    TCHAR    cFileName[ MAX_PATH ];
    TCHAR    cAlternateFileName[ 14 ];
} WIN32_FIND_DATA;
Here is the equivalent Forth class and data structure that can be used to get files from the Windows directory:
:Class DirObject <super object
Record: FIND_DATA \ returns the address of the structure
int dwFileAttributes
dint ftCreationTime
dint ftLastAccessTime
dint ftLastWriteTime
int nFileSizeHigh
int nFileSizeLow
int dwReserved0
int dwReserved1
max-path bytes cFileName
14 bytes cAlternateFileName
;Record
\ Note the instance variable defining words that are used between
\ Record: and ;Record: words like int, dint and bytes. In addition to
\ these data types, Win32Forth now supports byte, bits and short. If
\ you look at the file WINSER.F, you will see an example of the use of
\ these additional data types. Now we continue with the source
\ for the rest of this example.
int findHandle
:M ClassInit: ( -- ) \ init the structure
ClassInit: super
0 to dwFileAttributes
0.0 to ftCreationTime
0.0 to ftLastAccessTime
0.0 to ftLastWriteTime
0 to nFileSizeHigh
0 to nFileSizeLow
0 to dwReserved0
0 to dwReserved1
cFileName off
cAlternateFileName off
-1 to findHandle
;M
:M FindFirst: { adr len \ filename$ -- f } \ f1=TRUE if found file
max-path LocalAlloc: filename$
adr len filename$ place
filename$ +NULL
FIND_DATA rel>abs
filename$ 1+ rel>abs
Call FindFirstFile to findHandle
findHandle -1 <> \ FindFirstFile returns INVALID_HANDLE_VALUE (-1) on failure
;M
:M FindNext: ( -- f )
FIND_DATA rel>abs
findHandle Call FindNextFile
;M
:M FindClose: ( -- f ) \ close the find when we are all done
findHandle
Call FindClose 0=
-1 to findHandle
;M
:M ShowFile: ( -- ) \ display last found file
cFileName max-path 2dup 0 scan nip - type
;M
;Class
DirObject aFile \ make an instance of the class DirObject
: SIMPLEDIR ( -- )
s" *.F" FindFirst: aFile
if begin cr ShowFile: aFile
start/stop \ pause if key pressed
FindNext: aFile 0=
until cr
FindClose: aFile drop
then ;
This last definition, when typed at the Forth command line, displays a list of the *.F files in the current directory.
NOTE: The above code can be loaded into Win32Forth by highlighting the lines you want to load and pressing Ctrl+C (or using "Copy" in the Edit menu) to copy the text, then selecting the Win32Forth console window and pressing Ctrl+V (or using "Paste to Keyboard" in the Edit menu) to paste the text into Win32Forth. The lines will be pasted one at a time to Win32Forth, which will compile them as if you had typed them from the keyboard. After doing this, just type: SIMPLEDIR [enter] and you should see the program display a list of Forth files in the current directory.
This article describes the use and implementation of an object-oriented extension to Forth. The extension follows the syntax in Yerk and Mops but is implemented in ANS Standard Forth.
When I first began programming in Forth for Windows NT, I became aware of the huge amount of complexity in the environment. In looking for a way to tame this complexity, I studied the object-oriented Forth design in Yerk. Yerk is the Macintosh Forth system that was formerly marketed as a commercial product under the name Neon. It implemented an environment that allowed you to write object-oriented programs for the Macintosh.
While much of Yerk was Macintosh-specific, the underlying class/object/message ideas were quite general. I ported these to Win32Forth, a public-domain Forth system for Windows NT and Windows 95.
However, in both Yerk and Win32Forth, much of the core system is written in assembly language and is very machine-specific. Additionally, both systems modified the outer interpreter to adapt to the new syntax.
What I hope to accomplish here is to provide any ANS Forth System the ability to use the object-oriented syntax and programming style in these platform-specific systems. In doing so, I have sacrificed some performance and a few of the features.
The object-oriented model closely follows Smalltalk. I will first describe the names used in this model: Objects, Classes, Messages, Methods, Selectors, Instance Variables and Inheritance.
This example of a Point class illustrates the basic syntax used to define a class:
:Class Point <Super Object
    Var x
    Var y
    :M Get: ( -- x y ) Get: x Get: y ;M
    :M Put: ( x y -- ) Put: y Put: x ;M
    :M Print: ( -- ) Get: self SWAP ." x = " . ." y = " . ;M
    :M ClassInit: 1 Put: x 2 Put: y ;M
;Class
The class Point inherits from the class Object. Object is the root of all classes and defines some common behaviour (such as getting the address of an object or getting its class) but does not have any instance variables. All classes must inherit from a superclass.
Next we define two instance variables, x and y. Both of these are instances of class Var. Var is a basic cell-sized class similar to a Forth variable. It has methods Get: and Put: to fetch and store its data.
The Get: and Put: methods of class Point access its data as a pair of integers. They are implemented by sending Get: and Put: messages to the instance variables. Print: prints out the x and y coordinates.
ClassInit: is a special initialization method. Whenever an object is created, the system sends it a ClassInit: message. This allows the object to perform any initialization functions. Here we initialize the variables x and y to a preset value. Whenever a point is created, it will be initialized to these values. This is similar to a constructor in C++.
Not all classes need a ClassInit: method. If a class does not define the ClassInit: method, there is one in class Object that does nothing.
Now we have defined the Point class, let's create a point:
Point myPoint
As you can see, Point is a defining word. It creates a Forth definition called "myPoint." Let's see what it contains:
Print: myPoint
This should print the text "x = 1 y = 2" on the screen. You can see that the new point has been initialized with the ClassInit: message.
Now we can modify myPoint and we should see the new value:
3 4 Put: myPoint
Print: myPoint
Notice that in the definition of Point, we created two instance variables of class Var. The object defining words are "class smart" and will create instance variables if used inside a class and global objects if used outside of a class.
In the definition of Print: we used the phrase "Get: self." Here we are sending the Get: message to ourselves. Self is a name that refers to the current object. The compiler will compile a call to Point's Get: method. Similarly, we could have defined ClassInit: like this:
:M ClassInit: 1 2 Put: self ;M
This is a common factoring technique in Forth and is equally applicable here.
Let's say we wanted an object like myPoint, but that printed itself in a different format.
:Class NewPoint <Super Point
    :M Print: ( -- ) Get: self SWAP 0 .R ." @" . ;M
;Class
A subclass inherits all of the instance variables of its superclass, and can add new instance variables and methods of its own, or override methods defined in the superclass. Now let's try it out:
NewPoint myNewPoint
Print: myNewPoint
This will print "1@2" which is the Smalltalk way of printing points. We have changed the Print: method but have inherited all of the other behaviour of a Point.
In some cases, we do not want to replace a method but just add something to it. Here's a class that always prints its value on a new line:
:Class CrPoint <Super NewPoint
    :M Print: ( -- ) CR Print: super ;M
;Class
CrPoint myCrPoint
Print: myCrPoint
When we use the phrase "Print: super" we are telling the compiler to send the print message that was defined in our superclass.
Class Point had two named instance variables, "x" and "y." The type and number of named instance variables is fixed when the class is defined. Objects may also contain indexed instance variables. These are accessed via a zero-based index. Each object may define a different number of indexed instance variables. The size of each variable is defined in the class header by the word <Indexed .
:Class Array <Super Object CELL <Indexed
    :M At: ( index -- value ) (At) ;M
    :M To: ( value index -- ) (To) ;M
;Class
We have declared that an Array will have indexed instance variables that are each CELL bytes wide. To define an array, put the number of elements before the class name:
10 Array myArray
This will define an Array with 10 elements, numbered from 0 to 9. We can access the array data with the At: and To: methods:
4 At: myArray .
64 2 To: myArray
Indexed instance variables allow the creation of arrays, lists and other collections.
In these examples, you may have been thinking "all of this message sending must be taking a lot of time." In order to execute a method, an object must look up the message in its class, and then its superclass, until it is found.
But if the class of the object is known at compile time, the compiler does the lookup then and compiles the execution token of the method. This is called "early binding." There is still some overhead with calling a method, but it is quite small. In all of the code we have seen so far, the compiler will do early binding.
There are cases when you do want the lookup to occur at runtime. This is called "late binding." An example of this is when you have a Forth variable that will contain a pointer to an object, yet the class of the object is not known until runtime. The syntax for this is:
VARIABLE objPtr
myPoint objPtr !
Print: [ objPtr @ ]
The expression within the brackets must produce an object address. The compiler recognizes the brackets and will do the message lookup at runtime.
(Don't worry, I haven't redefined "[" or "]". When a message selector recognizes the left bracket, it uses PARSE and EVALUATE to compile the intermediate code and then compiles a late-bound message send. This also works in interpret state.)
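The following sketch shows why late binding matters. It assumes the Point and NewPoint classes defined earlier in this article are loaded; the names plainPt, smallPt, shapePtr and show are hypothetical. The same compiled word prints each object in the format of its own class, because the lookup happens when show runs rather than when it is compiled.

```forth
Point    plainPt        \ a Point prints as "x = 1 y = 2"
NewPoint smallPt        \ a NewPoint prints as "1@2"

VARIABLE shapePtr       \ hypothetical holder for either kind of object

: show  ( -- )  Print: [ shapePtr @ ] ;   \ method is looked up at run time

plainPt shapePtr !  show   \ dispatches to Point's Print:
smallPt shapePtr !  show   \ dispatches to NewPoint's Print:
```

Both objects are global, so each carries a class pointer, which is what makes the runtime lookup possible.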
Class binding (Dave Boulton called this "promiscuous binding") is an optimization that allows us to get the performance of early binding when we have object pointers or objects that are passed on the stack. If we use a selector with a class name, the compiler will early bind the method, assuming that an object of that class is on the stack. So if we write a word to print a point like this,
: .Point ( aPoint -- ) Print: Point ;
objPtr @ .Point
it will early bind the call. If you pass anything other than a Point, you will not get the expected result (it will print the first two cells of the object, no matter what they are). This is an optimization technique that should be used with care until a program is fully debugged.
If a system has dynamic memory allocation, the programmer may want to create objects on the heap at runtime. This may be the case, for instance, if the programmer does not know how many objects will be created by the user of the application.
The syntax for creating an object on the heap is:
Heap> Point objPtr !
Heap> will return the address of the new point, which can be kept on the stack or stored in a variable. To release the point and free its memory, we use:
objPtr @ Release
Before the memory is freed, the object will receive a Release: message. It can then do any cleanup necessary (like releasing other instance variables). This is similar to a C++ destructor.
The address of the current object is stored in the value ^base . (In a native system, this would be a good use for a processor register.)
The only time you can use ^base is inside of a method. Whenever a method is called, ^base is saved and loaded with the address of the object being sent the message. When the method exits, ^base is restored.
All offsets and sizes are in Forth cells.
Offset  Size  Name  Description
0       8     MFA   Method dictionary (8-way hashed list)
8       1     IFA   Linked-list of instance variables
9       1     DFA   Data length of named instance variables
10      1     XFA   Width of indexed instance variables
11      1     SFA   Superclass pointer
12      1     TAG   Class tag field
13      1     USR   User-defined field
The first 8 cells are an 8-way hashed list of methods. Three bits from the method selector are used to determine which list the method may be in. This cuts down search time for late-bound messages.
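The bucket selection can be sketched in plain ANS Forth. The text says only that three bits of the selector are used; which three is an assumption here. This sketch skips the two low-order bits, on the guess that a selector (an aligned address) usually has zeros there on a 32-bit system. The word names method-bucket and bucket-head are hypothetical.

```forth
\ Assumption: use bits 2..4 of the selector as the bucket number (0..7).
: method-bucket  ( selector -- n )  2 RSHIFT  7 AND ;

\ Address of the list head for a selector, given the address of the MFA field.
: bucket-head  ( selector mfa -- addr )  SWAP method-bucket CELLS + ;

4660 method-bucket .    \ 4660 is hex 1234; prints 5
```

With eight lists, a late-bound lookup has to walk on average only about one eighth of the methods a class understands.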
The IFA field is a linked list of named instance variables. The last two entries in this list are always "self" and "super."
The DFA field contains the length of the named instance variables for an object.
The XFA field actually serves a dual role. For classes with indexed instance variables it contains the width of each element. For non-indexed classes this field is usually zero. A special value of -1 is a flag for general classes (see below).
The TAG field contains a special value that helps the compiler determine if a structure really represents a class. In native implementations, a unique code field is used to identify classes, but this is not available in ANS Forth.
The USR field is not used by the compiler but is reserved for a programmer's use. In the future I may extend this concept of "class variables" to allow adding to the class structure. This field is used in a Windows implementation to store a list of window messages the class will respond to.
Offset  Size  Description
0       1     Pointer to object's class
1       DFA   Named instance variable data
DFA+1   1     Number of indexed instance variables (if indexed)
DFA+2   ?     Indexed instance variables (if indexed)
The first field of a global or heap-based object is a pointer to the object's class. This allows us to do late binding. Normally, the class field is not stored for an instance variable. This saves space and is not usually needed because the compiler knows the class of the instance variable and the instance variable is not visible outside of the class definition. For indexed classes, the class pointer is always stored because the class contains information needed to locate the indexed data. Also, the programmer may mark a class as "general" so that the class pointer is always stored. This is needed in cases where the object sends itself late-bound messages (i.e. msg: [ self ]).
When an object executes, it returns the address of the first named instance variable. This is what we refer to when we mean the "object address." This field contains the named instance variable data. Since instance variables are themselves objects, this structure can be nested indefinitely.
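Because the object address points at the first named instance variable, the class pointer of a global or heap-based object sits one cell below it. The sketch below builds a stand-in object by hand in plain ANS Forth to show the arithmetic; fake-class, fake-cells, fake-obj and obj>class are hypothetical names, not words from the implementation.

```forth
CREATE fake-class  0 ,                    \ placeholder for a class structure

\ Hand-built object: class pointer cell, then two cells of ivar data.
CREATE fake-cells  fake-class ,  1 ,  2 ,
fake-cells CELL+ CONSTANT fake-obj        \ the "object address"

: obj>class  ( ^obj -- ^class )  1 CELLS - @ ;

fake-obj obj>class fake-class = .         \ prints -1 (true)
fake-obj @ .                              \ prints 1, the first ivar's data
```

This only holds where the class pointer is actually stored; as noted above, an ordinary instance variable embedded in another object has no such cell in front of it.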
Objects with indexed instance variables have two more fields. The indexed header contains the number of indexed instance variables. The width of the indexed variables is stored in the class structure which is why we must always store a class pointer for indexed objects.
Following the indexed header is the indexed data. The size of this area is the product of the indexed width and the number of elements. There are primitives defined to access this data area.
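The size and addressing rule can be written out in a few lines of ANS Forth. This is only an illustration of the arithmetic, with hypothetical names (indexed-size, ^elem, demo-data); the real access primitives in the implementation may differ, for instance by adding range checks.

```forth
: indexed-size  ( width #elems -- bytes )  * ;
: ^elem  ( index width ^data -- addr )  >R * R> + ;

CREATE demo-data  10 CELLS ALLOT          \ stand-in for ten CELL-wide elements
64  2 CELLS demo-data +  !                \ store 64 in element 2 by hand
2  1 CELLS  demo-data ^elem @ .           \ prints 64
```

Here the element width is passed in explicitly; in a real object it would be fetched from the XFA field of the object's class.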
Offset  Size  Name    Description
0       1     link    points to link of next ivar in chain
1       1     name    hash value of name
2       1     class   pointer to class
3       1     offset  offset in object to start of ivar data
4       1     #elem   number of elements (indexed ivars only)
The link field points to the next instance variable in the class. The head of this list is the IFA field in the class. When a new class is created, all the class fields are copied from the superclass and so the new class starts with all of the instance variables and methods from the superclass.
The name field is a hash value computed from the name of the instance variable. This could be stored as a string with a space and compile-time penalty. But with a good 32-bit hash function collisions are not common. In any event, the compiler will abort if you use a name that collides with a previous name. You can rename your instance variable or improve the hash function.
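For illustration only, here is one simple way to hash a name to a single cell in ANS Forth. It is not the hash function used by the implementation, and name-hash is a hypothetical word; it merely shows how a string of any length reduces to one cell that can be compared quickly.

```forth
\ Multiply-and-add hash over the characters of a string.
: name-hash  ( c-addr u -- hash )
    0 ROT ROT                       \ put a running hash under the string
    OVER + SWAP ?DO                 \ loop over each character address
        31 *  I C@ +
    LOOP ;

S" x" name-hash .                   \ prints 120, the character code of "x"
```

A function this simple is order-sensitive, so "ab" and "ba" hash differently, but it makes no promise about collisions between unrelated names.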
Following the name is a pointer to the class of the instance variable. The compiler will always early-bind messages sent to instance variables.
The offset field contains the offset of this instance variable within the object. When sending a message to an object, this offset is added to the current object address.
If the instance variable is indexed, the number of elements is stored next. This field is not used for non-indexed classes.
Unlike objects, instance variables are not names in the Forth dictionary. Correspondingly, you cannot execute them to get their address. You can only send them messages. If you need an address, you can use the Addr: method defined in class Object.
Methods are stored in an 8-way linked-list from the MFA field. Each method is identified by a 32-bit selector which is the parameter field address of the message selector.
Offset  Size  Description
0       1     Link to next method
1       1     Selector
2       1     Method execution token
The code for a method is created with the Forth word :NONAME . In this implementation it contains no special prolog or epilog code. When the method executes, the current object will be in ^base . Method execution is done by the following word that saves the current object pointer and loads it from the stack, calls the method, and then restores the object pointer.
: EXECUTE-METHOD ( ^obj xt -- )
    ^base >R
    SWAP TO ^base
    EXECUTE
    R> TO ^base ;
When a method is compiled into a definition, the object and execution token are compiled as literals followed by EXECUTE-METHOD .
This represents the overhead for calling a method over a normal colon definition. (This was one of the concessions I made to ANS Forth. In the native versions, a fast code word at the start and end of a method performed a similar action, making the overhead negligible.)
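The mechanism can be tried in isolation on any ANS Forth system with the Core Extension words VALUE, TO and :NONAME. The sketch below repeats EXECUTE-METHOD as given above and drives it with a hand-made "object" and two anonymous "methods"; counter, get-xt and put-xt are hypothetical stand-ins, not part of the class system.

```forth
0 VALUE ^base                          \ current object pointer

: EXECUTE-METHOD  ( ^obj xt -- )
    ^base >R  SWAP TO ^base  EXECUTE  R> TO ^base ;

CREATE counter  0 ,                    \ stand-in object: one cell of data

:NONAME  ( -- n )  ^base @ ;  CONSTANT get-xt   \ plays the role of Get:
:NONAME  ( n -- )  ^base ! ;  CONSTANT put-xt   \ plays the role of Put:

42 counter put-xt EXECUTE-METHOD       \ store 42 in the object
counter get-xt EXECUTE-METHOD .        \ prints 42
^base .                                \ prints 0: the old value was restored
```

Because the previous ^base is kept on the return stack, a method can send messages to other objects and still find its own object pointer intact afterwards.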
When a message is sent to an instance variable, the method execution token and variable offset are compiled as literals followed by EXECUTE-IVAR .
: EXECUTE-IVAR ( xt offset -- )
    ^base >R
    ^base + TO ^base
    EXECUTE
    R> TO ^base ;
An optimization is made if the offset is zero (for messages to self and super and the first named instance variable). Since we do not need to change ^base we just compile the execution token directly.
In the Yerk implementation, the interpreter was changed (by vectoring FIND ) so that it automatically recognized words ending in ":" as a message to an object. It computed a hash value from the message name and used this as the selector. This kept the dictionary small.
In ANS Forth, there is no way to modify the interpreter (short of writing a new one). It has also been argued whether this is a "good thing" anyway.
In this implementation, message selectors are immediate Forth words. They are created automatically the first time they are used in a method definition. Since they are unique words, we use the parameter field of the word as the selector.
When the selector executes it compiles or executes code to send a message to the object that follows. If used inside a class, it first looks to see if the word is one of the named instance variables. If not, it sees if it is a valid object. Lastly it sees if it is a class name and does class binding.
Yerk also allowed sending messages to values and local variables and automatically compiled late-bound calls. In ANS Forth, we cannot tell anything about these words from their execution token, so this feature is not implemented. We can achieve the same effect by using explicit late binding:
Message: [ aValue ]
When an object is created, it must be initialized. The memory for the object is cleared to zero and the class pointer and indexed header are set up. Then each of the named instance variables is initialized.
This is done with the recursive word ITRAV . It takes the address of an instance variable structure and an offset and follows the chain, initializing each of the named instance variables in the class and sending it a ClassInit: message. As it goes it recursively initializes that instance variable's instance variables, and so on.
Finally, the object is sent a ClassInit: message. This same process is followed when an object is created from the heap.
I have implemented some simple classes to serve as a basis for your own class library. These classes have similar names and methods to the predefined classes in Yerk and Mops. The code for the class implementation and sample classes is available from the Fig ftp site.
For me, the primary benefit of using objects is in managing complexity. Objects are little bundles of data that understand and act on messages sent by other parts of the program. By keeping the implementation details inside the object, it appears simpler to the rest of the program. Inheritance can help reduce the amount of code you have to write. If a mature class library is available, you can often find needed functionality already there.
If the Forth community could agree on an object-oriented model, we could begin to assemble an object-oriented Forth library similar to the Forth Scientific Library project headed by Skip Carter: code and tools that all Forth programmers can share. That project had not been possible before the ANS standardization of floating-point in Forth.
Unfortunately, there are many different ways to add objects to Forth. Just look at the number of articles on object-oriented programming that have appeared in Forth Dimensions over the past ten years. Because Forth is so easy (and fun) to modify and extend, everybody ends up doing it their own (different) way.
Andrew McKewan
Document $Id: p-objects.htm,v 1.2 2006/03/13 14:16:49 georgeahubert Exp $