Win32Forth

Floating point words in Win32Forth

Win32Forth implements the full ANSI floating-point and floating-point extension wordsets as well as a number of useful extra words. It uses a separate floating-point stack (implemented in the USER area for task safety).

The floating-point words can be compiled for 8-byte floats (for speed) or 10-byte floats (for accuracy). The system is built with 8-byte floats by default; to use 10-byte floats, alter the CONSTANT B/FLOAT in src\extend.f and re-extend the system using setup.exe. If the CONSTANT is not defined then the file automatically creates it and compiles the code for 10-byte floats.

The only error that is thrown is for FP stack Underflow (error code -45); arithmetic operations which produce values too large to be represented use infinity, while indeterminate results produce NANs.
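
A minimal sketch of the behaviour described above, assuming the default (masked) exception settings and using FBIG, FINF, F0.0, F2*, F/ and F= from the glossary below:

```forth
fbig f2*  finf f=  .      \ overflow: flag says whether the result is plus infinity
f0.0 f0.0 f/              \ 0/0 is indeterminate, so a NAN is expected
fdup f= 0=  .             \ a NAN is the only value that is not equal to itself
```

Neither line should throw; only an FP stack underflow does.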

Glossary

Loading and saving FPU registers
Memory Access
FP Stack operations
FP Stack operations on pairs of entries
FP Constants
FP Variables
Rounding functions
Integer to float conversion
FP Comparison operators
Arithmetic operators
Trigonometric functions
Inverse Trigonometric functions
Logarithmic functions
Exponential functions
Hyperbolic functions
Inverse hyperbolic functions
Input of Floating Point numbers
Output conversion
Format FP number to a buffer
Display FP numbers
Debugging tools

Loading and saving FPU registers

The following words are for examining, saving, restoring and changing the state of the x87 FPU. They are not normally needed by applications, although they can be useful when dealing with legacy code that requires different rounding modes, precision, or exception handling.
Because the default error handler resets the control word, applications that use other settings will need to CATCH all exceptions or modify the error handler.
NOTE: programs that unmask exceptions need to handle their own errors. For information on the settings and on writing exception handlers, refer to the Intel processor documentation.

WARNING! do not alter the settings unless you know what you're doing.

code >fregs     ( addr -- )             \ W32F         Floating extra

Restore x87 FPU State.

code >fregs>    ( addr -- )             \ W32F        Floating extra

Save and Restore x87 FPU State.

code fpcw>      ( -- n )                \ W32F        Floating extra

Get x87 FPU Control Word.

code >fpcw      ( n -- )                \ W32F        Floating extra

Set x87 FPU Control Word.
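
As a sketch of the "CATCH all exceptions" advice above, the control word can be saved around a piece of code and restored afterwards. WITH-SAVED-FPCW is a hypothetical name, not part of Win32Forth; only FPCW>, >FPCW and the standard CATCH and THROW are used.

```forth
\ Hypothetical helper: run xt, then put the control word back.
: with-saved-fpcw ( xt -- )
   fpcw> >r          \ remember the caller's control word
   catch             \ execute xt, trapping any exception it throws
   r> >fpcw          \ restore the control word in either case
   throw ;           \ re-raise the exception, if there was one
```

If the re-raised exception is not caught further up, the default error handler will still reset the control word as described above.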

code fpsw>      ( -- n )                \ W32F        Floating extra

Get x87 FPU Status Word.

 10 constant B/FLOAT  ( -- n )          \ W32F         Floating extra

Number of bytes in a floating-point number. Note the default is 8 bytes.

value cells/float

Number of cells in a floating-point number. If the number of bytes is not a multiple of 4 this is rounded up.

cell NEWUSER FLOATSP ( -- addr )                    \ W32F             Floating extra

Address of floating point stack pointer in the user area.

code finit      ( -- )                  \ W32F    Floating extra

Clears the floating-point stack and sets the appropriate byte mode. It is executed by the system on start-up and by the default exception handler. Users generally don't need to call this word in a single-task program. Tasks in a multi-task program should execute this word before executing any other floating-point words.

Memory Access

code F@         ( addr -- ; fs: -- r )  \ ANSI      Floating

Fetch a float.

code SF@        ( addr -- ; fs: -- r )  \ ANSI       Floating ext

Fetch a 32 bit (short) float.

code DF@        ( addr -- ; fs: -- r )  \ ANSI       Floating ext

Fetch a 64 bit (double) float.

code F!         ( addr -- ; fs: r -- )  \ ANSI        Floating

Store a float.

code SF!        ( addr -- ; fs: r -- )  \ ANSI        Floating ext

Store a 32 bit (short) float.

code DF!        ( addr -- ; fs: r -- )  \ ANSI        Floating ext

Store a 64 bit (double) float.

code F+!        ( addr -- ; fs: r -- )  \ W32F       Floating extra

Add r to the float stored at addr.

: F,            ( fs: r -- )     \ W32F        Floating extra

Compile a float into the dictionary.

: FVARIABLE     ( compiling -<name>-  -- ; run-time -- addr) \ ANSI        Floating

Define a floating-point variable in the dictionary. The contents are undefined.
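
A short sketch of the memory access words working together. TOTAL is a hypothetical name; the variable is initialised explicitly because its contents are undefined after creation.

```forth
fvariable total        \ contents are undefined until something is stored
f0.0  total f!         \ initialise to zero
25e-1 total f+!        \ add 2.5
15e-1 total f+!        \ add 1.5
total f@ f.            \ display the running sum
```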

: FVALUE  ( compiling -<name>- -- ; fs: r -- ; run-time FS: -- r  ) \ W32F    Floating extra

Define a floating point value initialised from the FP stack.

: FTO                                   \ W32F               Floating extra

Interpretation: ( -<fvalue>- -- fs: r -- )
Compilation: ( -<fvalue>- -- Run-time: FS: r -- )

Store r into -<fvalue>-. If -<fvalue>- is not defined with fvalue then memory may be corrupted; no checks are made so the user should take care. FTO should not be POSTPONEd.
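
For illustration, a sketch of FVALUE and FTO in both states. SCALE and DOUBLE-SCALE are hypothetical names.

```forth
1e0 fvalue scale                 \ SCALE starts out as 1.0
: double-scale ( -- )            \ compiled use of FTO
   scale f2*  fto scale ;
double-scale  scale f.           \ display the doubled value
3e0 fto scale                    \ interpreted use of FTO
```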

: FCONSTANT      ( -<name>- ; fs: r -- )      \ ANSI               Floating

      Interpretation: ( -<name>- ; fs: r -- )
Define an FP constant.
      Compilation:
Append the run-time semantics given below to the current definition.
      Run-time: ( fs: -- r )
Place r on the floating-point stack.

: FLITERAL      ( Compilation fs: r -- ; Runtime fs: -- r )  \ ANSI         Floating

         Interpretation:
Interpretation semantics for this word are undefined.
         Compilation: ( fs: r -- )
Append the run-time semantics given below to the current definition.
         Run-time: ( fs: -- r )
Place r on the floating-point stack.
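
A sketch comparing the two ways of naming a computed value. TWO-PI and TAU are hypothetical names; the bracketed phrase is executed while compiling, so FLITERAL finds its argument on the FP stack.

```forth
fpi f2* fconstant two-pi        \ value computed when the constant is defined

: tau ( fs: -- r )
   [ fpi f2* ] fliteral ;       \ value computed once, while compiling TAU

two-pi tau f- f.                \ difference between the two
```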

FP Stack operations

code FDROP      ( fs: r -- )                \ ANSI           Floating

Remove r from the floating-point stack.

code FDUP       ( fs: r -- r r )             \ ANSI           Floating

Duplicate the top entry on the floating-point stack.

code FSWAP      ( fs: r1 r2 -- r2 r1 )            \ ANSI          Floating

Exchange the top 2 FP numbers.

code FOVER      ( fs: r1 r2 -- r1 r2 r1 )         \ ANSI           Floating

Copy the 2nd FP stack number to the top of the FP stack.

code FROT       ( fs: r1 r2 r3 -- r2 r3 r1 )       \ ANSI          Floating

Rotate the top 3 FP stack numbers.

code FPICK      ( n -- ; fs: -- r )              \ W32F         Floating extra

Copy the n'th number from the FP stack.

: FNIP          ( fs: r1 r2 -- r2 )                \ W32F          Floating extra

Remove the 2nd FP stack entry.
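
As an illustration of the stack words, a sketch of a hypotenuse calculation. FSQUARE and FHYPOT are hypothetical names; FSQRT is described under Arithmetic operators.

```forth
: fsquare ( fs: r -- r*r )   fdup f* ;
: fhypot  ( fs: x y -- h )   fsquare fswap fsquare f+ fsqrt ;
3e0 4e0 fhypot f.            \ the 3-4-5 triangle
```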

FP Stack operations on pairs of entries

The following words can be used for pairs of FP numbers and are useful for dealing with complex numbers or 2-dimensional vectors on the FP stack.

code F2DROP     ( fs: r1 r2 -- )                   \ W32F          Floating extra

Remove the top 2 FP stack entries.

: F2DUP         ( fs: r1 r2 -- r1 r2 r1 r2 )       \ W32F          Floating extra

Duplicate the top 2 FP stack entries.

: F2SWAP        ( fs: r1 r2 r3 r4 -- r3 r4 r1 r2 )          \ W32F     Floating extra

Swap the top pair of floating-point numbers with the second pair.

: F2NIP         ( fs: r1 r2 r3 r4 -- r3 r4 )        \ W32F          Floating extra

Remove the 2nd pair of FP stack entries.
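
A sketch treating a pair of floats as a complex number (real part pushed first). Z+ and Z-NORM2 are hypothetical names; the stack comments show the FP stack after each step.

```forth
: z+ ( fs: a b c d -- a+c b+d )
   frot f+          \ a c b+d
   frot frot        \ b+d a c
   f+ fswap ;       \ a+c b+d

: z-norm2 ( fs: re im -- re im n )   \ n = re*re + im*im, pair is kept
   f2dup            \ re im re im
   fdup f*          \ re im re im*im
   fswap fdup f*    \ re im im*im re*re
   f+ ;             \ re im n
```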

FP Constants

code fpi        ( fs: -- r )      \ W32F           Floating extra

Push the value of pi (3.14159...) on to the FP stack.

code f0.0       ( fs: -- r )       \ W32F          Floating extra

Push plus zero on to the FP stack.

code f1.0       ( fs: -- r )       \ W32F          Floating extra

Push the value 1.0 on to the FP stack.

code fL2t       ( fs: -- r )       \ W32F          Floating extra

Push the value of log base 2 of 10.

code fL2e       ( fs: -- r )       \ W32F          Floating extra

Push the value of log base 2 of e.

code fLog2      ( fs: -- r )       \ W32F          Floating extra

Push the value of log base 10 of 2.

code fLn2       ( fs: -- r )       \ W32F          Floating extra

Push the value of ln 2 (the natural logarithm).

             fconstant finf  ( fs: -- r ) \ W32F             Floating extra

Push plus infinity.

2e0          fconstant f2.0  ( fs: -- r ) \ W32F             Floating extra

Push floating-point 2.0.

10e0         fconstant f10.0 ( fs: -- r ) \ W32F             Floating extra

Push floating-point 10.0.

5e-1         fconstant f0.5  ( fs: -- r ) \ W32F             Floating extra

Push floating-point 0.5.

       f0.0 fconstant fbig   ( fs: -- r ) \ W32F             Floating extra

Push the largest non-infinite floating-point number.

       f0.0 fconstant feps   ( fs: -- r ) \ W32F             Floating extra

Push the smallest non-zero floating-point number.

       f1.0 fconstant fsmall ( fs: -- r ) \ W32F             Floating extra

Push the smallest non-denormalised floating-point number.
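
The constants are typically used inside definitions. A sketch of angle conversion, with DEG>RAD and RAD>DEG as hypothetical names:

```forth
: deg>rad ( fs: degrees -- radians )   fpi f* 180e0 f/ ;
: rad>deg ( fs: radians -- degrees )   180e0 f* fpi f/ ;
90e0 deg>rad f.                        \ a right angle in radians
```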

FP Variables

        fvariable a2**63     ( -- addr )  \ W32F             Floating extra

Return the address of a float containing 2**63.

        fvariable sq2m1      ( -- addr )  \ W32F             Floating extra

Return the address of a float containing sqrt(2) - 1.

        fvariable sq2/2m1    ( -- addr )  \ W32F             Floating extra

Return the address of a float containing sqrt(2)/2 - 1.

Rounding functions

code FLOOR      ( fs: r1 --  r2 )         \ ANSI         Floating

Round r1 to an integral value using the round toward negative infinity rule, giving r2.

code FCEIL      ( fs: r1 -- r2 )          \ W32F         Floating extra

Round r1 to an integral value using the round toward positive infinity rule, giving r2.

code FTRUNC     ( fs: r1 -- r2 )          \ W32F         Floating extra

Round r1 to an integral value using the round toward zero rule, giving r2.

code FROUND     ( fs: r1 -- r2 )          \ ANSI         Floating

Round r1 to an integral value using the round to nearest rule, giving r2.
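
To compare the four rules on one value, a sketch using a hypothetical word .ROUNDINGS. The test value -2.7 is chosen to avoid a tie, so the results are -3, -2, -2 and -3 in that order (the exact display format depends on F. and PRECISION).

```forth
: .roundings ( fs: r -- )
   fdup floor  f.     \ toward negative infinity
   fdup fceil  f.     \ toward positive infinity
   fdup ftrunc f.     \ toward zero
        fround f. ;   \ to nearest
-27e-1 .roundings
```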

Integer to float conversion

code D>F        ( d -- ; Fs: -- r )       \ ANSI           Floating

Convert double number to floating-point number.

code F>D        ( -- d ; fs: r -- )       \ ANSI           Floating

Convert floating-point number to double number, by rounding towards zero. If the result would be too large to fit in a double number then -9223372036854775808 is returned.

code ZF>D       ( -- d ; fs: r -- )       \ W32F           Floating extra

Convert floating-point number to double number, using the current rounding mode (rounding towards nearest unless changed by the user). If the result would be too large to fit in a double number then -9223372036854775808 is returned.

: s>f           ( n -- ; fs: -- r )       \ W32F            Floating extra

Convert the single number n to floating point number r.

: f>s           ( -- n ; fs: r -- )       \ W32F           Floating extra

Convert the floating point number r to single number n.

code FS>DS      ( -- dfloat ; fs: r -- )  \ W32F           Floating extra

Move floating point number bits to the data stack as a 64-bit float. This function is for passing floats to DLLs.

code SFS>DS     ( -- float ; fs: r -- )   \ W32F           Floating extra

Push the top of the float stack onto the data stack as a 32-bit float. This function is for passing floats to DLLs.
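
A sketch contrasting the conversion words, assuming the rounding mode has not been changed from round to nearest:

```forth
27e-1 fdup              \ two copies of 2.7
f>d  d.                 \ F>D rounds toward zero
zf>d d.                 \ ZF>D uses the current rounding mode
42 s>f f2* f>s .        \ single-cell round trip; 84 is exact, so no rounding occurs
```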

FP Comparison operators

: F0=           ( -- f ; fs: r -- )       \ ANSI           Floating

Return true if r equals ±0e0. Returns false for NAN.

: F0<           ( -- f ; fs: r -- )       \ ANSI           Floating

Return true if r is less than ±0e0. Returns false for NAN.

: f0>           ( -- f ; fs: r -- )       \ W32F           Floating extra

Return true if r is greater than ±0e0. Returns false for NAN.

: f=            ( -- f ; fs: r1 r2 -- )   \ W32F           Floating extra

Return true if r1 equals r2. Returns false if either number is a NAN.

: F<            ( -- f ; fs: r1 r2 -- )   \ ANSI           Floating

Return true if r1 is less than r2. Returns false if either number is a NAN.

: f>            ( -- f ; fs: r1 r2 -- )   \ W32F           Floating extra

Return true if r1 is greater than r2. Returns false if either number is a NAN.

: f<=           ( -- f ; fs: r1 r2 -- )   \ W32F           Floating extra

Return true if r1 is less than or equal to r2. Returns true if either number is a NAN.

: f>=           ( -- f ; fs: r1 r2 -- )   \ W32F           Floating extra

Return true if r1 is greater than or equal to r2. Returns true if either number is a NAN.

: FMAX          ( fs: r1 r2 -- r3 )       \ ANSI           Floating

Return r3 the maximum of r1 and r2. If r1 is a NAN then so is r3. If r2 is a NAN then r3=r1.

: FMIN          ( fs: r1 r2 -- r3 )       \ ANSI           Floating

Return r3 the minimum of r1 and r2. If r1 is a NAN then so is r3. If r2 is a NAN then r3=r1.
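
FMAX and FMIN combine naturally into a range limiter. FCLAMP is a hypothetical name; the sketch assumes lo is not greater than hi and does not consider NANs.

```forth
: fclamp ( fs: r lo hi -- r' )
   frot      \ lo hi r
   fmin      \ lo min(hi,r)
   fmax ;    \ max(lo,min(hi,r))
15e-1 f0.0 f1.0 fclamp f.     \ 1.5 limited to the range 0 to 1
```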

Arithmetic operators

code F+         ( fs: r1 r2 -- r3 )       \ ANSI          Floating

Add r1 to r2.

code F-         ( fs: r1 r2 -- r3 )       \ ANSI          Floating

Subtract r2 from r1.

code F*         ( fs: r1 r2 -- r3 )       \ ANSI          Floating

Multiply r1 by r2.

code F/         ( fs: r1 r2 -- r3 )       \ ANSI         Floating

Divide r1 by r2.

code FNEGATE    ( fs: r1 -- r2 )          \ ANSI          Floating

Reverse the sign of r1.

: 1/f           ( fs: r1 -- r2 )          \ W32F        Floating extra

r2 is the reciprocal of r1.

code f2*        ( fs: r1 -- r2 )          \ W32F        Floating extra

Multiply by 2.

code f2/        ( fs: r1 -- r2 )          \ W32F        Floating extra

Divide by 2.

code FABS       ( fs: r1 -- r2 )          \ ANSI        Floating ext

r2 is the absolute value of r1.

code FSQRT      ( fs: r1 -- r2 )          \ ANSI        Floating ext

r2 is the positive square root of r1. r2 is NAN for negative r1.

: F~            ( -- flag ; fs: r1 r2 r3 -- )  \ ANSI        Floating ext

If r3 is positive, flag is true if the absolute value of (r1 minus r2) is less than r3. If r3 is zero, flag is true if the implementation-dependent encoding of r1 and r2 are exactly identical (positive and negative zero are unequal). If r3 is negative, flag is true if the absolute value of (r1 minus r2) is less than the absolute value of r3 times the sum of the absolute values of r1 and r2.

This provides the three types of floating point equality in common use -- close in absolute terms, exact equality as represented, and relatively close.
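
The three modes can be wrapped as follows. The word names and the tolerance 1e-9 are assumptions chosen for illustration.

```forth
: fclose?    ( fs: r1 r2 -- ; -- flag )   1e-9 f~ ;   \ r3 > 0: absolute tolerance
: fsame?     ( fs: r1 r2 -- ; -- flag )   f0.0 f~ ;   \ r3 = 0: identical encodings
: frelclose? ( fs: r1 r2 -- ; -- flag )  -1e-9 f~ ;   \ r3 < 0: relative tolerance

f0.0 f0.0 fnegate fsame? .     \ plus zero against minus zero
```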

Trigonometric functions

: FSIN          ( fs: r1 -- r2 )          \ ANSI         Floating ext

r2 is the sine of r1 in radians.

: FCOS          ( fs: r1 -- r2 )          \ ANSI         Floating ext

r2 is the cosine of r1 in radians.

: FSINCOS       ( fs: r1 -- r2 r3 )       \ ANSI         Floating ext

r2 is the sine and r3 the cosine of r1 in radians. This function is more efficient than calling FSIN and FCOS separately.

: FTAN          ( fs: r1 -- r2 )          \ ANSI         Floating ext

r2 is the tangent of r1 in radians.
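
A sketch of polar to Cartesian conversion using FSINCOS, which leaves the sine below the cosine. POLAR>RECT is a hypothetical name; the stack comments show the FP stack after each step.

```forth
: polar>rect ( fs: r theta -- x y )
   fsincos          \ r sin cos
   frot             \ sin cos r
   fdup frot        \ sin r r cos
   f*               \ sin r x
   frot frot        \ x sin r
   f* ;             \ x y
```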

Inverse Trigonometric functions

code FASIN      ( fs: r1 -- r2 )          \ ANSI         Floating ext

r2 is the radian angle whose sine is r1. The result for |r1| <= 1 is between ±pi/2. The result for |r1| > 1 is NAN.

code FACOS      ( fs: r1 -- r2 )          \ ANSI         Floating ext

r2 is the radian angle whose cosine is r1. The result for |r1| <= 1 is between 0 and pi. The result for |r1| > 1 is NAN.

code FATAN      ( fs: r1 -- r2 )          \ ANSI          Floating ext

r2 is the radian angle whose tangent is r1. The result is between ±pi/2.

code FATAN2     ( fs: r1 r2 -- r3 )       \ ANSI         Floating ext

r3 is the radian angle whose tangent is r1/r2. The result is between ±pi with the same sign as r1. If r1 and r2 are both zero then r3 is ±zero. This function can be used to convert Cartesian coordinates into the angle of the polar coordinates.
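
A sketch of the Cartesian to polar conversion mentioned above. RECT>POLAR is a hypothetical name; note that y is passed to FATAN2 as r1 and x as r2.

```forth
: rect>polar ( fs: x y -- r theta )
   f2dup fswap      \ x y y x
   fatan2           \ x y theta
   frot frot        \ theta x y
   fdup f*          \ theta x y*y
   fswap fdup f*    \ theta y*y x*x
   f+ fsqrt         \ theta r
   fswap ;          \ r theta
```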

Logarithmic functions

code FLN        ( fs: r1 -- r2 )          \ ANSI            Floating ext

r2 is the natural logarithm of r1. If r1 is ±0 then r2 is -infinity. If r1 is infinity then r2 is infinity. If r1 is less than zero then r2 is a NAN.

code FLNP1      ( fs: r1 -- r2 )          \ ANSI           Floating ext

r2 is the natural logarithm of the quantity r1 plus one. If r1 is -1.0 then r2 is -infinity. If r1 is infinity then r2 is infinity. If r1 is less than -1.0 then r2 is a NAN.

code FLOG       ( fs: r1 -- r2 )          \ ANSI       Floating ext

r2 is the logarithm to base 10 of r1. If r1 is ±0 then r2 is -infinity. If r1 is infinity then r2 is infinity. If r1 is less than zero then r2 is a NAN.
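
Logarithms to other bases can be built from FLN and the FP constants. A sketch for base 2, using the hypothetical name FLOG-BASE2 (the name FLOG2 is avoided because it is already the constant log base 10 of 2):

```forth
: flog-base2 ( fs: r1 -- r2 )   fln fl2e f* ;   \ ln(r1) * log2(e)
8e0 flog-base2 f.
```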

Exponential functions

code FEXP       ( fs: r1 -- r2 )          \ ANSI       Floating ext

Raise e to the power r1, giving r2.

code FEXPM1     ( fs: r1 -- r2 )          \ ANSI      Floating ext

Raise e to the power r1 and subtract one, giving r2.

This function allows accurate computation when its arguments are close to zero, and provides a useful base for the standard exponential functions. Hyperbolic functions such as cosh(x) can be efficiently and accurately implemented by using FEXPM1; accuracy is lost in this function for small values of x if the word FEXP is used.
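
The loss of accuracy can be seen by evaluating both forms for a small argument. This is only a sketch; the number of digits that survive depends on the float size the system was built with.

```forth
1e-10 fexp f1.0 f-  fs.     \ subtracting after FEXP cancels most of the digits
1e-10 fexpm1        fs.     \ FEXPM1 keeps them
```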

: f**           ( fs: r1 r2 -- r3 )       \ ANSI            Floating ext

Raise r1 to the power r2, giving the product r3.

: FALOG         ( fs: r1 -- r2 )          \ ANSI           Floating ext

Raise ten to the power r1, giving r2.
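
A sketch using F** for a cube root and FALOG for a power of ten. FCBRT is a hypothetical name and is intended for positive arguments only.

```forth
: fcbrt ( fs: r1 -- r2 )   f1.0 3e0 f/ f** ;   \ r1 raised to the power 1/3
27e0 fcbrt f.
3e0 falog f.                                   \ ten to the power 3
```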

Hyperbolic functions

: FSINH         ( fs: r1 -- r2 )          \ ANSI             Floating ext

r2 is the hyperbolic sine of r1.

: FCOSH         ( fs: r1 -- r2 )          \ ANSI             Floating ext

r2 is the hyperbolic cosine of r1.

: FTANH         ( fs: r1 -- r2 )          \ ANSI             Floating ext

r2 is the hyperbolic tangent of r1, |r2| <= 1.

Inverse hyperbolic functions

code FASINH     ( fs: r1 -- r2 )          \ ANSI             Floating ext

r2 is the number whose hyperbolic sine is r1.

code FACOSH     ( fs: r1 -- r2 )          \ ANSI            Floating ext

r2 is the number whose hyperbolic cosine is r1. If r1 < 1.0 then r2 is a NAN.

: FATANH        ( fs: r1 -- r2 )          \ ANSI             Floating ext

r2 is the number whose hyperbolic tangent is r1. If |r1| > 1.0 then r2 is a NAN.

Input of Floating Point numbers

: >FLOAT        ( addr len -- f ; fs: -- r | <nothing> ) \ ANSI           Floating

An attempt is made to convert the string specified by addr and len to internal floating-point representation. If the string represents a valid floating-point number in the syntax below, its value r and true are returned. If the string does not represent a valid floating-point number, only false is returned.
A string of blanks is treated as a special case representing zero.

The syntax of a convertible string := <significand>[<exponent>]

<significand> := [<sign>]{<digits>[.<digits0>] | .<digits> }
<exponent>    := <marker><digits0>
<marker>      := {<e-form> | <sign-form>}
<e-form>      := <e-char>[<sign-form>]
<sign-form>   := { + | - }
<e-char>      := { D | d | E | e }

: f#    ( Interpretation: "fp no." -- ; fs: -- r )    \ W32F             Floating extra

( Compilation: "fp no." -- ; run-time: fs: -- r )

An attempt is made to convert the space-delimited string following F# to internal floating-point representation. If the string represents a valid floating-point number, its value r is returned. If the string does not represent a valid floating-point number, an error is thrown.
F# used at the end of a line is treated as a special case representing zero.
When interpreting, the FP number is placed on the FP stack; when compiling, it is compiled as an FLITERAL.
The syntax of a convertible string is the same as for >FLOAT.
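
A sketch of both input words. SHOW-FLOAT and FHALF are hypothetical names; note that F# accepts a number without an exponent, since the exponent is optional in the syntax above.

```forth
: show-float ( addr len -- )
   >float if  f.  else  ." not a number"  then ;

s" 15e2" show-float        \ valid: the value is displayed
s" abc"  show-float        \ invalid: only the flag was returned

f# 0.5 f.                             \ interpreted: value goes to the FP stack
: fhalf ( fs: r1 -- r2 )  f# 0.5 f* ; \ compiled: value becomes a literal
```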

Output conversion

: REPRESENT     ( addr u -- n flag1 flag2 ; fs: r -- )  \ ANSI          Floating

At addr, place the character-string external representation of the significand of the floating-point number r. Return the decimal-base exponent as n, the sign as flag1 and valid result as flag2. The character string shall consist of the u most significant digits of the significand represented as a decimal fraction with the implied decimal point to the left of the first digit, and the first digit zero only if all digits are zero. The significand is rounded to u digits following the round to nearest rule; n is adjusted, if necessary, to correspond to the rounded magnitude of the significand. If flag2 is true then r was in the implementation-defined range of floating-point numbers. If flag1 is true then r is negative.
An ambiguous condition exists if the value of BASE is not decimal ten.
When flag2 is false, n is 7FFFFFFF (hex) and flag1 is the sign. The contents of the buffer at addr are the first u characters of either NAN or Infinity, padded with spaces if necessary.
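
A sketch of a direct call to REPRESENT. REP-BUF and .REP are hypothetical names, and the two flags are simply discarded here; real code would normally test them.

```forth
create rep-buf 16 allot        \ room for the digit string

: .rep ( fs: r -- )
   rep-buf 8 represent         \ -- n flag1 flag2 ; 8 digits placed in rep-buf
   drop drop                   \ ignore the sign and validity flags in this sketch
   rep-buf 8 type              \ the digits, decimal point implied on the left
   ."  exponent " . ;          \ the decimal exponent n

fpi .rep
```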

: PRECISION     ( -- u )     \ ANSI         Floating ext

Return the number of significant digits currently used by (F.), (FE.), (FS.), F., FE., or FS. as u.

: SET-PRECISION ( u -- )     \ ANSI         Floating ext

Set the number of significant digits currently used by (F.), (FE.), (FS.), F., FE., or FS. to u.

: min-precision ( u -- )     \ W32F         Floating extra

Set the number of significant digits currently used by (F.), (FE.), (FS.), F., FE., or FS. to u if u is greater than the present setting.
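
PRECISION and SET-PRECISION can be paired to change the setting temporarily. F.N is a hypothetical name for a word that displays one number with u significant digits and then restores the previous setting.

```forth
: f.n ( u -- ; fs: r -- )
   precision >r        \ save the current setting
   set-precision  f.   \ display r with u significant digits
   r> set-precision ;  \ restore the saved setting
fpi 4 f.n
```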

Format FP number to a buffer

The following words format floating-point numbers as counted strings in the buffer whose address is supplied, so they can be used for purposes other than printing the numbers to the console. The strings are not null terminated.

: (F.)          ( addr -- ; fs: r -- )     \ W32F         Floating extra

Format the top number on the floating-point stack using fixed-point notation:

       [-] <digits>.<digits0>

: (FE.)         ( addr -- ; fs: r -- )     \ W32F        Floating extra

Format r as a string in engineering notation.

: (FS.)          ( addr -- ; fs: r -- )     \ W32F        Floating extra

Format r as a string in scientific notation:

       <significand><exponent>

where:

       <significand>  :=  [-]<digit>.<digits0>
       <exponent>     :=  E[-]<digits>

SYNONYM (E.) (FS.) ( addr -- ; fs: r -- )     \ W32F        Floating extra

See above.

: (G.)          ( addr -- ; fs: r -- )     \ W32F           Floating extra

Format r as a string using scientific notation or ordinary representation according to the size of r.
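
A sketch of formatting into a buffer and then using the counted string. FBUF is a hypothetical name and the 64-byte size is an assumption, not a documented requirement.

```forth
create fbuf 64 allot          \ buffer for the counted string
fpi fbuf (fs.)                \ format pi in scientific notation
fbuf count type               \ COUNT turns the counted string into addr len
```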

Display FP numbers

: F.            ( fs: r -- )     \ ANSI            Floating ext

Display, with a trailing space, the top number on the floating-point stack using fixed-point notation:

       [-] <digits>.<digits0>

: FE.           ( fs: r -- )     \ ANSI            Floating ext

Display, with a trailing space, the top number on the floating-point stack using engineering notation, where the significand is greater than or equal to 1.0 and less than 1000.0 and the decimal exponent is a multiple of three.

: FS.           ( fs: r -- )     \ ANSI            Floating ext

Display, with a trailing space, the top number on the floating-point stack in scientific notation:

       <significand><exponent>

where:

       <significand>  :=  [-]<digit>.<digits0>
       <exponent>     :=  E[-]<digits>

SYNONYM E. FS.  ( fs: r -- )     \ W32F            Floating extra

See above.

: G.            ( fs: r -- )     \ W32F            Floating extra

Display the top number on the floating-point stack using scientific notation or ordinary representation according to the size of r.

Debugging tools

: f.s           ( -- )   \ W32F          Floating debug

Display floating point stack.

: .fdepth       ( -- )   \ W32F          Floating debug

Display depth of floating point stack.

: fdump         ( -- )      \ W32F        Floating debug

Dump of the real Floating Point Unit.

Handling Errors

If you reset the FPU control word to use other than the default rounding or precision values then you may need to modify the default error handling. In version V6.10 or higher this can be done by defining your own handler, MY-RESET-STACKS, then adding it to the chain thus:

reset-stacks-chain chain-add my-reset-stacks

Earlier versions need to add:

: new-reset-stacks [ defer@ reset-stacks compile, ] my-reset-stacks ;
new-reset-stacks is reset-stacks
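
The body of MY-RESET-STACKS is left to the application. One possible shape is sketched here, assuming all it has to do is re-apply a saved control word; MY-FPCW is a hypothetical name. The definition has to be loaded before either of the registrations above.

```forth
fpcw> value my-fpcw          \ set this to the control word the application needs
: my-reset-stacks ( -- )
   my-fpcw >fpcw ;           \ re-apply it after the default handler's reset
```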

You can test for the presence of NANs with:

  .... fdup f= 0= ....

which returns true only for NANs. You can test for both NANs and infinities with:

  .... fdup f- f0= 0=

and for infinities with:

  .... fabs finf f=
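
These phrases can be wrapped as predicates. The names below are hypothetical; each word consumes r and leaves a flag on the data stack.

```forth
: fnan?    ( fs: r -- ; -- flag )   fdup f= 0= ;      \ true only for NANs
: finf?    ( fs: r -- ; -- flag )   fabs finf f= ;    \ true for plus or minus infinity
: ffinite? ( fs: r -- ; -- flag )   fdup f- f0= ;     \ false for NANs and infinities
```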

Document $Id: p-float.htm,v 1.23 2007/05/26 10:24:11 dbu_de Exp $