Software Floating Point (32SX)

Introduction

Software floating-point is supplied with the cross-compiler and the target Forth. The target floating point wordset is not ANS or Forth-2012 compliant, but it satisfies the needs of embedded systems without undue complexity. The Forth data stack and the floating point stack are the same (combined) in Common/SoftFP32CX.fth and separate in Common/SoftFP32SX.fth. The floating point data storage format is not IEEE, but is optimised for performance on small controllers. If you need IEEE double format storage, please contact MPE. Any variations in the implementation will be documented in the target-specific section of the manual.

The cross-compiler has more limited floating-point support than the target. Some words are available during compilation of colon definitions, but not while interpreting.

Source code

The source code is in two sets of files, one for 32 bit Forth targets, the other for 16 bit targets. The files are:


  Common\SoftFP32SX current 32 bit code for separate stacks
  Common\SoftFP32CX current 32 bit code for a combined stack
  Common\sfp32hi    old 32 bit primitives
  Common\sfp32com   old 32 bit high level code
  Common\sfp16hi    16 bit primitives
  Common\sfp16com   16 bit high level code

The SoftFP32xx files use no assembler definitions. Some targets have code versions of the primitives, and these will be found in the CPU-specific code directory. An increase in performance can be obtained by using the code files.

Entering floating-point numbers

Floating point number entry is enabled by REALS and disabled by INTEGERS.

Floating-point numbers of the form 0.1234e1 are required (see FNUMBER?) during interpretation and compilation of source code. Floating-point numbers are compiled as literal numbers when in a colon definition (compiling) and placed on the stack when outside a definition (interpreting). Inside a colon definition, a floating point literal number must be preceded by F#.

: foo  ... f# 1.234e0 ...  ;

The more flexible word >FLOAT accepts numbers with or without an exponent, for example 1.234 and 0.1234e1. Both words are documented later in this chapter. See also the section on Gotchas later in this chapter.

Note also that by default, MPE Forths use ',' as the double number indicator - it makes life much easier for Europeans.

The form of floating-point numbers

With the combined-stack code (SoftFP32CX.fth), a floating-point number is placed on the Forth data stack. In the Forth literature, this is referred to as a combined floating point and data stack. With SoftFP32SX.fth, floating-point numbers are held on a separate floating point stack. For 32 bit targets, a floating point number consists of two 32-bit numbers, one for the mantissa and one for the exponent. For 16 bit targets, it consists of a 32-bit double mantissa and a single 16-bit exponent. The mantissa is normalised, and the exponent is on the top of the stack. Note that for 16 bit targets, number conversion is affected by the cross-compiler directives HOST-MATH and TARGET-MATH: HOST-MATH leaves double numbers and floats in 32-bit form, whereas TARGET-MATH leaves them in 16-bit form.
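To illustrate the 32-bit layout, the Python sketch below models a float as a (mantissa, exponent) pair. The rule value = mant * 2^(exp-31), with 2^30 <= |mant| < 2^31, is inferred from the %10^10 and %10^-10 constants in the glossary (1250000000 34 and 1844674407 -33). It is not a published specification. The sketch also assumes that zero is stored as 0 0 and that a negative number carries a negated mantissa.

```python
import math

MANT_SHIFT = 31  # inferred from the glossary constants: value = mant * 2**(exp - 31)

def decode(mant: int, exp: int) -> float:
    """Convert a native (mantissa, exponent) pair to a Python float."""
    return math.ldexp(mant, exp - MANT_SHIFT)

def encode(x: float) -> tuple:
    """Produce a normalised pair with 2**30 <= |mant| < 2**31.

    Assumptions: zero is stored as (0, 0) and a negative number carries a
    negated mantissa. Precision beyond 31 magnitude bits is truncated."""
    if x == 0.0:
        return 0, 0
    frac, exp = math.frexp(abs(x))     # abs(x) == frac * 2**exp, 0.5 <= frac < 1
    mant = int(frac * 2**MANT_SHIFT)   # keep 31 magnitude bits
    return (-mant if x < 0 else mant), exp

# The glossary constants round-trip through this model.
print(encode(1e10))    # (1250000000, 34), i.e. %10^10
print(encode(1e-10))   # (1844674407, -33), i.e. %10^-10
```

Because exp@ and mant@ return these two cells directly, a model like this can help you check constants entered as raw "mantissa exponent" pairs.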

Creating and using variables

To create a variable, use FVARIABLE. FVARIABLE works in the same way as VARIABLE. For example, to create a floating-point variable called VAR1 you code:

  FVARIABLE VAR1

When VAR1 is used, it returns the address of the floating-point number.

Two words are used to access floating-point variables, F@ and F!. These are analogous to @ and !.

Creating constants

To create a floating-point constant, use FCONSTANT, which is analogous to CONSTANT. For example, to generate a floating-point constant called CON1 with a value of 1.234, you enter:

  1.234e0 FCONSTANT CON1

When CON1 is executed, it returns 1.234 on the floating point stack (on the Forth data stack when the combined-stack code is used).

Using the supplied words

The supplied words fall into the groups described below.

The following functions exist only as target words, so you cannot use them in calculations in your source code outside a colon definition.

Calculating sines, cosines and tangents

To calculate sine, cosine and tangent, use FSIN, FCOS and FTAN respectively. Angles are expressed in radians.

Calculating arc sines, cosines and tangents

To calculate arc sine, cosine and tangent, use FASIN, FACOS and FATAN respectively. They return an angle in radians.

Calculating logarithms

Two words are supplied to calculate logarithms, FLOG and FLN. FLOG calculates a logarithm to base 10 (decimal). FLN calculates a logarithm to base e. Both take a floating-point number greater than 0.

Calculating powers

Four power functions are supplied:

  FE^X F10^X FX^N FX^Y

Degrees or radians

The trigonometric functions use radians. To convert between degrees and radians, use RAD>DEG or DEG>RAD. RAD>DEG converts an angle from radians to degrees. DEG>RAD converts an angle from degrees to radians.

Displaying floating-point numbers

Two words are available for displaying floating-point numbers, F. and E.. The word F. takes a floating-point number from the stack and displays it in the form xxxx.xxxxx or x.xxxxxEyy depending on the size of the number. The word E. displays the number in the latter form.

Number formats, ANS and Forth200x

The ANS Forth standard specifies that floating point numbers must be entered in the form 1.234e5. They must contain 'e' or 'E', but the point '.' is optional, so 1e5 is also valid. Double integers are terminated by a point '.'.

This prevents the use of the standard conversion words in international applications, because the '.' and ',' characters are used interchangeably in numbers. For this reason, the cross-compiler's host VFX Forth uses two four-byte arrays, FP-CHAR and DP-CHAR, to hold the floating point and double integer indicator characters. By default, FP-CHAR is initialised to '.' and DP-CHAR is initialised to ',' and '.'. For strict ANS compliance, set them as follows before CROSS-COMPILE is run.


\ ANS standard setting
  char . dp-char !
  char . fp-char !
: ans-floats    \ -- ; for strict ANS compliance
  [char] . dp-char !
  [char] . fp-char !
;
\ MPE defaults
  char , dp-char !
  char . dp-char 1+ c!
  char . fp-char !
: mpe-floats    \ -- ; for existing and most legacy code
  [char] , dp-char !
  [char] . dp-char 1+ c!
  [char] . fp-char !
;
\ Legacy defaults, including ProForth
  char , dp-char !
  char . fp-char !
: legacy-floats \ -- ; for legacy code
  [char] , dp-char !
  [char] . fp-char !
;

You can of course set these arrays to hold any values that suit your application's language and locale. Note that integer conversion is always attempted before floating point conversion. This means that if the FP-CHAR and DP-CHAR arrays contain the same character, floating point numbers must contain 'e' or 'E'. If the two arrays have no characters in common, a number containing an FP-CHAR character will be converted as a floating point number, even if it does not contain 'e' or 'E'.
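The effect of trying integer conversion first can be shown with a small model. The Python function classify below is hypothetical and simplified. It ignores BASE, the integer prefixes and any other punctuation rules in the real converter. It only shows why, with the MPE defaults, 1.5 becomes a double integer while 1.5e0 becomes a float.

```python
import re

DIGITS = set("0123456789")

def classify(token: str, fp_chars: str = ".", dp_chars: str = ",.") -> str:
    """Hypothetical, simplified model of the conversion order.

    Integer conversion is tried first, and only if it fails is the token
    tried as a float. Returns "single", "double", "float" or "none"."""
    body = token[1:] if token[:1] in "+-" else token
    if body and set(body) <= DIGITS:
        return "single"
    # Double integer: digits plus at least one DP-CHAR character.
    if set(body) & DIGITS and set(body) <= DIGITS | set(dp_chars):
        return "double"
    # Float: digits, optional FP-CHAR fraction, optional e/E exponent.
    # At least one of the two indicators must be present.
    fp = "[" + re.escape(fp_chars) + "]"
    m = re.fullmatch(rf"[0-9]+({fp}[0-9]*)?([eE][+-]?[0-9]*)?", body)
    if m and (m.group(1) or m.group(2)):
        return "float"
    return "none"

print(classify("1.5"))             # double: '.' is also a DP-CHAR by default
print(classify("1.5e0"))           # float
print(classify("1.5", ".", ","))   # float: legacy settings, no shared character
```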

Glossary

Separators

Before July 2010, the floating point separator, '.', was fixed. To ease internationalisation, it is now variable.

variable fp-char        \ -- addr
Holds up to four characters to be treated as floating point indicators. Set to '.' for ANS compatibility. Note that this should be accessed as a one to four byte array. The first character is used as the point character for output.

0 equ SepArray? \ -- flag
If the equate is non-zero, fp-char is treated as a four byte array, otherwise as a one byte array. This is a flag for future expansion.

: isSep?        \ char addr -- flag
Return true if char is one of the four bytes at addr. If fewer than four bytes are needed, a zero byte acts as a terminator. Used when SepArray? is true.

: isSep?  c@ =  ;
A compiler macro used when SepArray? is false.
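The array form of isSep? can be modelled as a scan that stops at the first zero byte. The Python sketch below assumes the MPE default DP-CHAR contents are the bytes ',' '.' 0 0, in that order. The byte layout of the real cell depends on the host.

```python
def is_sep(char: int, array: bytes) -> bool:
    """Model of isSep?: true if char is one of the first four bytes of array.

    A zero byte ends the list early, so fewer than four separators can be held."""
    for b in array[:4]:
        if b == 0:
            return False
        if b == char:
            return True
    return False

mpe_dp_char = bytes([ord(","), ord("."), 0, 0])   # assumed layout of the MPE defaults
print(is_sep(ord("."), mpe_dp_char))   # True
print(is_sep(ord("e"), mpe_dp_char))   # False
```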

FP Stack primitives

8 equ FPCELL    \ -- u
Size of a floating point number.

: finit         \ F: i*f -- ; resets FPU and FP stack
Reset the floating point stack. Do not forget to use this in a task before using floating point.

: fdepth        \ -- #f
Floating point equivalent of DEPTH. The result is returned on the Forth data stack.

: fs@           \ -- f ; F: f -- f
Copy the top of the floating point stack to the data stack.

: fs>           \ F: f -- ; -- f
Move the top of the floating point stack to the data stack.

: fs!           \ f1 -- ; F: f2 -- f1
Move a float from the data stack and overwrite the top of the float stack.

: >fs           \ f -- ; F: -- f
Move a float from the data stack to a new position on the float stack.

: exp@          \ F: f -- f ; -- exp
Copy the exponent of the top float to the data stack.

: exp!          \ exp -- ; F: f -- f'
Change/Set the exponent of the top float.

: mant@         \ F: f -- f ; -- mant
Copy the mantissa of the top float to the data stack.

: mant!         \ mant -- ; F: f -- f'
Change/Set the mantissa of the top float.

High Level primitives

The software floating point pack requires several support primitives. High level versions are provided in SFP16HI.FTH and SFP32HI.FTH for 16 and 32 bit targets. Some targets have coded versions in the CPU directory and these will provide much better performance. The support file should be compiled before the common file.

In SoftFP32CX.fth and SoftFP32SX.fth, a set of high-level primitives is compiled if the primitives have not already been supplied.

: S->           \ n1 carry-in-flag --- n2 carry-out-flag
Perform a right shift, applying the carry in to the m.s. bit and returning the carry out as 1 or 0.

: <-S           \ n1 carry-in-flag --- n2 carry-out-flag
Perform a left shift, applying the carry in to the l.s. bit and returning the carry out as 1 or 0.

: d<<1          \ xd -- xd<<1
One bit double left shift.

: d>>1          \ xd -- xd>>1
One bit double right logical shift.
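The carry arguments of S-> and <-S allow multi-cell shifts to be chained, which is one way to build d<<1 and d>>1. The Python model below treats cells as unsigned 32-bit values and a double as (lo, hi), with hi being the top-of-stack cell as in standard Forth. The chaining shown is an illustration, not necessarily how the supplied code does it.

```python
MASK32 = 0xFFFFFFFF

def s_right(n: int, carry_in: int) -> tuple:
    """S-> : shift right one bit; carry_in enters bit 31, bit 0 is the carry out."""
    n &= MASK32
    carry_out = n & 1
    n2 = (n >> 1) | ((1 if carry_in else 0) << 31)
    return n2, carry_out

def s_left(n: int, carry_in: int) -> tuple:
    """<-S : shift left one bit; carry_in enters bit 0, bit 31 is the carry out."""
    n &= MASK32
    carry_out = n >> 31
    n2 = ((n << 1) & MASK32) | (1 if carry_in else 0)
    return n2, carry_out

def d_shl1(lo: int, hi: int) -> tuple:
    """d<<1 built from <-S: the bit leaving the low cell enters the high cell."""
    lo2, carry = s_left(lo, 0)
    hi2, _ = s_left(hi, carry)
    return lo2, hi2

def d_shr1(lo: int, hi: int) -> tuple:
    """d>>1 (logical) built from S->: the bit leaving the high cell enters the low cell."""
    hi2, carry = s_right(hi, 0)
    lo2, _ = s_right(lo, carry)
    return lo2, hi2
```

Any non-zero carry-in is treated as 1, so a Forth true flag (-1) works as well as 1.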

: D>>N          \ d n -- d>>n
N bit double right logical shift.

: rshiftx       \ x u -- x' ; right shift
Used for right shifts that may exceed 31 bits. In the ANS and Forth 2012 standards, this is an ambiguous condition. We need shifts over 31 bits to return 0. On x86 targets, a check is made for shifts over 31 bits.
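The reason for rshiftx is that some CPUs use only the low bits of the shift count. For 32-bit operands, x86 SHR masks the count to 5 bits, so a shift by 32 leaves the value unchanged instead of returning 0. The Python sketch below contrasts that hardware behaviour with the rshiftx and D>>N semantics described here. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def hw_shr32(x: int, u: int) -> int:
    """A shift that uses only the low 5 bits of the count, as x86 SHR does
    for 32-bit operands, so a count of 32 behaves like a count of 0."""
    return (x & MASK32) >> (u & 31)

def rshiftx(x: int, u: int) -> int:
    """rshiftx: logical right shift in which counts of 32 or more return 0."""
    return 0 if u >= 32 else (x & MASK32) >> u

def d_rshift_n(lo: int, hi: int, n: int) -> tuple:
    """D>>N: logical right shift of the 64-bit double (lo, hi) by n bits."""
    d = ((hi & MASK32) << 32) | (lo & MASK32)
    d = 0 if n >= 64 else d >> n
    return d & MASK32, d >> 32

print(hw_shr32(0x80000000, 32))   # 2147483648: the value is unchanged
print(rshiftx(0x80000000, 32))    # 0
```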

Basic stack and memory operators

: F!            \ F: r -- ; addr --
Stores r at addr.

: F@            \ addr -- ; F: -- r
Fetches r from addr.

: F,            \ F: r --
Lays a real number into the dictionary, reserving 8 bytes.

: FDUP          \ F: r -- r r
Floating point equivalent of DUP.

: FOVER         \ F: r1 r2 -- r1 r2 r1
Floating point equivalent of OVER.

: FSWAP         \ F: r1 r2 -- r2 r1
Floating point equivalent of SWAP.

: FPICK         \ F: fu..f0 u -- fu..f0 fu
Floating point equivalent of PICK.

: FROT          \ F: r1 r2 r3 -- r2 r3 r1
Floating point equivalent of ROT.

: F-ROT         \ F: r1 r2 r3 -- r3 r1 r2
Floating point equivalent of -ROT.

: FROLL         \ F: f1 f2 f3 --  f2 f3 f1
Floating point equivalent of ROLL.

: FDROP         \ F: r --
Floating point equivalent of DROP.

: FNIP          \ F: r1 r2 -- r2
Floating point equivalent of NIP.

Floating point defining words

: FVARIABLE     \ "<spaces>name" -- ; Run: -- f-addr
Use in the form: FVARIABLE <name> to create a variable that will hold a floating point number.

: FCONSTANT     \ r "<spaces>name" -- ; Run: -- r
Use in the form: <float> FCONSTANT <name> to create a constant that returns a floating point number.

: FARRAY        \ "<spaces>name" fn-1..f0 n -- ; Run: i -- ri
Create an initialised array of floating point numbers. Use in the form:

  fn-1 .. f1 f0 n FARRAY <name>

to create an array of n floating point numbers. When the array name is executed, the index i is used to return the address of the i'th (zero-based) element in the array. For example:

  4e0 3e0 2e0 1e0 0e0 5 FARRAY TEST

will set up an array of five elements. Note that the rightmost float (0e0) is element 0. Then i TEST will return the i'th element. If you create this array in IDATA, restore CDATA afterwards.

: FBUFF         \ u "name" -- ; i -- addr
Creates a buffer for u floats in the current memory section. The child action is to return the address of the ith element (zero-based).

  10 fbuff foo

Creates a buffer for ten float elements in the current memory section.

  3 foo

Returns the address of element 3 in the buffer.

The default section is CDATA, and we recommend that you leave it that way! To create a ten element array in UDATA space, you can use:


udata
10 fbuff MyFloats
cdata

Type conversions

: NORM          \ n exp -- f
Normalise a single integer and a single exponent to produce a floating point number on the data stack. INTERNAL.

: DNORM         \ d exp -- fn ; normalise a 64 bit double
Normalise a double integer and a single exponent to produce a floating point number on the data stack. INTERNAL.

: (FSIGN)       \ fn -- |fn| flag ; true if negative
Return the absolute value of fn and a flag which is true if fn is negative. Data stack operation.

: FSIGN         \ F: fn -- |fn| ; -- flag ; true if negative
Return the absolute value of fn and a flag which is true if fn is negative. F.P. stack operation.

: S>F           \ n -- ; F: -- fn
Converts a single integer to a float.

: D>F           \ d -- ; F: -- fn
Converts a double integer to a float.

: F>S           \ F: fn -- ; -- n
Converts a float to a single integer. Note that F>S truncates the number towards zero according to the ANS specification. If |fn| is greater than maxint, +/-maxint is returned.

: F>D           \ F: fn -- ; -- d
Converts a float to a double integer. Note that F>D truncates the number towards zero according to the ANS specification. If |fn| is greater than dmaxint, +/-dmaxint is returned.
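The truncation and saturation rules of F>S and F>D can be summarised in a short model. This Python sketch assumes 32-bit cells, so maxint is 2^31-1 and dmaxint is 2^63-1. Following the text, it returns -maxint (not minint) for large negative values.

```python
MAXINT = 2**31 - 1    # assumed 32-bit cell
DMAXINT = 2**63 - 1   # assumed 64-bit double

def _truncate_saturate(x: float, limit: int) -> int:
    if x >= limit:
        return limit
    if x <= -limit:
        return -limit          # the text says -maxint, not minint
    return int(x)              # int() truncates toward zero

def f_to_s(x: float) -> int:
    """Model of F>S."""
    return _truncate_saturate(x, MAXINT)

def f_to_d(x: float) -> int:
    """Model of F>D."""
    return _truncate_saturate(x, DMAXINT)

print(f_to_s(-2.7))   # -2: truncation toward zero, not flooring
```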

: FINT          \ F: f1 -- f2
Chop the number towards zero to produce a floating point representation of an integer.

Arithmetic

: FNEGATE       \ F: r1 -- r2
Floating point negate.

: ?FNEGATE      \ n -- ; F: fn -- fn|-fn
If n is negative, negate fn.

: FABS          \ F: fn -- |fn|
Floating point absolute.

: F*            \ F: r1 r2 -- r3
Floating point multiply.

: F/            \ F: r1 r2 -- r3
Floating point divide.

: F+            \ F: r1 r2 -- r3
Floating point addition.

: F-            \ F: r1 r2 -- r3
Floating point subtraction, r3=r1-r2.

: FSEPARATE     \ F: f1 f2 -- f3 f4
Leave the signed integer quotient f4 and remainder f3 when f1 is divided by f2. The remainder has the same sign as the dividend.
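The sign rules of FSEPARATE match a truncating quotient paired with C-style fmod. The Python model below assumes that "remainder" means f1 - f4*f2. It is only reliable when the quotient is small enough to be represented exactly. Results are returned in stack order, with f4 last (on top).

```python
import math

def fseparate(f1: float, f2: float) -> tuple:
    """Model of FSEPARATE ( F: f1 f2 -- f3 f4 ).

    f4 is the integer quotient truncated toward zero; f3 = f1 - f4*f2 takes
    the sign of the dividend f1. Returned as (f3, f4), top of stack last."""
    f4 = float(math.trunc(f1 / f2))
    f3 = math.fmod(f1, f2)
    return f3, f4

print(fseparate(-7.0, 2.0))   # (-1.0, -3.0)
```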

: FFRAC         \ F: f1 f2 -- f3
Leave the fractional remainder from the division f1/f2. The remainder takes the sign of the dividend.

Relational operators

: F0<           \ F: f1 -- ; -- flag
Floating point 0<.

: F0>           \ F: f1 -- ; -- flag
Floating point 0>.

: F0=           \ F: f1 -- ; -- flag
Floating point 0=.

: F0<>          \ F: f1 -- ; -- flag
Floating point 0<>.

: F=            \ F: f1 f2 -- ; -- flag
Floating point =.

: F<            \ F: r1 r2 -- ; -- flag
Floating point <.

: F>            \ F: f1 f2 -- ; -- flag
Floating point >.

: FMAX          \ F: r1 r2 -- r1|r2
Floating point MAX.

: FMIN          \ F: r1 r2 -- r1|r2
Floating point MIN.

Miscellaneous

: FALIGNED      \ addr -- f-addr
Aligns the address to accept an 8-byte float.

: FALIGN        \ --
Aligns the dictionary to accept an 8-byte float.

: FLOAT+        \ f-addr1 -- f-addr2
Increments addr by 8, the size of a float.

: FLOATS        \ n1 -- n2
Returns n2, the size of n1 floats.

Powers of ten operations

1 s>f 10 s>f f/ fconstant %.1
Floating point 0.1.

1 s>f fconstant %1
Floating point 1.0.

10 s>f fconstant %10
Floating point 10.0.

1250000000 34 fconstant %10^10
Floating point 10^10.

1844674407 -33 fconstant %10^-10
Floating point 10^-10.

F# 1.0E256 FCONSTANT %10^256
Floating point 10^256.

F# 1.0E-1 FCONSTANT %10E-1
Floating point 10^-1.

F# 1.0E-10 FCONSTANT %10E-10
Floating point 10^-10.

F# 1.0E-256 FCONSTANT %10^-256
Floating point 10^-256.

16 FARRAY POWERS-OF-10E1
An array of 16 powers of ten starting at 10^0 in steps of 1.

17 FARRAY POWERS-OF-10E16
An array of 17 powers of ten starting at 10^0 in steps of 16.

16 FARRAY POWERS-OF-10E-1
An array of 16 powers of ten starting at 10^0 in steps of -1.

17 FARRAY POWERS-OF-10E-16
An array of 17 powers of ten starting at 10^0 in steps of -16.

: RAISE_POWER   \ exp(10) -- ; F: f -- f'
Raise the power in preparation for number formatting.

: SINK_FRACTION \ exp(10) -- ; F: f -- f'
Reduce the power in preparation for number formatting.

: *10^X         \  exp(10) -- ; F: f -- f'
Generate float' = float *10^dec_exp. INTERNAL.
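The four power-of-ten tables can reach any decimal exponent up to 16*16+15 = 271 in either direction with two multiplications: one coarse step of 10^16k and one fine step of 10^j. The Python sketch below is a guess at how *10^X might combine them. The table contents follow the glossary, but the lookup strategy and the range check are assumptions, and the real code may handle rounding and out-of-range exponents differently.

```python
POW1 = [10.0 ** k for k in range(16)]            # POWERS-OF-10E1:   10^0 .. 10^15
POW16 = [10.0 ** (16 * k) for k in range(17)]    # POWERS-OF-10E16:  10^0 .. 10^256
POWM1 = [10.0 ** -k for k in range(16)]          # POWERS-OF-10E-1:  10^0 .. 10^-15
POWM16 = [10.0 ** (-16 * k) for k in range(17)]  # POWERS-OF-10E-16: 10^0 .. 10^-256

MAX_DEC_EXP = 16 * 16 + 15   # largest exponent reachable with one lookup per table

def times_10_to(f: float, n: int) -> float:
    """Hypothetical *10^X: f * 10**n using one coarse and one fine lookup."""
    m = abs(n)
    if m > MAX_DEC_EXP:
        raise OverflowError("decimal exponent outside the table range")
    coarse, fine = divmod(m, 16)   # m = 16*coarse + fine
    if n >= 0:
        return f * POW16[coarse] * POW1[fine]
    return f * POWM16[coarse] * POWM1[fine]
```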

Floating point input

Note that number conversion takes place in PAD.

: FLITERAL      \ Comp: F: r -- ; Run: F: -- r
Compiles a float as a literal into the current definition. At execution time, a float is returned. For example, [ %PI F2* ] FLITERAL will compile 2PI as a floating point literal. Note that FLITERAL is immediate.

: CONVERT-EXP   \ c-addr --
If the character at c-addr is 'D' convert it to 'E'. INTERNAL.

: CONVERT-FPCHAR        \ c-addr --
Convert the f.p. char '.' to the double char ',' for conversion. INTERNAL.

: ALL-BLANKS?   \ c-addr len -- flag
Return true if string is all blanks (spaces). INTERNAL.

: FCHECK        \ -- am lm ae le e-flag .-flag
Check the input string at PAD, returning the separated mantissa string (am lm), the exponent string (ae le) and two flags. The e-flag is returned true if the string contained an exponent indicator 'E' and the .-flag is returned true if a '.' was found. INTERNAL.

: doMNUM        \ c-addr u -- d 2 | 0
Convert the mantissa string to a double number and 2. If conversion fails, just return 0. INTERNAL.

: doENUM        \ c-addr u -- n 1 | 0 ; str as above
Convert the exponent string to a single number and 1. If conversion fails, just return 0. INTERNAL.

: FIXEXP     \ dmant exp(10) -- ; F: -- f
Convert a double integer mantissa and a single integer exponent into a floating point number. INTERNAL.
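The internal words above split a number into an integer mantissa and a decimal exponent before FIXEXP builds the float. For example, 1.234e2 becomes the mantissa 1234 with exponent 2-3 = -1. The Python function split_number is a hypothetical model of that split. It works on a Python string rather than PAD, and it does not reproduce the real words' error handling.

```python
def split_number(text: str, fp_char: str = ".") -> tuple:
    """Hypothetical model of the FCHECK / doMNUM / doENUM steps.

    Returns (dmant, exp10) with value == dmant * 10**exp10, the form that
    FIXEXP turns into a float."""
    text = text.upper().replace("D", "E")                 # CONVERT-EXP: 'D' becomes 'E'
    mant_str, _, exp_str = text.partition("E")            # mantissa and exponent strings
    exp10 = int(exp_str) if exp_str.lstrip("+-") else 0   # no exponent digits means 0
    int_part, _, frac_part = mant_str.partition(fp_char)
    sign = -1 if int_part.startswith("-") else 1
    digits = int_part.lstrip("+-") + frac_part
    dmant = sign * int(digits or "0")
    return dmant, exp10 - len(frac_part)   # each fraction digit lowers the exponent by one

print(split_number("1.234e2"))   # (1234, -1)
```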

: FNUMBER?      \ addr --  0/.../mant exp 2
Behaves like the integer version of NUMBER?, except that if the number is in F.P. format and BASE is decimal, a floating point conversion is attempted. If conversion is successful, the floating point number is left on the float stack and the result code is 2. This word only accepts text with an 'E' as a floating point indicator, e.g. 1.2345e0. If BASE is not decimal, all numbers are treated as integers. The integer prefixes '#', '$', '0x' etc. are recognised and cause integer conversion to be used.

: >FLOAT        \ c-addr u -- true|false ; F: -- [f]
Try to convert the string at c-addr/u to a floating point number. If conversion is successful, flag is returned true, and a floating number is returned on the float stack, otherwise just flag=0 is returned. This word accepts several forms, e.g. 1.2345e0, 1.2345, 12345 and converts them to a float. Note that double numbers (containing a ',') cannot be converted. Number conversion is decimal only, regardless of the current BASE.

: (F#)          \ addr -- 2|0 ; F: -- [f]
The primitive for F# and F#IN below.

: F#IN          \ -- 2|0 ; F: -- [f]
Attempts to convert a token from the input stream to a floating-point number. Numbers in integer format will be converted to floating-point. An indicator (0 or 2/3) is returned in the same way as an indicator is returned by FNUMBER?.

: F#            \ F: -- [f] ; or compiles it [ state smart ]
If interpreting, takes text from the input stream and, if possible, converts it to a f.p. number on the stack. Numbers in integer format will be converted to floating-point. If compiling, the converted number is compiled.

: REALS         \ -- ; allow f.p input
Switch NUMBER? to permit floating point input using FNUMBER?. This action can be reversed by INTEGERS. Both REALS and INTEGERS are in the FORTH vocabulary.

: INTEGERS      \ -- ; no f.p input
Switch NUMBER? to restore integer only input.

Floating point output

variable places  8 places !     \ -- addr
Number of digits output after the decimal point.

: ROUND         \ F: f1 -- f2
Rounds least significant eight bits to 0 if higher 2 bits are all 0s or all 1s.

: ?10PWR        \ F: f -- f ; -- exp[10]
Generate the power of ten corresponding to the float's power of two.

: SIGFIGS       \ F: f -- f' ; n -- dec_exponent
Scale f and generate a decimal exponent corresponding to n significant digits.

: op-prepare    \ F: fn -- ; -- d exp(10) sign
From fn, generate a double number corresponding to 8 significant digits, a decimal exponent and a sign indicator (nz=negative). INTERNAL.

: .EXP          \ exp --
Display the exponent. INTERNAL.

: N#            \ d n -- d'
Convert n digits. INTERNAL.

: .FPsign       \ flag --
If flag is non-zero, generate a '-' otherwise a space.

: .FPsep        \ --
Issue the FP separator, usually '.'.

: E.            \ F: f --
Print the f.p. number on the stack in exponential form, x.xxxxxEyy.

: REPRESENT     \ F: r -- ; c-addr u -- n flag1 flag2
Assume that the floating point number is of the form +/-0.xxxxxEyy. Place the significand xxxxx at c-addr with a maximum of u digits. Return n, the signed integer version of yy. Return flag1 true if r is negative, and return flag2 true if the results are valid. In this implementation all errors are handled by exceptions, so flag2 is always true.
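The relationship between r, the digit string and n can be shown with Python's exponent formatting. The sketch below follows the 0.xxxxxEyy convention above, so 1234.5678 with u=5 gives the digits 12346 and n=4. Python rounds on the exact binary value using round-half-even, which may differ from the target's rounding. Zero and non-finite values are outside this sketch.

```python
import math

def represent(r: float, u: int) -> tuple:
    """Model of REPRESENT for finite, non-zero r and u >= 1.

    Returns (digits, n, negative, valid) such that |r| is approximately
    0.<digits> * 10**n, with digits rounded to u significant figures."""
    if r == 0 or not math.isfinite(r):
        raise ValueError("this sketch handles finite, non-zero values only")
    mantissa, exp = f"{abs(r):.{u - 1}e}".split("e")   # e.g. "1.2346", "+03"
    digits = mantissa.replace(".", "")                  # e.g. "12346"
    return digits, int(exp) + 1, r < 0, True            # valid flag always true here

print(represent(1234.5678, 5))   # ('12346', 4, False, True)
```

Note that rounding can carry into a new digit: 9.99 with u=2 gives the digits 10 and n=2.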

: F.            \ F: f --
Print the f.p. number in free format, xxxx.yyyy, if possible. Otherwise display using the x.xxxxEyy format.

Rounding

f# 1.0 fconstant %ONE
Floating point 1.0.

: FLOOR         \ F: r1 -- r2
Round towards negative infinity.

: FROUND        \ F: r1 -- r2
Round to the nearest integer; halfway cases round to the even neighbour.
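FINT, FLOOR and FROUND differ mainly for negative numbers and halfway cases. The Python model below compares them on the same inputs. It uses Python's built-in round(), which already rounds halfway cases to even.

```python
import math

def fint(r: float) -> float:
    """FINT: chop toward zero."""
    return float(math.trunc(r))

def ffloor(r: float) -> float:
    """FLOOR: round toward negative infinity."""
    return float(math.floor(r))

def fround(r: float) -> float:
    """FROUND: round to nearest, halfway cases to the even neighbour."""
    return float(round(r))

for r in (-2.5, 2.5, 3.5):
    print(r, fint(r), ffloor(r), fround(r))
```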

Trigonometric functions

N.B. All angles are in radians.

: DEG>RAD       \ F: n1 -- n2
Convert degrees to radians.

: RAD>DEG       \ F: n1 -- n2
Convert radians to degrees.

: FSQR          \ F: f1 -- f2 ; FSQR by Heron's formula
f2=sqrt(f1) by Heron's method.
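Heron's method repeatedly replaces a guess g by the average of g and x/g. The Python sketch below shows the iteration. Its first guess halves the binary exponent, a natural trick for a format that stores the exponent separately. That starting point and the stopping test are assumptions, not taken from the FSQR source.

```python
import math

def heron_sqrt(x: float, max_iter: int = 60) -> float:
    """Square root by Heron's method: repeat g = (g + x/g) / 2."""
    if x < 0:
        raise ValueError("negative argument")
    if x == 0:
        return 0.0
    _, e = math.frexp(x)
    g = math.ldexp(1.0, e // 2)          # halve the binary exponent for a close first guess
    for _ in range(max_iter):
        nxt = 0.5 * (g + x / g)
        if abs(nxt - g) <= 1e-15 * nxt:  # stop once successive guesses agree
            return nxt
        g = nxt
    return g
```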

: FSIN          \ F: f1 -- f2
f2=sin(f1).

: FCOS          \ F: f1 -- f2
f2=cos(f1).

: FTAN          \ F: f1 -- f2
f2=tan(f1).

: FASIN         \ F: f1 -- f2
f2=arcsin(f1).

: FACOS         \ F: f1 -- f2
f2=arccos(f1).

: FATAN         \ F: f1 -- f2
f2=arctan(f1).

Logarithmic and Power functions

: FLN           \ F: f1 -- f2
Take the logarithm of f1 to base e and return the result.

: FLOG          \ F: f1 -- f2
Take the logarithm of f1 to base 10 and return the result.

: FE^X          \ F: f1 -- f2
f2=e^f1.

: F10^X         \ F: f1 -- f2
f2=10^f1

: FX^N          \ x-real n-integer -- fx^n
fx^n=x^n where x is a float and n is an integer.

: FX^Y          \ F: fx fy -- fx^fy
fx^fy = x^y, where x and y are both floats.
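FX^N and FX^Y need different techniques, because an integer power can be built from multiplications alone. The Python sketch below uses two common approaches: square-and-multiply for x^n, and e^(y*ln x) for x^y. They are illustrations only. This chapter does not say how the supplied words are implemented, and this fx_pow_y accepts only x > 0.

```python
import math

def fx_pow_n(x: float, n: int) -> float:
    """x**n for integer n by binary exponentiation (square-and-multiply)."""
    if n < 0:
        return 1.0 / fx_pow_n(x, -n)
    result, base = 1.0, x
    while n:
        if n & 1:
            result *= base
        base *= base
        n >>= 1
    return result

def fx_pow_y(x: float, y: float) -> float:
    """x**y for float y via e**(y * ln x); defined here for x > 0 only."""
    if x <= 0:
        raise ValueError("x must be positive")
    return math.exp(y * math.log(x))
```

The integer form also works for negative x, for example (-3)^3 = -27. The logarithmic form does not.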

IEEE format conversion

: FP>IEEE       \ F: fp -- ; -- ieee32
Convert native FP value to IEEE 32 bit format.

: IEEE>FP       \ ieee32 -- ; F: -- fp
Convert IEEE 32 bit float to native format.
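If the native layout is value = mant * 2^(exp-31) with 2^30 <= |mant| < 2^31 (an inference from the %10^10 and %10^-10 constants, not a published specification), the conversion to IEEE single precision reduces to bit manipulation. The hidden bit is removed, 7 of the 30 fraction bits are dropped, and the exponent is re-biased by 126. The Python sketch below truncates the extra mantissa bits and flushes denormals to zero. The real FP>IEEE may round or handle edge cases differently.

```python
import struct

HIDDEN = 1 << 30   # assumed normalisation bit of the native 31-bit magnitude

def fp_to_ieee(mant: int, exp: int) -> int:
    """Native (mant, exp) to IEEE 754 single-precision bits.

    value = (|mant| / 2**30) * 2**(exp - 1), so the biased exponent is exp + 126."""
    if mant == 0:
        return 0
    sign = 0x80000000 if mant < 0 else 0
    m = abs(mant)
    if not HIDDEN <= m < 2 * HIDDEN:
        raise ValueError("mantissa is not normalised")
    biased = exp + 126
    if not 1 <= biased <= 254:
        raise OverflowError("outside the IEEE single normal range")
    fraction = (m - HIDDEN) >> 7     # drop the hidden bit, keep 23 bits (truncates)
    return sign | (biased << 23) | fraction

def ieee_to_fp(bits: int) -> tuple:
    """IEEE 754 single-precision bits to native (mant, exp)."""
    biased = (bits >> 23) & 0xFF
    if biased == 0:
        return 0, 0                  # zero; denormals are flushed to zero here
    if biased == 255:
        raise ValueError("Inf/NaN has no native equivalent in this sketch")
    m = HIDDEN | ((bits & 0x7FFFFF) << 7)
    return (-m if bits >> 31 else m), biased - 126

def bits_to_float(bits: int) -> float:
    """Reinterpret 32 bits as an IEEE single, for checking results."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(hex(fp_to_ieee(1250000000, 34)))   # 0x501502f9, the IEEE single for 1e10
```

The native mantissa carries 31 magnitude bits against 24 in an IEEE single, so FP>IEEE generally loses precision, while IEEE>FP is exact for normal numbers.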

Gotchas

The ANS and Forth200x specifications define the format of floating point numbers during text interpretation as:


Convertible string := <significand><exponent>

<significand> := [<sign>]<digits>[.<digits0>]
<exponent>    := E[<sign>]<digits0>
<sign>        := { + | - }
<digits>      := <digit><digits0>
<digits0>     := <digit>*
<digit>       := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }

This format is handled by the word FNUMBER?. The word >FLOAT accepts a more relaxed format.


Convertible string := <significand>[<exponent>]

<significand> := [<sign>]{<digits>[.<digits0>] | .<digits> }
<exponent>    := <marker><digits0>
<marker>      := {<e-form> | <sign-form>}
<e-form>      := <e-char>[<sign-form>]
<sign-form>   := { + | - }
<e-char>      := { D | d | E | e }

This restriction makes it difficult to use the text interpreter for number input during program execution, because it requires floating point numbers to contain an exponent indicator such as 'E'. Most people do not write numbers that way. A quick kluge to fix this is to change FNUMBER? as below.


Replace:
  fcheck drop if                       \ valid f.p. number?
with:
  fcheck or if                         \ valid f.p. number?

Note that this change will cause problems if the number base is not DECIMAL.
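The two grammars at the start of this section can be checked mechanically. The Python regular expressions below transcribe them with '.' as the point character. They ignore FP-CHAR, DP-CHAR and BASE, and they only test syntax, not conversion. Following the "Number formats" section, 'e' is accepted as well as 'E' in the FNUMBER? form.

```python
import re

# FNUMBER? (ANS/Forth200x text interpreter): the exponent is mandatory.
ANS_FLOAT = re.compile(r"[+-]?[0-9]+(?:\.[0-9]*)?[Ee][+-]?[0-9]*")

# >FLOAT: the exponent is optional; its marker may be D, d, E, e or a bare sign.
TO_FLOAT = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"   # <significand>
    r"(?:(?:[DdEe][+-]?|[+-])[0-9]*)?"         # [<exponent>]
)

def accepted(pattern, text: str) -> bool:
    """True if the whole of text matches the grammar."""
    return pattern.fullmatch(text) is not None

print(accepted(ANS_FLOAT, "1.234"), accepted(TO_FLOAT, "1.234"))   # False True
```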