VFP Floating Point - single precision

Introduction

The Forth data stack and the floating point stack are separate. As with the data and return stacks, the floating point stack grows down. The floating point data storage format is IEEE 32 bit (single precision) format. The source code is in the file Cortex/VFP32SX.fth. As of January 2017, the float stack pointer is in R8 and the top of the float stack is in S8 for performance. The previous non-R8 version is in VFP32SX.old.fth. The files should also work with minimal change on ARM32 CPUs but have not been tested for these.

A separate IEEE 64 bit (double precision) floating point package is provided in Cortex/VFP64SX.fth. This is ideal for use with Cortex-M7 CPUs as well as with hosted systems such as for ARM Linux.

Note that single-precision floating point has only 23 stored bits in the mantissa (about seven significant decimal digits) and 8 bits in the exponent, and thus has restricted precision and dynamic range.

Compiling the code

The source code for the ARM/Cortex version of the code is in the file Cortex/VFP32SX.fth. In order to provide full initialisation in other files, the following equates must be set in the control file before any code is compiled:


  0 equ softfp?     \ true for software floating point
  1 equ hardfp?     \ true for hardware floating point
  1 equ fpstack?    \ true for a separate floating point stack,
                    \ in which case FP-SIZE must be non-zero
  $0100 equ FP-SIZE \ size of FP stack in bytes. Must not be 0.

The separate floating point stack is used when fpstack? is true and FP-SIZE is defined and is non-zero. Add the line below to compile the code.


  include %CpuDir%/VFP32SX.fth

The code must be compiled with xArmCortexHfpDev, which has support for compiling IEEE F.P. numbers.

Entering floating-point numbers

Floating point number entry is enabled by REALS and disabled by INTEGERS.

Floating-point numbers of the form 0.1234e1 are required (see FNUMBER?) during interpretation and compilation of source code. Floating-point numbers are compiled as literal numbers when in a colon definition (compiling) and placed on the stack when outside a definition (interpreting). Inside a colon definition, a floating point literal number must be preceded by F#.

: foo  ... f# 1.234e0 ...  ;

The more flexible word >FLOAT accepts numbers in two forms, 1.234 and 0.1234e1. Both words are documented later in this chapter. See also the section on Gotchas later in this chapter.

Creating and using variables

To create a variable, use FVARIABLE. FVARIABLE works in the same way as VARIABLE. For example, to create a floating-point variable called VAR1 you code:

  FVARIABLE VAR1

When VAR1 is used, it returns the address of the floating-point number.

Two words are used to access floating-point variables, F@ and F!. These are analogous to @ and !.
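As a minimal sketch of the access words, the definitions below store a literal into VAR1 and read it back. The names set-var1 and var1-doubled are hypothetical and are not part of the package. The code is placed in colon definitions, so the literal is preceded by F#.

```forth
fvariable var1          \ storage for one float, as in the example above

: set-var1      \ -- ; store 2.5 in VAR1
  f# 2.5e0 var1 f!
;

: var1-doubled  \ -- ; F: -- r ; fetch VAR1 and return twice its value
  var1 f@ fdup f+
;
```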

Creating constants

To create a floating-point constant, use FCONSTANT, which is analogous to CONSTANT. For example, to generate a floating-point constant called FCON1 with a value of 1.234, you enter:

  F# 1.234e0 FCONSTANT FCON1

When FCON1 is executed, it returns 1.234 on the float stack.

Using the supplied words

The supplied words split into several groups:

The following functions only exist as target words so you cannot use them in calculations in your source code when outside a colon definition.

Calculating sines, cosines and tangents

To calculate sine, cosine and tangent, use FSIN, FCOS and FTAN respectively. Angles are expressed in radians.

Calculating arc sines, cosines and tangents

To calculate arc sine, cosine and tangent, use FASIN, FACOS and FATAN respectively. They return an angle in radians.

Calculating logarithms

Two words are supplied to calculate logarithms, FLOG and FLN. FLOG calculates a logarithm to base 10 (decimal). FLN calculates a logarithm to base e. Both take a floating-point number in the range from 0 to Einf.

Calculating powers

Three power functions are supplied:

  FEXP F10^X FX^Y

Degrees or radians

The angular measurement used in the trigonometric functions is radians. To convert between degrees and radians use RAD>DEG or DEG>RAD. RAD>DEG converts an angle from radians to degrees. DEG>RAD converts an angle from degrees to radians.
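The conversion words can be wrapped around the trigonometric words when an application works in degrees. The following is a sketch only; fsin-deg and fatan-deg are hypothetical helper names, not words supplied by the package.

```forth
: fsin-deg      \ F: degrees -- sin ; hypothetical helper
  deg>rad fsin          \ convert to radians, then take the sine
;

: fatan-deg     \ F: x -- degrees ; hypothetical helper
  fatan rad>deg         \ arctangent in radians, converted to degrees
;
```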

Displaying floating-point numbers

Two words are available for displaying floating-point numbers, F. and E.. The word F. takes a floating-point number from the stack and displays it in the form xxxx.xxxxx or x.xxxxxEyy depending on the size of the number. The word E. displays the number in the latter form.

Number formats, ANS and Forth-2012

The ANS Forth standard specifies that floating point numbers must be entered in the form 1.234e5 and must contain a point '.' and 'e' or 'E'. In order to distinguish between double and floating point numbers, a floating point number must contain 'e' and an exponent.

FP Stack primitives

The floating point stack pointer is register R8. The top float is cached in register S8.

4 equ FPCELL    \ -- u
Size of a floating point number in bytes.

FPCELL constant FPCELL  \ -- u
Size of a floating point number in bytes.

FPCELL setFloatSize     \ --
Tell the cross compiler the size in memory of a floating point number.

FPCELL setFloatAlignment        \ --
Tell the cross compiler the alignment in memory of a floating point number.

: (doFloatAlignment)    \ --
Host word that performs float alignment.

: finit         \ F: i*f -- ; resets FPU and FP stack
Reset the floating point stack. Do not forget to use this in a task before using floating point.

: fdepth        \ -- #f
Floating point equivalent of DEPTH. The result is returned on the Forth data stack.

code CLZ        \ x -- u
Return the number of leading zeros in x.

: DCLZ          \ dx -- u
Return the number of leading zeros in the double dx.

code >fs        \ f -- ; F: -- f
Copy a float from the data stack to the floating point stack.

code fs>        \ F: f -- ; -- f
Copy a float from the float stack to the data stack.

code fps@       \ -- fps
Read the floating point status/control register.

code fps!       \ fps --
Set the floating point status/control register.

code exp@       \ F: f -- f ; -- exp(2)
Copy the exponent of the top float to the data stack. The IEEE exponent offset is removed.

code exp!               \ exp(2) -- ; F: f -- f'
Change/Set the exponent of the top float. The IEEE exponent offset is added.

code F!         \ F: r -- ; addr --
Stores r at addr.

code F@         \ addr -- ; F: -- r
Fetches r from addr.

synonym SF! F!          \ F: r -- ; addr --
Stores r at addr in IEEE 32 bit format.

synonym SF@ F@          \ addr -- ; F: -- r
Fetches r from addr, which contains a float in IEEE 32 bit format.

: F,            \ F: r --
Lays a real number into the dictionary, reserving FPCELL bytes.

Synonym SF, F,          \ F: r --
Lays a real number into the dictionary as an IEEE 32 bit number.

code FDUP       \ F: r -- r r
Floating point equivalent of DUP.

code FOVER      \ F: r1 r2 -- r1 r2 r1
Floating point equivalent of OVER.

code FSWAP      \ F: r1 r2 -- r2 r1
Floating point equivalent of SWAP.

code FPICK      \ u -- ; F: fu..f0 -- fu..f0 fu
Floating point equivalent of PICK.

code FROT       \ F: r1 r2 r3 -- r2 r3 r1
Floating point equivalent of ROT.

code F-ROT              \ F: r1 r2 r3 -- r3 r1 r2
Floating point equivalent of -ROT.

code FDROP      \ F: r --
Floating point equivalent of DROP.

code FNIP               \ F: r1 r2 -- r2
Floating point equivalent of NIP.

code f>r        \ F: f -- ; R: -- f
Put float onto return stack.

code fr>        \ R: f -- ; F: -- f
Pull float from the return stack.

code flit       \ F: -- f ; inline literal
Run-time routine for a floating point literal. Cortex version.

code flit       \ F: -- f ; inline literal
Run-time routine for a floating point literal. ARM32 version.

: DLSHIFT       \ d u -- d<<u
Double number left shift, with u in the range 1..64.

: DRSHIFT       \ d u -- d>>u
Double number right shift, with u in the range 1..64.

Floating point defining words

: FVARIABLE     \ "<spaces>name" -- ; Run: -- addr
Use in the form: FVARIABLE <name> to create a variable that will hold a floating point number.

: FCONSTANT     \ F: r -- ; "<spaces>name" -- ; Run: -- r
Use in the form: <float> FCONSTANT <name> to create a constant that returns a floating point number.

: FARRAY        \ "<spaces>name" fn-1..f0 n -- ; Run: i -- ; F: -- ri
Create an initialised array of floating point numbers. Use in the form:

  fn-1 .. f1 f0 n FARRAY <name>

to create an array of n floating point numbers. When the array name is executed, the index i is used to return the i'th (zero-based) element in the array on the float stack. For example:

  4e0 3e0 2e0 1e0 0e0 5 FARRAY TEST

will set up an array of five elements. Note that the rightmost float (0e0) is element 0. Then i TEST will return the ith element.
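A sketch of reading the array, assuming TEST has been defined as above. The word .test is hypothetical and simply displays each element in index order using F..

```forth
: .test         \ -- ; display elements 0..4 of TEST
  5 0 do
    i test f.           \ fetch element i to the float stack and print it
  loop
;
```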

: FBUFF         \ u "name" -- ; i -- addr
Creates a buffer for u floats in the current memory section. The child action is to return the address of the ith element (zero-based).

  10 fbuff foo

Creates a buffer for ten float elements.

  3 foo

Returns the address of element 3 in the buffer.
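Because the child of FBUFF returns an address, elements are read and written with F@ and F!. The sketch below assumes the ten-element buffer foo defined above; fill-foo and sum-foo are hypothetical names used only for illustration.

```forth
: fill-foo      \ -- ; store 0.0, 1.0 .. 9.0 in elements 0..9
  10 0 do
    i s>f  i foo f!     \ convert the index to a float and store it
  loop
;

: sum-foo       \ F: -- sum ; add up all ten elements
  0 s>f                 \ running total starts at 0.0
  10 0 do
    i foo f@ f+
  loop
;
```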

Type conversions

code FSIGN      \ F: fn -- |fn| ; -- flag ; true if negative
Return the absolute value of fn and a flag which is true if fn is negative. F.P. stack operation.

code S>F        \ n -- ; F: -- fn
Converts a single signed integer to a float.

code F>S        \ F: fn -- ; -- n
Converts a float to a single integer. Note that F>S truncates the number towards zero according to the ANS specification. If |fn| is greater than maxint, +/-maxint is returned.

: D>F           \ d -- ; F: -- fn
Converts a double integer to a float.

: f>d           \ f -- ; -- dint(f)
Converts a float to a double integer. Note that F>D truncates the number towards zero according to the ANS specification.

: FINT          \ F: f1 -- f2
Chop the number towards zero to produce a floating point representation of an integer.
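The conversion words can be combined as sketched below. Both names are hypothetical. Since F>S truncates towards zero, f>s-round applies FROUND (documented later in this chapter) first when the nearest integer is wanted.

```forth
: percent       \ n1 n2 -- ; F: -- r ; r = 100 * n1 / n2 as a float
  swap s>f s>f          \ F: n1 n2
  f/ f# 100.0e0 f*
;

: f>s-round     \ F: r -- ; -- n ; nearest integer rather than truncation
  fround f>s
;
```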

Arithmetic

code FNEGATE    \ F: r1 -- r2
Floating point negate.

: ?FNEGATE      \ n -- ; F: fn -- fn|-fn
If n is negative, negate fn.

code FABS       \ F: fn -- |fn|
Floating point absolute.

code F+         \ F: r1 r2 -- r3
Floating point addition.

code F-         \ F: r1 r2 -- r3
Floating point subtraction, r3=r1-r2

code F*         \ F: r1 r2 -- r3
Floating point multiply.

code F/         \ F: r1 r2 -- r3
Floating point divide.

code fsqrt      \ F: f1 -- f2
f2=sqrt(f1).

: FSEPARATE     \ F: f1 f2 -- f3 f4
Leave the signed integer quotient f4 and remainder f3 when f1 is divided by f2. The remainder has the same sign as the dividend.

: FFRAC         \ F: f1 f2 -- f3
Leave the fractional remainder from the division f1/f2. The remainder takes the sign of the dividend.
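As an illustration of the arithmetic and stack words working together, the hypothetical word fhypot below computes the length of a hypotenuse. It is a naive sketch with no protection against overflow of the intermediate squares.

```forth
: fhypot        \ F: a b -- c ; c = sqrt(a*a + b*b)
  fdup f*               \ F: a b*b
  fswap fdup f*         \ F: b*b a*a
  f+ fsqrt
;
```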

Relational operators

code F0<        \ F: f1 -- ; -- flag
Floating point 0<.

code F0>        \ F: f1 -- ; -- flag
Floating point 0>.

code F0=                \ F: f1 -- ; -- flag
Floating point 0=.

code F0<>       \ F: f1 -- ; -- flag
Floating point 0<>.

: F=            \ F: f1 f2 -- ; -- flag
Floating point =.

: F<            \ F: r1 r2 -- ; -- flag
Floating point <.

: F>            \ F: f1 f2 -- ; -- flag
Floating point >.

: FMAX          \ F: r1 r2 -- r1|r2
Floating point MAX.

: FMIN          \ F: r1 r2 -- r1|r2
Floating point MIN.
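Two common uses of the relational words are sketched below. The names fclamp and fnear? are hypothetical. Comparing within a tolerance is usually more robust than F= for values that result from calculations.

```forth
: fclamp        \ F: r lo hi -- r' ; limit r to the range lo..hi
  frot                  \ F: lo hi r
  fmin fmax             \ F: max(lo, min(hi, r))
;

: fnear?        \ F: r1 r2 tol -- ; -- flag ; true if |r1-r2| < tol
  f-rot                 \ F: tol r1 r2
  f- fabs               \ F: tol |r1-r2|
  f>
;
```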

Miscellaneous

: FALIGNED      \ addr -- f-addr
Aligns the address to accept a 4-byte float.

: FALIGN        \ --
Aligns the dictionary to accept a 4-byte float.

Synonym SFALIGNED ALIGNED       \ addr -- f-addr
Aligns the address to accept a 4-byte float.

Synonym SFALIGN ALIGN           \ --
Aligns the dictionary to accept a 4-byte float.

: FLOAT+        \ f-addr1 -- f-addr2
Increments addr by 4, the size of a float.

: FLOATS        \ n1 -- n2
Returns n2, the size of n1 floats.

Synonym SFLOAT+ 4+      \ f-addr1 -- f-addr2
Increments addr by 4, the size of an S-float.

Synonym SFLOATS CELLS   \ n1 -- n2
Returns n2, the size of n1 S-floats.
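FLOAT+ is convenient for stepping through floats stored contiguously in memory. The hypothetical word fsum below is a sketch that assumes f-addr is suitably aligned and points at n consecutive floats.

```forth
: fsum          \ f-addr n -- ; F: -- sum
  0 s>f                 \ running total starts at 0.0
  0 ?do
    dup f@ f+           \ add the float at the current address
    float+              \ step to the next float
  loop
  drop
;
```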

Powers of ten operations

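Floating point IEEE numbers have the following approximate ranges (normalised magnitudes):

  32 bit (single precision): about 1.2e-38 to 3.4e38
  64 bit (double precision): about 2.2e-308 to 1.8e308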

As a result, the input code is different for 32 bit and 64 bit floats.

f# 1.0e-32 fconstant f%10^-32
Floating point 1.0e-32.

f# 1.0e-16 fconstant f%10^-16
Floating point 1.0e-16.

f# 0.1e0 fconstant f%.1
Floating point 0.1.

f# 1.0e0 fconstant f%1
Floating point 1.0.

f# 10.0e0 fconstant f%10
Floating point 10.0.

f# 1.0e16 fconstant f%10^16
Floating point 1.0e16.

f# 1.0e32 fconstant f%10^32
Floating point 1.0e32.

16 FARRAY POWERS-OF-10E1
An array of 16 powers of ten starting at 10^0 in steps of 1.

16 FARRAY POWERS-OF-10E-1
An array of 16 powers of ten starting at 10^0 in steps of -1.

: RAISE_POWER   \ exp(10) -- ; F: f -- f'
Raise the power in preparation for number formatting.

: SINK_FRACTION \ exp(10) -- ; F: f -- f'
Reduce the power in preparation for number formatting.

: *10^X         \  exp(10) -- ; F: f -- f'
Generate float' = float *10^dec_exp.

Floating point input

Note that number conversion takes place in PAD.

: CONVERT-EXP   \ c-addr --
If the character at c-addr is 'D' convert it to 'E'.

: CONVERT-FPCHAR        \ c-addr --
Convert the f.p. char '.' to the double char ',' for conversion.

: ALL-BLANKS?   \ c-addr len -- flag
Return true if string is all blanks (spaces).

: FCHECK        \ -- am lm ae le e-flag .-flag
Check the input string at PAD, returning the separated mantissa and exponent flags. The e-flag is returned true if the string contained an exponent indicator 'E' and the .-flag is returned true if a '.' was found.

: doMNUM        \ c-addr u -- d 2 | 0
Convert the mantissa string to a double number and 2. If conversion fails, just return 0.

: doENUM        \ c-addr u -- n 1 | 0 ; str as above
Convert the exponent string to a single number and 1. If conversion fails, just return 0.

: FIXEXP     \ dmant exp(10) -- ; F: -- f
Convert a double integer mantissa and a single integer exponent into a floating point number.

: FNUMBER?      \ addr -- 0 | n 1 | d 2 | -1 ; F: -- [f]
Behaves like the integer version of NUMBER? except that if the number is in F.P. format and BASE is decimal, a floating point conversion is attempted. If conversion is successful, the floating point number is left on the float stack and the result code is 2. This word only accepts text with an 'E' as a floating point indicator, e.g. 1.2345e0. If BASE is not decimal all numbers are treated as integers. The integer prefixes '#','$','0x' etc. are recognised and cause integer conversion to be used.

: >FLOAT        \ c-addr u -- true|false ; F: -- [f]
Try to convert the string at c-addr/u to a floating point number. If conversion is successful, flag is returned true, and a floating number is returned on the float stack, otherwise just flag=0 is returned. This word accepts several forms, e.g. 1.2345e0, 1.2345, 12345 and converts them to a float. Note that double numbers (containing a ',') cannot be converted. Number conversion is decimal only, regardless of the current BASE.
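A sketch of using >FLOAT on text received at run time, for example from a serial line. The word parse-float is hypothetical; it substitutes 0.0 when conversion fails so that the caller always finds one float on the float stack and can test the flag.

```forth
: parse-float   \ c-addr u -- flag ; F: -- r
  >float dup 0= if      \ conversion failed, nothing on the float stack
    0 s>f               \ supply 0.0 as a placeholder result
  then
;
```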

: FLITERAL      \ Comp: F: r -- ; Run: F: -- r
Compiles a float as a literal into the current definition. At execution time, a float is returned. For example, [ f%PI 2e0 F* ] FLITERAL will compile 2PI as a floating point literal. Note that FLITERAL is immediate.

: (F#)          \ addr -- -1|0 ; F: -- [f]
The primitive for F# below.

: F#            \ F: -- [f] ; or compiles it (state smart)
If interpreting, takes text from the input stream and, if possible converts it to a f.p. number on the stack. Numbers in integer format will be converted to floating-point. If compiling, the converted number is compiled.

Floating point output

Note that, because global conversion buffers are used, the floating point output routines are not thread-safe.

8 value precision       \ -- u
Number of significant digits output.

: set-precision         \ u --
Set the number of significant digits used for output.
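Since PRECISION is a global setting, code that changes it temporarily may want to restore it afterwards. The hypothetical word f.3 below is a sketch of that pattern. It does not restore the setting if F. raises an exception, and like the other output words it is not thread-safe.

```forth
: f.3           \ F: r -- ; display r with three significant digits
  precision >r          \ save the current setting
  3 set-precision
  f.
  r> set-precision      \ restore the saved setting
;
```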

: exp(10)       \ F: f -- f ; -- exp[10]
Generate the power of ten corresponding to the float's power of two.

64 buffer: fopbuff      \ -- addr
Buffer in which output string is built.

32 buffer: frepbuff     \ -- addr
Buffer for use as the output of REPRESENT.

: roundfp       \ F: +f -- +f'
Add 0.5e(exp-precision-1).

: REPRESENT     \ F: r -- ; c-addr u -- n flag1 flag2
Assume that the floating number is of the form +/-0.xxxxEyy. Place the significand xxxxx at c-addr with a maximum of u digits. Return n the signed integer version of yy. Return flag1 true if f is negative, and return flag2 true if the results are valid. In this implementation all errors are handled by exceptions, and so flag2 is always true.

: append        \ c-addr u $dest --
Add the string described by C-ADDR U to the counted string at $DEST. The strings must not overlap.

: (.sign)       \ flag $out --
Add '-' or nothing to the output string.

: (.mant)       \ binp $out n --
Add the mantissa string at binp, produced by REPRESENT, to a counted string at $out with n digits before the decimal point.

: (.exp)        \ exp(10) $out --
Add the exponent to the output string.

: (.initfop)    \ f -- ; -- exp(10)
Initialise output conversion.

: (fs.)         \ F: f -- ; -- caddr len
Produce a string containing the number in scientific notation.

: (fe.)         \ F: f -- ; -- caddr len
Produce a string containing the number in engineering notation.

: ff?           \ f: f -- f ; -- flag
Return true if the number can be represented in free format.

: (ff.)         \ F: f -- ; -- caddr len
Produce a string containing the number in free notation. If the number cannot be displayed in free notation, scientific notation is used.
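The string-producing words are useful when the text is to be sent somewhere other than the console, or decorated before display. The hypothetical word .volts below is a sketch that appends a unit to the free-format string.

```forth
: .volts        \ F: r -- ; display r followed by a unit
  (ff.) type            \ build the string, then output it
  ."  V"
;
```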

: fs.           \ F: f --
Display f in scientific notation:

  x.xxxxxE[-]yy

: fe.           \ F: f --
Display f in engineering notation:

  x.xxxxxE[-]yy

where the mantissa is 1 <= mantissa < 1000 and the exponent is a multiple of three.

: ff.           \ F: f --
Display f in free notation:

  xxx.xxxxx

: F.            \ F: f --
Print the f.p. number in free format, xxxx.yyyy, if possible. Otherwise display using the x.xxxxEyy format.

Rounding

f# 1.0e0 fconstant f%ONE
Floating point 1.0.

f# 2.0e0 fconstant f%two
Floating point 2.0.

: 1/f           \ F: f1 -- f2
Reciprocal; f2=1/f1.

: f2/           \ F: f1 -- f2
Divide by 2.0; f2=f1/2.0.

: FLOOR         \ F: r1 -- r2
Floored round towards -infinity.

: FROUND        \ F: r1 -- r2
Round the number to nearest or even.
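FROUND can be combined with scaling to round to a fixed number of decimal places. The hypothetical word fround-2dp below is a sketch; in single precision the result is only the nearest representable value to the rounded decimal.

```forth
: fround-2dp    \ F: r -- r' ; round to two decimal places
  f# 100.0e0 f*         \ scale up by 100
  fround
  f# 100.0e0 f/         \ scale back down
;
```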

Trigonometric functions

N.B. All angles are in radians.

: DEG>RAD       \ F: n1 -- n2
Convert degrees to radians.

: RAD>DEG       \ F: n1 -- n2
Convert radians to degrees.

: FSIN          \ F: f1 -- f2
f2=sin(f1).

: FCOS          \ F: f1 -- f2
f2=cos(f1).

: FTAN          \ F: f1 -- f2
f2=tan(f1).

: FASIN         \ F: f1 -- f2
f2=arcsin(f1).

: FACOS         \ F: f1 -- f2
f2=arccos(f1).

: FATAN         \ F: f1 -- f2
f2=arctan(f1).

Logarithms and Powers

: FLN           \ F: f1 -- f2
Take the logarithm of f1 to base e and return the result.

: FLOG          \ F: f1 -- f2
Take the logarithm of f1 to base 10 and return the result.

: FEXP          \ F: f1 -- f2
f2=e^f1.

Synonym FE^X FEXP       \ F: f1 -- f2
Compatibility word.

: fexpm1        \ r1 -- r2
Raise e to the power r1 and subtract one, giving r2.

: F10^X         \ F: f1 -- f2
f2=10^f1

: FX^N          \ n -- ; F: fx -- fx^n
fx^n=x^n where x is a float and n is an integer.

: F**   \ F: fx fy -- fx^fy
fn=X^Y where X and Y are both floats.

Synonym FX^Y F**        \ --
Compatibility word for old code.
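The two power words differ in where the exponent comes from: FX^N takes an integer from the data stack, F** takes a float from the float stack. The hypothetical words below sketch both; fcbrt assumes a positive argument.

```forth
: fcube         \ F: x -- x^3 ; integer exponent on the data stack
  3 fx^n
;

: fcbrt         \ F: x -- x^(1/3) ; float exponent, assumes x > 0
  f# 1.0e0 f# 3.0e0 f/  \ F: x 1/3
  f**
;
```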

COSEC SEC COTAN and hyperbolics

: fcosec        \ F: f -- cosec(f)
Floating point cosecant.

: fsec          \ F: f -- sec(f)
Floating point secant.

: fcotan        \ f: f -- cot(f)
Floating point cotangent.

: fsinh         \ F: f -- sinh(f) ; (e^x - 1/e^x)/2
Floating point hyperbolic sine.

: fcosh         \ F: f -- cosh(f) ; (e^x + 1/e^x)/2
Floating point hyperbolic cosine.

: ftanh         \ F: f -- tanh(f) ; (e^x - 1/e^x)/(e^x + 1/e^x)
Floating point hyperbolic tangent.

: fasinh        \ F: f -- asinh(f) ; ln( f+sqrt(1+f*f) )
Floating point hyperbolic arcsine.

: facosh        \ F: f -- acosh(f) ; ln( f+sqrt(f*f-1) )
Floating point hyperbolic arccosine.

: fatanh        \ F: f -- atanh(f) ; ln( (1+f)/(1-f) )/2
Floating point hyperbolic arctangent.

Plugging floats into the system

: reals         \ -- ; turn FP system on
Switch the system and NUMBER? to permit floating point input using FNUMBER?. This action can be reversed by INTEGERS.

: integers      \ -- ; turn FP system off
Switch the system and NUMBER? to restore integer-only input.
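A sketch of an interactive session, assuming the lines are typed at the target's console with the floating point package loaded. While REALS is in effect, text such as 1.5e0 is converted by FNUMBER? and placed on the float stack.

```forth
reals                   \ enable floating point number entry
1.5e0 2.5e0 f+ f.       \ add two floats and display the sum
integers                \ back to integer-only number entry
```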

Gotchas

Floating point constants

As of January 2019, target floating point constants are defined as f%name rather than the previous %name. The change prevents name conflicts with binary integers that use a '%' prefix.

Number formats

The ANS and Forth-2012 specifications define the format of floating point numbers during text interpretation as:


Convertible string := <significand><exponent>

<significand> := [<sign>]<digits>[.<digits0>]
<exponent>    := E[<sign>]<digits0>
<sign>        := { + | - }
<digits>      := <digit><digits0>
<digits0>     := <digit>*
<digit>       := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }

This format is handled by the word FNUMBER?. The word >FLOAT accepts a more relaxed format.


Convertible string := <significand>[<exponent>]

<significand> := [<sign>]{<digits>[.<digits0>] | .<digits> }
<exponent>    := <marker><digits0>
<marker>      := {<e-form> | <sign-form>}
<e-form>      := <e-char>[<sign-form>]
<sign-form>   := { + | - }
<e-char>      := { D | d | E | e }

The stricter format required by FNUMBER? makes it difficult to use the text interpreter during program execution, as it requires floating point numbers to contain 'D' or 'E' indicators, which is not common practice outside Forth. A quick kluge to fix this is to change FNUMBER? as below.


Replace:
  fcheck drop if                       \ valid f.p. number?
with:
  fcheck or if                         \ valid f.p. number?

Note that this change can/will cause problems if number base is not DECIMAL.

Floating point literals

Because we still have to support a variety of legacy floating point packages, plus new ones for as yet undefined CPUs, the handling of F.P. literals is far from perfect.

Our recommendation for use of floating point literals in both cross-compiled and target code is to use F# and/or FLITERAL, e.g. to compile 2.0e0, use one of the following:


: foo  ... f# 2.0e0 ... ;
: foo  ... [ 2 s>f ] fliteral ... ;