The Forth data stack and the floating point stack are separate. As with the data and return stacks, the floating point stack grows down. The floating point data storage format is IEEE 32 bit (single precision) format. The source code is in the file Cortex/VFP32SX.fth. As of January 2017, the float stack pointer is in R8 and the top of the float stack is in S8 for performance. The previous non-R8 version is in VFP32SX.old.fth. The files should also work with minimal change on ARM32 CPUs but have not been tested for these.
A separate IEEE 64 bit (double precision) floating point package is provided in Cortex/VFP64SX.fth. This is ideal for use with Cortex-M7 CPUs as well as with hosted systems such as ARM Linux.
Note that single-precision floating point has only 23 bits in the mantissa (about seven significant decimal digits) and 8 bits in the exponent, and thus has severely restricted precision and dynamic range compared with double precision.
The source code for the ARM/Cortex version of the code is in the file Cortex/VFP32SX.fth. In order to provide full initialisation in other files, the following equates must be set in the control file before any code is compiled:
0 equ softfp? \ true for software floating point
1 equ hardfp? \ true for hardware floating point
1 equ fpstack? \ true for a separate floating point stack,
\ in which case FP-SIZE must be non-zero
$0100 equ FP-SIZE \ size of FP stack in bytes. Must not be 0.
The separate floating point stack is used when fpstack? is true
and FP-SIZE is defined and is non-zero. Add the line below to
compile the code.
include %CpuDir%/VFP32SX.fth
The code must be compiled with xArmCortexHfpDev, which has support for compiling IEEE F.P. numbers.
Floating point number entry is enabled by REALS and disabled by INTEGERS.
Floating-point numbers of the form 0.1234e1 are required
(see FNUMBER?) during interpretation and compilation
of source code. Floating-point numbers are compiled as
literal numbers when in a colon definition (compiling) and
placed on the stack when outside a definition (interpreting).
Inside a colon definition, a floating point literal number
must be preceded by F#.
: foo ... f# 1.234e0 ... ;
The more flexible word >FLOAT accepts numbers with or without
an exponent, for example 1.234 and 0.1234e1. Both words are
documented later in this chapter. See also the section on
Gotchas later in this chapter.
To create a variable, use FVARIABLE. FVARIABLE works in the
same way as VARIABLE. For example, to create a floating-point
variable called VAR1, write:
FVARIABLE VAR1
When VAR1 is used, it returns the address of the floating-point
number.
Two words are used to access floating-point variables,
F@ and F!. These are analogous to @ and !.
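As a minimal sketch of the access words, the definitions below store a value in VAR1 and display it again. The names set-var1 and show-var1 are hypothetical and are not part of the package; the literal is introduced with F# because it is used inside a colon definition.

```forth
FVARIABLE VAR1                            \ reserve storage for one float

: set-var1   ( -- )  f# 1.234e0 VAR1 F! ; \ store 1.234 at the address of VAR1
: show-var1  ( -- )  VAR1 F@ F. ;         \ fetch the float and display it
```

Note that F! takes the float from the float stack and the address from the data stack, so the order in which the two are produced does not matter.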
To create a floating-point constant, use FCONSTANT, which
is analogous to CONSTANT. For example, to generate a
floating-point constant called FCON1 with a value of 1.234,
you enter:
F# 1.234e0 FCONSTANT FCON1
When FCON1 is executed, it returns 1.234 on the float
stack.
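A constant can then be used like any other float operand. The sketch below assumes FCON1 has been defined as above; the word name scale-by-con1 is hypothetical.

```forth
F# 1.234e0 FCONSTANT FCON1

: scale-by-con1  ( F: r1 -- r2 )
  FCON1 F* ;              \ r2 = r1 * 1.234
```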
The supplied words split into several groups:
The following functions only exist as target words so you cannot use them in calculations in your source code when outside a colon definition.
To calculate sine, cosine and tangent, use FSIN, FCOS and
FTAN respectively. Angles are expressed in radians.
To calculate arc sine, arc cosine and arc tangent, use FASIN,
FACOS and FATAN respectively. They return an angle in radians.
Two words are supplied to calculate logarithms, FLOG and FLN.
FLOG calculates a logarithm to base 10 (decimal).
FLN calculates a logarithm to base e. Both take a
floating-point number in the range from 0 to Einf.
Three power functions are supplied:
FEXP F10^X FX^Y
Angles used by the trigonometric functions are in radians.
To convert between degrees and radians use RAD>DEG or DEG>RAD.
RAD>DEG converts an angle from radians to degrees.
DEG>RAD converts an angle from degrees to radians.
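For code that works in degrees, the conversion words can be wrapped around the trigonometric functions. This is only a sketch; the names sin-deg and asin-deg are hypothetical and not part of the package.

```forth
: sin-deg   ( F: degrees -- sin )   DEG>RAD FSIN ;   \ convert first, then take the sine
: asin-deg  ( F: x -- degrees )     FASIN RAD>DEG ;  \ arc sine, result in degrees

: sin-demo  ( -- )
  f# 30.0e0 sin-deg F. ;  \ should display a value close to 0.5
```

Because these functions only exist as target words, the demonstration is wrapped in a colon definition rather than being executed directly in the source file.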
Two words are available for displaying floating-point numbers,
F. and E. . The word F. takes a floating-point
number from the stack and displays it in the form xxxx.xxxxx
or x.xxxxxEyy depending on the size of the number. The word
E. displays the number in the latter form.
The ANS Forth standard specifies that floating point numbers must be entered in the form 1.234e5 and must contain a point '.' and 'e' or 'E'. In order to distinguish between double and floating point numbers, a floating point number must contain 'e' and an exponent.
The floating point stack pointer is register R8. The top float is cached in register S8.
4 equ FPCELL \ -- u
Size of a floating point number in bytes.
FPCELL constant FPCELL \ -- u
Size of a floating point number in bytes.
FPCELL setFloatSize \ --
Tell the cross compiler the size in memory of a floating
point number.
FPCELL setFloatAlignment \ --
Tell the cross compiler the alignment in memory of a floating
point number.
: (doFloatAlignment) \ --
Host word that performs float alignment.
: finit \ F: i*f -- ; resets FPU and FP stack
Reset the floating point stack.
Do not forget to use this in a task before using floating
point.
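A task that uses floating point might therefore start as sketched below. The name fp-action is hypothetical, and PAUSE is assumed to be the multitasker's yield word; substitute whatever your task loop normally uses.

```forth
: fp-action  ( -- )
  finit                   \ reset the FPU and float stack for this task
  begin
    \ ... floating point work goes here ...
    pause                 \ assumed multitasker word: let other tasks run
  again ;
```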
: fdepth \ -- #f
Floating point equivalent of DEPTH. The result is returned
on the Forth data stack.
code CLZ \ x -- u
Return the number of leading zeros in x.
: DCLZ \ dx -- u
Return the number of leading zeros in the double dx.
code >fs \ f -- ; F: -- f
Copy a float from the data stack to the floating point stack.
code fs> \ F: f -- ; -- f
Copy a float from the float stack to the data stack.
code fps@ \ -- fps
Read the floating point status/control register.
code fps! \ fps --
Set the floating point status/control register.
code exp@ \ F: f -- f ; -- exp(2)
Copy the exponent of the top float to the data stack.
The IEEE exponent offset is removed.
code exp! \ exp(2) -- ; F: f -- f'
Change/Set the exponent of the top float.
The IEEE exponent offset is added.
code F! \ F: r -- ; addr --
Stores r at addr.
code F@ \ addr -- ; F: -- r
Fetches r from addr.
synonym SF! F! \ F: r -- ; addr --
Stores r at addr in IEEE 32 bit format.
synonym SF@ F@ \ addr -- ; F: -- r
Fetches r from addr, which contains a float
in IEEE 32 bit format.
: F, \ F: r --
Lays a real number into the dictionary, reserving FPCELL bytes.
Synonym SF, F, \ F: r --
Lays a real number into the dictionary as an IEEE 32 bit number.
code FDUP \ F: r -- r r
Floating point equivalent of DUP.
code FOVER \ F: r1 r2 -- r1 r2 r1
Floating point equivalent of OVER.
code FSWAP \ F: r1 r2 -- r2 r1
Floating point equivalent of SWAP.
code FPICK \ u -- ; F: fu..f0 -- fu..f0 fu
Floating point equivalent of PICK.
code FROT \ F: r1 r2 r3 -- r2 r3 r1
Floating point equivalent of ROT.
code F-ROT \ F: r1 r2 r3 -- r3 r1 r2
Floating point equivalent of -ROT.
code FDROP \ F: r --
Floating point equivalent of DROP.
code FNIP \ F: r1 r2 -- r2
Floating point equivalent of NIP.
code f>r \ F: f -- ; R: -- f
Put float onto return stack.
code fr> \ R: f -- ; F: -- f
Pull float from the return stack.
code flit \ F: -- f ; inline literal
Run-time routine for a floating point literal. Cortex version.
code flit \ F: -- f ; inline literal
Run-time routine for a floating point literal. ARM32 version.
: DLSHIFT \ d u -- d<<u
Double number left shift, with u in the range 1..64.
: DRSHIFT \ d u -- d>>u
Double number right shift, with u in the range 1..64.
: FVARIABLE \ "<spaces>name" -- ; Run: -- addr
Use in the form: FVARIABLE <name> to create a variable
that will hold a floating point number.
: FCONSTANT \ F: r -- ; "<spaces>name" -- ; Run: -- r
Use in the form: <float> FCONSTANT <name> to create a
constant that returns a floating point number.
: FARRAY \ "<spaces>name" fn-1..f0 n -- ; Run: i -- ; F: -- ri
Create an initialised array of floating point numbers. Use
in the form:
fn-1 .. f1 f0 n FARRAY <name>
to create an array of n floating point numbers. When the
array name is executed, the index i selects the i'th
(zero-based) element of the array, which is returned on
the float stack.
For example:
4e0 3e0 2e0 1e0 0e0 5 FARRAY TEST
will set up an array of five elements.
Note that the rightmost float (0e0) is element 0.
Then i TEST will return the ith element.
: FBUFF \ u "name" -- ; i -- addr
Creates a buffer for u floats in the current memory
section. The child action is to return the address of the
ith element (zero-based).
10 fbuff foo
Creates a buffer for ten float elements.
3 foo
Returns the address of element 3 in the buffer.
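Since the child of FBUFF returns an address, elements are read and written with F@ and F!. The sketch below fills a ten-element buffer with the values 0.0 to 9.0; the word names fill-foo and show-foo are hypothetical.

```forth
10 fbuff foo                        \ buffer for ten floats

: fill-foo  ( -- )
  10 0 do
    i s>f                           \ convert the index to a float
    i foo F!                        \ store it in element i
  loop ;

: show-foo  ( i -- )  foo F@ F. ;   \ display element i
```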
code FSIGN \ F: fn -- |fn| ; -- flag ; true if negative
Return the absolute value of fn and a flag which is true
if fn is negative. F.P. stack operation.
code S>F \ n -- ; F: -- fn
Converts a single signed integer to a float.
code F>S \ F: fn -- ; -- n
Converts a float to a single integer.
Note that F>S truncates the number towards zero
according to the ANS specification. If |fn| is greater
than maxint, +/-maxint is returned.
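The effect of truncation towards zero on negative numbers can be seen with a small test word. This is a sketch; trunc-demo is a hypothetical name.

```forth
: trunc-demo  ( -- )
  f# 2.7e0  F>S .         \ truncates to 2
  f# -2.7e0 F>S . ;       \ truncates to -2, not -3
```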
: D>F \ d -- ; F: -- fn
Converts a double integer to a float.
: f>d \ f -- ; -- dint(f)
Converts a float to a double integer.
Note that F>D truncates the number towards zero
according to the ANS specification.
: FINT \ F: f1 -- f2
Chop the number towards zero to produce a floating point
representation of an integer.
code FNEGATE \ F: r1 -- r2
Floating point negate.
: ?FNEGATE \ n -- ; F: fn -- fn|-fn
If n is negative, negate fn.
code FABS \ F: fn -- |fn|
Floating point absolute.
code F+ \ F: r1 r2 -- r3
Floating point addition.
code F- \ F: r1 r2 -- r3
Floating point subtraction, r3=r1-r2
code F* \ F: r1 r2 -- r3
Floating point multiply.
code F/ \ F: r1 r2 -- r3
Floating point divide.
code fsqrt \ F: f1 -- f2
f2=sqrt(f1).
: FSEPARATE \ F: f1 f2 -- f3 f4
Leave the signed integer quotient f4 and remainder f3 when
f1 is divided by f2. The remainder has the same sign as the
dividend.
: FFRAC \ F: f1 f2 -- f3
Leave the fractional remainder from the division f1/f2. The
remainder takes the sign of the dividend.
code F0< \ F: f1 -- ; -- flag
Floating point 0<.
code F0> \ F: f1 -- ; -- flag
Floating point 0>.
code F0= \ F: f1 -- ; -- flag
Floating point 0=.
code F0<> \ F: f1 -- ; -- flag
Floating point 0<>.
: F= \ F: f1 f2 -- ; -- flag
Floating point =.
: F< \ F: r1 r2 -- ; -- flag
Floating point <.
: F> \ F: f1 f2 -- ; -- flag
Floating point >.
: FMAX \ F: r1 r2 -- r1|r2
Floating point MAX.
: FMIN \ F: r1 r2 -- r1|r2
Floating point MIN.
: FALIGNED \ addr -- f-addr
Aligns the address to accept a 4-byte float.
: FALIGN \ --
Aligns the dictionary to accept a 4-byte float.
Synonym SFALIGNED ALIGNED \ addr -- f-addr
Aligns the address to accept a 4-byte float.
Synonym SFALIGN ALIGN \ --
Aligns the dictionary to accept a 4-byte float.
: FLOAT+ \ f-addr1 -- f-addr2
Increments addr by 4, the size of a float.
: FLOATS \ n1 -- n2
Returns n2, the size of n1 floats.
Synonym SFLOAT+ 4+ \ f-addr1 -- f-addr2
Increments addr by 4, the size of an S-float.
Synonym SFLOATS CELLS \ n1 -- n2
Returns n2, the size of n1 S-floats.
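The address words are typically used to step through floats held in memory. As a sketch, the hypothetical word fsum below adds up n consecutive floats starting at addr, using FLOAT+ to advance; the buffer name samples is also an assumption.

```forth
10 fbuff samples                    \ hypothetical buffer of ten floats

: fsum  ( addr n -- ; F: -- sum )
  f# 0.0e0                          \ running total on the float stack
  0 ?do
    dup F@ F+                       \ add the float at addr to the total
    FLOAT+                          \ step to the next float
  loop
  drop ;                            \ discard the final address
```

Inside a definition, 0 samples 10 fsum would leave the sum of all ten elements on the float stack.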
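Floating point IEEE numbers have the following approximate ranges (magnitude of normalised values):
32 bit (single precision): 1.2e-38 to 3.4e38
64 bit (double precision): 2.2e-308 to 1.8e308
As a result, the input code is different for 32 bit and 64 bit floats.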
f# 1.0e-32 fconstant f%10^-32
Floating point 1.0e-32.
f# 1.0e-16 fconstant f%10^-16
Floating point 1.0e-16.
f# 0.1e0 fconstant f%.1
Floating point 0.1.
f# 1.0e0 fconstant f%1
Floating point 1.0.
f# 10.0e0 fconstant f%10
Floating point 10.0.
f# 1.0e16 fconstant f%10^16
Floating point 1.0e16.
f# 1.0e32 fconstant f%10^32
Floating point 1.0e32.
16 FARRAY POWERS-OF-10E1
An array of 16 powers of ten starting at 10^0
in steps of 1.
16 FARRAY POWERS-OF-10E-1
An array of 16 powers of ten starting at 10^0
in steps of -1.
: RAISE_POWER \ exp(10) -- ; F: f -- f'
Raise the power in preparation for number formatting.
: SINK_FRACTION \ exp(10) -- ; F: f -- f'
Reduce the power in preparation for number formatting.
: *10^X \ exp(10) -- ; F: f -- f'
Generate float' = float *10^dec_exp.
Note that number conversion takes place in PAD.
: CONVERT-EXP \ c-addr --
If the character at c-addr is 'D' convert it to 'E'.
: CONVERT-FPCHAR \ c-addr --
Convert the f.p. char '.' to the double char ',' for
conversion.
: ALL-BLANKS? \ c-addr len -- flag
Return true if string is all blanks (spaces).
: FCHECK \ -- am lm ae le e-flag .-flag
Check the input string at PAD, returning the separated
mantissa and exponent flags. The e-flag is returned true
if the string contained an exponent indicator 'E' and
the .-flag is returned true if a '.' was found.
: doMNUM \ c-addr u -- d 2 | 0
Convert the mantissa string to a double number and 2. If
conversion fails, just return 0.
: doENUM \ c-addr u -- n 1 | 0 ; str as above
Convert the exponent string to a single number and 1. If
conversion fails, just return 0.
: FIXEXP \ dmant exp(10) -- ; F: -- f
Convert a double integer mantissa and a single integer
exponent into a floating point number.
: FNUMBER? \ addr -- 0 | n 1 | d 2 | -1 ; F: -- [f]
Behaves like the integer version of NUMBER? except that if
the number is in F.P. format and BASE is decimal, a floating
point conversion is attempted. If conversion is successful,
the floating point number is left on the float stack and
the result code is 2. This word only accepts text with an
'E' as a floating point indicator, e.g. 1.2345e0.
If BASE is not decimal all numbers are treated as integers.
The integer prefixes '#','$','0x' etc. are recognised and
cause integer conversion to be used.
: >FLOAT \ c-addr u -- true|false ; F: -- [f]
Try to convert the string at c-addr/u to a floating point number.
If conversion is successful, flag is returned true, and a floating
number is returned on the float stack, otherwise just flag=0 is
returned. This word accepts several forms, e.g. 1.2345e0, 1.2345,
12345 and converts them to a float. Note that double numbers
(containing a ',') cannot be converted.
Number conversion is decimal only, regardless of the current BASE.
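A typical use of >FLOAT is to convert text received at run time and act on the flag. The following is a minimal sketch; show-float is a hypothetical name.

```forth
: show-float  ( c-addr u -- )
  >FLOAT if
    F.                    \ conversion succeeded: display the float
  else
    ." not a number"      \ conversion failed: nothing is on the float stack
  then ;
```

For example, s" 1.2345" show-float passes a string without an exponent, which >FLOAT accepts.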
: FLITERAL \ Comp: F: r -- ; Run: F: -- r
Compiles a float as a literal into the current definition.
At execution time, a float is returned. For example,
[ f%PI 2e0 F* ] FLITERAL
will compile 2PI as a floating point literal.
Note that FLITERAL is immediate.
: (F#) \ addr -- -1|0 ; F: -- [f]
The primitive for F# below.
: F# \ F: -- [f] ; or compiles it (state smart)
If interpreting, takes text from the input stream and,
if possible converts it to a f.p. number on the stack.
Numbers in integer format will be converted to floating-point.
If compiling, the converted number is compiled.
Note that, because global conversion buffers are used, the floating point output routines are not thread-safe.
8 value precision \ -- u
Number of significant digits output.
: set-precision \ u --
Set the number of significant digits used for output.
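Because PRECISION is a global setting, a word that needs a particular number of digits may wish to save and restore it. The sketch below displays a float with three significant digits; the name f.3 is hypothetical.

```forth
: f.3  ( F: f -- )
  precision >r            \ save the current setting
  3 set-precision
  F.                      \ display with three significant digits
  r> set-precision ;      \ restore the previous setting
```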
: exp(10) \ F: f -- f ; -- exp[10]
Generate the power of ten corresponding to the float's power of two.
64 buffer: fopbuff \ -- addr
Buffer in which output string is built.
32 buffer: frepbuff \ -- addr
Buffer for use as the output of REPRESENT.
: roundfp \ F: +f -- +f'
Add 0.5e(exp-precision-1).
: REPRESENT \ F: r -- ; c-addr u -- n flag1 flag2
Assume that the floating number is of the form +/-0.xxxxEyy.
Place the significand xxxxx at c-addr with a maximum of u digits.
Return n the signed integer version of yy. Return flag1 true
if f is negative, and return flag2 true if the results are
valid. In this implementation all errors are handled by
exceptions, and so flag2 is always true.
: append \ c-addr u $dest --
Add the string described by C-ADDR U to the counted string at
$DEST. The strings must not overlap.
: (.sign) \ flag $out --
Add '-' or nothing to the output string.
: (.mant) \ binp $out n --
Add the mantissa string at binp, produced by REPRESENT,
to a counted string at $out with n digits before the
decimal point.
: (.exp) \ exp(10) $out --
Add the exponent to the output string.
: (.initfop) \ f -- ; -- exp(10)
Initialise output conversion.
: (fs.) \ F: f -- ; -- caddr len
Produce a string containing the number in scientific notation.
: (fe.) \ F: f -- ; -- caddr len
Produce a string containing the number in engineering notation.
: ff? \ f: f -- f ; -- flag
Return true if the number can be represented in free format.
: (ff.) \ F: f -- ; -- caddr len
Produce a string containing the number in free notation.
If the number cannot be displayed in free notation,
scientific notation is used.
: fs. \ F: f --
Display f in scientific notation:
x.xxxxxE[-]yy
: fe. \ F: f --
Display f in engineering notation:
x.xxxxxE[-]yy
where the mantissa is 1 <= mantissa < 1000 and the exponent is a multiple of three.
: ff. \ F: f --
Display f in free notation:
xxx.xxxxx
: F. \ F: f --
Print the f.p. number in free format, xxxx.yyyy, if
possible. Otherwise display using the x.xxxxEyy format.
f# 1.0e0 fconstant f%ONE
Floating point 1.0.
f# 2.0e0 fconstant f%two
Floating point 2.0.
: 1/f \ F: f1 -- f2
Reciprocal; f2=1/f1.
: f2/ \ F: f1 -- f2
Divide by 2.0; f2=f1/2.0.
: FLOOR \ F: r1 -- r2
Floored round towards -infinity.
: FROUND \ F: r1 -- r2
Round the number to nearest or even.
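The three rounding words differ on negative numbers and on exact halves. Going by the descriptions of FLOOR, FROUND and FINT, the hypothetical test word below should show the difference for -2.5.

```forth
: round-demo  ( -- )
  f# -2.5e0 FLOOR  F.     \ towards -infinity: -3
  f# -2.5e0 FROUND F.     \ nearest or even:   -2
  f# -2.5e0 FINT   F. ;   \ towards zero:      -2
```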
N.B. All angles are in radians.
: DEG>RAD \ F: n1 -- n2
Convert degrees to radians.
: RAD>DEG \ F: n1 -- n2
Convert radians to degrees.
: FSIN \ F: f1 -- f2
f2=sin(f1).
: FCOS \ F: f1 -- f2
f2=cos(f1).
: FTAN \ F: f1 -- f2
f2=tan(f1).
: FASIN \ F: f1 -- f2
f2=arcsin(f1).
: FACOS \ F: f1 -- f2
f2=arccos(f1).
: FATAN \ F: f1 -- f2
f2=arctan(f1).
: FLN \ F: f1 -- f2
Take the logarithm of f1 to base e and return the result.
: FLOG \ F: f1 -- f2
Take the logarithm of f1 to base 10 and return the result.
: FEXP \ F: f1 -- f2
f2=e^f1.
Synonym FE^X FEXP \ F: f1 -- f2
Compatibility word.
: fexpm1 \ r1 -- r2
Raise e to the power r1 and subtract one, giving r2.
: F10^X \ F: f1 -- f2
f2=10^f1
: FX^N \ n -- ; F: fx -- fx^n
fx^n=x^n where x is a float and n is an integer.
: F** \ F: fx fy -- fx^fy
fn=X^Y where X and Y are both floats.
Synonym FX^Y F** \ --
Compatibility word for old code.
: fcosec \ F: f -- cosec(f)
Floating point cosecant.
: fsec \ F: f -- sec(f)
Floating point secant.
: fcotan \ f: f -- cot(f)
Floating point cotangent.
: fsinh \ F: f -- sinh(f) ; (e^x - 1/e^x)/2
Floating point hyperbolic sine.
: fcosh \ F: f -- cosh(f) ; (e^x + 1/e^x)/2
Floating point hyperbolic cosine.
: ftanh \ F: f -- tanh(f) ; (e^x - 1/e^x)/(e^x + 1/e^x)
Floating point hyperbolic tangent.
: fasinh \ F: f -- asinh(f) ; ln( f+sqrt(1+f*f) )
Floating point hyperbolic arcsine.
: facosh \ F: f -- acosh(f) ; ln( f+sqrt(f*f-1) )
Floating point hyperbolic arccosine.
: fatanh \ F: f -- atanh(f) ; ln( (1+f)/(1-f) )/2
Floating point hyperbolic arctangent.
: reals \ -- ; turn FP system on
Switch the system and NUMBER? to permit floating point
input using FNUMBER?. This action can be reversed by
INTEGERS.
: integers \ -- ; turn FP system off
Switch the system and NUMBER? to restore integer-only
input.
As of January 2019, target floating point constants are defined as
f%name rather than the previous %name. The change
prevents name conflicts with binary integers that use a '%'
prefix.
The ANS and Forth-2012 specifications define the format of floating point numbers during text interpretation as:
Convertible string := <significand><exponent>
<significand> := [<sign>]<digits>[.<digits0>]
<exponent> := E[<sign>]<digits0>
<sign> := { + | - }
<digits> := <digit><digits0>
<digits0> := <digit>*
<digit> := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }
This format is handled by the word FNUMBER?. The word
>FLOAT accepts a more relaxed format.
Convertible string := <significand>[<exponent>]
<significand> := [<sign>]{<digits>[.<digits0>] | .<digits> }
<exponent> := <marker><digits0>
<marker> := {<e-form> | <sign-form>}
<e-form> := <e-char>[<sign-form>]
<sign-form> := { + | - }
<e-char> := { D | d | E | e }
The restriction imposed by FNUMBER? makes it difficult to use
the text interpreter during program execution, as it requires
floating point numbers to contain 'D' or 'E' indicators, which
is not common practice.
A quick kluge to fix this is to change FNUMBER? as below.
Replace:
fcheck drop if \ valid f.p. number?
with:
fcheck or if \ valid f.p. number?
Note that this change can/will cause problems if number base
is not DECIMAL.
Because we still have to support a variety of legacy floating point packages, plus new ones for as yet undefined CPUs, the handling of F.P. literals is far from perfect.
Our recommendation for use of floating point literals in both
cross-compiled and target code is to use F# and/or
FLITERAL, e.g. to compile 2.0e0, use one of the following:
: foo ... f# 2.0e0 ... ;
: foo ... [ 2 s>f ] fliteral ... ;