The Lib folder/directory contains tools maintained and periodically updated by MPE. The contents of Lib differ between the Windows, Linux, OS X and DOS versions as some of the tools are operating system specific.
Cross reference information helps you to manage your source code.
When LIB\XREF.FTH is loaded you can use XREF <name> to find out
in which other words <name> is used. You can also find out which
words you defined but did not use. XREF is precompiled in the
Studio version of VFX Forth but not in the base version.
The compiler generates cross references by building a chain of fields
including LOCATE format (link:32, xt:32, line#:32) in a separate
area of memory. Links and pointers are relative to the start of the
XREF memory area.
Two chains are maintained. The first produces a chain of
where a word is used, so that the user can find out where
(say) DUP is used. The second produces a chain of
which words and literals are called in order. This is the
basis of decompilation and debugging.
XREF is initialised by the switch +XREFS and is terminated by
-XREFS. You must use +XREFS to turn on the production of cross
reference information.
By default 1Mb of cross reference memory is allocated from the heap.
If you need more than this for a very large application, use the
phrase <n> XREF-KB to set the size of the cross reference memory,
where <n> is in kilobytes.
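A typical session might look like the sketch below. It uses only words documented in this section, and it needs VFX Forth with LIB\XREF.FTH loaded. SQUARE and CUBE are hypothetical example words. Placing XREF-KB before the first +XREFS is an assumption, on the grounds that the size probably has to be set before the memory is allocated.

```forth
\ Sketch only: requires VFX Forth with LIB\XREF.FTH loaded.
2048 xref-kb          \ assumption: set the size before XREF memory is allocated
+xrefs                \ start producing cross reference information

: square  ( n -- n*n )  dup * ;          \ hypothetical example word
: cube    ( n -- n^3 )  dup square * ;   \ hypothetical example word

-xrefs                \ stop producing cross reference information
xref square           \ show where SQUARE is used
show cube             \ machine decompilation of CUBE
xref-unused           \ list words that were defined but not used
```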
Because the VFX code generator optimises so heavily, there is no
direct relationship between the binary code and the source code.
Consequently DIS and DASM use disassembly and special cases, but
cannot produce a good approximation to the original source code.
The cross reference information includes a decompilation chain.
When you use SHOW <name> the cross reference information is used to
produce a machine decompilation. This includes none of the comments
from the original source code, and is machine formatted.
The decompilation produced by SHOW is mostly default and
automatic. However, some words, such as string handling words,
take in-line data which would not be displayed by SHOW without
special handling.
SHOW can be extended by adding items to the DCC-SWITCH chain. The
stack effect of the action is: addrx -- addr ; where addrx is the
offset of the cross reference packet in the cross reference information
memory. See the /REF[X] structure in LIB\XREF.FTH for details
of the structure of this data packet. The example below is for a
word X" which takes an in-line string like S".
[+switch dcc-switch
' X" run: ." X" [char] " emit dup .$inline ;
switch]
Note that unlike previous VFX Forth decompilers, SHOW is based
on cross reference information which references the source word
without knowledge of what it compiles. The only reasons for special
cases are control of the decompilation layout and display of
associated data to reconstruct source code.
: dump(x) \ offset len --
Displays the specified contents of the XREF table. Note that the
given address is an offset from the start of the XREF table.
: init-xref \ --
Initialise XREF memory and information if not already set up.
: term-xref \ --
Free up XREF memory.
: save-xref \ -- ; save XREF memory to file
Save the cross reference memory to disc. Unless the
file name has been changed by XREF: <filename> the
file will be called XREF.XRF.
: load-xref \ -- ; reload XREF file from disc
Load the cross reference memory from disc. Unless the
file name has been changed by XREF: <filename> the
file used will be XREF.XRF.
: xref: \ "filename" -- ; enable XREFs
Use in the form XREF: <filename> to define the file that
SAVE-XREF and LOAD-XREF will use.
: xref-kb \ n --
Specifies the size of the cross reference memory in kilobytes.
By default this is 1024 kb, or 1Mb.
: +xrefs \ -- ; enable XREF
Initialises the cross reference system if it has not already
been initialised, and enables production of cross reference
information.
: -xrefs \ -- ; disable XREF
Stops production of cross reference information, which can
be restarted by +XREFS.
Cross reference memory is not erased or released. Thus,
restarting with +XREFS will retain information.
To release all previous information use TERM-XREF before +XREFS.
: xref-report \ -- ; display XREF information
Displays some statistics about cross reference memory
usage.
: WalkXref \ xt1 xt2 -- ; XREF of XT1 using XT2 to display.
Used by application tools to walk the XREF chain for XT1.
The structure offset for each step in the chain is handled
by XT2 ( offset -- ). Because writing XT2 requires use of
the internal XREF structure, you must expose the XREFFER
module: EXPOSE-MODULE XREFFER to get access to the words
in Lib\XREF.FTH.
: (show) \ xt -- ; show/decompile words used by this XT
Given an XT, produces a machine decompilation of the word
using the cross reference information. If cross referencing
is not enabled, no action is taken.
: $show \ $addr --
The given counted string is looked up as a Forth word name
and (SHOW) produces a machine decompilation of the word
using the cross reference information. If cross referencing
is not enabled, no action is taken.
: show \ -- ; SHOW <name>
The following name is looked up as a Forth word name
and (SHOW) produces a machine decompilation of the word
using the cross reference information. If cross referencing
is not enabled, no action is taken.
: hasXref? \ xt -- flag ; true if word has XREF info
produces TRUE if xt has XREF information
otherwise FALSE is returned.
: hasXDecomp? \ xt -- flag ; true if word has XREF decompilation info
produces TRUE if xt has XREF decompilation information
otherwise FALSE is returned.
: WalkDecomp \ xt1 xt2 -- ; DECOMP of XT1 using XT2 to display.
Used by application tools to walk the decompilation chain for XT1.
The structure offset for each step in the chain is handled
by XT2 ( offset -- ). Because writing XT2 requires use of
the internal XREF structure, you must expose the XREFFER
module: EXPOSE-MODULE XREFFER to get access to the words
in Lib\XREF.FTH.
: FindXrefInfo \ pc xt -- info | 0 ; finds xref packet corresponding to PC
Given the current PC and the XT of the word the PC is in,
FindXrefInfo returns a pointer to an XREF packet if the
PC is at an exact compilation boundary, otherwise it returns
zero.
: FindXrefNearest \ pc xt -- info|0
Given the current PC and the XT of the word the PC is in,
FindXrefNearest returns a pointer to the XREF packet
for the address at or less than the PC. If no XREF
information is available for the word, zero is returned.
: GetXrefPos \ info -- startpos len line addr
Given a pointer to an XREF packet, GetXrefPos returns the
position, name length, line number of the source text in the
source file, and the value of HERE at the time of compilation.
: NextXref \ info1 -- info2
Steps to the next info packet, given the offset of the previous.
: xref \ -- ; XREF <name>
Use in the form XREF <name> to display where <name> is used.
: uses \ -- ; synonym for XREF
A synonym for XREF above.
: xref-all \ -- ; cross reference all words
Produces a cross reference listing of all the words with
cross reference information. This information is often too
long to be directly useful, but can be pasted from the
console to an editor for sorting, printing, and other
post-processing.
: xref-unused \ -- ; cross reference all words
Produces a cross reference listing of all the unused words with
cross reference information. This information is often too
long to be directly useful, but can be pasted from the
console to an editor for sorting, printing, and other
post-processing.
: ttx-set \ xt -- ; xt TTX-SET "<text>"
The quoted string is saved as the tooltip text for the
word whose xt is given, e.g.
' dup ttx-set "x -- x x ; duplicate top item on stack"
: ttx-get \ xt -- caddr len
Given an xt, return the tooltip text for the word.
: ttx? \ xt -- flag
Return true if the word whose xt is given has a tooltip.
This optional wordset found in /Lib/StringPk.fth contains the following definitions to aid in the manipulation of counted strings.
: $variable \ #chars "name" --
Create a string buffer with space reserved for #chars characters
: $constant \ "name" "text" --
Create a string constant called "name" and parse up to the closing
quotes for the content.
: ($+) \ c-addr u $dest --
Add the string described by C-ADDR U to the counted string at
$DEST. This word is now in the kernel.
: $+ \ $addr1 $addr2 --
Add the counted string $ADDR1 to the counted buffer at $ADDR2.
This word is now in the kernel.
: $left \ $addr1 n $addr2 --
Add the leftmost N characters of the counted string at $ADDR1 to
the counted buffer at $ADDR2.
: $mid \ $addr1 s n $addr2 --
Add N characters starting at offset S from the counted string at
$ADDR1 to the counted buffer at $ADDR2.
: $right \ $addr1 n $addr2 --
Add the rightmost N characters of the counted string at $ADDR1 to
the counted buffer at $ADDR2.
: $val \ $addr -- n1..nn n
Attempt to convert the counted string at $ADDR into a number. The
top-most return item indicates the number of CELLS used on stack to
store the return result. 0 Indicates the string was not a number, 1
for a single and 2 for a double. $VAL obeys the same rules as NUMBER?.
: $len \ $addr -- len
Return the length of a counted string. Actually performs C@ and is
the same as COUNT NIP.
: $clr \ $addr --
Clear the contents of a counted string. Actually sets its length
to zero. Primarily used to reset buffers declared with $VARIABLE.
: $upc \ $addr --
Convert the counted string at $ADDR to uppercase. This acts in
place.
: $compare \ $addr1 $addr2 -- -1/0/+1
Compare two counted strings. Performs the same action as the ANS
kernel definition COMPARE except that it uses counted strings as input
parameters.
: $< \ $1 $2 -- flag
A counted string equivalent to the numeric < operator. Uses
$COMPARE then generates a well-formed flag.
: $= \ $1 $2 -- flag
A counted string equivalent to the numeric = operator. Uses
$COMPARE then generates a well-formed flag.
: $> \ $1 $2 -- flag
A counted string equivalent to the numeric > operator. Uses
$COMPARE then generates a well-formed flag.
: $<> \ $1 $2 -- flag
A counted string equivalent to the numeric <> operator. Uses
$COMPARE then generates a well-formed flag.
: $instr \ $1 $2 -- false | index true
Look for an occurrence of the counted string $2 within the string
$1. If found then the start offset within $1 is returned along with
a TRUE flag, otherwise FALSE is returned.
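The layout that these words rely on (a length byte followed by the characters) can be modelled in portable Forth. The sketch below is not the code in StringPk.fth. The MY- words, BUF and DEMO are hypothetical stand-ins that mimic the documented stack effects of $LEN, $CLR and ($+), without any overflow checking.

```forth
\ Portable model of counted-string buffers. Hypothetical names throughout.
: my-$len  ( $addr -- len )  c@ ;            \ the length byte is stored first
: my-$clr  ( $addr -- )      0 swap c! ;     \ empty string: length = 0
: my-($+)  ( c-addr u $dest -- )             \ append c-addr/u to $dest
  >r                      \ c-addr u            R: $dest
  tuck                    \ u c-addr u
  r@ count +              \ u c-addr u end      end = first free character
  swap move               \ u                   copy the new characters
  r@ c@ +  r> c! ;        \ store old length + u (no overflow check)

create buf  1 chars 32 chars + allot         \ count byte + 32 characters
buf my-$clr                                  \ CREATE ... ALLOT is uninitialised

: demo  ( -- )  s" VFX" buf my-($+)  s"  Forth" buf my-($+) ;
demo
buf count type            \ displays the joined text
buf my-$len .             \ displays the length, 9
```

With the library loaded, the same steps would presumably be written with $VARIABLE, $CLR and ($+) directly.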
A CHAIN is an extensible version of the CASE..OF..ENDOF..ENDCASE
mechanism. It is very similar to the SWITCH mechanism
described in the Tools and Utilities chapter.
: case-chain \ -- addr ; -- addr MPE.0000
Begin initial definition of a chain
: item: \ addr n -- addr ; MPE.0000
Begin definition of a conditional code block
: end-chain \ addr -- MPE.0000
Flag the end of the current block of additions to a chain
: in-chain? \ n addr -- flag ; MPE.0000
Return TRUE if N is in the chain beginning at ADDR
: exec-chain? \ i*x n addr -- j*x true | n FALSE MPE.0000
Run through a given chain using TOS as a selector. If a match is
made, execute the relevant code block and return TRUE; otherwise
the initial selector and a FALSE flag are returned.
CASE-CHAIN <foo>
<n> ITEM: <words> ;
<m> ITEM: <words> ;
<k> ITEM: <words> ;
END-CHAIN
More items can be added later:
<foo>
<x> ITEM: <words> ;
...
END-CHAIN
The data structures are as follows:
CASE-CHAIN <foo> generates a variable that points to the last item added to the list.
ITEM: generates two cells and a headerless word:
selector
link
headerless word .... exit
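The linked structure described above can be modelled in portable Forth. This sketch is not the library implementation: DEMO-CHAIN, ADD-ITEM, FIND-ITEM and RUN-CHAIN are hypothetical names, and each item stores an xt in its third cell instead of an in-line headerless word.

```forth
\ Hypothetical portable model of a chain; not the MPE library code.
\ Item layout: selector, link to the previously added item, xt of the action.
variable demo-chain   0 demo-chain !     \ anchor: points at the last item added

: add-item  ( xt n anchor -- )
  here >r                 \ remember the address of the new item
  swap ,                  \ selector
  dup @ ,                 \ link to the previous last item (0 ends the list)
  swap ,                  \ xt of the action
  r> swap ! ;             \ the anchor now points at the new item

: find-item  ( n anchor -- item | 0 )
  @
  begin dup while
    2dup @ = if nip exit then    \ selector matches
    cell+ @                      \ follow the link
  repeat nip ;

: run-chain  ( n anchor -- true | n false )
  over swap find-item            \ n item|0
  dup if  nip 2 cells + @ execute  true  then ;

: .red    ." red" ;
: .green  ." green" ;
' .red   1 demo-chain add-item
' .green 2 demo-chain add-item

2 demo-chain run-chain drop      \ displays green
```

Because the anchor always points at the most recently added item, items added later are searched first, and more items can be added at any time without touching the existing ones.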
Binary overlays are pieces of the dictionary that have been compiled and saved with relocation information. They can be reloaded as needed and released on demand. Binary overlays are useful when you want to ship tools that are only needed during development, or if you have a large application whose memory footprint you want to reduce by only loading parts of the application when needed.
The binary overlay utility is not part of the kernel, but can be compiled from LIB\OVLVFX.FTH. As of build 3.40.0808, there has been a major change in the way overlays are constructed. This change removes many restrictions that were present in earlier builds. To use the new overlay handler, all overlays must be rebuilt.
An overlay is generated by MAKEOVERLAY:
MAKEOVERLAY <sourcename> <overlayname>
The file <sourcename> is compiled twice. Relocation information is extracted and saved to the overlay along with the raw binary information. If any previously loaded overlays are needed by this overlay, their names are saved in the overlay and they will be automatically reloaded if necessary. After the overlay has been generated, the overlay code is removed. Overlays can be tested by compiling <sourcename> conventionally, and then finally generating the overlay when you are satisfied with it. MAKEOVERLAY preserves and links all vocabularies including SOURCEFILES. Overlay files are saved by MAKEOVERLAY in the current directory. The compiler imposes the following initial condition before the overlay file is compiled:
DECIMAL -SHORT-BRANCHES +SIN +SINDOES
MAKEOVERLAY releases all previously loaded overlays. As a consequence, if the overlay to be compiled requires other overlays, you must load them explicitly by specifying them as dependencies before using MAKEOVERLAY. A dependency list is defined by the word [DEPENDENCIES followed by a list of overlay file names as required by LOADOVERLAY below. The list is terminated by DEPENDENCIES]. Use in the form:
[dependencies
primovl secovl ...
dependencies]
makeoverlay MyOvL
This will cause MAKEOVERLAY to load the dependent overlays PRIMOVL.OVX and SECOVL.OVX and so on.
When an overlay is reloaded by LOADOVERLAY
LOADOVERLAY <overlayname>
the binary code and relocation information are loaded. If the overlay file references other overlays, these are loaded before the relocated binary code is installed. Overlay code is loaded into memory allocated from the Windows heap, and is linked in reverse load order, so that the last loaded is found first. The result of this is that the overlays are always loaded in dependency order, and releasing a "leaf" overlay will not affect the dependencies of other previously loaded overlays.
Although overlay files are saved by MAKEOVERLAY in the current directory, LOADOVERLAY will look first in the current directory and then in the directory from which the application was loaded. This allows all overlays and the main executable to reside in the same directory regardless of the current directory, but maintains convenience during development.
An overlay can be released by the use of RELEASEOVERLAY.
RELEASEOVERLAY <overlayname>
All loaded overlays can be released by RELEASEALLOVERLAYS
ReleaseAllOverlays
A word can be set to execute whenever the overlay is loaded from file or released. These words permit the overlay to allocate and free resources such as memory buffers.
' <load-action> SetOvlLoadHook
' <release-action> SetOvlReleaseHook
Note that these settings should be in the overlay load file. The stack effect of <load-action> and <release-action> must be neutral, i.e. take nothing and return nothing [ -- ].
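As an illustration, an overlay load file might manage a heap buffer as in the sketch below. /MyBuffer, MyBuffer, GetBuffer and FreeBuffer are hypothetical names. Only SetOvlLoadHook and SetOvlReleaseHook come from this section, and the rest is standard ALLOCATE and FREE. It needs VFX Forth with LIB\OVLVFX.FTH compiled.

```forth
\ Sketch of part of an overlay load file. Hypothetical names except the hooks.
4096 constant /MyBuffer                 \ size of the buffer in bytes
variable MyBuffer   0 MyBuffer !        \ initialised at compile time

: GetBuffer   ( -- )                    \ load action: stack neutral
  /MyBuffer allocate throw  MyBuffer ! ;

: FreeBuffer  ( -- )                    \ release action: stack neutral
  MyBuffer @ ?dup if  free throw  0 MyBuffer !  then ;

' GetBuffer   SetOvlLoadHook            \ run when the overlay is loaded from file
' FreeBuffer  SetOvlReleaseHook         \ run when the overlay is released
```

Both actions take nothing and return nothing, as required, and FreeBuffer does nothing if no buffer is currently allocated.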
From VFX Forth v3.4 onwards, the naming conventions have been changed.
The binary overlay files have a ".OVX" extension. The word MAKEOVERLAY creates the overlay for you as follows:
MAKEOVERLAY <sourcename> <overlayname>
If the source file name does not have an extension, the rules of INCLUDED will be followed, checking for files with extensions ".BLD" ".FTH" ".F" ".CTL" ".SEQ" in that order. If the destination file name does not have an extension ".OVX" will be used. If the destination file name is not provided, the source file name is used with a ".OVX" extension. Thus, just typing MAKEOVERLAY FOO will compile FOO.FTH to create FOO.OVX. The overlay name held by the system is the output file specification as given or created by MAKEOVERLAY, converted to upper case. This is important when reloading the overlay.
If no extension is provided for LOADOVERLAY, a ".OVX" extension will be added to the file name. Thus LOADOVERLAY FOO will check if an overlay called FOO.OVX has been loaded, and will load from file FOO.OVX. Similarly, LOADOVERLAY FOO.OVX will check if an overlay called FOO.OVX has been loaded, and will load from file FOO.OVX.
Each overlay contains VFX Forth information, and overlays cannot be loaded by a version of VFX Forth other than the one that built it. A user defined version string can be added to the version control information using SETOVLVER, which takes the address of a counted string. The format of the string is entirely user defined; the overlay handler simply checks the strings for identity.
Note that this version of LIB\OVLVFX.FTH requires VFX Forth build 3.40.0808 of 15 March 2002 or later.
The following system state is preserved and restored by the overlay handler.
Overlays needed by the current overlay
Vocabularies and vocabulary link
Wordlists and wordlist link
Libraries
Imported functions
If you generate other system-wide chains, these will NOT be preserved. To preserve them, modify the code in LIB\OVLVFX.FTH using the xxxIMPORTLINK words as a model. Future versions of this code may support a chain of chains model, but this will require that ALL such chains are anchored in the VFX Forth kernel/application before any overlays are either generated or reloaded.
N.B. If you modify this code, please pass it back to MPE so that it can be incorporated in later builds. This will reduce your maintenance work, our technical support load, and you will benefit from the work of others.
The overlay is produced by comparing two versions of the binary at different addresses, and generating relocation information from any differences. If a relocation value does not correspond to another overlay or the VFX Forth kernel, the build of the overlay will cause an error. Such errors can be caused by anything that inadvertently changes the data or code generation of the two versions being compared.
If data space in the dictionary is not initialised at compile time, it may contain random data. Compare:
<size> BUFFER: <name> \ safe
CREATE <name> <size> ALLOT \ unsafe
CREATE <name> <size> ALLOT&ERASE \ safe
The initial conditions of directives that affect code generation must be the same for each build. At least the following directives should be considered:
+SHORT-BRANCHES -SHORT-BRANCHES branch code size
+SIN -SIN source inlining
+SIN-DOES -SIN-DOES DOES> clause inlining
Similarly the starting condition of BASE should also be considered. The compiler imposes the following initial condition before the overlay file is compiled:
DECIMAL -SHORT-BRANCHES +SIN +SINDOES
When compiling an overlay strict control of the initial search order is often necessary, especially because of redefinitions. We recommend that overlays are constructed from a build file which ensures that other required overlays are installed.
A sign of bad search order control is that the overlay can be correctly built with the source inliner turned off, but will not build with it on.
You cannot use file names with spaces, even though GETPATHSPEC is used to input the file names, because the file names are internally used as Forth word names.
There are occasions when a four-byte code sequence matches an address in another overlay, causing false relocation data to be generated. The result will be code that is corrupt after loading.
This situation has been drastically improved by the overhaul of 14 March 2002, but the warning has been left in until we are confident that all situations have been covered.
defer ovl-init-compile \ -- ; set initial state
A DEFERred word to set the initial compilation state for
both compilations of the overlay source code. The default
condition is:
decimal optimised -short-branches +sin +sindoes
Do not rely on this word being present in future releases. It is only present for experimental use with very large overlays.
: [dependencies \ -- ; set up dependency list
This word is used before MAKEOVERLAY below to define a list
of overlays required by the overlay to be made. It is followed
by a list of overlay file names as required by LOADOVERLAY below.
The list is terminated by DEPENDENCIES]. Use in the form:
[dependencies
primovl secovl ...
dependencies]
: $MakeOverlay \ c-addr1 u1 c-addr2 u2 --
Use the first string as the source file name and the second string
as the overlay name. This word constructs a MAKEOVERLAY string and
EVALUATES it. $MAKEOVERLAY is provided for the construction of
higher level overlay management functions.
: MakeOverlay \ "src" ["dest"] -- ; MAKEOVERLAY <buildfile> <overlay>
Creates an overlay by loading an input file, which can itself load
other files, and producing an output file. If the source file name
does not have an extension, the rules of INCLUDED will be followed,
checking for files with extensions ".BLD" ".FTH" ".F" ".CTL" ".SEQ"
in that order. If the destination file name does not have an extension
".OVX" will be used. If the destination file name is not provided,
the source file name is used with a ".OVX" extension. Thus, just
typing MAKEOVERLAY FOO will compile FOO.FTH to create FOO.OVX.
The overlay name held by the system is the output specification
as given. This is important when reloading the overlay.
The compiler imposes the following initial condition before the
overlay file is compiled:
DECIMAL -SHORT-BRANCHES +SIN +SINDOES
: SetOvlLoadHook \ xt -- ; ' <load-action> SETOVLLOADHOOK
This word sets the action to be performed whenever the overlay
is loaded from the file. This action is NOT called by LOADOVERLAY
if the overlay is already loaded. SETOVLLOADHOOK must be included
in the overlay load file.
: SetOvlReleaseHook \ xt -- ; ' <release-action> SETOVLRELEASEHOOK
This word sets the action to be performed when the overlay
is released. SETOVLRELEASEHOOK must be included
in the overlay load file.
: SetOvlVer \ c-addr --
Sets the address of a counted string added to the version control
information. All overlay loads will be checked against this string.
SETOVLVER must be used before MAKEOVERLAY. The string can be reset
at any time by 0 SETOVLVER.
: $OvlLoaded? \ c-addr u -- start true | 0 0
Converts the string to upper case and tests whether or not
the overlay has been loaded, returning its start address
in memory and true if loaded, or two zeros if not loaded.
See MAKEOVERLAY for a discussion of overlay names.
: $LoadOverlay \ c-addr u -- start|ior end|-1
Uses the given string as an overlay name, and reloads
the overlay if not already loaded. If the overlay name does
not have an extension, ".OVX" will be used. Any other required
overlays will be loaded before the requested overlay. The start
and end+1 address of the overlay code after installation are
returned. $LOADOVERLAY is provided for the construction of
higher level overlay management functions.
On error, the start and end values are replaced by ior and -1.
: LoadOverLay \ "name" -- ; LOADOVERLAY <name>
Load an overlay whose name follows in the input stream.
See $LOADOVERLAY for more details.
: .overlays \ -- ; display loaded overlays
Shows the names of the loaded overlays.
: lo \ "name" -- ; LO <name>
A synonym for LOADOVERLAY.
See $LOADOVERLAY for more details.
: mo \ "src" ["dest"] -- ; MO <buildfile> <overlay>
A synonym for MAKEOVERLAY.
: $ReleaseOverlay \ c-addr u -- ior
Release the overlay of the given name, returning a non-zero code if
the overlay was not loaded. The name is converted to upper case before
the comparison is performed. $RELEASEOVERLAY is provided for the
construction of higher level overlay management functions.
If the overlay was loaded when OVL_IN_DICT was set FALSE (the default),
overlays loaded after the specified one will also be removed.
If the overlay was loaded when OVL_IN_DICT was set TRUE, the overlay
is in the 'kernel' area of the dictionary, and any code compiled or loaded after
the overlay will also be removed. Overlays dependent on this one will
be removed.
: ReleaseOverlay \ "text" -- ; RELEASEOVERLAY <name>
Uses $RELEASEOVERLAY to release the overlay whose name follows.
See $RELEASEOVERLAY for more details.
: ro \ "text" -- ; RO <name>
A synonym for RELEASEOVERLAY.
See $RELEASEOVERLAY for more details.
: ReleaseAllOverlays \ --
Releases and unhooks all overlays. Executed automatically by
the Exit chain.
: ovl_in_dict \ -- addr ; true to load overlays in dictionary ; SFP022
Set this variable to TRUE to load overlays at the end of the
dictionary, rather than in memory allocated from the heap.
This is only required in special circumstances.
After overlays have been built, restore OVL_IN_DICT
to FALSE.
The code in Lib\XML.fth contains support for parsing XML input
and outputting XML using TYPE and friends. The parser
is derived from Jenny Brien's JenX parser published at EuroForth
and in the magazine ForthWrite. Additional code was taken
from a modified JenX parser by Leo Wong. The generic XML
description is by permission of Willem Botha of Construction
Computer Software (http://www.ccssa.com).
Additional tools required for XML handling are contained in this file. These may be moved to Lib\Win32\Helpers.fth in the future.
Since XML is non-proprietary and easy to read and write, it’s an excellent format for the interchange of data among different applications.
XML is a non-proprietary format, not encumbered by copyright, patent, trade secret, or any other sort of intellectual property restriction. It has been designed to be extremely powerful, while at the same time being easy for both human beings and computer programs to read and write. Thus it’s an obvious choice for exchange languages.
By using XML instead of a proprietary data format, you can use any tool that understands XML to work with your data.
XML is ideal for large and complex documents because the data is structured. It not only lets you specify a vocabulary that defines the elements in the document; it also lets you specify the relations between elements.
XML also provides a client-side include mechanism that integrates data from multiple sources and displays it as a single document.
XML doesn’t operate in a vacuum. Using XML as more than a data format requires interaction with a number of related technologies. These technologies include HTML for backward compatibility with legacy browsers, the CSS and XSL stylesheet languages, URLs and URIs, the XLL linking language, and the Unicode character set.
Since XML allows arbitrary tags to be included in a document, there isn’t any way for the browser to know in advance how each element should be displayed. When you send a document to a user you also need to send along a style sheet that tells the browser how to format individual elements. One kind of style sheet you can use is a Cascading Style Sheet (CSS).
CSS, initially designed for HTML, defines formatting properties like font size, font family, font weight, paragraph indentation, paragraph alignment, and other styles that can be applied to particular elements.
It’s easy to apply CSS rules to XML documents. You simply change the names of the tags you’re applying the rules to.
The Extensible Style Language (XSL) is a more advanced style-sheet language specifically designed for use with XML documents. XSL documents are themselves well-formed XML documents.
XSL documents contain a series of rules that apply to particular patterns of XML elements. An XSL processor reads an XML document and compares what it sees to the patterns in a style sheet. When a pattern from the XSL style sheet is recognized in the XML document, the rule outputs some combination of text.
XSL style sheets can rearrange and reorder elements. They can hide some elements and display others. Furthermore, they can choose the style to use not just based on the tag, but also on the contents and attributes of the tag, on the position of the tag in the document relative to other elements, and on a variety of other criteria.
XML documents can live on the Web, just like HTML and other documents. When they do, they are referred to by Uniform Resource Locators (URLs), just like HTML files.
Although URLs are well understood and well supported, the XML specification uses the more general Uniform Resource Identifier (URI). URIs are a more general architecture for locating resources on the Internet, that focus a little more on the resource and a little less on the location. In theory, a URI can find the closest copy of a mirrored document or locate a document that has been moved from one site to another.
As long as XML documents are posted on the Internet, you’re going to want to be able to address them and hot link between them. Standard HTML link tags can be used in XML documents, and HTML documents can link to XML documents.
XML lets you go further with XLinks for linking to documents and XPointers for addressing individual parts of a document.
XLinks enable any element to become a link, not just an A element. Furthermore, links can be bi-directional, multidirectional, or even point to multiple mirror sites from which the nearest is selected. XLinks use normal URLs to identify the site they’re linking to.
XPointers enable links to point not just to a particular document at a particular location, but to a particular part of a particular document. An XPointer can refer to a particular element of a document, to the first, the second, or the 17th such element, to the first element that’s a child of a given element, and so on. XPointers provide extremely powerful connections between documents that do not require the targeted document to contain additional markup just so its individual pieces can be linked to it. XPointers don’t just refer to a point in a document. They can point to ranges or spans.
XML defines a grammar for tags you can use to mark up a document. An XML document is marked up with XML tags. The default encoding for XML documents is Unicode.
Among other things, an XML document may contain hypertext links to other documents and resources. These links are created according to the XLink specification. XLinks identify the documents they’re linking to with URIs (in theory) or URLs (in practice). An XLink may further specify the individual part of a document it’s linking to. These parts are addressed via XPointers.
If an XML document is intended to be read by human beings—and not all XML documents are—then a style sheet provides instructions about how individual elements are formatted. The style sheet may be written in any of several style-sheet languages. CSS and XSL are the two most popular style-sheet languages, though there are others including DSSSL—the Document Style Semantics and Specification Language—on which XSL is based.
All parsing is processed using the input stream. This allows
XML files to be parsed by INCLUDE, and strings from
sockets to be processed by EVALUATE.
The XML parser parses tags "<...>" and the text between them,
called the contents. Inside a tag the text is separated into
the tag name and the attribute name/value pairs 'name="value"'.
Everything is held as text. Nested tags are supported.
Three DEFERred words, doTags ( -- ), doContents ( -- ) and
doAttribute ( val vlen name nlen -- ), must be supplied by the
application to handle the data. These
words are documented later. Their default action is to display
the data so that you can see what has been processed.
The parser just generates and isolates the text. It is up to your application how the data is processed by the three words above. When a tag is processed, the tag handling routine can find the current tag name, the tag type, any attributes and the preceding contents. The most common way to process tags and data is to ignore the contents before an opening tag, but to handle attributes. At the closing tag, the contents represent the data to be processed. Closing tag names include the leading '/' character so that opening and closing tags can be distinguished by name as well as status.
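A handler is installed by assigning an action to the relevant DEFERred word. The sketch below replaces the attribute handler with one that displays each pair as name="value". ShowAttribute is a hypothetical name, the stack effect is the one given above for doAttribute, and the use of IS to assign the action assumes that Lib\XML.fth is loaded in a system that supports the standard IS.

```forth
\ Sketch only: requires Lib\XML.fth. ShowAttribute is a hypothetical name.
: ShowAttribute  ( val vlen name nlen -- )
  cr type                 \ attribute name
  ." ="
  [char] " emit  type  [char] " emit ;   \ quoted attribute value

' ShowAttribute is doAttribute
```

Handlers for doTags and doContents take no parameters, so they would have to obtain the tag name and contents from the words documented later.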
Simple facilities are provided for generating XML text and tags from various types of data. These are designed to allow other scripting tools to generate XML output.
This section contains general-purpose tools which may be useful in other applications.
1 value .UnknownXML? \ -- flag
If non-zero (default), show unknown XML tags and attributes.
: movex \ src dest len --
An optimised version of MOVE.
: csplit \ addr len char -- raddr rlen laddr llen
Extract a substring at the start of addr/len, returning
the string raddr/rlen which includes char (if found) and
the string laddr/llen which contains the text to left of char.
If the string does not contain the character, raddr is
addr+len and rlen=0.
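The stack effect of csplit can be illustrated in standard Forth. The following is a sketch only and is not MPE's implementation; the names my-csplit and demo-csplit are hypothetical.

```forth
\ Hypothetical sketch of csplit's behaviour in standard Forth.
: my-csplit  \ addr len char -- raddr rlen laddr llen
  >r 2dup                      \ addr len addr len ; R: char
  begin  dup while             \ stop when the string is exhausted
    over c@ r@ <> while        \ stop when char is found
    1 /string                  \ step past a non-matching character
  repeat then
  r> drop                      \ addr len raddr rlen
  2swap 2 pick -               \ raddr rlen laddr llen
;

: demo-csplit  \ --
  s" name=value" [char] = my-csplit
  type ."  | " type cr ;       \ left part first, then the part from char
```

Executing demo-csplit shows the left string "name" followed by the right string "=value", which still includes the '=' character.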
: #>c \ caddr u -- char
Converts a decimal or hexadecimal number to a single integer.
In XML white space is defined by tab and CR. Under some circumstances LF may also be treated as white space.
: skip-white \ caddr u -- caddr' u'
Remove leading white space.
: scan-black \ caddr u -- caddr' u'
Remove leading spaces and control characters.
: scan-quote \ caddr u -- caddr' u'
Step forward until either a single or a double quote
character is found. The returned string includes the
quote character.
: scan-white \ caddr u -- caddr' u'
Step to next white space character.
: -trailing-white \ caddr u -- caddr' u'
Remove trailing white space.
: -leading-white \ caddr u -- caddr' u'
Remove leading white space. A synonym for skip-white.
: -white \ caddr u -- caddr' u'
Remove leading and trailing white space.
: >bl \ addr u -- addr u
Convert control characters to spaces.
The output formats are:
: date> \ day month year -- ud ; see month codes
Convert a day/month/year into a Gregorian day number.
1 1 1980 date> 2constant date0 \ -- ud
Defines day 0 as 1 Jan 1980 for dates.
: sdate> \ day month year -- u
Convert a day/month/year to a single day integer
based as above.
: >sdate \ u -- day month year
Convert a single day integer to day/month/year
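One way to derive a day number from a Gregorian date is sketched below in standard Forth. This is an assumption about a workable approach, not the library's code: the names my-day#, my-day0 and my-sdate> are hypothetical, the result is a single cell rather than the unsigned double returned by date>, and a Forth with cells of at least 32 bits is assumed.

```forth
decimal
\ Hypothetical sketch: single-cell Gregorian day number.
: my-day#  \ day month year -- n
  over 3 < if                  \ treat Jan/Feb as months 13/14 of the previous year
    1- swap 12 + swap
  then
  >r                           \ day month ; R: year
  3 - 153 * 2 + 5 /            \ days in the whole months since 1 March
  +                            \ add the day of the month
  r@ 365 * +                   \ whole years
  r@ 4 / +  r@ 100 / -  r> 400 / +   \ leap year corrections
;

1 1 1980 my-day# constant my-day0    \ day number of 1 Jan 1980

: my-sdate>  \ day month year -- u
  my-day# my-day0 - ;          \ days since 1 Jan 1980
```

With this sketch, 1 1 1980 my-sdate> returns 0 and 1 3 1980 my-sdate> returns 60, because 1980 is a leap year.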
Time of day may be stored as a single integer count of seconds. These routines provide conversion into secs/mins/hours format.
#24 #60 * #60 * constant secs/day \ -- 86400
Seconds per day.
#60 #60 * constant secs/hr \ -- 3600
Seconds per hour.
#60 constant secs/min \ -- 60
Seconds per minute.
#60 constant mins/hr \ -- 60
Minutes per hour.
#24 constant hrs/day \ -- 24
Hours per day.
: tod> \ ss mm hh -- secs
Convert a time of day in ss/mm/hh form to a single
integer.
: >tod \ secs -- ss mm hh
Convert a seconds integer to ss/mm/hh form.
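The two conversions follow directly from the constants above. A minimal sketch in standard Forth, using hypothetical my- prefixed names so that nothing in the library is redefined:

```forth
decimal
60 60 * constant my-secs/hr    \ hypothetical local copies of the constants
60 constant my-secs/min
60 constant my-mins/hr

: my-tod>  \ ss mm hh -- secs
  my-secs/hr *                 \ ss mm hours-as-seconds
  swap my-secs/min * +         \ ss (hours+minutes)-as-seconds
  + ;

: my->tod  \ secs -- ss mm hh
  my-secs/min /mod             \ ss total-minutes
  my-mins/hr /mod ;            \ ss mm hh
```

For example, 30 15 10 my-tod> gives 36930, and 36930 my->tod gives 30 15 10 again.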
Stackpads are effectively string stacks. String lengths are
kept as cells. Stackpads can be held in statically (ALLOTed)
or dynamically (ALLOCATEd) allocated memory. A stackpad must be
initialised by SINIT before use and terminated by STERM
after use. In this implementation, defined stackpads are
initialised at COLD and terminated at BYE.
Strings on a stackpad are held in the following format, where u is the length of the string in bytes:
len      contents
u        string text
?        padding to cell boundary
cell     u
The stackpad's top of stack pointer points to the length cell of the top item. To provide a valid cell, a zero length item is always created when the stackpad is initialised. Because the length cell is after the text, it is easy to manipulate the end of a string, to find the start address and to discard a string.
The requirement to align the length cell adds a little
complexity, but permits portability to processors which
require data alignment, e.g. ARM, and improves speed on
PCs. Stackpads are controlled using the /stackpad
structure below. The sp.ptos field contains the stack
pointer. The sp.buff field permits underflow checks.
The sp.len field permits overflow checks.
The other fields allow for automatic instantiation
and termination of dynamically allocated stackpads.
Implementations without error checking only need the stack
pointer and could use the first cell of the buffer as the
stack pointer.
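The layout described above can be modelled in a few lines of standard Forth. This sketch is an illustration under stated assumptions, not the library code: it uses a single fixed buffer, performs no overflow or underflow checks, and all demo- names are hypothetical.

```forth
\ Hypothetical single-stackpad model; no overflow or underflow checks.
create demo-buf 256 allot        \ data buffer, cell aligned by CREATE
variable demo-ptos               \ points to the length cell of the top item

: demo-sinit  \ --
  0 demo-buf !  demo-buf demo-ptos ! ;   \ zero-length item at the bottom

: demo-spush  \ caddr u --
  >r                             \ caddr ; R: u
  demo-ptos @ cell+              \ caddr dest ; text starts after old length cell
  dup r@ + aligned               \ caddr dest lp ; new length cell, cell aligned
  r@ over !                      \ store u in the new length cell
  demo-ptos !                    \ caddr dest
  r> move ;                      \ copy the text

: demo-stos  \ -- caddr u
  demo-ptos @ dup @ >r           \ lp ; R: u
  r@ aligned -  r> ;             \ step back over text and padding

: demo-sdrop  \ --
  demo-stos drop  1 cells -      \ address of the previous length cell
  demo-ptos ! ;
```

Because the length cell follows the text, demo-stos and demo-sdrop need only the top of stack pointer and the stored length to find the start of the string and the previous item.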
struct /stackpad \ -- len
Structure defining a stackpad.
variable spChain \ -- addr
Anchors the linked list of defined stackpads.
: sSpad: \ len -- ; -- spad
Create a static stackpad with ALLOTed control area
and data buffer.
: mSpad: \ len -- ; -- spad
Create a mixed stackpad with an ALLOTed control area
and an ALLOCATEd buffer.
: newSpad \ len -- spad
Create a dynamic stackpad with ALLOCATEd control area
and data buffer. A THROW occurs if the memory cannot
be allocated.
: sinit \ spad --
Initialise a stackpad. A THROW occurs on error.
: sterm \ spad --
Release dynamic memory if the given stackpad has it.
: initSpads \ --
Initialise all defined stackpads. Performed at COLD.
: termSpads \ --
Clean up all defined stackpads, releasing any dynamically
allocated memory. Performed at BYE.
: -align \ caddr -- addr'
Align a byte address to the previous cell boundary.
N.B. This word assumes a byte addressed 32 bit Forth.
: >spstr \ lp -- caddr u
Given a pointer to a length cell, return the string.
: >sps \ lp -- caddr
Given a pointer to a length cell, find the start of the
string.
: >spe \ lp -- caddr
Given a pointer to the length cell, find the address of the
character after the string.
: spush \ caddr u spad --
Push a string onto the stackpad.
: stos \ spad -- caddr u
Return the address and length of the top string. The string
is not popped.
: sdrop \ spad --
Discard top string from stackpad.
: spop \ spad -- caddr u
Return the address and length of the top string. The string
is popped. Note that the stackpad cannot safely be used until
all further processing of the string has been performed.
: snew \ spad --
Add a zero-length string.
: sappend \ caddr u spad --
Add the given string to the top stackpad string.
: s+char \ char spad --
Add the given character to the top stackpad string.
: .spad \ spad --
Display the strings on a stackpad.
Servants are a solution to CASE statements involving
strings. A wordlist is defined to hold the actions required
when a string is matched, the word names forming the strings
to be matched. A default action must be specified. Note that
in MPE Forths, the name search is case insensitive. Note also
that without extensions to the word creation mechanism, the
Because the strings are isolated in wordlists, calls may be
nested.
: (Servant) \ i*x caddr u wid xt -- j*x
Looks up caddr/u in the wid wordlist. If the
word is found, it is executed. If the word is not found,
the caddr/u string is passed to the default action
xt which is executed.
: servant \ wid xt -- ; i*x caddr u -- j*x
Servant creates a word that looks up caddr/u
in a given wordlist and executes the matching word if found
or a default word if not found. Servant is supplied
with the wid of the wordlist and the xt of the
default action.
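The servant mechanism can be approximated with the standard SEARCH-WORDLIST word. The following is a sketch in standard Forth rather than the library source; my-(servant), my-servant and the colour example are hypothetical names used only for illustration.

```forth
\ Hypothetical sketch of the servant mechanism in standard Forth.
: my-(servant)  \ i*x caddr u wid xt -- j*x
  >r >r 2dup r> search-wordlist  \ caddr u 0 | caddr u xt' flag ; R: xt
  if    r> drop  nip nip execute \ found: discard string and default, run match
  else  r> execute               \ not found: default action gets caddr/u
  then ;

: my-servant  \ wid xt -- ; i*x caddr u -- j*x
  create , ,                     \ store xt, then wid
  does>  2@ my-(servant) ;       \ 2@ returns wid xt

wordlist constant colours        \ holds the strings to match
get-current  colours set-current
: red    1 ;
: green  2 ;
set-current

: no-colour  \ caddr u -- 0
  2drop 0 ;                      \ default action for unknown strings

colours ' no-colour my-servant colour>n   \ caddr u -- n
```

Here s" green" colour>n returns 2, while an unknown string such as s" mauve" is passed to no-colour and returns 0.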
: creation \ wid --
Perform CREATE, but define the word in the specified
wordlist.
: def: \ wid --
Perform :, but define the word in the specified
wordlist.
cell +user CurrSpad \ -- addr
Holds the address of the stackpad being used for output.
cell +user RefillStatus \ -- addr
Holds non-zero when REFILL has failed.
#32 kb mSpad: TagText \ -- spad
Stackpad for tag text <tag ....>.
#32 kb mSpad: Contents \ -- spad
Stackpad for everything not in a tag.
#32 kb mSpad: Attribs \ -- spad
Stackpad for attribute handling in tags.
In XML code the special characters and numbers are encoded in the form:
&xxx;
This code allows substitution of the original character.
: UnknownEntity \ caddr u --
The default action is to check for a number, and if that
fails just to pass the string to the output buffer.
Note that the string includes the leading '&' but
not the trailing ';'.
wordlist constant entity? \ -- wid
The private wordlist used to contain action words for
known entities.
: centity \ char -- ; --
Children of this defining word add a character to the
current stackpad. The words are used by the servant
DENT below.
The following standard XML entities are predefined:
char < centity &lt
char > centity &gt
char ' centity &APOS
char " centity &quot
char & centity &amp
entity? ' UnknownEntity servant dent \ caddr u --
A servant which converts known entities and XML numbers
of the form &#xxx; to characters or just copies the
string to the current stackpad.
: dents+ \ caddr u --
Add the string to the top of the current stackpad,
decoding and translating any entities.
: .Tag \ --
Default action of doTags below.
: .Contents \ --
Default action of doContents below.
: .Attribute \ val vlen name nlen --
Display the attribute name and value strings.
defer doTags \ --
User defined action (default .Tag) that handles
tag strings. The tag handlers are responsible for all
handling of the Contents stackpad. The top string
on the TagText stackpad is discarded after processing
the tag text.
defer doContents \ --
User defined action (default .Contents) that handles
content strings. The Contents stackpad is not
discarded by doContents.
defer doAttribute \ val vlen name nlen --
Process an attribute given strings for the value and name.
The default action is to display the attribute.
: DefXML \ --
Set the default XML handlers.
vocabulary inputTags \ --
Vocabulary containing tag actions on input.
' inputTags voc>wid constant widInputs \ -- wid
Wordlist containing tag actions on input.
vocabulary outputTags \ --
Vocabulary containing tag actions on output.
' outputTags voc>wid constant widOutputs \ -- wid
Wordlist containing tag actions on output.
#256 buffer: CurrName \ -- addr
Buffer for the current tag name.
Held as a counted string.
For multi-threaded use this should be redefined
as thread-local storage.
#256 buffer: LastName
Buffer for the previous tag name.
Held as a counted string.
For multi-threaded use this should be redefined
as thread-local storage.
variable TagStatus
Status indicator for the current tag.
For multi-threaded use this should be redefined
as thread-local storage. The tag status is a bit mask in the
bottom 16 bits of a cell. The upper 16 bits are reserved for
application use.
$0000 equ OPENING_TAG
$0001 equ CLOSING_TAG
$0002 equ EMPTY_TAG
$0100 equ PI_TAG
$0200 equ SPECIAL_TAG
variable LastStatus
Status indicator for the previous tag.
For multi-threaded use this should be redefined
as thread-local storage.
: defInputTag \ caddr u --
The default action for an unknown tag is to display the
content and tag strings.
widInputs ' defInputTag servant doInputTag \ caddr u --
Processes input tags given a tag name string.
: getTagName \ caddr u -- caddr' u' name nlen
From the given string, return the remaining string
and the tag name, which is the first whitespace
delimited token. Note that tag names include leading
'?' and '!' characters.
: getAttribName \ caddr u -- caddr' u' name nlen
From the given string, return the remaining string
and the attribute name, which is the first whitespace
delimited token before an '=' character.
: getAttribValue \ caddr u -- caddr' u' value vlen
From the given string, return the remaining string
and the attribute value string, which is enclosed
by quotation marks ' or ".
: getAttribute \ caddr u -- caddr' u'
From the given string extract an attribute name/value pair,
pass it to the deferred word doAttribute and return
the remaining string. Attributes are of the form:
name = "value"
: SetTagStatus \ --
Set the tag status for opening/closing/empty, and
for processing instruction and specials (the !xxx tags).
: doTagText \ caddr u --
Parse the tag text <text...> excluding the brackets,
extracting the tag name and the attributes.
: RunInputTag \ --
The tag handler action of doTags for active processing
of XML tags.
: ActiveXML \ --
Set the active XML handlers, so that known tags are
processed.
: AsFarAs \ char -- flag caddr u
Parse input stream up to char, returning the extracted
string.
: withText \ newspad -- oldspad
Start a new string on the given stackpad for a block of
processing and make it the current stackpad. Return
the previous current stackpad.
: doneText \ oldspad --
Discard the current stackpad string and restore
the previous stackpad.
: doXMLblock \ char --
Collect input text up to the terminating character
into the current stackpad, and expand entities.
: skipPast \ c-addr u --
Step through the input stream for a string (not space
delimited), REFILLing as necessary until the string
is found or input is exhausted.
: doTagBlock \ x --
Process a tag block "<name ... >" starting immediately after
the leading '<' character. The tag text is discarded after
the tag has been processed. If x is non-zero, the tag
is initialised to "?xml".
: doContentBlock \ --
Process a content block up to but not including
the trailing '<' character.
: ReadXML \ --
Read XML from the current input stream.
: <?xml \ --
After <?xml has been executed, all further input is
treated as XML source and handled by the XML parser.
These words are factors that can be used when constructing systems that extract and produce data in XML files. When producing an XML file, data is output by primitives that take the address of the data. When reading an XML file, data is set by primitives that take a string and the address of the data.
XML text output of tag or content data must not contain the special characters, which must be converted to the standard entity format "&xxx;".
: XMLemit \ char --
Output a character translating the special characters.
: XMLtype \ caddr len --
Output a string translating the special characters.
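The translation of special characters on output can be sketched in standard Forth as follows. The names my-xmlemit and my-xmltype are hypothetical, and the set of translated characters is assumed to be the five standard XML entities; the library's own words may differ in detail.

```forth
\ Hypothetical sketch of entity translation on output.
: my-xmlemit  \ char --
  case
    [char] < of  ." &lt;"    endof
    [char] > of  ." &gt;"    endof
    [char] & of  ." &amp;"   endof
    [char] " of  ." &quot;"  endof
    [char] ' of  ." &apos;"  endof
    dup emit                     \ all other characters pass through
  endcase ;

: my-xmltype  \ caddr len --
  over + swap ?do  i c@ my-xmlemit  loop ;
```

With this sketch, typing the string a<b through my-xmltype displays a&lt;b.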
: ud#>cl \ ud -- caddr len
Convert an unsigned double to a decimal text string.
: d#>cl \ d -- caddr len
Convert a signed double to a decimal text string.
: cl>d# \ caddr len -- d
Convert the string to a double number.
: cl>ud# \ caddr len -- ud
Convert the string to an unsigned double number.
: ?i \ addr --
Display the contents of a signed 32 bit integer.
: !i \ caddr len dest --
Set the contents of a signed 32 bit integer.
: ?ui \ addr --
Display the contents of an unsigned 32 bit integer.
: !ui \ caddr len dest --
Set the contents of an unsigned 32 bit integer.
: ?d \ addr --
Display the contents of a signed 64 bit integer in Forth format
(high cell at low address).
: !d \ caddr len dest --
Set the contents of a signed 64 bit integer in Forth format
(high cell at low address).
: ?ud \ addr --
Display the contents of an unsigned 64 bit integer in Forth format
(high cell at low address).
: !ud \ caddr len dest --
Set the contents of an unsigned 64 bit integer in Forth format
(high cell at low address).
: ?dI \ addr --
Display the contents of a signed 64 bit integer in Intel format
(low cell at low address).
: !dI \ caddr len dest --
Set the contents of a signed 64 bit integer in Intel format
(low cell at low address).
: ?udI \ addr --
Display the contents of an unsigned 64 bit integer in Intel format
(low cell at low address).
: !udI \ caddr len dest --
Set the contents of an unsigned 64 bit integer in Intel format
(low cell at low address).
: cl>f# \ caddr u -- ; F: -- f
Convert a string to a floating point number. If a conversion
fault occurs, f is set to zero.
: ?fs \ addr --
Display the contents of 32 bit float.
: !fs \ caddr u dest --
Set the contents of a 32 bit float.
: ?fd \ addr --
Display the contents of 64 bit float.
: !fd \ caddr u dest --
Set the contents of a 64 bit float.
: ?ft \ addr --
Display the contents of an 80 bit float.
: !ft \ caddr u dest --
Set the contents of an 80 bit float.
: .string \ caddr len --
Output the given string in XML format.
: ?cstring \ caddr --
Output a Forth counted string.
: !cstring \ caddr len dest --
Set a Forth counted string.
: ?wstring \ caddr --
Output a word (16 bits) counted string.
: !wstring \ caddr len dest --
Set a word (16 bits) counted string.
: ?lstring \ caddr --
Output a cell (32 bits) counted string.
: !lstring \ caddr len dest --
Set a cell (32 bits) counted string.
: .xuw \ u w --
Display the unsigned number u as w digits.
: .xdate \ day month year --
Output a date in XML format "CCYY-MM-DD".
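Fixed-width fields such as those in "CCYY-MM-DD" can be produced with pictured numeric output. The sketch below is written in standard Forth under the assumption that BASE is decimal; my-.xuw and my-.xdate are hypothetical names and are not the library definitions.

```forth
decimal
\ Hypothetical sketch: fixed-width unsigned output.
: my-.xuw  \ u w --
  >r 0 <#                      \ u becomes an unsigned double
  r> 0 ?do # loop              \ convert exactly w digits, zero padded
  #> type ;

: my-.xdate  \ day month year --
  4 my-.xuw  [char] - emit     \ CCYY-
  2 my-.xuw  [char] - emit     \ MM-
  2 my-.xuw ;                  \ DD
```

For example, 7 3 2024 my-.xdate displays 2024-03-07. Note that this sketch silently drops any digits that do not fit in the requested width.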
: .xtime \ secs mins hours --
Output a time in XML format "HH-MM-SS".
: .xdateTime \ secs mins hours day month year --
Output a date/time in XML format. No time zone is output.
: .tz \ mins --
Output a time zone indicator as an offset from UTC
in minutes.
: xdt-utc \ secs mins hours day month year --
Output a date/time in XML format. UTC is indicated.
: xdt-zone \ secs mins hours day month year zmins --
Output a date/time in XML format. The time zone is indicated
by a signed offset in minutes.
: .GenTag \ caddr len --
Display the text as a tag "<...>".
Standard entities are encoded.
: .GenTag+ \ attr alen name nlen --
Display attribute and tag name text as "<name attr>".
Standard entities are encoded.
: .ClosingTag \ caddr len --
Display the text as a closing tag "</...>".
Standard entities are encoded.
: .EmptyTag \ caddr len --
Display the text as an empty tag "<.../>".
Standard entities are encoded.
initSpads ActiveXML
Application configuration can be done in a number of ways, especially under Windows.
Registry  | A user nightmare to copy from one machine to another
INI files | Very slow for large configurations (before mpeparser.dll)
binary    | Usually incompatible between versions
database  | Big and often similar to binary
Forth     | Already there, needs changes to interpreter. Independent of operating system.
A solution to this problem is available in Lib/ConfigTools.fth. Before compiling the file, ensure that the file GenIO device from Lib/Genio/FILE.FTH has been compiled.
The Forth interpreter is already available, but we have to consider how to handle incompatibilities between configuration files and issued versions of applications. The two basic solutions are to abort on error, or to ignore the offending line and carry on.
The abort on error solution is already available - it just
requires the caller of included to provide some
additional clean up code.
: CfgIncluded \ caddr len --
-source-files \ don't add source file names
['] included catch
if 2drop endif \ clean stack on error
+source-files \ restore source action
;
In VFX Forth, INTERPRET is used to process lines of
input. INTERPRET is DEFERred and the default
action is (INTERPRET). The maximum line size
(including CR/LF) is FILETIBSZ, which is currently
512 bytes. If we restrict each configuration unit to one
line of source code, we can protect the system by ignoring
the line if an error occurs. We also have to introduce the
convention in configuration files that actions are performed
by the last word on the line (except for any parsing).
This action has to be installed and removed, leading to
the following code.
: CfgInterp \ --
\ Interprets a line, discarding it on error.
['] (interpret) catch
if postpone \ endif
;
: CfgIncluded \ caddr len --
\ Interprets a file, discarding lines with errors.
-source-files \ don't add source file names
behavior interpret >r
['] CfgInterp is interpret
['] included catch
if 2drop endif \ clean stack on error
r> is interpret
+source-files \ restore source action
;
: CfgInterp \ --
A protected version of (INTERPRET) which discards any
line that causes an error.
: CfgIncluded \ caddr len --
A protected version of INCLUDED which discards any
line that causes an error, and carries on through the
source file.
: [SaveConfig \ caddr len -- struct|0
Starts saving a configuration file. Creates a configuration
file and allocates required resources, returning a structure
on success or zero on error. On success, the
returned struct contains the sid for the file at
the start of struct.
: SaveConfig] \ struct --
Ends saving a file device by closing the file, releasing
resources and restoring the previous output device.
: SaveConfig \ caddr len xt --
Save the configuration file, using xt to generate the
text using TYPE and friends. The word defined by
xt must have no stack effect.
We chose to support five types of configuration data:
single integers, double integers, counted strings, zero
terminated strings and memory blocks. Single integers are
supported as variables directly and as values with addr.
All numeric output is done in hexadecimal to save space,
and to avoid problems with BASE overrides. All words
which generate configuration information must be used
in colon definitions.
: \Emit \ char --
Output a printable character in its escaped form.
: \Type \ caddr len --
Output a printable string in its escaped form.
: .cfg$ \ caddr len --
Output a string in its escaped form, characters in the
escape table being converted to their escaped form. The
string is output as Forth source text, e.g.
s\" escaped text\n\n"
: .sint \ x --
Output x as a hex number with a leading '$' and a trailing
space, e.g.
$1234:ABCD
Single integers are saved by .SintVar and .SintVal.
' (SintVar) SimpleCfg: .SIntVar \ "<name>" --
Saves a single integer as a string. <name> must
be a Forth word that returns a valid address. Generates
$abcd <name> !
Use in the form:
.SIntVar MyVar
' (SintVal) SimpleCfg: .SIntVal \ "<name>" --
Saves a VALUE called <name>. Generates
$abcd to <name>
Use in the form:
.SIntVal MyVal
Double integers are saved by .DintVar.
' (DintVar) SimpleCfg: .DIntVar \ "<name>" --
Saves a double integer as a string. <name> must
be a Forth word that returns a valid address. Generates
$01234 $abcd <name> 2!
Use in the form:
.DIntVar MyVar
Counted strings are saved by .C$var.
' (c$cfg) SimpleCfg: .C$var \ "<name>" --
Saves a counted string at <name>, which must
be a Forth word that returns a valid address. Generates
s\" <text>" <name> place
Use in the form:
.C$Var MyCstring
Zero terminated strings are saved by .Z$var.
' (z$cfg) SimpleCfg: .Z$var \ "<name>" --
Saves a zero terminated string at <name>, which must
be a Forth word that returns a valid address. The output
consists of one or more lines of source code, following
lines being appended to the first.
s\" <text>" <name> zplace
s\" <more text>" <name> zAppend
...
Use in the form:
.Z$var MyZstring
Memory blocks are output by
.Mem <name> len
<Name> must be a Forth word that returns a valid
address. Len must be a constant or a number.
The output takes one of three forms, depending on len.
bmem <name> num $ab $cd ...
wmem <name> num $abcd $1234 ...
lmem <name> num $1234:5678 $90ab:cdef ...
: BMEM \ "<name>" "len" --
Imports a memory block output in byte units by .Mem.
: WMEM \ "<name>" "len" --
Imports a memory block output in word (2 byte) units by .Mem.
: LMEM \ "<name>" "len" --
Imports a memory block output in cell (4 byte) units by .Mem.