The MPE cross compiler has a built-in cross-assembler. This gives
you the ability to define new Forth words in assembler as well as
in Forth. You can also assemble code to anywhere in memory.
This section is not a treatise on ARM Cortex assembly language
programming. The essential document for Cortex-M is the ARMv7-M
Architecture Reference Manual, which is available from www.arm.com
as a PDF file, document reference ARM DDI 0403. You may have to register
to download it. The ARM 32 bit instruction set is documented in
ARM DDI 0100. A copy is provided in the Docs\ARM folder
in PDF format. The ARMv6 instruction set used in the Raspberry Pi
is documented as an appendix to the ARMv7-A Architecture
Reference Manual.
The full Thumb-2 instruction set for Cortex-M3/M4 is supported. The instruction notation is very similar to that provided with the MPE ARM compiler. The Cortex assembler notation is very close to that referred to by ARM as UAL (Unified Assembly Language), which was designed to improve portability between ARM and Cortex at the assembler source code level.
This assembler can be switched between Cortex and ARM instruction sets in order to support the Cortex-A profile. Note that the Thumb-2 instruction set can be regarded as an encoding of the ARM instruction set with better code density and features to support system programming. That ARM have achieved this objective is indicated by the fact that with only minor extensions to the compiler (see "Intrinsics" in the code generator chapter), there is no assembly code in the Forth start-up files.
By default, the assembler and VFX code generator are set to use the legacy ARM instruction set with the TOS register set to R10. This is the configuration used by VFX Forth for ARM Linux.
Forth is compact and quick, so why write in assembler? An assembler definition is normally faster than a group of corresponding Forth words. For Cortex CPUs, hand coding can improve performance by keeping more data in registers in loops and by taking more advantage of conditional execution.
That having been said, MPE does not write Cortex or ARM code (even interrupt handlers) in assembler except in very rare cases.
Forth words can easily be defined in assembler. They increase the execution speed of your code and can sometimes make your code smaller.
Forth words written in assembler follow a similar form to a word written in Forth. Instead of a colon you have CODE. Instead of a semi-colon you have END-CODE. For example:
CODE <name>
...
...
NEXT,
END-CODE
creates a word called <name>. Any assembler code between the CODE and END-CODE will be assembled into the word. When executed, the macro NEXT, will stop the execution of the assembler and return to the calling word.
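As a minimal sketch of the form above, the following definition adds one cell (four bytes) to the top of the data stack. The name MY-CELL+ is hypothetical and chosen only for illustration. The sketch assumes the default register assignments described later in this chapter, where the top of stack is held in the register named TOS.

```forth
\ Sketch only: MY-CELL+ is a hypothetical name, not a kernel word.
CODE MY-CELL+   \ addr1 -- addr2
  add tos, tos, # 4     \ TOS is held in a register, so no memory access
  next,                 \ return to the calling word
END-CODE
```

Once compiled, the word would be used like any other Forth word, by stating its name.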
The syntax used for the opcodes has been kept similar to the standard ARM syntax. See the list at the end of the chapter for a comparison of ARM versus Forth syntax.
As a company, ARM changes or extends the instruction sets in use at any time as new cores are designed. By normal Forth standards, this assembler is a huge piece of code. Forgive us for deviations from the expected instruction syntax; in places our assembler notation reflects ease of implementation.
Registers are defined as follows.
Register | Description |
---|---|
Rn | The familiar integer registers R0..R15 |
CRn | Coprocessor registers CR0..CR15 |
xPSR | Processor status registers, e.g. CPSR and SPSR |
Sn | Single-precision (32 bit) floating point or vector register S0..S31 |
Dn | Double-precision (64 bit) floating point or vector register D0..D15/31 |
Qn | Quadruple (128 bit) vector register, Q0..Q15 |
Note that the floating point and vector registers overlap. The mapping between the registers is as follows:
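In the usual ARM floating point and vector register aliasing, each of D0..D15 covers a pair of S registers and each Q register covers a pair of D registers. The sketch below expresses that arithmetic in plain Forth. The words D>S and Q>D are hypothetical helpers written for this illustration and are not part of the assembler.

```forth
\ Hypothetical helper words, for illustration only.
\ Dn overlaps S(2n) and S(2n+1); valid for D0..D15.
: D>S  ( d# -- s-lo# s-hi# )  2* dup 1+ ;
\ Qn overlaps D(2n) and D(2n+1).
: Q>D  ( q# -- d-lo# d-hi# )  2* dup 1+ ;

3 D>S    \ leaves 6 7 : D3 overlaps S6 and S7
2drop
1 Q>D    \ leaves 2 3 : Q1 overlaps D2 and D3
2drop
```

A consequence is that writing to a D register changes both of the S registers it covers.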
Where the instruction cannot distinguish between floating point and integer operation, use the
The Forth interpreter and compiler use some of the target processor's registers. These must be preserved if they are used in the assembler. They can be saved on the stack, in memory or in other registers and restored at the end of the word. The ARM registers that are used are shown in the table below.
Cortex register | Forth register | Notes |
---|---|---|
R15 or PC | IP | The program counter. Altering this register will cause the processor to jump to a new address. |
R14 or LINK | _ | The link register. When a subroutine is entered via the BL instruction, the return address is cached in R14. If a further BL is to be executed within the subroutine, remember to save the contents of R14, usually on the return stack. Note that in Thumb mode, bit 0 will be set to 1 to indicate that the CPU is in Thumb mode. |
R13 or SP | RSP | Forth return stack pointer, do not change this without good reason. This stack holds return addresses. Note that when entering a subroutine or word via the BL instruction the return address is cached in R14, the link register. R13 is used in many ARM systems as a stack pointer. |
R12 | PSP (1) | Forth data stack pointer, do not change this without good reason. Use it for passing parameters between words. When writing assembler code, use PSP rather than R12. Future versions of the compiler may use a different register for the data stack pointer. |
R11 | UP | Pointer to the base of the current User Area. |
R10 | TOS (1) | Currently the default register for TOS for ARM32, but this may change when interworking ARM and Cortex code. |
R9 | LP | Local variable frame pointer. |
R8 | -- | Currently unused, but we have plans for it. |
R7 | TOS (2) | Instead of holding the top item of the Forth data stack in main memory, it is held in a register. This allows many simple operations to execute faster, and it also reduces the amount of memory traffic. For hosted systems such as VFX Forth for Linux, TOS will be in R10 by default, but for code density in the Thumb-2 instruction set, R7 is a better choice. |
R6 | PSP (2) | For best code density with the Thumb-2 instruction set, R6 is a better choice than R12 as the PSP. |
R0..R6 | _ | Scratch |
ARM register | Forth register | Notes |
---|---|---|
R15 or PC | IP | The ARM program counter. Altering this register will cause the processor to jump to a new address. |
R14 or LINK | _ | The ARM link register. When a subroutine is entered via the BL instruction, the return address is cached in R14. If a further BL is to be executed within the subroutine, remember to save the contents of R14, usually on the return stack. |
R13 or SP | RSP | Forth return stack pointer, do not change this without good reason. This stack holds return addresses. Note that when entering a subroutine or word via the BL instruction the return address is cached in R14, the link register. R13 is used in many ARM systems as a stack pointer. |
R12 | PSP | Forth data stack pointer, do not change this without good reason. Use it for passing parameters between words. |
R11 | UP | Pointer to the base of the current User Area. |
R10 | TOS | Instead of holding the top item of the Forth data stack in main memory, it is held in a register. This allows many simple operations to execute faster, and it also reduces the amount of memory traffic. |
R9 | LP | Local variable frame pointer. |
R0..R8 | _ | Scratch |
A Forth word written in assembler is executed in the same way as a word written in Forth: by stating its name.
Assembler code can be assembled into memory without being part of a Forth word. To turn on the assembler, use the word AsmCode. To switch back to Forth use the word End-Code. Between these two words, any assembler will be assembled. The assembled code will be placed in the dictionary without a header. The code can be executed by the use of labels. This is often used to define low-level interrupts. See the chapter on Interrupts for more details on writing low-level interrupts.
The cross compiler allows you to define the run-time (DOES>) part of a defining word in assembler. To do this use ;CODE in the form:
: <name>
CREATE
...
;CODE
...
END-CODE
An example is shown below:
: VARIABLE \ <spaces>name -- ; -- addr
CREATE \ Create header
0 , \ Initial value
;CODE \ Run-time action
\ Cortex version
str tos, [ psp, # -4 ] ! \ save TOS
ldr tos, [ link, # -1 ] \ get const val/addr from after BL
pop { pc }
END-CODE
\ ARM version
stmfd psp ! { tos } \ Save TOS
ldr tos, [ link ], # 4 \ get pointer to data
ldr tos, [ tos ] \ get variable address
NEXT,
END-CODE
Three facilities are available to give you the advantages of structured programming in assembler: control structures, labels and macros.
There are assembler equivalents to the Forth control structures. The available structures are:
AHEAD, ... THEN, or ENDIF,
cc IF, ... THEN, or ENDIF,
cc IF, ... ELSE, ... THEN, or ENDIF,
BEGIN, ... cc UNTIL,
BEGIN, ... cc WHILE, ... REPEAT,
BEGIN, ... AGAIN,
where cc is one of the condition codes in the table below.
ARM | Forth | Condition | ARM | Forth | Condition |
---|---|---|---|---|---|
.CS | CS, | carry set | .NE | NE, | not equal or non-zero |
.CC | CC, | carry clear | .GE | GE, | greater than or equal |
.PL | PL, | plus - positive or zero | .LT | LT, | less than |
.MI | MI, | minus - negative | .GT | GT, | greater than |
.VS | VS, | overflow set | .LS | LS, | unsigned less than or equal (same) |
.VC | VC, | overflow clear | .HS | HS, | unsigned greater than or equal (same). Same as CS |
.LE | LE, | less than or equal | .LO | LO, | unsigned less than. Same as CC |
.EQ | EQ, | equal or zero | .HI | HI, | unsigned greater than |
.AL | * Always (default) |
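The structures and condition codes above can be combined as in the following sketch. The word names MY-ABS and SPIN are hypothetical, and the code assumes the default stack model described earlier: TOS in a register and PSP pointing at the second stack item in memory.

```forth
\ Sketch only: MY-ABS and SPIN are hypothetical names.
CODE MY-ABS   \ n -- u
  cmp tos, # 0
  LT, IF,                   \ body runs only if TOS < 0
    rsb .s tos, tos, # 0    \ tos = 0 - tos
  ENDIF,
  next,
END-CODE

CODE SPIN   \ n --  ; n must be greater than zero
  BEGIN,
    sub .s tos, tos, # 1    \ count down, setting the flags
  EQ, UNTIL,                \ loop until the result is zero
  ldr tos, [ psp ], # 4     \ drop: reload TOS from the data stack
  next,
END-CODE
```

The structured forms keep the branch targets out of the source, at the cost of the assembler laying down branch instructions for you.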
The Thumb-2 instruction set does not support the conditional execution facilities of the ARM instruction set. Instead, it provides the IT instruction.
In order to avoid performance penalties caused by taken branches and associated cache flushes, the Thumb-2 instruction set provides the ITxxx <cond> instruction. This permits up to four instructions to be executed depending on the condition flags at the start. The first instruction is executed if <cond> is true. The next three instructions are executed if <cond> is true and x=T or if <cond> is false and x=E. The following example illustrates the use of the IT instruction.
CODE WITHIN? \ n1 n2 n3 -- flag
\ Return TRUE if N1 is within the range N2..N3.
\ This word uses signed arithmetic.
ldmfd psp ! { r0, r1 }
mov .s r2, # 0
mov .s r3, # 0
cmp r1, r0
it .ge \ next instruction if condition met
mov r2, # 1
cmp r1, tos
it .le \ next instruction if condition met
mov r3, # 1
tst r2, r3
ite .ne
mvn tos, # 0 \ if condition met
mov tos, # 0 \ if condition not met
next,
END-CODE
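A shorter use of a single IT instruction is sketched below, following the notation of the example above. The name MY-MAX is hypothetical, and the code assumes TOS in a register with PSP pointing at the second stack item.

```forth
\ Sketch only: MY-MAX is a hypothetical name.
CODE MY-MAX   \ n1 n2 -- n3
  ldr r0, [ psp ], # 4    \ pop n1 into R0; n2 is already in TOS
  cmp r0, tos             \ signed comparison of n1 with n2
  it .gt                  \ next instruction only if n1 > n2
  mov tos, r0
  next,
END-CODE
```

No branch is assembled, so the pipeline is not disturbed whichever value is larger.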
The ARM is different from many processors in that many instructions can be executed conditionally depending on the processor status flags, by appending one of the mnemonics in the table above to the instruction. An instruction without a condition suffix is assumed to use .AL. Note that most instructions (except the test and compare instructions) do not set the status flags by default. This has to be done with the .S suffix:
ADD .S R0, R1, R2 \ Add, set condition codes
ADD .NE .S R0, R1, R2 \ if NE and set condition codes
CS, IF, \ do between IF, and ENDIF, if CS set
...
ENDIF,
It is often quicker to avoid short jumps in code, such as those typically generated by IF, statements, by using conditionally executed instructions. Skipping several instructions is generally faster than using a branch instruction, because a taken branch flushes the processor pipeline. See the file CODEARM.FTH for examples of conditional execution.
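As a sketch of conditional execution replacing a short jump, the definition below negates the top of stack only when it is negative. It applies to the ARM instruction set, not Thumb-2, and the name ARM-ABS is hypothetical.

```forth
\ Sketch only, ARM instruction set: ARM-ABS is a hypothetical name.
CODE ARM-ABS   \ n -- u
  cmp tos, # 0
  rsb .lt tos, tos, # 0    \ executed only if TOS < 0
  next,
END-CODE
```

The conditional RSB takes the place of an IF, ... ENDIF, pair around an unconditional RSB.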
Labels can be used to mark a place in assembler code. That place can then be referenced in other areas of code.
Labels can be defined by using the command L: <name>. It is used in the form:
l: <name>
where <name> is what you want to call the label.
A label is referenced by stating its name. For example,
B .EQ <name>
If you need to use labels within a code definition, you may use the local labels provided. These are used just as normal labels in the assembler, but some restrictions apply:
CODE
or ;CODE
.L$1:
To reference a local label, type its name. For example,
B L$1
assembles code for a branch to L$1:.
A macro is a word that lays down code 'in-line' within an assembler definition. Macros are used when there is a repetitive use of a series of opcodes.
The easiest way to create a macro is to use the word MACRO:. The macro below can be used as a divide step operation.
macro: Udiv63/31_step \ --
adc .s r1, r1, r1
adc .s r0, tos, r0, lsl # 1
sub .cc r0, r0, tos
;m
The assembler's prefix notation leads to some peculiarities when writing macros.
ADC above, assemble the code for the previous instruction, so you may not know the stack conditions. Put literal data on the return stack in the macro, and retrieve it as needed.
R# ( n -- ) for general purpose registers, CR# ( n -- ) for coprocessor registers, and SR# ( n -- ) for status registers (0=CPSR, 1=SPSR).
A macro can also be defined using colon and semi-colon before CROSS-COMPILE is executed. It must be defined in the cross compiler's ASM-ACCESS vocabulary.
The place to create a macro is in the control file and it must be defined before CROSS-COMPILE. As an example, the macro NEXT, is shown below. NEXT, is defined as a macro, so each time it is used, its code is laid down. This makes it quicker than calling a subroutine.
\ switch to cross compilers assembler vocab
FORTH ALSO C-C ALSO ASSEMBLER
ALSO ASM-ACCESS DEFINITIONS
\ define NEXT
: NEXT, \ -- ; lay in-line next code
bx lr
;
\ switch back to normal forth vocabulary
ONLY FORTH DEFINITIONS
A macro is used by stating its name. For example, in a CODE definition, NEXT, is a macro.
It is possible to disassemble compiled words using:
XDASM <name>
DIS <name>
This can be done during compilation by including an XDASM statement in the control file, or interactively after compilation by including the word INTERACTIVE before FINIS.
The instruction set of the processor is extended on various processor cores. The selection available in this assembler is for ARM32 with/without Thumb-1 and the Cortex-M0, M1, M3 and M4.
The number base in the Forth assembler can be indicated by BINARY, DECIMAL, and HEX. In addition, numbers prefixed by the '$', '#' and '%' characters are treated as special cases. These characters affect the number base for that number only. Note that the characters '$' and '%' follow Motorola usage. Note also that the '#' symbol attached to a number is not the same as the word # that indicates immediate addressing.
Symbol | Base | Example |
---|---|---|
$ | hex | $55AA |
# | decimal | #1234 |
% | binary | %1011001 |
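The difference between the word # and the '#' prefix can be seen in the following sketch, in which each instruction loads the same value, 100 decimal, into a scratch register. The name PREFIX-DEMO is hypothetical.

```forth
\ Sketch only: PREFIX-DEMO is a hypothetical name.
CODE PREFIX-DEMO   \ --
  mov r0, # #100        \ the word # selects immediate addressing; #100 is decimal
  mov r0, # $64         \ the same value in hexadecimal
  mov r0, # %1100100    \ the same value in binary
  next,
END-CODE
```

Because the prefix applies to one number only, the result does not depend on whether BINARY, DECIMAL or HEX is currently in force.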
Thumb-2 literals are encoded in a number of ways. The term imm refers to an immediate value that is directly encoded across a number of bit fields. The term const refers to an immediate value that comes from a restricted but wide range. See ARM DDI 0403 for the gory details.
Items in square brackets are optional, e.g. [.s] indicates that .s to set the condition flags is optional. When one of the ITxxx instructions applies, do not apply .s. The assembler will do what it has to.
The term <shift> means one of
.LSL # n logical left shift by n
.LSR # n logical right shift by n
.ASR # n arithmetic right shift by n
.ROR # n rotate right by n
.RRX 1 bit rotate right, carry in to new bit 31,
old bit 0 to carry out.
You can leave out <shift> and it will be encoded as LSL # 0. Note that when .S applies, the rules for the final value of the carry flag are a bit arcane.
You can force an instruction to use the 16 bit form by using the .n indicator. You can force the 32 bit form using the .w indicator. Without either of these, the assembler will choose the shortest form. Using R0..R7 (the low registers) and .S usually generates the shortest code. Except in a few cases, using R8..R15 (the high registers) will generate a 32 bit instruction.
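The following sketch shows the kind of choices described above. The comments state the encoding size that the Thumb-2 instruction set would lead you to expect; they are not verified assembler output, so check with the disassembler. The name WIDTH-DEMO is hypothetical.

```forth
\ Sketch only: WIDTH-DEMO is a hypothetical name.
CODE WIDTH-DEMO   \ --
  add .s r0, r0, # 1    \ low register with .s: a 16 bit form is available
  add r0, r8, # 1       \ high register operand: expect a 32 bit form
  nop .n                \ force the 16 bit NOP
  nop .w                \ force the 32 bit NOP
  next,
END-CODE
```

Only R0 is written, so no register used by the Forth system is disturbed.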
Where a register appears twice in an instruction (usually Rd), that particular encoding (usually a 16 bit form) is only generated when the same register appears twice.
Use of the PC (R15) or SP (R13) registers may cause assembler errors as these register fields are frequently used to handle special instructions. Consult ARM DDI 0403.
This is the Cortex-M3 instruction set. Note that the Cortex-M0/M1 instruction set is a subset of this. In ARM terminology the Cortex-M0/M1 is defined in ARMv6-M.
In some cases using .s leads to shorter code because a 16 bit Thumb-1 encoding is available. When coding for Cortex-M0 remember that the .s is often required.
ADC [.s] Rd, Rn, # <const>
ADC [.s] Rd, Rn, Rm
ADC [.s] Rd, Rn, Rm <shift>
ADD .s Rd, Rn, # <imm3>
ADD .s Rd, Rd, # <imm8>
ADD [.s] Rd, Rn, # <const>
ADD [.s] Rd, Rn, # <imm12>
ADD .s Rd, Rn, Rm
ADD Rd, Rd, Rm
ADD [.s] Rd, Rn, Rm <shift>
ADD Rd, SP, # <imm8>
ADD SP, SP, # <imm7>
ADD [.s] Rd, SP, # <const>
ADD [.s] Rd, SP, # <imm12>
ADD Rd, SP, Rd
ADD SP, SP, Rm
ADD Rd, SP, Rm <shift>
ADR Rd, <label>
AND Rd, Rn, # <const>
AND Rd, Rd, Rm
AND [.s] Rd, Rn, Rm <shift>
ASR .s Rd, Rn, # <imm5>
ASR [.s] Rd, Rn, # <imm5>
ASR .s Rd, Rd, Rm
ASR [.s] Rd, Rn, Rm
B <cond> <label>
B <label>
BFC Rd, # <lsb> # <width>
BFI Rd, Rn, # <lsb> # <width>
BIC [.s] Rd, Rn, # <const>
BIC .s Rd, Rd, Rm
BIC [.s] Rd, Rn, Rm <shift>
BKPT # <imm8>
BL <label>
BLX Rm
BX Rm
CBNZ Rn, <label>
CBZ Rn, <label>
CDP <copro> <opc1> CRd, CRn, CRm, <opc2>
CDP2 <copro> <opc1> CRd, CRn, CRm, <opc2>
CLREX
CLZ Rd, Rm
CMN Rn, # <const>
CMN Rn, Rm
CMN Rn, Rm <shift>
CMP Rn, # <imm8>
CMP Rn, # <const>
CMP Rn, Rm
CMP Rn, Rm <shift>
CPS .ie [.i] [.f]
CPS .id [.i] [.f]
DBG # <opt4>
DMB # <opt4>
DSB # <opt4>
EOR [.s] Rd, Rn, # <const>
EOR .s Rd, Rd, Rm
EOR [.s] Rd, Rn, Rm <shift>
ISB # <opt4>
IT <cond>
ITT <cond>
ITE <cond>
ITTT <cond>
ITET <cond>
ITTE <cond>
ITEE <cond>
ITTTT <cond>
ITETT <cond>
ITTET <cond>
ITEET <cond>
ITTTE <cond>
ITETE <cond>
ITTEE <cond>
ITEEE <cond>
LDC <copro> CRd, [ Rn, # +/-<imm8> ]
LDC <copro> CRd, [ Rn, # +/-<imm8> ] !
LDC <copro> CRd, [ Rn ], # +/-<imm8>
LDC <copro> CRd, [ Rn ], {} <option>
LDC <copro> CRd, <label>
LDCL <copro> CRd, [ Rn, # +/-<imm8> ]
LDCL <copro> CRd, [ Rn, # +/-<imm8> ] !
LDCL <copro> CRd, [ Rn ], # +/-<imm8>
LDCL <copro> CRd, [ Rn ], {} <option>
LDCL <copro> CRd, <label>
LDC2 <copro> CRd, [ Rn, # +/-<imm8> ]
LDC2 <copro> CRd, [ Rn, # +/-<imm8> ] !
LDC2 <copro> CRd, [ Rn ], # +/-<imm8>
LDC2 <copro> CRd, [ Rn ], {} <option>
LDC2 <copro> CRd, <label>
LDC2L <copro> CRd, [ Rn, # +/-<imm8> ]
LDC2L <copro> CRd, [ Rn, # +/-<imm8> ] !
LDC2L <copro> CRd, [ Rn ], # +/-<imm8>
LDC2L <copro> CRd, [ Rn ], {} <option>
LDC2L <copro> CRd, <label>
LDM Rn, [!] { ra, rb ... rn }
LDMIA Rn, [!] { ra, rb ... rn }
LDMFD Rn, [!] { ra, rb ... rn }
LDMDB Rn, [!] { ra, rb ... rn }
LDMEA Rn, [!] { ra, rb ... rn }
LDR Rt, [ Rn, # <imm5*4> ]
LDR Rt, [ SP, # <imm8*4> ]
LDR Rt, [ Rn, # <imm12> ]
LDR Rt, [ Rn, # -<imm8> ]
LDR Rt, [ Rn ], # +/-<imm8>
LDR Rt, [ Rn, # +/-<imm8> ] !
LDR Rt, <label>
LDR Rt, [ Rn ++ Rm ]
LDR Rt, [ Rn ++ Rm, LSL # <imm2> ]
LDR Rt, @= <imm32> \ loads from literal pool
\ Use FLUSHLITPOOL to lay pool.
LDRB Rt, [ Rn, # <imm5> ]
LDRB Rt, [ Rn, # <imm12> ]
LDRB Rt, [ Rn, # -<imm8> ]
LDRB Rt, [ Rn ], # +/-<imm8>
LDRB Rt, [ Rn, # +/-<imm8> ] !
LDRB Rt, <label>
LDRB Rt, [ Rn ++ Rm ]
LDRB Rt, [ Rn ++ Rm, LSL # <imm2> ]
LDRBT Rt, [ Rn, # +<imm8> ]
LDRD Rt, Rt2, [ Rn, # -<imm8> ]
LDRD Rt, Rt2, [ Rn ], # +/-<imm8>
LDRD Rt, Rt2, [ Rn, # +/-<imm8> ] !
LDRD Rt, Rt2, <label>
LDREX Rt, [ Rn, # +<imm8> ]
LDREXB Rt, [ Rn ]
LDREXH Rt, [ Rn ]
LDRH Rt, [ Rn, # <imm5*2> ]
LDRH Rt, [ Rn, # <imm12> ]
LDRH Rt, [ Rn, # -<imm8> ]
LDRH Rt, [ Rn ], # +/-<imm8>
LDRH Rt, [ Rn, # +/-<imm8> ] !
LDRH Rt, <label>
LDRH Rt, [ Rn ++ Rm ]
LDRH Rt, [ Rn ++ Rm, LSL # <imm2> ]
LDRHT Rt, [ Rn, # +<imm8> ]
LDRSB Rt, [ Rn, # <imm12> ]
LDRSB Rt, [ Rn, # -<imm8> ]
LDRSB Rt, [ Rn ], # +/-<imm8>
LDRSB Rt, [ Rn, # +/-<imm8> ] !
LDRSB Rt, <label>
LDRSB Rt, [ Rn ++ Rm ]
LDRSB Rt, [ Rn ++ Rm, LSL # <imm2> ]
LDRSBT Rt, [ Rn, # +<imm8> ]
LDRSH Rt, [ Rn, # <imm12> ]
LDRSH Rt, [ Rn, # -<imm8> ]
LDRSH Rt, [ Rn ], # +/-<imm8>
LDRSH Rt, [ Rn, # +/-<imm8> ] !
LDRSH Rt, <label>
LDRSH Rt, [ Rn ++ Rm ]
LDRSH Rt, [ Rn ++ Rm, LSL # <imm2> ]
LDRSHT Rt, [ Rn, # +<imm8> ]
LDRT Rt, [ Rn, # +<imm8> ]
LSL Rd, Rm, # <imm5>
LSL [.s] Rd, Rm, # <imm5>
LSL .s Rd, Rd, Rm
LSL [.s] Rd, Rn, Rm
LSR Rd, Rm, # <imm5>
LSR [.s] Rd, Rm, # <imm5>
LSR .s Rd, Rd, Rm
LSR [.s] Rd, Rn, Rm
MCR <copro> <opc1> Rt, CRn, CRm, <opc2>
MCR2 <copro> <opc1> Rt, CRn, CRm, <opc2>
MCRR <copro> <opc1> Rt, Rt2, CRm, <opc2>
MCRR2 <copro> <opc1> Rt, Rt2, CRm, <opc2>
MLA Rd, Rn, Rm, Ra
MLS Rd, Rn, Rm, Ra
MOV .s Rd, # <imm8>
MOV [.s] Rd, # <const>
MOV Rd, # <imm16>
MOV Rd, Rm
MOV [.s] Rd, Rm
MOV [.s] Rd, Rm <shiftop> # n
MOV [.s] Rd, Rm <shiftop> Rs
MOV [.s] Rd, Rm .RRX
MOVT Rd, # <imm16>
MRC <copro> <opc1> Rt, CRn, CRm, <opc2>
MRC2 <copro> <opc1> Rt, CRn, CRm, <opc2>
MRRC <copro> <opc1> Rt, Rt2, CRm, <opc2>
MRRC2 <copro> <opc1> Rt, Rt2, CRm, <opc2>
MRS Rd, <SYSm8>
MSR <SYSm8> Rn
<SYSm8> is a special register number in the range 0..255
MUL .s Rd, Rn, Rd
MUL [.s] Rd, Rn, Rm
MVN [.s] Rd, # <const>
MVN .s Rd, Rm
MVN [.s] Rd, Rm <shift>
NEG .s Rd, Rm
NOP [.n/.w]
ORN [.s] Rd, Rm, # <const>
ORN [.s] Rd, Rn, Rm <shift>
ORR [.s] Rd, Rm, # <const>
ORR .s Rd, Rd, Rm
ORR [.s] Rd, Rn, Rm <shift>
PLD [ Rn, # +<imm12> ]
PLD [ Rn, # -<imm8> ]
PLD <label>
PLD [ Rn ++ Rm, LSL # <imm2> ]
PLDW [ Rn, # +<imm12> ]
PLDW [ Rn, # -<imm8> ]
PLI [ Rn, # +<imm12> ]
PLI [ Rn, # -<imm8> ]
PLI <label>
PLI [ Rn ++ Rm, LSL # <imm2> ]
POP { ra, rb ... rn }
PUSH { ra, rb ... rn }
RBIT Rd, Rm
REV Rd, Rm
REV16 Rd, Rm
REVSH Rd, Rm
ROR [.s] Rd, Rm, # <imm5>
ROR .s Rd, Rd, Rm
ROR [.s] Rd, Rn, Rm
RRX Rd, Rm
RSB .s Rd, Rn, # 0
RSB [.s] Rd, Rn, # <const>
RSB [.s] Rd, Rn, Rm <shift>
SBC [.s] Rd, Rn, # <const>
SBC .s Rd, Rd, Rm
SBC [.s] Rd, Rn, Rm <shift>
SBFX Rd, Rn, # <lsb> # <width>
SDIV Rd, Rn, Rm
SEV
SMLAL Rdlo, Rdhi, Rn, Rm
SMULL Rdlo, Rdhi, Rn, Rm
SSAT Rd, # <imm5> Rn <shift>
STC <copro> CRd, [ Rn, # +/-<imm8> ]
STC <copro> CRd, [ Rn, # +/-<imm8> ] !
STC <copro> CRd, [ Rn ], # +/-<imm8>
STC <copro> CRd, [ Rn ], {} <option>
STCL <copro> CRd, [ Rn, # +/-<imm8> ]
STCL <copro> CRd, [ Rn, # +/-<imm8> ] !
STCL <copro> CRd, [ Rn ], # +/-<imm8>
STCL <copro> CRd, [ Rn ], {} <option>
STC2 <copro> CRd, [ Rn, # +/-<imm8> ]
STC2 <copro> CRd, [ Rn, # +/-<imm8> ] !
STC2 <copro> CRd, [ Rn ], # +/-<imm8>
STC2 <copro> CRd, [ Rn ], {} <option>
STC2L <copro> CRd, [ Rn, # +/-<imm8> ]
STC2L <copro> CRd, [ Rn, # +/-<imm8> ] !
STC2L <copro> CRd, [ Rn ], # +/-<imm8>
STC2L <copro> CRd, [ Rn ], {} <option>
STM Rn, [!] { ra, rb ... rn }
STMIA Rn, [!] { ra, rb ... rn }
STMEA Rn, [!] { ra, rb ... rn }
STMDB Rn, [!] { ra, rb ... rn }
STMFD Rn, [!] { ra, rb ... rn }
STR Rt, [ Rn, # <imm5*4> ]
STR Rt, [ SP, # <imm8*4> ]
STR Rt, [ Rn, # <imm12> ]
STR Rt, [ Rn, # -<imm8> ]
STR Rt, [ Rn ], # +/-<imm8>
STR Rt, [ Rn, # +/-<imm8> ] !
STR Rt, [ Rn ++ Rm ]
STR Rt, [ Rn ++ Rm, LSL # <imm2> ]
STRB Rt, [ Rn, # <imm5> ]
STRB Rt, [ Rn, # <imm12> ]
STRB Rt, [ Rn, # -<imm8> ]
STRB Rt, [ Rn ], # +/-<imm8>
STRB Rt, [ Rn, # +/-<imm8> ] !
STRB Rt, [ Rn ++ Rm ]
STRB Rt, [ Rn ++ Rm, LSL # <imm2> ]
STRBT Rt, [ Rn, # <imm8> ]
STRD Rt, Rt2, [ Rn, # -<imm8> ]
STRD Rt, Rt2, [ Rn ], # +/-<imm8>
STRD Rt, Rt2, [ Rn, # +/-<imm8> ] !
STREX Rd, Rt, [ Rn, # +<imm8> ]
STREXB Rd, Rt, [ Rn ]
STREXH Rd, Rt, [ Rn ]
STRH Rt, [ Rn, # <imm5*2> ]
STRH Rt, [ Rn, # <imm12> ]
STRH Rt, [ Rn, # -<imm8> ]
STRH Rt, [ Rn ], # +/-<imm8>
STRH Rt, [ Rn, # +/-<imm8> ] !
STRH Rt, [ Rn ++ Rm ]
STRH Rt, [ Rn ++ Rm, LSL # <imm2> ]
STRHT Rt, [ Rn, # +<imm8> ]
STRT Rt, [ Rn, # <imm8> ]
SUB .s Rd, Rn, # <imm3>
SUB .s Rd, Rd, # <imm8>
SUB [.s] Rd, Rn, # <const>
SUB [.s] Rd, Rn, # <imm12>
SUB .s Rd, Rn, Rm
SUB [.s] Rd, Rn, Rm <shift>
SUB SP, SP, # <imm7*4>
SUB [.s] Rd, SP, # <const>
SUB [.s] Rd, SP, # <imm12>
SUB Rd, SP, Rm <shift>
SVC # <imm8>
SXTB Rd, Rm
SXTB Rd, Rm, <rotation> ( 0/8/16/24 bits )
SXTH Rd, Rm
SXTH Rd, Rm, <rotation> ( 0/8/16/24 bits )
TBB [ Rn, Rm ]
TBH [ Rn, Rm ]
TEQ Rn, # <const>
TEQ Rn, Rm <shift>
TST Rn, # <const>
TST Rn, Rm
TST Rn, Rm <shift>
UBFX Rd, Rn, # <lsb> # <width>
UDIV Rd, Rn, Rm
UMLAL Rdlo, Rdhi, Rn, Rm
UMULL Rdlo, Rdhi, Rn, Rm
USAT Rd, # <imm5> Rn <shift>
UXTB Rd, Rm
UXTB Rd, Rm, <rotation> ( 0/8/16/24 bits )
UXTH Rd, Rm
UXTH Rd, Rm, <rotation> ( 0/8/16/24 bits )
WFE
WFI
YIELD
These instructions are integer DSP instructions added to Cortex-M4.
PKHBT Rd, Rn, Rm
PKHBT Rd, Rn, Rm .lsl # <imm5>
PKHTB Rd, Rn, Rm
PKHTB Rd, Rn, Rm .asr # <imm5>
QADD Rd, Rn, Rm
QADD16 Rd, Rn, Rm
QADD8 Rd, Rn, Rm
QASX Rd, Rn, Rm
QDADD Rd, Rn, Rm
QDSUB Rd, Rn, Rm
QSAX Rd, Rn, Rm
QSUB Rd, Rn, Rm
QSUB16 Rd, Rn, Rm
QSUB8 Rd, Rn, Rm
SADD16 Rd, Rn, Rm
SADD8 Rd, Rn, Rm
SASX Rd, Rn, Rm
SEL Rd, Rn, Rm
SHADD16 Rd, Rn, Rm
SHADD8 Rd, Rn, Rm
SHASX Rd, Rn, Rm
SHSAX Rd, Rn, Rm
SHSUB16 Rd, Rn, Rm
SHSUB8 Rd, Rn, Rm
SMLABB Rd, Rn, Rm, Ra
SMLABT Rd, Rn, Rm, Ra
SMLATB Rd, Rn, Rm, Ra
SMLATT Rd, Rn, Rm, Ra
SMLAD Rd, Rn, Rm, Ra
SMLADX Rd, Rn, Rm, Ra
SMLALBB Rd, Rn, Rm, Ra
SMLALBT Rd, Rn, Rm, Ra
SMLALTB Rd, Rn, Rm, Ra
SMLALTT Rd, Rn, Rm, Ra
SMLALD RdLo, RdHi, Rn, Rm
SMLALDX RdLo, RdHi, Rn, Rm
SMLAWB Rd, Rn, Rm, Ra
SMLAWT Rd, Rn, Rm, Ra
SMLSD Rd, Rn, Rm, Ra
SMLSDX Rd, Rn, Rm, Ra
SMLSLD RdLo, RdHi, Rn, Rm
SMLSLDX RdLo, RdHi, Rn, Rm
SMMLA Rd, Rn, Rm, Ra
SMMLAR Rd, Rn, Rm, Ra
SMMLS Rd, Rn, Rm, Ra
SMMLSR Rd, Rn, Rm, Ra
SMMUL Rd, Rn, Rm
SMMULR Rd, Rn, Rm
SMUAD Rd, Rn, Rm
SMUADX Rd, Rn, Rm
SMULBB Rd, Rn, Rm
SMULBT Rd, Rn, Rm
SMULTB Rd, Rn, Rm
SMULTT Rd, Rn, Rm
SMULWB Rd, Rn, Rm
SMULWT Rd, Rn, Rm
SMUSD Rd, Rn, Rm
SMUSDX Rd, Rn, Rm
SSAT16 Rd, # <imm4> Rn
SSAX Rd, Rn, Rm
SSUB16 Rd, Rn, Rm
SSUB8 Rd, Rn, Rm
SXTAB Rd, Rn, Rm
SXTAB Rd, Rn, Rm, <rotation> ( 0/8/16/24 bits )
SXTAB16 Rd, Rn, Rm
SXTAB16 Rd, Rn, Rm, <rotation> ( 0/8/16/24 bits )
SXTAH Rd, Rn, Rm
SXTAH Rd, Rn, Rm, <rotation> ( 0/8/16/24 bits )
SXTB16 Rd, Rm
SXTB16 Rd, Rm, <rotation> ( 0/8/16/24 bits )
UADD16 Rd, Rn, Rm
UADD8 Rd, Rn, Rm
UASX Rd, Rn, Rm
UHADD16 Rd, Rn, Rm
UHADD8 Rd, Rn, Rm
UHASX Rd, Rn, Rm
UHSAX Rd, Rn, Rm
UHSUB16 Rd, Rn, Rm
UHSUB8 Rd, Rn, Rm
UMAAL RdLo, RdHi, Rn, Rm
UQADD16 Rd, Rn, Rm
UQADD8 Rd, Rn, Rm
UQASX Rd, Rn, Rm
UQSAX Rd, Rn, Rm
UQSUB16 Rd, Rn, Rm
UQSUB8 Rd, Rn, Rm
USAD8 Rd, Rn, Rm
USADA8 Rd, Rn, Rm, Ra
USAT16 Rd, # <imm4> Rn
USAX Rd, Rn, Rm
USUB16 Rd, Rn, Rm
USUB8 Rd, Rn, Rm
UXTAB Rd, Rn, Rm
UXTAB Rd, Rn, Rm, <rotation> ( 0/8/16/24 bits )
UXTAB16 Rd, Rn, Rm
UXTAB16 Rd, Rn, Rm, <rotation> ( 0/8/16/24 bits )
UXTAH Rd, Rn, Rm
UXTAH Rd, Rn, Rm, <rotation> ( 0/8/16/24 bits )
UXTB16 Rd, Rm
UXTB16 Rd, Rm, <rotation> ( 0/8/16/24 bits )
These are single-precision floating point instructions added for the Cortex-M4F instruction set upwards and for the ARM32 instruction set. Double precision instructions are available for systems that support them.
VABS .f32 Sd, Sm
VABS .f64 Dd, Dm
VADD .f32 Sd, Sn, Sm
VADD .f64 Dd, Dn, Dm
VCMP .f32 Sd, Sm
VCMP .f32 Sd, # 0
VCMP .f64 Dd, Dm
VCMP .f64 Dd, # 0
VCMPE .f32 Sd, Sm
VCMPE .f32 Sd, # 0
VCMPE .f64 Dd, Dm
VCMPE .f64 Dd, # 0
VCVT .si32<f32 Sd, Sm
VCVT .ui32<f32 Sd, Sm
VCVT .f32<si32 Sd, Sm
VCVT .f32<ui32 Sd, Sm
VCVT .si32<f64 Sd, Dm
VCVT .ui32<f64 Sd, Dm
VCVT .f64<si32 Dd, Sm
VCVT .f64<ui32 Dd, Sm
VCVT .xs32<f32 Sd, Sd, # fbits
VCVT .xu32<f32 Sd, Sd, # fbits
VCVT .xs16<f32 Sd, Sd, # fbits
VCVT .xu16<f32 Sd, Sd, # fbits
VCVT .xs32<f64 Dd, Dd, # fbits
VCVT .xu32<f64 Dd, Dd, # fbits
VCVT .xs16<f64 Dd, Dd, # fbits
VCVT .xu16<f64 Dd, Dd, # fbits
VCVT .f32<xs32 Sd, Sd, # fbits
VCVT .f32<xu32 Sd, Sd, # fbits
VCVT .f32<xs16 Sd, Sd, # fbits
VCVT .f32<xu16 Sd, Sd, # fbits
VCVT .f64<xs32 Dd, Dd, # fbits
VCVT .f64<xu32 Dd, Dd, # fbits
VCVT .f64<xs16 Dd, Dd, # fbits
VCVT .f64<xu16 Dd, Dd, # fbits
VCVT .f32<f16 Qd, Dm \ ASIMD
VCVT .f16<f32 Dd, Qm \ ASIMD
VCVT .f64<f32 Dd, Sm
VCVT .f32<f64 Sd, Dm
VCVTR .si32<f32 Sd, Sm
VCVTR .ui32<f32 Sd, Sm
VCVTR .si32<f64 Sd, Dm
VCVTR .ui32<f64 Sd, Dm
VCVTB .f32<f16 Sd, Sm
VCVTB .f16<f32 Sd, Sm
VCVTT .f32<f16 Sd, Sm
VCVTT .f16<f32 Sd, Sm
VDIV .f64 Dd, Dn, Dm
VDIV .f32 Sd, Sn, Sm
VFMA .f64 Dd, Dn, Dm
VFMA .f32 Sd, Sn, Sm
VFMS .f64 Dd, Dn, Dm
VFMS .f32 Sd, Sn, Sm
VFNMA .f64 Dd, Dn, Dm
VFNMA .f32 Sd, Sn, Sm
VFNMS .f64 Dd, Dn, Dm
VFNMS .f32 Sd, Sn, Sm
VLDMIA Rn, { Dx-Dy }
VLDMIA Rn ! { Dx-Dy }
VLDMDB Rn ! { Sx-Sy }
VLDR Dd, [ Rn, # imm ]
VLDR Dd, label
VLDR Sd, [ Rn, # imm ]
VLDR Sd, label
VMLA .f64 Dd, Dn, Dm
VMLA .f32 Sd, Sn, Sm
VMLS .f64 Dd, Dn, Dm
VMLS .f32 Sd, Sn, Sm
VMOV .f64 Dd, # imm
VMOV .f32 Sd, # imm
VMOV .f64 Dd, Dm
VMOV .f32 Sd, Sm
VMOV .32 Dd [0/1] Rt
VMOV .32 Rt, Dn [0/1]
VMOV Sn, Rt
VMOV Rt, Sn
VMOV Sm, Sm1, Rt, Rt2 \ Sm1=Sm+1
VMOV Rt, Rt2, Sm, Sm1 \ Sm1=Sm+1
VMOV Dm, Rt, Rt2
VMOV Rt, Rt2, Dm
VMRS Rt, FPSCR
VMSR Rt, FPSCR
VMUL .f64 Dd, Dn, Dm
VMUL .f32 Sd, Sn, Sm
VNEG .f64 Dd, Dm
VNEG .f32 Sd, Sm
VNMLA .f64 Dd, Dn, Dm
VNMLA .f32 Sd, Sn, Sm
VNMLS .f64 Dd, Dn, Dm
VNMLS .f32 Sd, Sn, Sm
VNMUL .f64 Dd, Dn, Dm
VNMUL .f32 Sd, Sn, Sm
VPOP { Dx-Dy }
VPOP { Sx-Sy }
VPUSH { Dx-Dy }
VPUSH { Sx-Sy }
VSQRT .f64 Dd, Dm
VSQRT .f32 Sd, Sm
VSTMIA Rn, { Dx-Dy }
VSTMIA Rn ! { Dx-Dy }
VSTMDB Rn ! { Sx-Sy }
VSTR Dd, [ Rn, # imm ]
VSTR Dd, label
VSTR Sd, [ Rn, # imm ]
VSTR Sd, label
VSUB .f64 Dd, Dn, Dm
VSUB .f32 Sd, Sn, Sm
The ARM instruction set is mostly highly orthogonal. All data processing instructions work on the contents of registers and immediate constants only. Any data held in memory has to be loaded into a register, manipulated, then saved back to memory using one of the memory transfer instructions. This may appear to be restrictive, but due to the large number of general-purpose registers available for scratch storage, memory read/writes can be kept to a minimum. The assembler is of the prefix variety, with the instruction mnemonic preceding its parameters. Valid instructions are:
B | BL <<cond>> expression
BLX expression
BLX Rm
MOV | MVN <<cond>> <<S>> Rd op2
CMN | CMP | TEQ | TST <<cond>> <<P>> Rn op2
ADC | ADD | AND | BIC | EOR | ORR | RSB | RSC | SBC | SUB <<cond>>
<<S>> Rd Rn op2
MRS <<cond>> Rd psr
MSR <<cond>> psr Rm
MSR <<cond>> psrf Rm
MSR <<cond>> psrf #expression
MUL <<cond>> <<S>> Rd Rm Rn
MLA <<cond>> <<S>> Rd Rm Rs Rn
UMULL | SMULL | UMLAL | SMLAL <<cond>> <<S>> RdLo RdHi Rm Rs
LDR | LDRB | LDRH | STR | STRB | STRH <<cond>> Rd address <<!>>
LDMFD | LDMED | LDMFA | LDMEA | LDMIA | LDMIB | LDMDA | LDMDB |
STMFD | STMED | STMFA | STMEA | STMIA | STMIB | STMDA | STMDB
<<cond>> Rn <<!>> Rlist <<^>>
SWP | SWPB <<cond>> Rd Rm [ Rn ]
SWI <<cond>> expression
CDP <<cond>> CP# operation CRd CRn CRm info
LDC | LDCL | STC | STCL <<cond>> CP# CRd address
MCR | MRC <<cond>> CP# operation Rd CRn CRm info
Two pseudo instructions MVL and ADR are also available. NOP is supported as a synonym for:
MOV R0, R0
No switches are accepted for NOP.
Parameter | Explanation |
---|---|
<<cond>> | Optional conditional execution code, i.e. .NE. See Control Structures. |
<<S>> | Optional suffix .S to set the processor status flags. |
<<P>> | Optional suffix .P to modify the PSR in 26-bit modes. |
<<!>> | Optional ! enables write-back of the base register in loads and stores. |
<<^>> | Optional ^ sets the status flags when loading the PC from memory with the LDMxx instructions. Can also be used to force loading and storing of user bank registers in non-user modes. |
Rd, RdLo, RdHi, Rm, Rn, Rs | Equates to a valid register number. See Register naming. Some instructions, notably the multiplication set, place restrictions on the combinations of registers allowed. |
op2 | Is either one of the operands produced by the ARM's barrel shifter or an immediate constant. See Shift operations and Immediate constants. |
expression | For the B and BL instructions this is a label name or an expression evaluating to a branch address. The address is converted to make it pc relative, allowing for the effects of pipelining on the program counter. For the SWI instruction it is the number of the SWI to be called. |
#expression | Evaluates to a 32-bit value. This has to be a valid Immediate constant. |
address | A valid address specification. See Addressing modes. |
Rlist | A list of registers enclosed by braces. See Register lists. |
psr | Either of the register names CPSR or SPSR. |
psrf | Either of the register names CPSR_flag or SPSR_flag. Only the N, Z, C and V flags are written into the status register. Use the forms _C _X _S _F to indicate which sections are to be restored. |
CP# | The unique number of the required coprocessor. |
CRd, CRn, CRm | Equates to a valid coprocessor register number. See "Register Naming" below. |
operation | Is evaluated to a constant. |
info | Is evaluated to a constant. |
There are fifteen general-purpose registers available in user
mode, plus the program counter. These are named R0 through R15.
R15 is the program counter. Coprocessor registers are named CR0
through CR15. The Current Program Status Register and Saved
Program Status Register are named CPSR and SPSR respectively. If
transferring just the status flags then CPSR_flg and SPSR_flg can
be used.
As of v6.2 the notation
CPSR_flag
is superseded by
CPSR _c _x _s _f
where the valid field definers are:
_C _X _S _F _CXSF _FSXC _ALL
Standard ARM names are also available. SP refers to R13 (commonly
used as a stack pointer), LINK refers to R14 (the link register),
and PC refers to R15 (the program counter).
Forth register names can be used in place of the standard register
names. These are TOS, LP, UP, RSP and PSP. As mentioned earlier,
these can be assigned to different ARM registers.
All register names can be used with or without a trailing comma. This makes for code that is more readable to the seasoned ARM programmer. Character case is not important.
Rather than specifying the name of a register whose contents are to be used in an operation, it is possible with many instructions to specify a numeric value which is encoded with the opcode mnemonic at assembly time.
When the # is encountered, the assembler recognises that the
following input is to be interpreted as a numeric value. The value
itself can be prefixed with the usual number base selectors such
as # for decimal, $ for hexadecimal, % for binary and @ for octal:
ADD R2, R3, # $32 \ Add $32 to contents of R3
\ and place result in R2
Note that in the UK, there may be confusion with some printers
between the hash symbol # and the pound symbol.
There are restrictions regarding the range of immediate constants
that can be used. As mentioned before, each instruction and its
operands are encoded as a single 32-bit value on the ARM. Some of
the 32 bits are given over to the instruction type, suffixes,
destination register and so on, leaving only 12 bits to represent
the constant. Twelve bits alone would allow only a small range of
constants, so the field is split in two. One part, 8 bits wide,
specifies the constant, while the other, 4 bits wide, specifies a
value to shift the constant by (this is actually a rotate right by
twice the shift value). This widens the range of immediate
constants that can be used, but not every number in the full
32-bit range can be represented. Note that the range of negative
immediate constants that can be represented is very limited, as
these appear to the ARM to be very large numbers, e.g.
-1 = $FFFFFFFF, and the larger a number is the harder it is to
represent using the method described above. Judicious use of such
instructions as CMN (compare negative), MVN (move inverted data -
not negated!) and RSB (reverse subtract) can get around this
problem.
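The 8-bit/4-bit scheme described above can be modelled in a few lines. The following Python sketch is only an illustration of the encoding rule, not the assembler's own code; the function names `ror32` and `encode_immediate` are invented for this example.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def encode_immediate(value):
    """Return (imm8, rot4) if `value` fits the scheme, else None.

    The CPU reconstructs the constant as imm8 rotated right by 2 * rot4.
    """
    value &= MASK32
    for rot4 in range(16):
        # Rotating left by 2 * rot4 undoes the CPU's rotate right.
        imm8 = ror32(value, 32 - 2 * rot4)
        if imm8 <= 0xFF:
            return imm8, rot4
    return None


# $32 fits directly; $FF000000 is $FF rotated right by 8 places.
print(encode_immediate(0x32))
print(encode_immediate(0xFF000000))
# -1 cannot be encoded, but its inverse (0) can, which is why MVN helps.
print(encode_immediate(-1), encode_immediate(~(-1)))
```

A value such as $101 is rejected because its set bits span nine places, which no 8-bit field can cover at any even rotation.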
Cortex only: If the literal pool is enabled you can load a register with a 32-bit immediate value using the form:
LDR Rx, @= imm32
e.g.
LDR R4, @= $12345678
You must flush the literal pool yourself using FLUSHLITPOOL ( -- )
after END-CODE or ENDPROC.
Most data processing instructions allow operand two (the second source operand) to be specified as a shifted register. Here the contents of the register can be shifted at run-time by either a fixed amount or by the contents of another register. This can be done with one of the ARM's shift instructions, e.g.
ADD R0, R2, R7 LSL # 4 \ R7 logically shifted left by
\ 4 places
BIC R2, R4, R7 ASR R6 \ R7 arithmetically shifted
\ right by the contents of R6
Note that the contents of the register being shifted are not changed by the shift. The shifted value is only used during the instruction to calculate the new value to be stored in the destination register.
Note also that a shift by zero bits causes no change in the carry flag!
Shift operations supported by the ARM are:
Instruction | Purpose |
---|---|
LSL # n or LSL Rn | Logical shift left |
ASL # n or ASL Rn | Arithmetic shift left (identical to LSL) |
LSR # n or LSR Rn | Logical shift right |
ASR # n or ASR Rn | Arithmetic shift right |
ROR # n or ROR Rn | Rotate right |
RRX | Rotate right with extend - (no shift value or register is needed as the shift is by one place only) |
Note that as with immediate constants, if the shift is by a fixed
amount it should be preceded by the # symbol to inform the
assembler that it is not dealing with a register.
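The effect of each shift type on a 32-bit operand can be sketched in Python as below. This is an illustrative model only: the function names are invented here, shift amounts are assumed to be in the range 0 to 31, and carry-out behaviour is modelled only for RRX.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left: zeros enter at the bottom."""
    return (value << n) & MASK32


def lsr(value, n):
    """Logical shift right: zeros enter at the top."""
    return (value & MASK32) >> n


def asr(value, n):
    """Arithmetic shift right: the sign bit is copied in at the top."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative number
    return (value >> n) & MASK32


def ror(value, n):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    n %= 32
    value &= MASK32
    return ((value >> n) | (value << (32 - n))) & MASK32


def rrx(value, carry):
    """Rotate right by one through carry; returns (result, new_carry)."""
    value &= MASK32
    return ((carry & 1) << 31) | (value >> 1), value & 1


# Model of: ADD R0, R2, R7 LSL # 4 (R7 itself is left unchanged)
r2, r7 = 0x100, 0x7
r0 = (r2 + lsl(r7, 4)) & MASK32
print(hex(r0))
```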
The ARM data processing instructions all work on the contents of
registers and immediate operands. To transfer data to and from
single registers and memory either the LDR, STR, LDC or STC
instructions and their variants have to be used. Addresses can be
specified in three ways.
Pre-indexed addressing allows an offset to be added to (or subtracted from) an address held in a base register to form the address from which data is to be transferred. The address has the following format:
[ Rn <<offset>> ]
Where Rn is the base register name and the optional offset is
either an immediate constant, a simple register or a shifted
register.
The address expression must be terminated by a ]. The initial [ is
not strictly necessary but leads to code that is more readable for
experienced ARM programmers.
A simple or shifted register offset needs to be prefixed with ++
or -- indicating whether the contents of the register should be
added to or subtracted from the base register. Immediate constants
do not use the 8/4-bit field format but rather range from -4095 to
4095. Shifted registers can only be shifted by a constant preceded
by the # symbol and not by the contents of another register.
The address calculated by combining the base and offset registers
is often useful in subsequent loads and stores, especially when a
sequence of memory locations is to be accessed. Use the ! operator
after the closing ] to enable the write back feature of the ARM.
This will write the calculated address back into the base register
for subsequent instructions to use.
Instruction | Address |
---|---|
LDR Rd, [ Rn ] | Load from Rn. Treated as LDR Rd, [ Rn, # 0 ] |
LDR Rd, [ Rn, ++ Rm ] | Load from Rn plus Rm |
LDR Rd, [ Rn, -- Rm ] ! | Load from Rn minus Rm with write back |
LDR Rd, [ Rn, ++ Rm LSL # 5 ] ! | Load from Rn plus Rm shifted logically left five places with write back |
LDR Rd, [ Rn, # 20 ] | Load from Rn plus twenty |
LDR Rd, [ Rn, # -40 ] ! | Load from Rn minus forty with write back |
Post-indexed addresses have the following form:
[ Rn ], <<offset>>
Post-indexed addressing adds the offset to the base register Rn
after the data has been transferred from the address held in the
base register. This implies that write back always occurs, so it
is not necessary to specify it. The ! operator can, however, be
used to force non-privileged mode for the transfer cycle (same as
the T suffix on some ARM assemblers).
The offset is specified in exactly the same way as for pre-indexed addressing. Examples of post-indexed addressing are:
Instruction | Address |
---|---|
LDR Rd, [ Rn ], ++ Rm | Load from Rn then add Rm to Rn |
LDR Rd, [ Rn ], -- Rm | Load from Rn then subtract Rm from Rn |
LDR Rd, [ Rn ], ++ Rm LSL # 5 | Load from Rn then add Rm, shifted logically left five places, to Rn |
LDR Rd, [ Rn ], -- Rm LSL # 5 | Load from Rn then subtract Rm, shifted logically left five places, from Rn |
LDR Rd, [ Rn ], # 20 | Load from Rn then add 20 to Rn |
LDR Rd, [ Rn ], # -40 | Load from Rn then subtract 40 from Rn |
The assembler also recognises addresses specified as either an absolute number or an assembler label, e.g.
LDR R2, # $600 \ Load from memory location $600
LDR R2, label \ Load from the address marked by label
Addresses specified using PC relative addressing are actually converted into pre-indexed addresses that load from the program counter (R15) plus or minus an immediate constant. This means that the desired memory location has to lie within about 4 KB of the instruction referencing it, because the immediate offset is limited to the range -4095 to 4095. The assembler will take into account the effects of pipelining on the program counter when calculating the value of the offset.
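The offset calculation can be sketched as follows for the legacy ARM instruction set, where the program counter reads as the address of the current instruction plus 8 because of pipelining. The function name `pc_relative_offset` is invented for this illustration and the sketch does not show how the assembler itself performs the conversion.

```python
ARM_PC_AHEAD = 8  # in ARM state the PC reads as instruction address + 8


def pc_relative_offset(instr_addr, target_addr):
    """Return the immediate offset for LDR Rd, [ PC, # offset ]."""
    offset = target_addr - (instr_addr + ARM_PC_AHEAD)
    if not -4095 <= offset <= 4095:
        raise ValueError("target is out of range for PC relative addressing")
    return offset


# A literal stored two words after the instruction needs offset 0.
print(pc_relative_offset(0x1000, 0x1008))
# Loading from the instruction's own address needs offset -8.
print(pc_relative_offset(0x1000, 0x1000))
```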
The instructions LDRB and STRB plus LDRH and STRH can be used to
transfer bytes or half words between memory and registers. Byte
loads and stores only utilise the bottom 8 bits of the destination
register and half words only the bottom 16 bits. The contents of
the rest of the register are ignored on a store, and zeroed on a
load from memory. Unlike word memory transfers, byte loads and
stores do not have to be aligned, but half word transfers should
be aligned to a two-byte boundary.
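The behaviour of the byte and half word transfers can be modelled as below. This is a sketch under two assumptions that the text does not state: memory is little-endian, and a misaligned half word access is simply rejected. The helper names `store_sized` and `load_sized` are hypothetical.

```python
def check_alignment(addr, size):
    """Bytes need no alignment; half words need a two-byte boundary."""
    if size == 2 and addr % 2:
        raise ValueError("half word transfers should be two-byte aligned")


def store_sized(memory, addr, reg, size):
    """Model STRB (size 1) or STRH (size 2): upper register bits are ignored."""
    check_alignment(addr, size)
    low_bits = reg & ((1 << (8 * size)) - 1)
    memory[addr:addr + size] = low_bits.to_bytes(size, "little")


def load_sized(memory, addr, size):
    """Model LDRB or LDRH: the rest of the register is zeroed."""
    check_alignment(addr, size)
    return int.from_bytes(memory[addr:addr + size], "little")


ram = bytearray(8)
store_sized(ram, 3, 0x12345678, 1)   # only $78 reaches memory
store_sized(ram, 4, 0x12345678, 2)   # only $5678 reaches memory
print(hex(load_sized(ram, 3, 1)), hex(load_sized(ram, 4, 2)))
```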
Multiple registers can be loaded from and stored to memory using
the LDM and STM instructions. The format is:
LDMxx Rd, <<!>> { Ra, Rb, Rx-Ry, ... } <<^>>
STMxx Rd, <<!>> { Ra, PC, LINK, Re-Rf } <<^>>
{ R0 R1 R2 R6 R12 }
{ R0-R2, R6, R12 }
{ R6, R12, R0-R2 }
Each register can only be specified once.
The optional final ^ sets the status flags when loading the PC
from memory with the LDMxx instruction. It can also be used to
force loading and storing of user bank registers in non-user
modes.
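In the ARM encoding of LDM and STM the register list becomes a 16-bit mask with one bit per register, which is why the order of the list does not matter and each register may appear only once. The Python sketch below parses the list forms shown above into such a mask. It is an illustration only: the function names are invented, and ranges are assumed to be written without spaces around the hyphen, as in the examples.

```python
import re

ALIASES = {"SP": 13, "LINK": 14, "PC": 15}


def reg_number(name):
    """Convert R0..R15, SP, LINK or PC to a register number."""
    name = name.upper()
    if name in ALIASES:
        return ALIASES[name]
    match = re.fullmatch(r"R(\d+)", name)
    if not match or int(match.group(1)) > 15:
        raise ValueError("unknown register: " + name)
    return int(match.group(1))


def register_mask(rlist):
    """Return the 16-bit mask for a list such as '{ R0-R2, R6, R12 }'."""
    body = rlist.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("register list must be enclosed by braces")
    mask = 0
    # Commas are optional, so treat them as white space.
    for item in body[1:-1].replace(",", " ").split():
        first, _, last = item.partition("-")
        low = reg_number(first)
        high = reg_number(last) if last else low
        if high < low:
            raise ValueError("bad register range: " + item)
        for reg in range(low, high + 1):
            if mask & (1 << reg):
                raise ValueError("register specified twice: R%d" % reg)
            mask |= 1 << reg
    return mask


print(hex(register_mask("{ R0-R2, R6, R12 }")))
```

All three example lists above produce the same mask under this model.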
MVL and ADR
As indicated earlier, a common source of problems when programming
with ARM assembler is the restriction placed on the range of
immediate constants that can be used with the data processing
instructions. To get around this the pseudo instruction MVL can be
used to move any signed/unsigned 32-bit number into a register.
MVL R2, # 127653
The MVL pseudo instruction will attempt to use a single MOV or MVN
instruction if possible, but may generate up to four ARM
instructions or two Cortex instructions to get the value into the
register.
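One way such a sequence could be chosen for the ARM instruction set is sketched below in Python: try a single MOV, then a single MVN, and otherwise split the value into 8-bit fields at even bit positions and combine them with ORR. This is a hypothetical strategy written for illustration. It is not taken from the MPE sources, it does not cover the two-instruction Cortex case, and the names `is_encodable`, `split_chunks` and `plan_mvl` are invented.

```python
MASK32 = 0xFFFFFFFF


def is_encodable(value):
    """True if value is an 8-bit constant rotated by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated <= 0xFF:
            return True
    return False


def split_chunks(value):
    """Split value into at most four encodable pieces, lowest first."""
    value &= MASK32
    chunks = []
    while value:
        lowest = (value & -value).bit_length() - 1  # lowest set bit
        shift = lowest & ~1                         # round down to even
        chunk = value & (0xFF << shift) & MASK32
        chunks.append(chunk)
        value &= ~chunk
    return chunks


def plan_mvl(value):
    """Return a list of (mnemonic, immediate) pairs that build value."""
    value &= MASK32
    if is_encodable(value):
        return [("MOV", value)]
    if is_encodable(~value & MASK32):
        return [("MVN", ~value & MASK32)]
    chunks = split_chunks(value)
    return [("MOV", chunks[0])] + [("ORR", c) for c in chunks[1:]]


for mnemonic, imm in plan_mvl(127653):
    print(mnemonic, hex(imm))
```

Because each piece covers eight bits and starts at or above the end of the previous one, a 32-bit value never needs more than four pieces under this scheme.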
For Cortex, you can also use
MVL32 R2, # $12345678
to load a 32 bit value into a register. This is useful when
you are referencing the data address of a VALUE or a Forth word,
e.g.
$12345678 value foo
Proc MyISR
...
mvl32 r0, # ' foo >body \ load data address
For Cortex, branch destination addresses that are loaded into the PC must have bit0 set to 1. To do this, use:
MVL32+1 R2, # <value>
The value can be forward referenced.
ADR is a pseudo instruction (macro) for ARM, but is a real
instruction for Thumb-2. The ARM ADR pseudo instruction is used to
move a 32-bit address into a register.
ADR label
Due to the possibility that a label might be forward referenced
and need 'fixing up' later on in the compilation, the ADR pseudo
instruction will always generate a MOV and three ORR instructions.
The compiler allows you to add in-line assembler inside colon definitions, and to add high level phrases inside code definitions.
Inline assembler code can be compiled inside a colon definition
using [ASM and ASM]. Use these in the form:
: <name>
... [ASM <assembler-code> ASM] ...
;
High level code can be compiled inside a CODE definition using
[FORTH and FORTH]. Use these in the form:
CODE <name>
... [FORTH <high-level> FORTH] ...
END-CODE
Note that the optimiser is not flushed by the switches into
assembler. This can (and should) be achieved by placing [O/F]
before [ASM and FORTH].
This glossary details the words provided within the cross-assembler to control the use of the assembler.
: ARM32 \ --
Select ARM7 mode for assembler
: ArmArch5 \ --
Select ARMv5 mode for assembler
: Thumb-1 \ --
Select full Thumb-1 mode for assembler and code generator.
: Thumb-2 \ --
Select full Thumb-2 mode for assembler and code generator.
: Cortex-M0 \ --
Select Cortex-M0 for assembler and code generator.
: Cortex-M1 \ --
Select Cortex-M1 for assembler and code generator.
: Cortex-M3 \ --
Select Cortex-M3 for assembler and code generator.
: Cortex-M4 \ --
Select Cortex-M4 (includes integer DSP) for the assembler
and code generator.
: Cortex-M4F \ --
Select Cortex-M4F (includes integer DSP and single-precision
VFP) for the assembler and code generator.
: Cortex-M7 \ --
Select Cortex-M7 (includes integer DSP, single and double
precision VFP) for the assembler and code generator.
: ARM32? \ -- flag
Return true if in 32 bit ARM mode.
: ArmArch5? \ -- flag
Return true if in 32 bit ARMv5 mode.
: Thumb1? \ -- flag
Return true if in Thumb-1 mode.
: Thumb2? \ -- flag
Return true if in Thumb-2 mode.
: Thumb? \ -- flag
Return true if in either Thumb mode.
: Cortex-M0? \ -- flag
Return true if the Cortex-M0 instruction set has been selected.
: Cortex-M1? \ -- flag
Return true if the Cortex-M1 instruction set has been selected.
: Cortex-M0/M1? \ -- flag
Return true if the Cortex M0 or M1 instruction sets have been
selected.
: Cortex-M3? \ -- flag
Return true if the Cortex-M3 instruction set has been selected.
: Cortex-M4? \ -- flag
Return true if the Cortex-M4 instruction set has been selected.
: Cortex-M4F? \ -- flag
Return true if the Cortex-M4F instruction set has been selected.
: Cortex-M7? \ -- flag
Return true if the Cortex-M7 instruction set has been selected.
: IDSP? \ -- flag
Return true if the Cortex integer DSP instructions are present.
: Not-M0/M1? \ -- flag
Return true if the Cortex M0 or M1 instruction sets have not
been selected.
: M0/M1? \ -- flag
Return true if Thumb-2 and Cortex M0 or M1 has been selected.
: .f32 \ --
Indicates that the data in the Sn/Dn/Qm registers is 32 bit.
: .f64 \ --
Indicates that the data in the Dn/Qm registers is 64 bit.
: .s8 \ --
Indicates that the data in the Sn/Dn/Qm registers is signed 8 bit.
: .s16 \ --
Indicates that the data in the Sn/Dn/Qm registers is signed 16 bit.
: .s32 \ --
Indicates that the data in the Sn/Dn/Qm registers is signed 32 bit.
: .s64 \ --
Indicates that the data in the Dn/Qm registers is signed 64 bit.
: .u8 \ --
Indicates that the data in the Sn/Dn/Qm registers is unsigned 8 bit.
: .u16 \ --
Indicates that the data in the Sn/Dn/Qm registers is unsigned 16 bit.
: .u32 \ --
Indicates that the data in the Sn/Dn/Qm registers is unsigned 32 bit.
: .u64 \ --
Indicates that the data in the Dn/Qm registers is unsigned 64 bit.
: .i8 \ --
Indicates that the data in the Sn/Dn/Qm registers is signed 8 bit.
: .i16 \ --
Indicates that the data in the Sn/Dn/Qm registers is signed 16 bit.
: .i32 \ --
Indicates that the data in the Sn/Dn/Qm registers is signed 32 bit.
: .i64 \ --
Indicates that the data in the Dn/Qm registers is signed 64 bit.
: .8 \ --
Indicates that the data in the Sn/Dn/Qm registers is 8 bit.
: .16 \ --
Indicates that the data in the Sn/Dn/Qm registers is 16 bit.
: .32 \ --
Indicates that the data in the Sn/Dn/Qm registers is 32 bit.
: .64 \ --
Indicates that the data in the Dn/Qm registers is 64 bit.
: [n] ( u -- ) <VFPindex> ! ;
Set an index for VMOV.
: [0] ( -- ) 0 [n] ;
Set index 0 for VMOV.
: [1] ( -- ) 1 [n] ;
Set index 1 for VMOV.
: [2] ( -- ) 2 [n] ;
Set index 2 for VMOV.
: [3] ( -- ) 3 [n] ;
Set index 3 for VMOV.
: [4] ( -- ) 4 [n] ;
Set index 4 for VMOV.
: [5] ( -- ) 5 [n] ;
Set index 5 for VMOV.
: [6] ( -- ) 6 [n] ;
Set index 6 for VMOV.
: [7] ( -- ) 7 [n] ;
Set index 7 for VMOV.
: dxb \ b -- ; lay byte
Lay an 8-bit byte into the instruction stream.
No alignment is performed. Use in the form:
dxb $55
: dxw \ w -- ; lay 16 bits
Lay a 16-bit word into the instruction stream.
No alignment is performed. Use in the form:
dxw $55AA
: dxl \ l -- ; lay 32 bit long
Lay a 32-bit dword into the instruction stream.
No alignment is performed. Use in the form:
dxl $11223344
: db \ b -- ; lay byte
Lay a single byte inline.
Obsolete, will be removed in a future release, use DXB instead.
: dw \ w -- ; lay 16 bits
Lay a 16 bit item inline. No alignment is performed.
Obsolete, will be removed in a future release, use DXW instead.
: dd \ l -- ; lay 32 bit long
Lay a 32 bit item inline. No alignment is performed.
Obsolete, will be removed in a future release, use DXL instead.
: dl \ l -- ; lay 32 bit long
Lay a 32 bit item inline. No alignment is performed.
Obsolete, will be removed in a future release, use DXL instead.
: align4 \ --
Forces the PC to a four byte boundary.
: $ \ -- chere
Returns the current value of the PC.
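The interaction between the data-laying words, ALIGN4 and $ can be pictured with the small Python model below. It is a sketch, not the cross compiler's implementation. Two details are assumptions: data is laid little-endian, and ALIGN4 pads with zero bytes. The class name `CodeImage` and its methods are invented for this illustration.

```python
import struct


class CodeImage:
    """Toy model of a target code image being assembled."""

    def __init__(self, origin=0):
        self.origin = origin
        self.data = bytearray()

    def here(self):
        """Model of $ : the current value of the PC."""
        return self.origin + len(self.data)

    def dxb(self, b):
        """Lay an 8-bit byte; no alignment is performed."""
        self.data += struct.pack("<B", b & 0xFF)

    def dxw(self, w):
        """Lay a 16-bit item; no alignment is performed."""
        self.data += struct.pack("<H", w & 0xFFFF)

    def dxl(self, l):
        """Lay a 32-bit item; no alignment is performed."""
        self.data += struct.pack("<I", l & 0xFFFFFFFF)

    def align4(self):
        """Force the PC to a four byte boundary (zero padding assumed)."""
        while self.here() % 4:
            self.dxb(0)


img = CodeImage(origin=0x8000)
img.dxb(0x55)          # PC is now $8001, so a following word is misaligned
img.align4()           # PC moves up to $8004
img.dxl(0x11223344)
img.dxw(0x55AA)
print(hex(img.here()), img.data.hex())
```

Because the laying words do not align, a byte laid before 32-bit data leaves the PC on an odd address until ALIGN4 is used.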
: ;CODE \ --
A defining word used in the form:
: <namex> CREATE ... ;CODE ... END-CODE
Stops compilation, and enables the assembler. This word is used
with CREATE to produce defining words whose run-time portion is
written in code, in the same way that CREATE ... DOES> is used to
create high level defining words.
The data structure is defined between CREATE and ;CODE and the
run-time action is defined between ;CODE and END-CODE. The current
value of the data stack pointer is saved by ;CODE for later use by
END-CODE for error checking. When <namex> executes, the address of
the data area will be found on the processor stack, from which it
must be removed.
: ASMCODE \ --
Starts a section of assembler code and turns on the assembler, but
without generating a dictionary header. This action is
particularly useful for generating the start-up code. Examples of
this can be found in CODEARM.FTH.
ASMCODE ... END-CODE
: CODE \ --
A defining word used in the form:
CODE <name> ... END-CODE
Creates a dictionary entry for <name> to be defined by a following
sequence of assembly language words. Words thus defined are called
code definitions. CODE stores the current data stack pointer for
later error checking by END-CODE.
: END-CODE \ --
Terminates a code definition and checks the data stack pointer
against the value stored when ;CODE or CODE was executed. The
assembler is disabled. See: CODE ;CODE.
: IS-ACTION-OF \ addr --
Used to tell the cross-compiler that the given address is to be
used as the run time action of the word whose name follows.
Usually found in code definitions, but can also be used for
high-level definitions. For example:
ASMCODE
HERE IS-ACTION-OF CONSTANT
...
END-CODE
ASSEMBLER
HERE IS-ACTION-OF <<high-level-definer>>
B DODOES
END-CODE
] ... EXIT [
: PROC \ -- ; PROC <label-name>
Starts a section of assembler code and turns on the assembler,
defining a label. This action is particularly useful for
generating interrupts or shared subroutines. Examples of
this can be found in Cortex/CodeCortex.fth.
: !call \ dest ^ins --
Patch a branch opcode at ^ins to branch to target
address dest. The opcode portion is not changed, so
that this word works with both B and BL. Note that this
word may be redefined in some target code for ARM7 and ARM9
devices.
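For the legacy ARM instruction set, the patching operation can be sketched in Python as follows. The B and BL opcodes keep the condition and opcode in the top eight bits and a signed 24-bit word offset, relative to the instruction address plus 8, in the bottom 24 bits. This is an illustration of that encoding only; the real !call is not shown in this document, Thumb-2 branches are encoded differently, and the function names are invented.

```python
def patch_call(opcode, ins_addr, dest):
    """Return `opcode` with its 24-bit offset field set to reach `dest`."""
    delta = dest - (ins_addr + 8)        # the PC reads as instruction + 8
    if delta % 4:
        raise ValueError("branch destination must be word aligned")
    words = delta >> 2
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("branch destination out of range")
    # Keep condition and opcode bits so both B and BL are handled.
    return (opcode & 0xFF000000) | (words & 0x00FFFFFF)


def branch_target(opcode, ins_addr):
    """Decode the destination of an ARM B or BL opcode."""
    words = opcode & 0x00FFFFFF
    if words & 0x00800000:
        words -= 1 << 24                 # sign extend the 24-bit field
    return ins_addr + 8 + (words << 2)


bl_placeholder = 0xEB000000              # BL with an empty offset field
patched = patch_call(bl_placeholder, 0x8000, 0x9000)
print(hex(patched), hex(branch_target(patched, 0x8000)))
```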