ARM Cortex Cross assembler

The MPE cross compiler has a built-in cross-assembler. This lets you define new Forth words in assembler as well as in Forth. You can also assemble code to anywhere in memory.

This section is not a treatise on ARM Cortex assembly language programming. The essential document for Cortex-M is the ARMv7-M Architecture Reference Manual, which is available from www.arm.com as a PDF file, document reference ARM DDI 0403. You may have to register to download it. The ARM 32 bit instruction set is documented in ARM DDI 0100. A copy is provided in the Docs\ARM folder in PDF format. The ARMv6 instruction set used in the Raspberry Pi is documented as an appendix to the ARMv7-A Architecture Reference Manual.

The full Thumb-2 instruction set for Cortex-M3/M4 is supported. The instruction notation is very similar to that provided with the MPE ARM compiler. The Cortex assembler notation is very close to that referred to by ARM as UAL (Unified Assembly Language), which was designed to improve portability between ARM and Cortex at the assembler source code level.

This assembler can be switched between Cortex and ARM instruction sets in order to support the Cortex-A profile. Note that the Thumb-2 instruction set can be regarded as an encoding of the ARM instruction set with better code density and features to support system programming. That ARM have achieved this objective is indicated by the fact that with only minor extensions to the compiler (see "Intrinsics" in the code generator chapter), there is no assembly code in the Forth start-up files.

By default, the assembler and VFX code generator are set to use the legacy ARM instruction set with the TOS register set to R10. This is the configuration used by VFX Forth for ARM Linux.

Why write in assembler?

Forth is compact and quick, so why write in assembler? An assembler definition is normally faster than a group of corresponding Forth words. For Cortex CPUs, hand coding can improve performance by keeping more data in registers in loops and by taking more advantage of conditional execution.

That having been said, MPE does not write Cortex or ARM code (even interrupt handlers) in assembler except in very rare cases.

Creating Forth words in assembler

Forth words can easily be defined in assembler. They increase the execution speed of your code and can sometimes make your code smaller.

Defining assembler words

Forth words written in assembler follow a similar form to a word written in Forth. Instead of a colon you have CODE. Instead of semi-colon you have END-CODE. For example:


CODE <name>
  ...
  ...
  NEXT,
END-CODE

creates a word called <name>. Any assembler code between CODE and END-CODE is assembled into the word. When the word is executed, the code laid down by the macro NEXT, ends the word and returns to the calling word.
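
As a minimal sketch of this form, the following hypothetical word doubles the top stack item. The name 2*TOS is invented for illustration. The example assumes that the top of the data stack is cached in the register named TOS, as described under "Preserving the Forth registers", and it uses only the ADD Rd, Rd, Rm form from the instruction list.

```forth
CODE 2*TOS      \ n -- 2n ; hypothetical word, not part of the kernel
  add  tos, tos, tos     \ TOS is held in a register, so no memory access is needed
  next,                  \ return to the calling word
END-CODE
```

Once compiled, such a word is used like any other Forth word, for example 21 2*TOS.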

Writing assembler words

The syntax used for the opcodes has been kept similar to the standard ARM syntax. See the list at the end of the chapter for a comparison of ARM versus Forth syntax.

As a company, ARM changes or extends the instruction sets in use at any time as new cores are designed. By normal Forth standards, this assembler is a huge piece of code. Forgive us for deviations from the expected instruction syntax; in places our assembler notation reflects ease of implementation.

Register names

Registers are defined as follows.

Rn    The familiar integer registers R0..R15
CRn   Coprocessor registers CR0..CR15
xPSR  Processor status registers, e.g. CPSR and SPSR
Sn    Single-precision (32 bit) floating point or vector registers S0..S31
Dn    Double-precision (64 bit) floating point or vector registers D0..D15/31
Qn    Quadruple (128 bit) vector registers Q0..Q15

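Note that the floating point and vector registers overlap. The mapping between the registers is as follows:

Dn  occupies the same storage as S(2n) and S(2n+1), for D0..D15; S(2n) is the low word
Qn  occupies the same storage as D(2n) and D(2n+1); D(2n) is the low doubleword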

Where the instruction cannot distinguish between floating point and integer operation, use the

Preserving the Forth registers

The Forth interpreter and compiler use some of the target processor's registers. These must be preserved if they are used in the assembler. They can be saved on the stack, in memory or in other registers and restored at the end of the word. The ARM registers that are used are shown in the table below.

Cortex register usage

Cortex register Forth register Notes
R15 or PC IP The program counter. Altering this register will cause the processor to jump to a new address.
R14 or LINK _ The link register. When a subroutine is entered via the BL instruction, the return address is cached in R14. If a further BL is to be executed within the subroutine, remember to save the contents of R14, usually on the return stack. Note that in Thumb mode, bit 0 of the return address is set to 1 to indicate Thumb state.
R13 or SP RSP Forth return stack pointer, do not change this without good reason. This stack holds return addresses. Note that when entering a subroutine or word via the BL instruction the return address is cached in R14, the link register. R13 is used in many ARM systems as a stack pointer.
R12 PSP (1) Forth data stack pointer, do not change this without good reason. Use it for passing parameters between words. When writing assembler code, use PSP rather than R12. Future versions of the compiler may use a different register for the data stack pointer.
R11 UP Pointer to the base of the current User Area.
R10 TOS (1) Currently the default register for TOS for ARM32, but this may change when interworking ARM and Cortex code.
R9 LP Local variable frame pointer.
R8 -- Currently unused, but we have plans for it.
R7 TOS (2) Instead of holding the top item of the Forth data stack in main memory, it is held in a register. This allows many simple operations to execute faster, and it also reduces the amount of memory traffic. For hosted systems such as VFX Forth for Linux, TOS will be in R10 by default, but for code density in the Thumb-2 instruction set, R7 is a better choice.
R6 PSP (2) For best code density with the Thumb-2 instruction set, R6 is a better choice than R12 as the PSP.
R0..R6 _ Scratch

ARM register usage

ARM register Forth register Notes
R15 or PC IP The ARM program counter. Altering this register will cause the processor to jump to a new address.
R14 or LINK _ The ARM link register. When a subroutine is entered via the BL instruction, the return address is cached in R14. If a further BL is to be executed within the subroutine, remember to save the contents of R14, usually on the return stack.
R13 or SP RSP Forth return stack pointer, do not change this without good reason. This stack holds return addresses. Note that when entering a subroutine or word via the BL instruction the return address is cached in R14, the link register. R13 is used in many ARM systems as a stack pointer.
R12 PSP Forth data stack pointer, do not change this without good reason. Use it for passing parameters between words.
R11 UP Pointer to the base of the current User Area.
R10 TOS Instead of holding the top item of the Forth data stack in main memory, it is held in a register. This allows many simple operations to execute faster, and it also reduces the amount of memory traffic.
R9 LP Local variable frame pointer.
R0..R8 _ Scratch

Executing an assembler word

A Forth word written in assembler is executed in the same way as a word written in Forth, by stating its name.

Assembling into memory

Assembler code can also be assembled directly into memory rather than into a Forth word.

To turn on the assembler, use the word AsmCode. To switch back to Forth, use the word End-Code. Any assembler source between these two words is assembled, and the resulting code is placed in the dictionary without a header. The code can be executed by the use of labels. This is often used to define low-level interrupts. See the chapter on Interrupts for more details on writing low-level interrupts.

Creating defining words in assembler

The cross compiler allows you to define the run-time (DOES>) part of a defining word in assembler. To do this use ;CODE in the form:


: <name>
  CREATE
    ...
  ;CODE
    ...
END-CODE

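An example is shown below. The listing gives the run-time code twice, once for Cortex and once for ARM. These are alternatives, and a real definition contains only the version for the target instruction set.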


: VARIABLE    \ <spaces>name -- ; -- addr
  CREATE                     \ Create header
    0 ,                      \ Initial value
  ;CODE                      \ Run-time action
\ Cortex version
    str tos, [ psp, # -4 ] ! \ save TOS
    ldr tos, [ link, # -1 ]  \ get const val/addr from after BL
    pop     { pc }
END-CODE
\ ARM version
    stmfd psp ! { tos }      \ Save TOS
    ldr tos, [ link ], # 4   \ get pointer to data
    ldr tos, [ tos ]         \ get variable address
    NEXT,
END-CODE

Structured programming

Three facilities are available to give you the advantages of structured programming in assembler: control structures, labels and macros.

Control structures

There are assembler equivalents to the Forth control structures. The available structures are:


AHEAD, ... THEN, or ENDIF,
cc IF, ... THEN, or ENDIF,
cc IF, ... ELSE, ... THEN, or ENDIF,
BEGIN, ... cc UNTIL,
BEGIN, ... cc WHILE, ... REPEAT,
BEGIN, ... AGAIN,

where cc is one of the condition codes in the table below.

ARM   Forth   Condition
.CS   CS,     carry set
.CC   CC,     carry clear
.PL   PL,     plus - positive or zero
.MI   MI,     minus - negative
.VS   VS,     overflow set
.VC   VC,     overflow clear
.LE   LE,     less than or equal
.EQ   EQ,     equal or zero
.NE   NE,     not equal or non-zero
.GE   GE,     greater than or equal
.LT   LT,     less than
.GT   GT,     greater than
.LS   LS,     unsigned less than or equal (lower or same)
.HS   HS,     unsigned greater than or equal (higher or same). Same as CS
.LO   LO,     unsigned less than. Same as CC
.HI   HI,     unsigned greater than
.AL   *       Always (default)
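
The two words below are a minimal sketch of how the structures and condition codes combine. The word names CLAMP0 and SPIN are hypothetical. The sketch assumes that TOS is cached in a register and that the data stack is a full descending stack addressed by PSP, as in the VARIABLE example above. Only instruction forms that appear in the instruction list are used.

```forth
CODE CLAMP0     \ n -- n|0 ; hypothetical: negative values become 0
  cmp  tos, # 0            \ set the flags from TOS
  lt, if,                  \ body is assembled so that it runs only if TOS < 0
    mov  tos, # 0
  endif,
  next,
END-CODE

CODE SPIN       \ n -- ; hypothetical busy-wait loop, assumes n > 0
  begin,
    sub .s  tos, tos, # 1  \ count down, .s updates the flags
  eq, until,               \ loop until the count reaches zero
  ldr  tos, [ psp ], # 4   \ drop the count: reload TOS from the data stack
  next,
END-CODE
```

In both cases the condition word comes before the structure word, in the same way that a flag precedes IF or UNTIL in high-level Forth.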

Cortex IT instruction

The Thumb-2 instruction set does not support the conditional execution facilities of the ARM instruction set. Instead, it provides the IT instruction.

In order to avoid performance penalties caused by taken branches and the associated pipeline flushes, the Thumb-2 instruction set provides the ITxxx <cond> instruction. This permits up to four following instructions to be executed conditionally. The first instruction is executed if <cond> is true. Each of the next (up to three) instructions has its own x letter: it is executed if <cond> is true and x=T, or if <cond> is false and x=E. The following example illustrates the use of the IT instruction.


CODE WITHIN?     \ n1 n2 n3 -- flag
\ Return TRUE if N1 is within the range N2..N3.
\ This word uses signed arithmetic.
  ldmfd   psp ! { r0, r1 }
  mov .s  r2, # 0
  mov .s  r3, # 0
  cmp     r1, r0
  it .ge  \ next instruction if condition met
    mov     r2, # 1
  cmp     r1, tos
  it .le  \ next instruction if condition met
    mov     r3, # 1
  tst     r2, r3
  ite .ne
    mvn     tos, # 0  \ if condition met
    mov     tos, # 0  \ if condition not met
  next,
END-CODE

ARM conditional instructions

The ARM is different from many processors in that many instructions can be executed conditionally depending on the processor status flags, by appending one of the mnemonics in the table above to the instruction. An instruction without a condition suffix is assumed to use .AL. Note that most instructions (except the test and compare instructions) do not set the status flags by default. This has to be done with the .S suffix:


ADD .S R0, R1, R2          \ Add, set condition codes
ADD .NE .S R0, R1, R2      \ if NE and set condition codes

CS, IF,                    \ do between IF, and ENDIF, if CS set
  ...
ENDIF,

It is often quicker to avoid short jumps in code such as those typically generated by IF, statements, by the use of conditionally executed instructions. Skipping several instructions is generally faster than using a branch instruction as this involves flushing the processor pipeline. See the file CODEARM.FTH for examples of conditional execution.
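
As an illustration of the difference, here is a sketch of an absolute value word written both ways. It assumes that the assembler is set to the ARM instruction set and that TOS is cached in a register. The names MY-ABS1 and MY-ABS2 are hypothetical, and the operand notation for RSB is assumed to follow the same pattern as the ADD examples above.

```forth
\ Version 1: structured, a conditional branch is assembled around the RSB
CODE MY-ABS1    \ n -- |n|
  cmp  tos, # 0
  lt, if,
    rsb  tos, tos, # 0     \ tos = 0 - tos
  endif,
  next,
END-CODE

\ Version 2: conditional execution, no branch is assembled
CODE MY-ABS2    \ n -- |n|
  cmp      tos, # 0
  rsb .lt  tos, tos, # 0   \ executed only if TOS was negative
  next,
END-CODE
```

The second version is one instruction shorter and never disturbs the pipeline. For Thumb-2 code the same effect needs an IT instruction before the conditional instruction.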

Labels

Labels can be used to mark a place in assembler code. That place can then be referenced in other areas of code.

Creating a label

Labels can be defined by using the command L: <name>. It is used in the form:

l: <name>

where <name> is what you want to call the label.

Referencing a label

A label is referenced by stating its name. For example,

  B .EQ <name>

Local labels

If you need to use labels within a code definition, you may use the local labels provided. These are used just as normal labels in the assembler, but some restrictions apply:

Creating a local label

L$1:

Referencing a local label

To reference a local label, type its name. For example,

B L$1

assembles code for a branch to L$1:.

Creating macros

A macro is a word that lays down code 'in-line' within an assembler definition. Macros are used when there is a repetitive use of a series of opcodes.

Defining a macro

The easiest way to create a macro is by using MACRO:. The macro below can be used as a divide step operation.


macro: Udiv63/31_step   \ --
  adc .s  r1, r1, r1
  adc .s  r0, tos, r0, lsl # 1
  sub .cc r0, r0, tos
;m

The assembler's prefix notation leads to some peculiarities when writing macros.

  1. Opcode words, e.g. ADC above, assemble the code for the previous instruction, so you may not know the stack conditions. Put literal data on the return stack in the macro, and retrieve it as needed.
  2. The assembler wordlist is first in the search order, so be careful that a normal Forth word name is not a name in the assembler.
  3. Register names are executable words. To pass registers by number use R# ( n -- ) for general purpose registers, CR# ( n -- ) for coprocessor registers, and SR# ( n -- ) for status registers (0=CPSR, 1=SPSR).

A macro can also be defined using colon and semi-colon before CROSS-COMPILE is executed. It must be defined in the cross compiler's ASM-ACCESS vocabulary. The place to create a macro is in the control file and it must be defined before CROSS-COMPILE. As an example, the macro NEXT, is shown below. NEXT, is defined as a macro, so each time it is used, its code is laid down. This makes it quicker than calling a subroutine.


\ switch to cross compiler's assembler vocab
FORTH ALSO C-C ALSO ASSEMBLER
ALSO ASM-ACCESS DEFINITIONS
\ define NEXT
: NEXT,   \ -- ; lay in-line next code
  bx  lr
;
\ switch back to normal forth vocabulary
ONLY FORTH DEFINITIONS

Using a macro

A macro is used by stating its name. For example, in a CODE definition, NEXT, is a macro.
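
The following sketch shows a macro being defined with MACRO: and then used twice inside a CODE definition. The names DropTos and MY-2DROP are hypothetical. The sketch assumes that TOS is cached in a register and that PSP addresses a full descending data stack, so that the next item down can be reloaded with a post-indexed LDR.

```forth
macro: DropTos  \ -- ; hypothetical macro: discard the top data stack item
  ldr  tos, [ psp ], # 4   \ reload TOS from memory and step PSP up one cell
;m

CODE MY-2DROP   \ x1 x2 -- ; hypothetical word built from the macro
  DropTos                  \ lays down one LDR in-line
  DropTos                  \ and another
  next,
END-CODE
```

Each use of DropTos lays down its own copy of the instruction, so there is no call overhead at run time.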

Debugging

It is possible to disassemble compiled words using:

  XDASM <name>
  DIS <name>

This can be done during compilation by including an XDASM statement in the control file, or interactively after compilation by including the word INTERACTIVE before FINIS.

CPU selection

The instruction set of the processor is extended on various processor cores. The selection available in this assembler is for ARM32 with/without Thumb-1 and the Cortex-M0, M1, M3 and M4.

Number bases

The number base in the Forth assembler can be indicated by BINARY, DECIMAL, and HEX. In addition, numbers prefixed by the '$', '#' and '%' characters are treated as special cases. These characters affect the number base for that number only. Note that the characters '$' and '%' follow Motorola usage. Note also that the '#' symbol attached to a number is not the same as the word # that indicates immediate addressing.

Symbol Base Example
$ hex $55AA
# decimal #1234
% binary %1011001
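
The fragment below is a small sketch of the prefixes in use, and of the difference between the word # (immediate addressing, separated by spaces) and the '#' prefix (decimal, attached to the number). The word name BASES-DEMO is hypothetical, and the registers are chosen from the scratch set only for illustration.

```forth
CODE BASES-DEMO   \ -- ; hypothetical, shows the notation only
  mov  r0, # $55           \ immediate, hex 55
  mov  r1, # #100          \ immediate, decimal 100 whatever the current base
  mov  r2, # %1010         \ immediate, binary 1010
  next,
END-CODE
```

Using a prefix on every literal makes the source independent of whichever of BINARY, DECIMAL or HEX was last executed.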

Thumb-2 instruction set

Thumb-2 literals are encoded in a number of ways. The term imm refers to an immediate value that is directly encoded across a number of bit fields. The term const refers to an immediate value that comes from a restricted but wide range. See ARM DDI 0403 for the gory details. Items in square brackets are optional, e.g. [.s] indicates that .s to set the condition flags is optional.

When one of the ITxxx instructions applies, do not apply .s. The assembler will do what it has to.

The term <shift> means one of


.LSL # n    logical left shift by n
.LSR # n    logical right shift by n
.ASR # n    arithmetic right shift by n
.ROR # n    rotate right by n
.RRX        1 bit rotate right, carry in to new bit 31,
            old bit 0 to carry out.

You can leave out <shift> and it will be encoded as LSL # 0. Note that when .S applies, the rules for the final value of the carry flag are a bit arcane.

You can force an instruction to use the 16 bit form by using the .n indicator. You can force the 32 bit form using the .w indicator. Without either of these, the assembler will choose the shortest form. Using R0..R7 (the low registers) and .S usually generates the shortest code. Except in a few cases, using R8..R15 (the high registers) will generate a 32 bit instruction.

Where a register appears twice in an instruction (usually Rd), that particular encoding (usually a 16 bit form) is only generated when the same register appears twice.

Use of the PC (R15) or SP (R13) registers may cause assembler errors as these register fields are frequently used to handle special instructions. Consult ARM DDI 0403.

Base instruction set

This is the Cortex-M3 instruction set. Note that the Cortex-M0/M1 instruction set is a subset of this. In ARM terminology the Cortex-M0/M1 is defined in ARMv6-M.

In some cases using .s leads to shorter code because a 16 bit Thumb-1 encoding is available. When coding for Cortex-M0 remember that the .s is often required.


ADC [.s]    Rd, Rn, # <const>
ADC [.s]    Rd, Rn, Rm
ADC [.s]    Rd, Rn, Rm <shift>

ADD .s      Rd, Rn, # <imm3>
ADD .s      Rd, Rd, # <imm8>
ADD [.s]    Rd, Rn, # <const>
ADD [.s]    Rd, Rn, # <imm12>
ADD .s      Rd, Rn, Rm
ADD         Rd, Rd, Rm
ADD [.s]    Rd, Rn, Rm <shift>
ADD         Rd, SP, # <imm8>
ADD         SP, SP, # <imm7>
ADD [.s]    Rd, SP, # <const>
ADD [.s]    Rd, SP, # <imm12>
ADD         Rd, SP, Rd
ADD         SP, SP, Rm
ADD         Rd, SP, Rm <shift>

ADR         Rd, <label>

AND         Rd, Rn, # <const>
AND         Rd, Rd, Rm
AND [.s]    Rd, Rn, Rm <shift>

ASR .s      Rd, Rn, # <imm5>
ASR [.s]    Rd, Rn, # <imm5>
ASR .s      Rd, Rd, Rm
ASR [.s]    Rd, Rn, Rm

B <cond>    <label>
B           <label>

BFC         Rd, # <lsb> # <width>
BFI         Rd, Rn, # <lsb> # <width>

BIC [.s]    Rd, Rn, # <const>
BIC .s      Rd, Rd, Rm
BIC [.s]    Rd, Rn, Rm <shift>

BKPT        # <imm8>

BL          <label>

BLX         Rm

BX          Rm

CBNZ        Rn, <label>
CBZ         Rn, <label>

CDP         <copro> <opc1> CRd, CRn, CRm, <opc2>
CDP2        <copro> <opc1> CRd, CRn, CRm, <opc2>

CLREX

CLZ         Rd, Rm

CMN         Rn, # <const>
CMN         Rn, Rm
CMN         Rn, Rm <shift>

CMP         Rn, # <imm8>
CMP         Rn, # <const>
CMP         Rn, Rm
CMP         Rn, Rm <shift>

CPS .ie     [.i] [.f]
CPS .id     [.i] [.f]

DBG         # <opt4>
DMB         # <opt4>
DSB         # <opt4>

EOR [.s]    Rd, Rn, # <const>
EOR .s      Rd, Rd, Rm
EOR [.s]    Rd, Rn, Rm <shift>

ISB         # <opt4>

IT <cond>
ITT <cond>
ITE <cond>
ITTT <cond>
ITET <cond>
ITTE <cond>
ITEE <cond>
ITTTT <cond>
ITETT <cond>
ITTET <cond>
ITEET <cond>
ITTTE <cond>
ITETE <cond>
ITTEE <cond>
ITEEE <cond>

LDC         <copro> CRd, [ Rn, # +/-<imm8> ]
LDC         <copro> CRd, [ Rn, # +/-<imm8> ] !
LDC         <copro> CRd, [ Rn ], # +/-<imm8>
LDC         <copro> CRd, [ Rn ], {} <option>
LDC         <copro> CRd, <label>
LDCL        <copro> CRd, [ Rn, # +/-<imm8> ]
LDCL        <copro> CRd, [ Rn, # +/-<imm8> ] !
LDCL        <copro> CRd, [ Rn ], # +/-<imm8>
LDCL        <copro> CRd, [ Rn ], {} <option>
LDCL        <copro> CRd, <label>
LDC2        <copro> CRd, [ Rn, # +/-<imm8> ]
LDC2        <copro> CRd, [ Rn, # +/-<imm8> ] !
LDC2        <copro> CRd, [ Rn ], # +/-<imm8>
LDC2        <copro> CRd, [ Rn ], {} <option>
LDC2        <copro> CRd, <label>
LDC2L       <copro> CRd, [ Rn, # +/-<imm8> ]
LDC2L       <copro> CRd, [ Rn, # +/-<imm8> ] !
LDC2L       <copro> CRd, [ Rn ], # +/-<imm8>
LDC2L       <copro> CRd, [ Rn ], {} <option>
LDC2L       <copro> CRd, <label>

LDM         Rn, [!] { ra, rb ... rn }
LDMIA       Rn, [!] { ra, rb ... rn }
LDMFD       Rn, [!] { ra, rb ... rn }
LDMDB       Rn, [!] { ra, rb ... rn }
LDMEA       Rn, [!] { ra, rb ... rn }

LDR         Rt, [ Rn, # <imm5*4> ]
LDR         Rt, [ SP, # <imm8*4> ]
LDR         Rt, [ Rn, # <imm12> ]
LDR         Rt, [ Rn, # -<imm8> ]
LDR         Rt, [ Rn ], # +/-<imm8>
LDR         Rt, [ Rn, # +/-<imm8> ] !
LDR         Rt, <label>
LDR         Rt, [ Rn ++ Rm ]
LDR         Rt, [ Rn ++ Rm, LSL # <imm2> ]
LDR         Rt, @= <imm32>    \ loads from literal pool
                              \ Use FLUSHLITPOOL to lay pool.

LDRB        Rt, [ Rn, # <imm5> ]
LDRB        Rt, [ Rn, # <imm12> ]
LDRB        Rt, [ Rn, # -<imm8> ]
LDRB        Rt, [ Rn ], # +/-<imm8>
LDRB        Rt, [ Rn, # +/-<imm8> ] !
LDRB        Rt, <label>
LDRB        Rt, [ Rn ++ Rm ]
LDRB        Rt, [ Rn ++ Rm, LSL # <imm2> ]

LDRBT       Rt, [ Rn, # +<imm8> ]

LDRD        Rt, Rt2, [ Rn, # -<imm8> ]
LDRD        Rt, Rt2, [ Rn ], # +/-<imm8>
LDRD        Rt, Rt2, [ Rn, # +/-<imm8> ] !
LDRD        Rt, Rt2, <label>

LDREX       Rt, [ Rn, # +<imm8> ]
LDREXB      Rt, [ Rn ]
LDREXH      Rt, [ Rn ]

LDRH        Rt, [ Rn, # <imm5*2> ]
LDRH        Rt, [ Rn, # <imm12> ]
LDRH        Rt, [ Rn, # -<imm8> ]
LDRH        Rt, [ Rn ], # +/-<imm8>
LDRH        Rt, [ Rn, # +/-<imm8> ] !
LDRH        Rt, <label>
LDRH        Rt, [ Rn ++ Rm ]
LDRH        Rt, [ Rn ++ Rm, LSL # <imm2> ]

LDRHT       Rt, [ Rn, # +<imm8> ]

LDRSB       Rt, [ Rn, # <imm12> ]
LDRSB       Rt, [ Rn, # -<imm8> ]
LDRSB       Rt, [ Rn ], # +/-<imm8>
LDRSB       Rt, [ Rn, # +/-<imm8> ] !
LDRSB       Rt, <label>
LDRSB       Rt, [ Rn ++ Rm ]
LDRSB       Rt, [ Rn ++ Rm, LSL # <imm2> ]

LDRSBT      Rt, [ Rn, # +<imm8> ]

LDRSH       Rt, [ Rn, # <imm12> ]
LDRSH       Rt, [ Rn, # -<imm8> ]
LDRSH       Rt, [ Rn ], # +/-<imm8>
LDRSH       Rt, [ Rn, # +/-<imm8> ] !
LDRSH       Rt, <label>
LDRSH       Rt, [ Rn ++ Rm ]
LDRSH       Rt, [ Rn ++ Rm, LSL # <imm2> ]

LDRSHT      Rt, [ Rn, # +<imm8> ]

LDRT        Rt, [ Rn, # +<imm8> ]

LSL         Rd, Rm, # <imm5>
LSL [.s]    Rd, Rm, # <imm5>
LSL .s      Rd, Rd, Rm
LSL [.s]    Rd, Rn, Rm

LSR         Rd, Rm, # <imm5>
LSR [.s]    Rd, Rm, # <imm5>
LSR .s      Rd, Rd, Rm
LSR [.s]    Rd, Rn, Rm

MCR         <copro> <opc1> Rt, CRn, CRm, <opc2>
MCR2        <copro> <opc1> Rt, CRn, CRm, <opc2>
MCRR        <copro> <opc1> Rt, Rt2, CRm, <opc2>
MCRR2       <copro> <opc1> Rt, Rt2, CRm, <opc2>

MLA         Rd, Rn, Rm, Ra
MLS         Rd, Rn, Rm, Ra

MOV .s      Rd, # <imm8>
MOV [.s]    Rd, # <const>
MOV         Rd, # <imm16>
MOV         Rd, Rm
MOV [.s]    Rd, Rm
MOV [.s]    Rd, Rm <shiftop> # n
MOV [.s]    Rd, Rm <shiftop> Rs
MOV [.s]    Rd, Rm .RRX
MOVT        Rd, # <imm16>

MRC         <copro> <opc1> Rt, CRn, CRm, <opc2>
MRC2        <copro> <opc1> Rt, CRn, CRm, <opc2>
MRRC        <copro> <opc1> Rt, Rt2, CRm, <opc2>
MRRC2       <copro> <opc1> Rt, Rt2, CRm, <opc2>

MRS         Rd, <SYSm8>
MSR         <SYSm8> Rn
<SYSm8> is a special register number in the range 0..255

MUL .s      Rd, Rn, Rd
MUL [.s]    Rd, Rn, Rm

MVN [.s]    Rd, # <const>
MVN .s      Rd, Rm
MVN [.s]    Rd, Rm <shift>

NEG .s      Rd, Rm

NOP [.n/.w]

ORN [.s]    Rd, Rm, # <const>
ORN [.s]    Rd, Rn, Rm <shift>

ORR [.s]    Rd, Rm, # <const>
ORR .s      Rd, Rd, Rm
ORR [.s]    Rd, Rn, Rm <shift>

PLD         [ Rn, # +<imm12> ]
PLD         [ Rn, # -<imm8> ]
PLD         <label>
PLD         [ Rn ++ Rm, LSL # <imm2> ]
PLDW        [ Rn, # +<imm12> ]
PLDW        [ Rn, # -<imm8> ]
PLI         [ Rn, # +<imm12> ]
PLI         [ Rn, # -<imm8> ]
PLI         <label>
PLI         [ Rn ++ Rm, LSL # <imm2> ]

POP         { ra, rb ... rn }
PUSH        { ra, rb ... rn }

RBIT        Rd, Rm
REV         Rd, Rm
REV16       Rd, Rm
REVSH       Rd, Rm

ROR [.s]    Rd, Rm, # <imm5>
ROR .s      Rd, Rd, Rm
ROR [.s]    Rd, Rn, Rm

RRX         Rd, Rm

RSB .s      Rd, Rn, # 0
RSB [.s]    Rd, Rn, # <const>
RSB [.s]    Rd, Rn, Rm <shift>

SBC [.s]    Rd, Rn, # <const>
SBC .s      Rd, Rd, Rm
SBC [.s]    Rd, Rn, Rm <shift>

SBFX        Rd, Rn, # <lsb> # <width>

SDIV        Rd, Rn, Rm

SEV

SMLAL       Rdlo, Rdhi, Rn, Rm
SMULL       Rdlo, Rdhi, Rn, Rm

SSAT        Rd, # <imm5> Rn <shift>

STC         <copro> CRd, [ Rn, # +/-<imm8> ]
STC         <copro> CRd, [ Rn, # +/-<imm8> ] !
STC         <copro> CRd, [ Rn ], # +/-<imm8>
STC         <copro> CRd, [ Rn ], {} <option>
STCL        <copro> CRd, [ Rn, # +/-<imm8> ]
STCL        <copro> CRd, [ Rn, # +/-<imm8> ] !
STCL        <copro> CRd, [ Rn ], # +/-<imm8>
STCL        <copro> CRd, [ Rn ], {} <option>
STC2        <copro> CRd, [ Rn, # +/-<imm8> ]
STC2        <copro> CRd, [ Rn, # +/-<imm8> ] !
STC2        <copro> CRd, [ Rn ], # +/-<imm8>
STC2        <copro> CRd, [ Rn ], {} <option>
STC2L       <copro> CRd, [ Rn, # +/-<imm8> ]
STC2L       <copro> CRd, [ Rn, # +/-<imm8> ] !
STC2L       <copro> CRd, [ Rn ], # +/-<imm8>
STC2L       <copro> CRd, [ Rn ], {} <option>

STM         Rn, [!] { ra, rb ... rn }
STMIA       Rn, [!] { ra, rb ... rn }
STMEA       Rn, [!] { ra, rb ... rn }
STMDB       Rn, [!] { ra, rb ... rn }
STMFD       Rn, [!] { ra, rb ... rn }

STR         Rt, [ Rn, # <imm5*4> ]
STR         Rt, [ SP, # <imm8*4> ]
STR         Rt, [ Rn, # <imm12> ]
STR         Rt, [ Rn, # -<imm8> ]
STR         Rt, [ Rn ], # +/-<imm8>
STR         Rt, [ Rn, # +/-<imm8> ] !
STR         Rt, [ Rn ++ Rm ]
STR         Rt, [ Rn ++ Rm, LSL # <imm2> ]

STRB        Rt, [ Rn, # <imm5> ]
STRB        Rt, [ Rn, # <imm12> ]
STRB        Rt, [ Rn, # -<imm8> ]
STRB        Rt, [ Rn ], # +/-<imm8>
STRB        Rt, [ Rn, # +/-<imm8> ] !
STRB        Rt, [ Rn ++ Rm ]
STRB        Rt, [ Rn ++ Rm, LSL # <imm2> ]

STRBT       Rt, [ Rn, # <imm8> ]

STRD        Rt, Rt2, [ Rn, # -<imm8> ]
STRD        Rt, Rt2, [ Rn ], # +/-<imm8>
STRD        Rt, Rt2, [ Rn, # +/-<imm8> ] !

STREX       Rd, Rt, [ Rn, # +<imm8> ]
STREXB      Rd, Rt, [ Rn ]
STREXH      Rd, Rt, [ Rn ]

STRH        Rt, [ Rn, # <imm5*2> ]
STRH        Rt, [ Rn, # <imm12> ]
STRH        Rt, [ Rn, # -<imm8> ]
STRH        Rt, [ Rn ], # +/-<imm8>
STRH        Rt, [ Rn, # +/-<imm8> ] !
STRH        Rt, [ Rn ++ Rm ]
STRH        Rt, [ Rn ++ Rm, LSL # <imm2> ]

STRHT       Rt, [ Rn, # +<imm8> ]

STRT        Rt, [ Rn, # <imm8> ]

SUB .s      Rd, Rn, # <imm3>
SUB .s      Rd, Rd, # <imm8>
SUB [.s]    Rd, Rn, # <const>
SUB [.s]    Rd, Rn, # <imm12>
SUB .s      Rd, Rn, Rm
SUB [.s]    Rd, Rn, Rm <shift>
SUB         SP, SP, # <imm7*4>
SUB [.s]    Rd, SP, # <const>
SUB [.s]    Rd, SP, # <imm12>
SUB         Rd, SP, Rm <shift>

SVC         # <imm8>

SXTB        Rd, Rm
SXTB        Rd, Rm, <rotation>  ( 0/8/16/24 bits )
SXTH        Rd, Rm
SXTH        Rd, Rm, <rotation>  ( 0/8/16/24 bits )

TBB         [ Rn, Rm ]
TBH         [ Rn, Rm ]

TEQ         Rn, # <const>
TEQ         Rn, Rm <shift>

TST         Rn, # <const>
TST         Rn, Rm
TST         Rn, Rm <shift>

UBFX        Rd, Rn, # <lsb> # <width>

UDIV        Rd, Rn, Rm

UMLAL       Rdlo, Rdhi, Rn, Rm
UMULL       Rdlo, Rdhi, Rn, Rm

USAT        Rd, # <imm5> Rn <shift>

UXTB        Rd, Rm
UXTB        Rd, Rm, <rotation>  ( 0/8/16/24 bits )
UXTH        Rd, Rm
UXTH        Rd, Rm, <rotation>  ( 0/8/16/24 bits )

WFE
WFI
YIELD

Integer DSP

These instructions are integer DSP instructions added to Cortex-M4.


PKHBT       Rd, Rn, Rm
PKHBT       Rd, Rn, Rm .lsl # <imm5>
PKHTB       Rd, Rn, Rm
PKHTB       Rd, Rn, Rm .asr # <imm5>

QADD        Rd, Rn, Rm
QADD16      Rd, Rn, Rm
QADD8       Rd, Rn, Rm
QASX        Rd, Rn, Rm
QDADD       Rd, Rn, Rm
QDSUB       Rd, Rn, Rm
QSAX        Rd, Rn, Rm
QSUB        Rd, Rn, Rm
QSUB16      Rd, Rn, Rm
QSUB8       Rd, Rn, Rm

SADD16      Rd, Rn, Rm
SADD8       Rd, Rn, Rm
SASX        Rd, Rn, Rm
SEL         Rd, Rn, Rm
SHADD16     Rd, Rn, Rm
SHADD8      Rd, Rn, Rm
SHASX       Rd, Rn, Rm
SHSAX       Rd, Rn, Rm
SHSUB16     Rd, Rn, Rm
SHSUB8      Rd, Rn, Rm

SMLABB      Rd, Rn, Rm, Ra
SMLABT      Rd, Rn, Rm, Ra
SMLATB      Rd, Rn, Rm, Ra
SMLATT      Rd, Rn, Rm, Ra
SMLAD       Rd, Rn, Rm, Ra
SMLADX      Rd, Rn, Rm, Ra
SMLALBB     RdLo, RdHi, Rn, Rm
SMLALBT     RdLo, RdHi, Rn, Rm
SMLALTB     RdLo, RdHi, Rn, Rm
SMLALTT     RdLo, RdHi, Rn, Rm
SMLALD      RdLo, RdHi, Rn, Rm
SMLALDX     RdLo, RdHi, Rn, Rm
SMLAWB      Rd, Rn, Rm, Ra
SMLAWT      Rd, Rn, Rm, Ra
SMLSD       Rd, Rn, Rm, Ra
SMLSDX      Rd, Rn, Rm, Ra
SMLSLD      RdLo, RdHi, Rn, Rm
SMLSLDX     RdLo, RdHi, Rn, Rm

SMMLA       Rd, Rn, Rm, Ra
SMMLAR      Rd, Rn, Rm, Ra
SMMLS       Rd, Rn, Rm, Ra
SMMLSR      Rd, Rn, Rm, Ra
SMMUL       Rd, Rn, Rm
SMMULR      Rd, Rn, Rm
SMUAD       Rd, Rn, Rm
SMUADX      Rd, Rn, Rm
SMULBB      Rd, Rn, Rm
SMULBT      Rd, Rn, Rm
SMULTB      Rd, Rn, Rm
SMULTT      Rd, Rn, Rm
SMULWB      Rd, Rn, Rm
SMULWT      Rd, Rn, Rm
SMUSD       Rd, Rn, Rm
SMUSDX      Rd, Rn, Rm

SSAT16      Rd, # <imm4> Rn
SSAX        Rd, Rn, Rm
SSUB16      Rd, Rn, Rm
SSUB8       Rd, Rn, Rm
SXTAB       Rd, Rn, Rm
SXTAB       Rd, Rn, Rm, <rotation>  ( 0/8/16/24 bits )
SXTAB16     Rd, Rn, Rm
SXTAB16     Rd, Rn, Rm, <rotation>  ( 0/8/16/24 bits )
SXTAH       Rd, Rn, Rm
SXTAH       Rd, Rn, Rm, <rotation>  ( 0/8/16/24 bits )
SXTB16      Rd, Rm
SXTB16      Rd, Rm, <rotation>  ( 0/8/16/24 bits )

UADD16      Rd, Rn, Rm
UADD8       Rd, Rn, Rm
UASX        Rd, Rn, Rm
UHADD16     Rd, Rn, Rm
UHADD8      Rd, Rn, Rm
UHASX       Rd, Rn, Rm
UHSAX       Rd, Rn, Rm
UHSUB16     Rd, Rn, Rm
UHSUB8      Rd, Rn, Rm
UMAAL       RdLo, RdHi, Rn, Rm

UQADD16     Rd, Rn, Rm
UQADD8      Rd, Rn, Rm
UQASX       Rd, Rn, Rm
UQSAX       Rd, Rn, Rm
UQSUB16     Rd, Rn, Rm
UQSUB8      Rd, Rn, Rm

USAD8       Rd, Rn, Rm
USADA8      Rd, Rn, Rm, Ra
USAT16      Rd, # <imm4> Rn
USAX        Rd, Rn, Rm
USUB16      Rd, Rn, Rm
USUB8       Rd, Rn, Rm

UXTAB       Rd, Rn, Rm
UXTAB       Rd, Rn, Rm, <rotation>  ( 0/8/16/24 bits )
UXTAB16     Rd, Rn, Rm
UXTAB16     Rd, Rn, Rm, <rotation>  ( 0/8/16/24 bits )
UXTAH       Rd, Rn, Rm
UXTAH       Rd, Rn, Rm, <rotation>  ( 0/8/16/24 bits )
UXTB16      Rd, Rm
UXTB16      Rd, Rm, <rotation>  ( 0/8/16/24 bits )

Floating point for Cortex-Mx and ARM32

These are single-precision floating point instructions added for the Cortex-M4F instruction set upwards and for the ARM32 instruction set. Double precision instructions are available for systems that support them.


VABS .f32   Sd, Sm
VABS .f64   Dd, Dm
VADD .f32   Sd, Sn, Sm
VADD .f64   Dd, Dn, Dm
VCMP .f32   Sd, Sm
VCMP .f32   Sd, # 0
VCMP .f64   Dd, Dm
VCMP .f64   Dd, # 0
VCMPE .f32  Sd, Sm
VCMPE .f32  Sd, # 0
VCMPE .f64  Dd, Dm
VCMPE .f64  Dd, # 0

VCVT .si32<f32     Sd, Sm
VCVT .ui32<f32     Sd, Sm
VCVT .f32<si32     Sd, Sm
VCVT .f32<ui32     Sd, Sm
VCVT .si32<f64     Sd, Dm
VCVT .ui32<f64     Sd, Dm
VCVT .f64<si32     Dd, Sm
VCVT .f64<ui32     Dd, Sm
VCVT .xs32<f32     Sd, Sd, # fbits
VCVT .xu32<f32     Sd, Sd, # fbits
VCVT .xs16<f32     Sd, Sd, # fbits
VCVT .xu16<f32     Sd, Sd, # fbits
VCVT .xs32<f64     Dd, Dd, # fbits
VCVT .xu32<f64     Dd, Dd, # fbits
VCVT .xs16<f64     Dd, Dd, # fbits
VCVT .xu16<f64     Dd, Dd, # fbits
VCVT .f32<xs32     Sd, Sd, # fbits
VCVT .f32<xu32     Sd, Sd, # fbits
VCVT .f32<xs16     Sd, Sd, # fbits
VCVT .f32<xu16     Sd, Sd, # fbits
VCVT .f64<xs32     Dd, Dd, # fbits
VCVT .f64<xu32     Dd, Dd, # fbits
VCVT .f64<xs16     Dd, Dd, # fbits
VCVT .f64<xu16     Dd, Dd, # fbits
VCVT .f32<f16      Qd, Dm             \ ASIMD
VCVT .f16<f32      Dd, Qm             \ ASIMD
VCVT .f64<f32      Dd, Sm
VCVT .f32<f64      Sd, Dm

VCVTR .si32<f32    Sd, Sm
VCVTR .ui32<f32    Sd, Sm
VCVTR .si32<f64    Sd, Dm
VCVTR .ui32<f64    Sd, Dm
VCVTB .f32<f16     Sd, Sm
VCVTB .f16<f32     Sd, Sm
VCVTT .f32<f16     Sd, Sm
VCVTT .f16<f32     Sd, Sm

VDIV .f64          Dd, Dn, Dm
VDIV .f32          Sd, Sn, Sm
VFMA .f64          Dd, Dn, Dm
VFMA .f32          Sd, Sn, Sm
VFMS .f64          Dd, Dn, Dm
VFMS .f32          Sd, Sn, Sm
VFNMA .f64         Dd, Dn, Dm
VFNMA .f32         Sd, Sn, Sm
VFNMS .f64         Dd, Dn, Dm
VFNMS .f32         Sd, Sn, Sm

VLDMIA             Rn, { Dx-Dy }
VLDMIA             Rn ! { Dx-Dy }
VLDMDB             Rn ! { Sx-Sy }
VLDR               Dd, [ Rn, # imm ]
VLDR               Dd, label
VLDR               Sd, [ Rn, # imm ]
VLDR               Sd, label

VMLA .f64          Dd, Dn, Dm
VMLA .f32          Sd, Sn, Sm
VMLS .f64          Dd, Dn, Dm
VMLS .f32          Sd, Sn, Sm
VMOV .f64          Dd, # imm
VMOV .f32          Sd, # imm
VMOV .f64          Dd, Dm
VMOV .f32          Sd, Sm
VMOV .32           Dd [0/1] Rt
VMOV .32           Rt, Dn [0/1]
VMOV               Sn, Rt
VMOV               Rt, Sn
VMOV               Sm, Sm1, Rt, Rt2       \ Sm1=Sm+1
VMOV               Rt, Rt2, Sm, Sm1       \ Sm1=Sm+1
VMOV               Dm, Rt, Rt2
VMOV               Rt, Rt2, Dm
VMRS               Rt, FPSCR
VMSR               Rt, FPSCR
VMUL .f64          Dd, Dn, Dm
VMUL .f32          Sd, Sn, Sm

VNEG .f64          Dd, Dm
VNEG .f32          Sd, Sm
VNMLA .f64         Dd, Dn, Dm
VNMLA .f32         Sd, Sn, Sm
VNMLS .f64         Dd, Dn, Dm
VNMLS .f32         Sd, Sn, Sm
VNMUL .f64         Dd, Dn, Dm
VNMUL .f32         Sd, Sn, Sm

VPOP               { Dx-Dy }
VPOP               { Sx-Sy }
VPUSH              { Dx-Dy }
VPUSH              { Sx-Sy }

VSQRT .f64         Dd, Dm
VSQRT .f32         Sd, Sm
VSTMIA             Rn, { Dx-Dy }
VSTMIA             Rn ! { Dx-Dy }
VSTMDB             Rn ! { Sx-Sy }
VSTR               Dd, [ Rn, # imm ]
VSTR               Dd, label
VSTR               Sd, [ Rn, # imm ]
VSTR               Sd, label
VSUB .f64          Dd, Dn, Dm
VSUB .f32          Sd, Sn, Sm

ARM instruction set

The ARM instruction set is highly orthogonal. All data processing instructions work on the contents of registers and immediate constants only. Any data held in memory has to be loaded into a register, manipulated, then saved back to memory using one of the memory transfer instructions. This may appear restrictive, but because a large number of general-purpose registers is available for scratch storage, memory reads and writes can be kept to a minimum. The assembler is of the prefix variety, with the instruction mnemonic preceding its parameters. Valid instructions are:


B | BL <<cond>> expression

BLX expression
BLX Rm

MOV | MVN <<cond>> <<S>> Rd op2
CMN | CMP | TEQ | TST <<cond>> <<P>> Rn op2
ADC | ADD | AND | BIC | EOR | ORR | RSB | RSC | SBC | SUB <<cond>> <<S>> Rd Rn op2

MRS <<cond>> Rd psr
MSR <<cond>> psr Rm
MSR <<cond>> psrf Rm
MSR <<cond>> psrf #expression

MUL <<cond>> <<S>> Rd Rm Rn
MLA <<cond>> <<S>> Rd Rm Rs Rn

UMULL | SMULL | UMLAL | SMLAL <<cond>> <<S>> RdLo RdHi Rm Rs

LDR | LDRB | LDRH | STR | STRB | STRH <<cond>> Rd address <<!>>

LDMFD | LDMED | LDMFA | LDMEA | LDMIA | LDMIB | LDMDA | LDMDB |
STMFD | STMED | STMFA | STMEA | STMIA | STMIB | STMDA | STMDB
<<cond>> Rn <<!>> Rlist <<^>>

SWP | SWPB <<cond>> Rd Rm [ Rn ]

SWI <<cond>> expression

CDP <<cond>> CP# operation CRd CRn CRm info

LDC | LDCL | STC | STCL <<cond>> CP# CRd address

MCR | MRC <<cond>> CP# operation Rd CRn CRm info

Two pseudo instructions MVL and ADR are also available. NOP is supported as a synonym for:

  MOV  R0, R0

No switches are accepted for NOP.

Parameter                   Explanation
<<cond>>                    Optional conditional execution code, e.g. .NE. See Control Structures.
<<S>>                       Optional suffix .S to set the processor status flags.
<<P>>                       Optional suffix .P to modify the PSR in 26-bit modes.
<<!>>                       Optional ! enables write-back of the base register in loads and stores.
<<^>>                       Optional ^ sets the status flags when loading the PC from memory with the LDMxx instructions. Can also be used to force loading and storing of user bank registers in non-user modes.
Rd, RdLo, RdHi, Rm, Rn, Rs  Each equates to a valid register number. See Register naming. Some instructions, notably the multiplication set, place restrictions on the combinations of registers allowed.
op2                         Either one of the operands produced by the ARM's barrel shifter or an immediate constant. See Shift operations and Immediate constants.
expression                  For the B and BL instructions this is a label name or an expression evaluating to a branch address. The address is converted to make it PC relative, allowing for the effects of pipelining on the program counter. For the SWI instruction it is the number of the SWI to be called.
#expression                 Evaluates to a 32-bit value. This has to be a valid immediate constant. See Immediate constants.
address                     A valid address specification. See Addressing modes.
Rlist                       A list of registers enclosed by braces. See Register lists.
psr                         One of the register names CPSR or SPSR.
psrf                        One of the register names CPSR_flag or SPSR_flag. Only the N, Z, C and V flags are written into the status register. Use the forms _C _X _S _F to indicate which sections are to be restored.
CP#                         The unique number of the required coprocessor.
CRd, CRn, CRm               Each equates to a valid coprocessor register number. See Register naming.
operation                   Is evaluated to a constant.
info                        Is evaluated to a constant.

Register naming

There are fifteen general-purpose registers available in user mode, plus the program counter. These are named R0 through R15. R15 is the program counter. Coprocessor registers are named CR0 through CR15. The Current Program Status Register and Saved Program Status Register are named CPSR and SPSR respectively. If transferring just the status flags then CPSR_flg and SPSR_flg can be used. As of v6.2 the notation

  CPSR_flag

is superseded by

  CPSR _c _x _s _f

where the valid field definers are:

  _C _X _S _F _CXSF _FSXC _ALL

Standard ARM names are also available. SP refers to R13 (commonly used as a stack pointer), LINK refers to R14 (the link register), and PC refers to R15 (the program counter).

Forth register names can be used in place of the standard register names. These are TOS, LP, UP, RSP and PSP. As mentioned earlier these can be assigned to different ARM registers.

All register names can be used with or without a trailing comma. This makes for code that is more readable to the seasoned ARM programmer. Character case is not important.

Immediate constants

Rather than specifying the name of a register whose contents are to be used in an operation, many instructions allow you to specify a numeric value, which is encoded in the opcode at assembly time.

When the # is encountered, the assembler recognises that the following input is to be interpreted as a numeric value. The value itself can be prefixed with the usual number base selectors such as # for decimal, $ for hexadecimal, % for binary and @ for octal:


ADD R2, R3, # $32   \ Add $32 to contents of R3
                    \ and place result in R2

Note that in the UK, there may be confusion with some printers between the hash symbol # and the pound symbol.

There are restrictions on the range of immediate constants that can be used. Each ARM instruction and its operands are encoded as a single 32-bit value. Some of the 32 bits are taken by the instruction type, suffixes, destination register and so on, leaving only 12 bits to represent the constant. Twelve bits on their own would allow only a small range of constants, so the field is split in two. One field, 8 bits wide, specifies the constant, while the other, 4 bits wide, specifies a shift (this is actually a rotate right by twice the field value). This widens the range of immediate constants that can be used, but not every number in the full 32-bit range can be represented.

Note that the range of negative immediate constants that can be represented is very limited, because these appear to the ARM as very large numbers, e.g. -1 = $FFFFFFFF, and a number with many significant bits is hard to represent using the method described above. Judicious use of such instructions as CMN (compare negative), MVN (move inverted data - not negated!) and RSB (reverse subtract) can get around this problem.
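The encoding rule above can be checked mechanically. The following Python sketch is a model for illustration only, not part of the MPE assembler; the function names are hypothetical. It tests whether a 32-bit value fits the 8-bit constant plus 4-bit rotate format.

```python
def rotate_right(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(value):
    """Return (imm8, rot4) if value fits the ARM immediate format, else None.

    The encoded constant is imm8 rotated right by 2 * rot4 places.
    """
    value &= 0xFFFFFFFF
    for rot4 in range(16):
        # Undo a rotate right by 2 * rot4 by rotating right by the remainder.
        imm8 = rotate_right(value, 32 - 2 * rot4)
        if imm8 <= 0xFF:
            return imm8, rot4
    return None


print(encode_immediate(0x32))        # fits directly: (0x32, 0)
print(encode_immediate(0xFF000000))  # $FF rotated right by 8: (0xFF, 4)
print(encode_immediate(-1))          # None: $FFFFFFFF cannot be encoded
print(encode_immediate(~-1))         # (0, 0): the inverted value suits MVN
```

A value that fails the test may still be usable after inversion (MVN) or negation (CMN), which is the trick described in the paragraph above.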

Cortex only: If the literal pool is enabled you can load a register with a 32-bit immediate value using the form:


  LDR   Rx, @= imm32

e.g.


  LDR   R4, @= $12345678

You must flush the literal pool yourself using FLUSHLITPOOL ( -- ) after END-CODE or ENDPROC.

Shift operations

Most data processing instructions allow operand two (the second source operand) to be specified as a shifted register. Here the contents of the register can be shifted at run-time by either a fixed amount or by the contents of another register. This can be done with one of the ARM's shift instructions, e.g.


ADD R0, R2, R7 LSL # 4   \ R7 logically shifted left by
                         \ 4 places
BIC R2, R4, R7 ASR R6    \ R7 arithmetically shifted
                         \ right by the contents of R6

Note that the contents of the register being shifted are not changed by the shift. The shifted value is only used during the instruction to calculate the new value to be stored in the destination register.

Note also that a shift by zero bits causes no change in the carry flag!

Shift operations supported by the ARM are:

Instruction          Purpose
LSL # n or LSL Rn    Logical shift left
ASL # n or ASL Rn    Arithmetic shift left (identical to LSL)
LSR # n or LSR Rn    Logical shift right
ASR # n or ASR Rn    Arithmetic shift right
ROR # n or ROR Rn    Rotate right
RRX                  Rotate right with extend (no shift value or register is needed as the shift is by one place only)

Note that as with immediate constants, if the shift is by a fixed amount it should be preceded by the # symbol to inform the assembler that it is not dealing with a register.
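For readers who want to check what each shift does to a value, here is a small Python model of the operations on 32-bit quantities. It is an illustrative sketch with hypothetical function names, restricted to shift amounts from 0 to 31, and it ignores the carry flag except for RRX.

```python
MASK = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left: zeros enter at the bottom."""
    return (value << n) & MASK


def lsr(value, n):
    """Logical shift right: zeros enter at the top."""
    return (value & MASK) >> n


def asr(value, n):
    """Arithmetic shift right: the sign bit is copied into the top."""
    value &= MASK
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative number
    return (value >> n) & MASK


def ror(value, n):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    n %= 32
    value &= MASK
    return ((value >> n) | (value << (32 - n))) & MASK


def rrx(value, carry):
    """Rotate right by one place through the carry flag.

    Returns (result, new_carry).
    """
    value &= MASK
    return (carry << 31) | (value >> 1), value & 1


# Model of: ADD R0, R2, R7 LSL # 4
r2, r7 = 100, 3
r0 = (r2 + lsl(r7, 4)) & MASK
print(r0)  # 148; r7 itself is unchanged
```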

Addressing modes

The ARM data processing instructions all work on the contents of registers and immediate operands. To transfer data to and from single registers and memory either the LDR, STR, LDC or STC instructions and their variants have to be used. Addresses can be specified in three ways.

Pre-indexed addressing

Pre-indexed addressing allows an offset to be added to (or subtracted from) an address held in a base register to form the address from which data is to be transferred. The address has the following format:

[ Rn <<offset>> ]

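Where Rn is the base register name and the optional offset is one of:

  • an immediate constant, e.g. # 20
  • a register, e.g. ++ Rm
  • a shifted register, e.g. ++ Rm LSL # 5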

The address expression must be terminated by a ]. The initial [ is not strictly necessary but leads to code that is more readable for experienced ARM programmers.

A simple or shifted register offset needs to be prefixed with ++ or -- indicating whether the contents of the register should be added to or subtracted from the base register. Immediate constants do not use the 8/4-bit field format but rather range from -4095 to 4095. Shifted registers can only be shifted by a constant preceded by the # symbol and not by the contents of another register.

The address calculated by combining the base and offset registers is often useful in subsequent loads and stores, especially when a sequence of memory locations is to be accessed. Use the ! operator after the closing ] to enable the write back feature of the ARM. This will write the calculated address back into the base register for subsequent instructions to use.

Instruction                       Address
LDR Rd, [ Rn ]                    Load from Rn. Treated as LDR Rd, [ Rn, # 0 ]
LDR Rd, [ Rn, ++ Rm ]             Load from Rn plus Rm
LDR Rd, [ Rn, -- Rm ] !           Load from Rn minus Rm with write back
LDR Rd, [ Rn, ++ Rm LSL # 5 ] !   Load from Rn plus Rm shifted logically left five places with write back
LDR Rd, [ Rn, # 20 ]              Load from Rn plus twenty
LDR Rd, [ Rn, # -40 ] !           Load from Rn minus forty with write back

Post-indexed addressing

Post-indexed addresses have the following form:

  [ Rn ], <<offset>>

Post-indexed addressing adds the offset to the base register Rn after the data has been transferred from the address held in the base register. This implies that write back always occurs, so it is not necessary to specify it. The ! operator can however be used to force non-privileged mode for the transfer cycle (the same as the T suffix on some ARM assemblers).

The offset is specified in exactly the same way as for pre-indexed addressing. Examples of post-indexed addressing are:

Instruction                     Address
LDR Rd, [ Rn ], ++ Rm           Load from Rn then add Rm to Rn
LDR Rd, [ Rn ], -- Rm           Load from Rn then subtract Rm from Rn
LDR Rd, [ Rn ], ++ Rm LSL # 5   Load from Rn then add Rm, shifted logically left five places, to Rn
LDR Rd, [ Rn ], -- Rm LSL # 5   Load from Rn then subtract Rm, shifted logically left five places, from Rn
LDR Rd, [ Rn ], # 20            Load from Rn then add 20 to Rn
LDR Rd, [ Rn ], # -40           Load from Rn then subtract 40 from Rn
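The difference between the pre-indexed and post-indexed forms comes down to which address is used for the transfer and what is left in the base register afterwards. The Python sketch below models just that; the function names are hypothetical and the offset is assumed to have been reduced to a signed number already.

```python
def pre_indexed(base, offset, write_back=False):
    """Model [ Rn, offset ] with an optional trailing !.

    Returns (transfer_address, new_base).
    """
    address = (base + offset) & 0xFFFFFFFF
    return address, (address if write_back else base)


def post_indexed(base, offset):
    """Model [ Rn ], offset. Write back always occurs.

    Returns (transfer_address, new_base).
    """
    return base, (base + offset) & 0xFFFFFFFF


rn = 0x1000
print(pre_indexed(rn, 20))         # transfer at $1014, Rn stays $1000
print(pre_indexed(rn, -40, True))  # transfer at $0FD8, Rn becomes $0FD8
print(post_indexed(rn, 20))        # transfer at $1000, Rn becomes $1014
```

Post-indexing suits walking through an array: each transfer uses the current base and leaves the base pointing at the next element.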

PC relative addressing

The assembler also recognises addresses specified as either an absolute number or an assembler label, e.g.

LDR R2, # $600          \ Load from memory location $600
LDR R2, label           \ Load from the address marked by label

Addresses specified using PC relative addressing are actually converted into pre-indexed addresses that load from the program counter (R15) plus or minus an immediate constant. This means that the address of the desired memory location has to lie within about 4 KB of the instruction referencing it: the immediate offset ranges from -4095 to 4095 and is applied to the program counter value seen by that instruction. The assembler takes into account the effects of pipelining on the program counter when calculating the value of the offset.
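The offset calculation can be sketched as follows. This is an illustrative Python model, not the assembler's own code, and the function name is hypothetical. It assumes the 32-bit ARM instruction set, where the program counter reads as the address of the current instruction plus 8 because of pipelining; Thumb-2 uses a different adjustment.

```python
def pc_relative_offset(instruction_address, target_address):
    """Return the immediate offset for an ARM-state LDR Rd, label.

    Assumes ARM state, where reading the PC yields the
    instruction address plus 8.
    """
    pc = instruction_address + 8
    offset = target_address - pc
    if not -4095 <= offset <= 4095:
        raise ValueError("target is out of range for a single LDR")
    return offset


print(pc_relative_offset(0x8000, 0x8010))  # 8
print(pc_relative_offset(0x8000, 0x8000))  # -8
```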

Byte and half word addressing

The instructions LDRB and STRB plus LDRH and STRH can be used to transfer bytes or half words between memory and registers. Byte loads and stores use only the bottom 8 bits of the register, and half word transfers only the bottom 16 bits. The contents of the rest of the register are ignored on a store, and zeroed on a load from memory. Unlike word memory transfers, byte loads and stores do not have to be aligned, but half word transfers should be aligned to a two-byte boundary.
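The truncation on store and zero extension on load can be modelled in a few lines of Python. This is a sketch for illustration only: the function names are hypothetical, memory is a plain bytearray, and little-endian byte order is assumed.

```python
def strb(memory, address, register):
    """Store the bottom 8 bits of register; the rest is ignored."""
    memory[address] = register & 0xFF


def ldrb(memory, address):
    """Load a byte; the upper 24 bits of the result are zero."""
    return memory[address]


def strh(memory, address, register):
    """Store the bottom 16 bits of register (little-endian assumed)."""
    if address % 2:
        raise ValueError("half word transfers should be two-byte aligned")
    memory[address:address + 2] = (register & 0xFFFF).to_bytes(2, "little")


def ldrh(memory, address):
    """Load a half word; the upper 16 bits of the result are zero."""
    if address % 2:
        raise ValueError("half word transfers should be two-byte aligned")
    return int.from_bytes(memory[address:address + 2], "little")


memory = bytearray(8)
strb(memory, 3, 0x12345678)  # any address is acceptable for a byte
strh(memory, 4, 0x12345678)
print(hex(ldrb(memory, 3)), hex(ldrh(memory, 4)))  # 0x78 0x5678
```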

Register lists

Multiple registers can be loaded from and stored to memory using the LDM and STM instructions. The format is:

LDMxx Rd, <<!>> { Ra, Rb, Rx-Ry, ... } <<^>>
STMxx Rd, <<!>> { Ra, PC, LINK, Re-Rf } <<^>>

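The register list is enclosed in braces. Registers can be given individually or as ranges, in any order, so the following lists are equivalent:

{ R0 R1 R2 R6 R12 }
{ R0-R2, R6, R12 }
{ R6, R12, R0-R2 }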

Each register can only be specified once.
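The order of the list does not matter because LDM and STM encode it as a 16-bit mask with one bit per register. The Python sketch below builds that mask from a list of names and ranges. It is a simplified model with a hypothetical function name; it accepts only names of the form Rn and not aliases such as PC or LINK.

```python
def register_mask(items):
    """Build the 16-bit register mask for an LDM/STM register list.

    items holds names such as "R6" or ranges such as "R0-R2".
    """
    mask = 0
    for item in items:
        first, _, last = item.upper().partition("-")
        low = int(first[1:])
        high = int(last[1:]) if last else low
        if not 0 <= low <= high <= 15:
            raise ValueError(f"bad register or range: {item}")
        for reg in range(low, high + 1):
            bit = 1 << reg
            if mask & bit:
                raise ValueError(f"R{reg} specified more than once")
            mask |= bit
    return mask


print(hex(register_mask(["R0", "R1", "R2", "R6", "R12"])))  # 0x1047
print(hex(register_mask(["R6", "R12", "R0-R2"])))           # 0x1047
```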

The optional final ^ sets the status flags when loading the PC from memory with the LDMxx instruction. It can also be used to force loading and storing of user bank registers in non-user modes.

MVL and ADR

As indicated earlier, a common source of problems when programming with ARM assembler is the restriction placed on the range of immediate constants that can be used with the data processing instructions. To get around this the pseudo instruction MVL can be used to move any signed/unsigned 32-bit number into a register.

  MVL R2, # 127653

The MVL pseudo instruction will attempt to use a single MOV or MVN instruction if possible, but may generate up to four ARM instructions or two Cortex instructions to get the value into the register.
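To see why four ARM instructions are always enough, note that any 32-bit value can be split into at most four pieces, each an 8-bit field starting at an even bit position and therefore a valid immediate constant. The Python sketch below shows one such split. It is an assumption about how a sequence could be built (for example a MOV followed by ORR instructions), not a description of the code MVL actually emits, and it makes no attempt to use MVN or wrap-around rotations.

```python
def split_constant(value):
    """Split a 32-bit value into pieces that are each a valid immediate.

    Each piece is an 8-bit field at an even bit position, so ORing
    the pieces together rebuilds the original value.
    """
    value &= 0xFFFFFFFF
    pieces = []
    while value:
        low = (value & -value).bit_length() - 1  # index of lowest set bit
        low &= ~1                                # rotations are by even amounts
        piece = value & (0xFF << low) & 0xFFFFFFFF
        pieces.append(piece)
        value &= ~piece
    return pieces or [0]


print([hex(p) for p in split_constant(127653)])
# ['0xa5', '0xf200', '0x10000'], i.e. three instructions
print(len(split_constant(0x12345678)))  # 4
```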

For Cortex, you can also use

  MVL32 R2, # $12345678

to load a 32 bit value into a register. This is useful when you are referencing the data address of a VALUE or a Forth word, e.g.


$12345678 value foo
Proc MyISR
  ...
  mvl32  r0, # ' foo >body  \ load data address

For Cortex, branch destination addresses that are loaded into the PC must have bit0 set to 1. To do this, use:

  MVL32+1 R2, # <value>

The value can be forward referenced.
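The effect of setting bit0 can be shown in a line of arithmetic. The Python sketch below is illustrative only; the function name is hypothetical and it assumes the code address itself is half word aligned.

```python
def thumb_branch_target(address):
    """Return a code address with bit0 set, as required when a
    Cortex branch destination is loaded into the PC."""
    if address & 1:
        raise ValueError("code address should be half word aligned")
    return address | 1


print(hex(thumb_branch_target(0x08000100)))  # 0x8000101
```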

ADR is a pseudo instruction (macro) for ARM, but is a real instruction for Thumb-2. The ARM ADR pseudo instruction is used to move a 32-bit address into a register.

  ADR label

Due to the possibility that a label might be forward referenced and need 'fixing up' later on in the compilation, the ADR pseudo instruction will always generate a MOV and three ORR instructions.

Switching between Forth and Assembler

The compiler allows you to add in-line assembler inside colon definitions, and to add high level phrases inside code definitions.

Inline assembler code can be compiled inside a colon definition using [ASM and ASM]. Use these in the form:


  : <name>
    ... [ASM <assembler-code> ASM] ...
  ;

High level code can be compiled inside a CODE definition using [FORTH and FORTH]. Use these in the form:


  CODE <name>
    ... [FORTH <high-level> FORTH] ...
  END-CODE

Note that the optimiser is not flushed by the switches into assembler. This can (and should) be achieved by placing [O/F] before [ASM and FORTH].

Glossary

This glossary details the words provided within the cross-assembler to control the use of the assembler.

    : ARM32         \ --
    Select ARM7 mode for assembler

    : ArmArch5      \ --
    Select ARMv5 mode for assembler

    : Thumb-1       \ --
    Select full Thumb-1 mode for assembler and code generator.

    : Thumb-2       \ --
    Select full Thumb-2 mode for assembler and code generator.

    : Cortex-M0     \ --
    Select Cortex-M0 for assembler and code generator.

    : Cortex-M1     \ --
    Select Cortex-M1 for assembler and code generator.

    : Cortex-M3     \ --
    Select Cortex-M3 for assembler and code generator.

    : Cortex-M4     \ --
    Select Cortex-M4 (includes integer DSP) for the assembler and code generator.

    : Cortex-M4F    \ --
    Select Cortex-M4F (includes integer DSP and single-precision VFP) for the assembler and code generator.

    : Cortex-M7     \ --
    Select Cortex-M7 (includes integer DSP, single and double precision VFP) for the assembler and code generator.

    : ARM32?        \ -- flag
    Return true if in 32 bit ARM mode.

    : ArmArch5?     \ -- flag
    Return true if in 32 bit ARMv5 mode.

    : Thumb1?       \ -- flag
    Return true if in Thumb-1 mode.

    : Thumb2?       \ -- flag
    Return true if in Thumb-2 mode.

    : Thumb?        \ -- flag
    Return true if in either Thumb mode.

    : Cortex-M0?    \ -- flag
    Return true if the Cortex-M0 instruction set has been selected.

    : Cortex-M1?    \ -- flag
    Return true if the Cortex-M1 instruction set has been selected.

    : Cortex-M0/M1? \ -- flag
    Return true if the Cortex M0 or M1 instruction sets have been selected.

    : Cortex-M3?    \ -- flag
    Return true if the Cortex-M3 instruction set has been selected.

    : Cortex-M4?    \ -- flag
    Return true if the Cortex-M4 instruction set has been selected.

    : Cortex-M4F?   \ -- flag
    Return true if the Cortex-M4F instruction set has been selected.

    : Cortex-M7?    \ -- flag
    Return true if the Cortex-M7 instruction set has been selected.

    : IDSP?         \ -- flag
    Return true if the Cortex integer DSP instructions are present.

    : Not-M0/M1?    \ -- flag
    Return true if the Cortex M0 or M1 instruction sets have not been selected.

    : M0/M1?        \ -- flag
    Return true if Thumb-2 and Cortex M0 or M1 has been selected.

    : .f32          \ --
    Indicates that the data in the Sn/Dn/Qm registers is 32 bit.

    : .f64          \ --
    Indicates that the data in the Dn/Qm registers is 64 bit.

    : .s8           \ --
    Indicates that the data in the Sn/Dn/Qm registers is signed 8 bit.

    : .s16          \ --
    Indicates that the data in the Sn/Dn/Qm registers is signed 16 bit.

    : .s32          \ --
    Indicates that the data in the Sn/Dn/Qm registers is signed 32 bit.

    : .s64          \ --
    Indicates that the data in the Dn/Qm registers is signed 64 bit.

    : .u8           \ --
    Indicates that the data in the Sn/Dn/Qm registers is unsigned 8 bit.

    : .u16          \ --
    Indicates that the data in the Sn/Dn/Qm registers is unsigned 16 bit.

    : .u32          \ --
    Indicates that the data in the Sn/Dn/Qm registers is unsigned 32 bit.

    : .u64          \ --
    Indicates that the data in the Dn/Qm registers is unsigned 64 bit.

    : .i8           \ --
    Indicates that the data in the Sn/Dn/Qm registers is signed 8 bit.

    : .i16          \ --
    Indicates that the data in the Sn/Dn/Qm registers is signed 16 bit.

    : .i32          \ --
    Indicates that the data in the Sn/Dn/Qm registers is signed 32 bit.

    : .i64          \ --
    Indicates that the data in the Dn/Qm registers is signed 64 bit.

    : .8            \ --
    Indicates that the data in the Sn/Dn/Qm registers is 8 bit.

    : .16           \ --
    Indicates that the data in the Sn/Dn/Qm registers is 16 bit.

    : .32           \ --
    Indicates that the data in the Sn/Dn/Qm registers is 32 bit.

    : .64           \ --
    Indicates that the data in the Dn/Qm registers is 64 bit.

    : [n]           ( u -- )  <VFPindex> !  ;
    Set an index for VMOV.

    : [0]           ( -- )  0 [n]  ;
    Set index 0 for VMOV.

    : [1]           ( -- )  1 [n]  ;
    Set index 1 for VMOV.

    : [2]           ( -- )  2 [n]  ;
    Set index 2 for VMOV.

    : [3]           ( -- )  3 [n]  ;
    Set index 3 for VMOV.

    : [4]           ( -- )  4 [n]  ;
    Set index 4 for VMOV.

    : [5]           ( -- )  5 [n]  ;
    Set index 5 for VMOV.

    : [6]           ( -- )  6 [n]  ;
    Set index 6 for VMOV.

    : [7]           ( -- )  7 [n]  ;
    Set index 7 for VMOV.

    : dxb            \ b -- ; lay byte
    Lay an 8-bit byte into the instruction stream. No alignment is performed. Use in the form:

      dxb $55

    : dxw            \ w -- ; lay 16 bits
    Lay a 16-bit word into the instruction stream. No alignment is performed. Use in the form:

      dxw $55AA

    : dxl            \ l -- ; lay 32 bit long
    Lay a 32-bit dword into the instruction stream. No alignment is performed. Use in the form:

      dxl $11223344

    : db            \ b -- ; lay byte
    Lay a single byte inline. Obsolete, will be removed in a future release, use DXB instead.

    : dw            \ w -- ; lay 16 bits
    Lay a 16 bit item inline. No alignment is performed. Obsolete, will be removed in a future release, use DXW instead.

    : dd            \ l -- ; lay 32 bit long
    Lay a 32 bit item inline. No alignment is performed. Obsolete, will be removed in a future release, use DXL instead.

    : dl            \ l -- ; lay 32 bit long
    Lay a 32 bit item inline. No alignment is performed. Obsolete, will be removed in a future release, use DXL instead.

    : align4        \ --
    Forces the PC to a four byte boundary.

    : $             \ -- chere
    Returns the current value of the PC.

    : ;CODE         \ --
    A defining word used in the form:

            : <namex>  CREATE ... ;CODE ... END-CODE

    Stops compilation and enables the assembler. This word is used with CREATE to produce defining words whose run-time portion is written in code, in the same way that CREATE ... DOES> is used to create high level defining words.

    The data structure is defined between CREATE and ;CODE and the run-time action is defined between ;CODE and END-CODE. The current value of the data stack pointer is saved by ;CODE for later use by END-CODE for error checking. When <namex> executes the address of the data area will be found on the processor stack, from which it must be removed.

    : ASMCODE       \ --
    Starts a section of assembler code and turns on the assembler, but without generating a dictionary header. This action is particularly useful for generating the start-up code. Examples of this can be found in CODEARM.FTH.

      ASMCODE ... END-CODE

    : CODE          \ --
    A defining word used in the form:

     CODE <name> ... END-CODE

    Creates a dictionary entry for <name> to be defined by a following sequence of assembly language words. Words thus defined are called code definitions. CODE stores the current data stack pointer for later error checking by END-CODE.

    : END-CODE      \ --
    Terminates a code definition and checks the data stack pointer against the value stored when ;CODE or CODE was executed. The assembler is disabled. See: CODE ;CODE.

    : IS-ACTION-OF  \ addr --
    Used to tell the cross-compiler that the given address is to be used as the run time action of the word whose name follows. Usually found in code definitions, but can also be used for high-level definitions. For example:

    
            ASMCODE
            HERE IS-ACTION-OF CONSTANT
              ...
            END-CODE
            ASSEMBLER
            HERE IS-ACTION-OF <<high-level-definer>>
            B DODOES
            END-CODE
            ] ... EXIT [
    

    : PROC          \ -- ; PROC <label-name>
    Starts a section of assembler code and turns on the assembler, defining a label. This action is particularly useful for generating interrupts or shared subroutines. Examples of this can be found in Cortex/CodeCortex.fth.

    : !call         \ dest ^ins --
    Patch a branch opcode at ^ins to branch to target address dest. The opcode portion is not changed, so that this word works with both B and BL. Note that this word may be redefined in some target code for ARM7 and ARM9 devices.
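The idea behind patching only the offset field can be sketched in Python. The model below is not MPE's implementation and the function name is hypothetical. It assumes the classic 32-bit ARM B/BL encoding, in which the top 8 bits hold the condition and opcode and the bottom 24 bits hold a signed word offset relative to the instruction address plus 8. Thumb-2 branches are encoded differently.

```python
def patch_branch(opcode, ins_address, dest):
    """Return opcode with its 24-bit offset field set to reach dest.

    The top 8 bits (condition, and B versus BL) are preserved.
    Assumes the classic 32-bit ARM branch encoding.
    """
    offset = dest - (ins_address + 8)  # PC reads as instruction + 8
    if offset % 4:
        raise ValueError("destination must be word aligned")
    words = offset >> 2
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("destination is out of branch range")
    return (opcode & 0xFF000000) | (words & 0x00FFFFFF)


# $EA000000 is B (always) and $EB000000 is BL (always).
print(hex(patch_branch(0xEA000000, 0x8000, 0x8008)))  # 0xea000000
print(hex(patch_branch(0xEB000000, 0x8000, 0x8000)))  # 0xebfffffe
```

Because only the low 24 bits are replaced, the same routine serves both B and BL, which matches the behaviour described for !call.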