ARM Assembler

ARM processors are RISC (Reduced Instruction Set Computer) chips used widely in mobile phones and other embedded devices. The use of an ARM processor in the Raspberry Pi will promote the study of RISC processors in schools. The ARM registers and instruction sets are different from those we have described for x86 Intel processors. This section will compare ARM assembler with Intel syntax, so you should be familiar with the material in our in-line assembler tutorial. It is our intention to write assembler code for the Raspberry Pi, but because of its restricted availability we will start by testing our code in a simulator.

We installed Cygwin (which gives us some Unix-type functionality within Windows) then GNUARM. From the GNUARM home page we selected the "FILES" page and executed the GCC-4.1 toolchain setup file labelled

binutils-2.17, gcc-4.1.1-c-c++, newlib-1.14.0, insight-6.5, setup.exe [25.1MB].

A most useful reference for the code was the Tonc Whirlwind Tour of ARM Assembly (for Gameboy Advance - GBA).

You can choose how you use the registers, but a routine must preserve the contents of certain registers, for example by pushing them onto the stack before use and popping them afterwards. By convention, a routine may legitimately "corrupt" registers r0 to r3 and r12, whereas r4 to r11 and the stack pointer must hold their original values when it returns. The following table shows conventional uses of registers.

Table 1. Uses of Registers
Name       Alternative name   Description
r0 - r3    a1 - a4            Used to hold arguments for procedures and as scratch registers (for temporary storage). r0 is used to return the result of a function.
r4 - r9    v1 - v6            General purpose or storage of variables
r10        sl, v7             Stack limit pointer, used for stack checking when this option is selected by the user.
r11        fp, v8             Frame pointer. From Jack Crenshaw's section on local variables when translating procedures, "Formal parameters are addressed as positive offsets from the frame pointer, and locals as negative offsets".
r12        ip                 Intra-Procedure-call scratch register. Used with r0 - r3 for temporary storage; its original contents do not need to be preserved.
r13        sp                 Stack pointer
r14        lr                 Link register, holding the return address from a function
r15        pc                 Program counter, holding the address of the next instruction

The instruction set is summarised in a quick reference card. We tabulate below selected instructions that you are most likely to use at first, together with their equivalents in Intel syntax.

Table 2. Selected Operations
ARM mnemonic     Intel mnemonic   Function
ADD              ADD, INC         Addition
SUB              SUB, DEC         Subtraction
RSB              (none)           Reverse subtraction: RSB r0, r1, r2 computes r2 - r1
MUL              IMUL             Multiplication

AND              AND              Bitwise AND
ORR              OR               Bitwise OR
EOR              XOR              Bitwise exclusive OR
MVN              NOT              Bitwise NOT (copies the inverted value)

TST              TEST             Test (performs bitwise AND and sets flags according to the result)
CMP              CMP              Compare
B                JMP              Unconditional jump/branch
BEQ              JE               Jump/branch if equal

PUSH             PUSH             Push onto stack
POP              POP              Pop from stack

MOV, LDR, STR    MOV              Transfer (copy)
MOVEQ            CMOVE            Copy if equal

ADR              LEA              Load address

BL               CALL             Call a subroutine, saving the return address in lr (return with BX lr or by popping the saved lr into pc, the equivalent of RET)

The following bullet points highlight features of ARM assembly language that are strikingly different from Intel syntax.

  • The result register (the first register following the mnemonic) can be different from the two operands. For example, the instruction ADD r0, r1, r2 will add r2 to r1 and store the result in r0.
  • The suffix "S" added to a mnemonic such as MOV or ADD (giving MOVS or ADDS) makes the operation update the condition flags. Comparison instructions such as CMP and TST always set them.
  • A conditional suffix such as EQ (if equal), NE (if not equal), GT (if greater than), GE (if greater than or equal) can be added to most mnemonics.
  • If you need to operate on data in a memory location, you must load it into a register first. (ARM processors have a load/store architecture.)
  • For storing (mnemonic STR) the source precedes the destination.
  • When loading, you can load directly from a labelled memory location (e.g. ldr r1, num1). When saving, you must put the address in a register and use indirect addressing with square brackets (e.g. str r0, [r3]).
  • There are restrictions on the immediate values that you can load directly with MOV. In ARM state, an immediate must be an 8-bit value rotated right by an even number of bits. For other values you can use the syntax ldr rx, =immediate value, e.g. ldr r0, =625.
  • You can apply a shift to the second operand "cheaply", as part of the same instruction: add r0, r1, r2, lsl #2 computes r1 + (r2 << 2).
  • Only some recent ARM processors have a hardware division instruction (SDIV/UDIV). On others, including the ARM11 in the original Raspberry Pi, you need to branch to a library routine instead. In the same way, you can call C library routines such as puts for outputting a string and printf for outputting a string with formatted parameters.
The commands in Table 3 are used to assemble and run the sample code that follows them. The arm-elf commands require the bin folders of Cygwin and GNUARM to be listed in your PATH environment variable. Change the working directory to the folder holding your source files, e.g. C:\ARM.
Table 3. Useful Commands
Command at Cygwin prompt           Result
cd /cygdrive/c/ARM                 Changes the current directory to C:\ARM (Cygwin's name for it)
arm-elf-gcc -o temp.elf temp.s     Assembles temp.s and links it to create the executable temp.elf
arm-elf-run temp.elf               Simulates the running of temp.elf
arm-elf-gcc -S -o div.s div.c      Compiles div.c as far as the ARM assembler file div.s (showing how the compiler uses registers and routines)
ARM Assembler Code for the Simulator

scan_format:
.asciz "%d"
out_format:
.asciz "Sum: %d    Difference: %d    Product: %d\n"
instr1:
.asciz "Enter first integer."
instr2:
.asciz "Enter second integer."
.align	2
num1:
.word 0
num2:
.word 0
        
.global	main	
main:	
push {r4, r5, ip, lr} @Save r4 and r5 (which must be preserved) and the return address; ip only keeps the stack 8-byte aligned.

ldr r0, =instr1
bl puts
ldr r1, =num1
ldr r0, =scan_format
bl scanf

ldr r0, =instr2
bl puts
ldr r1, =num2
ldr r0, =scan_format
bl scanf

ldr r4, num1
ldr r5, num2

add r1, r4, r5
sub r2, r4, r5
mul r3, r4, r5

ldr r0, =out_format
bl printf

pop {r4, r5, ip, pc} @Restore r4 and r5, and return by popping the saved lr into pc.

The above code would not assemble on the Raspberry Pi. The amended code below assembles and runs on the simulator and on the Raspberry Pi. Instead of loading a variable directly with ldr r4, num1, we need to load the address of the variable into another register (ldr r6, =num1) and then use indirect addressing (ldr r4, [r6]). We have added .data before the data declarations and .text to mark the start of the code section.

Commands for the Raspberry Pi

You can transfer files between a PC and the Pi without networking by using either the SD card or a USB memory stick. The /boot directory is on the SD card's FAT partition, which Windows can read, so it is useful for transferring files between a Windows computer and the Pi.

  • cd /usr/bin changes the current directory to the one containing gcc.
  • gcc -o ~/temp /boot/temp.s assembles the file temp.s in the boot directory and links it to create the executable temp in your home directory.
  • cd changes the current directory to your home directory.
  • ./temp executes the file temp in the current directory.

This screenshot shows a way of compiling without changing the current directory. The 80 KB program scrot is for capturing screenshots; you can install it on your Pi with the command sudo apt-get install scrot. Our screenshot is saved to a USB memory stick for cropping on a PC. (We installed usbmount with the command sudo apt-get install usbmount so that the Pi detects our flash drive, mounts it at /media/usb and unmounts it on removal.)

Screenshot of our first Pi assembler program in action

ARM Assembler Code for the Raspberry Pi

.data
.align 2
scan_format:
.asciz "%d"
.align 2
out_format:
.asciz "Sum: %d    Difference: %d    Product: %d\n"
.align 2
instr1:
.asciz "Enter first integer."
instr2:
.asciz "Enter second integer."
.align  2
num1:
.word 0
num2:
.word 0
.text
.global main
main:
push {r4, r5, r6, lr} @Save r4-r6 (which must be preserved) and the return address; used with pop at end of main.

ldr r0, =instr1
bl puts
ldr r1, =num1
ldr r0, =scan_format
bl scanf

ldr r0, =instr2
bl puts
ldr r1, =num2
ldr r0, =scan_format
bl scanf

ldr r6, =num1  @load address of num1 into r6
ldr r4, [r6]  @load value of num1 into r4

ldr r6, =num2
ldr r5, [r6]

add r1, r4, r5
sub r2, r4, r5
mul r3, r4, r5

ldr r0, =out_format
bl printf

pop {r4, r5, r6, pc} @Restore r4-r6 and return by popping the saved lr into pc.

The next version (for the simulator and Pi) shows how you can save values to memory. The ARM processor is well supplied with registers for storing variables. If you run out of registers, you can load the address of a variable into a register (r3 in the code below) and then use indirect addressing to save data. Using memory locations instead of registers slows down execution.


.data
.align 2
scan_format:
.asciz "%d"
.align 2
out_format:
.asciz "Sum: %d    Difference: %d    Product: %d\n"
.align 2
instr1:
.asciz "Enter first integer."
instr2:
.asciz "Enter second integer."
.align  2
num1:
.word 0
num2:
.word 0
sum:
.word 0
difference:
.word 0
product:
.word 0

.text
.global main

main:	
push {r6, lr} @Save r6 (which must be preserved) and the return address.
ldr r0, =instr1
bl puts
ldr r1, =num1
ldr r0, =scan_format
bl scanf

ldr r0, =instr2
bl puts
ldr r1, =num2
ldr r0, =scan_format
bl scanf
ldr r6, =num1
ldr r1, [r6]
ldr r6, =num2
ldr r2, [r6]
add r0, r1, r2

@store r0 in sum
ldr r3, =sum
str r0, [r3] @The source r0 precedes the destination

sub r0, r1, r2
ldr r3, =difference
str r0, [r3]

mul r0, r1, r2
ldr r3, =product
str r0, [r3]

ldr r6, =sum
ldr r1, [r6]
ldr r6, =difference
ldr r2, [r6]
ldr r6, =product
ldr r3, [r6]
ldr r0, =out_format
bl printf

pop {r6, pc} @Restore r6 and return by popping the saved lr into pc.

See how ARM assembler code for the simulator is generated from TINY source code using the programs TINY11ELF and TINY14ELF. These programs generate assembler code that arm-elf-gcc assembles and links to create .elf executables that run in the arm-elf-run simulator. We need to make minor modifications to them so that their output code will assemble with gcc on the Raspberry Pi.
