Introduction
My journey into the realm of embedded electronics starts with this post! Curious about assembly programming, I decided to learn by writing my own “Hello World!” application. A Raspberry Pi was gathering dust on my desk so I wanted to put it to good use. Therefore, I ssh’ed into my Pi, opened my favourite text editor, vim, and wrote my first assembly program.
GNU assembler
The GNU assembler transforms your assembly code into low-level machine code. Several assemblers exist for ARM; some of them are listed on Wikipedia. Since I'm working on a Raspberry Pi running Linux, the GNU assembler (gas) was the straightforward choice. Installing the build-essential package should do the trick:
sudo apt install build-essential
Assembly language program
The following assembly language program was written in a hello.s file. Since we will be using the GNU assembler, the program needs to follow the appropriate syntax. The complete syntax and other information can be found in the GNU assembler user guide.
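 1  .global _start              @ export _start so the linker can see it
 2
 3  _start:
 4      MOV R0, #1              @ r0 = file descriptor 1 (stdout)
 5      LDR R1, =msg            @ r1 = address of the string
 6      MOV R2, #13             @ r2 = number of bytes to write
 7      MOV R7, #4              @ r7 = system call number 4 (write)
 8      SWI 0                   @ enter the kernel
 9
10      MOV R7, #1              @ r7 = system call number 1 (exit)
11      SWI 0                   @ enter the kernel
12
13  msg:
14      .asciz "Hello, ARM!\n"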
Let's go over some of the basic concepts this small program introduces.
Assembler directives
The first line of the program, .global _start, is an assembler directive. Directives are easy to recognize because they start with a period ("."). The .global directive makes the symbol that follows, in this case _start, visible to the GNU linker. The relevance of this is described in the section on linking below.
The other directive is on the last line, line 14. The .asciz "string" directive assembles the string, followed by a zero byte, into consecutive addresses. A parallel can be drawn with a C-style string, which requires a null character at the end. Many C functions actually expect a null-terminated string. It is also possible to use the .ascii directive, which does not terminate the string with a null character. Using .asciz is therefore safer if the string is going to be used by other code that expects null termination.
Note: When I initially made the string longer, the output got truncated, most likely because I didn't update MOV R2, #13 to match the new string length. It reminded me how easy it is to forget that we're managing the string length by hand in raw syscalls. The text "Hello, ARM!\n" is 12 characters long, so the count of 13 also covers the zero byte added by .asciz.
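To double-check a count like the 13 in MOV R2, #13, one option is to let a C compiler do the counting, since gcc comes along with build-essential. The sketch below is only an illustration: the function names ascii_len and asciz_len are made up for this example and are not part of hello.s. strlen() counts the visible characters, which is what .ascii would emit, while sizeof on the array also counts the terminating zero byte, which is what .asciz emits.

```c
#include <stddef.h>
#include <string.h>

/* Same text as the msg label in hello.s. */
static const char msg[] = "Hello, ARM!\n";

/* Bytes that .ascii would emit: the characters only. */
size_t ascii_len(void) { return strlen(msg); }

/* Bytes that .asciz emits: the characters plus the trailing zero byte. */
size_t asciz_len(void) { return sizeof msg; }
```

Called from a small main(), ascii_len() gives 12 and asciz_len() gives 13. GNU as can also do this counting itself with an expression such as . - msg placed right after the string, which this program does not use.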
Labels
Two labels are used in this program: _start: and msg:. A label is a symbol followed by a colon (":"). The symbol represents the current value of the active location counter, which the assembler uses to assign addresses to instructions and data. So we could say that a label is like a pointer to whatever follows it: _start points to the first instruction and msg to the string. Additional information can be found in the detailed explanation of labels in the user guide.
Instructions
MOV
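The MOV instructions used in this example all take the form
MOV Rd, #imm
where Rd is the destination register and imm is an immediate constant that is copied into Rd. The classic ARM encoding stores the constant as an 8-bit value rotated right by an even number of bits; ARMv6T2 and later add a second encoding that takes a full 16-bit value in the range 0-65535. The small constants used here fit either way. So MOV R0, #1 moves the value 1 into register R0.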
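In the object file dumped later in this post, the four MOV instructions appear as the words e3a00001, e3a0200d, e3a07004 and e3a07001. Those words use the classic ARM immediate encoding: an 8-bit value in bits 7..0, rotated right by twice the 4-bit field in bits 11..8, with the destination register in bits 15..12. A rough C sketch of the decoding follows; the helper names mov_rd and mov_imm are my own, and the code assumes it is handed a MOV (immediate) word without checking.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits. */
static uint32_t ror32(uint32_t v, unsigned n) {
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Destination register number, taken from bits 15..12. */
unsigned mov_rd(uint32_t insn) {
    return (insn >> 12) & 0xFu;
}

/* Immediate operand: the 8-bit value in bits 7..0, rotated right
   by twice the 4-bit rotation field in bits 11..8. */
uint32_t mov_imm(uint32_t insn) {
    return ror32(insn & 0xFFu, 2u * ((insn >> 8) & 0xFu));
}
```

For e3a0200d this gives register 2 and the value 13, which matches MOV R2, #13 in the source.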
LDR
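The instruction LDR R1, =msg loads the address of msg into r1. The =msg form is pseudo-instruction syntax: it doesn't map directly onto a single machine instruction. Instead, the assembler stores the address in a nearby literal pool and emits a PC-relative load that fetches it. In the hexdump of the object file further down, this shows up as the word e59f1020, which is LDR R1, [PC, #32].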
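To see where such a load actually reads from, the word the assembler emitted for it can be taken from the hexdump further down (e59f1020, the second instruction, at offset 4 in the code). In ARM state the PC reads as the address of the current instruction plus 8, and the low 12 bits of the word hold the offset. The sketch below works that out in C; the function name ldr_literal_addr is hypothetical, and the code assumes the word really is a PC-relative LDR.

```c
#include <stdint.h>

/* Address read by "LDR Rt, [PC, #imm12]".
   insn      : the 32-bit instruction word
   insn_addr : address (or section offset) of the instruction itself */
uint32_t ldr_literal_addr(uint32_t insn, uint32_t insn_addr) {
    uint32_t imm12 = insn & 0xFFFu;      /* offset, bits 11..0 */
    uint32_t add   = (insn >> 23) & 1u;  /* U bit: 1 = add, 0 = subtract */
    uint32_t pc    = insn_addr + 8u;     /* PC as seen by the instruction */
    return add ? pc + imm12 : pc - imm12;
}
```

With the word e59f1020 at offset 4, the result is 0x2c. The code starts at file offset 0x34 in the dump, so the literal sits at 0x60, where the dump shows 1c 00 00 00: the offset of msg.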
SWI
The SWI instruction (called SVC in newer ARM documentation) causes a SWI exception. Apart from some processor state changes, what I found interesting is that execution switches to the SWI vector, where the kernel's handler takes over. This ties into syscall()! Looking into the manual page for syscall, we find the table shown below. Note how each architecture uses a different instruction to invoke system calls. For the arm/EABI architecture, we see our SWI 0 instruction appearing, and register r7 holds the system call number. Registers r0-r6 are used to pass parameter values to the system call.
Arch/ABI    Instruction           System  Ret  Ret   Error    Notes
                                  call #  val  val2
────────────────────────────────────────────────────────────────────
alpha       callsys               v0      v0   a4    a3       1, 6
arc         trap0                 r8      r0   -     -
arm/OABI    swi NR                -       r0   -     -        2
arm/EABI    swi 0x0               r7      r0   r1    -
arm64       svc #0                w8      x0   x1    -
blackfin    excpt 0x0             P0      R0   -     -
i386        int $0x80             eax     eax  edx   -
ia64        break 0x100000        r15     r8   r9    r10      1, 6
m68k        trap #0               d0      d0   -     -
microblaze  brki r14,8            r12     r3   -     -
mips        syscall               v0      v0   v1    a3       1, 6
nios2       trap                  r2      r2   -     r7
parisc      ble 0x100(%sr2, %r0)  r20     r28  -     -
powerpc     sc                    r0      r3   -     r0       1
powerpc64   sc                    r0      r3   -     cr0.SO   1
riscv       ecall                 a7      a0   a1    -
s390        svc 0                 r1      r2   r3    -        3
s390x       svc 0                 r1      r2   r3    -        3
superh      trap #0x17            r3      r0   r1    -        4, 6
sparc/32    t 0x10                g1      o0   o1    psr/csr  1, 6
sparc/64    t 0x6d                g1      o0   o1    psr/csr  1, 6
tile        swint1                R10     R00  -     R01      1
x86-64      syscall               rax     rax  rdx   -        5
x32         syscall               rax     rax  rdx   -        5
xtensa      syscall               a2      a2   -     -
In our assembly program we placed the values 1, the address of msg, 13 and 4 in r0, r1, r2 and r7 respectively before calling SWI 0 the first time. Before the second SWI 0, we only moved 1 into r7. The system call numbers and their definitions can be found in the file /usr/include/arm-linux-gnueabihf/asm/unistd-eabi.h on my Raspberry Pi, and from them we can conclude that the program invokes write() followed by exit().
...
#define __NR_exit (__NR_SYSCALL_BASE + 1)
...
#define __NR_write (__NR_SYSCALL_BASE + 4)
...
Now, we are able to get the corresponding manual pages for these system calls which are documented in section 2 of the manual (man 2 write, man 2 exit). The interfaces are as follows:
ssize_t write(int fd, const void *buf, size_t count);
void _exit(int status);
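We can readily map the registers we set before each SWI instruction to the parameters of these system calls. For write(), r0 is fd (1, standard output), r1 is buf (the address of msg) and r2 is count (13). For _exit(), r0 is status. Since the program doesn't set r0 again, it still holds the return value of write(), so on success the program exits with status 13.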
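The same two calls can be sketched in C through the syscall() wrapper from the manual page, which takes the system call number first and the arguments after it, much like r7 followed by r0, r1 and r2. This is only an illustration of the mapping and not what the assembler produces; the function names hello_write and hello_exit are made up for the example, and SYS_write and SYS_exit are the names that <sys/syscall.h> provides for the __NR_ constants.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Same bytes as msg in hello.s, including the trailing zero byte. */
static const char hello_msg[] = "Hello, ARM!\n";

/* r7 = __NR_write, r0 = fd, r1 = buf, r2 = count.
   Returns the number of bytes written, or -1 on error. */
long hello_write(void) {
    return syscall(SYS_write, 1, hello_msg, sizeof hello_msg);
}

/* r7 = __NR_exit, r0 = status. Does not return on success. */
void hello_exit(int status) {
    syscall(SYS_exit, status);
}
```

A main() that calls hello_exit((int)hello_write()) would mirror the assembly program, including the way the return value of write() ends up as the exit status.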
Running the assembler
As soon as you have the program written down, you can assemble it using the following command:
as -o hello.o hello.s
The -o flag is followed by the name of the output file. Dumping the output file in hexadecimal shows that we've just created an ELF file: it starts with the bytes 7f 45 4c 46, which is 0x7f followed by "ELF" in ASCII. You can give it a go using the following command:
hexdump -C hello.o
and expect something like the following:
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 28 00 01 00 00 00 00 00 00 00 00 00 00 00 |..(.............|
00000020 74 01 00 00 00 00 00 05 34 00 00 00 00 00 28 00 |t.......4.....(.|
00000030 09 00 08 00 01 00 a0 e3 20 10 9f e5 0d 20 a0 e3 |........ .... ..|
00000040 04 70 a0 e3 00 00 00 ef 01 70 a0 e3 00 00 00 ef |.p.......p......|
00000050 48 65 6c 6c 6f 2c 20 41 52 4d 21 0a 00 00 00 00 |Hello, ARM!.....|
00000060 1c 00 00 00 41 11 00 00 00 61 65 61 62 69 00 01 |....A....aeabi..|
00000070 07 00 00 00 08 01 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 03 00 01 00 00 00 00 00 00 00 00 00 |................|
000000a0 00 00 00 00 03 00 03 00 00 00 00 00 00 00 00 00 |................|
000000b0 00 00 00 00 03 00 04 00 01 00 00 00 00 00 00 00 |................|
000000c0 00 00 00 00 00 00 01 00 04 00 00 00 1c 00 00 00 |................|
000000d0 00 00 00 00 00 00 01 00 08 00 00 00 1c 00 00 00 |................|
000000e0 00 00 00 00 00 00 01 00 08 00 00 00 2c 00 00 00 |............,...|
000000f0 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 |................|
00000100 00 00 00 00 03 00 05 00 0b 00 00 00 00 00 00 00 |................|
00000110 00 00 00 00 10 00 01 00 00 24 61 00 6d 73 67 00 |.........$a.msg.|
00000120 24 64 00 5f 73 74 61 72 74 00 00 00 2c 00 00 00 |$d._start...,...|
00000130 02 01 00 00 00 2e 73 79 6d 74 61 62 00 2e 73 74 |......symtab..st|
00000140 72 74 61 62 00 2e 73 68 73 74 72 74 61 62 00 2e |rtab..shstrtab..|
00000150 72 65 6c 2e 74 65 78 74 00 2e 64 61 74 61 00 2e |rel.text..data..|
00000160 62 73 73 00 2e 41 52 4d 2e 61 74 74 72 69 62 75 |bss..ARM.attribu|
00000170 74 65 73 00 00 00 00 00 00 00 00 00 00 00 00 00 |tes.............|
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000190 00 00 00 00 00 00 00 00 00 00 00 00 1f 00 00 00 |................|
000001a0 01 00 00 00 06 00 00 00 00 00 00 00 34 00 00 00 |............4...|
000001b0 30 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 |0...............|
000001c0 00 00 00 00 1b 00 00 00 09 00 00 00 40 00 00 00 |............@...|
000001d0 00 00 00 00 2c 01 00 00 08 00 00 00 06 00 00 00 |....,...........|
000001e0 01 00 00 00 04 00 00 00 08 00 00 00 25 00 00 00 |............%...|
000001f0 01 00 00 00 03 00 00 00 00 00 00 00 64 00 00 00 |............d...|
00000200 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................|
00000210 00 00 00 00 2b 00 00 00 08 00 00 00 03 00 00 00 |....+...........|
00000220 00 00 00 00 64 00 00 00 00 00 00 00 00 00 00 00 |....d...........|
00000230 00 00 00 00 01 00 00 00 00 00 00 00 30 00 00 00 |............0...|
00000240 03 00 00 70 00 00 00 00 00 00 00 00 64 00 00 00 |...p........d...|
00000250 12 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................|
00000260 00 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 |................|
00000270 00 00 00 00 78 00 00 00 a0 00 00 00 07 00 00 00 |....x...........|
00000280 09 00 00 00 04 00 00 00 10 00 00 00 09 00 00 00 |................|
00000290 03 00 00 00 00 00 00 00 00 00 00 00 18 01 00 00 |................|
000002a0 12 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 |................|
000002b0 00 00 00 00 11 00 00 00 03 00 00 00 00 00 00 00 |................|
000002c0 00 00 00 00 34 01 00 00 40 00 00 00 00 00 00 00 |....4...@.......|
000002d0 00 00 00 00 01 00 00 00 00 00 00 00 |............|
000002dc
The ELF file format looks really interesting to dive into, but that will be the subject of another post.
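Even without a deep dive, the first 20 bytes of the dump above can be read by hand. Here is a small C sketch that does so, with the bytes copied from the first two rows of the hexdump. The array and function names are my own, and the field meanings in the comments (class, data encoding, type, machine) come from the ELF header layout, which this post otherwise leaves for later.

```c
#include <stdint.h>
#include <string.h>

/* First 20 bytes of hello.o, copied from the hexdump above. */
const uint8_t hello_o_header[20] = {
    0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x28, 0x00
};

/* Read a little-endian 16-bit field. */
static unsigned le16(const uint8_t *p) {
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* 1 if the buffer starts with the ELF magic: 0x7f 'E' 'L' 'F'. */
int is_elf(const uint8_t *p) {
    return memcmp(p, "\x7f" "ELF", 4) == 0;
}

unsigned elf_class(const uint8_t *p)   { return p[4]; }         /* 1 = 32-bit */
unsigned elf_data(const uint8_t *p)    { return p[5]; }         /* 1 = little-endian */
unsigned elf_type(const uint8_t *p)    { return le16(p + 16); } /* 1 = relocatable object */
unsigned elf_machine(const uint8_t *p) { return le16(p + 18); } /* 0x28 (40) = ARM */
```

For hello.o this reads as a 32-bit, little-endian, relocatable object file for ARM, which is what we would expect from the assembler before linking.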
Linking and executing
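After assembling the code, you can feed the object file to the linker to obtain an executable and then run it. By default, ld uses the _start symbol as the program's entry point, which is why the program exports it with .global. Run the following commands in your shell, in the directory where the .o file is located: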
ld -o hello hello.o
./hello
I didn't want to end this post without showing how to obtain the executable, but these last two commands will get posts of their own.
Further research and study
I was pretty amazed at how far the study of this simple assembly program would take me. It took me from discovering the GNU assembler to a better understanding of how system calls are invoked through the SWI vector. A hexadecimal dump of the .o file led me to the ELF format, which I would love to understand better. This post has already gotten a bit lengthy, so I'll keep the ELF format for a future post. The GNU linker and GNU debugger will certainly be featured as well!
This was Steven documenting his embedded journeys! Don’t hesitate to leave me a note, provide some feedback or ask for clarifications in the comments below!
Coming up: The GNU linker, ELF file internals, and the debugger!