GNU Linker: Dissecting ELF Executables on Raspberry Pi
This post is a continuation of Hello World in ARM Assembly on Raspberry Pi and Inside the ELF: What the ARM Assembler Really Generates on Raspberry Pi. You might want to refer to those articles for context on some of the examples used in this post.
So the assembler took our previously written assembly program and turned it into an object file, an ELF file. But now that we have this binary file, why is it only an intermediate step in the build process? Why do we need to feed this object file to the GNU linker to obtain an executable file, which is itself just another binary ELF? Let’s examine what the linker is doing…
The GNU linker
For this post, we only have a single object file. But for larger codebases, which can include multiple libraries, it is far more efficient not to reassemble everything as soon as something changes. With intermediate object files, a code change only requires rebuilding the object files that are affected. The linker then combines all object files into a single executable. Here, we use the GNU linker to perform this step with the following command:
ld -o hello hello.o
While exploring the ELF format for the post about the object file, I already learned that the same format is used for executable files, so it should be no surprise that
readelf -a hello
actually returns information about the file:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0x10054
Start of program headers: 52 (bytes into file)
Start of section headers: 536 (bytes into file)
Flags: 0x5000200, Version5 EABI, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 5
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00010054 000054 000030 00 AX 0 0 4
[ 2] .ARM.attributes ARM_ATTRIBUTES 00000000 000084 000012 00 0 0 1
[ 3] .symtab SYMTAB 00000000 000098 000100 10 4 8 4
[ 4] .strtab STRTAB 00000000 000198 00004d 00 0 0 1
[ 5] .shstrtab STRTAB 00000000 0001e5 000031 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
y (purecode), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00010000 0x00010000 0x00084 0x00084 R E 0x10000
Section to Segment mapping:
Segment Sections...
00 .text
There is no dynamic section in this file.
There are no relocations in this file.
There are no unwind sections in this file.
Symbol table '.symtab' contains 16 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00010054 0 SECTION LOCAL DEFAULT 1
2: 00000000 0 SECTION LOCAL DEFAULT 2
3: 00000000 0 FILE LOCAL DEFAULT ABS hello.o
4: 00010054 0 NOTYPE LOCAL DEFAULT 1 $a
5: 00010070 0 NOTYPE LOCAL DEFAULT 1 msg
6: 00010070 0 NOTYPE LOCAL DEFAULT 1 $d
7: 00010080 0 NOTYPE LOCAL DEFAULT 1 $d
8: 00020084 0 NOTYPE GLOBAL DEFAULT 1 _bss_end__
9: 00020084 0 NOTYPE GLOBAL DEFAULT 1 __bss_start__
10: 00020084 0 NOTYPE GLOBAL DEFAULT 1 __bss_end__
11: 00010054 0 NOTYPE GLOBAL DEFAULT 1 _start
12: 00020084 0 NOTYPE GLOBAL DEFAULT 1 __bss_start
13: 00020084 0 NOTYPE GLOBAL DEFAULT 1 __end__
14: 00020084 0 NOTYPE GLOBAL DEFAULT 1 _edata
15: 00020084 0 NOTYPE GLOBAL DEFAULT 1 _end
No version information found in this file.
Attribute Section: aeabi
File Attributes
Tag_ARM_ISA_use: Yes
This output confirms that the file is an executable ELF for ARM with a defined entry point, minimal section headers, and one loadable segment.
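To make the ELF header fields a bit more tangible, here is a minimal Python sketch that unpacks a 32-bit little-endian ELF header with the standard `struct` module. The helper name `parse_elf32_header` is my own, and the sample bytes are rebuilt from the readelf values above rather than read from the real `hello` file, so treat it as an illustration and not as a full ELF parser.

```python
import struct

# ELF32 header layout, little endian: 16 identification bytes,
# then the fixed-size fields in the order the ELF specification lists them.
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry",
    "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
    "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf32_header(data: bytes) -> dict:
    """Unpack the first 52 bytes of a 32-bit little-endian ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return dict(zip(FIELDS, ELF32_EHDR.unpack_from(data)))


# Synthetic header rebuilt from the readelf output (not the real file).
sample = ELF32_EHDR.pack(
    bytes.fromhex("7f454c46010101000000000000000000"),  # Magic
    2,          # e_type: ET_EXEC
    40,         # e_machine: EM_ARM
    1,          # e_version
    0x10054,    # e_entry
    52,         # e_phoff
    536,        # e_shoff
    0x5000200,  # e_flags
    52,         # e_ehsize
    32,         # e_phentsize
    1,          # e_phnum
    40,         # e_shentsize
    6,          # e_shnum
    5,          # e_shstrndx
)

header = parse_elf32_header(sample)
print(hex(header["e_entry"]), header["e_phoff"], header["e_phnum"])
```

To try it on the real executable, `parse_elf32_header(open("hello", "rb").read(52))` should give the same fields, assuming the file matches the readelf output shown here.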
In the comparison image below, you can see how the type changes, new headers appear, and certain symbols emerge. We’ll walk through each of these changes.
Relocatable file vs Executable file
As expected, the type changes from a relocatable file to an executable file.
Entry point address
The entry point address is the virtual address to which the system first transfers control, according to the updated chapter 4 of the gABI. Since the object file is not meant to be executed directly, its entry point is undefined and defaults to zero. The GNU linker offers several ways to set the entry point, which should reference the first instruction to execute in the output file. The GNU linker documentation describes how the linker determines the entry point:
In descending order of priority
* the `-e' entry command-line option;
* the ENTRY(symbol) command in a linker control script;
* the value of the symbol start, if present;
* the address of the first byte of the .text section, if present;
* The address 0.
Since we didn’t pass the -e option on the command line, the next possibility is an ENTRY(symbol) command in the linker script. That is exactly what the default linker script contains: ENTRY(_start). Feel free to inspect your system’s default linker script using:
ld --verbose
So, based on this, we can say _start is the entry point symbol. By convention, this is the symbol we use in our assembly programs to mark the startup routine. From symbol table entry 11 in the readelf output, we notice that its value is set to 0x10054, which is exactly the value of our Entry point address.
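The priority list above can be sketched as a small Python function. This is a simplified model of the documented order and not how ld is implemented: the function name `resolve_entry` and its parameters are hypothetical, and an entry name that is missing from the symbol table simply falls through to the next rule.

```python
def resolve_entry(symbols, cli_entry=None, script_entry=None, text_addr=None):
    """Pick an entry address following the documented priority order.

    symbols      -- mapping of symbol name to address
    cli_entry    -- symbol passed with -e, if any
    script_entry -- symbol named in ENTRY(...) in the linker script, if any
    text_addr    -- address of the first byte of .text, if the section exists
    """
    # Rules 1 to 3: -e option, ENTRY(symbol), then the symbol "start".
    for name in (cli_entry, script_entry, "start"):
        if name is not None and name in symbols:
            return symbols[name]
    # Rule 4: first byte of the .text section.
    if text_addr is not None:
        return text_addr
    # Rule 5: address 0.
    return 0


# Values taken from the readelf symbol table above.
symbols = {"_start": 0x10054, "msg": 0x10070}
entry = resolve_entry(symbols, script_entry="_start", text_addr=0x10054)
print(hex(entry))
```

For our executable the second rule already matches, because the default linker script names `_start` and the symbol table defines it.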
Program headers
Program headers are typical for an executable file (for all details about the program header, I can refer to this). We see from the ELF header (cf. 3) that the program header size is 32 bytes and that the program header table has only 1 entry. Since the size of the ELF header is 52 bytes (cf. field “Size of this header” in the image) and the program headers start 52 bytes into the file, we can deduce that the program header immediately follows the ELF header.
So the program header should have the following format:
typedef struct {
Elf32_Word p_type;
Elf32_Off p_offset;
Elf32_Addr p_vaddr;
Elf32_Addr p_paddr;
Elf32_Word p_filesz;
Elf32_Word p_memsz;
Elf32_Word p_flags;
Elf32_Word p_align;
} Elf32_Phdr;
When we take a hexdump of the executable and zoom in on the bytes of the program header, we obtain the following:
...
00000030 XX XX XX XX 01 00 00 00 00 00 00 00 00 00 01 00 |................|
00000040 00 00 01 00 84 00 00 00 84 00 00 00 05 00 00 00 |................|
00000050 00 00 01 00 XX XX XX XX XX XX XX XX XX XX XX XX |........ .... ..|
...
We can parse this binary data with the struct defined above. The file stores each field in little-endian byte order; the values below are written most significant byte first:
Name | Type | Value | Interpretation |
---|---|---|---|
p_type | Elf32_Word | 0x00 0x00 0x00 0x01 | The type of segment this program header entry describes; 1 stands for PT_LOAD, a loadable segment. |
p_offset | Elf32_Off | 0x00 0x00 0x00 0x00 | Offset from the beginning of the file at which the first byte of the segment resides. |
p_vaddr | Elf32_Addr | 0x00 0x01 0x00 0x00 | Virtual address at which the first byte of the segment sits in memory. |
p_paddr | Elf32_Addr | 0x00 0x01 0x00 0x00 | For systems that use physical addressing. |
p_filesz | Elf32_Word | 0x00 0x00 0x00 0x84 | Number of bytes in the file image of the segment. |
p_memsz | Elf32_Word | 0x00 0x00 0x00 0x84 | Number of bytes in the memory image of the segment. |
p_flags | Elf32_Word | 0x00 0x00 0x00 0x05 | These flags determine the segment permissions; the bits set are 1 (PF_X) and 4 (PF_R), which mean Execute and Read respectively. |
p_align | Elf32_Word | 0x00 0x01 0x00 0x00 | The value to which the segments are aligned in memory and in the file. |
This is exactly what the “Program Headers” part (cf. 6) of the readelf output shows for the executable file.
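The same parsing can be done mechanically. The following Python sketch unpacks the 32 program header bytes from the hexdump above (offsets 0x34 to 0x53) and renders the flags the way readelf prints them. The helper `flags_to_str` is my own naming, and the bytes are typed in from the dump instead of being read from the file.

```python
import struct

# The 32 bytes at file offsets 0x34..0x53, copied from the hexdump.
raw = bytes.fromhex(
    "01000000" "00000000" "00000100" "00000100"
    "84000000" "84000000" "05000000" "00000100"
)

# Elf32_Phdr: eight 32-bit little-endian words, in struct order.
FIELDS = ("p_type", "p_offset", "p_vaddr", "p_paddr",
          "p_filesz", "p_memsz", "p_flags", "p_align")
phdr = dict(zip(FIELDS, struct.unpack("<8I", raw)))

PF_X, PF_W, PF_R = 1, 2, 4


def flags_to_str(flags: int) -> str:
    """Render segment permissions in readelf's column style."""
    return (("R" if flags & PF_R else " ")
            + ("W" if flags & PF_W else " ")
            + ("E" if flags & PF_X else " "))


for name in FIELDS:
    print(f"{name:9s} 0x{phdr[name]:08x}")
print("flags:", flags_to_str(phdr["p_flags"]))
```

Reading the words with the `<` (little-endian) format is what turns the byte sequence `00 00 01 00` into the value 0x00010000.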
Section Headers
We see that we have fewer section headers; the ones that remain are “.text”, “.ARM.attributes”, “.symtab”, “.strtab” and “.shstrtab”. When we look at the symbol table, we notice in particular that the number of symbols has increased. Notable is the _start symbol that was already discussed above, which references the entry point address.
.text
So the .text section holds the executable instructions of a program (cf. definition of the symbols here). And indeed, running
objdump -d hello
produces the following output:
hello: file format elf32-littlearm
Disassembly of section .text:
00010054 <_start>:
10054: e3a00001 mov r0, #1
10058: e59f1020 ldr r1, [pc, #32] ; 10080 <msg+0x10>
1005c: e3a0200c mov r2, #12
10060: e3a07004 mov r7, #4
10064: ef000000 svc 0x00000000
10068: e3a07001 mov r7, #1
1006c: ef000000 svc 0x00000000
00010070 <msg>:
10070: 6c6c6548 .word 0x6c6c6548
10074: 41202c6f .word 0x41202c6f
10078: 0a214d52 .word 0x0a214d52
1007c: 00000000 .word 0x00000000
10080: 00010070 .word 0x00010070
Here we can easily recognize the assembly program that was defined in the first post. The only difference is that the SWI instruction now appears as SVC. These are the same instruction: according to the ARM documentation, SVC (Supervisor Call) was previously named SWI (Software Interrupt).
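Two details in the disassembly are worth checking by hand: the `.word` values under `msg` are really the string data, and the `ldr r1, [pc, #32]` resolves to the literal at 0x10080. The Python sketch below works this out from the values in the listing. It relies on the fact that in ARM state the PC reads as the address of the current instruction plus 8.

```python
import struct

# The three data words objdump printed under <msg>.
words = [0x6C6C6548, 0x41202C6F, 0x0A214D52]

# Storing them little-endian gives back the original bytes of the string.
message = struct.pack("<3I", *words)
print(message)

# ldr r1, [pc, #32] at address 0x10058: PC reads as instruction address + 8.
ldr_addr = 0x10058
literal_addr = ldr_addr + 8 + 32
print(hex(literal_addr))
```

The decoded string is 12 bytes long, which matches the `mov r2, #12` that sets the length for the write system call, and the word stored at 0x10080 is 0x00010070, the address of `msg`.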
So in summary, the linker puts together our code and symbols, creates program headers, defines an entry point and outputs a file that the OS can load and execute.
What’s next to explore?
One detail surprised me: the p_offset in the program header is 0. But isn’t that where the ELF header lives?
What still has me somewhat puzzled is how exactly the entry point address is defined. The entry point address was set to 0x00010054, and when I had a look at the linker script, I noticed a line stating . = SEGMENT_START("text-segment", 0x00010000) + SIZEOF_HEADERS. So I wouldn’t be too surprised to find out there is a relation between the two.
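A quick back-of-the-envelope check supports that suspicion. The sketch below assumes that SIZEOF_HEADERS covers the ELF header plus the program header table, and that .text is the first section placed after that assignment in the linker script; both are my assumptions, not something I have verified in the ld sources.

```python
# Values from the readelf output above.
segment_start = 0x00010000   # SEGMENT_START("text-segment", 0x00010000)
e_ehsize = 52                # Size of this header
e_phentsize = 32             # Size of program headers
e_phnum = 1                  # Number of program headers

# Assumption: SIZEOF_HEADERS = ELF header + all program header entries.
sizeof_headers = e_ehsize + e_phnum * e_phentsize

text_start = segment_start + sizeof_headers
print(hex(sizeof_headers), hex(text_start))
```

Under those assumptions the headers take 0x54 bytes, so the location counter lands on 0x10054, which is both the address of .text and the value of _start.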
What is the relevance of these new symbols?
I’ll likely revisit these in a future update as my understanding deepens.
Further research and study
I went deeper in the executable ELF format. There is still a lot to figure out, but I’m happy with my current understanding of what the linker is doing. If there are any blanks or problems along the way, I’m certain that the relevant documentation will be found. I would still like to know how the binary information maps to the assembly language though.
Coming up: Evaluating how the assembly language got translated to the binary file. Getting a bit more hands-on with a distance sensor!