Embedded Journeys

GNU Linker: Dissecting ELF Executables on Raspberry Pi

This post is a continuation of Hello World in ARM Assembly on Raspberry Pi and Inside the ELF: What the ARM Assembler Really Generates on Raspberry Pi. You might want to refer to those articles for context on some of the examples used here.

So the assembler took our assembly program and turned it into an object file, which is an ELF file. But if we already have this binary file, why is it only an intermediate step in the build process? Why do we need to feed the object file to the GNU linker to obtain an executable, which is itself just another ELF binary? Let’s examine what the linker is doing…

The GNU linker

For this post, we only have a single object file. For larger codebases, which can include multiple libraries, it is far more efficient not to reassemble everything whenever something changes. With intermediate object files, a code change only requires regenerating the affected object files. The linker then combines all object files into a single executable. Here, we use the GNU linker for this step, with the following command:

ld -o hello hello.o

While exploring the ELF format of the object file in the previous post, I learned that the same format is also used for executable files, so it should be no surprise that

readelf -a hello

actually returns information about the file:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x10054
  Start of program headers:          52 (bytes into file)
  Start of section headers:          536 (bytes into file)
  Flags:                             0x5000200, Version5 EABI, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         1
  Size of section headers:           40 (bytes)
  Number of section headers:         6
  Section header string table index: 5

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00010054 000054 000030 00  AX  0   0  4
  [ 2] .ARM.attributes   ARM_ATTRIBUTES  00000000 000084 000012 00      0   0  1
  [ 3] .symtab           SYMTAB          00000000 000098 000100 10      4   8  4
  [ 4] .strtab           STRTAB          00000000 000198 00004d 00      0   0  1
  [ 5] .shstrtab         STRTAB          00000000 0001e5 000031 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  y (purecode), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00010000 0x00010000 0x00084 0x00084 R E 0x10000

 Section to Segment mapping:
  Segment Sections...
   00     .text

There is no dynamic section in this file.

There are no relocations in this file.

There are no unwind sections in this file.

Symbol table '.symtab' contains 16 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00010054     0 SECTION LOCAL  DEFAULT    1
     2: 00000000     0 SECTION LOCAL  DEFAULT    2
     3: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.o
     4: 00010054     0 NOTYPE  LOCAL  DEFAULT    1 $a
     5: 00010070     0 NOTYPE  LOCAL  DEFAULT    1 msg
     6: 00010070     0 NOTYPE  LOCAL  DEFAULT    1 $d
     7: 00010080     0 NOTYPE  LOCAL  DEFAULT    1 $d
     8: 00020084     0 NOTYPE  GLOBAL DEFAULT    1 _bss_end__
     9: 00020084     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start__
    10: 00020084     0 NOTYPE  GLOBAL DEFAULT    1 __bss_end__
    11: 00010054     0 NOTYPE  GLOBAL DEFAULT    1 _start
    12: 00020084     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start
    13: 00020084     0 NOTYPE  GLOBAL DEFAULT    1 __end__
    14: 00020084     0 NOTYPE  GLOBAL DEFAULT    1 _edata
    15: 00020084     0 NOTYPE  GLOBAL DEFAULT    1 _end

No version information found in this file.
Attribute Section: aeabi
File Attributes
  Tag_ARM_ISA_use: Yes

This output confirms that the file is an executable ELF for ARM with a defined entry point, a small set of section headers, and one loadable segment.
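To make the header fields a bit more tangible, here is a minimal sketch that unpacks a 32-bit little-endian ELF header with Python's struct module. The function name parse_elf32_header and the sample bytes are my own: the sample is rebuilt from the readelf values above so the sketch runs without the actual file.

```python
import struct

# Elf32_Ehdr: a 16-byte e_ident followed by 13 little-endian fields (52 bytes).
EHDR_FORMAT = "<16sHHIIIIIHHHHHH"
EHDR_FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf32_header(data: bytes) -> dict:
    """Unpack the ELF header at the start of a 32-bit little-endian ELF file."""
    size = struct.calcsize(EHDR_FORMAT)
    header = dict(zip(EHDR_FIELDS, struct.unpack(EHDR_FORMAT, data[:size])))
    if header["e_ident"][:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return header


# Sample header rebuilt from the readelf output (an assumption of this sketch,
# not a dump of the real file).
sample = struct.pack(
    EHDR_FORMAT,
    bytes.fromhex("7f454c46010101000000000000000000"),  # Magic
    2,          # e_type: EXEC
    40,         # e_machine: ARM
    1,          # e_version
    0x10054,    # e_entry
    52,         # e_phoff
    536,        # e_shoff
    0x5000200,  # e_flags
    52, 32, 1,  # e_ehsize, e_phentsize, e_phnum
    40, 6, 5,   # e_shentsize, e_shnum, e_shstrndx
)

header = parse_elf32_header(sample)
print(hex(header["e_entry"]), header["e_phoff"], header["e_phnum"])
```

On the Pi itself, the sample could be replaced by the first 52 bytes read from the hello file in binary mode.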

In the comparison image below, you can see how the type changes, new headers appear, and certain symbols emerge. We’ll walk through each of these changes.

[Image: diff-elf-obj-exe-annotated]

Relocatable file vs Executable file

As expected, the type changes from a relocatable file to an executable file.

Entry point address

The entry point address is the virtual address to which the system first transfers control, according to the updated chapter 4 of the gABI. Since the object file is not meant to be executed directly, its entry point is undefined and defaults to zero. The GNU linker provides several ways to set the entry point, which should reference the first instruction to execute in the output file. The GNU linker documentation describes how the linker determines the entry point:

In descending order of priority

* the `-e' entry command-line option;
* the ENTRY(symbol) command in a linker control script;
* the value of the symbol start, if present;
* the address of the first byte of the .text section, if present;
* The address 0.

Since we didn’t pass the -e option on the command line, the next possibility is an ENTRY(symbol) command in the linker script. That is exactly what the default linker script contains: ENTRY(_start). Feel free to inspect your system’s default linker script using:

ld --verbose

So, based on this, _start is the entry point symbol. By convention, this is the symbol we use in our assembly programs to mark the startup routine. Symbol table entry 11 in the readelf output shows that its value is 0x10054, which is exactly the value of our Entry point address.
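The priority list from the documentation can be sketched as a small lookup function. This is a toy model and not how ld is implemented: the name resolve_entry, its parameters and the symbol dictionary are made up for illustration, and a name that is not found simply falls through to the next rule.

```python
from typing import Mapping, Optional


def resolve_entry(
    cli_entry: Optional[str],
    script_entry: Optional[str],
    symbols: Mapping[str, int],
    text_start: Optional[int],
) -> int:
    """Toy model of the documented entry point priority list."""
    # Rules 1 and 2: a symbol named with -e, then one named with ENTRY(symbol).
    for name in (cli_entry, script_entry):
        if name is not None and name in symbols:
            return symbols[name]
    # Rule 3: the symbol "start", if present.
    if "start" in symbols:
        return symbols["start"]
    # Rule 4: the first byte of the .text section, if present.
    if text_start is not None:
        return text_start
    # Rule 5: the address 0.
    return 0


# Values taken from the symbol table and section headers above.
symbols = {"_start": 0x10054, "msg": 0x10070}
entry = resolve_entry(None, "_start", symbols, 0x10054)
print(hex(entry))
```

With no -e option and ENTRY(_start) in the script, the second rule applies and the function returns the value of _start.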

Program headers

Program headers are typical for an executable file; the object file had none (for all details about the program header, I can refer to this). We see from 3 in the image that the program header size is 32 bytes and that there is only 1 program header table entry. Since the ELF header is 52 bytes long (cf. field “Size of this header” in the image) and the program headers start 52 bytes into the file, we can deduce that the program header table immediately follows the ELF header.

So the program header should have the following format:

typedef struct {
	Elf32_Word	p_type;
	Elf32_Off	p_offset;
	Elf32_Addr	p_vaddr;
	Elf32_Addr	p_paddr;
	Elf32_Word	p_filesz;
	Elf32_Word	p_memsz;
	Elf32_Word	p_flags;
	Elf32_Word	p_align;
} Elf32_Phdr;

When we take a hexdump of the executable and zoom in on the bytes of the program header, we obtain the following:

...
00000030  XX XX XX XX 01 00 00 00  00 00 00 00 00 00 01 00  |................|
00000040  00 00 01 00 84 00 00 00  84 00 00 00 05 00 00 00  |................|
00000050  00 00 01 00 XX XX XX XX  XX XX XX XX XX XX XX XX  |........ .... ..|
...

So we can parse this binary data with the struct defined above. The file is little-endian, so the bytes 00 00 01 00 encode the value 0x00010000:

Name      Type        Value       Interpretation
p_type    Elf32_Word  0x00000001  Type of segment this program header entry describes; 1 stands for PT_LOAD, a loadable segment.
p_offset  Elf32_Off   0x00000000  Offset from the beginning of the file at which the first byte of the segment resides.
p_vaddr   Elf32_Addr  0x00010000  Virtual address at which the first byte of the segment sits in memory.
p_paddr   Elf32_Addr  0x00010000  For systems that use physical addressing.
p_filesz  Elf32_Word  0x00000084  Number of bytes in the file image of the segment.
p_memsz   Elf32_Word  0x00000084  Number of bytes in the memory image of the segment.
p_flags   Elf32_Word  0x00000005  Segment permissions; the bits set are 1 (PF_X) and 4 (PF_R), meaning Execute and Read respectively.
p_align   Elf32_Word  0x00010000  The value to which the segments are aligned in memory and in the file.

This is exactly what readelf reports under “Program Headers” (cf. 6) for the executable file.
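The same parsing can be done mechanically. The sketch below feeds the 32 bytes from the hexdump (file offsets 0x34 to 0x53) through Python's struct module; the helper names parse_phdr and flags_to_text are my own, and the flag formatting only imitates the "R E" column that readelf prints.

```python
import struct

PHDR_FIELDS = ("p_type", "p_offset", "p_vaddr", "p_paddr",
               "p_filesz", "p_memsz", "p_flags", "p_align")
PF_X, PF_W, PF_R = 1, 2, 4


def parse_phdr(raw: bytes) -> dict:
    """Unpack one Elf32_Phdr: eight little-endian 32-bit words."""
    return dict(zip(PHDR_FIELDS, struct.unpack("<8I", raw)))


def flags_to_text(flags: int) -> str:
    """Render p_flags in readelf's three-column style, e.g. 'R E'."""
    columns = (("R", PF_R), ("W", PF_W), ("E", PF_X))
    return "".join(ch if flags & bit else " " for ch, bit in columns)


# Bytes copied from the hexdump above, one field per group.
raw = bytes.fromhex(
    "01000000 00000000 00000100 00000100"
    "84000000 84000000 05000000 00000100"
)

phdr = parse_phdr(raw)
for name in PHDR_FIELDS:
    print(f"{name:<9}0x{phdr[name]:08x}")
print(flags_to_text(phdr["p_flags"]))
```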

Section Headers

We see that we have fewer section headers; the ones that remain are “.text”, “.ARM.attributes”, “.symtab”, “.strtab” and “.shstrtab”. When we look at the symbol table, we notice in particular that the number of symbols has increased. Notable is the _start symbol discussed above, which marks the entry point address.
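The numbers in the section header table can be cross-checked with a little arithmetic. The sketch below uses the offsets and sizes from the readelf output; the helper names are hypothetical, and the 4-byte alignment of the section header table is an assumption I inferred from the numbers rather than something the output states.

```python
# (name, file offset, size, entry size) copied from the section header table.
SECTIONS = [
    (".text",           0x054, 0x030, 0x00),
    (".ARM.attributes", 0x084, 0x012, 0x00),
    (".symtab",         0x098, 0x100, 0x10),
    (".strtab",         0x198, 0x04d, 0x00),
    (".shstrtab",       0x1e5, 0x031, 0x00),
]


def entry_count(size: int, entsize: int) -> int:
    """Number of fixed-size entries in a section, or 0 if it has none."""
    return size // entsize if entsize else 0


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


sizes = {name: (size, entsize) for name, _, size, entsize in SECTIONS}
symbol_count = entry_count(*sizes[".symtab"])

# End of the last section, rounded up to 4 bytes (assumed alignment).
_, last_offset, last_size, _ = SECTIONS[-1]
section_headers_start = align_up(last_offset + last_size, 4)

print(symbol_count, section_headers_start)
```

The symbol count matches the 16 entries that readelf lists, and the computed offset matches the "Start of section headers" field of 536 bytes.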

.text

So the .text section holds the executable instructions of a program (cf. definition of the symbols here). And indeed, running

objdump -d hello

results in the following output:

hello:     file format elf32-littlearm


Disassembly of section .text:

00010054 <_start>:
   10054:       e3a00001        mov     r0, #1
   10058:       e59f1020        ldr     r1, [pc, #32]   ; 10080 <msg+0x10>
   1005c:       e3a0200c        mov     r2, #12
   10060:       e3a07004        mov     r7, #4
   10064:       ef000000        svc     0x00000000
   10068:       e3a07001        mov     r7, #1
   1006c:       ef000000        svc     0x00000000

00010070 <msg>:
   10070:       6c6c6548        .word   0x6c6c6548
   10074:       41202c6f        .word   0x41202c6f
   10078:       0a214d52        .word   0x0a214d52
   1007c:       00000000        .word   0x00000000
   10080:       00010070        .word   0x00010070

Here we can easily recognize the assembly program from the first post. The only difference is that the SWI instruction now appears as an SVC instruction. These are the same instruction: from the ARM documentation, we learn that SVC (Supervisor Call) was previously named SWI (Software Interrupt).
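Two details of the disassembly can be verified by hand: the .word values under msg are the message bytes, and the ldr at 0x10058 points at the literal word at 0x10080. Here is a small sketch of both checks, with helper names of my own choosing. It relies on the fact that in ARM state a read of pc yields the address of the current instruction plus 8.

```python
import struct

# The first three .word values listed under <msg> in the disassembly.
MSG_WORDS = [0x6c6c6548, 0x41202c6f, 0x0a214d52]


def words_to_bytes(words):
    """Turn 32-bit words back into the little-endian bytes stored in the file."""
    return b"".join(struct.pack("<I", word) for word in words)


def pc_relative_target(instruction_address: int, offset: int) -> int:
    """Address referenced by [pc, #offset] in ARM state (pc reads as address + 8)."""
    return instruction_address + 8 + offset


message = words_to_bytes(MSG_WORDS).decode("ascii")
literal_address = pc_relative_target(0x10058, 32)

print(repr(message), len(message))
print(hex(literal_address))
```

The decoded string is 12 characters long, which matches the length loaded with mov r2, #12, and the word stored at 0x10080 is 0x00010070, the address of msg.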

So in summary, the linker puts together our code and symbols, creates program headers, defines an entry point and outputs a file that the OS can load and execute.

What’s next to explore?

One detail surprised me: the p_offset in the program header is 0. But isn’t that where the ELF header lives?

What still has me somewhat puzzled is how exactly the entry point address is defined. The entry point address was set to 0x00010054, and when I looked at the linker script, I noticed the line . = SEGMENT_START("text-segment", 0x00010000) + SIZEOF_HEADERS. So I wouldn’t be too surprised to find a relation between the two.
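A possible answer to both of these questions, sketched as arithmetic on the readelf values above. It assumes that SIZEOF_HEADERS evaluates to the size of the ELF header plus the program header table; the constant and function names are made up for this sketch.

```python
# Values from the ELF header and the section header table.
EHSIZE = 52          # Size of this header
PHENTSIZE = 32       # Size of program headers
PHNUM = 1            # Number of program headers
TEXT_SEGMENT_START = 0x00010000  # default in the SEGMENT_START(...) line
TEXT_SIZE = 0x30     # Size of .text


def sizeof_headers() -> int:
    """Assumed meaning of SIZEOF_HEADERS: ELF header + program header table."""
    return EHSIZE + PHENTSIZE * PHNUM


def text_start_address() -> int:
    """Location counter after '. = SEGMENT_START(...) + SIZEOF_HEADERS'."""
    return TEXT_SEGMENT_START + sizeof_headers()


def segment_file_size() -> int:
    """Bytes in a segment that starts at file offset 0 and ends after .text."""
    return sizeof_headers() + TEXT_SIZE


print(hex(sizeof_headers()))      # 0x54
print(hex(text_start_address()))  # 0x10054
print(hex(segment_file_size()))   # 0x84
```

Under that assumption the numbers line up: .text starts at 0x10054, which is also the value of _start, and a segment beginning at file offset 0 that covers the headers plus .text has the FileSiz of 0x84 that readelf reports. That would also explain why p_offset is 0: the loadable segment appears to include the ELF header and the program header table.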

What is the relevance of these new symbols?

I’ll likely revisit these in a future update as my understanding deepens.

Further research and study

I went deeper into the executable ELF format. There is still a lot to figure out, but I’m happy with my current understanding of what the linker is doing. If any gaps or problems come up along the way, I’m confident the relevant documentation can be found. I would still like to know how the binary information maps to the assembly language though.

Coming up: Evaluating how the assembly language got translated to the binary file. Getting a bit more hands-on with a distance sensor!
