ELF Binary Internals 1 : ELF Basics

In this post I will share details on ELF binary basics.

So let us begin with a very simple hello world program in C.


#include <stdio.h>

void main(){
    printf("\nHello World\n");
}

As I am on a 64-bit Linux system, I will compile the binary in both 32-bit and 64-bit mode.

We will compile this code with gcc using the following commands:

for 64 bit  -> gcc hello.c -o hello64
for 32 bit  -> gcc hello.c -m32 -o hello32 (if this fails with an error, we can install the multilib support with sudo apt-get install gcc-multilib)

If we run the file command on the binaries we created, we see the following output:

pentest@ubuntu:~/Desktop$ file hello64 
hello64: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=797fa6ea8a92b773eb5106c822a76788441ceac1, not stripped

pentest@ubuntu:~/Desktop$ file hello32 
hello32: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=ba188ad09ee9ff9ac774833b8a7c87d8afbc443a, not stripped


So let us try to understand what all of this means (we will analyze the output for the 64-bit binary).


hello64: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=797fa6ea8a92b773eb5106c822a76788441ceac1, not stripped

hello64: This is the name of the file on which we ran the file command.

ELF - Executable and Linking Format, also called Executable and Linkable Format. This is the binary format used mainly by UNIX-like operating systems such as Linux and Solaris, although some non-UNIX operating systems support it as well.

64-bit - This tells us the architecture class of the binary, here 64 bit. For a 32-bit binary this field reads 32-bit.

If we are on a 64-bit machine and want to create a 32-bit binary, we can pass the -m32 option to gcc.


LSB - Least Significant Byte. It means the binary is in little-endian format (the least significant byte is stored first). On Intel architectures you will always see LSB. On architectures like PowerPC, SPARC and so on, you may instead see big-endian binaries, reported as MSB (Most Significant Byte).
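The fields described so far all come from the first 16 bytes of the file, the e_ident array. Here is a minimal sketch of how they could be decoded, assuming a Linux system that provides <elf.h>. The helper names (is_elf, elf_class_name, elf_data_name) are made up for this illustration and are not part of any library.

```c
#include <elf.h>
#include <string.h>

/* Returns 1 if the buffer starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
 * The buffer must hold at least EI_NIDENT (16) bytes read from the file. */
int is_elf(const unsigned char *ident)
{
    return memcmp(ident, ELFMAG, SELFMAG) == 0;
}

/* Maps the EI_CLASS byte to the "32-bit" / "64-bit" part of the output. */
const char *elf_class_name(const unsigned char *ident)
{
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return "32-bit";
    case ELFCLASS64: return "64-bit";
    default:         return "unknown class";
    }
}

/* Maps the EI_DATA byte to the "LSB" / "MSB" part of the output. */
const char *elf_data_name(const unsigned char *ident)
{
    switch (ident[EI_DATA]) {
    case ELFDATA2LSB: return "LSB";
    case ELFDATA2MSB: return "MSB";
    default:          return "unknown encoding";
    }
}
```

Reading the first EI_NIDENT bytes of hello64 and passing them to these helpers should give 64-bit and LSB, matching what the file command printed.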


Shared object - This field can be shared object, relocatable or executable. Let us see how these differ from each other and how we can generate each of them using gcc.

The two terms we are going to use here are PIC (Position Independent Code) and PIE (Position Independent Executable). When we create a library that can be used by many processes, we need to build it as PIC so that it can be loaded at any virtual address. Because the code is position independent, it reaches its own code and data through relative offsets, without worrying about clashes over fixed locations in memory. A PIE applies the same idea to the main program, so that the executable itself can be loaded at any address.


Shared Object - By default the gcc on this system builds a PIE (as if -fPIE -pie were passed), which makes the addresses of the sections in the program relative to each other. Such a binary has the same ELF object type as a shared library, which is why file reports it as a shared object.


Executable - This means the binary is not a PIE. It is linked to load at fixed absolute addresses, and in my build I could find no .plt.got section in it. We can disable PIE with the -no-pie option in gcc, which gives us an executable object file.


Relocatable - This means the file is just object code, not yet linked with the libraries or other files necessary for execution. Several steps are involved in turning a source program into something that can run (please note: "executable" here means able to run, and should not be confused with the executable object type above): Preprocessing -> Compilation -> Object File Creation -> Linking. Normally with gcc we do all of this in one step, like gcc hello.c -o hello.out, but we can also do it in 2 steps, starting with


gcc -c hello.c  ; this will create an object file called hello.o

This is what the disassembly of main looks like in the object code.

0000000000000000 <main>:

   0: 55                    push   rbp
   1: 48 89 e5              mov    rbp,rsp
   4: 48 8d 3d 00 00 00 00 lea    rdi,[rip+0x0]        # b <main+0xb>
   b: e8 00 00 00 00        call   10 <main+0x10>
  10: 90                    nop
  11: 5d                    pop    rbp
  12: c3                    ret    


This object file cannot be run on its own because it has not been linked yet. Notice that the offset in the lea instruction and the target of the call are still zero placeholders, which the linker fills in later.

We can generate an executable binary from the object code using the command

gcc hello.o -o hello-executable 

Now if we run objdump on the linked binary, we can see many more sections, including the ones used for dynamic linking such as .plt. This is what the disassembly of main looks like after linking:

000000000000063a <main>:

 63a: 55                    push   rbp
 63b: 48 89 e5              mov    rbp,rsp
 63e: 48 8d 3d 8f 00 00 00 lea    rdi,[rip+0x8f]        # 6d4 <_IO_stdin_used+0x4>
 645: e8 c6 fe ff ff        call   510 <puts@plt>
 64a: 90                    nop
 64b: 5d                    pop    rbp
 64c: c3                    ret    
 64d: 0f 1f 00              nop    DWORD PTR [rax]
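The three object types discussed above are stored in the e_type field of the ELF header. The following is a small sketch of how that field could be read and named, assuming <elf.h> is available and that the file uses the same byte order as the machine running the code. The function names are hypothetical, and the "shared object" label for ET_DYN simply mirrors the file output shown in this post.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Extracts e_type from the start of an ELF header. The field sits at the
 * same offset in 32-bit and 64-bit headers, right after e_ident.
 * Assumes the file and the host share the same byte order. */
uint16_t elf_type_of(const unsigned char *header)
{
    uint16_t e_type;
    memcpy(&e_type, header + offsetof(Elf64_Ehdr, e_type), sizeof e_type);
    return e_type;
}

/* Maps e_type to the wording used in this post. */
const char *elf_type_name(uint16_t e_type)
{
    switch (e_type) {
    case ET_REL:  return "relocatable";   /* hello.o from gcc -c          */
    case ET_EXEC: return "executable";    /* built with -no-pie           */
    case ET_DYN:  return "shared object"; /* shared library or PIE binary */
    case ET_CORE: return "core file";
    default:      return "unknown";
    }
}
```

With this, hello.o would be expected to map to relocatable, the default hello64 build to shared object, and a -no-pie build to executable.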

version 1 (SYSV) - Version 1 is the ELF format version (EV_CURRENT). It is the only version defined so far, so this value is always 1. SYSV is the target OS ABI of the binary, here UNIX System V, which is the default value. Other possible values include FreeBSD, HP-UX, etc.
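Both of these values also live in the e_ident bytes at the start of the file. As a rough sketch, assuming <elf.h> is available, they could be checked like this. The function names are invented for illustration, and the returned strings are my own labels rather than the exact text printed by file.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the ident bytes carry the current (and only) ELF version. */
int elf_version_is_current(const unsigned char *ident)
{
    return ident[EI_VERSION] == EV_CURRENT;
}

/* Names a few of the OS ABI values. Returns NULL if the buffer does not
 * start with the ELF magic. Only a handful of values are listed here. */
const char *elf_osabi_name(const unsigned char *ident)
{
    if (memcmp(ident, ELFMAG, SELFMAG) != 0)
        return NULL;

    switch (ident[EI_OSABI]) {
    case ELFOSABI_SYSV:    return "SYSV";
    case ELFOSABI_HPUX:    return "HP-UX";
    case ELFOSABI_LINUX:   return "GNU/Linux";
    case ELFOSABI_FREEBSD: return "FreeBSD";
    default:               return "other";
    }
}
```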

dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2 - It means that the binary uses dynamically linked libraries. There are 2 possible values for this field: dynamically linked and statically linked.


Dynamically Linked - It means the libraries are not part of the binary. Instead, the binary records the path of an interpreter (the dynamic loader, here /lib64/ld-linux-x86-64.so.2), which loads the required shared libraries into memory when the program is executed.


We can verify this by running ldd on the binary:


pentest@ubuntu:~/Desktop$ ldd hello64

linux-vdso.so.1 (0x00007fff96bc2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f425eb2a000)

/lib64/ld-linux-x86-64.so.2 (0x00007f425f11d000)
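The interpreter path reported by file is stored inside the binary, in the segment described by the PT_INTERP program header. Below is a minimal sketch of looking it up in a 64-bit ELF file that has already been read into memory. It assumes <elf.h>, a file in the host's byte order, and program header entries of the standard size. The function name elf64_interp is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the interpreter path inside a 64-bit ELF image,
 * or NULL if there is no PT_INTERP segment or the image looks malformed.
 * Assumes e_phentsize equals sizeof(Elf64_Phdr) and host byte order. */
const char *elf64_interp(const unsigned char *image, size_t size)
{
    Elf64_Ehdr eh;

    if (size < sizeof eh)
        return NULL;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return NULL;

    /* Walk the program header table looking for PT_INTERP. */
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = eh.e_phoff + i * sizeof ph;

        if (off > size || size - off < sizeof ph)
            return NULL;
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;

        /* The path must lie inside the image and be NUL-terminated. */
        if (ph.p_filesz == 0 || ph.p_offset > size ||
            size - ph.p_offset < ph.p_filesz)
            return NULL;
        if (image[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)(image + ph.p_offset);
    }
    return NULL;
}
```

For hello64 this would be expected to return /lib64/ld-linux-x86-64.so.2, and NULL for a binary that has no interpreter.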

Statically Linked - It means that the libraries have been packed into the binary itself, so it does not depend on any dynamically linked libraries. If a binary is compiled with the -static option in gcc, we get a statically linked binary. If we run the ldd command on such a binary, it reports that it is not a dynamic executable.

pentest@ubuntu:~/Desktop$ ldd helloStatic 


not a dynamic executable

There is a huge difference in the size of the binary when it is compiled with the -static option (i.e. statically linked):


-rwxrwxr-x 1 pentest pentest   8296 Feb  6 08:55 hello64

-rwxrwxr-x 1 pentest pentest 844704 Feb  7 09:28 helloStatic

At this point you might be confused between the object type of the binary that we discussed before and the linking that we are discussing now. When we talk about the shared object, executable or relocatable object type, we are dealing with how the program will be loaded in memory. When we talk about linking, it is all about how the external libraries are attached to the binary: either dynamically, through shared libraries loaded at run time, or statically, by packing them into the binary itself.

So we can make this statement: an executable object type may still use dynamically linked libraries. That means even if we disable PIE, we can still get an executable with dynamically linked libraries.

pentest@ubuntu:~/Desktop$ file helloNOPIE
helloNOPIE: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=2d986bca273f541af7a48ffb51f4d5fd22177c22, not stripped
pentest@ubuntu:~/Desktop$ ldd helloNOPIE
linux-vdso.so.1 (0x00007ffffe990000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcb5e8c0000)

/lib64/ld-linux-x86-64.so.2 (0x00007fcb5ecb1000)

for GNU/Linux 3.2.0 - The minimum kernel version required to execute the binary.


BuildID[sha1]=797fa6ea8a92b773eb5106c822a76788441ceac1 - This ID is assigned to the binary at build time. It is generated by the linker, which is why it is not present in the object code produced by compilation alone.


not stripped - This means that the symbol table is still present in the binary. The value is stripped when the symbols have been removed. A stripped binary is smaller than one that is not stripped: stripping removes sections that are not required for execution but make debugging easier.

We can verify this using gdb. To include full debug symbols in a binary, we compile with the -g option.

pentest@ubuntu:~/Desktop$ gcc hello.c -g -o helloDebugSymbols
pentest@ubuntu:~/Desktop$ gdb -q ./helloDebugSymbols 
Reading symbols from ./helloDebugSymbols...done.
(gdb) info functions
All defined functions:

File hello.c:
void main();

Non-debugging symbols:
0x00000000000004e8  _init
0x0000000000000510  puts@plt
0x0000000000000520  __cxa_finalize@plt
0x0000000000000530  _start
0x0000000000000560  deregister_tm_clones
0x00000000000005a0  register_tm_clones
0x00000000000005f0  __do_global_dtors_aux
0x0000000000000630  frame_dummy
0x0000000000000650  __libc_csu_init
0x00000000000006c0  __libc_csu_fini
0x00000000000006c4  _fini

Now we will try the same with a binary compiled without -g. As there are no debug symbols, there is no reference to the function void main() as declared in the source code. However, the symbol table is still available, so for example I can still find the address of the main function.

pentest@ubuntu:~/Desktop$ gcc hello.c -o helloNoDebugSymbols
pentest@ubuntu:~/Desktop$ gdb -q ./helloNoDebugSymbols 
Reading symbols from ./helloNoDebugSymbols...(no debugging symbols found)...done.
(gdb) info functions
All defined functions:

Non-debugging symbols:
0x00000000000004e8  _init
0x0000000000000510  puts@plt
0x0000000000000520  __cxa_finalize@plt
0x0000000000000530  _start
0x0000000000000560  deregister_tm_clones
0x00000000000005a0  register_tm_clones
0x00000000000005f0  __do_global_dtors_aux
0x0000000000000630  frame_dummy
0x000000000000063a  main
0x0000000000000650  __libc_csu_init
0x00000000000006c0  __libc_csu_fini
0x00000000000006c4  _fini

We can strip it down further using the strip command.

pentest@ubuntu:~/Desktop$ strip -s helloNoDebugSymbols -o helloNoDebugSymbolsStripped
pentest@ubuntu:~/Desktop$ gdb -q ./helloNoDebugSymbolsStripped
Reading symbols from ./helloNoDebugSymbolsStripped...(no debugging symbols found)...done.
(gdb) info functions
All defined functions:

Non-debugging symbols:
0x0000000000000510  puts@plt

0x0000000000000520  __cxa_finalize@plt
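The difference between the last two binaries is the full symbol table: strip -s removes it, while the dynamic symbols needed at run time stay, which is why gdb can still list puts@plt. Here is a rough sketch of checking for that table in a 64-bit ELF file held in memory. It assumes that the presence of a section of type SHT_SYMTAB is what separates a not stripped binary from a stripped one, and that the file is in the host's byte order. The function name elf64_has_symtab is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if a 64-bit ELF image contains a SHT_SYMTAB section, 0 if it
 * does not or if the image looks malformed.
 * Assumes e_shentsize equals sizeof(Elf64_Shdr) and host byte order. */
int elf64_has_symtab(const unsigned char *image, size_t size)
{
    Elf64_Ehdr eh;

    if (size < sizeof eh)
        return 0;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;

    /* Walk the section header table looking for the full symbol table. */
    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        size_t off = eh.e_shoff + i * sizeof sh;

        if (off > size || size - off < sizeof sh)
            return 0;
        memcpy(&sh, image + off, sizeof sh);
        if (sh.sh_type == SHT_SYMTAB)
            return 1;
    }
    return 0;
}
```

Under that assumption, helloNoDebugSymbols would be expected to give 1 and helloNoDebugSymbolsStripped 0.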

So that's all for this blog post. In my future posts I will talk about each part of the ELF binary in more detail.

http://man7.org/linux/man-pages/man5/elf.5.html
https://mropert.github.io/2018/02/02/pic_pie_sanitizers/
https://stackoverflow.com/questions/23033529/elf-file-generation-commands-and-options
https://www.tutorialspoint.com/gnu_debugger/gdb_debugging_symbols.htm
https://www.akashtrehan.com/different-kinds-of-executables/
http://sco.com/developers/gabi/latest/ch4.intro.html
https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
https://stackoverflow.com/questions/5311515/gcc-fpic-option