In this post I will share details on ELF binary basics.
So let us begin with a very simple hello world program in C
#include<stdio.h>
void main(){
printf("\nHello World\n");
}
version 1 (SYSV) - This means that the file uses ELF version 1, which is the current ELF version, and that the target OS ABI for the binary is UNIX System V. There are other possible values for the ABI, for example FreeBSD, HP-UX, etc. I could not find enough resources with more details on the version 1 value and what it can affect.
Statically Linked - It means that the library code the program needs has been packed into the binary itself, so it does not depend on any shared libraries at run time. If a binary is compiled with the -static option in gcc, it will be a statically linked binary. If we run the ldd command on such a binary, it reports that it is not a dynamic executable.
As I am on a 64 bit Linux system, I will compile the binary in both 32 bit and 64 bit mode.
We will compile this code with gcc by issuing the commands
for 64 bit -> gcc hello.c -o hello64
for 32 bit -> gcc hello.c -m32 -o hello32 (if this gives an error, we can install gcc multilib with sudo apt-get install gcc-multilib)
If we run the file command on the binaries we created, we see the following output
pentest@ubuntu:~/Desktop$ file hello64
hello64: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=797fa6ea8a92b773eb5106c822a76788441ceac1, not stripped
pentest@ubuntu:~/Desktop$ file hello32
hello32: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=ba188ad09ee9ff9ac774833b8a7c87d8afbc443a, not stripped
So let us try to understand what each part of this output means (we will analyze the result for the 64 bit binary)
hello64: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=797fa6ea8a92b773eb5106c822a76788441ceac1, not stripped
hello64 - This is the filename of the binary on which we ran the file command.
ELF - Executable and Linking Format, also called Executable and Linkable Format. This is the binary format used mainly by UNIX type operating systems such as Linux and Solaris, but certain non UNIX operating systems support it as well.
64-bit - This tells us the architecture class of the binary, here 64 bit. For a 32 bit binary it is shown as 32-bit.
If we are on a 64 bit machine and want to create a 32 bit binary, we can pass the -m32 option to gcc.
LSB - Least Significant Byte. It means the binary is in little endian format. On Intel architectures you will find this as LSB. However, on architectures like PowerPC, SPARC and so on it is possible to get big endian format, i.e. MSB (Most Significant Byte).
Shared object - This field can be shared object, relocatable or executable. Let us see how these differ from each other and how we can generate each of them using gcc.
The two terms we are going to use here are PIC (Position Independent Code) and PIE (Position Independent Executable). When we create a library that can be called by many processes, we need to build it as PIC so that it can be loaded in memory at any virtual address. Because the code is position independent, it can be accessed with relative offsets without worrying about clashes of fixed locations in memory. We can create a PIE when
Shared Object - By default the gcc on this system builds a Position Independent Executable (-fPIE -pie), which makes the addresses of the sections in the program relative to each other. The file command reports such a binary as a shared object because a PIE has the same ELF object type as a shared library.
Executable - This means the binary is not a PIE. It is loaded at a fixed absolute address. In such a build I found no .plt.got section, although calls into shared libraries still go through the PLT and GOT. We can disable PIE with the -no-pie option in gcc, and we then get an executable object file.
Relocatable - This means the file is just object code, without any linking of the libraries or files that are necessary for execution. There are some steps involved in turning a program into something that can be executed (please note: the term executable here means something we can run, and should not be confused with the executable object type above). To make an executable from a source program, the following process is involved: Preprocessing -> Compilation -> Object File Creation -> Linking. Normally in gcc we do this in one step, like gcc hello.c -o hello.out, but we can also do it in 2 steps like
gcc -c hello.c ; this will create an object file called hello.o
This is what the disassembly of main looks like in the object code.
0000000000000000 <main>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # b <main+0xb>
b: e8 00 00 00 00 call 10 <main+0x10>
10: 90 nop
11: 5d pop rbp
12: c3 ret
This object file cannot be run on its own, because it has not yet been linked with the objects and libraries required for execution.
We can generate an executable binary from object code using the command
gcc hello.o -o hello-executable
Now if we run objdump on the linked binary, we can see that many more sections have been created and that the linker has filled in the addresses. This is what the disassembly of main looks like after linking
000000000000063a <main>:
63a: 55 push rbp
63b: 48 89 e5 mov rbp,rsp
63e: 48 8d 3d 8f 00 00 00 lea rdi,[rip+0x8f] # 6d4 <_IO_stdin_used+0x4>
645: e8 c6 fe ff ff call 510 <puts@plt>
64a: 90 nop
64b: 5d pop rbp
64c: c3 ret
64d: 0f 1f 00 nop DWORD PTR [rax]
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2 - It means that the binary uses dynamically linked libraries (for the 32 bit binary the interpreter is /lib/ld-linux.so.2). There are two possible values for this field: dynamically linked and statically linked.
Dynamically Linked - It means the binary only holds references to its shared libraries. The interpreter named in the output, /lib64/ld-linux-x86-64.so.2, is the dynamic loader that loads these libraries into memory when the program is executed.
We can verify this by running ldd on the binary
pentest@ubuntu:~/Desktop$ ldd hello64
linux-vdso.so.1 (0x00007fff96bc2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f425eb2a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f425f11d000)
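For a statically linked binary (compiled with the -static option in gcc), ldd reports the following
pentest@ubuntu:~/Desktop$ ldd helloStatic
not a dynamic executable
There is a huge difference in the size of the binary when it is compiled with the -static option (i.e. statically linked)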
-rwxrwxr-x 1 pentest pentest 8296 Feb 6 08:55 hello64
-rwxrwxr-x 1 pentest pentest 844704 Feb 7 09:28 helloStatic
At this point you might be confused between the object types that we discussed before and the linking that we are discussing now. When we talk about the shared object, executable or relocatable object type, we are dealing with how the program will be loaded in memory. When we talk about linking, it is all about how external libraries are attached to the binary: either dynamically through shared libraries, or statically by packing them into the binary itself.
So we can make this statement: an executable object type may have dynamically linked libraries. That means even if we disable PIE, we can still get an executable with dynamically linked libraries.
pentest@ubuntu:~/Desktop$ file helloNOPIE
helloNOPIE: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=2d986bca273f541af7a48ffb51f4d5fd22177c22, not stripped
pentest@ubuntu:~/Desktop$ ldd helloNOPIE
linux-vdso.so.1 (0x00007ffffe990000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcb5e8c0000)
/lib64/ld-linux-x86-64.so.2 (0x00007fcb5ecb1000)
for GNU/Linux 3.2.0 - The minimum kernel version required to execute the binary.
BuildID[sha1]=797fa6ea8a92b773eb5106c822a76788441ceac1 - This ID is assigned to the binary during the build phase, possibly during the linking phase, as it is not visible in the object code after compilation.
not stripped - This means that the symbol table is still present in the binary, so certain but not all debugging information is available. The value is stripped if we remove the symbols. A stripped binary is smaller than one that is not stripped. When we strip a binary, we remove extra sections that are not required for execution but were added to make debugging easier.
We can verify this using gdb. We can keep debug symbols in a binary by compiling with the -g option
pentest@ubuntu:~/Desktop$ gcc hello.c -g -o helloDebugSymbols
pentest@ubuntu:~/Desktop$ gdb -q ./helloDebugSymbols
Reading symbols from ./helloDebugSymbols...done.
(gdb) info functions
All defined functions:
File hello.c:
void main();
Non-debugging symbols:
0x00000000000004e8 _init
0x0000000000000510 puts@plt
0x0000000000000520 __cxa_finalize@plt
0x0000000000000530 _start
0x0000000000000560 deregister_tm_clones
0x00000000000005a0 register_tm_clones
0x00000000000005f0 __do_global_dtors_aux
0x0000000000000630 frame_dummy
0x0000000000000650 __libc_csu_init
0x00000000000006c0 __libc_csu_fini
0x00000000000006c4 _fini
Now we will try the same with a binary compiled without the -g option. As there are no debug symbols, gdb no longer shows the function void main() from the source file hello.c. However, certain information is still available because the symbol table is still present. For example, I can still find the address of the main function. After that we strip the binary down further using the strip command and check again, and this time main no longer appears.
pentest@ubuntu:~/Desktop$ gcc hello.c -o helloNoDebugSymbols
pentest@ubuntu:~/Desktop$ gdb -q ./helloNoDebugSymbols
Reading symbols from ./helloNoDebugSymbols...(no debugging symbols found)...done.
(gdb) info functions
All defined functions:
Non-debugging symbols:
0x00000000000004e8 _init
0x0000000000000510 puts@plt
0x0000000000000520 __cxa_finalize@plt
0x0000000000000530 _start
0x0000000000000560 deregister_tm_clones
0x00000000000005a0 register_tm_clones
0x00000000000005f0 __do_global_dtors_aux
0x0000000000000630 frame_dummy
0x000000000000063a main
0x0000000000000650 __libc_csu_init
0x00000000000006c0 __libc_csu_fini
0x00000000000006c4 _fini
pentest@ubuntu:~/Desktop$ strip -s helloNoDebugSymbols -o helloNoDebugSymbolsStripped
pentest@ubuntu:~/Desktop$ gdb -q ./helloNoDebugSymbolsStripped
Reading symbols from ./helloNoDebugSymbolsStripped...(no debugging symbols found)...done.
(gdb) info functions
All defined functions:
Non-debugging symbols:
0x0000000000000510 puts@plt
0x0000000000000520 __cxa_finalize@plt
So that's all for this blog post. In further posts I will talk in more detail about each part of an ELF binary.