Intro 1: What is a binary, really?

In short, a binary is the file produced when you compile high-level code such as C or C++. It contains machine code that the computer can actually run. I believe in hands-on learning, so let's look inside one to really find out.

Consider the file hello_world.c:

#include <stdio.h>
int main() {
    printf("Hello World!\n");
}

This is your average C file, more or less. It has an #include, a main() function, and a single line of code to run. However, your computer can't run it as is. To make it usable, we must compile it:

$ gcc -m32 hello_world.c -o hello_world.bin

You can ignore the -m32 argument for now (it tells gcc to build a 32-bit binary; we'll talk about it later). The -o hello_world.bin argument simply sets the name of the output file.

From here, we can execute it:

$ ./hello_world.bin
Hello World!

Unsurprisingly, we get "Hello World!" as output. But let's go a bit deeper. We can open gdb (the GNU Debugger) and disassemble main() to see what's happening under the hood:

$ gdb -q ./hello_world.bin
Reading symbols from ./hello_world.bin...(no debugging symbols found)...done.
gdb-peda$ disas main
Dump of assembler code for function main:
   0x0804841d <+0>:     push   %ebp
   0x0804841e <+1>:     mov    %esp,%ebp
   0x08048420 <+3>:     and    $0xfffffff0,%esp
   0x08048423 <+6>:     sub    $0x10,%esp
   0x08048426 <+9>:     movl   $0x80484d0,(%esp)
   0x0804842d <+16>:    call   0x80482f0 <puts@plt>
   0x08048432 <+21>:    leave
   0x08048433 <+22>:    ret
End of assembler dump.
gdb-peda$ quit

Your prompt probably looks like (gdb), whereas mine is gdb-peda$. Don't worry about this; my gdb is modified with the PEDA extension.

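The weird code that gdb displayed is called assembly language. It's the lowest-level human-readable code out there: each line corresponds to a single machine instruction. By default gdb prints it in AT&T syntax, where the source operand comes before the destination, so mov %esp,%ebp copies %esp into %ebp. Let's break this down.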

0x0804841d <+0>:     push   %ebp
0x0804841e <+1>:     mov    %esp,%ebp
0x08048420 <+3>:     and    $0xfffffff0,%esp
0x08048423 <+6>:     sub    $0x10,%esp

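The hex numbers on the left are addresses, and the <+N> next to each one is its offset in bytes from the start of main(). You can think of an address just like your house address: 0x0804841d is where the instruction push %ebp lives. These first four instructions are the function prologue, a standard setup sequence at the start of a function (here, main()): they save the caller's base pointer (%ebp), start a new stack frame, align the stack pointer to a 16-byte boundary, and reserve 16 (0x10) bytes of stack space.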

0x08048426 <+9>:     movl   $0x80484d0,(%esp)
0x0804842d <+16>:    call   0x80482f0 <puts@plt>

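These instructions are what actually print "Hello World!". The movl stores the address of the string "Hello World!" (0x80484d0) in the memory location that %esp points to, which is the top of the stack; under the standard 32-bit x86 calling convention, that is where a function's first argument goes. %esp is a register, which you can think of as a special place the processor uses for storing values it needs quick access to. On 32-bit x86, each of these registers holds four bytes, often a memory address. Our program then calls the puts() function, which prints the string found at the address we supplied, followed by a newline. You may wonder why it calls puts() rather than printf(): since our format string has no format specifiers and ends in a newline, gcc replaces the printf() call with the simpler puts().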

0x08048432 <+21>:    leave
0x08048433 <+22>:    ret

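The last two instructions undo the prologue and return. leave is equivalent to mov %ebp,%esp followed by pop %ebp, so it restores %esp and %ebp to the values they had when main() was entered. ret then returns control from main() to the C library code that called it, which does some cleanup and exits the program. We'll learn more about how these binaries work in later tutorials.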