Continued: Notes from the Offensive Security course at NYU by Ian Dupont

Reverse Engineering

Basic Tools

  • file
  • strings
  • readelf -Ws for symbols or readelf -Wl for program header and segments
  • objdump -M intel --disassemble=main
  • pwntools’ ELF() class
    • can use u64() to convert the bytes given by our programs into an integer address
    • p64() function to convert from an address to 8 bytes (little-endian by default)
      • p64(0x5000 + 0x43f8)
      • alternative to hand-writing the bytes in sendline(b"...")
      • if the program reads a number instead, can send the address in decimal using str(address).encode()
  • checksec
  • gdb with gef or pwndbg
    • info proc mappings in gdb to see the base address
    • vmmap to see if the address belongs to the segment
  • pwntools’ gdb module (with functions like gdb.debug())
    • context.terminal = ["tmux", "splitw", "-h", "-f"]
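
A minimal sketch of what p64() and u64() do, written with the standard struct module so it runs without pwntools installed. The stand-in helpers below are assumptions that mirror pwntools' default little-endian behaviour; in a real exploit script you would import the real ones from pwn. The base 0x5000 and offset 0x43f8 are the values from the note above.

```python
import struct

def p64(value):
    """Stand-in for pwntools' p64(): integer -> 8 little-endian bytes."""
    return struct.pack("<Q", value)

def u64(data):
    """Stand-in for pwntools' u64(): exactly 8 little-endian bytes -> integer."""
    return struct.unpack("<Q", data)[0]

base = 0x5000            # e.g. a base address read from info proc mappings
offset = 0x43f8          # e.g. a symbol offset read from readelf -Ws
target = base + offset   # 0x93f8

# Raw-bytes form, for programs that read the address as bytes.
payload = p64(target)

# Leaked addresses are often shorter than 8 bytes, so pad before unpacking.
leak = b"\xf8\x93\x00\x00\x00\x00"
address = u64(leak.ljust(8, b"\x00"))

# Decimal form, for programs that parse the address as a number.
decimal_payload = str(target).encode()
```

Either payload or decimal_payload would then be passed to sendline(), depending on how the target reads its input.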

Registers

  • each general-purpose register can hold up to 8 bytes (64 bits) of data
  • rbp: reserved for stack base pointer
  • rsp: current stack pointer
  • Registers Reference

Calling Convention

  • Linux default is System V AMD64 ABI
  • The calling convention defines how arguments are loaded in preparation for a function call.
    • The first six arguments are stored in registers rdi, rsi, rdx, rcx, r8, r9, in that order.
  • The calling convention also states that the return value (if there is one) is loaded into rax as the function ends.
  • execution begins in the called function, which starts with a prologue
    • The prologue starts with push rbp, which pushes the existing (calling function’s) base pointer to the stack. This is necessary so the program can restore rbp as it leaves the function.
    • rbp marks the base of the current function’s frame (drawn as the top when the stack is shown growing downward), so mov rbp, rsp sets it to where rsp currently is. Now the base of the current function’s frame is marked and we are ready to reserve space for locals, typically with sub rsp, N.
  • The epilogue undoes the prologue’s operations so that the program returns to the calling function with its stack set as it was at the point of the call.
    • The leave instruction is equivalent to mov rsp, rbp followed by pop rbp: it moves rsp back to where it was right after rbp was pushed, then pops that saved value into rbp. Now rbp points to the base of the calling function’s frame, as it did when the program entered the function.
    • The ret instruction returns from this function by popping the next value on the stack—for example, the address of the next instruction in __libc_start_call_main pushed by the call instruction—into rip.
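
To make the prologue and epilogue bookkeeping concrete, here is a toy model in Python. It is only a sketch: the stack is a dict from address to 8-byte value, and run_call, the register dict, and the addresses used are all hypothetical names and values chosen for illustration.

```python
def run_call(regs, stack, return_address, locals_size):
    """Simulate call + prologue + epilogue + ret on a toy register file."""
    # call: push the address of the next instruction in the caller
    regs["rsp"] -= 8
    stack[regs["rsp"]] = return_address

    # prologue: push rbp ; mov rbp, rsp ; sub rsp, locals_size
    regs["rsp"] -= 8
    stack[regs["rsp"]] = regs["rbp"]   # save the caller's base pointer
    regs["rbp"] = regs["rsp"]          # mark the base of the new frame
    regs["rsp"] -= locals_size         # reserve space for locals

    inside = dict(regs)                # snapshot while "inside" the function

    # epilogue, leave: mov rsp, rbp ; pop rbp
    regs["rsp"] = regs["rbp"]
    regs["rbp"] = stack[regs["rsp"]]   # restore the caller's base pointer
    regs["rsp"] += 8

    # epilogue, ret: pop rip
    regs["rip"] = stack[regs["rsp"]]
    regs["rsp"] += 8
    return inside

regs = {"rsp": 0x7ffe1000, "rbp": 0x7ffe1040, "rip": 0x401000}
inside = run_call(regs, {}, return_address=0x401234, locals_size=0x20)
```

After the call returns, rsp and rbp hold the caller's values again and rip holds the return address, which is why overwriting the saved return address on the stack redirects execution.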

Dealing with structures

  • recognize structures in decompiled code with large mallocs
  • structures are usually padded for alignment; a packed attribute (e.g. __attribute__((packed)) in GCC) removes the padding so that no memory is wasted
  • structures are usually seen with weird pointer calculations
  • to figure out the size of a structure, can look at the shift applied to the array index before its fields are assigned (e.g. a left shift by 4 suggests 16-byte elements)
  • “create structure (shortcut S)” in binja to set something as a structure and see how ugly or pretty it makes our code
  • define a structure just based on the size, see if it makes sense, then start adding the fields of the structure
  • in structures: .b = byte (1 byte), .w = word (2 bytes), .d = double word (4 bytes), .q = quad word (8 bytes)
  • packing usually doesn’t apply to the end, so you can guess the size of the last field in the struct to be whatever is left of the total size
  • if you don’t understand what’s going on with padding and can’t determine the struct, you can also run it in gdb, give the program some data, and see how it is laid out in memory. Especially useful when there is an array of structures. (rewards week 3 ctf problem - see writeup for more)
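
A quick way to check a guess about padding is to rebuild the candidate struct with ctypes and compare sizes and offsets. The field names and types below are hypothetical, picked only to show the effect; the last helper illustrates the shift-to-size rule of thumb.

```python
import ctypes

FIELDS = [
    ("tag", ctypes.c_uint8),     # 1 byte
    ("value", ctypes.c_uint32),  # 4 bytes, wants 4-byte alignment
    ("count", ctypes.c_uint16),  # 2 bytes
]

class Padded(ctypes.Structure):
    """Default layout: the compiler-style padding is inserted."""
    _fields_ = FIELDS

class Packed(ctypes.Structure):
    """Packed layout: fields are placed back to back."""
    _pack_ = 1
    _fields_ = FIELDS

def element_size_from_shift(shift):
    """An index shifted left by `shift` implies elements of 2**shift bytes."""
    return 1 << shift

padded_size = ctypes.sizeof(Padded)   # 1 + 3 padding + 4 + 2 + 2 padding
packed_size = ctypes.sizeof(Packed)   # 1 + 4 + 2
value_offsets = (Padded.value.offset, Packed.value.offset)
```

If the decompiler shows a field access at offset 4 rather than offset 1, the padded layout is the better match.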

Stripped binaries

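  • to figure out the entry point, look at the entry point address printed by readelf -h (also shown at the top of readelf -Wl output); it corresponds to _start even though the symbol name is gone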
    • can read it to find the main function, easy to do in binja
  • statically linked stripped binaries are super hard to reverse
  • Hand Rolling: some functionality might be hand written for optimization instead of using glibc or other imports
  • syscalls can be used to determine what’s happening; the syscall instruction is encoded as the bytes 0f 05.
    • can be used as an entry point by searching in binja for hex string 0f05
    • look at the registers when syscall is called to figure out what’s happening
  • see strings to figure out where to start your reversing too. can determine the strings by running the program.
  • if the last value of an array in a function is being checked to be 0, likely the data type is string
  • based on memset, can retype global variables from void or char to char [0x100]

Protocol Reversing

  • can get complicated since we’re dealing with raw bytes instead of printable chars
  • understand what the packet buffer should look like from the checks that are done on the packet
  • protocols might have an end pointer which is examined at the start, since packets are usually checked linearly
  • useful to create the packet header separately from the payload so that you can reuse the header and change the payload definitions (think ip header constant, tcp header or udp header as payload)
  • need to be able to write a client to interact with the protocol server
    • e.g. helper functions such as prepare_header() and build_int_payload()
  • step through gdb with your binary to see what checks are happening and how to pass them to build a client

SAT Solvers

  • Z3 (strictly an SMT solver) and angr (a symbolic execution framework that uses Z3 underneath)
    • pip install z3-solver
  • solving constrained problems
  • with z3, can define integers as bit-vectors, e.g. BitVec("x", 32), so that bitwise operations and overflows are modelled the way the program performs them
  • things might be unsolvable with Int types, and may be solvable with BitVec type

Tips

  • set relative function offsets in binja if you’re debugging through gdb at the same time