Continued: Notes from the Offensive Security course at NYU by Ian Dupont
Reverse Engineering
Basic Tools
- file
- strings
- readelf -Ws for symbols or readelf -Wl for program headers and segments
- objdump -M intel --disassemble=main
- pwntools' ELF() function
- can use u64() to convert the bytes given by our programs to an address, and the p64() function to convert an address to bytes, e.g. p64(0x5000 + 0x43f8)
  - as an alternative to sendline(b"hex_string_bytes"), can send the address in decimal using str(hex_address).encode()
- can use checksec
- gdb with gef or pwndbg
  - info proc mappings in gdb to see the base address
  - vmmap to see if an address belongs to a segment
- pwntools' gdb module (with functions like debug)
  - context.terminal = ["tmux", "splitw", "-h", "-f"]
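pwntools' p64() and u64() pack and unpack 64-bit little-endian integers (their defaults on amd64). The sketch below reproduces the same conversions with the standard struct module so it runs without pwntools; pack64 and unpack64 are local helper names, not pwntools functions. The address is the p64(0x5000 + 0x43f8) example from the list above.

```python
import struct


def pack64(addr: int) -> bytes:
    """Same bytes as pwntools' p64() with default settings: 8 bytes, little-endian, unsigned."""
    return struct.pack("<Q", addr)


def unpack64(raw: bytes) -> int:
    """Same value as pwntools' u64(): raw must be exactly 8 bytes."""
    return struct.unpack("<Q", raw)[0]


target = 0x5000 + 0x43f8            # base + offset
payload = pack64(target)            # raw bytes to put in a payload
print(payload)                      # b'\xf8\x93\x00\x00\x00\x00\x00\x00'

# A leaked user-space pointer often arrives as only 6 bytes (its top bytes are zero),
# so pad it to 8 bytes before unpacking. Here we simulate such a leak.
leak = payload[:6]
print(hex(unpack64(leak.ljust(8, b"\x00"))))

# If the program parses the address as a decimal number, send ASCII digits instead.
as_decimal = str(target).encode()
print(as_decimal)                   # b'37880'
```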
Registers
- each general-purpose register can hold up to 8 bytes (64 bits) of data
- rbp: reserved for the stack base pointer
- rsp: current stack pointer
- Registers Reference
Calling Convention
- Linux default is System V AMD64 ABI
  - It first specifies how arguments are loaded in preparation for a function call.
    - The first six integer or pointer arguments are stored in registers rdi, rsi, rdx, rcx, r8, r9, in that order; any further arguments are passed on the stack.
  - The calling convention also states that the return value (if there is one) is loaded into rax as the function ends.
- Execution begins in the called function, which starts with a prologue.
  - The prologue starts with push rbp, which pushes the existing (calling function's) base pointer to the stack. This is necessary so the program can restore rbp as it leaves the function. rbp marks the base of the current function's frame, so it is set to the current top of the stack with mov rbp, rsp. Now the current function's frame is marked, and the function is ready to reserve space for its locals (typically with sub rsp, N).
- The epilogue undoes the prologue's operations so that the program returns to the calling function with its stack set as it was at the point of the call.
  - The leave instruction moves rsp back to where it was right after rbp was pushed and pops that value into rbp. Now rbp points to the base of the caller's frame, as it did when the program entered the function.
  - The ret instruction returns from this function by popping the next value on the stack (for example, the address of the next instruction in __libc_start_call_main, pushed by the call instruction) into rip.
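To make the prologue and epilogue concrete, here is a toy model of the System V AMD64 call sequence in Python. It is a sketch, not an emulator: the class, method names, and addresses are all hypothetical, only integer/pointer arguments are modeled, and stack alignment is ignored. It tracks rsp, rbp, and rip through call, push rbp / mov rbp, rsp / sub rsp, N, leave, and ret.

```python
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def arg_locations(n_args: int) -> list[str]:
    """Integer/pointer args 1-6 go in registers; any further args are passed on the stack."""
    return [ARG_REGS[i] if i < len(ARG_REGS) else "stack" for i in range(n_args)]


class Machine:
    def __init__(self, rsp: int, rbp: int) -> None:
        self.mem = {}                  # address -> 8-byte stack slot
        self.rsp, self.rbp, self.rip = rsp, rbp, 0

    def push(self, value: int) -> None:
        self.rsp -= 8                  # the stack grows toward lower addresses
        self.mem[self.rsp] = value

    def pop(self) -> int:
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

    def call(self, target: int, return_addr: int) -> None:
        self.push(return_addr)         # call pushes the address of the next instruction
        self.rip = target

    def prologue(self, locals_size: int) -> None:
        self.push(self.rbp)            # push rbp      (save the caller's base pointer)
        self.rbp = self.rsp            # mov rbp, rsp  (mark this frame)
        self.rsp -= locals_size        # sub rsp, N    (reserve space for locals)

    def leave(self) -> None:
        self.rsp = self.rbp            # mov rsp, rbp  (discard locals)
        self.rbp = self.pop()          # pop rbp       (restore the caller's base pointer)

    def ret(self) -> None:
        self.rip = self.pop()          # pop the saved return address into rip


m = Machine(rsp=0x7FFFFFFFE000, rbp=0x7FFFFFFFE040)
before = (m.rsp, m.rbp)
m.call(target=0x401136, return_addr=0x40118B)
m.prologue(locals_size=0x20)
frame_rbp = m.rbp                      # [rbp] = caller's saved rbp, [rbp + 8] = return address
m.leave()
m.ret()
print(arg_locations(8))
print((m.rsp, m.rbp) == before, hex(m.rip))
```

After leave and ret, rsp and rbp are back to the caller's values and rip holds the pushed return address, which is exactly the "stack set as it was at the point of the call" property described above.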
Dealing with structures
- structures can be recognized in decompiled code by large mallocs
- structures are usually padded for alignment; they can be declared packed (e.g. with __attribute__((packed)) in GCC) so that no memory is wasted on padding
- structures are usually seen with unusual pointer calculations (a base pointer plus constant offsets)
- to figure out the size of a structure, look at the bit shift applied (e.g. to an array index) before assignments to its fields are made: i << 4 suggests 16-byte elements
- “create structure (shortcut S)” in binja to set something as a structure and see how ugly or pretty it makes our code
- define a structure based only on its size, see if it makes sense, then start adding the fields of the structure
- size suffixes in structures: .b = byte (1 B), .w = word (2 B), .d = double word (4 B), .q = quad word (8 B)
- packing usually doesn't apply to the end, so you can guess the size of the last field in the struct to be whatever is left
- if you can't work out the padding and the struct layout, you can also run the program in gdb, give it some data, and see how that data is laid out in memory. This is especially useful when there is an array of structures (rewards week 3 CTF problem; see the writeup for more).
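A minimal sketch of padded versus packed layout using Python's ctypes, which follows the platform C ABI (x86-64 Linux is assumed). The record and its field names are hypothetical, chosen so there is one field of each size suffix; the byte dump at the end is the same kind of view you would get by examining the struct in gdb.

```python
import ctypes

# Hypothetical record: one field of each size suffix.
FIELDS = [
    ("flag", ctypes.c_uint8),    # .b  1 byte
    ("ptr", ctypes.c_uint64),    # .q  8 bytes
    ("count", ctypes.c_uint16),  # .w  2 bytes
    ("id", ctypes.c_uint32),     # .d  4 bytes
]


class Padded(ctypes.Structure):
    _fields_ = FIELDS            # default C alignment: each field starts at a multiple of its size


class Packed(ctypes.Structure):
    _pack_ = 1                   # like __attribute__((packed)): no padding between fields
    _fields_ = FIELDS


def layout(cls):
    """Return (sizeof, {field: offset}) for a ctypes structure."""
    return ctypes.sizeof(cls), {name: getattr(cls, name).offset for name, _ in FIELDS}


print("padded", layout(Padded))
print("packed", layout(Packed))

# Fill one record and dump its raw bytes, like x/24bx in gdb; the zero gaps are padding.
rec = Padded(flag=0x41, ptr=0x1122334455667788, count=0x9999, id=0xDEADBEEF)
raw = bytes(rec)
print(raw.hex(" "))

# In an array of records, element i starts at base + i * sizeof(record).
arr = (Padded * 4)()
stride = ctypes.addressof(arr[1]) - ctypes.addressof(arr[0])
print("stride", stride)
```

With default alignment this record is 24 bytes (offsets 0, 8, 16, 20); packed it is 15 bytes (offsets 0, 1, 9, 11). A power-of-two element size such as 16 shows up as a single shift (i << 4) when indexing an array, while a 24-byte stride typically needs a multiply or a combination of instructions.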
Stripped binaries
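- to figure out the entry point, you might be able to find _start with readelf -Wl (the entry point address is printed at the top of its output)
  - read _start to find the main function (it passes main's address to __libc_start_main); this is easy to do in binja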
- statically linked stripped binaries are super hard to reverse
- Hand-rolling: some functionality might be hand-written for optimization instead of using glibc or other imports
- syscalls can be used to determine what’s happening
- 0f05 is the encoding of the syscall instruction
  - it can be used as an entry point by searching binja for the hex string 0f05
- look at the registers when syscall is executed to figure out what's happening: rax holds the syscall number, and the arguments are in rdi, rsi, rdx, r10, r8, r9
- look at strings to figure out where to start reversing; you can also find them by running the program and seeing what it prints
- if the last element of an array in a function is checked against 0 (a null terminator), the data type is likely a string
- based on the size passed to memset, you can retype global variables from void or char to an array such as char[0x100]
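Both starting points above (the entry point and the 0f05 byte pattern) can be pulled out of a binary's raw bytes with a few lines of Python. This is a sketch assuming a 64-bit little-endian ELF; entry_point reads e_entry, the same value readelf prints as the entry point, and find_syscalls does a plain byte search, so some hits may fall in the middle of other instructions and need checking in the disassembly. The syscall table lists a few Linux x86-64 numbers only, and the header and code bytes in the usage are synthetic.

```python
import struct

# A few Linux x86-64 syscall numbers (the value in rax when syscall executes).
SYSCALLS = {0: "read", 1: "write", 2: "open", 9: "mmap", 59: "execve", 60: "exit", 231: "exit_group"}


def entry_point(elf: bytes) -> int:
    """Read e_entry (the address of _start) from a 64-bit little-endian ELF header."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF")
    return struct.unpack_from("<Q", elf, 0x18)[0]  # e_entry lives at offset 0x18


def find_syscalls(code: bytes) -> list[int]:
    """Offsets of every 0f 05 byte pair in code."""
    hits, start = [], 0
    while (i := code.find(b"\x0f\x05", start)) != -1:
        hits.append(i)
        start = i + 1
    return hits


# Synthetic header: 16-byte e_ident, then e_type/e_machine/e_version (8 bytes), then e_entry.
fake_header = b"\x7fELF\x02\x01\x01" + bytes(9) + bytes(8) + struct.pack("<Q", 0x401000)
# mov eax, 60 ; xor edi, edi ; syscall
code = bytes.fromhex("b83c000000" "31ff" "0f05")

print(hex(entry_point(fake_header)))
print(find_syscalls(code), SYSCALLS[0x3C])
```

On a real target you would pass the file contents (open(path, "rb").read()) instead of the synthetic bytes.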
Protocol Reversing
- can get complicated since we’re dealing with raw bytes instead of printable chars
- understand what the packet in the data buffer should look like from the checks that are done on it
- protocols might have an end pointer that is looked at first, since packets are usually checked linearly
- it is useful to build the packet header separately from the payload so that you can reuse the header and swap payload definitions (think of an IP header as constant, with a TCP or UDP header as the payload)
- you need to be able to write a client to interact with the protocol server
  - e.g. helper functions such as def prepare_header and def build_int_payload
- step through your binary in gdb to see which checks happen and how to pass them, then build the client around that
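A minimal client skeleton along those lines, with the header built separately from the payload. The wire format here is entirely hypothetical (a 4-byte magic PKT1, a u8 message type, a big-endian u16 payload length, and an integer payload of a u8 count followed by little-endian u32 values); on a real target every field would come from the checks you see in the binary. Only prepare_header and build_int_payload echo names from the notes.

```python
import socket
import struct

MAGIC = b"PKT1"   # hypothetical magic value
MSG_INT = 1       # hypothetical message type for an integer payload


def prepare_header(msg_type: int, payload: bytes) -> bytes:
    """Header: magic | msg_type (u8) | payload length (u16, big-endian)."""
    return MAGIC + struct.pack(">BH", msg_type, len(payload))


def build_int_payload(values: list[int]) -> bytes:
    """Payload: count (u8) followed by each value as a little-endian u32."""
    return struct.pack("<B", len(values)) + b"".join(struct.pack("<I", v) for v in values)


def build_packet(msg_type: int, payload: bytes) -> bytes:
    # The header is built separately so it can be reused with any payload builder.
    return prepare_header(msg_type, payload) + payload


def send_packet(host: str, port: int, packet: bytes) -> bytes:
    """Send one packet and return whatever the server answers first."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(packet)
        return s.recv(4096)


pkt = build_packet(MSG_INT, build_int_payload([1, 0x41424344]))
print(pkt.hex(" "))
```

With pwntools, remote(host, port).send(pkt) would replace send_packet; keeping the packet builders as plain functions makes it easy to tweak one field at a time while stepping through the server's checks in gdb.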
SAT Solvers
- Z3 and angr
- install Z3's Python bindings with pip install z3-solver
- used for solving constraint problems
- with Z3, you can define integers as bit-vectors, e.g. BitVec("x", 32), so that bitwise operations and overflow behave as they do in the program
- some constraints are unsolvable with Int types (unbounded integers) but solvable with the BitVec type, since the program's arithmetic wraps around
Tips
- set relative function offsets in binja (or rebase the binary) if you're debugging in gdb at the same time, so addresses line up between the two tools
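Lining up binja and gdb addresses is just base-plus-offset arithmetic. A sketch with hypothetical numbers: a PIE whose function sits at offset 0x1189, shown in binja with a base of 0x0 (your binja base may differ depending on the binary and settings), and mapped by gdb at 0x555555554000, which is the usual x86-64 Linux load address when gdb disables ASLR. Read the real runtime base from vmmap or info proc mappings.

```python
def to_runtime(static_addr: int, static_base: int, runtime_base: int) -> int:
    """Translate an address from the disassembler's view to the live process."""
    return runtime_base + (static_addr - static_base)


def to_static(runtime_addr: int, static_base: int, runtime_base: int) -> int:
    """Translate an address seen in gdb back to the disassembler's view."""
    return static_base + (runtime_addr - runtime_base)


STATIC_BASE = 0x0                 # hypothetical binja base for the PIE
RUNTIME_BASE = 0x555555554000     # hypothetical base from vmmap / info proc mappings
func_static = 0x1189              # hypothetical function offset seen in binja

func_runtime = to_runtime(func_static, STATIC_BASE, RUNTIME_BASE)
print(hex(func_runtime))          # 0x555555555189
```

In gdb you would then set a breakpoint with break *0x555555555189; going the other way, to_static maps a crashing rip back to the function you are looking at in binja.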