Reverse Engineering

So you’ve selected reverse engineering, huh? Alright, your funeral… well let’s get you started with some helpful material to learn about reverse engineering.

Prerequisites

Before getting into reverse engineering, there are some things you will need to have. They are listed below:

Virtual Machines (A Windows one, a Linux one, and QEMU usage all come in handy)
- You won’t get a Mac VM image. If you run a Mac that uses an x64 chip, Parallels is a great choice for running your VMs
- Rather than one super bulky behemoth laptop, if you could utilize networking and ssh into different VMs hosted somewhere else, that would be a smart use of resources.
TOOLS, TOOLS, TOOLS
- Detect-It-Easy*, PE-Bear, IDA Free, Ghidra (not either, both), x64dbg, gdb, a hex editor*, Sysinternals (Windows only), Linux sysadmin knowledge (objdump, /proc stuff), Wireshark, dotPeek and dnSpy (.NET), Java Decompiler (IntelliJ), unpy2exe and pyREtic for Python .exe files, angr
Some experience writing programs in a compiled language like C or C++
- have a good understanding of how programs are generally written, executed, and debugged

* For each OS, so one for Linux, Windows etc

What is Reverse Engineering?

Reverse engineering is the art of figuring out a program’s functionality without source code, through static analysis of disassembly instructions or dynamic analysis of how the program behaves in a debugger. This is an incredibly useful skill to possess in cybersecurity, as it is used for vulnerability assessment, exploit development and malware analysis. I would be misleading you if I said this short book could teach you everything you need to know about the vast and highly technical world of reverse engineering. It can’t, and no single resource can. Reverse engineering has a steep learning curve; do not get discouraged when studying this material. It is dense, but doable.

To be a capable reverse engineer, one should know how computer programs are written and executed. Written is fair enough: the programmer types the code, e.g. C, into a text editor. The programmer uses a standard of C, say C11. The programmer’s compiler will know C11 and, if the programmer typed everything correctly, the toolchain will take that C source code, pull in the necessary include files, compile the result to assembly, assemble that into object code, link any objects and libraries, and an executable will be made. All that’s left then is for the programmer or another user to execute that program or debug it using a debugger. Following along so far?

Executed is a very different story. Once executed, a program becomes a process. A process is defined by having one or more threads of execution. A process is the logical representation of that compiled executable in memory and can be either executing some instructions, awaiting user input or network I/O, or some other activity. A process is governed by the operating system; the OS will decide how much access to resources the process has. A resource can be anything from hard disk access, to memory usage, to CPU time etc. Understanding how a process behaves over its lifecycle and how it interacts with the OS and the computer is a vital skill for a reverser.

Unfortunately, these concepts are difficult and take time to solidify in one’s head.
If time is no barrier and you really want to dive into the low level of software, here’s what you’ll need.

Programming skills: I would only recommend C, nothing else teaches you memory or low level stuff better. The C Programming Language and C Programming: A Modern Approach (both books) are excellent introductions to the language. Funnily enough, there are sites that will allow someone to download a .pdf of these books for free? Mental….

Computer Architecture skills: Having some experience with assembly language as a programming language is so beneficial that I couldn’t leave it out. I recommend this resource every time, because it’s the best.

Once you have got C and some assembly language down, try playing around with Godbolt.

Operating System skills: OSTEP is the standard here. For Windows, Windows Internals is the only book and The Linux Programming Interface is a worthwhile read for Linux, obviously.

First Things First…

What The |>h|_|(|< am I looking at?

This is the first question to be answered when reverse engineering anything.

You should be able to identify the file you are analysing. Some CTF challenges will explain that this is a script or an executable etc, but until you know for certain, you can’t proceed properly. How will you know for sure what you are looking at? I’d recommend a hex editor. There isn’t a file that can’t be opened by a hex editor, and certain elite malware analysts can do their whole job from one.
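If you don’t have a hex editor to hand, the same view is easy to approximate. The snippet below is a minimal sketch, not a replacement for a real hex editor: `hexdump` and `dump_file_head` are names made up for this example, and the 64-byte default is an arbitrary choice. It only reads the file, it never executes it.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes the way a hex editor does: offset, hex bytes, ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; everything else becomes a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  |{ascii_part}|")
    return "\n".join(lines)


def dump_file_head(path: str, count: int = 64) -> str:
    """Hex-dump only the first `count` bytes of a file (reads it, never runs it)."""
    with open(path, "rb") as fh:
        return hexdump(fh.read(count))
```

Calling `print(dump_file_head("mystery.bin"))` on a hypothetical challenge file is usually enough to tell readable script text apart from a binary header.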

Once you’ve opened the file, if you start seeing some English words and programming-like structure, it’s a script. But what kind of script? Is it PowerShell, VBScript, JavaScript, Bash, a batch file, AutoIt, Python, Ruby etc? The easiest way to be certain is to google script keywords and syntax. There are tools that will do this for you, but as you grow as a reverse engineer, you will be able to read most languages on sight and these tools become redundant. Let’s see what you can do. What are these scripting languages?

Obfuscated Scripts

Rarely will scripts be given to you in a clean and readable format. They will be either obfuscated (JavaScript, VBScript, PowerShell etc) or embedded within executables (AutoIt, Python). Here’s an example of some obfuscated JavaScript.

The main point of obfuscation is to make the reverse engineer’s job much harder than it needs to be. If you encounter an obfuscated script, here are a few things you can try:

Is it JavaScript? Use https://deobfuscate.io/ and see if you can get an easy win that way.
- If that didn’t work and you’re still certain it’s JavaScript, spin up a simple HTTP server, put your obfuscated code in an HTML file so that its output is written to a <p> element, and serve up the HTML file on the server. Notepad++ is a great tool for reverse engineering scripts across most platforms, and will allow a user to view their HTML files on localhost using Run -> Launch in <browser>. BEFORE RUNNING ANYTHING, make sure that you swap out any eval() calls (or anything else that executes a string as code, such as new Function()) if present in the JavaScript and get it to print to the HTML <p> tag instead. Here’s a list of methods you can try. I prefer the document.write() method, but whatever works for you.
If it isn’t JavaScript, what do? If PowerShell, debug it in Windows PowerShell ISE. If VBScript, try the same trick by swapping out Execute/ExecuteGlobal calls for WScript.Echo statements and running the program with cscript.exe on Windows. If Python, Ruby, Perl etc, try using an IDE for that language and debug accordingly.
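The swap-execute-for-print trick works in any scripting language. Here is a rough sketch for an obfuscated Python script: `defang` is a hypothetical helper, the regex is deliberately crude (it will miss aliased calls such as `e = exec` and will also hit unrelated methods named `.exec(`), and the sample string is invented for illustration. Only run the defanged result inside a VM snapshot.

```python
import re

# Matches a whole-word call to exec( or eval(, e.g. "exec(" or "eval (".
_DYNAMIC_CALL = re.compile(r"\b(?:exec|eval)\s*\(")


def defang(source: str) -> str:
    """Rewrite exec()/eval() calls into print() so each layer is shown, not run."""
    return _DYNAMIC_CALL.sub("print(", source)


# Invented one-layer sample: the base64 blob decodes to print("hi").
sample = "import base64\nexec(base64.b64decode('cHJpbnQoImhpIik='))\n"
print(defang(sample))
```

Running the defanged script prints the next layer instead of executing it. Repeat until the output is readable code.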

It’s not a script, wat do!?

So you’ve opened a file and you’re seeing a bunch of 0x00s, a warning about not running on MS-DOS, and further garbage?

You’re looking at a PE file from Windows. The MZ at the start is the first obvious indication. It’s the file magic number and you will see it in a lot of places in Windows. Other file formats include ELF for Linux (and most other *nix systems) and Mach-O for Mac. If you encounter a file you haven’t seen before, google the first few bytes. Those are usually the magic number and should tell you what you are looking at. If the file you are looking at is a .NET executable or a Java JAR file, it can be decompiled pretty easily using dotPeek/dnSpy and Java Auto Decompiler respectively.

One tool that makes this easy is Detect-It-Easy, which can also tell you if a file is packed or obfuscated and what program did the packing/obfuscating. Highly recommended.
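To make the magic number idea concrete, here is a small sketch that checks the first bytes of a file against a handful of well-known signatures. The table is a short illustrative sample, nowhere near what Detect-It-Easy knows, and `identify` is a name invented for this example.

```python
# (signature bytes at offset 0, human-readable guess)
MAGIC_NUMBERS = [
    (b"\x7fELF", "ELF executable (Linux and most Unix-likes)"),
    (b"MZ", "DOS/PE executable (Windows .exe, .dll, .sys)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (macOS)"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (macOS)"),
    (b"\xca\xfe\xba\xbe", "Java class file or Mach-O universal binary"),
    (b"PK\x03\x04", "ZIP container (also .jar, .apk, .docx)"),
    (b"#!", "script with a shebang line"),
]


def identify(head: bytes) -> str:
    """Return a guess at the file type from its leading bytes."""
    for signature, description in MAGIC_NUMBERS:
        if head.startswith(signature):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough of the file to compare against the table."""
    with open(path, "rb") as fh:
        return identify(fh.read(16))
```

An "unknown" result just means the signature isn’t in this tiny table. That is your cue to google the first few bytes.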

Game Plans:

In General:

Snapshot your VM
Get the file type
Look for strings using strings
If a script, what kind of script?
- How to execute this script safely?
- Is it obfuscated? How can I break this?
If a binary, what kind?
What was it compiled for? x86? x64? ARM?
Figure out the calling convention if it’s a binary.
- Once you get the calling convention, you will be able to see in a debugger where any function arguments are
Figure out how the binary will make system calls
- This will be different across ARM, x64, Windows, Linux etc
- Once you know how a system call is made, and how to ID the system call (MIPS makes this a pain tbh), you can google documentation for each syscall and get a better understanding of what a program is doing.
- Understand that debugging a system call isn’t going to work; you can only step in so far in user mode. Unless you suspect that the system call is hooked (iykyk), just step over them in a debugger and let the OS do its thing.
Encryption routines in disassembly look like any other function: a stack frame that takes some arguments and returns some value (or modifies data at a pointer and returns true/false to indicate success). The difference is that an encryption/decryption routine is relatively devoid of syscalls and has a shit tonne of XOR operations in a loop. Some of the best places to set a breakpoint are at the start of one of these functions, to see if you can grab a key, or at the end, to see decrypted output.
There are certain API calls in Windows and Linux that will handle encryption/decryption (for example CryptDecrypt and BCryptDecrypt on Windows, or OpenSSL’s EVP_* functions on Linux). Learn what they are, break on them if found, and examine their arguments.
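The XOR-in-a-loop pattern mentioned above is worth recognising in source form too. This is a minimal sketch of a repeating-key XOR and of why a known plaintext prefix leaks the key. The key `K3y` and the `flag{` prefix are made-up values for illustration; real challenges may use a different scheme entirely.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: the same call both encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def recover_keystream(ciphertext: bytes, known_plaintext: bytes) -> bytes:
    """XOR ciphertext with a known plaintext prefix to expose the key bytes used."""
    return bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))


# Invented example values.
secret = xor_bytes(b"flag{example}", b"K3y")
print(recover_keystream(secret, b"flag{"))  # key bytes repeat once the key is shorter than the crib
print(xor_bytes(secret, b"K3y"))
```

In a debugger you get the same information for free: break at the start of the routine and the key is usually sitting in one of the arguments.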

Linux is pretty straightforward with this kind of thing. It’s always an ELF, disassemble and debug. GDB is your friend. Windows is a little bit more finicky….
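Answering "what was it compiled for?" for an ELF only takes the first 20 bytes of the file. The sketch below reads the class, byte order and machine fields from the ELF header. `describe_elf` is a hypothetical helper and the machine table lists only a few common values.

```python
import struct

# A few common e_machine values; the real list is much longer.
ELF_MACHINES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_elf(header: bytes) -> dict:
    """Decode word size, endianness and target CPU from an ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    bits = {1: 32, 2: 64}.get(header[4])           # EI_CLASS
    byte_order = {1: "<", 2: ">"}.get(header[5])   # EI_DATA
    if bits is None or byte_order is None:
        raise ValueError("corrupt ELF identification bytes")
    # e_type and e_machine are two 16-bit fields starting at offset 16.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "bits": bits,
        "endianness": "little" if byte_order == "<" else "big",
        "machine": ELF_MACHINES.get(e_machine, f"unknown ({e_machine})"),
    }
```

Feed it the first bytes of the file, for example `describe_elf(open(path, "rb").read(20))`. The `file` and `readelf -h` commands report the same fields.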

Windows:

Snapshot your VM
Analyse the file with Detect-It-Easy and PE-Bear, and get info on the following:
- Is it packed?
- Is it a .dll, a .exe, a .sys, a .zip etc
RunDll32 will allow you to debug a .dll, or execute it in a sandbox if you have one set up
- Just attach your debugger to the rundll32 process and step until your dll is loaded
- This can be done through IDA Free also for a lovely disassembly/debug experience
A .sys is a driver. A pain in the hole basically. I hope you like documentation…
- WinDbg will be essential, along with a secondary Windows target VM, as you could need to debug the entire OS while your target driver gets loaded if static analysis fails. WinDbg will work over the network from your own OS if that’s the case. You should rarely see this: driver dev on Windows is punishing enough that a mistake in a CTF driver could BSOD player machines.
.exes you should be able to debug normally using x64dbg or IDA Free. See if you can manipulate the flow of control using the debugger by controlling what EIP/RIP points to. Sometimes this can be the fastest way to make a program do what you want, like print a flag.
When you encounter something like CreateFile or GetProcessHeap, google the function name and add “MSDN” to get the documentation for each function. Simply mapping the API calls and learning what they do can really open up your knowledge of what a program is doing.
Make sure you are comfortable with dynamic loading of API functions (typically LoadLibrary followed by GetProcAddress). This is very common in Windows. It’ll make things harder to track in a debugger. I recommend IDA Free because, while you debug, you can give a name to the function pointer that the dynamically loaded function is assigned to, so you can easily spot it when it is used again.
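The file extension can lie, so it helps to know where PE-Bear gets its answers. This is a rough sketch that reads the target CPU and the DLL flag straight from the PE headers. `describe_pe` is a name invented here, only four machine values are listed, and the DLL flag only separates DLLs from other images (it won’t tell an .exe from a .sys).

```python
import struct

PE_MACHINES = {0x014C: "x86", 0x8664: "x64", 0x01C0: "ARM", 0xAA64: "ARM64"}
IMAGE_FILE_DLL = 0x2000  # bit in the COFF Characteristics field


def describe_pe(data: bytes) -> dict:
    """Decode the target CPU, section count and DLL flag from a PE image."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("no MZ header")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("MZ header present but no PE signature")
    if len(data) < pe_offset + 24:
        raise ValueError("truncated COFF header")
    machine, sections = struct.unpack_from("<HH", data, pe_offset + 4)
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 22)
    return {
        "machine": PE_MACHINES.get(machine, hex(machine)),
        "sections": sections,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```

Reading the first 4096 bytes of the file is normally plenty for this, since the headers sit at the very start of the image.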

What if it’s packed?

Detect-It-Easy should be able to tell you, down to the version used. If it can’t, examine the file in a hex editor for something like a program name. Google is your friend here. There are a lot of unpackers available to help security analysts. If it’s UPX, just use upx -d. A word of warning though: if upx -d won’t unpack a Windows exe for you on Linux, run it from a Windows VM instead.
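Looking for a program name in a hex editor can be automated in a few lines. The sketch below searches the raw bytes for marker strings that some packers leave behind. The marker table is a small illustrative sample and `guess_packers` is a hypothetical helper. Markers can be stripped or faked, so an empty result does not prove the file is unpacked.

```python
# Marker strings commonly left in packed files (illustrative, not exhaustive).
PACKER_MARKERS = {
    "UPX": [b"UPX!", b"UPX0", b"UPX1"],
    "ASPack": [b".aspack"],
    "MPRESS": [b".MPRESS1", b".MPRESS2"],
}


def guess_packers(data: bytes) -> list:
    """Return the names of packers whose markers appear anywhere in the data."""
    return sorted(
        name
        for name, markers in PACKER_MARKERS.items()
        if any(marker in data for marker in markers)
    )


def guess_packers_in_file(path: str) -> list:
    """Convenience wrapper: read the whole file and scan it."""
    with open(path, "rb") as fh:
        return guess_packers(fh.read())
```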

What if it’s Rust or Go and I don’t know if I’m looking at standard lib code or not?

IDA FLIRT sigs make this easy, but for a CTF, it’s generally a toy program. So, if you are having a hard time finding main I would recommend compiling a hello world in each, and taking a few minutes to find your way to main in hello world. It’ll be the same for your CTF binary.

Go strings are one large contiguous block in the binary, with string variables pointing to their assigned place in this large block. Try a grep for a flag if you must, but it’s rarely that easy.
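Because Go strings are not NUL-terminated, plain strings output tends to glue neighbouring strings together, so a pattern search over the raw bytes is more useful than eyeballing the dump. A minimal sketch follows. The `flag{...}` format is an assumption (swap in whatever format your CTF uses) and both function names are invented for this example.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Rough equivalent of the strings tool: runs of printable ASCII."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def find_flags(data: bytes, pattern: bytes = rb"flag\{[^}]*\}") -> list:
    """Search raw bytes for anything shaped like a flag (assumed flag{...} format)."""
    return [m.group().decode("ascii", errors="replace") for m in re.finditer(pattern, data)]


# Invented blob imitating Go's back-to-back string data.
blob = b"\x00\x01errorflag{g0_str1ngs}runtime\x00"
print(printable_strings(blob))
print(find_flags(blob))
```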

The challenge notes say “VM” and they’ve given me a file of absolute crap?

You’re debugging a virtual machine: not the Parallels kind, but a program that interprets its own custom bytecode. That program will read in the opcodes in the file of absolute crap and, depending on what it reads, will call a function to do something else. There are many ways to code a VM. Some things to consider:

Debug the VM
Step carefully, watching as it reads the file of absolute crap
Does it take 2 bytes or 4 bytes from the file at a time? Does it read it in all at once and then parse it? Make notes
What bytes correspond to what function call?
- Given some time, you should be able to map out what the bytecode file is doing
Once you know how the opcode is read, evaluated and the corresponding function looked up and executed, you have reversed the VM. Depending on the challenge, you may need to get the VM to perform some action based on opcodes you provide.
These are complex challenges, and not for the easily bored.
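Writing a toy VM yourself makes the real ones far less intimidating. The instruction set below is entirely invented for illustration (fixed 2-byte instructions: one opcode byte and one operand byte). A challenge VM will have its own encoding, which is exactly what you are mapping out when you take notes in the debugger.

```python
# Invented opcode table: each instruction is (opcode byte, operand byte).
OPCODES = {0x01: "PUSH", 0x02: "ADD", 0x03: "XOR", 0x04: "OUT", 0xFF: "HALT"}


def disassemble(code: bytes) -> list:
    """Turn bytecode into a readable listing, one line per 2-byte instruction."""
    listing = []
    for pc in range(0, len(code) - 1, 2):
        opcode, operand = code[pc], code[pc + 1]
        name = OPCODES.get(opcode, f"UNKNOWN_{opcode:02x}")
        suffix = f" {operand}" if name == "PUSH" else ""
        listing.append(f"{pc:04x}: {name}{suffix}")
    return listing


def run(code: bytes) -> str:
    """Interpret the bytecode on a small stack machine and return what it prints."""
    stack, output = [], []
    for pc in range(0, len(code) - 1, 2):
        opcode, operand = code[pc], code[pc + 1]
        if opcode == 0x01:            # PUSH imm8
            stack.append(operand)
        elif opcode == 0x02:          # ADD (wraps at one byte)
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFF)
        elif opcode == 0x03:          # XOR
            b, a = stack.pop(), stack.pop()
            stack.append(a ^ b)
        elif opcode == 0x04:          # OUT: pop a byte and emit it as a character
            output.append(chr(stack.pop()))
        elif opcode == 0xFF:          # HALT
            break
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at {pc:#06x}")
    return "".join(output)


program = bytes.fromhex("01480400012b014203000400ff00")
print("\n".join(disassemble(program)))
print(run(program))
```

Once you have a table like `OPCODES` for the challenge binary, a disassembler along these lines turns the file of absolute crap into something you can actually read.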

This is a binary like a keygen except it seems I need to enter the flag correctly?

Fuck that noise. Use angr and let symbolic execution find the input that reaches the success path.

Helpful Links

John Hammond has a pretty good YouTube series where he goes through PicoCTF reversing challenges and shows his solutions. You can follow along with that series here.

A great place to find “crackme” style challenges is on Crackmes.One where they have a big library.

Another place to practice is PicoGym where there are some good challenges.