Reverse Engineering Delphi Binaries in Ghidra with Dhrake

I have spent some time reverse engineering Delphi binaries with IDA and Hex-Rays at work, but IDA tends to make a few mistakes, so I wrote a few scripts to fix them. Then [Ghidra](https://ghidra-sre.org/) came along and I was very curious to see how it would fare against some of the Delphi malware that I know and ~~loathe~~ love. I'd say it does about as badly as IDA, so I went on a journey to rewrite my scripts from work as Ghidra scripts. TL;DR: [The scripts are on GitHub](https://github.com/huettenhain/dhrake/). The project is called **Dhrake**, which is short for _"Delphi hand rake"_. Moving on.

#### Introduction

I hope that by putting the code out there, people will be able to copy & paste relevant portions of it, because the scripts do a few things that you regularly want to automate. In this accompanying article, I will focus on the initial repairs that [Dhrake](https://github.com/huettenhain/dhrake/) performs on a freshly analyzed Delphi binary in Ghidra, and I will showcase these features by analyzing [the Delphi virus](https://www.virustotal.com/gui/file/93873e9ee0c14e659d11e280acd6ac109f52bc78e294953371dd58ff8f6cf787/detection) with the following SHA-256 hash:

```
93873e9ee0c14e659d11e280acd6ac109f52bc78e294953371dd58ff8f6cf787
```

If you load this into vanilla Ghidra 9.1.1, it decompiles the entry point as follows:

```c
void entry(void)
{
  undefined *puVar1;
  
  FUN_00407268(&DAT_00470e14);
  puVar1 = PTR_DAT_00475330;
  FUN_00458838();
  *(undefined *)(*(int *)puVar1 + 0x5b) = 0;
  FUN_0045a57c(*(int *)puVar1,'\x01');
  FUN_00458850(*(int *)puVar1,(int)PTR_PTR_LAB_00470ae0,(int **)PTR_DAT_0047525c);
  FUN_00458850(*(int *)puVar1,(int)PTR_PTR_FUN_00470590,(int **)PTR_DAT_00475368);
  FUN_004588d0(*(int *)puVar1);
  /* WARNING: Subroutine does not return */
  FUN_00405134();
}
```

To be precise, I had to mark `FUN_00405134` as `__declspec(noreturn)` to avoid some faulty output, but that's irrelevant (unless you try to reproduce the above and get confused).

I won't torture you with any suspense here: these are not the droids we are looking for. In fact, this is framework code which likely creates two forms, whose metadata is stored at `PTR_PTR_LAB_00470ae0` and `PTR_PTR_FUN_00470590`, respectively. The real action will be in the `FormCreate` or `FormShow` handler of one of these forms, but since Ghidra did not recover the function names, we have no idea where that would be.

#### Enter IDR

Fortunately, there is a program that expertly extracts symbols from Delphi binaries (it can actually do a lot more, but symbol extraction is all we really care about here). It's the **I**nteractive **D**elphi **R**econstructor, or **IDR**. The good news is that [it is open source](https://github.com/crypto2011/IDR); the bad news is that it is a little cumbersome to build. I really wanted a version that I could run outside a VM, so I built one from source myself. I have [uploaded my build of IDR to GitHub](https://github.com/huettenhain/dhrake/releases), but I refuse to take any responsibility for that binary, like, at all. I do run it on my private Windows machine, outside a VM, but if you want to be on the safe side, compile it yourself or run it inside a VM.

When you open a Delphi binary with IDR, it will ask you whether it should use the _"native Knowledge Base"_. I only have a vague inkling of what that means, but _"Yes"_ is the correct answer. After working for a bit, IDR will calm down; then head to **Tools**, **IDC Generator**. This generates an IDC file; IDC is the scripting language of IDA. Note that this file can be **huge** for large Delphi binaries. That's normal, and these files compress really well. Luckily, we don't want to run the script anyway.
We are only interested in the entries of this script that look something like this:

```
MakeNameEx(0x470C74, "TfrmMain.FormCreate", 0x20);
```

That's right, delicious symbols, ripe for the taking. Creating named labels is rather straightforward using the Ghidra scripting API:

1. Try `getFunctionAt` in case the address is the entry point of a function; if it is, use the function's `setName` method to change its name.
2. If `getFunctionAt` returns `null`, use `currentProgram.getSymbolTable()` to get the current symbol table and use its `createLabel` method to create a label at the given address.

After renaming all the symbols based on the IDC file generated by IDR, Ghidra shows the following decompiled code at the entry point (excerpt):

```c
func_0x00458850(*(undefined4 *)temp_3f35b72ea6,VMT_470AE0_TfrmMain,gvar_0047525C);
func_0x00458850(*(undefined4 *)temp_3f35b72ea6,VMT_470590_TDM,gvar_00475368);
```

So we suspect that `TfrmMain` is the name of the main form class, and we search the symbol table for `TfrmMain.FormCreate`.
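Going back to the renaming pass for a moment: the part that pulls the address/name pairs out of the IDC file needs no Ghidra API at all. Below is a minimal standalone sketch that assumes every relevant line has the shape of the `MakeNameEx` example above (hexadecimal address, quoted name without embedded quotes, flags). The class `IdcSymbols` and its `parse` method are hypothetical names, and Dhrake's actual parser may work differently. In a Ghidra script, each pair would then go through the `getFunctionAt`/`setName` or `createLabel` steps listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts (address, name) pairs from an IDR-generated IDC file.
class IdcSymbols {
    // Matches lines such as: MakeNameEx(0x470C74, "TfrmMain.FormCreate", 0x20);
    private static final Pattern MAKE_NAME = Pattern.compile(
        "MakeNameEx\\(\\s*0x([0-9A-Fa-f]+)\\s*,\\s*\"([^\"]*)\"\\s*,\\s*(?:0x[0-9A-Fa-f]+|\\d+)\\s*\\)");

    // Returns the symbols in file order; a later entry for the same address wins.
    static Map<Long, String> parse(String idcText) {
        Map<Long, String> symbols = new LinkedHashMap<>();
        Matcher m = MAKE_NAME.matcher(idcText);
        while (m.find()) {
            long address = Long.parseLong(m.group(1), 16);
            String name = m.group(2);
            if (!name.isEmpty()) // an empty name carries no symbol, so skip it
                symbols.put(address, name);
        }
        return symbols;
    }

    public static void main(String[] args) {
        String idc = "MakeNameEx(0x470C74, \"TfrmMain.FormCreate\", 0x20);\n"
                   + "MakeComm(0x470C80, \"some comment\");\n";
        for (Map.Entry<Long, String> e : parse(idc).entrySet())
            System.out.printf("%08x %s%n", e.getKey(), e.getValue());
    }
}
```

Running `main` prints `00470c74 TfrmMain.FormCreate`; the `MakeComm` line is skipped because it does not match the pattern.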
Lucky us, `TfrmMain.FormCreate` is there, at offset `00470c74`:

```c
void TfrmMain.FormCreate(int *param_1)
{
  int *piVar1;
  undefined4 extraout_ECX;
  undefined4 extraout_EDX;
  undefined4 *in_FS_OFFSET;
  undefined4 uStack28;
  undefined *puStack24;
  undefined *puStack20;
  
  puStack20 = &stack0xfffffffc;
  puStack24 = &LAB_00470cef;
  uStack28 = *in_FS_OFFSET;
  *(undefined4 **)in_FS_OFFSET = &uStack28;
  piVar1 = TTimer.Create((int *)VMT_42D4DC_TTimer,'\x01',param_1);
  *(int **)(param_1 + 0xd8) = piVar1;
  TTimer.SetEnabled((int)piVar1,'\0');
  TTimer.SetOnTimer(param_1[0xd8],extraout_EDX,extraout_ECX,FUN_00470d48,param_1);
  TTimer.SetInterval(param_1[0xd8],5000);
  FUN_00470638(*(undefined4 *)gvar_00475368,param_1);
  TTimer.SetEnabled(param_1[0xd8],'\x01');
  *in_FS_OFFSET = uStack28;
  return;
}
```

It is worth mentioning at this point that the following lines at the beginning of the function install an exception handler frame: `FS:[0]` holds the head of the thread's structured exception handling (SEH) chain, and this is how Delphi implements `try` blocks, including the implicit ones the compiler generates, for example to finalize local strings. The matching `*in_FS_OFFSET = uStack28;` at the end removes the frame again. For the sake of malware analysis, I'd recommend mostly ignoring these lines once you recognize the pattern, which most prominently appears at the beginning of functions:

```c
puStack20 = &stack0xfffffffc;
puStack24 = &LAB_00470cef;
uStack28 = *in_FS_OFFSET;
*(undefined4 **)in_FS_OFFSET = &uStack28;
```

For the sake of this presentation, we decide to investigate the unknown function `FUN_00470638` first.
Without the repairs done by Dhrake, the call to `@LStrCatN` at offset `004706aa` shows up as follows in the decompiled code:

```c
TApplication.GetExeName(*(undefined4 *)Application,&local_c);
ExtractFilePath((int)local_c,&local_8);
TApplication.GetExeName(*(undefined4 *)Application,&local_14);
ExtractFileName((int)local_14,&local_10);
plVar1 = local_10;
@LStrCatN((char **)&local_8,3);
```

After running Dhrake and renaming some local variables, it should look as follows:

```c
TApplication.GetExeName(*Application,&Tmp1);
ExtractFilePath(Tmp1,&FilePath_);
FilePath = FilePath_;
TApplication.GetExeName(*Application,&Tmp2);
ExtractFileName(Tmp2,&FileName);
@LStrCatN(&FilePath_,3,__dummy_170,FileName,"g",FilePath);
```

That makes a lot more sense: the virus is building a new path which is basically the same as the path of the current executable, except that the file name has the letter `g` prepended to it. Spoiler alert: this is how the virus _"infects"_ files. It simply renames them by prepending the letter `g`, hides them, and creates a copy of itself in their place. It also steals their icon to look more like the real thing. But what is `__dummy_170`? You can safely ignore it, but if you want to understand it, read on.

#### How To Fix Function Signatures

Fixing the function signatures of other functions such as `@LStrCmp` was rather straightforward, and I will not go into detail here; you can easily pick that up from [the code](https://github.com/huettenhain/dhrake/blob/master/DhrakeInit.java). However, I want to talk a little bit about fixing the function signature of `@LStrCatN` and its two friends, `@WStrCatN` and `@UStrCatN`, because [I had a bit of trouble with that](https://github.com/NationalSecurityAgency/ghidra/issues/1258). The function signature of `@LStrCatN` is arguably the following:

```c
void @LStrCatN(char **Result, uint Count, ...);
```

This is also what Dhrake sets it to.
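To make the semantics of this signature concrete, it helps to note the order of the strings in the repaired call from above: `@LStrCatN(&FilePath_,3,__dummy_170,FileName,"g",FilePath)` yields `FilePath + "g" + FileName`, so the decompiler lists the stack arguments in the reverse order of concatenation. The standalone model below assumes (as the `g`-prefix interpretation suggests) that the compiler pushes the operands of `A + B + C` from left to right, so the last operand ends up on top of the stack. The class `StrCatModel` is hypothetical and uses plain Java strings instead of Delphi's `AnsiString`s; it is not the actual RTL implementation.

```java
// Simplified model of what @LStrCatN computes. Assumption: stackArgs is given in the
// order in which the decompiler lists the stack arguments, i.e. top of the stack first.
class StrCatModel {
    static String lStrCatN(int count, String... stackArgs) {
        if (stackArgs.length != count)
            throw new IllegalArgumentException("expected " + count + " strings");
        StringBuilder result = new StringBuilder();
        // For A + B + C, A is pushed first and therefore sits deepest on the stack,
        // so the concatenation walks the listed arguments from last to first.
        for (int i = count - 1; i >= 0; i--)
            result.append(stackArgs[i]);
        return result.toString();
    }

    public static void main(String[] args) {
        String filePath = "C:\\Users\\victim\\Desktop\\"; // hypothetical ExtractFilePath result
        String fileName = "setup.exe";                     // hypothetical ExtractFileName result
        // mirrors @LStrCatN(&FilePath_,3,__dummy_170,FileName,"g",FilePath)
        System.out.println(lStrCatN(3, fileName, "g", filePath));
    }
}
```

This prints `C:\Users\victim\Desktop\gsetup.exe`. The dummy parameter is left out of the model on purpose, since it takes no part in the concatenation.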
In this signature, `Result` is passed in `EAX`, `Count` is passed in `EDX`, and then `Count` many arguments are supposed to be passed on the stack: these are the strings that are to be concatenated, and the result is stored in the string that `Result` points to. Sadly, Ghidra does not correctly infer these stack arguments. IDA doesn't do a perfect job either; it's just a tricky problem that requires specialized code to handle. In this case, the only option I could think of to fix the calls was to place a function call signature override at every address where `@LStrCatN` is called. To do so, I had to:

1. list all `CALL`s to `@LStrCatN`, and for each of these,
2. determine the value of `Count`,
3. and override the function call signature to reflect this value.

The first part is straightforward: the flat API functions `getGlobalFunctions` and `getReferencesTo` allow you to iterate over all calls to functions named `@LStrCatN`. The second part is a little tricky, but I want to talk about the third part first, because it explains the `__dummy_170` variable we saw above. So, remember the call to `@LStrCatN` at offset `004706aa`? I should override that call with the following signature, right?

```c
void @LStrCatN(char **, uint, char *, char *, char *)
```

Well ...

You see, `Result` is passed in `EAX` and `Count` is passed in `EDX`, but then Delphi's default `register` calling convention expects a third parameter to be passed in `ECX`, and only _then_ will it expect anything on the stack.
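The slot assignment described in the last paragraph can be modeled in a few lines. The sketch below is a simplification: it assumes every parameter is an ordinal or pointer that fits into a 32-bit register (floating-point values, large records and the like follow other rules), and it just says "stack" for the remaining parameters without modeling their order. The class name `RegisterConvention` is made up for illustration. Applied to the naive override, it shows the first string landing in `ECX`, which is not where the call site actually puts it:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of Delphi's default "register" calling convention, assuming every
// parameter is an ordinal or pointer that fits into a 32-bit register.
class RegisterConvention {
    // the first three such parameters are passed in these registers, in this order
    private static final String[] REGISTERS = {"EAX", "EDX", "ECX"};

    // Returns one "parameter -> location" entry per parameter; stack order is not modeled.
    static List<String> assign(String... params) {
        List<String> slots = new ArrayList<>();
        for (int i = 0; i < params.length; i++) {
            String location = i < REGISTERS.length ? REGISTERS[i] : "stack";
            slots.add(params[i] + " -> " + location);
        }
        return slots;
    }

    public static void main(String[] args) {
        // the naive override for a call with Count = 3
        System.out.println(assign("Result", "Count", "s1", "s2", "s3"));
    }
}
```

This prints `[Result -> EAX, Count -> EDX, s1 -> ECX, s2 -> stack, s3 -> stack]`.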
And there you have it: the best idea I could come up with is to simply have a dummy parameter fill the `ECX` slot:

```c
void @LStrCatN(char **, uint, uint, char *, char *, char *)
```

So, you can literally just ignore it, or rather, you _should_ really ignore it, because it does not correspond to any parameter that is actually passed to `@LStrCatN`. I would prefer to use "custom storage" for the call signature override to specify exactly where the arguments come from, but it seems like `HighFunctionDBUtil.writeOverride` only accepts a `FunctionSignature` argument, whose implementations do not seem to allow custom storage definitions. Hence, we are stuck with the dummy variable, unless we are happy not to see the concatenated strings at all.

Finally, I will briefly talk about determining the `Count` parameter. The **proper** way to do this is implemented in the function `getConstantCallArgument` in Dhrake:

```java
private long getConstantCallArgument(Function caller, Address addr, int index)
        throws IllegalStateException {
    // This is a very reliable and slow fallback to determine the value of a constant argument
    // to a function call at a given address within a given function.
    monitor.setMessage("obtaining decompiler interface");
    DecompInterface decompInterface = new DecompInterface();
    try {
        decompInterface.openProgram(currentProgram);
        monitor.setMessage("decompiling");
        DecompileResults decompileResults = decompInterface.decompileFunction(caller, 120, monitor);
        if (!decompileResults.decompileCompleted())
            throw new IllegalStateException();
        monitor.setMessage("searching for call argument");
        HighFunction highFunction = decompileResults.getHighFunction();
        // PCode operations associated with the instruction at addr
        Iterator<PcodeOpAST> pCodes = highFunction.getPcodeOps(addr);
        while (pCodes.hasNext()) {
            PcodeOpAST instruction = pCodes.next();
            if (instruction.getOpcode() == PcodeOp.CALL) {
                // input 0 of a CALL is the call target, inputs 1, 2, ... are the arguments
                Varnode argument = instruction.getInput(index);
                if (argument == null || !argument.isConstant())
                    throw new IllegalStateException();
                // for a constant Varnode, the "offset" is the constant's value
                return argument.getOffset();
            }
        }
        throw new IllegalStateException();
    } finally {
        // release the decompiler process, otherwise every call leaks one
        decompInterface.dispose();
    }
}
```

It decompiles the function that contains the call (to `@LStrCatN`, say) at the given address and uses the generated PCode information to obtain the `index`-th input of that `CALL` operation, assuming that it is a constant value. Since input 0 of a PCode `CALL` is the call target, index 2 refers to the second argument, i.e. `Count`. The only real gotcha I experienced when writing this code was the fact that `argument.getOffset()` is actually the correct way to obtain the value of a constant `Varnode`. Anyway, this is **unbelievably slow** in practice, and it seems like every `CALL @LStrCatN` is immediately preceded by a `MOV EDX, Count` where `Count` is an immediate. Therefore, I use `getConstantCallArgument` only as a fallback; essentially, I use the following code to determine `Count`:

```java
private long getStrCatCount(Function caller, Address addr) {
    // Usually, the second (constant) argument to a *StrCatN function is assigned to
    // the EDX register right before the call instruction. This method attempts to
    // read the value by parsing the disassembly first and falls back to a decompiler
    // based approach if any assumption fails.
    try {
        Instruction insn = getInstructionBefore(addr);
        // expect MOV EDX, imm: two operands, the first one being written to
        if (insn == null || insn.getNumOperands() != 2
                || !insn.getMnemonicString().equalsIgnoreCase("MOV")
                || insn.getOperandRefType(0) != RefType.WRITE)
            return getConstantCallArgument(caller, addr, 2);
        Object[] destination = insn.getOpObjects(0);
        Object[] source = insn.getOpObjects(1);
        // the destination must be exactly the EDX register
        if (destination.length != 1 || !(destination[0] instanceof Register)
                || !((Register) destination[0]).getName().equals("EDX"))
            return getConstantCallArgument(caller, addr, 2);
        // the source must be a single immediate value
        if (source.length != 1 || !(source[0] instanceof Scalar))
            return getConstantCallArgument(caller, addr, 2);
        return ((Scalar) source[0]).getUnsignedValue();
    } catch (IllegalStateException e) {
        // the decompiler-based fallback could not find a constant either
        return -1;
    }
}
```

The code should be relatively self-explanatory. We assume that the instruction right before the `CALL` is a `MOV` which moves an immediate into the `EDX` register. If any of these assumptions fails, we fall back to the heavy lifting in `getConstantCallArgument`; otherwise, we simply return that immediate. A return value of `-1` means that `Count` could not be determined at all.

#### Summary

- I am giving up on scripting Ghidra in Python; the Java API simply has much better support.
- The Java API is really well documented.
- Writing Ghidra scripts in Eclipse is really comfortable.
- Oh god, what have I become

Tags: delphi - Dhrake - ghidra - Java - LStrCatN - malware - PCode - reverse engineering - script

12 Replies to “Reverse Engineering Delphi Binaries in Ghidra with Dhrake”

rattle says:

2019-12-23 at 8:08 pm

PS: If somebody with access to VT could upload `93873e9ee0c14e659d11e280acd6ac109f52bc78e294953371dd58ff8f6cf787` to [malshare](https://www.malshare.com) ... *that'd be **great***.

1. MC says:
  
  2020-02-03 at 12:09 pm
  
  https://malshare.com/sample.php?action=detail&hash=7765d4ee9fce0c5560e08ec40f203047
  
rattle says:

2020-02-04 at 12:12 am

Nice.

JR says:

2020-02-27 at 9:01 am

Thank you for your work! Quite useful

Sam says:

2020-02-29 at 4:38 pm

How do you run the Dhrake (Java) files in Ghidra? Do you have an extension jar-file somewhere?

1. rattle says:
  
  2020-02-29 at 5:17 pm
  
  In Ghidra, open the application menu item **Window**, and select **Script Manager**, which is the one with the green "play button" style logo. The script manager has a toolbar with small icons at the top, and the one to the left of the big red plus symbol should have the hover text **Script Directories**. Click that button and use it to manage and view the directories where Ghidra will search for `.java` files that will be available in the script manager. Copy the Dhrake files into one of these directories and they should appear in the category **Delphi** in the script manager, from where you can run them by clicking the green play button in the toolbar, or via the context menu. I hope this works; let me know if you get stuck.
  
  1. Sam says:
    
    2020-02-29 at 8:34 pm
    
    Thanks, that's what I was looking for. It is working now.
2. rattle says:
  
  2020-02-29 at 5:19 pm
  
  I realize I didn't really answer the question. A shorter answer would be: Dhrake is not a plugin or extension; it is just two script files that are run via the script manager.
  
Ben says:

2021-10-14 at 7:59 pm

When I run DhrakeInit on the latest Ghidra with a 550MB IDC, I get the following error in the Ghidra console: `DhrakeInit.java> Running...`, `DhrakeInit.java> [Dhrake] file not found: C:BigOl.idc`, `DhrakeInit.java> Finished!` The file exists there, as I just selected it from the popup dialog. Any ideas what is going on?

1. rattle says:
  
  2021-10-14 at 8:32 pm
  
  Should be fixed now, although I still don't know what caused it.
  
Fred Urble says:

2025-06-06 at 5:03 am

I ran your revised IDR version, analyzed the EXE in Ghidra, ran DHRAKEINIT, found the VMTs and pressed F8 on each one of them to run DHRAKEPARSECLASS, but it looks like they did not do anything except replace the address with the name on the "addr" code line. Looking at my decompiled output, I do see lots of "undefined 1", etc. and saw that you had that too until you did "refactoring" to get cleaner code. Can you describe how/what you mean by refactoring? Thanks!

1. rattle says:
  
  2025-06-08 at 9:36 am
  
  By 'refactoring', I only mean renaming of local variables. The symbol recovery should all come from IDR.
