I have spent some time reverse engineering Delphi binaries with IDA & HexRays at work, but IDA tends to make a few mistakes, so I wrote a few scripts to fix them. Then [Ghidra](https://ghidra-sre.org/) came along and I was very curious to know how it would fare against some of the Delphi malware that I know and ~~loathe~~ love. I'd say it does about as badly as IDA, and so I went on a journey to rewrite my scripts from work as Ghidra scripts. TL;DR: [The scripts are on GitHub](https://github.com/huettenhain/dhrake/). The project is called **Dhrake**, which is short for _"Delphi hand rake"_. Moving on.
#### Introduction
I hope that by putting the code out there, people will be able to copy & paste relevant portions of it, because the scripts do a few things that you regularly want to automate. In this accompanying article, I will focus on the initial repairs that [Dhrake](https://github.com/huettenhain/dhrake/) performs on a freshly analyzed Delphi binary in Ghidra and I will showcase these features by analyzing [the Delphi virus](https://www.virustotal.com/gui/file/93873e9ee0c14e659d11e280acd6ac109f52bc78e294953371dd58ff8f6cf787/detection) with the following SHA-256 hash:
93873e9ee0c14e659d11e280acd6ac109f52bc78e294953371dd58ff8f6cf787
If you load this into vanilla Ghidra 9.1.1, it decompiles the entry point as follows:

    void entry(void)
    {
      undefined *puVar1;
      FUN_00407268(&DAT_00470e14);
      puVar1 = PTR_DAT_00475330;
      FUN_00458838();
      *(undefined *)(*(int *)puVar1 + 0x5b) = 0;
      FUN_0045a57c(*(int *)puVar1,'\x01');
      FUN_00458850(*(int *)puVar1,(int)PTR_PTR_LAB_00470ae0,(int **)PTR_DAT_0047525c);
      FUN_00458850(*(int *)puVar1,(int)PTR_PTR_FUN_00470590,(int **)PTR_DAT_00475368);
      FUN_004588d0(*(int *)puVar1);
                        /* WARNING: Subroutine does not return */
      FUN_00405134();
    }
To be very precise, I had to `__declspec(noreturn)` the call `FUN_00405134` to avoid some faulty output, but that's irrelevant (unless you try to reproduce the above and get confused). I'll not torture you with any suspense here: These are not the droids we are looking for. In fact, this is framework code which likely creates two forms, the metadata of which is stored at `PTR_PTR_LAB_00470ae0` and `PTR_PTR_FUN_00470590`, respectively. The real action will be in the `FormCreate` or `FormShow` handler for one of these, but seeing as Ghidra did not infer the symbol names for all the functions, we have no idea where that would be.
#### Enter IDR
The good news is that there is a program that expertly extracts symbols from Delphi binaries (it can actually do a lot more, but that capability is all we really care about here). It's the **I**nteractive **D**elphi **R**econstructor, or **IDR**. Even better, [it is open source](https://github.com/crypto2011/IDR); the bad news is that it is a little cumbersome to build. I really wanted a version of this that I could run outside a VM, so I decided to build one from source myself. I have [uploaded my build of IDR to GitHub](https://github.com/huettenhain/dhrake/releases), but I refuse to take any responsibility for that binary, like, at all. I do run this on my private Windows machine, outside a VM, but if you want to be on the safe side, you'll have to compile it yourself or run it inside a VM.
When you open a Delphi binary with IDR, it will ask you whether it should use the _"native Knowledge Base"_. I only have a vague inkling of what that means, but _"Yes"_ is the correct answer to this question. After working for a bit, IDR will calm down and you can head to **Tools**, **IDC Generator**. It will generate an IDC file, which is a script written in a scripting language supported by IDA. Note that this file can be **huge** for large Delphi binaries. That's normal, and these files compress really well. Lucky for us, we don't want to run the script. We are only interested in the entries of this script that say something like:

    MakeNameEx(0x470C74, "TfrmMain.FormCreate", 0x20);
That's right, delicious symbols, ripe for the taking. Creating named labels is rather straightforward using the Ghidra scripting API:
1. Try to use `getFunctionAt` in case the address is that of a function, then use its `setName` method to change the name.
2. If `getFunctionAt` returns `null`, use `currentProgram.getSymbolTable()` to get the current symbol table and use its `createLabel` method to create a label at the given address.
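Before any renaming can happen, the address and name pairs have to be pulled out of the IDC file. The following is a minimal sketch of that step in plain Java; the class `IdcSymbols` and its `parse` method are hypothetical names used for illustration here and are not taken from Dhrake, and the regular expression assumes that every entry looks like the `MakeNameEx` line shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Dhrake): extracts address/name pairs from IDC text.
final class IdcSymbols {
    // Assumed entry format:  MakeNameEx(0x470C74, "TfrmMain.FormCreate", 0x20);
    private static final Pattern MAKE_NAME = Pattern.compile(
        "MakeNameEx\\(\\s*0x([0-9A-Fa-f]+)\\s*,\\s*\"([^\"]*)\"\\s*,");

    // Returns a map from address to symbol name, in the order of appearance.
    static Map<Long, String> parse(String idcText) {
        Map<Long, String> symbols = new LinkedHashMap<>();
        Matcher m = MAKE_NAME.matcher(idcText);
        while (m.find()) {
            symbols.put(Long.parseLong(m.group(1), 16), m.group(2));
        }
        return symbols;
    }
}
```

Inside a Ghidra script, each entry of the resulting map would then go through the two steps listed above: convert the key with `toAddr`, call `getFunctionAt`, and either rename the function with `setName` or fall back to `createLabel` on the symbol table. Both methods also expect a `SourceType` argument, for example `SourceType.USER_DEFINED`.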
After having renamed all the symbols based on the information from the IDC generated by IDR, Ghidra shows the following decompiled code at the entry point:

    func_0x00458850(*(undefined4 *)temp_3f35b72ea6,VMT_470AE0_TfrmMain,gvar_0047525C);
    func_0x00458850(*(undefined4 *)temp_3f35b72ea6,VMT_470590_TDM,gvar_00475368);
So we suspect that `TfrmMain` is the name of the main form class and we search the symbol table for `TfrmMain.FormCreate`. Lucky us, it's there at offset `00470c74`:

    void TfrmMain.FormCreate(int *param_1)
    {
      int *piVar1;
      undefined4 extraout_ECX;
      undefined4 extraout_EDX;
      undefined4 *in_FS_OFFSET;
      undefined4 uStack28;
      undefined *puStack24;
      undefined *puStack20;
      puStack20 = &stack0xfffffffc;
      puStack24 = &LAB_00470cef;
      uStack28 = *in_FS_OFFSET;
      *(undefined4 **)in_FS_OFFSET = &uStack28;
      piVar1 = TTimer.Create((int *)VMT_42D4DC_TTimer,'\x01',param_1);
      *(int **)(param_1 + 0xd8) = piVar1;
      TTimer.SetEnabled((int)piVar1,'\0');
      TTimer.SetOnTimer(param_1[0xd8],extraout_EDX,extraout_ECX,FUN_00470d48,param_1);
      TTimer.SetInterval(param_1[0xd8],5000);
      FUN_00470638(*(undefined4 *)gvar_00475368,param_1);
      TTimer.SetEnabled(param_1[0xd8],'\x01');
      *in_FS_OFFSET = uStack28;
      return;
    }
It is worth mentioning at this point that the following lines constitute what I believe to be the setup of a try/catch frame, and for the sake of malware analysis, I'd recommend mostly ignoring them when you recognize the pattern. It most prominently appears at the beginning of functions:

    puStack20 = &stack0xfffffffc;
    puStack24 = &LAB_00470cef;
    uStack28 = *in_FS_OFFSET;
    *(undefined4 **)in_FS_OFFSET = &uStack28;
For the sake of this presentation, we decide to investigate the unknown function `FUN_00470638` first. Without the repairs done by Dhrake, the call to `@LStrCatN` at offset `004706aa` will be interpreted as follows in the decompiled code:

    TApplication.GetExeName(*(undefined4 *)Application,&local_c);
    ExtractFilePath((int)local_c,&local_8);
    TApplication.GetExeName(*(undefined4 *)Application,&local_14);
    ExtractFileName((int)local_14,&local_10);
    plVar1 = local_10;
    @LStrCatN((char **)&local_8,3);
After running Dhrake and some refactoring, however, it should look as follows:

    TApplication.GetExeName(*Application,&Tmp1);
    ExtractFilePath(Tmp1,&FilePath_);
    FilePath = FilePath_;
    TApplication.GetExeName(*Application,&Tmp2);
    ExtractFileName(Tmp2,&FileName);
    @LStrCatN(&FilePath_,3,__dummy_170,FileName,"g",FilePath);
That makes a lot more sense: The virus is building a new path which is basically the same as the path to the current executable, except the file name has the letter `g` prepended to it. Spoiler alert: This is how the virus _"infects"_ files, it simply renames them by prepending the letter `g`, hides them, and creates a copy of itself in their place. It also steals their icon to look more like the real thing. But what is `__dummy_170`? You can safely ignore it, but if you want to understand it, read on.
#### How To Fix Function Signatures
Fixing the function signatures of other functions such as `@LStrCmp` was rather straightforward, so I will not go into detail here; you can probably get that easily from [the code](https://github.com/huettenhain/dhrake/blob/master/DhrakeInit.java). However, I want to talk a little bit about fixing the function signature for `@LStrCatN` and its two friends, `@WStrCatN` and `@UStrCatN`, because [I had a bit of trouble with that](https://github.com/NationalSecurityAgency/ghidra/issues/1258). The function signature of `@LStrCatN` is arguably the following:

    void @LStrCatN (char * * Result, uint Count, ...);
and that is also what Dhrake sets it to. Here, `Result` is passed in `EAX`, `Count` is passed in `EDX`, and then `Count` many arguments are supposed to be passed on the stack: These are the strings that are to be concatenated into the char pointer pointed to by `Result`. Sadly, Ghidra does not correctly infer these stack arguments. IDA doesn't do a perfect job either, it's just a tricky problem that requires specialized code to handle. In this case, the only option that I could think of to fix the calls was to place a function call signature override at every address where `@LStrCatN` is called. To do so, I had to:
1. List all `CALL`s to `@LStrCatN` and for each of these,
2. determine the value of `Count`,
3. and override the function call signature to reflect this value.
The first part is straightforward, there are flat API functions called `getGlobalFunctions` and `getReferencesTo` which allow you to iterate over all calls to functions named `@LStrCatN`. The second part is a little tricky, but I want to talk about the third part first, because it will explain the `__dummy_170` variable we saw above. So, remember the call to `@LStrCatN` at offset `004706aa`? I should override that call with the following signature, right?

    void @LStrCatN(char **, uint, char *, char *, char *)
Well ...
You see, `Result` is passed in `EAX`, and then `Count` is passed in `EDX`, and then the Delphi calling convention expects a third parameter to be passed in `ECX`, and only _then_ will it expect anything on the stack. And there you have it: The best idea that I could come up with is to simply have a dummy parameter fill the `ECX` slot:

    void @LStrCatN(char **, uint, uint, char *, char *, char *)
So, you can literally just ignore it, or rather, you _should_ really ignore it, because it does not actually correspond to any parameter that is being passed to `@LStrCatN` at all. I would prefer to use "custom storage" for the call signature override to specify exactly where the arguments come from, but it seems like `HighFunctionDBUtil.writeOverride` only accepts a `FunctionSignature` argument, whose implementations do not seem to allow custom storage definitions. Hence, we are stuck with the dummy variable, unless we are willing to not see the concatenated strings at all.
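To make the shape of the override explicit, here is a small plain-Java sketch that computes the parameter list for a given `Count`, including the dummy `ECX` parameter. The class `StrCatSignature` and its methods are hypothetical and only illustrate the layout; they do not touch the Ghidra API, and the string type names passed in (such as `char`) are assumptions for display purposes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Dhrake): lays out the parameters of a *StrCatN call override.
final class StrCatSignature {
    // Builds the parameter type list for a call that concatenates `count` strings.
    static List<String> parameterTypes(String charType, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        List<String> types = new ArrayList<>();
        types.add(charType + " **"); // Result, passed in EAX
        types.add("uint");           // Count, passed in EDX
        types.add("uint");           // dummy parameter that only fills the ECX slot
        for (int i = 0; i < count; i++) {
            types.add(charType + " *"); // one stack argument per concatenated string
        }
        return types;
    }

    // Renders the override as a C-like prototype string.
    static String render(String name, String charType, int count) {
        return "void " + name + "(" + String.join(", ", parameterTypes(charType, count)) + ")";
    }
}
```

For the call at offset `004706aa`, `StrCatSignature.render("@LStrCatN", "char", 3)` yields the six-parameter signature shown above. In an actual script, the same list would be turned into parameter definitions of a `FunctionSignature` before handing it to `HighFunctionDBUtil.writeOverride`.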
Finally, I will briefly talk about determining the `Count` parameter. The **proper** way to do this is implemented in the function `getConstantCallArgument` in Dhrake:

    // Needs java.util.Iterator, DecompInterface and DecompileResults from ghidra.app.decompiler,
    // and HighFunction, PcodeOp, PcodeOpAST and Varnode from ghidra.program.model.pcode.
    private long getConstantCallArgument(Function caller, Address addr, int index) throws IllegalStateException {
        // This is a very reliable and slow fallback to determine the value of a constant argument
        // to a function call at a given address within a given function.
        monitor.setMessage("obtaining decompiler interface");
        DecompInterface decompInterface = new DecompInterface();
        decompInterface.openProgram(currentProgram);
        monitor.setMessage("decompiling");
        DecompileResults decompileResults = decompInterface.decompileFunction(caller, 120, monitor);
        if (!decompileResults.decompileCompleted())
            throw new IllegalStateException();
        monitor.setMessage("searching for call argument");
        HighFunction highFunction = decompileResults.getHighFunction();
        Iterator<PcodeOpAST> pCodes = highFunction.getPcodeOps(addr);
        while (pCodes.hasNext()) {
            PcodeOpAST instruction = pCodes.next();
            if (instruction.getOpcode() == PcodeOp.CALL) {
                // Input 0 of a CALL op is the call target, so the arguments start at input 1.
                Varnode argument = instruction.getInput(index);
                if (!argument.isConstant())
                    throw new IllegalStateException();
                return argument.getOffset();
            }
        }
        throw new IllegalStateException();
    }
It decompiles the function that has a call to `@LStrCatN`, say, at a given address and uses the generated PCode information to obtain the `index`-th argument to that invocation, assuming that the argument is a constant value. The only real gotcha I experienced when writing this code was the fact that `argument.getOffset()` is actually the correct way to obtain the value of a constant `Varnode`. Anyway, this is **unbelievably slow** in practice, and it seems like every `CALL @LStrCatN` is always preceded by a `MOV EDX, Count` with `Count` being an immediate. Therefore, I use `getConstantCallArgument` only as a fallback and essentially, I use the following code to determine `Count`:

    // Needs Instruction (ghidra.program.model.listing), Register (ghidra.program.model.lang),
    // Scalar (ghidra.program.model.scalar) and RefType (ghidra.program.model.symbol).
    private long getStrCatCount(Function caller, Address addr) {
        // Usually, the second (constant) argument to a *StrCatN function is assigned to
        // the EDX register right before the call instruction. This method attempts to
        // read the value by parsing the disassembly first and falls back to a decompiler
        // based approach if any assumption fails.
        try {
            Instruction insn = this.getInstructionBefore(addr);
            if (insn == null || insn.getNumOperands() != 2)
                return this.getConstantCallArgument(caller, addr, 2);
            Object EDX[] = insn.getOpObjects(0);
            Object IMM[] = insn.getOpObjects(1);
            if (insn.getOperandRefType(0) != RefType.WRITE)
                return this.getConstantCallArgument(caller, addr, 2);
            if (EDX.length != 1 || !(EDX[0] instanceof Register))
                return this.getConstantCallArgument(caller, addr, 2);
            // compareTo returns an int, so it has to be compared against 0 explicitly.
            if (((Register) EDX[0]).getName().compareTo("EDX") != 0)
                return this.getConstantCallArgument(caller, addr, 2);
            if (IMM.length != 1 || !(IMM[0] instanceof Scalar))
                return this.getConstantCallArgument(caller, addr, 2);
            return ((Scalar) IMM[0]).getUnsignedValue();
        } catch (IllegalStateException e) {
            return -1;
        }
    }
The code should be relatively self-explanatory. We assume that the instruction right before the `CALL` is a `MOV` which moves an immediate into a register called `EDX`. If any of these assumptions fail, we do some heavy lifting. Otherwise, we just return that immediate.
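To see what this assumption means at the byte level: on 32-bit x86, `MOV EDX, imm32` is encoded as the opcode byte `0xBA` followed by the immediate in little-endian order, so `MOV EDX, 3` is `BA 03 00 00 00`. The sketch below checks for exactly that pattern in a raw byte array. It is not how Dhrake works (Dhrake inspects Ghidra's `Instruction` objects as shown above), the class `MovEdxImm` is hypothetical, and matching raw bytes backwards can produce false positives because `0xBA` may also be the tail of a different instruction.

```java
// Hypothetical illustration (not part of Dhrake): decode MOV EDX, imm32 from raw bytes.
final class MovEdxImm {
    // Returns the immediate if the five bytes ending right before callOffset
    // encode MOV EDX, imm32 (opcode 0xBA), and -1 otherwise.
    static long countBeforeCall(byte[] code, int callOffset) {
        int start = callOffset - 5;
        if (start < 0 || callOffset > code.length) {
            return -1;
        }
        if ((code[start] & 0xFF) != 0xBA) {
            return -1;
        }
        long value = 0;
        for (int i = 0; i < 4; i++) {
            // The immediate is stored least significant byte first.
            value |= (long) (code[start + 1 + i] & 0xFF) << (8 * i);
        }
        return value;
    }
}
```

The return value of `-1` for a failed match mirrors the failure value of `getStrCatCount`; a caller would treat it as the signal to fall back to the decompiler-based approach.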
#### Summary
- I give up scripting Ghidra in Python, the Java API has so much more support.
- The Java API is really well documented.
- Writing Ghidra scripts in Eclipse is really comfortable.
- Oh god what have I become

`.java` files that will be available in the script manager. Copy the Dhrake files into one of these directories and they should appear in the category **Delphi** in the script manager, from where you can run them by clicking the green play button in the toolbar, or via context menu. I hope this works, let me know if you get stuck.