Complete First; Correct Later: Writing a Pascal Script Emulator



# Summary Malware uses InnoSetup and hides the installer password in obfuscated, compiled PascalScript, but this article is not about this malware. I abandoned all sanity and wrote an emulator for this arcane language to recover them. This article is about the process of reverse engineering the runtime: While IFPS has an open-source interpreter, I found it easier to recover the logic from existing code by assuming that it works. If you found this because you are interested in Inno Setup stuff and IFPS: The emulator and archive parsing is open source and [implemented in BinaryRefinery](https://binref.github.io/lib/inno/index.html). Feel free to file issues on GitHub for issues, questions, or feature requests. # Background It all starts with malware. Some malware uses [InnoSetup][], and Inno uses [PascalScript][] for scripting the installers. Some of the malicious logic is implemented in that compiled PascalScript code, and this is why in [Binary Refinery][BR], I have implemented a unit for disassembling this code, called [ifps](https://binref.github.io/#refinery.ifps), and one for dumping embedded strings, called [ifpsstr](https://binref.github.io/#refinery.ifpsstr). The pioneer work on disassembling IFPS was done by [IFPSTools][], which my code is also based on. For context, PascalScript is also referred to as IFPS because 1. that is still the magic 4-byte sequence at the start of every compiled script, and 2. it was named **I**nner**F**use **P**ascal**S**cript at some point before RemObjects took over the project. Now some malware also uses the PascalScript to protect payloads from static extraction: The Inno Setup archive can be password-protected, which means that the archived files are encrypted using a derived key. A static unpacking tool will, by default, not be able to recover the files without knowing the password. The embedded PascalScript is then utilized to construct and enter the correct password at runtime so that payloads are unpacked and executed without user interaction. In many cases, dumping the IFPS strings (via [ifpsstr](https://binref.github.io/#refinery.ifpsstr)) will help you find the password, but those are just the easy ones. Around August 2024, a fellow researcher started digging deeper into this type of malware and used refinery in the process. I did notice a sequence of issues ([#56][BR56], [#58][BR58],[#70][BR70], [#74][BR74]) being opened about IFPS by that person, but at the time I was completely unaware of the [full scope of their project][CrackBlog]. In early 2025, I independently started adding InnoSetup unpacking support to refinery at the suggestion of another fellow researcher (shout out to [@squibblydoo][squibbly] for the idea and to [Malcat][] for the code seed) and they connected me to [@gdesmar][gdesmar], the author of the aforementioned blog article. This is how I learned the full story, and the abyss of Inno malware that is out there. This is when I decided that my next 8 weekends or so would be spent writing an emulator for InnoSetup/IFPS, in order to recover hidden and obfuscated passwords from even the most obscure sample. # The Samples I was working with the following installer executables; I list them here mainly for reference. They are all available on [MalShare][] and the hashes in the below table contain the corresponding links: | SHA-256 Hash | Nr | Password | |-------------------------------------------------------------------:|:--:|----------| | [15f9eca216f9eb92dd70f86b65e4f6b19081113c5675d04c6008d216d54ed7e0](https://malshare.com/sample.php?action=detail&hash=15f9eca216f9eb92dd70f86b65e4f6b19081113c5675d04c6008d216d54ed7e0) | 1 | 02b60c12469a674bf | | [3785065d6ba8a07f248ed63deadbf04f5e35918224d414afe19f0de1bfcb0e84](https://malshare.com/sample.php?action=detail&hash=3785065d6ba8a07f248ed63deadbf04f5e35918224d414afe19f0de1bfcb0e84) | 2 | NFfB5qf2o1EJOkmBRrMvFcmj4QmKfwyNE5yoMOMRmFE4yfEEEMImIyyEYIQYFiO0 | | [73f5eee95f0d5250f5d2f7a29702700537ebe6c08861d4ddfefc09d485f0f65e](https://malshare.com/sample.php?action=detail&hash=73f5eee95f0d5250f5d2f7a29702700537ebe6c08861d4ddfefc09d485f0f65e) | 3 | Zet0 | | [aeac18c433de1a62b6b9106a9424028d4c2731d3f7b378088e7b305213432a42](https://malshare.com/sample.php?action=detail&hash=aeac18c433de1a62b6b9106a9424028d4c2731d3f7b378088e7b305213432a42) | 4 | Zet0 | | [f09c25c1b868baf93b77a7cbb3d57a2848355e495bca470db6dab70adcf73273](https://malshare.com/sample.php?action=detail&hash=f09c25c1b868baf93b77a7cbb3d57a2848355e495bca470db6dab70adcf73273) | 5 | 5D97BF9AEC584AAF4B7C0AEDF46CF882A5B1645392F958545AB2A7FC8FF8963F8 | | [f72106284904a0033fa877df21151bbb84b632163fab55789422916ed85b43a1](https://malshare.com/sample.php?action=detail&hash=f72106284904a0033fa877df21151bbb84b632163fab55789422916ed85b43a1) | 6 | kj2678лоkjfv89цкs75345в00р\5(*&Y&&^^^%##832984ол1мвырам~`ёЁ<>xhvрлджэ^(UJ<: | | [fabd429204db75e2ff9fe7fae5dc981b8c392be42a936273c99dcc41eeb0730d](https://malshare.com/sample.php?action=detail&hash=fabd429204db75e2ff9fe7fae5dc981b8c392be42a936273c99dcc41eeb0730d) | 7 | 7@8#3%5819((4f-=/72\~c0d``a221fdefc6bd&208fe1808d49606<:c89f | I did not originally know all the passwords, these have now been recovered using the [innopwd](https://binref.github.io/#refinery.innopwd) unit in refinery. What I don't know either is if these samples are malware or not, and I have not looked at the payloads at all. These installers simply represent interesting techniques to compute the Inno Setup password at runtime. # A Paradigm Shift As I mentioned earlier, [the runtime for PascalScript is open source][PascalScriptGH]. However, [the runtime code][PSRuntime] is something I consider extremely hard to read. Simply put, it is 13.2 kLOC of spaghetti code that somehow manages to manifest the nauseating smell of old tobacco. I took a step back and reconsidered my approach. Implementing an emulator from this runtime represents the paradigm: > **Correct first; complete later.** My code would literally follow the only specification in existence, so everything I implement would be correct. However, getting it to be complete seemed arduous: I would not be able to actually emulate any code and test my emulator until I had done labouring through that runtime from start to finish. The dual paradigm is also viable, though: > **Complete first, correct later.** I instead decided to implement each opcode of the IFPS VM by first guessing how it worked, sometimes just based on its name. I would then start running the emulator against existing PascalScript code. This would then reveal mistakes in my implementation and lead me closer to a correct one. The feedback loop for this approach was much faster and I am convinced that it was the more efficient choice; the downside, of course, is that the emulator as it is implemented today is likely *still incorrect*. That said, it is correct enough to handle all samples I know, and that's better than having no emulator at all. I think these two paradigms often apply when solving a problem that is subject to constraints: The most natural approach might seem to work within the confines of the constraints and just solve the problem, but sometimes you can instead consider all solutions to the problem, regardless of whether they satisfy the constraints or not, and then move within this space towards a solution that does. The first time I encountered a duality like this was in combinatorial optimization class when discussing the [Ford-Fulkerson][FordFulkerson] and [Push-Relabel][PushRelabel] algorithms for solving the $s$-$t$-Flow problem. The former always maintains a correct flow and alters it until it is optimal, but the latter can be seen as always maintaining a maximum but invalid flow, and altering it until it becomes valid. Ok, so those were my deep thoughts about paradigm shifts in solving constrained problems. You can stop here unless you deeply care about IFPS. I will now start talking about IFPS. You have been warned. # The InnoSetup API Emulating the PascalScript VM itself seemed like the hard part at first, but in retrospective it really wasn't. What ended up eating most of my time was re-implementing large portions of the [InnoSetup API][PSAPI] in Python. Here are just some examples: - Nearly all samples use functions like SetArrayLength, WStrSet, WStrGet, etcetera. These are absolutely necessary to implement; otherwise emulation is almost pointless. - Samples 3 and 4 use GetDateTimeString to compute the current second twice and subtracts the two values. The result is converted to a string and appended to the password: It has to be the letter 0. - Sample 1 uses user32::GetSystemMetrics with an argument of 44 (SM_SECURE), which [must return zero][GSM] to trigger a division by zero, which in turn is required to get to the password. I shed a lot of tears on this part so I did not want to let it go unmentioned, but at the end this is not really a reverse engineering topic, and not something that made me think deep thoughts about problem solving strategies either. You implement the functions according to their documentation, and that's essentially it. When a new sample needs more API than you are emulating, you implement the missing stuff. Rinse, repeat. # Local Variables The first thing that was important to figure out were local variables. Looking at the disassembly of almost any piece of code quickly reveals how it works, but let's pick a really simple one:
function MAKELANGID(Argument1: U08, Argument2: U08): U16
begin
  0x000   0  PushType      U08
  0x005   1  Assign        LocalVar1 := Argument2
  0x010   1  Calculate     LocalVar1 <<= 10
  0x020   1  Calculate     LocalVar1 |= Argument1
  0x02C   1  Assign        ReturnValue := LocalVar1
  0x037   1  Pop
  0x038   0  Ret
end;
Variables are created on the stack via the PushType instruction which creates a variable of the given type at the very top of the stack. Variables can then be referenced later by their offset from the stack *base*, i.e. when another variable would be created in the above example, it would be LocalVar2. My emulator therefore implements a stack of objects, each of which represents a variable. Each variable object knows its own type and handles assignments of other variables or immediate values to itself based on this type information. # Function Calls Implementing function calls was a big one for the VM. The main questions are the following: - How are arguments passed to a function when called? - Where and how are return values passed back to the caller? - Since all of this happens on the stack, who is responsible for cleaning up which parts of it after a call? To answer these questions, I implemented stack tracking in my IFPS disassembler. I assumed that, regardless of the execution path taken to reach a given instruction, the stack depth at that offset always has to be the same. In other words, the stack cannot grow or shrink uncontrollably across multiple loop iterations, for example. As this assumption was valid across several analysed scripts, I was confident that I had correctly mapped how much the stack pointer is modified by each opcode. I also assumed that within a given function, the stack pointer does not decrease below its initial value, and it made sense to track the stack depth within a function relative to that base pointer. The next observation then was that functions don't always return with an empty stack; some functions hit the Ret instruction with a stack pointer that is larger than their base pointer. However, under my first assumption it was evident that the Call instruction cannot modify the stack pointer of the caller, so I reached the following conclusion: - When a function is called, the current top of the stack becomes its base pointer. - When the function returns, the stack is reset to that base and all excess data on the stack is removed. For example, consider the following functions from sample **2**. The first number in each line of disassembly is the instruction offset, the second number (in decimal) is the computed stack depth:
function WITHINXDAYSOFBUILD(Argument1: Integer): Boolean
begin
  0x0000   0  PushType      Integer
  0x0005   1  PushType      Integer
  0x000A   2  PushType      Integer
  0x000F   3  PushType      Integer
  0x0014   4  PushVar       LocalVar1
  0x001A   5  Call          GETJULIANDAYNUMBERTODAY
  0x001F   5  Pop
  0x0020   4  PushVar       LocalVar2
  0x0026   5  Call          GETMINJULIANDAYNUMBERSYSTEM
  0x002B   5  Pop
  0x002C   4  PushVar       LocalVar3
  0x0032   5  Call          GETJULIANDAYNUMBERBUILD
  0x0037   5  Pop
  0x0038   4  Calculate     LocalVar3 -= 1
  0x0048   4  Assign        LocalVar4 := LocalVar3
  0x0053   4  Calculate     LocalVar4 += Argument1
  0x005F   4  Calculate     LocalVar4 += 1
  0x006F   4  Compare       ReturnValue := LocalVar1 >= LocalVar3
  0x0080   4  JumpFalse     JumpDestination01, ReturnValue
  0x008A   4  PushType      Boolean
  0x008F   5  Compare       LocalVar5 := LocalVar1 <= LocalVar4
  0x00A0   5  Calculate     ReturnValue &= LocalVar5
  0x00AC   5  Pop
JumpDestination01:
  0x00AD   4  JumpFalse     JumpDestination02, ReturnValue
  0x00B7   4  PushType      Boolean
  0x00BC   5  Compare       LocalVar5 := LocalVar2 <= LocalVar4
  0x00CD   5  Calculate     ReturnValue &= LocalVar5
  0x00D9   5  Pop
JumpDestination02:
  0x00DA   4  Ret
end;
The function WITHINXDAYSOFBUILD leaves 4 variables on the stack, but it is called later in DEBUGPROMPT:
procedure DEBUGPROMPT(Argument1: String)
begin
  0x0000   0  PushType      Boolean
  0x0005   1  PushType      U32
  0x000A   2  Assign        LocalVar2 := GlobalVar34
  0x0015   2  Calculate     LocalVar2 &= 16
  0x0025   2  Compare       LocalVar1 := LocalVar2 > 0
  0x003A   2  Pop
  0x003B   1  JumpFalse     JumpDestination01, LocalVar1
  0x0045   1  PushType      Boolean
  0x004A   2  PushType      Integer
  0x004F   3  Assign        LocalVar3 := 3
  0x005E   3  PushVar       LocalVar2
  0x0064   4  Call          WITHINXDAYSOFBUILD
  0x0069   4  Pop
  0x006A   3  Pop
  0x006B   2  Calculate     LocalVar1 &= LocalVar2
  0x0077   2  Pop
JumpDestination01:
  0x0078   1  SetFlag       !LocalVar1
  0x007F   1  Pop
  .....   ..  ...
We can see that the call to WITHINXDAYSOFBUILD must leave the stack depth in the caller unchanged, since otherwise we would get conflicting stack depth information for the instruction at offset 0x0078, marked JumpDestination01. Now we turn to understanding how values are passed from and to function calls. The parsing in [IFPSTools][] already helps understand this to some degree: Variables inside functions can either reference an argument or a local variable. If it references an argument, the index 0 represents the return value while indices 1 and up represent the functions in the order as they appear in the declaration. Again by studying disassembled code, I could easily deduce that the stack layout for a function call is as follows, drawing the stack as growing down:
[ Caller LocalVar1 ]  <-- Caller Stack Base
[ Caller LocalVar2 ]
        ...
[ Caller LocalVarK ]
[   Call ArgumentN ]
        ...
[   Call Argument1 ]
[   Call ReturnVal ]
in other words: Function arguments are referenced with negative indices relative to the function's stack base while local variables are referenced with positive indices. For procedures (routines without a return value), there is no variable pushed to the stack to store the return. A good illustration is the above code in WITHINXDAYSOFBUILD. With line breaks and comments to make the code easier to read:
  0x0000   0  PushType      Integer ; LocalVar1: will receive result from GETJULIANDAYNUMBERTODAY
  0x0005   1  PushType      Integer ; LocalVar2: will receive result from GETMINJULIANDAYNUMBERSYSTEM
  0x000A   2  PushType      Integer ; LocalVar3: will receive result from GETJULIANDAYNUMBERBUILD
  0x000F   3  PushType      Integer ; LocalVar4: computation result

  0x0014   4  PushVar       LocalVar1
  0x001A   5  Call          GETJULIANDAYNUMBERTODAY
  0x001F   5  Pop

  0x0020   4  PushVar       LocalVar2
  0x0026   5  Call          GETMINJULIANDAYNUMBERSYSTEM
  0x002B   5  Pop

  0x002C   4  PushVar       LocalVar3
  0x0032   5  Call          GETJULIANDAYNUMBERBUILD
  0x0037   5  Pop

  0x0038   4  Calculate     LocalVar3 -= 1
  0x0048   4  Assign        LocalVar4 := LocalVar3
  0x0053   4  Calculate     LocalVar4 += Argument1
  0x005F   4  Calculate     LocalVar4 += 1
The function first creates 4 local variables. These are also exactly the variables that are left on the stack when the function returns. It then begins calling functions and assigning their return values to these variables: First, a reference to a local variable is pushed onto the stack to store the return value. Then, the function is called (these functions take no arguments but return a result), populating this variable with its result. The caller then pops this reference from the stack and retains the return value in the destination variable. This means that while the stack is cleared of local variables created by a called function, the call arguments and return value are left on the stack for the caller to clean up manually. This is now sufficient information to implement handling of Call instructions without destroying the stack! # Exception Handling IFPS implements try/catch/finally handlers. These were fairly tricky to figure out, and there is one aspect of them that I haven't fully grasped - but it has not come up in practice yet so until then I'll just keep the guessed implementation that I have. This is not purely academic either; sample **1** deliberately triggers a floating point division by zero exception:
function InitializeSetup(): Boolean
begin
   ...   ..  ...
  0x0DB   4  PushEH        End:0x338 CatchAt:0x2ED
   ...   ..  ...
  0x26B   4  PushType      U32
  0x270   5  Assign        LocalVar5 := LocalVar1
  0x27B   5  Calculate     LocalVar5 -= LocalVar2
  0x287   5  PushType      S32
  0x28C   6  PushType      S32
  0x291   7  Assign        LocalVar7 := LocalVar3
  0x29C   7  PushVar       LocalVar6
  0x2A2   8  Call          User32::GetSystemMetrics
  0x2A7   8  Pop
  0x2A8   7  Pop
  0x2A9   6  Calculate     LocalVar5 /= LocalVar6
   ...   ..  ...
  0x2EB   4  PopEH         EndTry
  0x2ED   4  PushType      S32
  0x2F2   5  PushVar       LocalVar5
  0x2F8   6  Call          kernel32::GetTickCount
  0x2FD   6  Pop
  0x2FE   5  Assign        LocalVar3 := LocalVar5
  0x309   5  Pop
  0x30A   4  Assign        GlobalVar1 := 1
  0x316   4  Assign        GlobalVar0 := '02b60c12469a674bf'
  0x336   4  PopEH         EndCatch
JumpDestination04:
  0x338   4  PushType      Boolean
   ...   ..  ...
  0x378   4  Ret
end;
At offset 0x0DB, an exception handler is registered. The call to GetSystemMetrics at 0x2A2 returns 0, which triggers a divide-by-zero later at 0x2A9. The exception handling code begins at offset 0x2ED and is responsible for assigning the password 02b60c12469a674bf to a global variable which is later used in the InitializeWizard callback to automatically populate the password prompt. The PushEH opcode has 4 operands, named Finally1, Catch, Finally2, and End. Each of these operands references an instruction offset in the current function. Its counterpart is the PopEH opcode, which has one operand specifying the type of block (Try/Catch/Finally1/Finally2) that is ending at this point. I do not understand the push/pop semantics of the "exception stack" at all; in the emulator, I treat PushEH as the beginning of a try block, and each PopEH instruction as ending a specific type of block (i.e. ending a Try, Catch, or Finally block). The emulator then directs the control flow in the same way as you would expect these constructs to work in any procedural language that has them. From example code, it looks like Finally2 is used when a Catch handler is specified, and Finally1 is used when there is no Catch. The details remain unclear to me, though, and the emulator simply executes each Finally handler that is present. True to my plan, since the implementation works well in practice, I've had no reason to change the approach so far. It does work well enough to extract the password from sample **1** ... ¯\\_(ツ)_/¯. # Variable Assignment Edge Cases The last major issue I ran into were some edge cases for the Assign opcode, which is used to assign a value to a variable. A value can be encoded in an operand in 3 different ways: It can be an immediate, a reference to a variable, or a reference to an element of a container variable. In the last case, the index into the container can either be encoded as an integer immediate or as another variable reference. Now there are two kinds of variables that require special attention: Variables of type Pointer and containers (Array, StaticArray, or Record). A pointer is a variable that references another variable, and container types contain variables that can be accessed via an index. Note that containers can be nested, i.e. you can have a Record that contains a Record which contains a StaticArray. Based on testing against the listed reference samples, the following algorithm works for implementing Assign: - Obtain the value of the source operand. For a Pointer, its value is the value of the referenced variable. For a container, its value is the list of the values of all its members. - If the target operand is a pointer, assign the new value to the referenced variable. - If the target operand is a container, assert that the new value is a list and recursively assign the elements of that list to each member of the target container. Again, I cannot be certain that this matches the reference implementation exactly: It works well enough in practice, though, and I did have to revise the procedure several times before arriving at the above sequence of steps, which now works with all referenced samples. # Conclusion IFPS is horrible. And fun. But more importantly: When you have to solve a difficult problem, it is sometimes worth taking a step back and wondering whether there is a duality in the problem space that you can exploit to find an approach that has better qualities (i.e. shorter feedback loops): For myself, I am quite sure there are way more opportunities like this that I miss than those that I take. [BR]: https://github.com/binref/refinery/ [InnoSetup]: https://jrsoftware.org/isinfo.php [IFPSTools]: https://github.com/Wack0/IFPSTools.NET [PascalScript]: https://www.remobjects.com/ps.aspx [PascalScriptGH]: https://github.com/remobjects/pascalscript [PSAPI]: https://jrsoftware.org/ishelp/index.php?topic=scriptfunctions [PSRuntime]: https://github.com/remobjects/pascalscript/blob/master/Source/uPSRuntime.pas [BR56]: https://github.com/binref/refinery/issues/56 [BR58]: https://github.com/binref/refinery/issues/58 [BR70]: https://github.com/binref/refinery/issues/70 [BR74]: https://github.com/binref/refinery/issues/74 [GSM]: https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getsystemmetrics [CrackBlog]: https://medium.com/@gdesmar/crack-any-password-protected-innosetup-installer-5daabb52dbfb [squibbly]: https://github.com/Squiblydoo/ [gdesmar]: https://github.com/gdesmar [Malcat]: https://malcat.fr/ [MalShare]: https://www.malshare.com/ [FordFulkerson]: https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm [PushRelabel]: https://en.wikipedia.org/wiki/Push%E2%80%93relabel_maximum_flow_algorithm

Leave a Reply

Your email address will not be published. Required fields are marked *