Complete First; Correct Later: Writing a Pascal Script Emulator



# Summary Malware uses InnoSetup and hides the installer password in obfuscated, compiled PascalScript, but this article is not about this malware. I abandoned all sanity and wrote an emulator for this arcane language to recover them. This article is about the process of reverse engineering the runtime: While IFPS has an open-source interpreter, I found it easier to recover the logic from existing code by assuming that it works. If you found this because you are interested in Inno Setup stuff and IFPS: The emulator and archive parsing is open source and [implemented in BinaryRefinery](https://binref.github.io/lib/inno/index.html). Feel free to file issues on GitHub for issues, questions, or feature requests. <span id="more-6030"></span> # Background It all starts with malware. Some malware uses [InnoSetup][], and Inno uses [PascalScript][] for scripting the installers. Some of the malicious logic is implemented in that compiled PascalScript code, and this is why in [Binary Refinery][BR], I have implemented a unit for disassembling this code, called [ifps](https://binref.github.io/#refinery.ifps), and one for dumping embedded strings, called [ifpsstr](https://binref.github.io/#refinery.ifpsstr). The pioneer work on disassembling IFPS was done by [IFPSTools][], which my code is also based on. For context, PascalScript is also referred to as IFPS because 1. that is still the magic 4-byte sequence at the start of every compiled script, and 2. it was named **I**nner**F**use **P**ascal**S**cript at some point before RemObjects took over the project. Now some malware also uses the PascalScript to protect payloads from static extraction: The Inno Setup archive can be password-protected, which means that the archived files are encrypted using a derived key. A static unpacking tool will, by default, not be able to recover the files without knowing the password. The embedded PascalScript is then utilized to construct and enter the correct password at runtime so that payloads are unpacked and executed without user interaction. In many cases, dumping the IFPS strings (via [ifpsstr](https://binref.github.io/#refinery.ifpsstr)) will help you find the password, but those are just the easy ones. Around August 2024, a fellow researcher started digging deeper into this type of malware and used refinery in the process. I did notice a sequence of issues ([#56][BR56], [#58][BR58],[#70][BR70], [#74][BR74]) being opened about IFPS by that person, but at the time I was completely unaware of the [full scope of their project][CrackBlog]. In early 2025, I independently started adding InnoSetup unpacking support to refinery at the suggestion of another fellow researcher (shout out to [@squibblydoo][squibbly] for the idea and to [Malcat][] for the code seed) and they connected me to [@gdesmar][gdesmar], the author of the aforementioned blog article. This is how I learned the full story, and the abyss of Inno malware that is out there. This is when I decided that my next 8 weekends or so would be spent writing an emulator for InnoSetup/IFPS, in order to recover hidden and obfuscated passwords from even the most obscure sample. # The Samples I was working with the following installer executables; I list them here mainly for reference. They are all available on [MalShare][] and the hashes in the below table contain the corresponding links: | SHA-256 Hash | Nr | Password | |-------------------------------------------------------------------:|:--:|----------| | [`15f9eca216f9eb92dd70f86b65e4f6b19081113c5675d04c6008d216d54ed7e0`](https://malshare.com/sample.php?action=detail&hash=15f9eca216f9eb92dd70f86b65e4f6b19081113c5675d04c6008d216d54ed7e0) | 1 | `02b60c12469a674bf` | | [`3785065d6ba8a07f248ed63deadbf04f5e35918224d414afe19f0de1bfcb0e84`](https://malshare.com/sample.php?action=detail&hash=3785065d6ba8a07f248ed63deadbf04f5e35918224d414afe19f0de1bfcb0e84) | 2 | `NFfB5qf2o1EJOkmBRrMvFcmj4QmKfwyNE5yoMOMRmFE4yfEEEMImIyyEYIQYFiO0` | | [`73f5eee95f0d5250f5d2f7a29702700537ebe6c08861d4ddfefc09d485f0f65e`](https://malshare.com/sample.php?action=detail&hash=73f5eee95f0d5250f5d2f7a29702700537ebe6c08861d4ddfefc09d485f0f65e) | 3 | `Zet0` | | [`aeac18c433de1a62b6b9106a9424028d4c2731d3f7b378088e7b305213432a42`](https://malshare.com/sample.php?action=detail&hash=aeac18c433de1a62b6b9106a9424028d4c2731d3f7b378088e7b305213432a42) | 4 | `Zet0` | | [`f09c25c1b868baf93b77a7cbb3d57a2848355e495bca470db6dab70adcf73273`](https://malshare.com/sample.php?action=detail&hash=f09c25c1b868baf93b77a7cbb3d57a2848355e495bca470db6dab70adcf73273) | 5 | `5D97BF9AEC584AAF4B7C0AEDF46CF882A5B1645392F958545AB2A7FC8FF8963F8` | | [`f72106284904a0033fa877df21151bbb84b632163fab55789422916ed85b43a1`](https://malshare.com/sample.php?action=detail&hash=f72106284904a0033fa877df21151bbb84b632163fab55789422916ed85b43a1) | 6 | ``kj2678лоkjfv89цкs75345в00р\5(*&Y&&^^^%##832984ол1мвырам~`ёЁ<>xhvрлджэ^(UJ<:`` | | [`fabd429204db75e2ff9fe7fae5dc981b8c392be42a936273c99dcc41eeb0730d`](https://malshare.com/sample.php?action=detail&hash=fabd429204db75e2ff9fe7fae5dc981b8c392be42a936273c99dcc41eeb0730d) | 7 | ```7@8#3%5819((4f-=/72\~c0d``a221fdefc6bd&208fe1808d49606<:c89f``` | I did not originally know all the passwords, these have now been recovered using the [innopwd](https://binref.github.io/#refinery.innopwd) unit in refinery. What I don't know either is if these samples are malware or not, and I have not looked at the payloads at all. These installers simply represent interesting techniques to compute the Inno Setup password at runtime. # A Paradigm Shift As I mentioned earlier, [the runtime for PascalScript is open source][PascalScriptGH]. However, [the runtime code][PSRuntime] is something I consider extremely hard to read. Simply put, it is 13.2 kLOC of spaghetti code that somehow manages to manifest the nauseating smell of old tobacco. I took a step back and reconsidered my approach. Implementing an emulator from this runtime represents the paradigm: > **Correct first; complete later.** My code would literally follow the only specification in existence, so everything I implement would be correct. However, getting it to be complete seemed arduous: I would not be able to actually emulate any code and test my emulator until I had done labouring through that runtime from start to finish. The dual paradigm is also viable, though: > **Complete first, correct later.** I instead decided to implement each opcode of the IFPS VM by first guessing how it worked, sometimes just based on its name. I would then start running the emulator against existing PascalScript code. This would then reveal mistakes in my implementation and lead me closer to a correct one. The feedback loop for this approach was much faster and I am convinced that it was the more efficient choice; the downside, of course, is that the emulator as it is implemented today is likely *still incorrect*. That said, it is correct enough to handle all samples I know, and that's better than having no emulator at all. I think these two paradigms often apply when solving a problem that is subject to constraints: The most natural approach might seem to work within the confines of the constraints and just solve the problem, but sometimes you can instead consider all solutions to the problem, regardless of whether they satisfy the constraints or not, and then move within this space towards a solution that does. The first time I encountered a duality like this was in combinatorial optimization class when discussing the [Ford-Fulkerson][FordFulkerson] and [Push-Relabel][PushRelabel] algorithms for solving the $s$-$t$-Flow problem. The former always maintains a correct flow and alters it until it is optimal, but the latter can be seen as always maintaining a maximum but invalid flow, and altering it until it becomes valid. Ok, so those were my deep thoughts about paradigm shifts in solving constrained problems. You can stop here unless you deeply care about IFPS. I will now start talking about IFPS. You have been warned. # The InnoSetup API Emulating the PascalScript VM itself seemed like the hard part at first, but in retrospective it really wasn't. What ended up eating most of my time was re-implementing large portions of the [InnoSetup API][PSAPI] in Python. Here are just some examples: - Nearly all samples use functions like `SetArrayLength`, `WStrSet`, `WStrGet`, etcetera. These are absolutely necessary to implement; otherwise emulation is almost pointless. - Samples 3 and 4 use `GetDateTimeString` to compute the current second twice and subtracts the two values. The result is converted to a string and appended to the password: It has to be the letter `0`. - Sample 1 uses `user32::GetSystemMetrics` with an argument of `44` (`SM_SECURE`), which [must return zero][GSM] to trigger a division by zero, which in turn is required to get to the password. I shed a lot of tears on this part so I did not want to let it go unmentioned, but at the end this is not really a reverse engineering topic, and not something that made me think deep thoughts about problem solving strategies either. You implement the functions according to their documentation, and that's essentially it. When a new sample needs more API than you are emulating, you implement the missing stuff. Rinse, repeat. # Local Variables The first thing that was important to figure out were local variables. Looking at the disassembly of almost any piece of code quickly reveals how it works, but let's pick a really simple one: ``` function MAKELANGID(Argument1: U08, Argument2: U08): U16 begin 0x000 0 PushType U08 0x005 1 Assign LocalVar1 := Argument2 0x010 1 Calculate LocalVar1 <<= 10 0x020 1 Calculate LocalVar1 |= Argument1 0x02C 1 Assign ReturnValue := LocalVar1 0x037 1 Pop 0x038 0 Ret end; ``` Variables are created on the stack via the `PushType` instruction which creates a variable of the given type at the very top of the stack. Variables can then be referenced later by their offset from the stack *base*, i.e. when another variable would be created in the above example, it would be `LocalVar2`. My emulator therefore implements a stack of objects, each of which represents a variable. Each variable object knows its own type and handles assignments of other variables or immediate values to itself based on this type information. # Function Calls Implementing function calls was a big one for the VM. The main questions are the following: - How are arguments passed to a function when called? - Where and how are return values passed back to the caller? - Since all of this happens on the stack, who is responsible for cleaning up which parts of it after a call? To answer these questions, I implemented stack tracking in my IFPS disassembler. I assumed that, regardless of the execution path taken to reach a given instruction, the stack depth at that offset always has to be the same. In other words, the stack cannot grow or shrink uncontrollably across multiple loop iterations, for example. As this assumption was valid across several analysed scripts, I was confident that I had correctly mapped how much the stack pointer is modified by each opcode. I also assumed that within a given function, the stack pointer does not decrease below its initial value, and it made sense to track the stack depth within a function relative to that base pointer. The next observation then was that functions don't always return with an empty stack; some functions hit the `Ret` instruction with a stack pointer that is larger than their base pointer. However, under my first assumption it was evident that the `Call` instruction cannot modify the stack pointer of the caller, so I reached the following conclusion: - When a function is called, the current top of the stack becomes its base pointer. - When the function returns, the stack is reset to that base and all excess data on the stack is removed. For example, consider the following functions from sample **2**. The first number in each line of disassembly is the instruction offset, the second number (in decimal) is the computed stack depth: ``` function WITHINXDAYSOFBUILD(Argument1: Integer): Boolean begin 0x0000 0 PushType Integer 0x0005 1 PushType Integer 0x000A 2 PushType Integer 0x000F 3 PushType Integer 0x0014 4 PushVar LocalVar1 0x001A 5 Call GETJULIANDAYNUMBERTODAY 0x001F 5 Pop 0x0020 4 PushVar LocalVar2 0x0026 5 Call GETMINJULIANDAYNUMBERSYSTEM 0x002B 5 Pop 0x002C 4 PushVar LocalVar3 0x0032 5 Call GETJULIANDAYNUMBERBUILD 0x0037 5 Pop 0x0038 4 Calculate LocalVar3 -= 1 0x0048 4 Assign LocalVar4 := LocalVar3 0x0053 4 Calculate LocalVar4 += Argument1 0x005F 4 Calculate LocalVar4 += 1 0x006F 4 Compare ReturnValue := LocalVar1 >= LocalVar3 0x0080 4 JumpFalse JumpDestination01, ReturnValue 0x008A 4 PushType Boolean 0x008F 5 Compare LocalVar5 := LocalVar1 <= LocalVar4 0x00A0 5 Calculate ReturnValue &= LocalVar5 0x00AC 5 Pop JumpDestination01: 0x00AD 4 JumpFalse JumpDestination02, ReturnValue 0x00B7 4 PushType Boolean 0x00BC 5 Compare LocalVar5 := LocalVar2 <= LocalVar4 0x00CD 5 Calculate ReturnValue &= LocalVar5 0x00D9 5 Pop JumpDestination02: 0x00DA 4 Ret end; ``` The function `WITHINXDAYSOFBUILD` leaves 4 variables on the stack, but it is called later in `DEBUGPROMPT`: ``` procedure DEBUGPROMPT(Argument1: String) begin 0x0000 0 PushType Boolean 0x0005 1 PushType U32 0x000A 2 Assign LocalVar2 := GlobalVar34 0x0015 2 Calculate LocalVar2 &= 16 0x0025 2 Compare LocalVar1 := LocalVar2 > 0 0x003A 2 Pop 0x003B 1 JumpFalse JumpDestination01, LocalVar1 0x0045 1 PushType Boolean 0x004A 2 PushType Integer 0x004F 3 Assign LocalVar3 := 3 0x005E 3 PushVar LocalVar2 0x0064 4 Call WITHINXDAYSOFBUILD 0x0069 4 Pop 0x006A 3 Pop 0x006B 2 Calculate LocalVar1 &= LocalVar2 0x0077 2 Pop JumpDestination01: 0x0078 1 SetFlag !LocalVar1 0x007F 1 Pop ..... .. ... ``` We can see that the call to `WITHINXDAYSOFBUILD` must leave the stack depth in the caller unchanged, since otherwise we would get conflicting stack depth information for the instruction at offset `0x0078`, marked `JumpDestination01`. Now we turn to understanding how values are passed from and to function calls. The parsing in [IFPSTools][] already helps understand this to some degree: Variables inside functions can either reference an argument or a local variable. If it references an argument, the index `0` represents the return value while indices `1` and up represent the functions in the order as they appear in the declaration. Again by studying disassembled code, I could easily deduce that the stack layout for a function call is as follows, drawing the stack as growing down: ``` [ Caller LocalVar1 ] <-- Caller Stack Base [ Caller LocalVar2 ] ... [ Caller LocalVarK ] [ Call ArgumentN ] ... [ Call Argument1 ] [ Call ReturnVal ] ``` in other words: Function arguments are referenced with negative indices relative to the function's stack base while local variables are referenced with positive indices. For procedures (routines without a return value), there is no variable pushed to the stack to store the return. A good illustration is the above code in `WITHINXDAYSOFBUILD`. With line breaks and comments to make the code easier to read: ``` 0x0000 0 PushType Integer ; LocalVar1: will receive result from GETJULIANDAYNUMBERTODAY 0x0005 1 PushType Integer ; LocalVar2: will receive result from GETMINJULIANDAYNUMBERSYSTEM 0x000A 2 PushType Integer ; LocalVar3: will receive result from GETJULIANDAYNUMBERBUILD 0x000F 3 PushType Integer ; LocalVar4: computation result 0x0014 4 PushVar LocalVar1 0x001A 5 Call GETJULIANDAYNUMBERTODAY 0x001F 5 Pop 0x0020 4 PushVar LocalVar2 0x0026 5 Call GETMINJULIANDAYNUMBERSYSTEM 0x002B 5 Pop 0x002C 4 PushVar LocalVar3 0x0032 5 Call GETJULIANDAYNUMBERBUILD 0x0037 5 Pop 0x0038 4 Calculate LocalVar3 -= 1 0x0048 4 Assign LocalVar4 := LocalVar3 0x0053 4 Calculate LocalVar4 += Argument1 0x005F 4 Calculate LocalVar4 += 1 ``` The function first creates 4 local variables. These are also exactly the variables that are left on the stack when the function returns. It then begins calling functions and assigning their return values to these variables: First, a reference to a local variable is pushed onto the stack to store the return value. Then, the function is called (these functions take no arguments but return a result), populating this variable with its result. The caller then pops this reference from the stack and retains the return value in the destination variable. This means that while the stack is cleared of local variables created by a called function, the call arguments and return value are left on the stack for the caller to clean up manually. This is now sufficient information to implement handling of `Call` instructions without destroying the stack! # Exception Handling IFPS implements try/catch/finally handlers. These were fairly tricky to figure out, and there is one aspect of them that I haven't fully grasped - but it has not come up in practice yet so until then I'll just keep the guessed implementation that I have. This is not purely academic either; sample **1** deliberately triggers a floating point division by zero exception: ``` function InitializeSetup(): Boolean begin ... .. ... 0x0DB 4 PushEH End:0x338 CatchAt:0x2ED ... .. ... 0x26B 4 PushType U32 0x270 5 Assign LocalVar5 := LocalVar1 0x27B 5 Calculate LocalVar5 -= LocalVar2 0x287 5 PushType S32 0x28C 6 PushType S32 0x291 7 Assign LocalVar7 := LocalVar3 0x29C 7 PushVar LocalVar6 0x2A2 8 Call User32::GetSystemMetrics 0x2A7 8 Pop 0x2A8 7 Pop 0x2A9 6 Calculate LocalVar5 /= LocalVar6 ... .. ... 0x2EB 4 PopEH EndTry 0x2ED 4 PushType S32 0x2F2 5 PushVar LocalVar5 0x2F8 6 Call kernel32::GetTickCount 0x2FD 6 Pop 0x2FE 5 Assign LocalVar3 := LocalVar5 0x309 5 Pop 0x30A 4 Assign GlobalVar1 := 1 0x316 4 Assign GlobalVar0 := '02b60c12469a674bf' 0x336 4 PopEH EndCatch JumpDestination04: 0x338 4 PushType Boolean ... .. ... 0x378 4 Ret end; ``` At offset `0x0DB`, an exception handler is registered. The call to `GetSystemMetrics` at `0x2A2` returns `0`, which triggers a divide-by-zero later at `0x2A9`. The exception handling code begins at offset `0x2ED` and is responsible for assigning the password `02b60c12469a674bf` to a global variable which is later used in the `InitializeWizard` callback to automatically populate the password prompt. The `PushEH` opcode has 4 operands, named `Finally1`, `Catch`, `Finally2`, and `End`. Each of these operands references an instruction offset in the current function. Its counterpart is the `PopEH` opcode, which has one operand specifying the type of block (`Try`/`Catch`/`Finally1`/`Finally2`) that is ending at this point. I do not understand the push/pop semantics of the "exception stack" at all; in the emulator, I treat `PushEH` as the beginning of a try block, and each `PopEH` instruction as ending a specific type of block (i.e. ending a `Try`, `Catch`, or `Finally` block). The emulator then directs the control flow in the same way as you would expect these constructs to work in any procedural language that has them. From example code, it looks like `Finally2` is used when a `Catch` handler is specified, and `Finally1` is used when there is no `Catch`. The details remain unclear to me, though, and the emulator simply executes each `Finally` handler that is present. True to my plan, since the implementation works well in practice, I've had no reason to change the approach so far. It does work well enough to extract the password from sample **1** ... ¯\\_(ツ)_/¯. # Variable Assignment Edge Cases The last major issue I ran into were some edge cases for the `Assign` opcode, which is used to assign a value to a variable. A value can be encoded in an operand in 3 different ways: It can be an immediate, a reference to a variable, or a reference to an element of a container variable. In the last case, the index into the container can either be encoded as an integer immediate or as another variable reference. Now there are two kinds of variables that require special attention: Variables of type `Pointer` and containers (`Array`, `StaticArray`, or `Record`). A pointer is a variable that references another variable, and container types contain variables that can be accessed via an index. Note that containers can be nested, i.e. you can have a `Record` that contains a `Record` which contains a `StaticArray`. Based on testing against the listed reference samples, the following algorithm works for implementing `Assign`: - Obtain the value of the source operand. For a `Pointer`, its value is the value of the referenced variable. For a container, its value is the list of the values of all its members. - If the target operand is a pointer, assign the new value to the referenced variable. - If the target operand is a container, assert that the new value is a list and recursively assign the elements of that list to each member of the target container. Again, I cannot be certain that this matches the reference implementation exactly: It works well enough in practice, though, and I did have to revise the procedure several times before arriving at the above sequence of steps, which now works with all referenced samples. # Conclusion IFPS is horrible. And fun. But more importantly: When you have to solve a difficult problem, it is sometimes worth taking a step back and wondering whether there is a duality in the problem space that you can exploit to find an approach that has better qualities (i.e. shorter feedback loops): For myself, I am quite sure there are way more opportunities like this that I miss than those that I take. [BR]: https://github.com/binref/refinery/ [InnoSetup]: https://jrsoftware.org/isinfo.php [IFPSTools]: https://github.com/Wack0/IFPSTools.NET [PascalScript]: https://www.remobjects.com/ps.aspx [PascalScriptGH]: https://github.com/remobjects/pascalscript [PSAPI]: https://jrsoftware.org/ishelp/index.php?topic=scriptfunctions [PSRuntime]: https://github.com/remobjects/pascalscript/blob/master/Source/uPSRuntime.pas [BR56]: https://github.com/binref/refinery/issues/56 [BR58]: https://github.com/binref/refinery/issues/58 [BR70]: https://github.com/binref/refinery/issues/70 [BR74]: https://github.com/binref/refinery/issues/74 [GSM]: https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getsystemmetrics [CrackBlog]: https://medium.com/@gdesmar/crack-any-password-protected-innosetup-installer-5daabb52dbfb [squibbly]: https://github.com/Squiblydoo/ [gdesmar]: https://github.com/gdesmar [Malcat]: https://malcat.fr/ [MalShare]: https://www.malshare.com/ [FordFulkerson]: https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm [PushRelabel]: https://en.wikipedia.org/wiki/Push%E2%80%93relabel_maximum_flow_algorithm

Leave a Reply

Your email address will not be published. Required fields are marked *