API-Hashing in the Sodinokibi/REvil Ransomware – Why and How?



This post is written for aspiring reverse engineers and will talk about a technique called _API hashing_. The technique is used by malware authors to hinder reverse engineering. We will first discuss the reasons a malware author may even consider using API hashing. Then we will cover the necessary technical details around resolving dynamic imports at load-time and at runtime and finally, will described API hashing and show a Python script that emulates the hashing method used in Sodinokibi/REvil ransomware. # Top Down vs. Bottom Up One way to think about different approaches of reverse engineering is _Top Down_ vs. _Bottom Up_. Top Down means that you start with the entry point(s) of a binary and follow the execution flow. Bottom Up means that you start at an *interesting location* of the binary and try to understand, if and how the execution flow will reach it. Such a location may for example be an interesting string embedded in the binary (http://example.com or s3cr3t for example). So-called *imports* are also a good entry point for a Bottom Up analysis. One strength of the Bottom Up approach is, that you can _focus_ your analysis efforts with very little understanding of the malware or - to phrase it differently - at a very early point in time: You want to know where the juice crypto stuff is happening? Look for calls to the Windows API function BCryptOpenAlgorithmProvider. Want to know, where the payload, later dropped to %TEMP%\evil.exe, comes from? Look for reference to the string \evil.exe. Since this approach is so powerful, malware authors are highly motivated to make it harder for the reverse engineer to use it. One of these counter-measures is called "dynamic API resolutions with name hashing" and described in this blog post. # API Resolution at Load-Time Under Windows, imports are listed in the import address table (IAT) of the Portable Executable (PE) file. The table references functions that can be used during execution but are located in an external module. If a program wants to call PathCanonicalize in SHLWAPI.dll for example, the IAT of your executable contains an entry referencing that function. Say, you are looking for the command & control (C2) server of the malware, then you may want to look at the the functions referencing WININET.dll and find the InternetConnect entry. A string containing the C2 may be referenced in the the vacinity of calls to this function. # API Resolution at Runtime An obvious counter-measure against this reverse engineering approach is, to not use the function pointer from the IAT but resolve it at runtime. For this, the Windows API offers the functions LoadLibrary and GetProcAddress. The LoadLibrary function accepts the name of a DLL and returns a handle to it. This handle can then be passed to GetProcAddress together with a function name to get a pointer to the corresponding function:
#include <stdio.h>
#include <windows.h>

typedef BOOL(*funcType)(LPSTR, LPCSTR);

int main() {
    char buf[MAX_PATH];
    HMODULE hModule = LoadLibrary("SHLWAPI.dll");
    funcType PathCanonicalizeA = (funcType)GetProcAddress(hModule, "PathCanonicalizeA");
    if (! PathCanonicalizeA(buf, "A:\\path\\.\\somewhere\\..\\file.dat")) {
        return 1;
    }
    printf("%s", buf);
    FreeLibrary(hModule);
    return 0;
}
Looking at references to PathCanonicalizeA in the IAT will not lead to the call seen in the above code listing, defeating the technique described in _Static Imports_. For convenience reasons, one can now store the addresses retrieved by GetProcAddress in global variables: If these global variables are named like the actual API functions, one wouldn't even need to change old code. Switching back to the perspective of a reverse engineer, one can look for the _string_ PathCanonicalizeA now. It needs to be passed to the GetProcAddress function at some point. This, of course, can then be defeated by obfuscating the strings in the binary. But we will explore a different avenue in this post: The below-described method avoids inclusion of function names altogether. Instead, only a fix-sized hash of the function name will be present which has the additional benefit of saving on storage space. This is especially interesting when developing shellcode, where size restrictions may be tough and real. # API Hashing Let's again put ourselfs into the shoes of a malware author and let us further assume, we have a list of all exported function names for a given DLL. We can now calculate a hash for each function name. The hash doesn't need be cryptographically secure or even evenly distributed on the codomain. It only needs to be relatively collision-free on the set of all exported function names. It may become clearer soon, what that means exactly. The following hashing function - which is used in a recent sample of the Sodinokibi/REvil ransomware - initializes a state with 0x2b and then uses each character of the function name as argument for an arithmetic operation. It performs an arbitrary arithmetic operation in the end and finally returns the resulting state.
#!/usr/bin/env python3
def calc_hash(function_name):
    ret = 0x2b
    for c in function_name:
        ret = ret * 0x10f + ord(c)
    return (ret ^ 0xafb9) & 0x1fffff
The table below lists the resulting hash for a few functions exported from WINHTTP.dll. One already may notice that all the hashes are different. In fact, they are pairwise-different for all functions exported from WINHTTP.dll, so if we would only consider exports of that DLL and only want to use these eleven functions, the hash can be used to uniquly identify the API function. API function name | Hash ----------------------------|------------ WinHttpSendRequest | 0x1ce2a7 WinHttpSetOption | 0x1a5e67 WinHttpReadData | 0x064450 WinHttpCloseHandle | 0x0fd1fe WinHttpOpen | 0x0a459a WinHttpQueryDataAvailable | 0x0b8299 WinHttpReceiveResponse | 0x128d32 WinHttpConnect | 0x105ed8 WinHttpQueryHeaders | 0x09d98e WinHttpOpenRequest | 0x066635 WinHttpCrackUrl | 0x0c17e7 You might have guessed, what the overall goal now is: somehow get a list of all API function names together with their addresses, calculate the hashes for each of their names and only use the hash when referring to them. This way, we would avoid inclusion of any string - obfuscated or not - that refers to the API function by name. This works as long as for each function we want to use, the hashing method is collision-free over all considered function names. # Retrieving API function names But how do we get a list of all exported functions, potentially for all kinds of DLLs? As already described in the _Static Imports_ section, every Portable Executable (PE) file, including DLLs, contain a table with all exported functions. And since all the data from a PE file is loaded into memory, these table is present in memory too. To be precise, the table is the first entry of the DataDirectory array of the _Optional Header_ of the PE. The Optional Header is a structure of type IMAGE_OPTIONAL_HEADER and part of another structure called _PE Header_, which is located at offset 0x18. The PE Header is of type IMAGE_NT_HEADERS and can be found when following the address at the very end of the _DOS Header_ (that is at offset 0x3c). The DOS Header is of type IMAGE_DOS_HEADER. To summarize this rambling the other way around: * The _DOS Header_ is located at the beginning of a PE file. * The _PE Header_ is referenced at position 0x3c of the _DOS header_. * The _Optional Header_ is part of the PE header at offset 0x18. * All _Data Directories_ are stored at offset 0x60 of the Optional Header. * The _Export Directory_ is the first Data Directory. If you prefere code over words, the following pseudo-C-snippet shows, how to retrieve the export directory starting from a pointer to a PE in memory:
IMAGE_DOS_HEADER pe_file = get_ptr_to_memory_containing_pe();
IMAGE_NT_HEADERS pe_header = *(pe_file + 0x3c);
IMAGE_OPTIONAL_HEADER optional_header = pe_header + 0x18;
DATA_DIRECTORY data_directories[16] = optional_header + 0x60;
IMAGE_EXPORT_DIRECTORY export_directory = data_directories[0];
**#lifehack**: offsets 0x3c and 0x78 (which is the sum of the two offsets 0x18 + 0x60 from the listing above) appearing in assembly or decompiled code indicate that PE parsing is going on with the goal to retrieve exports from a PE file. # Listing exports of DLLs Let's get our hands dirty: In this section, we will briefly explain, how to easily collect API function names from Windows DLLs. This is useful independently of the malware family one analysis. In the section after, we will look at a concret sample of the Sodinokibi/REvil ransomware. With the help of a Python script, we will calculate API hashes for the collected function names and compare the resulting hashes with the content of a buffer embedded in the malware. The following Python script accepts a list of DLLs on the command line. Each line of its output is the name of the DLL followed by a space and the name of an export of the DLL.
#!/usr/bin/env python3
import sys
import pefile
import glob

for arg in sys.argv[1:]:
    for file_name in glob.glob(arg):
        try:
            with open(file_name, 'rb') as fp:
                pe = pefile.PE(data=fp.read())
        except pefile.PEFormatError:
            continue

        if not hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
            continue
        export = pe.DIRECTORY_ENTRY_EXPORT
        dll_name = pe.get_string_at_rva(export.struct.Name)
        if not dll_name:
            continue
        if len(export.symbols) == 0:
            continue
        for pe_export in export.symbols:
            if not pe_export.name:
                continue
            print(dll_name.decode('utf-8'), pe_export.name.decode('utf-8'))
You can use the script to accumulate exports for as many DLLs as possible. We will collect the result in a file called exports.txt. # API Hashing in Sodinokibi/REvil Ransomware Let's consider the sample with SHA-256 hash 5f56d5748940e4039053f85978074bde16d64bd5ba97f6f0026ba8172cb29e93. It belongs to the Sodinokibi/REvil ransomware family and contains a build timestamp of 2019-06-10 15:29:32. Reverse engineering with Ghidra yields the buffer shown below in hex-encoded format.
ae91d3b60313b7aca7e29c9ff53a4b505bf85fc37c42ad39da426c2851aa6caeaad50b9f84573f3d142cbcc1c2dbbffc1a8fa8e55e0acfad7b0cd3660bbadb13be00086ad5a37fc9df836de99c6324096645c3fa13bb1280a2271e4d3a56893c2a3d80d022602a5b9b692f847df0d69a339b98f138dd80b7675e5a2306e11fda10dab6b091a51c2bce0763b8b30106ec1e781d4325a63c9d4529f543ccd96d347062c708ca3a735001a7129c93ef8ad4382a8c9550447139176daed28f463eaba15f19354530e98f7335cd5f7f00c4edfed1d6ac2f7e3545c30664eb63bacdd098a63accc961600bd672d5492cf09b9a91d526bfecb754dd51a1e9cb0a58bf325e8ac72d20328e58718cc8e6a5ac10c69ab529583e7f9dc014e6ab8c99253d4fccde71337fb46a8ff8714f1b81a137cb22b188db36848deee778e84373e6db8c5488e4e28945322f1ff9b39352dc28fb62c658c2b5a002cacd287742b87205384e2ae695faa649cc291b9e717d2ad340c47676c931699dd616e0ab8ac4ce68a4e82e47643d12bc9ccaf6609c9a45b738af230d49c26308cabef4079e4cf528d233b459933e71187534c99776e7155b7fe6655c0f5d45f42f2ccb85a19982b5ffe7118a3666a6c64b5bcb21ecbe7bb540031abcf7328d07f05afbf291e683596e75fac6903861938cd85eef237aabddc19e872238d3fd6397db896be34976e01c8ed9a0a469df6be43566141bd636645cddc57faf407ffc15e717cc6a6b107d2bdb8172ebc39aa2bd29b72c8ca85b13e4
Right after startup, the malware interprets it as an array of length 0x230 storing a DWORD in each entry. Each DWORD corresponds to an API function and Sodinokibi/REvil uses API hashing to resolve the corresponding addresses. The Python script listed in the _Appendix_ emulates the behaviour of the malware: it reads all exports from exports.txt, calculate all hashes for all exports, and list each DWORD of the buffer stored in buffer.bin together with their API function name: DLL Name | API Hash | Function Name ---------------|--------------|------------------------------ ADVAPI32.dll | 0x2b7d106b | CheckTokenMembership ADVAPI32.dll | 0x40b57bbe | FreeSid ADVAPI32.dll | 0x431d781e | IsValidSid ADVAPI32.dll | 0x43e878e7 | GetTokenInformation ADVAPI32.dll | 0x45357e2f | GetUserNameW ADVAPI32.dll | 0x49d572d6 | OpenProcessToken ADVAPI32.dll | 0x5b2a6022 | CryptAcquireContextW ADVAPI32.dll | 0x8012bb13 | ImpersonateLoggedOnUser ADVAPI32.dll | 0x8c2cb729 | RegCloseKey ADVAPI32.dll | 0x8f6ab47f | RevertToSelf ADVAPI32.dll | 0x9c12a701 | RegOpenKeyExW ADVAPI32.dll | 0x9d3ca625 | RegSetValueExW ADVAPI32.dll | 0xc35ff85b | RegQueryValueExW ADVAPI32.dll | 0xd48aef93 | RegCreateKeyExW ADVAPI32.dll | 0xda1fe106 | AllocateAndInitializeSid ADVAPI32.dll | 0xe46bdf69 | CryptGenRandom combase.dll | 0x39ad427c | CreateStreamOnHGlobal CRTDLL.dll | 0x958c2a38 | _snwprintf CRYPT32.dll | 0x2dc78a5e | CryptStringToBinaryW CRYPT32.dll | 0xadcf0a5e | CryptBinaryToStringW GDI32.dll | 0x3371decc | DeleteObject GDI32.dll | 0x346dd9cc | DeleteDC GDI32.dll | 0x4bc6a666 | SetTextColor GDI32.dll | 0x5829b59a | SetBkColor GDI32.dll | 0x6e5983e6 | SetPixel GDI32.dll | 0x842f699b | GetDeviceCaps GDI32.dll | 0x8c936138 | CreateFontW GDI32.dll | 0xab3e468f | GetDIBits GDI32.dll | 0xc1bc2c14 | GetObjectW GDI32.dll | 0xd0803d2a | SetBkMode GDI32.dll | 0xeb6406c3 | CreateCompatibleBitmap GDI32.dll | 0xec0601b3 | GetStockObject GDI32.dll | 0xedc4007f | SelectObject GDI32.dll | 0xf7bc1a03 | CreateCompatibleDC KERNEL32.dll | 0x08c76270 | MapViewOfFile KERNEL32.dll | 0x0924639c | DeleteFileW KERNEL32.dll | 0x0b6061c9 | WaitForSingleObject KERNEL32.dll | 0x0f5c65e6 | GetNativeSystemInfo KERNEL32.dll | 0x13dbba0b | timeBeginPeriod KERNEL32.dll | 0x15fc7f40 | GetFileAttributesW KERNEL32.dll | 0x1b4f71f8 | Process32NextW KERNEL32.dll | 0x1ce07649 | GetVolumeInformationW KERNEL32.dll | 0x286c42da | CreateToolhelp32Snapshot KERNEL32.dll | 0x2f324589 | UnmapViewOfFile KERNEL32.dll | 0x2ff4455d | FindClose KERNEL32.dll | 0x32bf580a | GetCommandLineW KERNEL32.dll | 0x35195fa1 | GetFileSize KERNEL32.dll | 0x3c89563a | HeapCreate KERNEL32.dll | 0x3d3f5784 | OpenMutexW KERNEL32.dll | 0x40d32a7d | SetErrorMode KERNEL32.dll | 0x427728cd | FindNextFileW KERNEL32.dll | 0x43f52945 | CreateFileMappingW KERNEL32.dll | 0x490d23af | ExitProcess KERNEL32.dll | 0x4d1e27a2 | SystemTimeToFileTime KERNEL32.dll | 0x4f3d2599 | WriteFile KERNEL32.dll | 0x504b3af5 | PostQueuedCompletionStatus KERNEL32.dll | 0x50733aca | CompareFileTime KERNEL32.dll | 0x588e3220 | GetModuleFileNameW KERNEL32.dll | 0x5c6436d6 | CreateMutexW KERNEL32.dll | 0x5fcd3573 | OpenProcess KERNEL32.dll | 0x66d30c7b | GetDiskFreeSpaceExW KERNEL32.dll | 0x6a0800be | GetUserDefaultUILanguage KERNEL32.dll | 0x719e1b29 | GetProcessHeap KERNEL32.dll | 0x7f5b15e7 | GetDriveTypeW KERNEL32.dll | 0x8aabe016 | FindFirstFileW KERNEL32.dll | 0x8cabe614 | SetFileAttributesW KERNEL32.dll | 0x8cdbe673 | MultiByteToWideChar KERNEL32.dll | 0x90c6fa75 | Sleep KERNEL32.dll | 0x91f2fb5a | ReleaseMutex KERNEL32.dll | 0x93b3f91f | GetComputerNameW KERNEL32.dll | 0x9763fdd3 | Process32FirstW KERNEL32.dll | 0x9a9bf02c | LocalAlloc KERNEL32.dll | 0x9ad6f07d | CreateFileW KERNEL32.dll | 0x9c60f6ca | GetSystemDefaultUILanguage KERNEL32.dll | 0x9e07f4be | GlobalAlloc KERNEL32.dll | 0xa185cb2c | CloseHandle KERNEL32.dll | 0xa468cec4 | SetFilePointerEx KERNEL32.dll | 0xaf7fc5dd | GetSystemDirectoryW KERNEL32.dll | 0xb0b6da10 | TerminateProcess KERNEL32.dll | 0xb780dd38 | GetCurrentProcess KERNEL32.dll | 0xbf26d591 | Wow64RevertWow64FsRedirection KERNEL32.dll | 0xc1ddab7a | GetProcAddress KERNEL32.dll | 0xc610aca5 | GetQueuedCompletionStatus KERNEL32.dll | 0xc97fa3d5 | LocalFree KERNEL32.dll | 0xca02a0b5 | GetCurrentProcessId KERNEL32.dll | 0xca0863c2 | timeGetTime KERNEL32.dll | 0xcb37a181 | MulDiv KERNEL32.dll | 0xcbe9a151 | Wow64DisableWow64FsRedirection KERNEL32.dll | 0xcc3aa698 | CreateThread KERNEL32.dll | 0xcc49a6fa | GetTempPathW KERNEL32.dll | 0xd0cdba63 | GlobalFree KERNEL32.dll | 0xdb88b122 | GetFileSizeEx KERNEL32.dll | 0xdd54b7ec | VirtualAlloc KERNEL32.dll | 0xe2e48854 | ReadFile KERNEL32.dll | 0xe36b89db | WideCharToMultiByte KERNEL32.dll | 0xe5a88f1a | HeapDestroy KERNEL32.dll | 0xe6c88c71 | GetSystemInfo KERNEL32.dll | 0xe96d83df | GetFileAttributesExW KERNEL32.dll | 0xeb7281db | GetWindowsDirectoryW KERNEL32.dll | 0xee8d8436 | MoveFileW KERNEL32.dll | 0xf1989b33 | CreateIoCompletionPort KERNELBASE.dll | 0x380572b8 | PathFindExtensionW MPR.dll | 0x7518713e | WNetEnumResourceW MPR.dll | 0xae6caa51 | WNetCloseEnum MPR.dll | 0xc258c662 | WNetOpenEnumW ntdll.dll | 0x3822879e | RtlFreeHeap ntdll.dll | 0x7697c934 | RtlTimeToTimeFields ntdll.dll | 0x8fe93045 | RtlDeleteCriticalSection ntdll.dll | 0x95e62a4e | NtOpenFile ntdll.dll | 0xacb71303 | RtlGetLastWin32Error ntdll.dll | 0xb86307ce | RtlInitializeCriticalSection ntdll.dll | 0xc09d7f3e | NtClose ntdll.dll | 0xc97676c4 | RtlEnterCriticalSection ntdll.dll | 0xd2ae6d17 | RtlLeaveCriticalSection ntdll.dll | 0xd69d6931 | RtlAllocateHeap ntdll.dll | 0xe4135ba8 | NtQueryInformationFile ntdll.dll | 0xfac34566 | RtlInitUnicodeString rtm.dll | 0x6acc17e7 | EnumOverTable SHCORE.dll | 0x2b1ca591 | CommandLineToArgvW SHCORE.dll | 0x64472ee8 | SHDeleteValueW SHCORE.dll | 0x9f0bd5aa | SHDeleteKeyW SHELL32.dll | 0x9cbc123d | ShellExecuteExW USER32.dll | 0x368a11e7 | GetForegroundWindow USER32.dll | 0x9359b433 | GetDC USER32.dll | 0xb6d391ae | wsprintfW USER32.dll | 0xbda29ac3 | GetKeyboardLayoutList USER32.dll | 0xd228f54c | SystemParametersInfoW USER32.dll | 0xec21cb5b | FillRect USER32.dll | 0xfb28dc52 | DrawTextW USER32.dll | 0xfcbfdbc2 | ReleaseDC WINHTTP.dll | 0x1b146635 | WinHttpOpenRequest WINHTTP.dll | 0x235a5e67 | WinHttpSetOption WINHTTP.dll | 0x23ef5ed8 | WinHttpConnect WINHTTP.dll | 0x38b7459a | WinHttpOpen WINHTTP.dll | 0x39714450 | WinHttpReadData WINHTTP.dll | 0x9f9ce2a7 | WinHttpSendRequest WINHTTP.dll | 0xa4a0d98e | WinHttpQueryHeaders WINHTTP.dll | 0xacd6d1fe | WinHttpCloseHandle WINHTTP.dll | 0xf0078d32 | WinHttpReceiveResponse WINHTTP.dll | 0xffb58299 | WinHttpQueryDataAvailable This data can now be used to deduce which functions are called and enable a Buttom Up approach again. Looking at the only references to WinHttpConnect for example will probably lead to a C2 server. # Appendix
#!/usr/bin/env python3.7
from revil import calc_hash


def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]


def main():
    exports = []
    with open('exports.txt', 'r') as fp:
        for line in fp:
            sp = line.strip().split(' ')
            if len(sp) != 2:
                continue
            exports.append(sp)

    with open('buffer.bin', 'rb') as fp:
        hash_buffer = fp.read()

    resolutions = {}
    for chunk in chunks(hash_buffer, 4):
        api_hash = int.from_bytes(chunk, byteorder='little')
        for dll_name, export_name in exports:
            calculated_hash = calc_hash(export_name)
            if calculated_hash == ( ( api_hash ^ 0x76c7 ) << 0x10 ^ api_hash ) & 0x1fffff:
                if dll_name not in resolutions.keys():
                    resolutions[dll_name] = []
                resolutions[dll_name].append((api_hash, export_name))
                break

    for dll_name, pairs in resolutions.items():
        for api_hash, export_name in pairs:
            print(F'{dll_name}\t0x{api_hash:08x}\t{export_name}')


if __name__ == '__main__':
    main()
Update (2019-11-22): No mention of the term "static import" now because it doesn't make sense. Instead of "dynamic" vs. "static" imports, the post now talks about imports resolved at "load-time" vs. those resolved at "runtime".

Tags: -

4 Replies to “API-Hashing in the Sodinokibi/REvil Ransomware – Why and How?”

Leave a Reply

Your email address will not be published. Required fields are marked *