This post is written for aspiring reverse engineers and will talk about a technique called _API hashing_. The technique is used by malware authors to hinder reverse engineering. We will first discuss the reasons a malware author may even consider using API hashing. Then we will cover the necessary technical details around resolving dynamic imports at load-time and at runtime and finally, will described API hashing and show a Python script that emulates the hashing method used in Sodinokibi/REvil ransomware.
# Top Down vs. Bottom Up
One way to think about different approaches of reverse engineering is _Top Down_ vs. _Bottom Up_. Top Down means that you start with the entry point(s) of a binary and follow the execution flow. Bottom Up means that you start at an *interesting location* of the binary and try to understand, if and how the execution flow will reach it. Such a location may for example be an interesting string embedded in the binary (
http://example.com
or s3cr3t
for example). So-called *imports* are also a good entry point for a Bottom Up analysis.
One strength of the Bottom Up approach is, that you can _focus_ your analysis efforts with very little understanding of the malware or - to phrase it differently - at a very early point in time: You want to know where the juice crypto stuff is happening? Look for calls to the Windows API function BCryptOpenAlgorithmProvider
. Want to know, where the payload, later dropped to %TEMP%\evil.exe
, comes from? Look for reference to the string \evil.exe
. Since this approach is so powerful, malware authors are highly motivated to make it harder for the reverse engineer to use it. One of these counter-measures is called "dynamic API resolutions with name hashing" and described in this blog post.
# API Resolution at Load-Time
Under Windows, imports are listed in the import address table (IAT) of the Portable Executable (PE) file. The table references functions that can be used during execution but are located in an external module. If a program wants to call PathCanonicalize
in SHLWAPI.dll
for example, the IAT of your executable contains an entry referencing that function. Say, you are looking for the command & control (C2) server of the malware, then you may want to look at the the functions referencing WININET.dll
and find the InternetConnect
entry. A string containing the C2 may be referenced in the the vacinity of calls to this function.
# API Resolution at Runtime
An obvious counter-measure against this reverse engineering approach is, to not use the function pointer from the IAT but resolve it at runtime. For this, the Windows API offers the functions LoadLibrary
and GetProcAddress
. The LoadLibrary
function accepts the name of a DLL and returns a handle to it. This handle can then be passed to GetProcAddress
together with a function name to get a pointer to the corresponding function:
#include <stdio.h>
#include <windows.h>
typedef BOOL(*funcType)(LPSTR, LPCSTR);
int main() {
char buf[MAX_PATH];
HMODULE hModule = LoadLibrary("SHLWAPI.dll");
funcType PathCanonicalizeA = (funcType)GetProcAddress(hModule, "PathCanonicalizeA");
if (! PathCanonicalizeA(buf, "A:\\path\\.\\somewhere\\..\\file.dat")) {
return 1;
}
printf("%s", buf);
FreeLibrary(hModule);
return 0;
}
Looking at references to PathCanonicalizeA
in the IAT will not lead to the call seen in the above code listing, defeating the technique described in _Static Imports_. For convenience reasons, one can now store the addresses retrieved by GetProcAddress
in global variables: If these global variables are named like the actual API functions, one wouldn't even need to change old code.
Switching back to the perspective of a reverse engineer, one can look for the _string_ PathCanonicalizeA
now. It needs to be passed to the GetProcAddress
function at some point. This, of course, can then be defeated by obfuscating the strings in the binary. But we will explore a different avenue in this post: The below-described method avoids inclusion of function names altogether. Instead, only a fix-sized hash of the function name will be present which has the additional benefit of saving on storage space. This is especially interesting when developing shellcode, where size restrictions may be tough and real.
# API Hashing
Let's again put ourselfs into the shoes of a malware author and let us further assume, we have a list of all exported function names for a given DLL. We can now calculate a hash for each function name. The hash doesn't need be cryptographically secure or even evenly distributed on the codomain. It only needs to be relatively collision-free on the set of all exported function names. It may become clearer soon, what that means exactly.
The following hashing function - which is used in a recent sample of the Sodinokibi/REvil ransomware - initializes a state with 0x2b
and then uses each character of the function name as argument for an arithmetic operation. It performs an arbitrary arithmetic operation in the end and finally returns the resulting state.
#!/usr/bin/env python3
def calc_hash(function_name):
ret = 0x2b
for c in function_name:
ret = ret * 0x10f + ord(c)
return (ret ^ 0xafb9) & 0x1fffff
The table below lists the resulting hash for a few functions exported from WINHTTP.dll
. One already may notice that all the hashes are different. In fact, they are pairwise-different for all functions exported from WINHTTP.dll
, so if we would only consider exports of that DLL and only want to use these eleven functions, the hash can be used to uniquly identify the API function.
API function name | Hash
----------------------------|------------
WinHttpSendRequest
| 0x1ce2a7
WinHttpSetOption
| 0x1a5e67
WinHttpReadData
| 0x064450
WinHttpCloseHandle
| 0x0fd1fe
WinHttpOpen
| 0x0a459a
WinHttpQueryDataAvailable
| 0x0b8299
WinHttpReceiveResponse
| 0x128d32
WinHttpConnect
| 0x105ed8
WinHttpQueryHeaders
| 0x09d98e
WinHttpOpenRequest
| 0x066635
WinHttpCrackUrl
| 0x0c17e7
You might have guessed, what the overall goal now is: somehow get a list of all API function names together with their addresses, calculate the hashes for each of their names and only use the hash when referring to them. This way, we would avoid inclusion of any string - obfuscated or not - that refers to the API function by name. This works as long as for each function we want to use, the hashing method is collision-free over all considered function names.
# Retrieving API function names
But how do we get a list of all exported functions, potentially for all kinds of DLLs? As already described in the _Static Imports_ section, every Portable Executable (PE) file, including DLLs, contain a table with all exported functions. And since all the data from a PE file is loaded into memory, these table is present in memory too. To be precise, the table is the first entry of the DataDirectory
array of the _Optional Header_ of the PE. The Optional Header is a structure of type IMAGE_OPTIONAL_HEADER
and part of another structure called _PE Header_, which is located at offset 0x18
. The PE Header is of type IMAGE_NT_HEADERS
and can be found when following the address at the very end of the _DOS Header_ (that is at offset 0x3c
). The DOS Header is of type IMAGE_DOS_HEADER
. To summarize this rambling the other way around:
* The _DOS Header_ is located at the beginning of a PE file.
* The _PE Header_ is referenced at position 0x3c
of the _DOS header_.
* The _Optional Header_ is part of the PE header at offset 0x18
.
* All _Data Directories_ are stored at offset 0x60
of the Optional Header.
* The _Export Directory_ is the first Data Directory.
If you prefere code over words, the following pseudo-C-snippet shows, how to retrieve the export directory starting from a pointer to a PE in memory:
IMAGE_DOS_HEADER pe_file = get_ptr_to_memory_containing_pe();
IMAGE_NT_HEADERS pe_header = *(pe_file + 0x3c);
IMAGE_OPTIONAL_HEADER optional_header = pe_header + 0x18;
DATA_DIRECTORY data_directories[16] = optional_header + 0x60;
IMAGE_EXPORT_DIRECTORY export_directory = data_directories[0];
**#lifehack**: offsets 0x3c
and 0x78
(which is the sum of the two offsets 0x18 + 0x60
from the listing above) appearing in assembly or decompiled code indicate that PE parsing is going on with the goal to retrieve exports from a PE file.
# Listing exports of DLLs
Let's get our hands dirty: In this section, we will briefly explain, how to easily collect API function names from Windows DLLs. This is useful independently of the malware family one analysis. In the section after, we will look at a concret sample of the Sodinokibi/REvil ransomware. With the help of a Python script, we will calculate API hashes for the collected function names and compare the resulting hashes with the content of a buffer embedded in the malware.
The following Python script accepts a list of DLLs on the command line. Each line of its output is the name of the DLL followed by a space and the name of an export of the DLL.
#!/usr/bin/env python3
import sys
import pefile
import glob
for arg in sys.argv[1:]:
for file_name in glob.glob(arg):
try:
with open(file_name, 'rb') as fp:
pe = pefile.PE(data=fp.read())
except pefile.PEFormatError:
continue
if not hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
continue
export = pe.DIRECTORY_ENTRY_EXPORT
dll_name = pe.get_string_at_rva(export.struct.Name)
if not dll_name:
continue
if len(export.symbols) == 0:
continue
for pe_export in export.symbols:
if not pe_export.name:
continue
print(dll_name.decode('utf-8'), pe_export.name.decode('utf-8'))
You can use the script to accumulate exports for as many DLLs as possible. We will collect the result in a file called exports.txt
.
# API Hashing in Sodinokibi/REvil Ransomware
Let's consider the sample with SHA-256 hash 5f56d5748940e4039053f85978074bde16d64bd5ba97f6f0026ba8172cb29e93
. It belongs to the Sodinokibi/REvil ransomware family and contains a build timestamp of 2019-06-10 15:29:32. Reverse engineering with Ghidra yields the buffer shown below in hex-encoded format.
ae91d3b60313b7aca7e29c9ff53a4b505bf85fc37c42ad39da426c2851aa6caeaad50b9f84573f3d142cbcc1c2dbbffc1a8fa8e55e0acfad7b0cd3660bbadb13be00086ad5a37fc9df836de99c6324096645c3fa13bb1280a2271e4d3a56893c2a3d80d022602a5b9b692f847df0d69a339b98f138dd80b7675e5a2306e11fda10dab6b091a51c2bce0763b8b30106ec1e781d4325a63c9d4529f543ccd96d347062c708ca3a735001a7129c93ef8ad4382a8c9550447139176daed28f463eaba15f19354530e98f7335cd5f7f00c4edfed1d6ac2f7e3545c30664eb63bacdd098a63accc961600bd672d5492cf09b9a91d526bfecb754dd51a1e9cb0a58bf325e8ac72d20328e58718cc8e6a5ac10c69ab529583e7f9dc014e6ab8c99253d4fccde71337fb46a8ff8714f1b81a137cb22b188db36848deee778e84373e6db8c5488e4e28945322f1ff9b39352dc28fb62c658c2b5a002cacd287742b87205384e2ae695faa649cc291b9e717d2ad340c47676c931699dd616e0ab8ac4ce68a4e82e47643d12bc9ccaf6609c9a45b738af230d49c26308cabef4079e4cf528d233b459933e71187534c99776e7155b7fe6655c0f5d45f42f2ccb85a19982b5ffe7118a3666a6c64b5bcb21ecbe7bb540031abcf7328d07f05afbf291e683596e75fac6903861938cd85eef237aabddc19e872238d3fd6397db896be34976e01c8ed9a0a469df6be43566141bd636645cddc57faf407ffc15e717cc6a6b107d2bdb8172ebc39aa2bd29b72c8ca85b13e4
Right after startup, the malware interprets it as an array of length 0x230
storing a DWORD
in each entry. Each DWORD
corresponds to an API function and Sodinokibi/REvil uses API hashing to resolve the corresponding addresses. The Python script listed in the _Appendix_ emulates the behaviour of the malware: it reads all exports from exports.txt
, calculate all hashes for all exports, and list each DWORD
of the buffer stored in buffer.bin
together with their API function name:
DLL Name | API Hash | Function Name
---------------|--------------|------------------------------
ADVAPI32.dll | 0x2b7d106b
| CheckTokenMembership
ADVAPI32.dll | 0x40b57bbe
| FreeSid
ADVAPI32.dll | 0x431d781e
| IsValidSid
ADVAPI32.dll | 0x43e878e7
| GetTokenInformation
ADVAPI32.dll | 0x45357e2f
| GetUserNameW
ADVAPI32.dll | 0x49d572d6
| OpenProcessToken
ADVAPI32.dll | 0x5b2a6022
| CryptAcquireContextW
ADVAPI32.dll | 0x8012bb13
| ImpersonateLoggedOnUser
ADVAPI32.dll | 0x8c2cb729
| RegCloseKey
ADVAPI32.dll | 0x8f6ab47f
| RevertToSelf
ADVAPI32.dll | 0x9c12a701
| RegOpenKeyExW
ADVAPI32.dll | 0x9d3ca625
| RegSetValueExW
ADVAPI32.dll | 0xc35ff85b
| RegQueryValueExW
ADVAPI32.dll | 0xd48aef93
| RegCreateKeyExW
ADVAPI32.dll | 0xda1fe106
| AllocateAndInitializeSid
ADVAPI32.dll | 0xe46bdf69
| CryptGenRandom
combase.dll | 0x39ad427c
| CreateStreamOnHGlobal
CRTDLL.dll | 0x958c2a38
| _snwprintf
CRYPT32.dll | 0x2dc78a5e
| CryptStringToBinaryW
CRYPT32.dll | 0xadcf0a5e
| CryptBinaryToStringW
GDI32.dll | 0x3371decc
| DeleteObject
GDI32.dll | 0x346dd9cc
| DeleteDC
GDI32.dll | 0x4bc6a666
| SetTextColor
GDI32.dll | 0x5829b59a
| SetBkColor
GDI32.dll | 0x6e5983e6
| SetPixel
GDI32.dll | 0x842f699b
| GetDeviceCaps
GDI32.dll | 0x8c936138
| CreateFontW
GDI32.dll | 0xab3e468f
| GetDIBits
GDI32.dll | 0xc1bc2c14
| GetObjectW
GDI32.dll | 0xd0803d2a
| SetBkMode
GDI32.dll | 0xeb6406c3
| CreateCompatibleBitmap
GDI32.dll | 0xec0601b3
| GetStockObject
GDI32.dll | 0xedc4007f
| SelectObject
GDI32.dll | 0xf7bc1a03
| CreateCompatibleDC
KERNEL32.dll | 0x08c76270
| MapViewOfFile
KERNEL32.dll | 0x0924639c
| DeleteFileW
KERNEL32.dll | 0x0b6061c9
| WaitForSingleObject
KERNEL32.dll | 0x0f5c65e6
| GetNativeSystemInfo
KERNEL32.dll | 0x13dbba0b
| timeBeginPeriod
KERNEL32.dll | 0x15fc7f40
| GetFileAttributesW
KERNEL32.dll | 0x1b4f71f8
| Process32NextW
KERNEL32.dll | 0x1ce07649
| GetVolumeInformationW
KERNEL32.dll | 0x286c42da
| CreateToolhelp32Snapshot
KERNEL32.dll | 0x2f324589
| UnmapViewOfFile
KERNEL32.dll | 0x2ff4455d
| FindClose
KERNEL32.dll | 0x32bf580a
| GetCommandLineW
KERNEL32.dll | 0x35195fa1
| GetFileSize
KERNEL32.dll | 0x3c89563a
| HeapCreate
KERNEL32.dll | 0x3d3f5784
| OpenMutexW
KERNEL32.dll | 0x40d32a7d
| SetErrorMode
KERNEL32.dll | 0x427728cd
| FindNextFileW
KERNEL32.dll | 0x43f52945
| CreateFileMappingW
KERNEL32.dll | 0x490d23af
| ExitProcess
KERNEL32.dll | 0x4d1e27a2
| SystemTimeToFileTime
KERNEL32.dll | 0x4f3d2599
| WriteFile
KERNEL32.dll | 0x504b3af5
| PostQueuedCompletionStatus
KERNEL32.dll | 0x50733aca
| CompareFileTime
KERNEL32.dll | 0x588e3220
| GetModuleFileNameW
KERNEL32.dll | 0x5c6436d6
| CreateMutexW
KERNEL32.dll | 0x5fcd3573
| OpenProcess
KERNEL32.dll | 0x66d30c7b
| GetDiskFreeSpaceExW
KERNEL32.dll | 0x6a0800be
| GetUserDefaultUILanguage
KERNEL32.dll | 0x719e1b29
| GetProcessHeap
KERNEL32.dll | 0x7f5b15e7
| GetDriveTypeW
KERNEL32.dll | 0x8aabe016
| FindFirstFileW
KERNEL32.dll | 0x8cabe614
| SetFileAttributesW
KERNEL32.dll | 0x8cdbe673
| MultiByteToWideChar
KERNEL32.dll | 0x90c6fa75
| Sleep
KERNEL32.dll | 0x91f2fb5a
| ReleaseMutex
KERNEL32.dll | 0x93b3f91f
| GetComputerNameW
KERNEL32.dll | 0x9763fdd3
| Process32FirstW
KERNEL32.dll | 0x9a9bf02c
| LocalAlloc
KERNEL32.dll | 0x9ad6f07d
| CreateFileW
KERNEL32.dll | 0x9c60f6ca
| GetSystemDefaultUILanguage
KERNEL32.dll | 0x9e07f4be
| GlobalAlloc
KERNEL32.dll | 0xa185cb2c
| CloseHandle
KERNEL32.dll | 0xa468cec4
| SetFilePointerEx
KERNEL32.dll | 0xaf7fc5dd
| GetSystemDirectoryW
KERNEL32.dll | 0xb0b6da10
| TerminateProcess
KERNEL32.dll | 0xb780dd38
| GetCurrentProcess
KERNEL32.dll | 0xbf26d591
| Wow64RevertWow64FsRedirection
KERNEL32.dll | 0xc1ddab7a
| GetProcAddress
KERNEL32.dll | 0xc610aca5
| GetQueuedCompletionStatus
KERNEL32.dll | 0xc97fa3d5
| LocalFree
KERNEL32.dll | 0xca02a0b5
| GetCurrentProcessId
KERNEL32.dll | 0xca0863c2
| timeGetTime
KERNEL32.dll | 0xcb37a181
| MulDiv
KERNEL32.dll | 0xcbe9a151
| Wow64DisableWow64FsRedirection
KERNEL32.dll | 0xcc3aa698
| CreateThread
KERNEL32.dll | 0xcc49a6fa
| GetTempPathW
KERNEL32.dll | 0xd0cdba63
| GlobalFree
KERNEL32.dll | 0xdb88b122
| GetFileSizeEx
KERNEL32.dll | 0xdd54b7ec
| VirtualAlloc
KERNEL32.dll | 0xe2e48854
| ReadFile
KERNEL32.dll | 0xe36b89db
| WideCharToMultiByte
KERNEL32.dll | 0xe5a88f1a
| HeapDestroy
KERNEL32.dll | 0xe6c88c71
| GetSystemInfo
KERNEL32.dll | 0xe96d83df
| GetFileAttributesExW
KERNEL32.dll | 0xeb7281db
| GetWindowsDirectoryW
KERNEL32.dll | 0xee8d8436
| MoveFileW
KERNEL32.dll | 0xf1989b33
| CreateIoCompletionPort
KERNELBASE.dll | 0x380572b8
| PathFindExtensionW
MPR.dll | 0x7518713e
| WNetEnumResourceW
MPR.dll | 0xae6caa51
| WNetCloseEnum
MPR.dll | 0xc258c662
| WNetOpenEnumW
ntdll.dll | 0x3822879e
| RtlFreeHeap
ntdll.dll | 0x7697c934
| RtlTimeToTimeFields
ntdll.dll | 0x8fe93045
| RtlDeleteCriticalSection
ntdll.dll | 0x95e62a4e
| NtOpenFile
ntdll.dll | 0xacb71303
| RtlGetLastWin32Error
ntdll.dll | 0xb86307ce
| RtlInitializeCriticalSection
ntdll.dll | 0xc09d7f3e
| NtClose
ntdll.dll | 0xc97676c4
| RtlEnterCriticalSection
ntdll.dll | 0xd2ae6d17
| RtlLeaveCriticalSection
ntdll.dll | 0xd69d6931
| RtlAllocateHeap
ntdll.dll | 0xe4135ba8
| NtQueryInformationFile
ntdll.dll | 0xfac34566
| RtlInitUnicodeString
rtm.dll | 0x6acc17e7
| EnumOverTable
SHCORE.dll | 0x2b1ca591
| CommandLineToArgvW
SHCORE.dll | 0x64472ee8
| SHDeleteValueW
SHCORE.dll | 0x9f0bd5aa
| SHDeleteKeyW
SHELL32.dll | 0x9cbc123d
| ShellExecuteExW
USER32.dll | 0x368a11e7
| GetForegroundWindow
USER32.dll | 0x9359b433
| GetDC
USER32.dll | 0xb6d391ae
| wsprintfW
USER32.dll | 0xbda29ac3
| GetKeyboardLayoutList
USER32.dll | 0xd228f54c
| SystemParametersInfoW
USER32.dll | 0xec21cb5b
| FillRect
USER32.dll | 0xfb28dc52
| DrawTextW
USER32.dll | 0xfcbfdbc2
| ReleaseDC
WINHTTP.dll | 0x1b146635
| WinHttpOpenRequest
WINHTTP.dll | 0x235a5e67
| WinHttpSetOption
WINHTTP.dll | 0x23ef5ed8
| WinHttpConnect
WINHTTP.dll | 0x38b7459a
| WinHttpOpen
WINHTTP.dll | 0x39714450
| WinHttpReadData
WINHTTP.dll | 0x9f9ce2a7
| WinHttpSendRequest
WINHTTP.dll | 0xa4a0d98e
| WinHttpQueryHeaders
WINHTTP.dll | 0xacd6d1fe
| WinHttpCloseHandle
WINHTTP.dll | 0xf0078d32
| WinHttpReceiveResponse
WINHTTP.dll | 0xffb58299
| WinHttpQueryDataAvailable
This data can now be used to deduce which functions are called and enable a Buttom Up approach again. Looking at the only references to WinHttpConnect
for example will probably lead to a C2 server.
# Appendix
#!/usr/bin/env python3.7
from revil import calc_hash
def chunks(l, n):
for i in range(0, len(l), n):
yield l[i:i + n]
def main():
exports = []
with open('exports.txt', 'r') as fp:
for line in fp:
sp = line.strip().split(' ')
if len(sp) != 2:
continue
exports.append(sp)
with open('buffer.bin', 'rb') as fp:
hash_buffer = fp.read()
resolutions = {}
for chunk in chunks(hash_buffer, 4):
api_hash = int.from_bytes(chunk, byteorder='little')
for dll_name, export_name in exports:
calculated_hash = calc_hash(export_name)
if calculated_hash == ( ( api_hash ^ 0x76c7 ) << 0x10 ^ api_hash ) & 0x1fffff:
if dll_name not in resolutions.keys():
resolutions[dll_name] = []
resolutions[dll_name].append((api_hash, export_name))
break
for dll_name, pairs in resolutions.items():
for api_hash, export_name in pairs:
print(F'{dll_name}\t0x{api_hash:08x}\t{export_name}')
if __name__ == '__main__':
main()
Update (2019-11-22): No mention of the term "static import" now because it doesn't make sense. Instead of "dynamic" vs. "static" imports, the post now talks about imports resolved at "load-time" vs. those resolved at "runtime".
4 Replies to “API-Hashing in the Sodinokibi/REvil Ransomware – Why and How?”