Ghidra: YARA scanning



This blag post covers scanning the virtual memory of a binary loaded in Ghidra with YARA.

# What is YARA

YARA is the industry standard for signature matching on malware. If you don't know YARA: it's a simple pattern-matching language with some features tailored to searching binary data. Logical expressions over matches and the ability to match on parsed structures of some common executable formats are two examples of such features.
rule hello_yara {
    strings:
        $a = "Hello"
        $b = "Nullteilerfrei"
    condition:
        $a or $b
}
It is, of course, possible to write YARA signatures for a particular malware family. This was basically the approach of classical anti-virus solutions: write a signature for every piece of malware out there and notify the user if any of them matches. Nowadays there is a lot more going on, like heuristics and machine learning, but that will not be the focus of this blag post. Instead, I want to focus on the fact that you can also write relatively generic YARA signatures that match on a particular algorithm.

# Algorithm Matching

Having rules for different algorithms is especially useful if you are looking for a good entry point for a bottom-up analysis: if you are looking at ransomware, for example, and get a hit for AES (a popular symmetric encryption algorithm), starting your analysis there may yield fast results.

Writing rules for sequences of assembly instructions, to _really_ match on the algorithm, is not easy: compilers may rearrange instructions and use different registers or optimization techniques, resulting in vastly different instruction sequences even when starting off with the same source code. Hence writing YARA rules based on _data_ that an algorithm needs is a very attractive alternative.

Most software implementations of the AES algorithm, for example, use a so-called S-Box (substitution box). Generally, an S-Box is simply a scheme for substituting values with others (in the case of AES, in an invertible manner) with the intention to "further confuse" the data being encrypted. Oftentimes this S-Box is just included verbatim as an array in the source code, which nicely translates to a binary buffer in the compiled executable. [So head over to Wikipedia](https://en.wikipedia.org/wiki/Rijndael_S-box) and paste the S-Box into a YARA signature:
rule aes
{
    strings:
        $sbox = {
            63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
            ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
            b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
            04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
            09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
            53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
            d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
            51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
            cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
            60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
            e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
            e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
            ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
            70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
            e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
            8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16
        }
    condition:
        any of them
}
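To picture why such a data-based signature does not care about compilers or registers, here is a minimal Java sketch of what a hex-string match conceptually boils down to: looking for a fixed byte sequence somewhere in a buffer. The class and method names are made up for illustration, only the first row of the S-Box is used to keep it short, and YARA's real matching engine is considerably more sophisticated than this naive loop.

```java
final class BytePatternSearch {
    // First row of the AES S-Box, copied from the rule above.
    static final byte[] SBOX_PREFIX = {
        0x63, 0x7c, 0x77, 0x7b, (byte) 0xf2, 0x6b, 0x6f, (byte) 0xc5,
        0x30, 0x01, 0x67, 0x2b, (byte) 0xfe, (byte) 0xd7, (byte) 0xab, 0x76
    };

    // Returns the index of the first occurrence of needle in haystack, or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer; // mismatch, try the next start position
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical "binary": 5 zero bytes followed by the table prefix.
        byte[] image = new byte[5 + SBOX_PREFIX.length];
        System.arraycopy(SBOX_PREFIX, 0, image, 5, SBOX_PREFIX.length);
        System.out.println(indexOf(image, SBOX_PREFIX)); // prints 5
    }
}
```

However the compiler arranges the code that reads the table, the table bytes themselves end up unchanged in the binary, which is all this kind of search relies on.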
Other encryption, compression, or really any other sort of algorithm can have similar telltale constants that can be captured in YARA signatures. [rattle and I collect rules to match a ton of those on GitHub](https://github.com/nullteilerfrei/reversing-class/blob/master/yara/algos.yara). Contributions welcome.

# Ghidra Script

One of my default steps when starting analysis on a binary is to scan it with these rules and look at the matches. [To make this step easier during analysis with Ghidra, mich (0x6d696368) wrote a handy Python script](https://github.com/0x6d696368/ghidra_scripts/blob/master/YaraSearch.py). Since I am currently honing my Java skills, I re-implemented it as a Ghidra Java script. Nothing fancy to see here: it will ask you for a YARA rule file, dump each memory region to disk, scan those files, and show matches in Ghidra's console window. It will also add bookmarks and comments at the matching positions, which show up in the disassembly as well as in the decompiled output.
//Perform a YARA scan of the binary, show results, comment on those lines and add bookmarks
//@author larsborn
//@category Search
//@keybinding Ctrl+Alt+F9

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

import ghidra.app.script.GhidraScript;
import ghidra.program.model.address.*;
import ghidra.program.model.listing.CodeUnit;
import ghidra.program.model.listing.Instruction;
import ghidra.program.model.mem.MemoryAccessException;

public class YaraScan extends GhidraScript {
	String YARA_PATH = "C:\\bin\\yara.exe";

	private class YaraMatch {
		public String ruleName;
		public String stringName;
		public long offset;

		private YaraMatch(String ruleName, String stringName, long offset) {
			this.ruleName = ruleName;
			this.stringName = stringName;
			this.offset = offset;
		}
	}

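	private File dumpToTempFile(byte[] content) throws IOException {
		File tempFile = File.createTempFile("GhidraYaraScan-", ".bin");
		// Fallback cleanup in case the explicit delete() in run() is never reached
		tempFile.deleteOnExit();
		// try-with-resources closes the stream even if write() throws
		try (FileOutputStream fos = new FileOutputStream(tempFile)) {
			fos.write(content);
		}
		return tempFile;
	}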

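	// Parses a string-match line of `yara -s` output ("0x<offset>:$<name>: <data>")
	// and returns the offset; any other line yields an empty result
	private OptionalLong getStringMatchPosition(String line) {
		String prefix = getRuleName(line);
		if (prefix == null) {
			// Blank line or line without a space: nothing to parse
			return OptionalLong.empty();
		}
		String[] triple = prefix.split(":", 3);
		if (triple.length < 2) {
			return OptionalLong.empty();
		}
		try {
			long position = Long.decode(triple[0]);
			this.println(line);
			return OptionalLong.of(position);
		} catch (NumberFormatException e) {
			// First field is not a number, so this is not a string-match line
			return OptionalLong.empty();
		}
	}

	// Returns the string identifier (e.g. "$sbox") of a string-match line, or null
	private String getStringName(String line) {
		String prefix = getRuleName(line);
		if (prefix == null) {
			return null;
		}
		String[] triple = prefix.split(":", 3);
		if (triple.length < 2) {
			return null;
		}

		return triple[1];
	}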

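	// Returns the first space-separated token of a line: the rule name on a rule
	// line, or the "0x<offset>:$<name>:" prefix on a string-match line
	private String getRuleName(String line) {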
		String[] pair = line.split(" ", 2);
		if (pair.length != 2) {
			return null;
		}
		return pair[0];
	}

	private List<YaraMatch> scanFile(File file, String rulePath) {
		List<YaraMatch> ret = new ArrayList<YaraMatch>();
		try {
			// Pass the arguments as a list so that paths containing spaces are not split apart
			Process proc = new ProcessBuilder(YARA_PATH, "-s", rulePath, file.getAbsolutePath()).start();
			BufferedReader stdInput = new BufferedReader(new InputStreamReader(proc.getInputStream()));
			String line = null;
			String currentRule = null;
			while ((line = stdInput.readLine()) != null) {
				OptionalLong position = getStringMatchPosition(line);
				if (position.isPresent()) {
					if (currentRule == null) {
						continue;
					}
					ret.add(new YaraMatch(currentRule, getStringName(line), position.getAsLong()));
					continue;
				}

				String ruleName = getRuleName(line);
				if (ruleName != null) {
					currentRule = ruleName;
					continue;
				}
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
		return ret;
	}

	public void run() throws Exception {
		File rule = askFile("YARA Rules File", "Scan!");
		for (AddressRange addressRange : currentProgram.getMemory().getAddressRanges()) {
			try {
				File tmp = dumpToTempFile(getBytes(addressRange.getMinAddress(), (int) addressRange.getLength()));
				for (YaraMatch match : scanFile(tmp, rule.getAbsolutePath())) {
					long offset = addressRange.getMinAddress().getOffset() + match.offset;
					this.println(String.format("YARA Match %s: 0x%x (%s)", match.ruleName, offset, match.stringName));
					Address address = findNearestAssemblyInstructionBackwards(offset);
					setComment(address, String.format("YARA Match: %s (%s)", match.ruleName, match.stringName));
					createBookmark(toAddr(offset), "YARAMatch",
							String.format("%s (%s)", match.ruleName, match.stringName));
				}
				tmp.delete();
			} catch (MemoryAccessException e) {
				this.println(String.format("Cannot access memory in range 0x%x-0x%x",
						addressRange.getMinAddress().getOffset(), addressRange.getMaxAddress().getOffset()));
			}
		}
	}

	protected void setComment(Address address, String comment) {
		CodeUnit codeUnit = currentProgram.getListing().getCodeUnitAt(address);
		if (codeUnit == null) {
			this.println(String.format("WARNING: Could not find address for offset 0x%x", address.getOffset()));
			return;
		}
		codeUnit.setComment(CodeUnit.PLATE_COMMENT, comment);
		codeUnit.setComment(CodeUnit.PRE_COMMENT, comment);
	}

	short MAX_ASSEMBLY_INSTRUCTION_LENGTH = 15;

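	/**
	 * Searches backwards from the given offset for the nearest address at which
	 * an assembly instruction starts and returns that address; falls back to the
	 * offset itself if none is found within MAX_ASSEMBLY_INSTRUCTION_LENGTH bytes
	 */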
	protected Address findNearestAssemblyInstructionBackwards(long offset) {
		for (int i = 0; i <= MAX_ASSEMBLY_INSTRUCTION_LENGTH; i++) {
			Address addr = toAddr(offset - i);
			Instruction instruction = getInstructionAt(addr);
			// this.println(String.format("0x%x %s", offset - i, instruction));
			if (instruction != null) {
				return addr;
			}
		}
		return toAddr(offset);
	}
}
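The parsing logic of the script is a bit scattered across helper methods, so here is a compact, Ghidra-free Java sketch of the same idea that you can run on its own. It assumes the `yara -s` output format the script relies on (a rule line `<rule> <file>` followed by string-match lines `0x<offset>:$<name>: <data>`) and adds the start address of the dumped range to each file offset. The class name, the sample output, and the base address are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class YaraOutputParser {
    // One string match, with the file offset already translated to an address.
    static final class Match {
        final String rule;
        final String stringName;
        final long address;

        Match(String rule, String stringName, long address) {
            this.rule = rule;
            this.stringName = stringName;
            this.address = address;
        }
    }

    // Parses `yara -s` style lines; baseAddress is the start of the dumped range.
    static List<Match> parse(List<String> lines, long baseAddress) {
        List<Match> matches = new ArrayList<>();
        String currentRule = null;
        for (String line : lines) {
            if (line.startsWith("0x")) {
                // String-match line: "0x<offset>:$<name>: <matched data>"
                String[] parts = line.split(":", 3);
                if (parts.length < 3 || currentRule == null) {
                    continue; // malformed, or no rule line seen yet
                }
                long offset = Long.decode(parts[0]);
                matches.add(new Match(currentRule, parts[1], baseAddress + offset));
            } else if (!line.isEmpty()) {
                // Rule line: "<rule name> <scanned file>"
                currentRule = line.split(" ", 2)[0];
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical output for a dump of a range starting at 0x401000.
        List<String> output = List.of(
            "aes /tmp/GhidraYaraScan-1.bin",
            "0x2f40:$sbox: 63 7C 77 7B");
        for (Match m : parse(output, 0x401000L)) {
            // prints "aes $sbox 0x403f40"
            System.out.printf("%s %s 0x%x%n", m.rule, m.stringName, m.address);
        }
    }
}
```

Telling the two line types apart by the `0x` prefix works because YARA rule identifiers cannot start with a digit.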
Don't forget to adjust YARA_PATH at the top of the script so that it points to your YARA executable. Good hunting.
