YARA Part 1: Malware analysis for detecting IOCs

Zhassulan Zhussupov
22 June, 2023

Over the last couple of years, a significant amount of research has gone into improving detection, and researchers have developed many new detection approaches and methodologies. Each comes with its own combination of benefits and drawbacks, so no single detection method can be favored over the others in every situation: some methods are better suited to one task than to another. This is why many enterprises deploy a hybrid solution.

Signature based detection uses a tool that looks for files containing a certain hash value or string. Anomaly based detection uses a tool that flags files based on statistics and patterns.

The main detection mechanisms out there are:

  • Signature based
  • Anomaly and Behavioral based (e.g. UEBA)
  • Reputation based
  • Hybrid based (a mix of more than one)

We won't go into detail about all of the detection mechanisms because doing so would go beyond the scope of this series of articles. If we had to recommend just one reference on the subject, it would be the book Applied Network Security Monitoring.

This series of blog posts discusses the signature based detection approach in depth. It covers the types of signatures that are available and the information you need to design your own signatures from scratch.

We will use the YARA tool to write our own signatures and to detect and classify malware samples with them.

Before we go into the various types of signatures and the YARA specifics, it is essential to understand that no matter which detection method you use, an indicator built on irrelevant information is a poor one. The quality of the results depends directly on the accuracy of the data fed into the detection process. If you lose sight of this, you will end up with a lot of false positives.

What is an IOC?

An indicator of compromise (IOC) is any forensic data discovered on a network or host that can be used with a high degree of certainty to identify an intrusion, such as a file hash or an IP address.

As you may recall from our hashing tutorial, each file has its own unique hash value (with the exception of rare "collisions").

This means that if we search a host for malware with a particular hash and find it, we have confirmation that the host is infected with the same malware.

Types of IOCs: Network based indicators

Network based indicators are:

  • IP Addresses (IPv4 or IPv6)
  • URLs
  • Domain Names
  • Source Email Addresses
  • Email Message Objects
  • Email Attachments
  • X-Originating and X-Forwarding IP Addresses
  • X509 Certificate Hashes etc.

Types of IOCs: Host based indicators

Host based indicators are:

  • File Names
  • File Hashes
  • File Locations (paths on host)
  • DLLs used
  • Registry Keys
  • Process Handle or Mutex Name

Why did we switch from signatures and detection techniques to indicators of compromise? What are IOCs and what is their relationship to signatures and everything else?

A signature may contain one or multiple IOCs.

It is possible to create a signature that searches for malware based on its hash value. However, what if the hash value of the sample is different from the one you are looking for, for example because a single byte of the file was changed? In that case, your signature has been circumvented.

Consequently, signatures with a high level of granularity are superior. If we use a single IOC, we may be able to identify the sample, but if we combine multiple IOCs and perform the search, we will have a greater chance of detecting the sample and fewer false positives.
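
To make the idea of combining IOCs concrete, here is a minimal sketch in YARA syntax (the tool itself is introduced in the next section). Every value in it (the mutex name, the file path, and the domain) is a hypothetical placeholder and not taken from a real sample.

```yara
rule combined_iocs_example
{
    strings:
        // All three indicators are hypothetical placeholders.
        $mutex  = "Global\\ExampleMutex_1234"
        $path   = "C:\\Users\\Public\\example_dropper.exe"
        $domain = "c2.example.com"

    condition:
        // Require at least two indicators, so a single coincidental
        // string is not enough to trigger a match.
        2 of them
}
```

Requiring two of the three indicators keeps the rule resilient if one indicator changes, while a single coincidental hit does not produce a false positive.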


YARA is a tool that helps malware researchers identify and classify malware samples.

Thanks to Victor Alvarez of VirusTotal for developing YARA and providing the community with a terrific tool for locating and categorizing malware samples.

YARA is also a language used to describe the malware you're searching for. Every description is known as a rule. Each rule is composed of a set of strings and a boolean expression that determines its logic.

rule silent_banker : banker
{
    meta:
        description = "This is just an example"
        threat_level = 3
        in_the_wild = true

    strings:
        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
        $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
        $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"

    condition:
        $a or $b or $c
}

Another example:

rule pe_file
{
    meta:
        description = "Rule for PE files"
    strings:
        $mz = {4d 5a}
        $mz1 = "MZ"
    condition:
        ($mz at 0x00) or ($mz1 at 0)
}

When we run the YARA tool with this rule against a path, the "malware" is identified:

yara pe_file.yar <path>


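Our rule matched two files. Note that this particular rule only checks for the MZ header, so it identifies PE files in general; classifying samples as members of the same malware family requires more specific indicators.

Don't worry if you did not fully understand the previous rule or its application.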

In subsequent malware analysis articles, we will often use YARA rules to classify or identify malware, and these examples are a good demonstration of how to apply this knowledge in practice, since YARA rules are a must-have for any professional malware analysis report.

YARA Core Syntax

Each rule begins with the keyword rule, followed by a name. When the rule matches, its name is printed. Every rule has a condition, which is evaluated to decide whether the rule matches.
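
A minimal sketch of the smallest valid rule follows. The rule name is a hypothetical placeholder; the condition is simply the constant true, so this rule matches every file it is run against.

```yara
rule minimal_example
{
    condition:
        // A constant condition: the rule matches any scanned file.
        true
}
```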


Rule names may contain alphanumeric and underscore characters, but the first character cannot be a digit. Rule names are case sensitive.

As you write your rules, you can tag them so that you can subsequently filter your output based on the tags you create:
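
As a sketch, tags are listed after the rule name, separated from it by a colon. The rule name, tag names, and string below are hypothetical placeholders.

```yara
rule tagged_example : banker trojan
{
    strings:
        $s = "example marker"   // hypothetical string

    condition:
        $s
}
```

With the command-line tool, the -t option restricts the output to rules carrying a given tag, for example: yara -t banker rules.yar <path>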


Both the meta and strings sections we saw previously, in the example rule, are optional.

You can write YARA rules in any text editor. The conventional file extension is .yar, as in malware.yar.

The metadata section is used to add metadata about your rule.

This section is there to document a rule, for example with a description, an author, or a threat level. Identifiers defined in the metadata section cannot be used in the condition section; they exist only to add metadata to your rule, and referencing them in the condition will result in an error.
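
A short sketch of a metadata section follows. The identifiers and values are hypothetical; metadata values can be strings, integers, or booleans, and the identifier names are free-form.

```yara
rule meta_example
{
    meta:
        // Free-form key/value pairs; none of them affect matching.
        author      = "analyst"              // hypothetical value
        description = "Demonstrates the meta section"
        severity    = 2
        reviewed    = false

    strings:
        $s = "example marker"                // hypothetical string

    condition:
        // Only string identifiers may be referenced here, not meta keys.
        $s
}
```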

If your rules are searching for strings, you will also need this section: strings.

Each string has an identifier, which can be a single character or a longer name, but it must begin with the dollar sign $.
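
As a sketch, a text string definition pairs a $ identifier with a quoted value. The identifier and the text below are hypothetical placeholders.

```yara
rule text_string_example
{
    strings:
        // $marker is the identifier; the quoted text is what is searched for.
        $marker = "this program is malicious"

    condition:
        $marker
}
```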


Text strings are written as characters between double quotes. To search for raw bytes, the strings section also accepts hexadecimal strings, written as byte values between curly braces.
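
The following sketch defines the same hypothetical file name twice, once as a text string and once as the equivalent hexadecimal string, to show that both forms describe the same bytes.

```yara
rule hex_string_example
{
    strings:
        $text = "evil.exe"                       // hypothetical file name
        $hex  = { 65 76 69 6C 2E 65 78 65 }      // the same bytes in hexadecimal

    condition:
        $text or $hex
}
```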


Hexadecimal strings are also useful when you want to look for a sequence of bytes in which some bytes change from one sample to the next. The question mark ? can be used to ignore the value of those bytes and accept any value, as long as the rest of the sequence matches our rule. For example, let's say we want to search for a sequence whose last two bytes differ in each sample, so we cannot search for an exact match.

Here is an example of three different variations: [figure: example5]

Each ?? stands for one byte of any value, so we can use two of them to accept any value for the last two bytes.
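
A minimal sketch of the wildcard syntax follows. The byte values are invented for illustration; the three variations listed in the comments are hypothetical stand-ins, not bytes from real samples.

```yara
rule wildcard_example
{
    strings:
        // Hypothetical variations seen across samples:
        //   E2 34 A1 C8 23 FB 10 7A
        //   E2 34 A1 C8 23 FB 9C 01
        //   E2 34 A1 C8 23 FB 55 E0
        // The first six bytes are fixed; the last two may hold any value.
        $seq = { E2 34 A1 C8 23 FB ?? ?? }

    condition:
        $seq
}
```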


But wait, what if the part that varies is also of variable length, say from 2 to 5 bytes? What changes do we need to make to our rule to accommodate this? The good news is that YARA supports variable-length wildcards, called jumps.

A jump is written as a range in square brackets, which defines how many bytes of arbitrary value may appear at that position.
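
As a sketch, the jump below accepts between 2 and 5 arbitrary bytes between two fixed parts of the sequence. The surrounding byte values are hypothetical.

```yara
rule jump_example
{
    strings:
        // F4 23, then any 2 to 5 bytes, then 62 B4.
        $seq = { F4 23 [2-5] 62 B4 }

    condition:
        $seq
}
```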


What if we're looking for very particular values in those bytes? Assume that the only options are 54 CD or 84 CA. YARA lets us list such alternatives in parentheses, separated by a pipe character.
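
A minimal sketch of a hex string with alternatives follows. Only the two alternatives 54 CD and 84 CA come from the text; the fixed bytes around them are hypothetical.

```yara
rule alternatives_example
{
    strings:
        // Matches F4 23 54 CD 62 B4 or F4 23 84 CA 62 B4, and nothing else.
        $seq = { F4 23 ( 54 CD | 84 CA ) 62 B4 }

    condition:
        $seq
}
```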


We saw a string definition for portable executable (PE) files in the earlier pe_file example, where the condition checked the string at a given offset. Because PE files always begin with MZ at offset zero, the at operator lets a rule match only when the string is found at that position.
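
The sketch below isolates the offset check. The rule name is a hypothetical placeholder; the bytes 4D 5A are the ASCII characters MZ.

```yara
rule mz_at_offset_zero
{
    strings:
        $mz = { 4D 5A }     // "MZ"

    condition:
        // True only if the bytes appear at the very start of the file;
        // an MZ sequence elsewhere in the file does not count.
        $mz at 0
}
```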


We could also search for ASCII strings in the samples:
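
As a sketch, a text string can carry the ascii modifier explicitly. Text strings are treated as ASCII by default, so the modifier mainly documents intent. The command string is a hypothetical placeholder.

```yara
rule ascii_string_example
{
    strings:
        // The ascii modifier is the default behaviour for text strings.
        $cmd = "cmd.exe /c" ascii

    condition:
        $cmd
}
```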


Case-insensitive matching:
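
A minimal sketch using the nocase modifier follows; the searched word is a hypothetical example.

```yara
rule nocase_example
{
    strings:
        // Matches "powershell", "PowerShell", "POWERSHELL", and so on.
        $ps = "powershell" nocase

    condition:
        $ps
}
```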


Also, you can use regular expressions with YARA. Understanding regex is beyond the scope of this blog article.
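
As a brief sketch, a regular expression is written between forward slashes instead of double quotes. The pattern below, which looks for the text md5: followed by 32 hexadecimal characters, is only an illustration.

```yara
rule regex_example
{
    strings:
        // "md5: " followed by exactly 32 hexadecimal characters.
        $md5 = /md5: [0-9a-fA-F]{32}/

    condition:
        $md5
}
```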


In this post, we examined the fundamentals of IOCs and how to perform static malware analysis using YARA rules. In the next installment of this series, we will examine how to evaluate malware using YARA's conditions and modules.

awesome yara
Practical Malware Analysis Book

Applied Network Security Monitoring Book

Authored By
Zhassulan Zhussupov

Cybersecurity enthusiast | Author | Speaker | CTF player | R&D Engineer | Jiu-Jitsu Practitioner
