Intro to static malware analysis

Static Malware Analysis

What is malware analysis?

Malware analysis is the practice of identifying and analyzing suspicious files on endpoints and within networks using dynamic analysis, static analysis, or full reverse engineering.

Types of malware analysis

Malware analysis is now a thriving field within information security. There are many malware analysis techniques, but they fall into two broad methods:

- static malware analysis

Static analysis is the process of dissecting malware without actually running it. It can be broken down further into two techniques:
- Basic: The analyst attempts to understand the malware by examining the file, its structure, its malicious functions, and so on.
- Advanced: The analyst goes deeper into the malware and studies its low-level (assembly) instructions.

This is often where the analyst disassembles the malware sample manually.

- dynamic malware analysis

Dynamic, or behavioral, analysis is performed by observing the behavior of the malware while it is actually running on a host system. It is usually performed in a sandbox environment to prevent the malware from infecting production (host) systems.

Where can I find samples to experiment with?

There are several online resources that can help you find malware samples to practice with and build your malware analysis skills.
Some of them are completely free and require no registration, while others require an email account.
Commercial resources are also available, but the free ones offer plenty of samples and should suffice for your purposes.

Please note that these resources are by no means exhaustive, but they should give you sufficient samples to work with:

- theZoo repository
- https://github.com/fabrimagic72/malware-samples
- ANY.RUN (http://app.any.run/): an interactive online malware analysis sandbox
- MalwareBazaar database

Many researchers who give talks or publish their work online also link to the samples they discuss, so their write-ups and videos are another place to look for samples.

Throughout this article, the term "sample" refers to any file being inspected or analyzed, rather than to "malware", because we must first assess the file (or sample) to determine whether it is malicious or benign.

If you are given a sample or collect one yourself, you should first understand as much as possible about it before running it. The more you know about your sample, the better and simpler its analysis will be.

Basic static analysis uses several techniques and methodologies, but none of them involves examining the code itself.

Basic static analysis

In this article, we'll go through basic static analysis in depth. Our approach is divided into four stages:

1. Identifying and classifying files
2. Examining
3. Examining the file format
4. Recognizing obfuscation

File identification and classification

What exactly is meant by the terms "file classification" and "file identification," and why are they so fundamental?

File identification is the process of determining the type of the file being examined and producing a unique "signature" for it, comparable to a person's social security number. With this unique identifier, you can share information about the sample with other parties and avoid re-analyzing a file you have already analyzed.

There are several methods for identifying files in a malware investigation, including the following:

1. Based on the file type:
- Portable Executable (PE) files
- PDF
- DOCX

2. Based on file hashes:
- MD5, SHA-1, SHA-256, and other common cryptographic hashes, which produce a fixed-length value for one exact file
- ImpHash (import hash)
- Fuzzy hashes

3. Based on the strings the file contains
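To make the ImpHash idea concrete: an import hash is an MD5 computed over the names of the DLLs and functions a PE file imports, so samples built from the same code base can share an imphash even when their file hashes differ. The minimal sketch below is modeled on the normalization the pefile library applies in its get_imphash method (lowercase names, strip the .dll/.ocx/.sys extension, keep import order), but treat it as an approximation: ordinal-only imports are ignored, and IMPORTS is a hypothetical, hand-written table rather than one parsed from a real file.

```python
import hashlib

# Hypothetical import table, written by hand for illustration:
# (DLL name, [imported function names]) in the order they appear in the PE.
IMPORTS = [
    ("KERNEL32.dll", ["CreateFileW", "WriteFile"]),
    ("USER32.dll", ["MessageBoxA"]),
]


def imphash(imports):
    """MD5 over normalized 'library.function' entries, kept in import order."""
    entries = []
    for dll, functions in imports:
        lib = dll.lower()
        # Strip the .dll/.ocx/.sys extension, e.g. "kernel32.dll" -> "kernel32"
        base, dot, ext = lib.rpartition(".")
        if dot and ext in ("dll", "ocx", "sys"):
            lib = base
        for func in functions:
            entries.append(f"{lib}.{func.lower()}")
    # The hashed string looks like "kernel32.createfilew,kernel32.writefile,..."
    return hashlib.md5(",".join(entries).encode()).hexdigest()


print(imphash(IMPORTS))
```

Because the entries are joined in import order, reordering the imports changes the imphash, while differences in letter case do not. For real files, pefile.PE(path).get_imphash() computes this value directly.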

Every file type has its own structure that describes its content, as the diagrams below illustrate.

Basically, the structure of a PE file looks like this:

And the structure of a PDF file looks like this:

hexdump

Let's examine both of these files in hexadecimal.
On Linux, the hexdump command displays the contents of files, or of standard input, in a human-readable form; the -C option selects the canonical hex-plus-ASCII display:

hexdump -C example1.exe

If we look at the EXE file, we can see that it has a specific structure, which we will go over in more depth later. Note, for example, that it begins with the bytes 4D 5A ("MZ"), the signature of the DOS header that every PE file starts with.

hexdump -C helloworld.pdf

Here we can see a sample view of the structure of a PDF file, which begins with the "%PDF" header followed by the version number.

hexdump -C pefile.png
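To demystify what hexdump -C prints, here is a small Python function that produces roughly the same canonical layout: an 8-digit hex offset, 16 bytes in two groups of eight, and the printable ASCII characters between pipes. It is an approximation written for this article (hexdump_c is a hypothetical name): unlike hexdump, it does not collapse repeated lines into "*" and does not print the trailing offset line. The sample input is the first 16 bytes of a typical PE file's DOS header.

```python
def hexdump_c(data: bytes) -> str:
    """Approximate the canonical `hexdump -C` layout: offset, hex bytes, ASCII column."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # Hex bytes in two groups of eight, padded so the ASCII column lines up
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        hex_part = f"{left:<23}  {right:<23}"
        # Printable ASCII is shown as-is, everything else as '.'
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  |{text}|")
    return "\n".join(lines)


# First 16 bytes of a typical PE file (the DOS header starts with "MZ")
print(hexdump_c(b"MZ\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00\xff\xff\x00\x00"))
```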

file

The file command is available in virtually all Linux and BSD distributions. It is built on the libmagic library, which identifies a file's type by matching its contents against file-structure signatures stored in a "magic database":

file helloworld.pdf

file example1.exe

Therefore, even if a file presents itself as a "png" (for example, through its extension), we must still follow the principle of "TRUST, BUT VERIFY."
We need to confirm that the file really is a PNG, and not an executable disguised as a PNG or an executable with a PNG icon attached to it.
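The core idea behind file and libmagic, judging a file by its leading bytes rather than by its name, can be sketched in a few lines of Python. This is a minimal illustration, not how libmagic is implemented: the signature table is a tiny hand-picked subset (MZ for PE executables, %PDF, the 8-byte PNG signature, and PK\x03\x04 for ZIP containers such as DOCX), the EXPECTED extension mapping and the function names are hypothetical, and real magic rules also test offsets other than 0.

```python
from pathlib import PurePath

# A handful of well-known magic numbers found at offset 0 (a tiny subset of libmagic's rules)
SIGNATURES = [
    (b"MZ", "executable"),          # DOS/PE executables
    (b"%PDF", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),  # the 8-byte PNG signature
    (b"PK\x03\x04", "zip"),         # ZIP containers, including DOCX
]

# Hypothetical mapping from file extension to the content type we expect
EXPECTED = {".exe": "executable", ".dll": "executable", ".pdf": "pdf",
            ".png": "png", ".docx": "zip"}


def identify(data: bytes) -> str:
    """Classify a file by its leading bytes, ignoring its name."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return "unknown"


def extension_matches(filename: str, data: bytes) -> bool:
    """Trust, but verify: does the content agree with the extension?"""
    claimed = EXPECTED.get(PurePath(filename).suffix.lower())
    return claimed is not None and claimed == identify(data)


# A file named like an image that actually starts with a PE header
fake_png = b"MZ\x90\x00" + b"\x00" * 60
print(identify(fake_png))                         # executable
print(extension_matches("pefile.png", fake_png))  # False
```

To check a real file, read its first few bytes with open(path, "rb") and pass them to extension_matches.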

Hashes

The next step is to classify the sample and determine whether it has previously been identified as malware.
We may also be able to determine whether the sample belongs to a known malware family or is linked to an APT (Advanced Persistent Threat) group.

One approach to classifying malware is by the sample's hash value.
The process of computing the hash value is called hashing, and the resulting string is called a hash.

A sample's hash serves as the file's signature.

You can therefore use this signature to search a database of known malware hashes and check whether the sample is already known.

This is one of the capabilities of services such as VirusTotal.

There are a variety of hashing functions used in malware analysis, but the following are the most common:

MD5
SHA-1
SHA-256

For example, to compute the MD5 hash, run:

md5sum locker.exe

For SHA-256, run:

sha256sum locker.exe

In these examples, I used Conti ransomware samples.
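The same hashes can be computed programmatically, which is handy when triaging many samples at once. The sketch below uses only Python's hashlib to produce MD5, SHA-1 and SHA-256 in a single pass over the file, reading fixed-size chunks so large samples need not fit in memory. KNOWN_BAD_SHA256 is a hypothetical local set standing in for a database of known malware hashes; it is not a VirusTotal API.

```python
import hashlib
import io


def compute_hashes(stream, chunk_size=65536):
    """Return MD5, SHA-1 and SHA-256 of a binary stream in a single pass."""
    hashers = [hashlib.md5(), hashlib.sha1(), hashlib.sha256()]
    # Read in fixed-size chunks so large samples need not fit in memory
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for h in hashers:
            h.update(chunk)
    return {h.name: h.hexdigest() for h in hashers}


# Hypothetical local set of known-malicious SHA-256 values (e.g. exported from a feed)
KNOWN_BAD_SHA256 = set()


def is_known_bad(hashes):
    return hashes["sha256"] in KNOWN_BAD_SHA256


# Demo on an in-memory file; for a real sample use open("locker.exe", "rb")
demo = compute_hashes(io.BytesIO(b"abc"))
print(demo["sha256"])  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(is_known_bad(demo))  # False
```

For a file on disk, the values returned by compute_hashes should match the output of md5sum, sha1sum and sha256sum.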

It is crucial to understand that even a single bit change in a file will result in an entirely different hash value.
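You can see this for yourself by flipping a single bit of some data and hashing both versions. The input below is an arbitrary example (the DOS-stub message found in many PE files); any data behaves the same way.

```python
import hashlib

original = b"This program cannot be run in DOS mode."
modified = bytearray(original)
modified[0] ^= 0x01  # flip one bit: 'T' (0x54) becomes 'U' (0x55)

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(bytes(modified)).hexdigest()
print(h1)
print(h2)

# Count positions where the two 64-character hex digests happen to agree
same = sum(a == b for a, b in zip(h1, h2))
print(f"{same} of 64 hex digits match")
```

Only a handful of hex digits should line up, and only by chance. The two digests are effectively unrelated.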

This is a concern with malware in particular, because a sample may modify itself for a specific target without changing its main goal, yet from a hashing perspective the result is a completely separate, unrelated file.

Malware that is capable of this kind of self-modification is known as polymorphic malware.

As Wikipedia puts it:

In computing, polymorphic code is code that uses a polymorphic engine to mutate while keeping the original algorithm intact.
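A toy model shows why this defeats exact-match hashes. Below, the same hypothetical payload is XOR-encoded with three different one-byte keys. This is a very crude stand-in for a polymorphic engine, since real engines also mutate the decoding stub itself. Every variant decodes to identical bytes, yet each one has an unrelated SHA-256 hash.

```python
import hashlib


def encode(payload: bytes, key: int) -> bytes:
    """Toy 'mutation': store a 1-byte key followed by the payload XORed with it."""
    return bytes([key]) + bytes(b ^ key for b in payload)


def decode(blob: bytes) -> bytes:
    """Recover the payload using the key stored in the first byte."""
    key = blob[0]
    return bytes(b ^ key for b in blob[1:])


payload = b"identical functionality"  # stand-in for the malware's real logic
variants = [encode(payload, key) for key in (0x11, 0x42, 0x7F)]

for v in variants:
    # Different bytes on disk, same behavior once decoded
    print(hashlib.sha256(v).hexdigest()[:16], decode(v) == payload)
```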

How do we handle polymorphic malware, where every variant has its own unique hash?

"Fuzzy Hashing" is the solution.

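Fuzzy hashing (also called context-triggered piecewise hashing, or CTPH) produces hashes that can be compared to measure how similar two files are. This makes it possible to link a new variant to known malicious samples even when its exact hash has never been seen before.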

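ssdeep (https://ssdeep-project.github.io) is a command-line utility that is widely used for fuzzy hashing in malware analysis and beyond. Here, the first command writes the fuzzy hash of foo.txt to hashes.txt (-b strips directory names from the output), and the second matches bar.txt against those known hashes (-m):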

ssdeep -b foo.txt > hashes.txt
ssdeep -b -m hashes.txt bar.txt

The number at the end of each output line is a match score between 0 and 100, a weighted indication of how similar the files are: the higher the number, the more similar the files.
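To illustrate the idea behind fuzzy hashing, here is a heavily simplified sketch of context-triggered piecewise hashing in pure Python. It is not ssdeep's algorithm and its output is not compatible with ssdeep. The trigger modulus is fixed at a hypothetical 64 instead of being derived from the file size, the rolling hash is a plain window sum, each chunk is reduced to one base64 character with an FNV-1 hash, and the score is a normalized Levenshtein distance rather than ssdeep's weighted comparison. What it preserves is the key property: chunk boundaries depend on the content, so a local change alters only a few characters of the signature.

```python
import random

WINDOW = 7        # rolling-window length (ssdeep also uses a 7-byte window)
BLOCK_SIZE = 64   # hypothetical fixed trigger modulus; ssdeep derives it from file size
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
FNV_OFFSET, FNV_PRIME = 0x811C9DC5, 0x01000193


def ctph(data: bytes) -> str:
    """Simplified piecewise hash: one base64 character per content-defined chunk."""
    sig = []
    chunk_hash = FNV_OFFSET
    window_sum = 0
    for i, byte in enumerate(data):
        # Rolling hash: plain sum of the last WINDOW bytes
        window_sum += byte
        if i >= WINDOW:
            window_sum -= data[i - WINDOW]
        # FNV-1 update of the current chunk's 32-bit hash
        chunk_hash = ((chunk_hash * FNV_PRIME) & 0xFFFFFFFF) ^ byte
        # A chunk ends wherever the content triggers, not at fixed offsets
        if window_sum % BLOCK_SIZE == BLOCK_SIZE - 1:
            sig.append(B64[chunk_hash % 64])
            chunk_hash = FNV_OFFSET
    sig.append(B64[chunk_hash % 64])  # last, possibly partial, chunk
    return "".join(sig)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(sig1: str, sig2: str) -> int:
    """Score from 0 (nothing in common) to 100 (identical signatures)."""
    longest = max(len(sig1), len(sig2), 1)
    return round(100 * (1 - edit_distance(sig1, sig2) / longest))


rng = random.Random(1)
original = bytes(rng.randrange(256) for _ in range(8192))
variant = bytearray(original)
variant[4000] ^= 0xFF  # a small, local change
unrelated = bytes(rng.randrange(256) for _ in range(8192))

s0, s1, s2 = ctph(original), ctph(bytes(variant)), ctph(unrelated)
print(similarity(s0, s1), similarity(s0, s2))
```

The lightly modified copy should score close to 100, while the unrelated random buffer should score far lower. The exact numbers depend on the random seed.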

Strings

As a malware analyst, you can use the strings extracted from a file in a variety of ways to determine what kind of sample you are working with.

Strings are an additional source of information about the sample under investigation.

Information that strings extracted during analysis can reveal includes, but is not limited to, the following:

1. The sample's internal/external messages
2. The functions that are referenced (invoked)
3. The sections the sample uses
4. The usage of IP addresses and/or domain names
5. Error handling and messages
6. Other names, keywords, and so on

The strings utility is part of the GNU binutils package. It scans a file from start to end for sequences of printable characters that look like conventionally encoded text, such as a run of human-readable characters terminated by a NULL byte (\x00). By default, it reports runs of at least 4 characters.
strings can be configured to report only longer strings, and it can also recognize several character encodings, including the 16-bit little-endian (UTF-16) encoding common on Windows (-e l).

To display only strings that are 8 or more characters long, use the following syntax:

strings -n 8 evil.exe
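The scanning logic described above can be approximated in Python with two regular expressions. One matches runs of printable ASCII bytes. The other matches the UTF-16LE pattern in which each ASCII-range character is followed by a zero byte. This is a simplified sketch, not a reimplementation of GNU strings, which has more options and edge cases. The IPV4_RE filter is an extra illustration of item 4 in the list above. The sample bytes are made up, with 198.51.100.7 taken from a documentation-only IP range.

```python
import re

PRINTABLE = rb"[\x20-\x7e\t]"  # printable ASCII plus tab


def extract_strings(data: bytes, min_len: int = 4):
    """Yield ASCII strings, then UTF-16LE strings, that are at least min_len characters long."""
    ascii_re = re.compile(PRINTABLE + rb"{%d,}" % min_len)
    # UTF-16LE text in the ASCII range: each character is followed by a 0x00 byte
    utf16_re = re.compile(rb"(?:" + PRINTABLE + rb"\x00){%d,}" % min_len)
    for m in ascii_re.finditer(data):
        yield m.group().decode("ascii")
    for m in utf16_re.finditer(data):
        yield m.group().decode("utf-16-le")


IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Made-up bytes: an API name, a URL with a documentation-range IP, and a UTF-16LE word
sample = (b"\x00\x01GetProcAddress\x00\x90\x90http://198.51.100.7/x\x00\x00"
          + "Hello".encode("utf-16-le"))
found = list(extract_strings(sample))
print(found)  # ['GetProcAddress', 'http://198.51.100.7/x', 'Hello']
print([s for s in found if IPV4_RE.search(s)])  # ['http://198.51.100.7/x']
```

Passing min_len=8 mirrors strings -n 8. With that setting, the 5-character UTF-16 word "Hello" is dropped from the results.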

Conclusion

As you can see, a variety of free and open-source tools are available for static analysis.

In the following parts of our blog, we will study and analyze various file formats, such as PE files, in detail. With each installment in this series, we'll look at more sophisticated cases and try to reverse engineer some of the most interesting malware varieties (including with dynamic analysis). We'll start with simple examples, of course.

References

- theZoo repository
- https://github.com/fabrimagic72/malware-samples
- ANY.RUN, an interactive online malware analysis sandbox: http://app.any.run/
- MalwareBazaar database
- MD5: https://en.wikipedia.org/wiki/MD5
- SHA-1: https://en.wikipedia.org/wiki/SHA-1
- SHA-2: https://en.wikipedia.org/wiki/SHA-2
- ssdeep: https://ssdeep-project.github.io