Hello and welcome back to a new practical case: analyzing Office documents. In this case, we are going to start looking at how to analyze Microsoft Office documents.
To begin, we are going to take a look at the fundamentals of analyzing Office documents and Office attacks. Office attacks are just one of the many methods that malware authors currently use to infiltrate computers, and they generally follow one of three approaches.
What are the methods of attack via Office documents?
The first method is to use VBA scripts, or macros, which are embedded within the Office document itself and can run automatically when the document is opened.
Sub HelloWorld()
    MsgBox "Hello, World!"
End Sub
or another example:
' Document_Open runs when the document is opened (if macros are enabled)
Sub Document_Open()
    Dim strPrompt As String
    Dim strResponse As String
    strPrompt = "Please enter your name:"
    strResponse = InputBox(strPrompt)
    MsgBox "Welcome to websec.nl blog, " & strResponse
End Sub
The second method is to abuse built-in features of the Office programs, such as DDE (Dynamic Data Exchange), which can also automate certain tasks and execute certain commands without any macro. Note that VBA (Visual Basic for Applications), used in the first method, is a powerful scripting language: it can do almost anything that malware needs to do.
The third method is to use exploits that take advantage of vulnerabilities present in the Office programs themselves.
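To make the second method more concrete, here is a minimal Python sketch that looks for DDE field instructions in a Word document saved in the newer, archive-based format. It assumes that field instructions live in `w:instrText` elements of `word/document.xml`; the function name `dde_field_text` is ours, not part of any tool, and real samples may hide the field in other parts of the archive.

```python
import html
import io
import re
import zipfile

# Text content of field-instruction elements in the main document part.
INSTR_TEXT = re.compile(r"<w:instrText[^>]*>(.*?)</w:instrText>", re.DOTALL)
DDE_KEYWORD = re.compile(r"\bDDE(AUTO)?\b", re.IGNORECASE)


def dde_field_text(docx_bytes: bytes) -> str:
    """Return the joined field instructions if they mention DDE, else ''."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        if "word/document.xml" not in archive.namelist():
            return ""
        xml = archive.read("word/document.xml").decode("utf-8", errors="replace")
    # Word may split one instruction across several runs, so join the
    # fragments before searching for the keyword.
    joined = html.unescape("".join(INSTR_TEXT.findall(xml)))
    return joined if DDE_KEYWORD.search(joined) else ""
```

An empty result only means that this particular heuristic found nothing; it does not prove that the document is clean.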
The process of analyzing documents involves searching for particular keywords and indicators of potentially harmful components. These may include scripts, VBA macros, and other types of commands hidden inside the documents.
Therefore, if you come across any kind of document that makes you suspicious, you should be on the lookout for embedded files. These embedded files could be binary files, or they could be compressed, encoded, or obfuscated files within the Microsoft document.
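One simple way to look for embedded files is to scan the raw bytes for well-known file signatures. The sketch below is a rough illustration under our own assumptions: the signature list is deliberately short, the function name is hypothetical, and such a scan produces false positives. It is most useful on old-format files or on streams that have already been extracted, since the newer format is itself a ZIP archive full of ZIP headers.

```python
# Illustrative signatures only; real carving tools know many more.
SIGNATURES = {
    b"This program cannot be run in DOS mode": "PE executable (DOS stub text)",
    b"PK\x03\x04": "ZIP archive",
    bytes.fromhex("D0CF11E0A1B11AE1"): "OLE2 compound file",
    b"%PDF-": "PDF document",
}


def find_embedded_signatures(data: bytes) -> list[tuple[int, str]]:
    """Return sorted (offset, label) pairs for signatures found past offset 0."""
    hits = []
    for magic, label in SIGNATURES.items():
        start = 0
        while (pos := data.find(magic, start)) != -1:
            # A signature at offset 0 is the container itself, not an embedded file.
            if pos != 0:
                hits.append((pos, label))
            start = pos + 1
    return sorted(hits)
```

Each reported offset is only a candidate: the bytes at that position still have to be carved out and checked by hand or with a dedicated tool.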
The first thing you absolutely need to be aware of is that there are two different generations of Microsoft Office document formats.
The first is the older format, used by Office versions released before 2007. Microsoft documents saved in this format have the following extensions:
.doc for Microsoft Word,
.xls for Microsoft Excel, and
.ppt for Microsoft PowerPoint.
This storage format is also known as the structured storage format, or SSF (it is also called OLE2 or the Compound File Binary format).
With Office 2007, Microsoft introduced a new format, known as Office Open XML (OOXML). In this format, the Microsoft document itself is actually a ZIP archive containing XML files. Files in this format have extensions such as .docx, .docm and so on.
If you see .docm, it means that the document is macro-enabled and can carry macros. Documents saved in the .docx format, on the other hand, are not supposed to run macros: even if macro code is present, it cannot be activated under normal circumstances.
Malware creators, however, have devised ways to circumvent these defenses.
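Because a file extension can be changed freely, it is safer to identify the format from the content. The following Python sketch checks the leading bytes: the old format starts with the OLE2 signature, while the new one is a ZIP archive that contains a `[Content_Types].xml` part. The function name and the returned labels are our own choices for illustration.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # start of every OLE2 compound file
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header


def identify_office_format(data: bytes) -> str:
    """Classify raw file bytes as 'ole', 'ooxml', 'zip' or 'unknown'."""
    if data.startswith(OLE_MAGIC):
        return "ole"
    if data.startswith(ZIP_MAGIC) and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # OOXML packages describe their parts in [Content_Types].xml.
            if "[Content_Types].xml" in archive.namelist():
                return "ooxml"
        return "zip"
    return "unknown"
```

A file named report.docx that comes back as "ole" is worth a closer look, because the extension and the real format disagree.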
Microsoft Structured Storage Format (SSF)
Microsoft Office Structured Storage is a file format used to store compound documents, such as those created in Microsoft Office applications. It allows multiple streams of data to be stored within a single file, with each stream treated as a separate entity.
A compound document stored in Microsoft Office Structured Storage format consists of a root storage, which contains one or more streams and storages. Streams are used to store data, such as text or images, while storages are used to store additional streams and storages, creating a hierarchy within the compound document.
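The hierarchy of storages and streams is recorded in a directory, and the file header tells a parser where that directory starts. As a minimal sketch, the Python function below reads a few header fields using the offsets from the published compound file layout. The function name and the returned dictionary keys are our own, and a real parser (such as the one inside oletools) does far more validation.

```python
import struct

CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(data: bytes) -> dict:
    """Read basic fields from the 512-byte header of a compound file."""
    if len(data) < 512 or not data.startswith(CFB_MAGIC):
        raise ValueError("not a structured storage (compound) file")
    # Offset 24: minor version, major version, byte order mark,
    # sector shift and mini sector shift (five little-endian 16-bit values).
    _minor, major, _byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", data, 24)
    # Offset 48: sector number where the directory (storages and streams) begins.
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    sector_size = 1 << sector_shift
    return {
        "major_version": major,
        "sector_size": sector_size,
        "mini_sector_size": 1 << mini_shift,
        "first_directory_sector": first_dir_sector,
        # Sector 0 starts right after the header's own sector-sized block.
        "directory_offset": (first_dir_sector + 1) * sector_size,
    }
```

For a typical old-format document the sector size is 512 bytes, so the directory begins at (first directory sector + 1) * 512 bytes into the file.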
oletools is one of the essential and helpful tool sets for studying and analyzing the SSF format.
It is made up of a number of useful scripts. oletimes extracts timestamps, which is useful for gaining metadata about when the malicious document was created. olebrowse is used to view and extract streams.
We also have oleid, which checks for features associated with harmful behavior. It may be the first tool to use during triage, similar to how we use pdfid for triage when analyzing PDF documents.
And, of course, we also have the olevba tool, which is used for extracting VBA scripts. In addition, olevba can also be used on the newer, archive-based format introduced by Microsoft Office.
Office Open XML format
Next, we are going to take a look at the newer Microsoft Office format, which is called the Office Open XML format. As we have seen earlier in this post, this is an open, XML-based format associated with files that have the .docx extension. Inside, you will find an archive made up of multiple files, including XML files located in a variety of directories, as well as the word directory, which houses the main parts of the document.
If this archive contains a vbaProject.bin file, it means that the document contains VBA scripts, also known as macros. Under normal circumstances, vbaProject.bin will not be permitted to execute in a file with the .docx extension.
However, as we mentioned earlier, malware authors may have ways to overcome this limitation. Additionally, it is essential to keep in mind that vbaProject.bin is a binary format, so we cannot simply open it in a hex editor and read the macro code; we need a tool that understands the format.
The other format is .docm. When you encounter it, you can expect it to contain a vbaProject.bin file and, consequently, VBA macros.
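Checking for vbaProject.bin does not require unpacking the document by hand. A minimal Python sketch, assuming the macro part keeps its usual file name (the function name `find_vba_parts` is ours):

```python
import io
import zipfile


def find_vba_parts(ooxml_bytes: bytes) -> list[str]:
    """Return archive members whose name ends with vbaProject.bin (case-insensitive)."""
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as archive:
        return [
            name for name in archive.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]
```

This is a name-based heuristic only: an attacker can store the macro part under a different name, so an empty list is not a guarantee that the document has no macros, and a dedicated tool such as olevba remains the better check.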
The tools we use for analyzing Microsoft Office documents include metadata tools such as exiftool, which we have also used in the past for analyzing PDF files. We can also use yara for signature detection and olevba for analyzing and extracting VBA scripts from Office files.
So what is the workflow for analyzing Office documents?
When you are faced with an Office document, the first step is to determine whether it is in the old SSF format or in the new Office Open XML format. To do so, we determine the document type using our identification tools.
Second, we use tools to search for malicious indicators, such as keywords, files, structures, and filenames. If we discover any hidden or embedded files, we extract them and continue our investigation.
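For documents in the archive-based format, the extraction step can be sketched in a few lines of Python. The example below pulls every member that is not an XML or relationship part into memory, so that it can be hashed or scanned further. Treating "not XML" as "worth extracting" is our own simplification, and the function name is hypothetical.

```python
import io
import zipfile

TEXT_SUFFIXES = (".xml", ".rels")


def extract_binary_parts(ooxml_bytes: bytes) -> dict[str, bytes]:
    """Return {member name: content} for archive members that are not XML parts."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir() or info.filename.lower().endswith(TEXT_SUFFIXES):
                continue
            # Read into memory only; nothing is written to disk or executed.
            parts[info.filename] = archive.read(info)
    return parts
```

Keeping the extracted content in memory avoids accidentally dropping a live payload onto the analysis machine; writing it to disk should happen only inside an isolated environment.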
Let's analyze a malicious Office document.
Let's say we have a malicious document, baddoc.doc. First of all, use exiftool to look at its metadata.
In the output we can see the file modification timestamp and the create date. The metadata also suggests that the document is of Russian origin, and that it uses the old "Microsoft Word 97-2003" format.
So its format is the old Microsoft Structured Storage Format (SSF), not the Office Open XML format.
The next step is to scan the file with yara:
yara -w ~/malw/rules/index.yar baddoc.doc
yara reports that this document contains VBA macro code.
Let's continue analyzing this "bad" file with other tools, starting with oleid.
The file is not encrypted, and oleid once again confirms the hypothesis that this is a Microsoft Office 97-2003 file and that it contains a macro.
The next tool is oletimes.
As you can see, all timestamps are the same, which more or less confirms that this is the date on which the document was created.
You can also see a lot of storage and stream objects here, and the oletimes output shows that the document contains VBA.
In the next step, run olevba.
As you can see, there is a lot of information.
The result includes the VBA script, and one of the most important sections is a table that summarizes the entire analysis.
The table contains an AutoExec entry, which means that this document executes something when it is opened.
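The idea behind the AutoExec marker can be illustrated with a short Python sketch that searches extracted VBA source for procedures with well-known auto-run names. The list of names below is a small, illustrative subset chosen by us; olevba's own list is longer and its detection logic is more thorough.

```python
import re

# A few procedure names that Office runs automatically (illustrative subset).
AUTOEXEC_NAMES = (
    "AutoOpen", "AutoExec", "AutoClose", "Auto_Open", "Auto_Close",
    "Document_Open", "Document_Close", "Workbook_Open",
)

# Matches the name in "Sub Name", "Private Sub Name", "Function Name", ...
PROCEDURE = re.compile(
    r"^\s*(?:Public\s+|Private\s+)?(?:Sub|Function)\s+(\w+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_autoexec_procedures(vba_source: str) -> list[str]:
    """Return declared procedure names that match a known auto-run name."""
    wanted = {name.lower() for name in AUTOEXEC_NAMES}
    return [name for name in PROCEDURE.findall(vba_source) if name.lower() in wanted]
```

Applied to the Document_Open example from the beginning of this post, the function would report that procedure, while an ordinary helper such as HelloWorld would not be flagged.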
The Suspicious markers also give us a lot of information.
Please note that this section reports obfuscation. If we examine the VBA script once more, we see a great deal of code obfuscation, which is an attempt to make its analysis more difficult.
You will also see some file system paths; these could be locations where the macro writes things.
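One common obfuscation trick is to build strings character by character with Chr() calls joined by the & operator. As a hedged illustration (this is a generic technique, not necessarily the one used in baddoc.doc), the Python sketch below decodes such an expression. It only understands Chr-style calls with numeric arguments and quoted string literals; variables and other functions are ignored.

```python
import re

# Either a Chr/ChrW/ChrB call with a decimal argument, or a quoted VBA literal
# (inside a VBA literal, a doubled quote "" stands for one quote character).
TOKEN = re.compile(r'Chr[BW]?\$?\(\s*(\d+)\s*\)|"((?:[^"]|"")*)"', re.IGNORECASE)


def decode_concat(expression: str) -> str:
    """Decode a VBA expression such as: Chr(99) & Chr(109) & "d.exe"."""
    pieces = []
    for code, literal in TOKEN.findall(expression):
        if code:
            pieces.append(chr(int(code)))
        else:
            pieces.append(literal.replace('""', '"'))
    return "".join(pieces)
```

For example, `decode_concat('Chr(99) & Chr(109) & "d.e" & ChrW$(120) & "e"')` returns `cmd.exe`.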
To extract the VBA scripts from this document, we can use olevba with the -c option:
olevba -c baddoc.doc > baddoc.vba
The extracted script is now ready for further work, such as dynamic analysis in the future.
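Before any dynamic analysis, the extracted script can be searched for simple indicators such as URLs and Windows file paths. The following Python sketch shows one way to do that; the regular expressions are rough approximations of our own and will miss indicators that are obfuscated or built at run time.

```python
import re

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
# Drive-letter paths (C:\...) and environment-variable paths (%TEMP%\...).
WIN_PATH_PATTERN = re.compile(r"(?:[A-Za-z]:\\|%[A-Za-z]+%\\)[^\s\"'<>|*?]+")


def extract_indicators(vba_source: str) -> dict[str, list[str]]:
    """Collect unique URLs and Windows paths that appear literally in VBA source."""
    return {
        "urls": sorted(set(URL_PATTERN.findall(vba_source))),
        "paths": sorted(set(WIN_PATH_PATTERN.findall(vba_source))),
    }
```

To use it on the file produced above, read baddoc.vba as text (for example with `open("baddoc.vba", errors="replace").read()`) and pass the result to `extract_indicators`.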
Static malware analysis of Office documents is the process of examining the content of the document to detect malicious code. This type of analysis can be used to identify malicious macros, embedded executables, and other malicious content in the document.
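The first step in static malware analysis is to extract the content of the document. For the newer, archive-based format this can be done by unzipping the document and then examining the contents; for the older SSF format, tools such as olebrowse and olevba can extract the streams and scripts. During this process, malicious macros, embedded executables, and other malicious content can be identified.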
The next step is to analyze the content of the document. This is done by examining the macros and executables to determine the purpose of the code. If malicious code is detected, it can be blocked or removed from the document.
Finally, the document can be scanned for any malicious activity. This can be done by using a malware scanning tool to detect any malicious activity that may have been hidden in the document. If suspicious activity is detected, it can be blocked or removed from the document.
Static malware analysis of Office documents is an effective way of identifying malicious content: a thorough analysis of the document lets us find that content so that it can be blocked or removed.