Office file analysis

Introduction

Microsoft has created dozens of Office document file formats, many of which are popular vehicles for phishing and malware delivery because they can contain macros (VBA scripts).
Broadly speaking, there are two generations of Office file format: the OLE formats (file extensions such as DOC, XLS, and PPT, with RTF usually grouped alongside them even though it is a text-based format that can embed OLE objects), and the "Office Open XML" (OOXML) formats (file extensions such as DOCX, XLSX, and PPTX). Both generations are structured container formats that support linked or embedded content (objects). The OLE formats are compound file binary formats, whereas OOXML files are ZIP containers, so one of the easiest ways to check for hidden data is simply to unzip the document:
    $ unzip example.docx
    Archive: example.docx
      inflating: [Content_Types].xml
      inflating: _rels/.rels
      inflating: word/_rels/document.xml.rels
      inflating: word/document.xml
      inflating: word/theme/theme1.xml
     extracting: docProps/thumbnail.jpeg
      inflating: word/comments.xml
      inflating: word/settings.xml
      inflating: word/fontTable.xml
      inflating: word/styles.xml
      inflating: word/stylesWithEffects.xml
      inflating: docProps/app.xml
      inflating: docProps/core.xml
      inflating: word/webSettings.xml
      inflating: word/numbering.xml
    $ tree
    .
    ├── [Content_Types].xml
    ├── _rels
    ├── docProps
    │   ├── app.xml
    │   ├── core.xml
    │   └── thumbnail.jpeg
    └── word
        ├── _rels
        │   └── document.xml.rels
        ├── comments.xml
        ├── document.xml
        ├── fontTable.xml
        ├── numbering.xml
        ├── settings.xml
        ├── styles.xml
        ├── stylesWithEffects.xml
        ├── theme
        │   └── theme1.xml
        └── webSettings.xml
As you can see, part of the structure comes from the file and folder hierarchy; the rest is specified inside the XML files. The paper "New Steganographic Techniques for the OOXML File Format" (2011) describes some data-hiding techniques, but CTF challenge authors will always come up with new ones.
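Because the package declares its own parts, one quick heuristic is to look for ZIP members that `[Content_Types].xml` does not account for. The sketch below uses only the Python standard library; the function name `undeclared_parts` is hypothetical, and a missing declaration is only a hint that something was added by hand, not proof of hidden data.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by [Content_Types].xml in OOXML packages.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def undeclared_parts(docx):
    """Return ZIP members that [Content_Types].xml does not account for.

    `docx` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]
        root = ET.fromstring(zf.read("[Content_Types].xml"))

    # <Default Extension="xml" .../> covers every part with that extension.
    default_exts = {
        el.get("Extension").lower()
        for el in root.iter(CT_NS + "Default")
        if el.get("Extension")
    }
    # <Override PartName="/word/document.xml" .../> covers one specific part.
    overrides = {
        el.get("PartName").lstrip("/")
        for el in root.iter(CT_NS + "Override")
        if el.get("PartName")
    }

    suspicious = []
    for name in names:
        if name == "[Content_Types].xml":
            continue
        base = name.rsplit("/", 1)[-1]
        ext = base.rsplit(".", 1)[-1].lower() if "." in base else ""
        if name not in overrides and ext not in default_exts:
            suspicious.append(name)
    return suspicious
```

Calling `undeclared_parts("example.docx")` returns a list of member names worth a closer look. Data hidden inside a declared part, such as extra XML in `word/document.xml`, will not show up this way.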
A Python toolset exists for examining and analyzing OLE and OOXML documents: oletools. For OOXML documents in particular, OfficeDissector is a very powerful analysis framework (and Python library), and it ships with a quick guide to its usage.
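Before picking a tool, it can help to confirm which container you are actually holding, since file extensions are easy to fake. The following is a minimal sketch that checks the well-known leading bytes of each format; the function names are hypothetical. Note that a ZIP signature only tells you the file is an archive, not that it is a valid OOXML document.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary header
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header
RTF_MAGIC = b"{\\rtf"                          # RTF documents start with "{\rtf"


def guess_container(header: bytes) -> str:
    """Classify a file by its first bytes: 'ole', 'zip', 'rtf' or 'unknown'."""
    if header.startswith(OLE_MAGIC):
        return "ole"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def guess_container_from_file(path: str) -> str:
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return guess_container(fh.read(8))
```

A result of `"ole"` suggests an OLE document, while `"zip"` suggests trying the unzip approach shown earlier.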
Sometimes the challenge is not to find hidden static data, but to analyze a VBA macro to determine its behavior. This is a more realistic scenario, and one that analysts in the field perform every day. The tools mentioned above can indicate whether a macro is present, and can usually extract it for you.

A typical malicious VBA macro in an Office document, on Windows, downloads a PowerShell script to %TEMP% and attempts to execute it, in which case you now have a PowerShell script analysis task too. Malicious VBA macros are rarely complicated, since VBA is typically used only as a jumping-off point to bootstrap code execution.

If you do need to understand a complicated VBA macro, or if the macro is obfuscated and has an unpacker routine, you don't need a Microsoft Office license to debug it. You can use LibreOffice: its interface will be familiar to anyone who has debugged a program. You can set breakpoints, create watch variables, and capture values after they have been unpacked but before any payload behavior has executed. You can even start a macro of a specific document from the command line:
    $ soffice path/to/test.docx macro://./standard.module1.mymacro
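When the obfuscation is simple, a debugger may be more than you need. One common pattern (an assumption here, not something every sample uses) is building strings from chains such as `Chr(112) & Chr(111)`. The sketch below folds such chains in already extracted macro source back into string literals; the name `fold_chr_chains` is hypothetical, and the regular expressions only cover numeric and `&H` hexadecimal arguments.

```python
import re

# One call such as Chr(65), ChrW(65), Chr$(65) or Chr(&H41).
_CHR_CALL = re.compile(r"\bChr[BW]?\$?\(\s*(&H[0-9A-Fa-f]+|\d+)\s*\)", re.IGNORECASE)

# A run of such calls joined by the VBA concatenation operators & or +.
_CHR_CHAIN = re.compile(
    r"\bChr[BW]?\$?\(\s*(?:&H[0-9A-Fa-f]+|\d+)\s*\)"
    r"(?:\s*[&+]\s*Chr[BW]?\$?\(\s*(?:&H[0-9A-Fa-f]+|\d+)\s*\))*",
    re.IGNORECASE,
)


def _code_point(literal: str) -> int:
    """Convert a VBA numeric literal (decimal or &H hex) to an int."""
    if literal[:2].upper() == "&H":
        return int(literal[2:], 16)
    return int(literal)


def fold_chr_chains(vba_source: str) -> str:
    """Replace chains of Chr(...) calls with the string literal they build."""

    def replace(match):
        chars = [
            chr(_code_point(call.group(1)))
            for call in _CHR_CALL.finditer(match.group(0))
        ]
        # VBA escapes a double quote inside a string literal by doubling it.
        return '"' + "".join(chars).replace('"', '""') + '"'

    return _CHR_CHAIN.sub(replace, vba_source)
```

Anything more involved, such as arithmetic inside the `Chr` arguments or a custom decoding loop, is out of scope for this sketch and is where the LibreOffice debugger approach described above becomes useful.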

oletools

    sudo pip3 install -U oletools
    olevba -c /path/to/document  # Extract macros

Automatic Execution

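Macro procedures with special names such as AutoOpen, AutoExec, or Document_Open are executed automatically, without the user invoking them. AutoOpen and Document_Open run when the document is opened (provided macros are enabled), while AutoExec runs when Word starts or a global template is loaded.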
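To spot such entry points in extracted macro source, a simple name match over procedure declarations is often enough for a first pass. This is a minimal sketch using only the standard library; `find_auto_exec` is a hypothetical helper, and the name list is limited to the three triggers named above, so it is not exhaustive.

```python
import re

# Assumption: an illustrative subset of trigger names, not a complete list.
AUTO_EXEC_NAMES = ("AutoOpen", "AutoExec", "Document_Open")

# Matches "Sub Name" / "Function Name", optionally prefixed by Public or Private.
_PROC = re.compile(
    r"^\s*(?:(?:Public|Private)\s+)?(?:Sub|Function)\s+(\w+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_auto_exec(vba_source: str) -> list:
    """Return declared procedure names that match a known auto-run trigger."""
    wanted = {name.lower() for name in AUTO_EXEC_NAMES}
    # VBA identifiers are case-insensitive, so compare in lower case.
    return [name for name in _PROC.findall(vba_source) if name.lower() in wanted]
```

The procedures it reports are a sensible place to start reading, since everything else in the macro is usually reached from them.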
