macOS Universal binaries & Mach-O Format
Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE) Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
macOS binaries are usually compiled as universal binaries. A universal binary can support multiple architectures in the same file.
These binaries follow the Mach-O structure, which is basically composed of:
Header
Load Commands
Data
Search for the file with: mdfind fat.h | grep -i mach-o | grep -E "fat.h$"
The header has the magic bytes followed by the number of archs the file contains (nfat_arch), and each arch will have a fat_arch struct.
Check it with:
or using the Mach-O View tool:
As you might expect, a universal binary compiled for 2 architectures is usually about double the size of one compiled for just 1 arch.
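The fat header layout described above can be sketched in Python. This is a minimal illustration, assuming the 32-bit fat header variant (magic 0xCAFEBABE, big-endian fields); the sample bytes are synthetic and the slice offsets and sizes are made up.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # 32-bit fat header, stored big-endian on disk

def parse_fat_header(data: bytes):
    """Return the fat_arch entries described by a universal binary header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat (universal) header")
    archs = []
    for i in range(nfat_arch):
        # struct fat_arch: cputype, cpusubtype, offset, size, align (20 bytes)
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">iiIII", data, 8 + i * 20)
        archs.append({"cputype": cputype, "cpusubtype": cpusubtype,
                      "offset": offset, "size": size, "align": align})
    return archs

# Synthetic header describing two slices (illustrative values only)
CPU_TYPE_X86_64 = 0x01000007
CPU_TYPE_ARM64 = 0x0100000C
sample = struct.pack(">II", FAT_MAGIC, 2)
sample += struct.pack(">iiIII", CPU_TYPE_X86_64, 3, 0x4000, 0x8000, 14)
sample += struct.pack(">iiIII", CPU_TYPE_ARM64, 0, 0x10000, 0x8000, 14)

for arch in parse_fat_header(sample):
    print(hex(arch["cputype"]), hex(arch["offset"]), hex(arch["size"]))
```

On a real file you would pass the first bytes of the binary instead of `sample`; each `offset` then points at the start of a complete Mach-O slice.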
The header contains basic information about the file, such as magic bytes to identify it as a Mach-O file and information about the target architecture. You can find it in: mdfind loader.h | grep -i mach-o | grep -E "loader.h$"
There are different file types, you can find them defined in the source code for example here. The most important ones are:
MH_OBJECT: Relocatable object file (intermediate product of compilation, not an executable yet).
MH_EXECUTE: Executable files.
MH_FVMLIB: Fixed VM library file.
MH_CORE: Core dumps.
MH_PRELOAD: Preloaded executable file (no longer supported in XNU).
MH_DYLIB: Dynamic libraries.
MH_DYLINKER: Dynamic linker.
MH_BUNDLE: "Plugin files". Generated using -bundle in gcc and explicitly loaded by NSBundle or dlopen.
MH_DSYM: Companion .dSYM file (file with symbols for debugging).
MH_KEXT_BUNDLE: Kernel extensions.
Or using Mach-O View:
The source code also defines several flags useful for loading libraries:
MH_NOUNDEFS: No undefined references (fully linked)
MH_DYLDLINK: Dyld linking
MH_PREBOUND: Dynamic references prebound
MH_SPLIT_SEGS: File splits r/o and r/w segments
MH_WEAK_DEFINES: Binary has weak defined symbols
MH_BINDS_TO_WEAK: Binary uses weak symbols
MH_ALLOW_STACK_EXECUTION: Make the stack executable
MH_NO_REEXPORTED_DYLIBS: Library has no LC_REEXPORT commands
MH_PIE: Position Independent Executable
MH_HAS_TLV_DESCRIPTORS: There is a section with thread local variables
MH_NO_HEAP_EXECUTION: No execution for heap/data pages
MH_HAS_OBJC: Binary has Objective-C sections
MH_SIM_SUPPORT: Simulator support
MH_DYLIB_IN_CACHE: Used on dylibs/frameworks in the shared library cache
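As a rough sketch of how the header, file type and flags fit together, the following Python decodes a 64-bit little-endian mach_header_64. The numeric constants are taken from loader.h as far as I know them; only a subset of file types and flags is included, MH_TWOLEVEL is an extra flag not listed above, and the sample header is synthetic.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O magic (little-endian slices)

# Subset of file types and flags, for illustration only
FILE_TYPES = {
    0x1: "MH_OBJECT", 0x2: "MH_EXECUTE", 0x4: "MH_CORE", 0x6: "MH_DYLIB",
    0x7: "MH_DYLINKER", 0x8: "MH_BUNDLE", 0xA: "MH_DSYM", 0xB: "MH_KEXT_BUNDLE",
}
HEADER_FLAGS = {
    0x1: "MH_NOUNDEFS", 0x4: "MH_DYLDLINK", 0x80: "MH_TWOLEVEL",
    0x8000: "MH_WEAK_DEFINES", 0x10000: "MH_BINDS_TO_WEAK",
    0x20000: "MH_ALLOW_STACK_EXECUTION", 0x200000: "MH_PIE",
    0x800000: "MH_HAS_TLV_DESCRIPTORS", 0x1000000: "MH_NO_HEAP_EXECUTION",
}

def parse_mach_header_64(data: bytes) -> dict:
    """Decode the 32-byte mach_header_64 at the start of a slice."""
    (magic, cputype, cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = struct.unpack_from("<IiiIIIII", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O header")
    return {
        "filetype": FILE_TYPES.get(filetype, hex(filetype)),
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "flags": [name for bit, name in HEADER_FLAGS.items() if flags & bit],
    }

# Synthetic arm64 executable header with flags 0x00200085
sample_header = struct.pack("<IiiIIIII", MH_MAGIC_64, 0x0100000C, 0,
                            0x2, 17, 1056, 0x00200085, 0)
print(parse_mach_header_64(sample_header))
```

For a universal binary, the header to decode starts at the slice offset given by the corresponding fat_arch entry, not at byte 0 of the file.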
The file's layout in memory is specified here, detailing the symbol table's location, the context of the main thread at execution start, and the required shared libraries. Instructions are provided to the dynamic loader (dyld) on the binary's loading process into memory.
Load commands use the load_command structure, defined in the mentioned loader.h:
There are about 50 different types of load commands that the system handles differently. The most common ones are: LC_SEGMENT_64, LC_LOAD_DYLINKER, LC_MAIN, LC_LOAD_DYLIB, and LC_CODE_SIGNATURE.
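Every load command starts with the same two 32-bit fields (cmd and cmdsize), so the whole list can be walked generically. The sketch below assumes a 64-bit little-endian slice whose load commands begin right after the 32-byte header; `fake_command` is a hypothetical helper that only builds synthetic test data.

```python
import struct

LC_REQ_DYLD = 0x80000000
LC_NAMES = {
    0x0C: "LC_LOAD_DYLIB",
    0x0E: "LC_LOAD_DYLINKER",
    0x19: "LC_SEGMENT_64",
    0x1D: "LC_CODE_SIGNATURE",
    0x28 | LC_REQ_DYLD: "LC_MAIN",
}
MACH_HEADER_64_SIZE = 32

def iter_load_commands(data: bytes, ncmds: int, start: int = MACH_HEADER_64_SIZE):
    """Yield (name, cmdsize, offset) for each load command in the slice."""
    offset = start
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError("corrupt load command size")
        yield LC_NAMES.get(cmd, hex(cmd)), cmdsize, offset
        offset += cmdsize  # cmdsize covers the whole command, header included

def fake_command(cmd: int, cmdsize: int) -> bytes:
    """Hypothetical helper: a command header followed by zero padding."""
    return struct.pack("<II", cmd, cmdsize) + bytes(cmdsize - 8)

sample_slice = bytes(MACH_HEADER_64_SIZE)
sample_slice += fake_command(0x19, 72)
sample_slice += fake_command(0x28 | LC_REQ_DYLD, 24)
sample_slice += fake_command(0x0C, 56)

for name, size, off in iter_load_commands(sample_slice, ncmds=3):
    print(f"{off:#06x} {name} ({size} bytes)")
```

In practice `ncmds` comes from the Mach-O header, and unknown command values are printed in hex rather than rejected.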
Basically, LC_SEGMENT_64 load commands define how to load the __TEXT (executable code) and __DATA (data for the process) segments according to the offsets indicated in the Data section when the binary is executed.
These commands define segments that are mapped into the virtual memory space of a process when it is executed.
There are different types of segments, such as the __TEXT segment, which holds the executable code of a program, and the __DATA segment, which contains data used by the process. These segments are located in the data section of the Mach-O file.
Each segment can be further divided into multiple sections. The load command structure contains information about these sections within the respective segment.
In the header first you find the segment header:
Example of segment header:
This header defines the number of sections whose headers appear after it:
Example of section header:
If you add the section offset (0x37DC) to the offset where the arch starts, in this case 0x18000, you get 0x37DC + 0x18000 = 0x1B7DC.
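The segment header, its section headers and the offset arithmetic above can be sketched as follows. This assumes the 64-bit layouts (segment_command_64 is 72 bytes, section_64 is 80 bytes) and little-endian fields; the sample segment is synthetic and reuses the 0x37DC and 0x18000 values from the text.

```python
import struct

SEGMENT_64 = struct.Struct("<II16sQQQQiiII")     # segment_command_64, 72 bytes
SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")  # section_64, 80 bytes

def c_name(raw: bytes) -> str:
    """Decode a fixed 16-byte, NUL-padded name field."""
    return raw.split(b"\0", 1)[0].decode()

def parse_segment_64(data: bytes, offset: int, slice_offset: int = 0):
    """Return (segname, [(sectname, offset_in_slice, offset_in_fat_file)])."""
    (cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, flags) = SEGMENT_64.unpack_from(data, offset)
    sections = []
    pos = offset + SEGMENT_64.size  # section headers follow the segment header
    for _ in range(nsects):
        fields = SECTION_64.unpack_from(data, pos)
        sect_off = fields[4]  # 'offset' field of section_64
        sections.append((c_name(fields[0]), sect_off, slice_offset + sect_off))
        pos += SECTION_64.size
    return c_name(segname), sections

# Synthetic __TEXT segment with a single __text section at offset 0x37DC
sample_segment = SEGMENT_64.pack(0x19, 72 + 80, b"__TEXT", 0x100000000, 0x4000,
                                 0, 0x4000, 5, 5, 1, 0)
sample_segment += SECTION_64.pack(b"__text", b"__TEXT", 0x1000037DC, 0x100,
                                  0x37DC, 2, 0, 0, 0x80000400, 0, 0, 0)

name, sections = parse_segment_64(sample_segment, 0, slice_offset=0x18000)
print(name, [(s, hex(a), hex(b)) for s, a, b in sections])
```

Section offsets are relative to the start of the slice, which is why the slice offset has to be added when reading from a universal binary.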
It's also possible to get headers information from the command line with:
Common segments loaded by this cmd:
__PAGEZERO: It instructs the kernel to map the address zero so it cannot be read from, written to, or executed. The maxprot and initprot fields in the structure are set to zero to indicate there are no read-write-execute rights on this page.
This allocation is important to mitigate NULL pointer dereference vulnerabilities. This is because XNU enforces a hard page zero that ensures the first page (only the first) of memory is inaccessible (except in i386). A binary could fulfill this requirement by crafting a small __PAGEZERO (using -pagezero_size) to cover the first 4k, leaving the rest of 32-bit memory accessible in both user and kernel mode.
__TEXT: Contains executable code with read and execute permissions (not writable). Common sections of this segment:
__text: Compiled binary code
__const: Constant data (read only)
__[c/u/os_log]string: C, Unicode or os log string constants
__stubs and __stub_helper: Involved during the dynamic library loading process
__unwind_info: Stack unwind data
Note that all this content is signed but also marked as executable (creating more options for exploitation of sections that don't necessarily need this privilege, like string-dedicated sections).
__DATA: Contains data that is readable and writable (not executable).
__got: Global Offset Table
__nl_symbol_ptr: Non-lazy (bind at load) symbol pointer
__la_symbol_ptr: Lazy (bind on use) symbol pointer
__const: Should be read-only data (not really)
__cfstring: CoreFoundation strings
__data: Global variables (that have been initialized)
__bss: Static variables (that have not been initialized)
__objc_* (__objc_classlist, __objc_protolist, etc): Information used by the Objective-C runtime
__DATA_CONST: __DATA.__const is not guaranteed to be constant (it has write permissions), nor are other pointers and the GOT. This segment makes __const, some initializers and the GOT (once resolved) read-only using mprotect.
__LINKEDIT: Contains information for the linker (dyld) such as symbol, string, and relocation table entries. It's a generic container for contents that are in neither __TEXT nor __DATA, and its content is described in other load commands.
dyld information: Rebase, non-lazy/lazy/weak binding opcodes and export info
Function starts: Table of start addresses of functions
Data In Code: Data islands in __text
Symbol Table: Symbols in binary
Indirect Symbol Table: Pointer/stub symbols
String Table
Code Signature
__OBJC: Contains information used by the Objective-C runtime. This information might also be found in the __DATA segment, within the various __objc_* sections.
__RESTRICT: A segment without content, with a single section called __restrict (also empty), which ensures that the binary ignores DYLD environment variables when it runs.
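The permissions mentioned for each segment are stored as bitmasks in the maxprot and initprot fields. A small sketch of decoding them, assuming the usual VM_PROT_READ/WRITE/EXECUTE bit values (1, 2, 4); the per-segment values below are typical examples chosen for illustration, not read from a real binary.

```python
VM_PROT_READ, VM_PROT_WRITE, VM_PROT_EXECUTE = 0x1, 0x2, 0x4

def prot_to_str(prot: int) -> str:
    """Render a VM protection bitmask as an 'rwx'-style string."""
    return (("r" if prot & VM_PROT_READ else "-")
            + ("w" if prot & VM_PROT_WRITE else "-")
            + ("x" if prot & VM_PROT_EXECUTE else "-"))

# Illustrative initprot values for the segments discussed above
typical_initprot = {"__PAGEZERO": 0, "__TEXT": 5, "__DATA": 3, "__LINKEDIT": 1}
for segment, prot in typical_initprot.items():
    print(f"{segment:<12} {prot_to_str(prot)}")
```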
As seen in the code, segments also support flags (although they aren't used very much):
SG_HIGHVM: Core only (not used)
SG_FVMLIB: Not used
SG_NORELOC: Segment has no relocation
SG_PROTECTED_VERSION_1: Encryption. Used for example by Finder to encrypt its __TEXT segment.
LC_UNIXTHREAD/LC_MAIN
LC_MAIN contains the entrypoint in the entryoff attribute. At load time, dyld simply adds this value to the (in-memory) base of the binary, then jumps to this instruction to start execution of the binary’s code.
LC_UNIXTHREAD contains the values the registers must have when starting the main thread. It is deprecated, but dyld still uses it. It's possible to see the values of the registers set by this with:
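For LC_MAIN, the address computation described above can be sketched as follows. This assumes the entry_point_command layout (cmd, cmdsize, 64-bit entryoff, 64-bit stacksize) in little-endian form; the entryoff and image base values are invented for the example.

```python
import struct

LC_MAIN = 0x80000028  # 0x28 | LC_REQ_DYLD

def entry_address(lc_main: bytes, image_base: int) -> int:
    """Return the in-memory entry point: image base plus entryoff."""
    cmd, cmdsize, entryoff, stacksize = struct.unpack_from("<IIQQ", lc_main, 0)
    if cmd != LC_MAIN:
        raise ValueError("not an LC_MAIN command")
    return image_base + entryoff

# Synthetic command: entryoff 0x3F50, default stack size (0)
sample_lc_main = struct.pack("<IIQQ", LC_MAIN, 24, 0x3F50, 0)
print(hex(entry_address(sample_lc_main, image_base=0x100000000)))
```

With ASLR the image base changes on every launch, so the same entryoff yields a different absolute address each time.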
LC_CODE_SIGNATURE
Contains information about the code signature of the Mach-O file. It only contains an offset that points to the signature blob, which is typically at the very end of the file. You can find some information about this section in this blog post and this gist.
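A minimal sketch of reading that offset, assuming the command uses the linkedit_data_command layout (cmd, cmdsize, dataoff, datasize) with little-endian 32-bit fields. The sample values and the slice size are made up; the check simply tests whether the blob ends exactly where the slice ends.

```python
import struct

LC_CODE_SIGNATURE = 0x1D

def signature_range(cmd_bytes: bytes, slice_size: int):
    """Return (start, end, ends_at_eof) for the code signature blob."""
    cmd, cmdsize, dataoff, datasize = struct.unpack_from("<IIII", cmd_bytes, 0)
    if cmd != LC_CODE_SIGNATURE:
        raise ValueError("not an LC_CODE_SIGNATURE command")
    end = dataoff + datasize
    return dataoff, end, end == slice_size

sample_lc_sig = struct.pack("<IIII", LC_CODE_SIGNATURE, 16, 0xC000, 0x4A0)
start, end, at_eof = signature_range(sample_lc_sig, slice_size=0xC4A0)
print(hex(start), hex(end), at_eof)
```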
LC_ENCRYPTION_INFO[_64]
Support for binary encryption. However, if an attacker manages to compromise the process, they will be able to dump the memory unencrypted.
LC_LOAD_DYLINKER
Contains the path to the dynamic linker executable that maps shared libraries into the process address space. The value is always set to /usr/lib/dyld. It’s important to note that in macOS, dylib mapping happens in user mode, not in kernel mode.
LC_IDENT
Obsolete, but when the system is configured to generate dumps on panic, a Mach-O core dump is created and the kernel version is set in the LC_IDENT command.
LC_UUID
Random UUID. It isn't directly useful for anything, but XNU caches it with the rest of the process info. It can be used in crash reports.
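Reading the UUID is straightforward. The sketch below assumes the uuid_command layout (cmd, cmdsize, then 16 raw UUID bytes); the UUID used in the sample is arbitrary.

```python
import struct
import uuid

LC_UUID = 0x1B

def parse_uuid_command(cmd_bytes: bytes) -> uuid.UUID:
    """Extract the 128-bit UUID stored in an LC_UUID load command."""
    cmd, cmdsize = struct.unpack_from("<II", cmd_bytes, 0)
    if cmd != LC_UUID or cmdsize != 24:
        raise ValueError("not an LC_UUID command")
    return uuid.UUID(bytes=bytes(cmd_bytes[8:24]))

# Synthetic command carrying an arbitrary UUID
raw_uuid = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes
sample_lc_uuid = struct.pack("<II", LC_UUID, 24) + raw_uuid
print(str(parse_uuid_command(sample_lc_uuid)).upper())
```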
LC_DYLD_ENVIRONMENT
Allows environment variables to be passed to dyld before the process is executed. This can be very dangerous, as it can allow arbitrary code execution inside the process, so this load command is only used in dyld builds with #define SUPPORT_LC_DYLD_ENVIRONMENT, and processing is further restricted to variables of the form DYLD_..._PATH specifying load paths.
LC_LOAD_DYLIB
This load command describes a dynamic library dependency which instructs the loader (dyld) to load and link said library. There is a LC_LOAD_DYLIB load command for each library that the Mach-O binary requires.
This load command is a structure of type dylib_command (which contains a struct dylib, describing the actual dependent dynamic library):
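A Python sketch of that layout, assuming little-endian fields: cmd and cmdsize, followed by the struct dylib members (name offset, timestamp, current_version, compatibility_version), with the library path stored as a NUL-terminated string at the name offset. Versions are packed as X.Y.Z in 16, 8 and 8 bits. `build_dylib_command` is a hypothetical helper used only to create sample data.

```python
import struct

LC_LOAD_DYLIB = 0x0C

def version_str(v: int) -> str:
    """Decode a packed X.Y.Z version (16.8.8 bits)."""
    return f"{v >> 16}.{(v >> 8) & 0xFF}.{v & 0xFF}"

def parse_dylib_command(cmd_bytes: bytes):
    """Return (path, current_version, compatibility_version)."""
    cmd, cmdsize, name_off, timestamp, current, compat = struct.unpack_from(
        "<IIIIII", cmd_bytes, 0)
    # The path starts name_off bytes from the beginning of the command
    path = cmd_bytes[name_off:cmdsize].split(b"\0", 1)[0].decode()
    return path, version_str(current), version_str(compat)

def build_dylib_command(path: str, current: int, compat: int) -> bytes:
    """Hypothetical helper: build a synthetic LC_LOAD_DYLIB command."""
    name = path.encode() + b"\0"
    size = (24 + len(name) + 7) & ~7  # pad the command to a multiple of 8
    header = struct.pack("<IIIIII", LC_LOAD_DYLIB, size, 24, 2, current, compat)
    return (header + name).ljust(size, b"\0")

sample_dylib_cmd = build_dylib_command("/usr/lib/libSystem.B.dylib",
                                       1311 << 16, 1 << 16)
print(parse_dylib_command(sample_dylib_cmd))
```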
You could also get this info from the cli with:
Some libraries potentially related to malware are:
DiskArbitration: Monitoring USB drives
AVFoundation: Capture audio and video
CoreWLAN: Wi-Fi scans.
A Mach-O binary can contain one or more constructors that will be executed before the address specified in LC_MAIN. The offsets of any constructors are held in the __mod_init_func section of the __DATA_CONST segment.
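As a rough sketch, the section can be read as an array of 64-bit values, one per constructor. This assumes a 64-bit little-endian slice in which the section stores plain pointers; binaries that use other encodings for these entries (for example encoded fixups) would need additional decoding that is not shown here. The sample addresses are invented.

```python
import struct

def read_constructor_pointers(section_bytes: bytes):
    """Interpret the raw __mod_init_func contents as 64-bit pointers."""
    count = len(section_bytes) // 8
    return list(struct.unpack_from(f"<{count}Q", section_bytes, 0))

# Synthetic section holding two constructor addresses
sample_mod_init = struct.pack("<2Q", 0x100003E80, 0x100003EC0)
print([hex(p) for p in read_constructor_pointers(sample_mod_init)])
```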
At the core of the file lies the data region, which is composed of several segments as defined in the load-commands region. A variety of data sections can be housed within each segment, with each section holding code or data specific to a type.
The data is basically the part containing all the information that is loaded by the LC_SEGMENT_64 load commands.
This includes:
Function table: Holds information about the program's functions.
Symbol table: Contains information about the external functions used by the binary.
It could also contain internal function and variable names, and more.
To check it you could use the Mach-O View tool:
Or from the cli:
In __TEXT segment (r-x):
__objc_classname: Class names (strings)
__objc_methname: Method names (strings)
__objc_methtype: Method types (strings)
In __DATA segment (rw-):
__objc_classlist: Pointers to all Objective-C classes
__objc_nlclslist: Pointers to Non-Lazy Objective-C classes
__objc_catlist: Pointer to Categories
__objc_nlcatlist: Pointer to Non-Lazy Categories
__objc_protolist: Protocols list
__objc_const: Constant data
__objc_imageinfo, __objc_selrefs, __objc_protorefs...
_swift_typeref, _swift3_capture, _swift3_assocty, _swift3_types, _swift3_proto, _swift3_fieldmd, _swift3_builtin, _swift3_reflstr