Libc Heap

Heap Basics

The heap is the region where a program stores data that it requests dynamically through functions such as malloc and calloc. When this memory is no longer needed, it is released by calling free.

It is located just after where the binary is loaded in memory (check the [heap] section):

Basic Chunk Allocation

When a program asks to store data in the heap, part of the heap is allocated to it. This space will belong to a bin, and only the requested data plus the chunk header plus any padding needed to reach the minimum chunk size is reserved for the chunk. The goal is to reserve as little memory as possible without making it hard to find where each chunk is. For this, the chunk metadata records where each used or free chunk is.

There are different ways to reserve the space, mainly depending on the bin used, but the general methodology is the following:

  • The program starts by requesting certain amount of memory.

  • If the list of chunks contains an available chunk big enough to fulfil the request, it'll be used

    • This might even mean that part of the available chunk will be used for this request and the rest will be added to the chunks list

  • If there isn't any available chunk in the list but there is still space in allocated heap memory, the heap manager creates a new chunk

  • If there is not enough heap space to allocate the new chunk, the heap manager asks the kernel to expand the memory allocated to the heap and then uses this memory to generate the new chunk

  • If everything fails, malloc returns NULL.

Note that if the requested memory passes a threshold, mmap will be used to map the requested memory.


In multithreaded applications, the heap manager must prevent race conditions that could lead to crashes. Initially, this was done using a global mutex to ensure that only one thread could access the heap at a time, but this caused performance issues due to the mutex-induced bottleneck.

To address this, the ptmalloc2 heap allocator introduced "arenas," where each arena acts as a separate heap with its own data structures and mutex, allowing multiple threads to perform heap operations without interfering with each other, as long as they use different arenas.

The default "main" arena handles heap operations for single-threaded applications. When new threads are added, the heap manager assigns them secondary arenas to reduce contention. It first attempts to attach each new thread to an unused arena, creating new ones if needed, up to a limit of 2 times the number of CPU cores for 32-bit systems and 8 times for 64-bit systems. Once the limit is reached, threads must share arenas, leading to potential contention.

Unlike the main arena, which expands using the brk system call, secondary arenas create "subheaps" using mmap and mprotect to simulate the heap behaviour, allowing flexibility in managing memory for multithreaded operations.


Subheaps serve as memory reserves for secondary arenas in multithreaded applications, allowing them to grow and manage their own heap regions separately from the main heap. Here's how subheaps differ from the initial heap and how they operate:

  1. Initial Heap vs. Subheaps:

    • The initial heap is located directly after the program's binary in memory, and it expands through sbrk, which wraps the brk system call.

    • Subheaps, used by secondary arenas, are created through mmap, a system call that maps a specified memory region.

  2. Memory Reservation with mmap:

    • When the heap manager creates a subheap, it reserves a large block of memory through mmap. This reservation doesn't allocate memory immediately; it simply sets aside a range of the process's address space so that other mappings or allocations won't be placed there.

    • By default, the reserved size for a subheap is 1 MB for 32-bit processes and 64 MB for 64-bit processes.

  3. Gradual Expansion with mprotect:

    • The reserved memory region is initially marked as PROT_NONE, indicating that the kernel doesn't need to allocate physical memory to this space yet.

    • To "grow" the subheap, the heap manager uses mprotect to change page permissions from PROT_NONE to PROT_READ | PROT_WRITE, prompting the kernel to allocate physical memory to the previously reserved addresses. This step-by-step approach allows the subheap to expand as needed.

    • Once the entire subheap is exhausted, the heap manager creates a new subheap to continue allocation.


This struct stores relevant information about the heap. Moreover, heap memory might not be contiguous after more allocations, and this struct also records that.

// From

typedef struct _heap_info
{
  mstate ar_ptr; /* Arena for this heap. */
  struct _heap_info *prev; /* Previous heap. */
  size_t size;   /* Current size in bytes. */
  size_t mprotect_size; /* Size in bytes that has been mprotected
                           PROT_READ|PROT_WRITE.  */
  size_t pagesize; /* Page size used when allocating the arena.  */
  /* Make sure the following data is properly aligned, particularly
     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
     MALLOC_ALIGNMENT. */
  char pad[-3 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;


Each heap (main arena or other threads' arenas) has a malloc_state structure. It's important to note that the main arena's malloc_state structure is a global variable in libc (and is therefore located in libc's memory space). The malloc_state structures of thread arenas, in contrast, are located inside the thread's own "heap".

There are some interesting things to note about this structure (see C code below):

  • __libc_lock_define (, mutex); is there to make sure this heap structure is accessed by only 1 thread at a time

  • Flags:

    • #define NONCONTIGUOUS_BIT     (2U)
      #define contiguous(M)          (((M)->flags & NONCONTIGUOUS_BIT) == 0)
      #define noncontiguous(M)       (((M)->flags & NONCONTIGUOUS_BIT) != 0)
      #define set_noncontiguous(M)   ((M)->flags |= NONCONTIGUOUS_BIT)
      #define set_contiguous(M)      ((M)->flags &= ~NONCONTIGUOUS_BIT)
  • The mchunkptr bins[NBINS * 2 - 2]; contains pointers to the first and last chunks of the small, large and unsorted bins (the -2 is because the index 0 is not used)

    • Therefore, the first chunk of these bins will have a backwards pointer to this structure and the last chunk of these bins will have a forward pointer to this structure. This means that if you can leak these addresses from the main arena, you obtain a pointer to the structure inside libc.

  • The pointers struct malloc_state *next; and struct malloc_state *next_free; form linked lists of arenas

  • The top chunk is the last "chunk", which is basically all the remaining heap space. Once the top chunk is "empty", the heap is completely used and more space must be requested.

  • The last remainder chunk comes from cases where a chunk of the exact size is not available, so a bigger chunk is split; a pointer to the remaining part is stored here.

// From

struct malloc_state
{
  /* Serialize access.  */
  __libc_lock_define (, mutex);

  /* Flags (formerly in max_fast).  */
  int flags;

  /* Set if the fastbin chunks contain recently inserted free blocks.  */
  /* Note this is a bool but not all targets support atomics on booleans.  */
  int have_fastchunks;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;

  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};


This structure represents a particular chunk of memory. The various fields have different meaning for allocated and unallocated chunks.

struct malloc_chunk {
  INTERNAL_SIZE_T      mchunk_prev_size;  /* Size of previous chunk, if it is free. */
  INTERNAL_SIZE_T      mchunk_size;       /* Size in bytes, including overhead. */
  struct malloc_chunk* fd;                /* double links -- used only if this chunk is free. */
  struct malloc_chunk* bk;
  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if this chunk is free. */
  struct malloc_chunk* bk_nextsize;
};

typedef struct malloc_chunk* mchunkptr;

As mentioned previously, these chunks also carry some metadata, which is well represented in this image:

The metadata is usually 0x08 bytes holding the current chunk size, whose last 3 bits are used as flags:

  • A: If 1 it comes from a subheap, if 0 it's in the main arena

  • M: If 1, this chunk is part of a space allocated with mmap and not part of a heap

  • P: If 1, the previous chunk is in use

Then comes the space for the user data, and finally 0x08 bytes (the prev_size field of the next chunk) that hold this chunk's size when it is free, or user data when it is allocated.

Moreover, when available, the user data is used to contain also some data:

  • fd: Pointer to the next chunk

  • bk: Pointer to the previous chunk

  • fd_nextsize: Pointer to the first chunk in the list that is smaller than itself

  • bk_nextsize: Pointer to the first chunk in the list that is larger than itself

Note how linking the list this way avoids the need for an array in which every single chunk is registered.

Chunk Pointers

When malloc is used, it returns a pointer to the writable content (just after the headers). When managing chunks, however, a pointer to the beginning of the headers (metadata) is needed. These macros perform the conversions:


/* Convert a chunk address to a user mem pointer without correcting the tag.  */
#define chunk2mem(p) ((void*)((char*)(p) + CHUNK_HDR_SZ))

/* Convert a user mem pointer to a chunk address and extract the right tag.  */
#define mem2chunk(mem) ((mchunkptr)tag_at (((char*)(mem) - CHUNK_HDR_SZ)))

/* The smallest possible chunk */
#define MIN_CHUNK_SIZE        (offsetof(struct malloc_chunk, fd_nextsize))

/* The smallest size we can malloc is an aligned minimal chunk */

#define MINSIZE  \
  (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK))

Alignment & min size

The chunk pointer ANDed with 0x0f must be 0, i.e. chunks are 16-byte aligned (assuming a 64-bit system, where MALLOC_ALIGN_MASK is 0x0f).

// From


/* Check if m has acceptable alignment */
#define aligned_OK(m)  (((unsigned long)(m) & MALLOC_ALIGN_MASK) == 0)

#define misaligned_chunk(p) \
  ((uintptr_t)(MALLOC_ALIGNMENT == CHUNK_HDR_SZ ? (p) : chunk2mem (p)) \
   & MALLOC_ALIGN_MASK)

/* pad request bytes into a usable size -- internal version */
/* Note: This must be a macro that evaluates to a compile time constant
   if passed a literal constant.  */
#define request2size(req)                                         \
  (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)  ?             \
   MINSIZE :                                                      \
   ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

/* Check if REQ overflows when padded and aligned and if the resulting
   value is less than PTRDIFF_T.  Returns the requested size or
   MINSIZE in case the value is less than MINSIZE, or 0 if any of the
   previous checks fail.  */
static inline size_t
checked_request2size (size_t req) __nonnull (1)
{
  if (__glibc_unlikely (req > PTRDIFF_MAX))
    return 0;

  /* When using tagged memory, we cannot share the end of the user
     block with the header for the next chunk, so ensure that we
     allocate blocks that are rounded up to the granule size.  Take
     care not to overflow from close to MAX_SIZE_T to a small
     number.  Ideally, this would be part of request2size(), but that
     must be a macro that produces a compile time constant if passed
     a constant literal.  */
  if (__glibc_unlikely (mtag_enabled))
    {
      /* Ensure this is not evaluated if !mtag_enabled, see gcc PR 99551.  */
      asm ("");

      req = (req + (__MTAG_GRANULE_SIZE - 1)) &
	    ~(size_t)(__MTAG_GRANULE_SIZE - 1);
    }

  return request2size (req);
}

Note that when calculating the total space needed, SIZE_SZ is added only once: the prev_size field of the next chunk can be used to store data, so only the size header needs to be counted.

Get Chunk data and alter metadata

These macros receive a pointer to a chunk and are useful to check/set metadata:

  • Check chunk flags

// From

/* size field is or'ed with PREV_INUSE when previous adjacent chunk in use */
#define PREV_INUSE 0x1

/* extract inuse bit of previous chunk */
#define prev_inuse(p)       ((p)->mchunk_size & PREV_INUSE)

/* size field is or'ed with IS_MMAPPED if the chunk was obtained with mmap() */
#define IS_MMAPPED 0x2

/* check for mmap()'ed chunk */
#define chunk_is_mmapped(p) ((p)->mchunk_size & IS_MMAPPED)

/* size field is or'ed with NON_MAIN_ARENA if the chunk was obtained
   from a non-main arena.  This is only set immediately before handing
   the chunk to the user, if necessary.  */
#define NON_MAIN_ARENA 0x4

/* Check for chunk from main arena.  */
#define chunk_main_arena(p) (((p)->mchunk_size & NON_MAIN_ARENA) == 0)

/* Mark a chunk as not being on the main arena.  */
#define set_non_main_arena(p) ((p)->mchunk_size |= NON_MAIN_ARENA)
  • Sizes and pointers to other chunks

/*
   Bits to mask off when extracting size

   Note: IS_MMAPPED is intentionally not masked off from size field in
   macros for which mmapped chunks should never be seen. This should
   cause helpful core dumps to occur if it is tried by accident by
   people extending or adapting this malloc.
 */
#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Get size, ignoring use bits */
#define chunksize(p) (chunksize_nomask (p) & ~(SIZE_BITS))

/* Like chunksize, but do not mask SIZE_BITS.  */
#define chunksize_nomask(p)         ((p)->mchunk_size)

/* Ptr to next physical malloc_chunk. */
#define next_chunk(p) ((mchunkptr) (((char *) (p)) + chunksize (p)))

/* Size of the chunk below P.  Only valid if !prev_inuse (P).  */
#define prev_size(p) ((p)->mchunk_prev_size)

/* Set the size of the chunk below P.  Only valid if !prev_inuse (P).  */
#define set_prev_size(p, sz) ((p)->mchunk_prev_size = (sz))

/* Ptr to previous physical malloc_chunk.  Only valid if !prev_inuse (P).  */
#define prev_chunk(p) ((mchunkptr) (((char *) (p)) - prev_size (p)))

/* Treat space at ptr + offset as a chunk */
#define chunk_at_offset(p, s)  ((mchunkptr) (((char *) (p)) + (s)))
  • Inuse bit

/* extract p's inuse bit */
#define inuse(p)							      \
  ((((mchunkptr) (((char *) (p)) + chunksize (p)))->mchunk_size) & PREV_INUSE)

/* set/clear chunk as being inuse without otherwise disturbing */
#define set_inuse(p)							      \
  ((mchunkptr) (((char *) (p)) + chunksize (p)))->mchunk_size |= PREV_INUSE

#define clear_inuse(p)							      \
  ((mchunkptr) (((char *) (p)) + chunksize (p)))->mchunk_size &= ~(PREV_INUSE)

/* check/set/clear inuse bits in known places */
#define inuse_bit_at_offset(p, s)					      \
  (((mchunkptr) (((char *) (p)) + (s)))->mchunk_size & PREV_INUSE)

#define set_inuse_bit_at_offset(p, s)					      \
  (((mchunkptr) (((char *) (p)) + (s)))->mchunk_size |= PREV_INUSE)

#define clear_inuse_bit_at_offset(p, s)					      \
  (((mchunkptr) (((char *) (p)) + (s)))->mchunk_size &= ~(PREV_INUSE))
  • Set head and footer (when the chunk is not in use)

/* Set size at head, without disturbing its use bit */
#define set_head_size(p, s)  ((p)->mchunk_size = (((p)->mchunk_size & SIZE_BITS) | (s)))

/* Set size/use field */
#define set_head(p, s)       ((p)->mchunk_size = (s))

/* Set size at footer (only when chunk is not in use) */
#define set_foot(p, s)       (((mchunkptr) ((char *) (p) + (s)))->mchunk_prev_size = (s))
  • Get the size of the real usable data inside the chunk

#pragma GCC poison mchunk_size
#pragma GCC poison mchunk_prev_size

/* This is the size of the real usable data in the chunk.  Not valid for
   dumped heap chunks.  */
#define memsize(p)                                                    \
  (__MTAG_GRANULE_SIZE > SIZE_SZ && __glibc_unlikely (mtag_enabled) ? \
    chunksize (p) - CHUNK_HDR_SZ :                                    \
    chunksize (p) - CHUNK_HDR_SZ + (chunk_is_mmapped (p) ? 0 : SIZE_SZ))

/* If memory tagging is enabled the layout changes to accommodate the granule
   size, this is wasteful for small allocations so not done by default.
   Both the chunk header and user data has to be granule aligned.  */
_Static_assert (__MTAG_GRANULE_SIZE <= CHUNK_HDR_SZ,
		"memory tagging is not supported with large granule.");

static __always_inline void *
tag_new_usable (void *ptr)
{
  if (__glibc_unlikely (mtag_enabled) && ptr)
    {
      mchunkptr cp = mem2chunk(ptr);
      ptr = __libc_mtag_tag_region (__libc_mtag_new_tag (ptr), memsize (cp));
    }
  return ptr;
}


Quick Heap Example

Quick heap example from but in arm64:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *ptr;
    ptr = malloc(0x10);
    strcpy(ptr, "panda");
    return 0;
}

Set a breakpoint at the end of the main function and let's find out where the information was stored:

It's possible to see that the string panda was stored at 0xaaaaaaac12a0 (the address returned by malloc in x0). Looking at the 0x10 bytes before it, the first word, 0x0, is the prev_size field, which is unused here because the previous chunk is not free, and the second word shows that the length of this chunk is 0x21.

The extra space reserved (0x21-0x10=0x11) comes from the added headers (0x10). The remaining 0x1 doesn't mean that 0x21 bytes were reserved: the last 3 bits of the size field have special meanings. Because the length is always 16-byte aligned (on 64-bit machines), these bits are never needed by the length itself.

0x1:     Previous in Use     - Specifies that the chunk before it in memory is in use
0x2:     Is MMAPPED          - Specifies that the chunk was obtained with mmap()
0x4:     Non Main Arena      - Specifies that the chunk was obtained from outside of the main arena

Multithreading Example

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

void* threadFuncMalloc(void* arg) {
    printf("Hello from thread 1\n");
    char* addr = (char*) malloc(1000);
    printf("After malloc and before free in thread 1\n");
    free(addr);
    printf("After free in thread 1\n");
    return NULL;
}

void* threadFuncNoMalloc(void* arg) {
    printf("Hello from thread 2\n");
    return NULL;
}

int main() {
    pthread_t t1;
    pthread_t t2;
    int ret;

    printf("Before creating thread 1\n");
    ret = pthread_create(&t1, NULL, threadFuncMalloc, NULL);
    if (ret != 0)
        return 1;

    printf("Before creating thread 2\n");
    ret = pthread_create(&t2, NULL, threadFuncNoMalloc, NULL);
    if (ret != 0)
        return 1;

    /* Wait for both threads so they finish before the process exits. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Before exit\n");

    return 0;
}

Debugging the previous example shows that at the beginning there is only 1 arena:

Then, after calling the first thread, the one that calls malloc, a new arena is created:

and inside of it some chunks can be found: