Bins & Memory Allocations

学习并练习 AWS 黑客技能：HackTricks 培训 AWS 红队专家 (ARTE) 学习并练习 GCP 黑客技能：HackTricks 培训 GCP 红队专家 (GRTE)

支持 HackTricks

查看订阅计划!
加入 💬 Discord 群组 或 电报群组 或关注我们的 Twitter 🐦 @hacktricks_live.
通过向 HackTricks 和 HackTricks Cloud github 仓库提交 PR 来分享黑客技巧。

基本信息

为了提高块存储的效率，每个块不仅仅在一个链接列表中，而是有几种类型。这些是 bins，有 5 种类型的 bins：62 小型 bins，63 大型 bins，1 未排序 bin，10 快速 bins 和每个线程 64 个 tcache bins。

每个未排序、小型和大型 bins 的初始地址都在同一个数组内。索引 0 未使用，1 是未排序 bin，bins 2-64 是小型 bins，bins 65-127 是大型 bins。

Tcache（每线程缓存）Bins

尽管线程尝试拥有自己的堆（参见 Arenas 和 Subheaps），但存在一个进程有大量线程（如 Web 服务器）最终会与其他线程共享堆的可能性。在这种情况下，主要解决方案是使用锁，这可能会显著减慢线程。

因此，tcache 类似于每个线程的快速 bin，它是一个单链表，不合并块。每个线程有64 个单链 tcache bins。每个 bin 可以有最多7 个相同大小的块，范围从24 到 1032 字节（64 位系统）和 12 到 516 字节（32 位系统）。

当一个线程释放一个块时，如果它不太大以至于无法在 tcache 中分配，并且相应的 tcache bin 未满（已有 7 个块），它将被分配到那里。如果无法进入 tcache，则需要等待堆锁以执行全局释放操作。

当分配一个块时，如果在Tcache 中有所需大小的空闲块，它将使用它，否则，需要等待堆锁以在全局 bins 中找到一个或创建一个新的。还有一个优化，在这种情况下，持有堆锁时，线程将用请求的大小填充他的 Tcache 中的堆块（7 个），因此，如果需要更多，它将在 Tcache 中找到它们。

添加一个 tcache 块示例

```c #include #include

int main(void) { char *chunk; chunk = malloc(24); printf("Address of the chunk: %p\n", (void *)chunk); gets(chunk); free(chunk); return 0; }

编译并在主函数的ret操作码处设置断点进行调试。然后使用gef工具，您可以看到正在使用的tcache bin：
```bash
gef➤  heap bins
──────────────────────────────────────────────────────────────────────────────── Tcachebins for thread 1 ────────────────────────────────────────────────────────────────────────────────
Tcachebins[idx=0, size=0x20, count=1] ←  Chunk(addr=0xaaaaaaac12a0, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

Tcache 结构和函数

在下面的代码中，可以看到最大 bin和每个索引的块数，创建的**tcache_entry结构用于避免双重释放，以及tcache_perthread_struct**，每个线程使用的结构来存储 bin 的每个索引的地址。

tcache_entry 和 tcache_perthread_struct

```c // From https://github.com/bminor/glibc/blob/f942a732d37a96217ef828116ebe64a644db18d7/malloc/malloc.c

/* We want 64 entries. This is an arbitrary limit, which tunables can reduce. */

define TCACHE_MAX_BINS 64

define MAX_TCACHE_SIZE tidx2usize (TCACHE_MAX_BINS-1)

/* Only used to pre-fill the tunables. */

define tidx2usize(idx) (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)

/* When "x" is from chunksize(). */

define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)

/* When "x" is a user-provided size. */

define usize2tidx(x) csize2tidx (request2size (x))

/* With rounding and alignment, the bins are... idx 0 bytes 0..24 (64-bit) or 0..12 (32-bit) idx 1 bytes 25..40 or 13..20 idx 2 bytes 41..56 or 21..28 etc. */

/* This is another arbitrary limit, which tunables can change. Each tcache bin will hold at most this number of chunks. */

define TCACHE_FILL_COUNT 7

/* Maximum chunks in tcache bins for tunables. This value must fit the range of tcache->counts[] entries, else they may overflow. */

define MAX_TCACHE_COUNT UINT16_MAX

[...]

typedef struct tcache_entry { struct tcache_entry next; / This field exists to detect double frees. */ uintptr_t key; } tcache_entry;

/* There is one of these for each thread, which contains the per-thread cache (hence "tcache_perthread_struct"). Keeping overall size low is mildly important. Note that COUNTS and ENTRIES are redundant (we could have just counted the linked list each time), this is for performance reasons. */ typedef struct tcache_perthread_struct { uint16_t counts[TCACHE_MAX_BINS]; tcache_entry *entries[TCACHE_MAX_BINS]; } tcache_perthread_struct;

</details>

函数`__tcache_init`是创建并为`tcache_perthread_struct`对象分配空间的函数

<details>

<summary>tcache_init 代码</summary>
```c
// From https://github.com/bminor/glibc/blob/f942a732d37a96217ef828116ebe64a644db18d7/malloc/malloc.c#L3241C1-L3274C2

static void
tcache_init(void)
{
mstate ar_ptr;
void *victim = 0;
const size_t bytes = sizeof (tcache_perthread_struct);

if (tcache_shutting_down)
return;

arena_get (ar_ptr, bytes);
victim = _int_malloc (ar_ptr, bytes);
if (!victim && ar_ptr != NULL)
{
ar_ptr = arena_get_retry (ar_ptr, bytes);
victim = _int_malloc (ar_ptr, bytes);
}


if (ar_ptr != NULL)
__libc_lock_unlock (ar_ptr->mutex);

/* In a low memory situation, we may not be able to allocate memory
- in which case, we just keep trying later.  However, we
typically do this very early, so either there is sufficient
memory, or there isn't enough memory to do non-trivial
allocations anyway.  */
if (victim)
{
tcache = (tcache_perthread_struct *) victim;
memset (tcache, 0, sizeof (tcache_perthread_struct));
}

}

Tcache索引

Tcache有几个bin，取决于大小和每个索引的第一个chunk的初始指针以及每个索引的chunk数量都位于一个chunk内。这意味着定位具有此信息的chunk（通常是第一个chunk），可以找到所有tcache的初始点和Tcache chunk的数量。

快速bins

快速bins旨在通过将最近释放的chunks保留在一个快速访问结构中，加快小块内存分配的速度。这些bins使用后进先出（LIFO）的方法，这意味着最近释放的chunk是首先在有新的分配请求时被重用的。这种行为对于速度是有利的，因为与队列（FIFO）相比，从栈的顶部（LIFO）插入和删除更快。

此外，快速bins使用单链表，而不是双链表，这进一步提高了速度。由于快速bins中的chunks不会与相邻的chunks合并，所以不需要一个复杂的结构来允许从中间删除。单链表对于这些操作来说更简单更快。

基本上，在这里发生的情况是头部（指向要检查的第一个chunk的指针）始终指向该大小的最新释放的chunk。所以：

当分配该大小的新chunk时，头部指向一个可用的自由chunk。由于这个自由chunk指向下一个要使用的chunk，所以该地址存储在头部中，以便下一个分配知道从哪里获取一个可用的chunk。
当释放一个chunk时，自由的chunk将保存当前可用chunk的地址，并且这个新释放的chunk的地址将放入头部中。

链表的最大大小为0x80，它们被组织成大小为0x20的chunk将位于索引0，大小为0x30的chunk将位于索引1...

快速bins中的chunks未设置为可用，因此它们在一段时间内保持为快速bin chunks，而不是能够与周围的其他自由chunks合并。

// From https://github.com/bminor/glibc/blob/a07e000e82cb71238259e674529c37c12dc7d423/malloc/malloc.c#L1711

/*
Fastbins

An array of lists holding recently freed small chunks.  Fastbins
are not doubly linked.  It is faster to single-link them, and
since chunks are never removed from the middles of these lists,
double linking is not necessary. Also, unlike regular bins, they
are not even processed in FIFO order (they use faster LIFO) since
ordering doesn't much matter in the transient contexts in which
fastbins are normally used.

Chunks in fastbins keep their inuse bit set, so they cannot
be consolidated with other free chunks. malloc_consolidate
releases all chunks in fastbins and consolidates them with
other free chunks.
*/

typedef struct malloc_chunk *mfastbinptr;
#define fastbin(ar_ptr, idx) ((ar_ptr)->fastbinsY[idx])

/* offset 2 to use otherwise unindexable first 2 bins */
#define fastbin_index(sz) \
((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)


/* The maximum fastbin request size we support */
#define MAX_FAST_SIZE     (80 * SIZE_SZ / 4)

#define NFASTBINS  (fastbin_index (request2size (MAX_FAST_SIZE)) + 1)

添加一个fastbin块示例

```c #include #include

int main(void) { char *chunks[8]; int i;

// Loop to allocate memory 8 times for (i = 0; i < 8; i++) { chunks[i] = malloc(24); if (chunks[i] == NULL) { // Check if malloc failed fprintf(stderr, "Memory allocation failed at iteration %d\n", i); return 1; } printf("Address of chunk %d: %p\n", i, (void *)chunks[i]); }

// Loop to free the allocated memory for (i = 0; i < 8; i++) { free(chunks[i]); }

return 0; }

注意我们如何分配和释放8个相同大小的块，使它们填满tcache，并且第八个存储在快速块中。

编译并在`main`函数的`ret`操作码处设置断点进行调试。然后使用`gef`，您可以看到tcache bin已满，一个块在快速bin中：
```bash
gef➤  heap bins
──────────────────────────────────────────────────────────────────────────────── Tcachebins for thread 1 ────────────────────────────────────────────────────────────────────────────────
Tcachebins[idx=0, size=0x20, count=7] ←  Chunk(addr=0xaaaaaaac1770, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1750, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1730, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1710, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac16f0, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac16d0, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac12a0, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
───────────────────────────────────────────────────────────────────────── Fastbins for arena at 0xfffff7f90b00 ─────────────────────────────────────────────────────────────────────────
Fastbins[idx=0, size=0x20]  ←  Chunk(addr=0xaaaaaaac1790, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
Fastbins[idx=1, size=0x30] 0x00

未排序 bin

未排序 bin 是堆管理器用来加快内存分配速度的缓存。它的工作原理如下：当程序释放一个块时，如果这个块不能在 tcache 或 fast bin 中分配，并且不与顶部块发生冲突，堆管理器不会立即将其放入特定的小型或大型 bin 中。相反，它首先尝试与任何相邻的空闲块合并，以创建一个更大的空闲内存块。然后，它将这个新块放入一个名为“未排序 bin”的通用 bin 中。

当程序请求内存时，堆管理器会检查未排序 bin，看看是否有足够大小的块。如果找到一个，它会立即使用。如果在未排序 bin 中找不到合适的块，它会将此列表中的所有块移动到它们相应的 bin 中，无论是小型还是大型，都基于它们的大小。

请注意，如果一个较大的块被分成两半，剩下的部分大于 MINSIZE，它将被放回未排序 bin 中。

因此，未排序 bin 是一种通过快速重用最近释放的内存来加快内存分配速度的方法，从而减少了耗时的搜索和合并的需求。

请注意，即使块属于不同的类别，如果一个可用块与另一个可用块发生冲突（即使它们最初属于不同的 bin），它们将被合并。

添加一个未排序块示例

```c #include #include

int main(void) { char *chunks[9]; int i;

// Loop to allocate memory 8 times for (i = 0; i < 9; i++) { chunks[i] = malloc(0x100); if (chunks[i] == NULL) { // Check if malloc failed fprintf(stderr, "Memory allocation failed at iteration %d\n", i); return 1; } printf("Address of chunk %d: %p\n", i, (void *)chunks[i]); }

// Loop to free the allocated memory for (i = 0; i < 8; i++) { free(chunks[i]); }

return 0; }

注意我们如何分配和释放9个相同大小的块，使它们填满了tcache，第八个块存储在未排序的bin中，因为它对于fastbin来说太大了，第九个块没有被释放，所以第九个和第八个块不会与顶部块合并。

编译并在`main`函数的`ret`操作码处设置断点进行调试。然后使用`gef`，您可以看到tcache bin已满，一个块在未排序的bin中：
```bash
gef➤  heap bins
──────────────────────────────────────────────────────────────────────────────── Tcachebins for thread 1 ────────────────────────────────────────────────────────────────────────────────
Tcachebins[idx=15, size=0x110, count=7] ←  Chunk(addr=0xaaaaaaac1d10, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1c00, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1af0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac19e0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac18d0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac17c0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac12a0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
───────────────────────────────────────────────────────────────────────── Fastbins for arena at 0xfffff7f90b00 ─────────────────────────────────────────────────────────────────────────
Fastbins[idx=0, size=0x20] 0x00
Fastbins[idx=1, size=0x30] 0x00
Fastbins[idx=2, size=0x40] 0x00
Fastbins[idx=3, size=0x50] 0x00
Fastbins[idx=4, size=0x60] 0x00
Fastbins[idx=5, size=0x70] 0x00
Fastbins[idx=6, size=0x80] 0x00
─────────────────────────────────────────────────────────────────────── Unsorted Bin for arena at 0xfffff7f90b00 ───────────────────────────────────────────────────────────────────────
[+] unsorted_bins[0]: fw=0xaaaaaaac1e10, bk=0xaaaaaaac1e10
→   Chunk(addr=0xaaaaaaac1e20, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
[+] Found 1 chunks in unsorted bin.

Small Bins

小型块比大型块更快，但比快速块慢。

62个小型块中的每个块都将具有相同大小的块：16、24、...（在32位系统中最大为504字节，在64位系统中为1024字节）。这有助于加快查找应分配空间的块所在的块以及在这些列表上插入和删除条目的速度。

这是根据块的索引计算小型块大小的方法：

最小大小：2*4*索引（例如，索引5 -> 40）
最大大小：2*8*索引（例如，索引5 -> 80）

// From https://github.com/bminor/glibc/blob/a07e000e82cb71238259e674529c37c12dc7d423/malloc/malloc.c#L1711
#define NSMALLBINS         64
#define SMALLBIN_WIDTH    MALLOC_ALIGNMENT
#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT > CHUNK_HDR_SZ)
#define MIN_LARGE_SIZE    ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)

#define in_smallbin_range(sz)  \
((unsigned long) (sz) < (unsigned long) MIN_LARGE_SIZE)

#define smallbin_index(sz) \
((SMALLBIN_WIDTH == 16 ? (((unsigned) (sz)) >> 4) : (((unsigned) (sz)) >> 3))\
+ SMALLBIN_CORRECTION)

Function to choose between small and large bins:

选择小块和大块之间的函数：

#define bin_index(sz) \
((in_smallbin_range (sz)) ? smallbin_index (sz) : largebin_index (sz))

添加一个小块示例

```c #include #include

int main(void) { char *chunks[10]; int i;

// Loop to free the allocated memory for (i = 0; i < 8; i++) { free(chunks[i]); }

chunks[9] = malloc(0x110);

return 0; }

注意我们如何分配和释放9个相同大小的块，以便它们填满tcache，并且第八个存储在未排序的bin中，因为它对于fastbin来说太大了，第九个没有被释放，所以第九个和第八个不会与顶部块合并。然后我们分配一个更大的0x110块，这使得未排序bin中的块进入small bin。

编译它并在`main`函数的`ret`操作码处设置断点进行调试。然后使用`gef`，您可以看到tcache bin已满，一个块在small bin中：
```bash
gef➤  heap bins
──────────────────────────────────────────────────────────────────────────────── Tcachebins for thread 1 ────────────────────────────────────────────────────────────────────────────────
Tcachebins[idx=15, size=0x110, count=7] ←  Chunk(addr=0xaaaaaaac1d10, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1c00, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac1af0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac19e0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac18d0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac17c0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  Chunk(addr=0xaaaaaaac12a0, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
───────────────────────────────────────────────────────────────────────── Fastbins for arena at 0xfffff7f90b00 ─────────────────────────────────────────────────────────────────────────
Fastbins[idx=0, size=0x20] 0x00
Fastbins[idx=1, size=0x30] 0x00
Fastbins[idx=2, size=0x40] 0x00
Fastbins[idx=3, size=0x50] 0x00
Fastbins[idx=4, size=0x60] 0x00
Fastbins[idx=5, size=0x70] 0x00
Fastbins[idx=6, size=0x80] 0x00
─────────────────────────────────────────────────────────────────────── Unsorted Bin for arena at 0xfffff7f90b00 ───────────────────────────────────────────────────────────────────────
[+] Found 0 chunks in unsorted bin.
──────────────────────────────────────────────────────────────────────── Small Bins for arena at 0xfffff7f90b00 ────────────────────────────────────────────────────────────────────────
[+] small_bins[16]: fw=0xaaaaaaac1e10, bk=0xaaaaaaac1e10
→   Chunk(addr=0xaaaaaaac1e20, size=0x110, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
[+] Found 1 chunks in 1 small non-empty bins.

大型 bins

与管理固定大小的块的小型 bins 不同，每个大型 bin 处理一系列块大小。这样更加灵活，允许系统在不需要为每个大小单独设置 bin 的情况下容纳各种大小。

在内存分配器中，大型 bins 从小型 bins 结束的地方开始。大型 bins 的范围逐渐增大，意味着第一个 bin 可能覆盖从 512 到 576 字节的块，而下一个覆盖 576 到 640 字节。这种模式继续下去，最大的 bin 包含所有大于 1MB 的块。

与小型 bins 相比，大型 bins 操作速度较慢，因为它们必须对各种块大小的列表进行排序和搜索，以找到最佳匹配进行分配。当一个块被插入到大型 bin 中时，它必须被排序，当分配内存时，系统必须找到正确的块。这额外的工作使它们更慢，但由于大型分配比小型分配更少见，这是一个可以接受的折衷。

有：

32 个 64B 范围的 bins（与小型 bins 冲突）
16 个 512B 范围的 bins（与小型 bins 冲突）
8 个 4096B 范围的 bins（部分与小型 bins 冲突）
4 个 32768B 范围的 bins
2 个 262144B 范围的 bins
1 个用于剩余大小的 bin

大型 bin 大小代码

```c // From https://github.com/bminor/glibc/blob/a07e000e82cb71238259e674529c37c12dc7d423/malloc/malloc.c#L1711

#define largebin_index_32(sz) (((((unsigned long) (sz)) >> 6) <= 38) ? 56 + (((unsigned long) (sz)) >> 6) : ((((unsigned long) (sz)) >> 9) <= 20) ? 91 + (((unsigned long) (sz)) >> 9) : ((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) : ((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) : ((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) : 126)

#define largebin_index_32_big(sz) (((((unsigned long) (sz)) >> 6) <= 45) ? 49 + (((unsigned long) (sz)) >> 6) : ((((unsigned long) (sz)) >> 9) <= 20) ? 91 + (((unsigned long) (sz)) >> 9) : ((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) : ((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) : ((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) : 126)

// XXX It remains to be seen whether it is good to keep the widths of // XXX the buckets the same or whether it should be scaled by a factor // XXX of two as well. #define largebin_index_64(sz) (((((unsigned long) (sz)) >> 6) <= 48) ? 48 + (((unsigned long) (sz)) >> 6) : ((((unsigned long) (sz)) >> 9) <= 20) ? 91 + (((unsigned long) (sz)) >> 9) : ((((unsigned long) (sz)) >> 12) <= 10) ? 110 + (((unsigned long) (sz)) >> 12) : ((((unsigned long) (sz)) >> 15) <= 4) ? 119 + (((unsigned long) (sz)) >> 15) : ((((unsigned long) (sz)) >> 18) <= 2) ? 124 + (((unsigned long) (sz)) >> 18) : 126)

#define largebin_index(sz) (SIZE_SZ == 8 ? largebin_index_64 (sz) : MALLOC_ALIGNMENT == 16 ? largebin_index_32_big (sz) : largebin_index_32 (sz))

</details>

<details>

<summary>添加一个大块示例</summary>
```c
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
char *chunks[2];

chunks[0] = malloc(0x1500);
chunks[1] = malloc(0x1500);
free(chunks[0]);
chunks[0] = malloc(0x2000);

return 0;
}

2个大型分配被执行，然后一个被释放（将其放入未排序的bin），然后进行更大的分配（将空闲的块从未排序的bin移动到大型bin）。

编译并在main函数的ret操作码处设置断点进行调试。然后使用gef，您可以看到tcache bin已满，一个块在大型bin中：

gef➤  heap bin
──────────────────────────────────────────────────────────────────────────────── Tcachebins for thread 1 ────────────────────────────────────────────────────────────────────────────────
All tcachebins are empty
───────────────────────────────────────────────────────────────────────── Fastbins for arena at 0xfffff7f90b00 ─────────────────────────────────────────────────────────────────────────
Fastbins[idx=0, size=0x20] 0x00
Fastbins[idx=1, size=0x30] 0x00
Fastbins[idx=2, size=0x40] 0x00
Fastbins[idx=3, size=0x50] 0x00
Fastbins[idx=4, size=0x60] 0x00
Fastbins[idx=5, size=0x70] 0x00
Fastbins[idx=6, size=0x80] 0x00
─────────────────────────────────────────────────────────────────────── Unsorted Bin for arena at 0xfffff7f90b00 ───────────────────────────────────────────────────────────────────────
[+] Found 0 chunks in unsorted bin.
──────────────────────────────────────────────────────────────────────── Small Bins for arena at 0xfffff7f90b00 ────────────────────────────────────────────────────────────────────────
[+] Found 0 chunks in 0 small non-empty bins.
──────────────────────────────────────────────────────────────────────── Large Bins for arena at 0xfffff7f90b00 ────────────────────────────────────────────────────────────────────────
[+] large_bins[100]: fw=0xaaaaaaac1290, bk=0xaaaaaaac1290
→   Chunk(addr=0xaaaaaaac12a0, size=0x1510, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
[+] Found 1 chunks in 1 large non-empty bins.

顶部块

// From https://github.com/bminor/glibc/blob/a07e000e82cb71238259e674529c37c12dc7d423/malloc/malloc.c#L1711

/*
Top

The top-most available chunk (i.e., the one bordering the end of
available memory) is treated specially. It is never included in
any bin, is used only if no other chunk is available, and is
released back to the system if it is very large (see
M_TRIM_THRESHOLD).  Because top initially
points to its own bin with initial zero size, thus forcing
extension on the first malloc request, we avoid having any special
code in malloc to check whether it even exists yet. But we still
need to do so when getting memory from system, so we make
initial_top treat the bin as a legal but unusable chunk during the
interval between initialization and the first call to
sysmalloc. (This is somewhat delicate, since it relies on
the 2 preceding words to be zero during this interval as well.)
*/

/* Conveniently, the unsorted bin can be used as dummy top on first call */
#define initial_top(M)              (unsorted_chunks (M))

基本上，这是一个包含当前所有可用堆的块。当执行malloc时，如果没有可用的空闲块可用，这个顶部块将减小其大小，提供必要的空间。顶部块的指针存储在malloc_state结构中。

此外，在开始时，可以将未排序的块用作顶部块。

观察顶部块示例

```c #include #include

int main(void) { char *chunk; chunk = malloc(24); printf("Address of the chunk: %p\n", (void *)chunk); gets(chunk); return 0; }

在将其编译和调试后，在`main`函数的`ret`操作码处设置断点，我发现malloc返回了地址`0xaaaaaaac12a0`，以下是这些块：
```bash
gef➤  heap chunks
Chunk(addr=0xaaaaaaac1010, size=0x290, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
[0x0000aaaaaaac1010     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................]
Chunk(addr=0xaaaaaaac12a0, size=0x20, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
[0x0000aaaaaaac12a0     41 41 41 41 41 41 41 00 00 00 00 00 00 00 00 00    AAAAAAA.........]
Chunk(addr=0xaaaaaaac12c0, size=0x410, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
[0x0000aaaaaaac12c0     41 64 64 72 65 73 73 20 6f 66 20 74 68 65 20 63    Address of the c]
Chunk(addr=0xaaaaaaac16d0, size=0x410, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
[0x0000aaaaaaac16d0     41 41 41 41 41 41 41 0a 00 00 00 00 00 00 00 00    AAAAAAA.........]
Chunk(addr=0xaaaaaaac1ae0, size=0x20530, flags=PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)  ←  top chunk

在地址0xaaaaaaac1ae0处可以看到顶部块的存在。这并不奇怪，因为最后分配的块位于0xaaaaaaac12a0，大小为0x410，因此0xaaaaaaac12a0 + 0x410 = 0xaaaaaaac1ae0。还可以在块头部看到顶部块的长度：

gef➤  x/8wx 0xaaaaaaac1ae0 - 16
0xaaaaaaac1ad0:	0x00000000	0x00000000	0x00020531	0x00000000
0xaaaaaaac1ae0:	0x00000000	0x00000000	0x00000000	0x00000000