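Another Go at Fuzzing: JerryScript
Before We Start
I had originally planned to dig into Chrome's V8, but after studying it for a while I found it too demanding for where I am right now, and figured I should get to know a simpler JS engine first. So I decided to start with JerryScript, a lightweight JS engine aimed at embedded devices. JerryScript's issue tracker also happens to contain plenty of vulnerability reports (which nobody seems to care about), so let's try reproducing bug hunting through fuzzing.
Source and Build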
git clone https://github.com/jerryscript-project/jerryscript
cd jerryscript
python tools/build.py
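Building JerryScript is quite simple. To fuzz it, we could have AFL pass the input file as an argument and wait for crashes, but that kind of fuzzing is pointless: the binary has not gone through AFL instrumentation, so the fuzzer gets no coverage feedback. We need to use afl-clang-lto (from AFL++) as the compiler. AFL's usage and internals have been covered thoroughly by others, so I won't repeat them here.
JerryScript's tools/build.py already provides a build option for libFuzzer, and AFL supports persistent mode for binaries built around a libFuzzer harness, so we can simply use what is already there.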
CC=afl-clang-lto python tools/build.py --libfuzzer=ON --compile-flag='-Wno-enum-enum-conversion' --strip=OFF
CC=afl-clang-lto AFL_LLVM_CMPLOG=1 python tools/build.py --libfuzzer=ON --compile-flag='-Wno-enum-enum-conversion -fsanitize=address' --strip=OFF
We need the -Wno-enum-enum-conversion compile flag to keep newer versions of clang from failing the build. (If you build with a recent gcc, you also need -Wno-unterminated-string-initialization, because day_names_p and month_names_p in jerry-core/ecma/builtin-objects/ecma-builtin-helpers-date.c do not leave room for the trailing NUL byte of a C-style string literal.)
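To make the gcc warning a bit more concrete, here is a small Python sketch using ctypes that mimics the size arithmetic. The 3-byte array type and the name `ThreeChars` are my own illustration, not the actual declarations in ecma-builtin-helpers-date.c.

```python
import ctypes

literal = b"Sun"

# Like `char buf[] = "Sun";` in C: the buffer gets one extra byte for the NUL.
terminated = ctypes.create_string_buffer(literal)
print(ctypes.sizeof(terminated), terminated.raw)

# Like `char buf[3] = "Sun";` in C: legal, but there is no room for the NUL,
# which is what -Wunterminated-string-initialization complains about.
ThreeChars = ctypes.c_char * 3  # hypothetical type, for illustration only
unterminated = ThreeChars.from_buffer_copy(literal)
print(ctypes.sizeof(unterminated), unterminated.raw)
```

The array is fine as long as nobody treats an element as a NUL-terminated string, which is why silencing the warning is enough to get the build through.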
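Preparing the Initial Corpus
Since this was an experiment, I didn't think too hard about it: I took test262 as the source of JS samples, stripped the comments, and used the result directly as the initial corpus. I chose AFL as the fuzzing engine. For a JS engine this won't work well, but it was only ever meant as an experiment. During fuzzing, AFL keeps constructing new inputs from these files using various mutation strategies, collects the coverage the program reaches on each input, and uses that to construct further inputs.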
import os
import shutil
import subprocess

TEST262_REPO = "https://github.com/tc39/test262.git"
CLONE_DIR = "test262"
CORPUS_DIR = "corpus"
NUM_FILES = 100  # Adjust how many files you want

# Directories considered ES5 core tests
ES5_TEST_DIRS = [
    "test/built-ins",
    "test/language",
    "test/statements",
    "test/annexB"
]


def clone_test262():
    if not os.path.exists(CLONE_DIR):
        print("Cloning test262 repo...")
        subprocess.run(["git", "clone", TEST262_REPO], check=True)
    else:
        print("test262 repo already cloned.")


def gather_es5_js_files():
    js_files = []
    for root, _, files in os.walk(CLONE_DIR):
        # Check if the file is inside one of the ES5 directories
        if any(es5_dir in root.replace("\\", "/") for es5_dir in ES5_TEST_DIRS):
            for file in files:
                if file.endswith(".js"):
                    js_files.append(os.path.join(root, file))
    return js_files


def prepare_corpus(js_files):
    os.makedirs(CORPUS_DIR, exist_ok=True)
    selected_files = js_files[:NUM_FILES]
    print(f"Copying {len(selected_files)} files to corpus directory...")
    existing_names = set()
    for path in selected_files:
        filename = os.path.basename(path)
        name, ext = os.path.splitext(filename)
        # Avoid duplicates by renaming with suffix if needed
        suffix = 1
        while filename in existing_names:
            filename = f"{name}_{suffix}{ext}"
            suffix += 1
        existing_names.add(filename)
        shutil.copy(path, os.path.join(CORPUS_DIR, filename))
    print("Corpus preparation complete.")


if __name__ == "__main__":
    clone_test262()
    all_js_files = gather_es5_js_files()
    if len(all_js_files) == 0:
        print("No ES5 JS files found in test262 repo!")
    else:
        prepare_corpus(all_js_files)
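The script above copies the files as they are; the comment stripping is not part of it. Here is a minimal sketch of that step, assuming a naive regex approach is good enough for seed files. The name `strip_comments` is made up here, and the function deliberately leaves trailing `//` comments alone so it cannot cut a string literal in half. It can still be fooled by a `/* ... */` sequence inside a string.

```python
import re

# Matches /* ... */ blocks, including test262's /*--- metadata ---*/ header.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_comments(source: str) -> str:
    """Drop block comments, whole-line // comments and blank lines."""
    without_blocks = _BLOCK_COMMENT.sub("", source)
    kept = []
    for line in without_blocks.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept) + "\n" if kept else ""


sample = (
    "// Copyright (C) 2015 Example. All rights reserved.\n"
    "/*---\n"
    "description: demo\n"
    "---*/\n"
    "var x = 1;\n"
    "  // note\n"
    "assert.sameValue(x, 1);\n"
)
print(strip_comments(sample))
```

Smaller seeds also help AFL, since fewer bytes are wasted on text the parser skips anyway.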
Fuzzing
afl-fuzz -i input -o output -b 2 -a text -M master -- ./jerry-libfuzzer
AFL_USE_ASAN=1 afl-fuzz -i input -o output -b 4 -a text -S sanitizer -c 0 -l 2AT -P exploit -p exploit -- ./jerry-libfuzzer
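A crash showed up very quickly. The JS inputs AFL constructs are really no different from garbage, which means JerryScript can crash with a segmentation fault as early as syntax analysis, or even lexical analysis.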
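For intuition on why the inputs end up looking like garbage, here is a toy version of the mutate, run, keep-if-new-coverage loop. Everything in it is a stand-in: `toy_target` fakes coverage by counting how much of the keyword `class` the input starts with, whereas AFL++ uses an edge coverage bitmap produced by compile-time instrumentation and far richer mutation strategies.

```python
import random

KEYWORD = b"class"


def toy_target(data: bytes) -> set:
    """Hypothetical target: one 'branch' per matching prefix byte of KEYWORD."""
    covered = set()
    for i in range(len(KEYWORD)):
        if data[: i + 1] != KEYWORD[: i + 1]:
            break
        covered.add(i)
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: overwrite, insert or delete."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(seeds, rounds, seed=0):
    """Keep a mutated input only if it reaches a branch not seen before."""
    rng = random.Random(seed)
    queue = list(seeds)
    seen = set()
    for entry in queue:
        seen |= toy_target(entry)
    for _ in range(rounds):
        child = mutate(rng.choice(queue), rng)
        coverage = toy_target(child)
        if not coverage <= seen:
            seen |= coverage
            queue.append(child)
    return queue, seen


queue, seen = fuzz([b"c"], rounds=2000)
print(len(queue), sorted(seen))
```

Nothing in this loop knows about JS syntax, so whatever survives in the queue is simply whatever happened to reach new code.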
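Processing the Results
It sounds a bit absurd, but after a day of unattended running AFL had collected 543 crashes. Most of them were null pointer dereferences, though, so I decided to do a simple pass to filter out the uninteresting ones. I used gdb's Python module to debug the crash inputs in batch: after a segfault, first grab the assembly instruction at the faulting location and find the registers used in the dereferenced [reg + offset] operand (register indirect addressing), then ask gdb for the value of each register. If the value is not close to zero (greater than 8 in the script), the input is saved separately.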
import gdb
import os
import shlex
import shutil
import re
from pathlib import Path

# ====== Configuration ======
CRASH_DIR = Path("./crashes")
VALID_DIR = Path("./valid")
LOG_DIR = Path("./logs")
MODE = "copy"  # "copy" or "link"
PATTERN = "cafebabe"  # if NOT found in crash bt/output -> save to VALID_DIR
USE_STDIN = False  # If True, run "run < file" to feed the file on stdin
# Note: timeouts are not enforced inside gdb-embedded script; if you need per-run
# timeouts, run gdb under an external timeout wrapper (e.g. GNU timeout) or use
# the external/python+subprocess approach.
# ===========================

CRASH_DIR = CRASH_DIR.resolve()
VALID_DIR = VALID_DIR.resolve()
LOG_DIR = LOG_DIR.resolve()

x86_64_registers = [
    "rax", "rbx", "rcx", "rdx",
    "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11",
    "r12", "r13", "r14", "r15"
]

for d in (VALID_DIR, LOG_DIR):
    d.mkdir(parents=True, exist_ok=True)


# helper: unique destination path (avoid overwriting)
def unique_dest(dest: Path) -> Path:
    if not dest.exists():
        return dest
    i = 1
    while True:
        candidate = dest.with_name(dest.name + f".{i}")
        if not candidate.exists():
            return candidate
        i += 1


def install_file(src: Path) -> Path:
    dest = VALID_DIR / src.name
    dest = unique_dest(dest)
    if MODE == "link":
        # try symlink to absolute path
        try:
            os.symlink(str(src.resolve()), str(dest))
        except OSError:
            shutil.copy2(src, dest)
    else:
        shutil.copy2(src, dest)
    return dest


CRASH_PATTERNS = [
    r"Program received signal",
    r"SIGSEGV",
    r"SIGABRT",
    r"Segmentation fault",
    r"SIGILL",
    r"SIGFPE",
    r"^#0",  # backtrace frame 0
    r"AddressSanitizer",
    r"ASAN:",
    r"terminate called",
]
_crash_re = re.compile("|".join("(?:" + p + ")" for p in CRASH_PATTERNS), flags=re.I | re.M)


def detect_crash(text: str) -> bool:
    return bool(_crash_re.search(text))

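# Turn off pagination so gdb.execute(..., to_string=True) returns full text.
# Also force Intel syntax: the check below looks for "[reg+offset]" operands,
# which gdb's default AT&T syntax would print as "offset(%reg)" instead.
for setup_cmd in ("set pagination off", "set disassembly-flavor intel"):
    try:
        gdb.execute(setup_cmd)
    except Exception:
        pass
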
# The program to run is the one passed with --args ./jerry when launching gdb.
# gdb already knows the executable from --args; we will just set program args each run.
files = sorted([p for p in CRASH_DIR.iterdir() if p.is_file()])
summary = {"processed": 0, "crashes": 0, "saved": 0, "no_crash": 0}

for infile in files:
    summary["processed"] += 1
    name = infile.name
    logfile = LOG_DIR / (name + ".log")
    print("---- Processing:", name)

    # Set args or use stdin redirection
    if USE_STDIN:
        # clear any args (not necessary, but explicit)
        try:
            gdb.execute("set args")
        except Exception:
            pass
        run_cmd = "run < " + shlex.quote(str(infile))
    else:
        # set argv for the debugged program to the filename
        # (if your program accepts multiple args, adjust as needed)
        try:
            gdb.execute("set args " + shlex.quote(str(infile)))
        except Exception:
            pass
        run_cmd = "run"

    # Execute run and capture textual output
    try:
        out_run = gdb.execute(run_cmd, to_string=True)
    except gdb.error as e:
        # gdb.error may be thrown if the program exited in a way gdb treats specially;
        # capture the string representation and continue to collect bt below.
        out_run = str(e)

    # After run, collect a backtrace (best-effort)
    try:
        out_bt = gdb.execute("bt full", to_string=True)
    except Exception:
        try:
            out_bt = gdb.execute("bt", to_string=True)
        except Exception:
            out_bt = ""
    combined = out_run + "\n" + out_bt

    # Save log
    with logfile.open("w", encoding="utf-8", errors="replace") as f:
        f.write("COMMAND: " + run_cmd + "\n\n")
        f.write("=== RUN OUTPUT ===\n")
        f.write(out_run + "\n\n")
        f.write("=== BACKTRACE ===\n")
        f.write(out_bt + "\n")

    # Detect crash
    if detect_crash(combined):
        summary["crashes"] += 1
        crash_line = gdb.execute("x/i $rip", to_string=True)
        # Only look inside the memory operand, e.g. "[rdi+0x8]" (Intel syntax).
        valid = False
        if "[" in crash_line and "]" in crash_line:
            operand = crash_line[crash_line.index("["):crash_line.index("]")]
            for reg in x86_64_registers:
                # Read the register as an unsigned integer. Parsing the text of
                # "p $reg" is fragile because gdb prints most registers in decimal.
                if reg in operand and int(gdb.parse_and_eval(f"(unsigned long)${reg}")) > 8:
                    valid = True
        if not valid:
            # Fall through instead of `continue` so the inferior is still killed below.
            print("  -> Near-null dereference or no memory operand. Skipping.")
        else:
            print("  -> Valid crash detected. Log:", logfile)
            if PATTERN.lower() in combined.lower():
                print(f"  -> pattern '{PATTERN}' FOUND in backtrace/output. Not saving.")
            else:
                dest = install_file(infile)
                summary["saved"] += 1
                print(f"  -> pattern '{PATTERN}' NOT found. Saved to:", dest)
    else:
        summary["no_crash"] += 1
        print("  -> No crash detected. Log:", logfile)

    # Attempt to kill inferior if still running so we can restart cleanly next time
    try:
        gdb.execute("kill", to_string=True)
    except Exception:
        # ignore; keep going
        pass

# Final summary
print("\nDone.")
print("Summary:")
for k, v in summary.items():
    print(f"  {k}: {v}")
print("Logs:", LOG_DIR)
print("Valid candidates:", VALID_DIR)
# End of gdb_run.py
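The register check at the heart of the script can be tried outside gdb. The sketch below is a standalone restatement under my own assumptions: the instruction text is in Intel syntax, register values arrive in a plain dict, and the helper names `memory_operand_registers` and `is_interesting` are made up for this example.

```python
import re

# 64-bit general-purpose registers, matched as whole words.
_REG = re.compile(r"\b(r(?:[abcd]x|[sb]p|[sd]i|8|9|1[0-5]))\b")
# The first bracketed memory operand, e.g. "[rdi+rax*1+0x8]".
_OPERAND = re.compile(r"\[([^\]]*)\]")
_MASK64 = 0xFFFFFFFFFFFFFFFF


def memory_operand_registers(insn: str) -> list:
    """Return the registers used inside the first [...] operand, if any."""
    match = _OPERAND.search(insn)
    if match is None:
        return []
    return _REG.findall(match.group(1))


def is_interesting(insn: str, regs: dict, threshold: int = 8) -> bool:
    """True if any addressing register holds a value above the threshold."""
    return any(
        (regs[reg] & _MASK64) > threshold
        for reg in memory_operand_registers(insn)
        if reg in regs
    )


insn = "=> 0x5555deadbeef: movzx eax, BYTE PTR [rdi+rax*1+0x8]"
print(memory_operand_registers(insn))
print(is_interesting(insn, {"rdi": 0x646573610A20650A, "rax": 0}))
```

Masking to 64 bits means a register holding what gdb would print as -1 still counts as a large address rather than a small one.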
After filtering, I found one very interesting crash:
$ ./jerry-asan /storage/jsfuzz/valid/id:000005,sig:11,src:005743,time:469380,execs:12877861,op:havoc,rep:4
=================================================================
==1365920==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7b6b9b700098 at pc 0x558aff052c4d bp 0x7ffcb9f80e60 sp 0x7ffcb9f80e50
READ of size 1 at 0x7b6b9b700098 thread T0
#0 0x558aff052c4c in scanner_create_variables (/storage/jsfuzz/jerry-asan+0x78c4c) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
#1 0x558aff0551bc in parser_parse_function_arguments.lto_priv.0 (/storage/jsfuzz/jerry-asan+0x7b1bc) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
#2 0x558aff0585c8 in parser_parse_function (/storage/jsfuzz/jerry-asan+0x7e5c8) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
#3 0x558aff0a26bc in lexer_construct_function_object (/storage/jsfuzz/jerry-asan+0xc86bc) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
#4 0x558aff0a6a77 in parser_parse_class (/storage/jsfuzz/jerry-asan+0xcca77) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
#5 0x558aff0b6198 in parser_parse_statements (/storage/jsfuzz/jerry-asan+0xdc198) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
#6 0x558aff057d49 in parser_parse_source.lto_priv.0 (/storage/jsfuzz/jerry-asan+0x7dd49) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
#7 0x558aff008764 in jerry_parse_common.lto_priv.0 (/storage/jsfuzz/jerry-asan+0x2e764) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
#8 0x558aff0bf0bc in jerryx_source_parse_script (/storage/jsfuzz/jerry-asan+0xe50bc) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
#9 0x558afeff6be3 in main (/storage/jsfuzz/jerry-asan+0x1cbe3) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
#10 0x7f6b9da27674 (/usr/lib/libc.so.6+0x27674) (BuildId: 4fe011c94a88e8aeb6f2201b9eb369f42b4a1e9e)
#11 0x7f6b9da27728 in __libc_start_main (/usr/lib/libc.so.6+0x27728) (BuildId: 4fe011c94a88e8aeb6f2201b9eb369f42b4a1e9e)
#12 0x558afeff72e4 in _start (/storage/jsfuzz/jerry-asan+0x1d2e4) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
Address 0x7b6b9b700098 is located in stack of thread T0 at offset 152 in frame
#0 0x558aff055ffe in parser_parse_source.lto_priv.0 (/storage/jsfuzz/jerry-asan+0x7bffe) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b)
This frame has 6 object(s):
[32, 33) 'flags' (line 2041)
[48, 49) 'flags' (line 2063)
[64, 80) 'branch' (line 2253)
[96, 112) 'literal'
[128, 152) 'scanner_info_end' (line 2115) <== Memory access at offset 152 overflows this variable
[192, 792) 'context' (line 1988)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/storage/jsfuzz/jerry-asan+0x78c4c) (BuildId: 85560800a62467c72ec57dc61008c1abe723d70b) in scanner_create_variables
Shadow bytes around the buggy address:
0x7b6b9b6ffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x7b6b9b6ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x7b6b9b6fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x7b6b9b6fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x7b6b9b700000: f1 f1 f1 f1 01 f2 01 f2 00 00 f2 f2 f8 f8 f2 f2
=>0x7b6b9b700080: 00 00 00[f2]f2 f2 f2 f2 00 00 00 00 00 00 00 00
0x7b6b9b700100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x7b6b9b700180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x7b6b9b700200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x7b6b9b700280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x7b6b9b700300: 00 00 00 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1365920==ABORTING
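The numbers in the report can be cross-checked with a little arithmetic. This sketch only uses values printed above and assumes the row labels in the shadow dump are application addresses, which fits their 0x80 stride (16 shadow bytes times 8 application bytes each).

```python
FAULT_ADDR = 0x7B6B9B700098        # "READ of size 1 at ..."
FRAME_OFFSET = 152                 # "at offset 152 in frame"
ROW_START = 0x7B6B9B700080         # label of the row marked with "=>"
SHADOW_GRANULARITY = 8             # one shadow byte covers 8 application bytes
SCANNER_INFO_END = (128, 152)      # [128, 152) 'scanner_info_end'

# Where the frame's ASan layout starts (offset 0 of the object list).
frame_base = FAULT_ADDR - FRAME_OFFSET

# Which shadow byte of the "=>" row describes the faulting address.
shadow_index = (FAULT_ADDR - ROW_START) // SHADOW_GRANULARITY

# How far past the end of the variable the read landed.
bytes_past_end = FRAME_OFFSET - SCANNER_INFO_END[1]

print(hex(frame_base), shadow_index, bytes_past_end)
```

Index 3 in the marked row is exactly the bracketed `f2` (stack mid redzone), and the read lands on the first byte after the 24-byte `scanner_info_end`, so this looks like a one-past-the-end read rather than a wild one.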
The input was:
class MyError extends Error {7667111111111111111;;;;;;;static
{ throwased = true;
d = trsert.s}.defeuse(resourcd = true;
new MyError(); });
stack.defer(function () {});
assert.throws(MyError, functction (# {
Csu 12), .defer(function41024448kTtrspose()&
});
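Another input leaves RDI, the register used for register indirect addressing, holding RDI 0x646573610a20650a ('\ne \nased'), so the contents of RDI are a piece of the input itself. Interestingly, though, this one does not trigger AddressSanitizer, which suggests that ASan likely changes the memory layout of some stack frames. (I trimmed the input by hand, otherwise it would be really long and ugly.)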
class MyE{7667;;667;;sta;7;;667;;s;;#;statTtra;sta;7;;667;;;;;;;;s;;#;statTtra;;';s;;#at;#;statTtra;;';s;;#atTtra;;#;;sta;;;
e
ased =
class{76671;
6
;
s;;;;;;;;;;static
ase
6
e
ased =
class{76671;
6
;
s;;;;;;;;;;static
ased6671;
6
e
ased =
class{76671;
6
;
s;;;;;;;;;;static
as}}}}|}}}Of(}}|}csleO}}}}|}}}Of}}|}02000(1167E0Y.u(3}}}}}}}}}PisleO}}}}|}}}Of}}|}02000(1167E000002000(11676cY.u(Pisle}}}}PisleO}}}}|}}}OfInfinityaa, new .u9PisleOaaaaa!pa}}}}}}PisleO}}}}|}}}Of
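There are many more crash inputs similar to this one, and they make it quite obvious that JerryScript has serious problems handling private field names in JS classes.
Summary
Honestly, this was not a very meaningful fuzzing run. Software that resembles a compiler should be fuzzed with a structure-aware fuzzer rather than one like AFL++ that relies mostly on random byte-level mutation. Otherwise the inputs don't even get past syntax checking, and it is hard to dig any deeper for bugs. Next I may give fuzzilli a try, or consider writing a fuzzer of my own by hand (big promises, I know).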
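To show what "structure-aware" could mean at its simplest, here is a toy grammar-based generator. The grammar is something I made up for this post and covers only a sliver of class syntax (private fields, static blocks, methods). It is not how fuzzilli works, and the output is only syntactically plausible, not guaranteed to be valid JS.

```python
import random

# A made-up toy grammar. Strings that are not keys are emitted literally.
# The first alternative of every rule is non-recursive so generation can stop.
GRAMMAR = {
    "program": [["stmt"], ["stmt", "\n", "program"]],
    "stmt": [["expr", ";"], ["class ", "ident", " { ", "member", " }"]],
    "member": [
        ["#", "ident", " = ", "expr", ";"],
        ["static { ", "expr", "; }"],
        ["ident", "() { ", "stmt", " }"],
    ],
    "expr": [["number"], ["ident"], ["new ", "ident", "()"], ["expr", " + ", "expr"]],
    "ident": [["a"], ["b"], ["MyError"]],
    "number": [["0"], ["1"], ["7667"]],
}


def generate(symbol, rng, depth=0, max_depth=8):
    """Expand a grammar symbol into source text, bounding recursion by depth."""
    if symbol not in GRAMMAR:
        return symbol  # terminal text
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        chosen = alternatives[0]  # force the non-recursive alternative
    else:
        chosen = rng.choice(alternatives)
    return "".join(generate(part, rng, depth + 1, max_depth) for part in chosen)


for seed in range(3):
    print(generate("program", random.Random(seed)))
    print("---")
```

Every output has balanced braces and complete statements, so the engine at least gets past the lexer and into the class-parsing code where the crashes above came from.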