Fuzzing Rust code: cargo-fuzz and honggfuzz
This post explains how to test Rust code using fuzzers.
Parsers are good targets for fuzzers, especially because they are usually functions that take only bytes as input.
Preamble: why fuzz Rust code?
Since Rust provides memory safety guarantees, it may at first seem strange to fuzz Rust code. However, there are some kinds of bugs that a fuzzer will help you find, including:
- debug or unfinished code, like unimplemented! and panic! calls
- out-of-range accesses, like array[i]
- integer overflows/underflows, like base + offset (see end of article)
- stack overflows, unbounded recursion
- crashes in unsafe code
- direct calls to std::process::exit
- timeouts and functions that take too long
One last kind of bug is similar to those in the previous list, but quite annoying:
- functions from other crates, even std, that can panic (like the Add, Sub and other operator implementations for Duration, which panic on overflow)
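The sketch below makes a few of these bug classes concrete. naive_read, checked_read and duration_between are hypothetical helpers, not part of der-parser: the naive version panics on hostile input exactly the way a fuzzer would report it, while the checked version relies only on standard methods (slice::get, usize::checked_add, Duration::checked_sub) and returns None instead.

```rust
use std::panic;
use std::time::Duration;

/// Hypothetical naive reader: uses the first input byte as an offset from `base`.
/// Both lines can panic on hostile input, which is what a fuzzer reports.
fn naive_read(data: &[u8], base: usize) -> u8 {
    let offset = data[0] as usize; // out-of-range access if `data` is empty
    data[base + offset] // debug: overflow panic; release: wraps, then may read the wrong index
}

/// Same logic without panics: bounds and arithmetic are checked explicitly.
fn checked_read(data: &[u8], base: usize) -> Option<u8> {
    let offset = *data.first()? as usize;
    let index = base.checked_add(offset)?;
    data.get(index).copied()
}

/// `end - start` panics when `start > end`; `checked_sub` returns `None` instead.
fn duration_between(start: Duration, end: Duration) -> Option<Duration> {
    end.checked_sub(start)
}

fn main() {
    // The naive version panics on an empty input; catch_unwind turns that into an Err.
    let crashed = panic::catch_unwind(|| naive_read(&[], 0)).is_err();
    println!("naive_read(&[], 0) panicked: {}", crashed);

    println!("{:?}", checked_read(&[], 0)); // None
    println!("{:?}", checked_read(&[1, 7, 9], 0)); // Some(7)
    println!("{:?}", checked_read(&[255], usize::MAX)); // None: base + offset overflows
    println!(
        "{:?}",
        duration_between(Duration::from_secs(2), Duration::from_secs(1))
    ); // None
}
```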
Fuzzing has other advantages too: fuzzers act as regression tests when you change or update code, and they can even serve as functional tests if you add assertions.
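A minimal sketch of the assertions idea, built around a hypothetical toy parser (parse_tlv: one tag byte, one length byte, then the content), which is not der-parser's API. check_invariants is what would go in the body of a fuzz target: any input that breaks an assertion is reported as a crash, just like any other panic.

```rust
/// Hypothetical toy parser (not der-parser's API): one tag byte, one length
/// byte, then `length` bytes of content. Returns (tag, content, remaining input).
fn parse_tlv(data: &[u8]) -> Option<(u8, &[u8], &[u8])> {
    let (&tag, rest) = data.split_first()?;
    let (&len, rest) = rest.split_first()?;
    let len = len as usize;
    if rest.len() < len {
        return None; // truncated input: report an error instead of panicking
    }
    let (content, remaining) = rest.split_at(len);
    Some((tag, content, remaining))
}

/// Body of a fuzz target with functional checks: whenever the parser accepts
/// an input, the parsed pieces must match the bytes they were read from.
fn check_invariants(data: &[u8]) {
    if let Some((tag, content, remaining)) = parse_tlv(data) {
        assert_eq!(tag, data[0]);
        assert_eq!(content.len(), data[1] as usize);
        assert_eq!(2 + content.len() + remaining.len(), data.len());
        assert_eq!(content, &data[2..2 + content.len()]);
    }
}

fn main() {
    // A few hand-written inputs; under a fuzzer they would come from the corpus.
    let inputs: [&[u8]; 3] = [b"", &[0x02, 0x01, 0x2a], &[0x04, 0x05, 0x00]];
    for input in inputs {
        check_invariants(input);
    }
    println!("all invariants hold");
}
```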
The good news is that some tools are well integrated with cargo, making them very easy to use.
We’ll show how to use two of them: cargo-fuzz and honggfuzz.
cargo-fuzz
cargo-fuzz is a nice command-line wrapper around libFuzzer, an LLVM library for coverage-guided fuzzing.
Install cargo-fuzz:
$ cargo install cargo-fuzz
Note that you will need the nightly version of the toolchain.
Note: I’m using stable as default, so all commands requiring nightly explicitly set +nightly.
Initialize directories:
$ cargo +nightly fuzz init
This will create the fuzz/ directory, where the fuzzers’ code and data will live.
We’ll then add a new fuzzer.
I’ll take the example of the parse_der function of der-parser (DER is always a good target for fuzzing).
$ cargo +nightly fuzz add fuzzer_parse_der
It will create fuzz/fuzz_targets/fuzzer_parse_der.rs with template code, and add an entry to fuzz/Cargo.toml so that it builds.
Edit fuzzer_parse_der.rs and add a call to the targeted function:
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // Errors returned by the parser are expected and ignored:
    // only panics and crashes are reported by libFuzzer.
    let _ = der_parser::parse_der(data);
});
That’s all you need to start. Run the fuzzer:
$ cargo +nightly fuzz run fuzzer_parse_der
It shows the progress of the execution. To understand the output, see libFuzzer Output. Several items are interesting:
- NEW items show that a new code path has been discovered. With time, NEW events will appear less often.
- cov and ft give an idea of the current coverage: cov counts the code blocks or edges covered by the current corpus, and ft counts features, libFuzzer’s combined coverage signals (edges, edge counters, etc.).
- exec/s shows how many times per second the function is executed. This should remain high for the fuzzer to be efficient.
Next, a few tips that will make fuzzing much more efficient.
Use a corpus
libFuzzer is a mutation-based fuzzer. Given enough time, it can in principle reach most execution paths and branches; however, discovering new paths can be slow.
Providing as many examples as possible, both valid and invalid, in the corpus makes this process much faster. This is especially true when the input data is structured.
Simply copy files to the fuzz/corpus/fuzzer_parse_der/ directory.
Neat point: if your fuzzer is already running, there is nothing more to do! Files copied to the corpus are automatically detected, and the fuzzer will issue a RELOAD.
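If you do not have sample files at hand, a tiny program can generate a few seeds. This sketch writes hand-encoded DER values, some valid and some deliberately invalid, into fuzz/corpus/fuzzer_parse_der/; the file names and the choice of seeds are my own assumptions, and it expects to be run from the crate root.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hand-encoded DER seeds: a mix of valid and deliberately invalid inputs.
fn seed_inputs() -> Vec<(&'static str, Vec<u8>)> {
    vec![
        ("integer_1", vec![0x02, 0x01, 0x01]), // INTEGER 1
        ("null", vec![0x05, 0x00]), // NULL
        ("bool_true", vec![0x01, 0x01, 0xff]), // BOOLEAN TRUE
        ("octet_string", vec![0x04, 0x03, b'a', b'b', b'c']), // OCTET STRING "abc"
        // SEQUENCE { INTEGER 1, NULL }
        ("sequence", vec![0x30, 0x05, 0x02, 0x01, 0x01, 0x05, 0x00]),
        // Invalid: SEQUENCE announces 5 content bytes but only 2 follow
        ("truncated_sequence", vec![0x30, 0x05, 0x02, 0x01]),
        // Invalid in DER: long-form length used for a short length (non-minimal)
        ("non_minimal_length", vec![0x02, 0x81, 0x01, 0x01]),
    ]
}

/// Writes each seed as one file in `dir` (created if missing); returns the count.
fn write_corpus(dir: &Path) -> io::Result<usize> {
    fs::create_dir_all(dir)?;
    let seeds = seed_inputs();
    for (name, bytes) in &seeds {
        fs::write(dir.join(name), bytes)?;
    }
    Ok(seeds.len())
}

fn main() -> io::Result<()> {
    // Assumption: run from the crate root, next to the fuzz/ directory.
    let n = write_corpus(Path::new("fuzz/corpus/fuzzer_parse_der"))?;
    println!("wrote {} seed files", n);
    Ok(())
}
```

Since libFuzzer picks up new corpus files on its own, this can even be run while the fuzzer is active.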
Use parallelism
cargo-fuzz can start several processes (instead of 1 by default) using the --jobs n option.
$ cargo +nightly fuzz run --jobs 24 fuzzer_parse_der
All processes share the same corpus, and paths discovered by one process are automatically used by the others (when one process adds a new item to the corpus, the others reload it).
Handling crashes
When a crash is detected, the input that triggered it is saved in the fuzz/artifacts/<fuzzer_name>/ directory.
This is especially useful for testing a candidate fix: giving the name of a file as an extra argument to cargo-fuzz runs the fuzz target on that file only.
$ cargo +nightly fuzz run fuzzer_parse_der fuzz/artifacts/<fuzzer_name>/<input_file>
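Once a fix is in place, saved artifacts also make good regression tests. The std-only sketch below replays every file of a directory through a target function and lists the files that still panic. fragile_target is a hypothetical stand-in; in a real project it would call der_parser::parse_der, for example from a #[test] function.

```rust
use std::fs;
use std::io;
use std::panic;
use std::path::{Path, PathBuf};

/// Runs `target` on every file in `dir` and returns the files that made it panic.
fn replay_dir<F>(dir: &Path, target: F) -> io::Result<Vec<PathBuf>>
where
    F: Fn(&[u8]) + panic::RefUnwindSafe,
{
    let mut failing = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue;
        }
        let data = fs::read(&path)?;
        // catch_unwind converts a panic in the target into an Err value
        if panic::catch_unwind(|| target(&data[..])).is_err() {
            failing.push(path);
        }
    }
    failing.sort();
    Ok(failing)
}

/// Hypothetical target standing in for the real parser: panics on empty input.
fn fragile_target(data: &[u8]) {
    let _first = data[0];
}

fn main() -> io::Result<()> {
    // Assumption: run from the crate root, where cargo-fuzz saved its artifacts.
    let failing = replay_dir(Path::new("fuzz/artifacts/fuzzer_parse_der"), fragile_target)?;
    for path in &failing {
        println!("still panics: {}", path.display());
    }
    println!("{} failing input(s)", failing.len());
    Ok(())
}
```

The panic messages are still printed to stderr by the default panic hook, which helps when investigating a failing file.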
Minimize corpus
libFuzzer adds new samples to the corpus for every new path. With time, the corpus can grow very large, which slows down fuzzing and makes it hard to manage (please do not commit a corpus of several gigabytes to GitHub when your source code is only a few kilobytes!).
The cmin subcommand can be used to minimize the corpus while preserving coverage.
$ cargo +nightly fuzz cmin fuzzer_parse_der
Also fuzz in release mode
By default, cargo-fuzz uses debug mode (which is good, because integer overflow checks are only enabled in debug mode by default).
Fuzzing in release mode has several advantages: it is much faster, and it tests code closer to what will be executed in the end.
Just add --release to the command-line arguments:
$ cargo +nightly fuzz run --jobs 24 --release fuzzer_parse_der
Note that fuzzing in release mode is complementary to debug mode, but does not replace it.
Visualizing code coverage
After torturing the parser, I wanted to see how much of the source code the current corpus covers. For this I used kcov, passing the entire corpus as arguments to the fuzzer:
$ cd fuzz
$ mkdir cov
$ kcov ./cov ./target/debug/fuzzer_parse_der corpus/fuzzer_parse_der/*
without much success:
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Seed: 631975293
kcov: Process exited with signal 11 (SIGSEGV) at 0x5555555a8f75
After struggling a bit with search results, I found this kcov issue, which helped solve the problem.
After adding an argument to include the source paths of the fuzzer and of the der-parser library, kcov worked:
$ kcov --include-path .,.. ./cov ./target/debug/fuzzer_parse_der corpus/fuzzer_parse_der/*
honggfuzz
honggfuzz is another fuzzer, maintained by Google, which is easy to use from Rust thanks to honggfuzz-rs.
I don’t have a complete comparison of honggfuzz and cargo-fuzz. Here are a few points that look important to me:
- cargo-fuzz makes it easier to create multiple fuzzers in the same project. If your project can parse several formats, it is more efficient to write several small fuzzers: they run faster, and their corpora are easier to manage.
- honggfuzz seems faster in terms of executions per second (I’m not sure why; this is just from looking at the numbers).
- honggfuzz does not require nightly.
Similarly to the fuzz directory of cargo-fuzz, I used an hfuzz directory to store the project.
Create a new crate:
$ cargo new --bin hfuzz
Enter this directory and edit the Cargo.toml file:
# ...
[dependencies]
honggfuzz = "0.5"
[dependencies.der-parser]
path = ".."
Edit the src/main.rs file and call the targeted function:
#[macro_use]
extern crate honggfuzz;
extern crate der_parser;

fn main() {
    println!("Starting fuzzer");
    loop {
        // The fuzz macro gives an arbitrary object (see `arbitrary crate`)
        // to a closure-like block of code.
        // For performance reasons, it is recommended that you use the native type
        // `&[u8]` when possible.
        // Here, this slice will contain a "random" quantity of "random" data.
        fuzz!(|data: &[u8]| {
            let _ = der_parser::parse_der(data);
        });
    }
}
Build and run the fuzzer:
$ cargo hfuzz build
$ cargo hfuzz run hfuzz
This displays the fuzzing console, with some log output and the current progress.
Parallelism
By default, honggfuzz uses half the number of logical CPUs. To set the number of workers, use the -n argument:
$ HFUZZ_RUN_ARGS="-n 12" cargo hfuzz run hfuzz
Corpus
By default, honggfuzz stores its corpus in hfuzz_workspace/<fuzzer_name>/input/.
The corpus format is the same as the one used by libFuzzer and cargo-fuzz (one input per file), so files can simply be copied to populate the initial corpus.
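Copying can simply be done with cp, but here is a std-only Rust sketch for completeness. The paths are assumptions: the cargo-fuzz corpus in fuzz/corpus/fuzzer_parse_der, and a honggfuzz binary named hfuzz whose workspace is assumed to be created inside the hfuzz/ directory, where cargo hfuzz run is invoked.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copies every regular file from `src` into `dst` (created if missing),
/// keeping file names; returns how many files were copied.
fn copy_corpus(src: &Path, dst: &Path) -> io::Result<usize> {
    fs::create_dir_all(dst)?;
    let mut copied = 0;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        // Subdirectories and other non-file entries are skipped.
        if entry.file_type()?.is_file() {
            fs::copy(entry.path(), dst.join(entry.file_name()))?;
            copied += 1;
        }
    }
    Ok(copied)
}

fn main() -> io::Result<()> {
    // Assumed layout: run from the crate root, with fuzz/ and hfuzz/ side by side.
    let n = copy_corpus(
        Path::new("fuzz/corpus/fuzzer_parse_der"),
        Path::new("hfuzz/hfuzz_workspace/hfuzz/input"),
    )?;
    println!("copied {} files", n);
    Ok(())
}
```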
Exit upon crash
Unlike cargo-fuzz, honggfuzz continues when the target function crashes. While this can be useful in some cases, during development it is easier to stop the fuzzer and fix the bug before continuing.
To do that, add --exit_upon_crash to HFUZZ_RUN_ARGS:
$ HFUZZ_RUN_ARGS="--exit_upon_crash" cargo hfuzz run hfuzz
Release mode
By default, honggfuzz builds in release mode with instrumentation. To build without instrumentation, or in debug mode, use the build-no-instr or build-debug subcommands instead of build (and run-no-instr or run-debug instead of run).
honggfuzz arguments
The list of possible arguments for HFUZZ_RUN_ARGS can be displayed using:
$ cargo honggfuzz --help
Digression on checked integer operations
In Rust, the overflowing_xxx and checked_xxx families of functions help detect overflows: overflowing_xxx returns the (possibly wrapped) result together with a boolean set to true if an overflow occurred, while checked_xxx returns None on overflow.
This is great, because a parser should be careful with overflows (especially when adding numbers coming from untrusted sources like the network).
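A minimal sketch of both families, with a hypothetical total_len helper for a record whose header and content lengths come from untrusted input:

```rust
/// Hypothetical helper: total size of a record whose header and content
/// lengths are read from untrusted input. Returns None instead of wrapping.
fn total_len(header: u32, content: u32) -> Option<u32> {
    header.checked_add(content)
}

fn main() {
    // checked_xxx: Some(result) when it fits, None on overflow
    assert_eq!(200u8.checked_add(55), Some(255));
    assert_eq!(200u8.checked_add(100), None);

    // overflowing_xxx: wrapped result plus a flag set to true on overflow
    assert_eq!(200u8.overflowing_add(100), (44, true));
    assert_eq!(200u8.overflowing_sub(201), (255, true));

    println!("{:?}", total_len(4, 16)); // Some(20)
    println!("{:?}", total_len(4, u32::MAX - 1)); // None
}
```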
However, checked_shl and overflowing_shl are misleading about the check they perform: they do not check for overflow, only that the number of bits shifted is smaller than the bit width of the left operand. While this is important to test, it is not sufficient to detect shift overflows, since bits shifted out of the value are silently discarded.
See this Rust playground example for a silent overflow.
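A minimal sketch of such a silent overflow, using only the standard methods discussed above: shifting 0x80u8 left by 1 loses its only set bit, yet both methods report success, because 1 is a valid shift amount for an 8-bit value.

```rust
fn main() {
    let x: u8 = 0x80; // 0b1000_0000

    // The shift amount (1) is below 8 bits, so no "overflow" is reported,
    // even though the set bit has been shifted out and the result is 0.
    assert_eq!(x.checked_shl(1), Some(0));
    assert_eq!(x.overflowing_shl(1), (0, false));

    // Only an out-of-range shift amount is detected...
    assert_eq!(x.checked_shl(8), None);
    // ...and overflowing_shl then masks the amount (8 % 8 == 0), so nothing is shifted.
    assert_eq!(x.overflowing_shl(8), (0x80, true));

    println!("0x80u8 << 1 became {:#04x}", x.checked_shl(1).unwrap());
}
```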
Note that this is usually also the case for the CPU instructions (sal on x86, lsl on ARM, etc.): the last bit shifted out goes to the carry flag, so if you shift by several bits and the last bit shifted out is 0, the overflow cannot be detected using the carry. LLVM does not provide an intrinsic to test it either.
So this is neither a bug nor anything new, but it is something you have to be careful with.
To detect an overflow when shifting bits, you have to check manually.
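A minimal sketch of such a manual check for unsigned integers (shl_exact is a hypothetical name, not a std method): the shift is accepted only if the amount is valid and the bits shifted out are all zero, which leading_zeros can tell. Signed integers would need an extra check that the sign does not change; that case is not covered here.

```rust
/// Shifts `x` left by `n` bits, returning None if `n` is out of range or if
/// any set bit of `x` would be shifted out (i.e. the shift loses information).
fn shl_exact(x: u32, n: u32) -> Option<u32> {
    if n >= u32::BITS {
        return None; // same check as checked_shl
    }
    // Shifting left by n drops the n most significant bits; they must all be 0.
    if x.leading_zeros() < n {
        return None;
    }
    Some(x << n)
}

fn main() {
    println!("{:?}", shl_exact(1, 31)); // Some(2147483648)
    println!("{:?}", shl_exact(0x8000_0000, 1)); // None, though checked_shl gives Some(0)
    println!("{:?}", shl_exact(3, 30)); // Some(3221225472)
    println!("{:?}", shl_exact(3, 31)); // None: the top bit of 3 would be lost
}
```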