Fuzzing rust code: cargo-fuzz and honggfuzz

This post explains how to test Rust code using fuzzers.

Parsers are good target for fuzzers, especially because they usually are functions that only takes bytes as input.

Preamble: why fuzz Rust code ?

Since Rust provides some kind of memory safety, it may first appear strange to fuzz Rust code. However, there are some kind of bugs that a fuzzer will help you find, including:

debug or unfinished code, like unimplemented! and panic! calls
out of range accesses, like array[i]
integers overflows/underflows, like base + offset (see end of article)
stack overflows, unbound recursions
crashes in unsafe code
direct calls to std::process::exit
timeouts and functions that take too long

The last kind of bugs is very similar to the previous list, but is quite annoying:

functions from other crates, even std, which can panic (like Add, Sub and other ops for Duration)

Note that fuzzing has other advantages, like acting as regression tests when you change/update code, and can even be used as functional testing if you add assertions.

The good news is that some tools are well integrated with cargo, making it very easy to use. We’ll show how to use 2 tools: cargo-fuzz and honggfuzz.

cargo-fuzz

cargo-fuzz is a nice command-line wrapper around libFuzzer, an LLVM library for coverage-based fuzzing.

Install cargo-fuzz:

$ cargo install cargo-fuzz

Note that you will need the nightly version of the toolchain.

Note: I’m using stable as default, so all commands requiring nightly will explicitly set +nightly.

Initialize directories:

$ cargo +nightly fuzz init

This will create the fuzz/ directory, where fuzzers code and data will live.

We’ll then add a new fuzzer. I’ll take the example of the parse_der function of der-parser (DER is always a good target for fuzzing).

$ cargo +nightly fuzz add fuzzer_parse_der

It will create fuzz/fuzz_targets/fuzzer_parse_der.rs with template code, and add stuff to fuzz/Cargo.toml so it builds. Edit the fuzzer_parse_der.rs, and add a call to the targeted function:

#![no_main]
extern crate libfuzzer_sys;
extern crate der_parser;
#[export_name="rust_fuzzer_test_input"]
pub extern fn go(data: &[u8]) {
    let _ = der_parser::parse_der(data);
}

That’s all you need to start. Run the fuzzer:

$ cargo +nightly fuzz run fuzzer_parse_der

It will show you the progress of execution. To understand output, see libFuzzer Output. Many items are interesting:

NEW items show that a new code path has been discovered. With time, NEW events will appear less often
cov and ft give an idea of the current coverage (edges and comparisons)
exec/s shows how many times per second the function has executed. This should remain high for the fuzzer to be efficient.

Next, a few tips that will make fuzzing much more efficient.

Use a corpus

libFuzzer is a mutation-based fuzzer. If you give it enough time, it should be able to find all execution paths and branches. However, discovery of new paths can be slow.

Providing examples (as much as possible, good and bad) in the corpus makes this process much faster. This is especially true if the input data should be structured.

Simply copy files to the fuzz/corpus/fuzzer_parse_der/ directory.

Neat point: if your fuzzer is already running, do nothing more! Files copied to the corpus are automatically detected, and the fuzzer will issue a RELOAD.

Use parallelism

cargo-fuzz can start many processes (instead of 1 by default) using the --jobs n option.

$ cargo +nightly fuzz run --jobs 24 fuzzer_parse_der

All processes share the same corpus, and paths discovered by one process are automatically used by others (since a new item is added to corpus, others will reload it).

Handling crashes

When a crash is detected, the input will be saved in the fuzz/artifacts/<fuzzer_name>/ directory.

This is useful, especially for testing a candidate fix. Giving the name of a file as input argument of cargo-fuzz will run the fuzzer only for this file.

$ cargo +nightly fuzz run fuzzer_parse_der fuzz/artifacts/<fuzzer_name>/<input_file>

Minimize corpus

libFuzzer adds new samples to the corpus for every new path. With time, the corpus can grow to become very large, which will slow down fuzzing and become hard to manage (please do not commit a corpus of several gigs to github when your source code is only a few kilobytes!).

The cmin subcommand can be used to minimize corpus examples, while preserving coverage.

$ cargo +nightly fuzz cmin fuzzer_parse_der

Also fuzz in release mode

By default, cargo-fuzz uses the debug mode (which is good, because operations on integers are only instrumented in debug mode by default). Fuzzing in release mode has several advantages: it is much faster, and it provides a target closer to the code that will be executed in the end.

Just add --release to command-line arguments:

$ cargo +nightly fuzz run --jobs 24 --release fuzzer_parse_der

Note that fuzzing in release mode is complementary to debug mode, but does not replace it.

Visualizing code coverage

After torturing the parser, I want to look at the coverage of source code by the current corpus. For this I used kcov, passing the entire corpus as argument of the fuzzer:

$ cd fuzz
$ mkdir cov
$ kcov ./cov ./target/debug/fuzzer_parse_der corpus/fuzzer_parse_der/*

with not much success:

WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Seed: 631975293
kcov: Process exited with signal 11 (SIGSEGV) at 0x5555555a8f75

After fighting a bit with search results, I found this kcov issue which helped solving the problem. After adding some arguments (to include source path for the fuzzer and the der-parser lib), kcov worked:

kcov --include-path .,.. ./cov ./target/debug/fuzzer_parse_der corpus/fuzzer_parse_der/*

Overview

honggfuzz

honggfuzz is another fuzzer, maintained by Google, easy to use in Rust using honggfuzz-rs.

I have not a complete comparison of of honggfuzz versus cargo-fuzz. Here are a few points that looks important to me:

cargo-fuzz makes it easier to create multiple fuzzers in the same project. It is more efficient to write multiple, small fuzzers if your project can parse several formats. It is faster, and makes managing corpus easier.
honggfuzz seems faster in terms of execution per seconds (not sure why, just looking at the numbers)
honggfuzz does not require nightly

Similarly to the fuzz directory of cargo-fuzz, I used a hfuzz directory to store the project.

Create a new crate:

$ cargo new --bin hfuzz

Enter this directory, and edit the Cargo.toml file:

# ...
[dependencies]
honggfuzz = "0.5"

[dependencies.der-parser]
path = ".."

Edit src/main.rs file, and call the targeted function:

#[macro_use]
extern crate honggfuzz;
extern crate der_parser;

fn main() {
    println!("Starting fuzzer");
    loop {
        // The fuzz macro gives an arbitrary object (see `arbitrary crate`)
        // to a closure-like block of code.
        // For performance reasons, it is recommended that you use the native type
        // `&[u8]` when possible.
        // Here, this slice will contain a "random" quantity of "random" data.
        fuzz!(|data: &[u8]| {
          let _ = der_parser::parse_der(data);
        });
    }
}

Build and run fuzzer:

$ cargo hfuzz build
$ cargo hfuzz run hfuzz

This will display the fuzzing console, with some log output and current progress display.

Parallelism

By default, honggfuzz uses half the logical number of CPUs. To set the number of workers, use the -n argument:

$ HFUZZ_RUN_ARGS="-n 12" cargo hfuzz run hfuzz

Corpus

honggfuzz stores its corpus (by default) in hfuzz_workspace/<fuzzer_name>/input/.

Note that the corpus is similar to the one used in libFuzzer or cargo-fuzz, so files can just be copied to populate the initial corpus.

Exit upon crash

Unlike cargo-fuzz, honggfuzz will continue when the target function crashes. While this can be interesting in some cases, during the development phase it is easier to stop the fuzzer and fix the bug before continuing.

To do that, add --exit_upon_crash to HFUZZ_RUN_ARGS.

Release mode

By default, honggfuzz uses instrumented release mode. To build (run) without instrumentation, or in debug mode, use the following targets: build-no-instr, build-debug (run-no-instr, run-debug) instead of build (run).

honggfuzz arguments

The list of possible arguments for HFUZZ_RUN_ARGS can be displayed using:

$ cargo honggfuzz --help

Digression on checked integer operations

In Rust, the overflowing_xxx and checked_xxx family of functions helps detecting overflows, returning the result of operations and a boolean set to true if an operation occurred.

This is great, because a parser should be careful with overflows (especially when adding numbers coming from untrusted sources like the network).

However, the checked_shl and overflowing_shl are misleading in the performed check: they do not check for overflow, but that the number of bits shifted is not greater than the representation of the left argument. While this is important to test, this is not sufficient to detect shift overflows.

See this rust playground example for a silent overflow.

Note that this is usually the case for the CPU instructions (sal for x86, lsl, etc.) where the shifted bit goes to the carry flag, so if you shift several bits and the last is a 0, the overflow cannot be detected using carry. LLVM does not provide intrinsics to test it.

As such, it is neither a bug nor a new thing, but something that you have to be careful with.

To test for an overflow when shifting bits, you have to test manually.

Page 1 / 1

links

Tags

Pollux's corner Rust

links

Social

Tags

Pollux's corner Rust