Fuzzing Rust code: cargo-fuzz and honggfuzz

This post explains how to test Rust code using fuzzers.

Parsers are good targets for fuzzers, especially because they are usually functions that take only bytes as input.

Preamble: why fuzz Rust code?

Since Rust provides memory safety guarantees, it may at first seem strange to fuzz Rust code. However, there are several kinds of bugs that a fuzzer can help you find, including:

- debug or unfinished code, like unimplemented! and panic! calls
- out-of-range accesses, like array[i]
- integer overflows/underflows, like base + offset (see end of article)
- stack overflows and unbounded recursion
- crashes in unsafe code
- direct calls to std::process::exit
- timeouts and functions that take too long

One last kind of bug is very similar to those in the previous list, but is quite annoying:

- functions from other crates, even std, that can panic (like Add, Sub and other operators on Duration)
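To make the Duration case concrete, here is a minimal sketch (the add_timeouts helper is hypothetical, and Duration::MAX assumes Rust 1.53 or later). The `+` operator on std::time::Duration panics when the sum overflows, in release builds as well, whereas Duration::checked_add returns None, which is usually what code handling untrusted values wants.

```rust
use std::panic;
use std::time::Duration;

/// Hypothetical helper: adds two durations read from untrusted input,
/// returning None instead of panicking if the sum overflows.
fn add_timeouts(a: Duration, b: Duration) -> Option<Duration> {
    a.checked_add(b)
}

fn main() {
    let big = Duration::MAX;
    let one = Duration::from_secs(1);

    // `big + one` goes through std::ops::Add, which panics on overflow
    // regardless of the build profile.
    let result = panic::catch_unwind(move || big + one);
    assert!(result.is_err());

    // checked_add reports the overflow as None instead.
    assert_eq!(add_timeouts(big, one), None);
    assert_eq!(add_timeouts(one, one), Some(Duration::from_secs(2)));
    println!("overflow detected without panicking");
}
```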

Fuzzing has other advantages as well: it acts as a regression test when you change or update code, and it can even be used for functional testing if you add assertions to the fuzz target.
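As a sketch of the functional-testing idea, the function below checks invariants that must hold for every input, not just the absence of crashes. parse_der_length is a hypothetical, simplified DER length decoder written for this example only (it is not part of der-parser and ignores DER's minimal-encoding rules). In practice, check_der_length(data) would be called from the body of a fuzz target such as fuzzer_parse_der.rs; here, main only replays a few hand-written inputs.

```rust
/// Hypothetical helper: decodes a DER length field (short or long form).
/// Returns the decoded length and the remaining input, or None on error.
fn parse_der_length(data: &[u8]) -> Option<(usize, &[u8])> {
    let (&first, rest) = data.split_first()?;
    if first < 0x80 {
        // Short form: the length is stored in the low 7 bits.
        return Some((first as usize, rest));
    }
    let n = (first & 0x7f) as usize;
    // n == 0 means indefinite length (forbidden in DER); also reject
    // lengths that would not fit in a usize, and truncated input.
    if n == 0 || n > std::mem::size_of::<usize>() || rest.len() < n {
        return None;
    }
    let mut len: usize = 0;
    for &b in &rest[..n] {
        len = (len << 8) | b as usize; // big-endian accumulation
    }
    Some((len, &rest[n..]))
}

/// Fuzz-target body used as a functional test: besides "must not panic",
/// it asserts properties that must hold for every successfully parsed input.
fn check_der_length(data: &[u8]) {
    if let Some((len, rest)) = parse_der_length(data) {
        // At least one byte is consumed, never more than the input.
        assert!(rest.len() < data.len());
        // The remaining slice must be a suffix of the input.
        assert!(data.ends_with(rest));
        // In short form, the length is the first byte itself.
        if data[0] < 0x80 {
            assert_eq!(len, data[0] as usize);
        }
    }
}

fn main() {
    // Under cargo-fuzz, libFuzzer would generate these inputs.
    let samples: [&[u8]; 4] = [&[0x05, 0xAA], &[0x82, 0x01, 0x00], &[0x80], &[0x81]];
    for s in samples {
        check_der_length(s);
    }
    println!("all invariants hold");
}
```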

The good news is that some tools are well integrated with cargo, which makes them very easy to use. We’ll show how to use two of them: cargo-fuzz and honggfuzz.

cargo-fuzz

cargo-fuzz is a nice command-line wrapper around libFuzzer, an LLVM library for coverage-based fuzzing.

Install cargo-fuzz:

$ cargo install cargo-fuzz

Note that you will need the nightly version of the toolchain.

Note: I’m using stable as default, so all commands requiring nightly will explicitly set +nightly.

Initialize directories:

$ cargo +nightly fuzz init

This creates the fuzz/ directory, where the fuzzers’ code and data will live.

We’ll then add a new fuzzer. I’ll take the example of the parse_der function of der-parser (DER is always a good target for fuzzing).

$ cargo +nightly fuzz add fuzzer_parse_der

This creates fuzz/fuzz_targets/fuzzer_parse_der.rs with template code, and adds an entry to fuzz/Cargo.toml so that it builds. Edit fuzzer_parse_der.rs and add a call to the targeted function:

```rust
#![no_main]
use libfuzzer_sys::fuzz_target;

// libFuzzer calls this closure once per generated input.
fuzz_target!(|data: &[u8]| {
    // Only failures (panics, aborts, timeouts) matter here, so the result is ignored.
    let _ = der_parser::parse_der(data);
});
```

That’s all you need to start. Run the fuzzer:

$ cargo +nightly fuzz run fuzzer_parse_der

The fuzzer shows the progress of the execution. To understand its output, see the “Output” section of the libFuzzer documentation. Several items are of interest:

- NEW lines show that a new code path has been discovered. Over time, NEW events appear less and less often.
- cov and ft give an idea of the current coverage: cov counts covered code blocks or edges, and ft counts coverage “features” (edges, comparisons, etc.)
- exec/s shows how many times per second the function is executed. This should stay high for the fuzzer to be efficient.

Next, a few tips that will make fuzzing much more efficient.

Use a corpus

libFuzzer is a mutation-based fuzzer. Given enough time, it can discover many execution paths and branches on its own; however, finding new paths can be slow.

Providing as many examples as possible in the corpus, both valid and invalid, makes this process much faster, especially when the input data is structured.

Simply copy files to the fuzz/corpus/fuzzer_parse_der/ directory.

Neat point: if your fuzzer is already running, do nothing more! Files copied to the corpus are automatically detected, and the fuzzer will issue a RELOAD.

Use parallelism

cargo-fuzz can start many processes (instead of 1 by default) using the --jobs n option.

$ cargo +nightly fuzz run --jobs 24 fuzzer_parse_der

All processes share the same corpus, so paths discovered by one process are automatically used by the others (when a new item is added to the corpus, the other processes reload it).

Handling crashes

When a crash is detected, the input will be saved in the fuzz/artifacts/<fuzzer_name>/ directory.

This is especially useful for testing a candidate fix: if you pass a file name as an argument, cargo-fuzz runs the fuzz target on that file only.

$ cargo +nightly fuzz run fuzzer_parse_der fuzz/artifacts/fuzzer_parse_der/<input_file>

Minimize corpus

libFuzzer adds a new sample to the corpus for every new path. Over time, the corpus can grow very large, which slows down fuzzing and makes it hard to manage (please do not commit a corpus of several gigabytes to GitHub when your source code is only a few kilobytes!).

The cmin subcommand minimizes the corpus while preserving coverage.

$ cargo +nightly fuzz cmin fuzzer_parse_der

Also fuzz in release mode

By default, cargo-fuzz uses debug mode (which is good, because integer overflow checks are only enabled in debug mode by default). Fuzzing in release mode has several advantages: it is much faster, and the fuzzed binary is closer to the code that will eventually run in production.
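Because overflow checks depend on the build profile, a plain base + offset can panic in a debug build and silently wrap in a release build. A minimal sketch, assuming a hypothetical sub_slice helper, makes the check explicit with checked_add so that both builds behave the same way:

```rust
/// Hypothetical helper: returns data[offset..offset + len] if the range is
/// valid. checked_add makes the overflow check explicit, so the behaviour
/// does not depend on whether the build enables overflow checks.
fn sub_slice(data: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    let end = offset.checked_add(len)?; // None if offset + len overflows
    data.get(offset..end) // None if the range is out of bounds
}

fn main() {
    let data = [1u8, 2, 3, 4];
    assert_eq!(sub_slice(&data, 1, 2), Some(&data[1..3]));
    assert_eq!(sub_slice(&data, 3, 2), None);
    assert_eq!(sub_slice(&data, 1, usize::MAX), None);

    // With overflow checks disabled (the release default), an unchecked
    // `1 + usize::MAX` silently wraps to this value instead of panicking.
    assert_eq!(1usize.wrapping_add(usize::MAX), 0);
    println!("ok");
}
```

Returning None lets the parser reject the input cleanly, instead of relying on a debug-only panic that the release build would not reproduce.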

Just add --release to command-line arguments:

$ cargo +nightly fuzz run --jobs 24 --release fuzzer_parse_der

Note that fuzzing in release mode is complementary to debug mode, but does not replace it.

Visualizing code coverage

After torturing the parser, I wanted to look at the source code coverage achieved by the current corpus. For this I used kcov, passing the entire corpus as arguments to the fuzzer:

$ cd fuzz
$ mkdir cov
$ kcov ./cov ./target/debug/fuzzer_parse_der corpus/fuzzer_parse_der/*

without much success:

```
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Seed: 631975293
kcov: Process exited with signal 11 (SIGSEGV) at 0x5555555a8f75
```

After fighting a bit with search results, I found a kcov issue that helped solve the problem. After adding some arguments (to include the source paths of the fuzzer and of the der-parser library), kcov worked:

$ kcov --include-path .,.. ./cov ./target/debug/fuzzer_parse_der corpus/fuzzer_parse_der/*


honggfuzz

honggfuzz is another fuzzer, maintained by Google, which is easy to use from Rust thanks to honggfuzz-rs.

I do not have a complete comparison of honggfuzz versus cargo-fuzz, but here are a few points that look important to me:

- cargo-fuzz makes it easier to create multiple fuzzers in the same project. If your project can parse several formats, it is more efficient to write several small fuzzers: each one runs faster, and the corpora are easier to manage.
- honggfuzz seems faster in terms of executions per second (not sure why, I am just looking at the numbers)
- honggfuzz does not require nightly

Similarly to the fuzz directory of cargo-fuzz, I used a hfuzz directory to store the project.

Create a new crate:

$ cargo new --bin hfuzz

Enter this directory, and edit the Cargo.toml file:

```toml
# ...
[dependencies]
honggfuzz = "0.5"

[dependencies.der-parser]
path = ".."
```

Edit src/main.rs file, and call the targeted function:

```rust
#[macro_use]
extern crate honggfuzz;
extern crate der_parser;

fn main() {
    println!("Starting fuzzer");
    loop {
        // The fuzz macro gives an arbitrary object (see `arbitrary crate`)
        // to a closure-like block of code.
        // For performance reasons, it is recommended that you use the native type
        // `&[u8]` when possible.
        // Here, this slice will contain a "random" quantity of "random" data.
        fuzz!(|data: &[u8]| {
            let _ = der_parser::parse_der(data);
        });
    }
}
```

Build and run fuzzer:

$ cargo hfuzz build
$ cargo hfuzz run hfuzz

This displays the fuzzing console, with some log output and the current progress.

Parallelism

By default, honggfuzz uses half the number of logical CPUs. To set the number of workers, use the -n argument:

$ HFUZZ_RUN_ARGS="-n 12" cargo hfuzz run hfuzz

Corpus

honggfuzz stores its corpus (by default) in hfuzz_workspace/<fuzzer_name>/input/.

Note that the corpus is similar to the one used in libFuzzer or cargo-fuzz, so files can just be copied to populate the initial corpus.

Exit upon crash

Unlike cargo-fuzz, honggfuzz will continue when the target function crashes. While this can be interesting in some cases, during the development phase it is easier to stop the fuzzer and fix the bug before continuing.

To do that, add --exit_upon_crash to HFUZZ_RUN_ARGS.

Release mode

By default, honggfuzz builds in release mode with instrumentation. To build (run) without instrumentation, or in debug mode, use the build-no-instr or build-debug (run-no-instr or run-debug) subcommands instead of build (run).

honggfuzz arguments

The list of possible arguments for HFUZZ_RUN_ARGS can be displayed using:

$ cargo honggfuzz --help

Digression on checked integer operations

In Rust, the checked_xxx and overflowing_xxx families of functions help detect overflows: checked_xxx returns None if an overflow occurred, and overflowing_xxx returns the (wrapped) result of the operation together with a boolean set to true if an overflow occurred.

This is great, because a parser should be careful with overflows (especially when adding numbers coming from untrusted sources like the network).
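A small sketch of both families on u8 values, plus a hypothetical advance helper that rejects a position/length pair whose sum does not fit in a u16:

```rust
/// Hypothetical helper: moves a position forward by a length read from
/// untrusted input, refusing if the result does not fit in a u16.
fn advance(pos: u16, len: u16) -> Option<u16> {
    pos.checked_add(len)
}

fn main() {
    // checked_*: Some(result), or None on overflow.
    assert_eq!(200u8.checked_add(55), Some(255));
    assert_eq!(200u8.checked_add(100), None);

    // overflowing_*: the wrapped result plus a flag telling whether it overflowed.
    assert_eq!(200u8.overflowing_add(55), (255, false));
    assert_eq!(200u8.overflowing_add(100), (44, true)); // 300 - 256 = 44

    assert_eq!(advance(65_000, 500), Some(65_500));
    assert_eq!(advance(65_000, 600), None);
    println!("ok");
}
```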

However, checked_shl and overflowing_shl are misleading in the check they perform: they do not check for overflow, only that the number of bits shifted is smaller than the bit width of the left operand. While this is important to test, it is not sufficient to detect shift overflows.

See this rust playground example for a silent overflow.

This is also usually the case for CPU instructions (sal on x86, lsl on ARM, etc.): only the last bit shifted out goes to the carry flag, so if you shift by several bits and the last one shifted out is a 0, the overflow cannot be detected using the carry. LLVM does not provide intrinsics to test it either.

As such, it is neither a bug nor a new thing, but something that you have to be careful with.

To test for an overflow when shifting bits, you have to test manually.
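The silent overflow can be reproduced with a short sketch (this is not the original playground example). shl_exact is a hypothetical helper showing one way to do the manual test: every bit shifted out must be zero, which is equivalent to value having at least shift leading zeros.

```rust
/// Hypothetical helper: shifts `value` left by `shift` bits, returning None
/// if the shift amount is too large *or* if any set bit would be lost.
fn shl_exact(value: u32, shift: u32) -> Option<u32> {
    if shift >= u32::BITS {
        return None; // the only condition checked_shl tests
    }
    // All bits shifted out must be zero: `value` needs at least
    // `shift` leading zeros for the result to be lossless.
    if value.leading_zeros() >= shift {
        Some(value << shift)
    } else {
        None
    }
}

fn main() {
    let value: u32 = 0xFFFF_FFFF;

    // std only checks the shift amount: the top 4 bits are silently lost.
    assert_eq!(value.checked_shl(4), Some(0xFFFF_FFF0));
    assert_eq!(value.overflowing_shl(4), (0xFFFF_FFF0, false));

    // The manual check detects the lost bits.
    assert_eq!(shl_exact(value, 4), None);
    assert_eq!(shl_exact(1, 31), Some(0x8000_0000));
    assert_eq!(shl_exact(2, 31), None);
    assert_eq!(shl_exact(1, 32), None);
    println!("ok");
}
```

An equivalent test is (value << shift) >> shift == value, which compares the round trip; using leading_zeros avoids the second shift.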

Other articles

OCaml LLVM bindings tutorial, part 4
Date Wed 01 July 2015 Tags Programming OCaml LLVM Compiler

In the previous examples, we’ve seen how to build OCaml applications to read, manipulate and write LLVM bitcode.

To be able to generate realistic code, we now need to add a few more things. This part explains how to create bitcode with a correctly specified target triple, how to verify bitcode, and write a hello world application.

Target Triple and Data Layout

While LLVM IR is (or should be) target independent, there are a few things that are not. For example, the support for some instructions, the padding and alignment inside structures, the endianness, the size of pointers, etc. All these things are specified in two attributes of modules: the target triple, and the data layout.

In the current (3.5) version of LLVM, these two attributes are optional. However, they could become mandatory in the future, so it is best to specify them.

Note: in my personal opinion, specifying that inside the module is clearly redundant with the -march= option of llc. Most of this could have been handled by compiler flags, instead of creating situations where one can …
OCaml LLVM bindings tutorial, part 3
Date Wed 24 June 2015 Tags Programming OCaml LLVM Compiler

See also:
- OCaml LLVM bindings tutorial, part 1
- OCaml LLVM bindings tutorial, part 2
The previous articles explain how to build applications using the OCaml-LLVM bindings, and how to use the API to manipulate the LLVM objects. This was the “read-only” part of the tutorial, which can be used to analyze LLVM IR.

This part explains how to create LLVM IR, write a simple application from scratch, and build and run it.

Modules

As in the previous tutorial, we need to create a context and a module:
```
let llctx = global_context () in
let llm = create_module llctx "mymodule" in
```
Functions

There are two actions that can be done on functions:
- declare_function to give only a declaration of the prototype,
- define_function to give both the declaration and the implementation.
In both cases, we need to give the signature (return type, number and type of arguments) of the function.

This is pretty similar to C. We’ll use this to declare the function int main(void).

The int type is a bit problematic in LLVM (and in C, but for other reasons): integer types must have a known size in LLVM. While this does not change the architecture-independent property …
OCaml LLVM bindings tutorial, part 2
Date Thu 11 June 2015 Tags Programming OCaml LLVM Compiler

See also:
- OCaml LLVM bindings tutorial, part 1
In the previous tutorial, we’ve seen how to use ocamlbuild and make to build a simple application. In this part, we’ll start exploring the API, and see how to access values and attributes of LLVM objects.

The base of the code is the same as in part 1: it reads an existing LLVM bitcode file, for example one generated by clang.

As in the previous part, knowing the LLVM C++ API is not required (but can help).

LLVM objects

The top-level container is a module (llmodule). The module contains global variables, types and functions, which in turn contain basic blocks, and basic blocks contain instructions.

Values

In the OCaml bindings, all objects (variables, functions, instructions) are instances of the opaque type llvalue.

A value has a type, a name, a definition, a list of users, and other things like attributes (for example visibility or linkage options) or aliases.

Each value has a type (lltype), which is a composite object to define the type of a value and its arguments. To match the real type, it needs to be converted to a TypeKind.t:
```
let rec print_type llty =
  let ty = Llvm …
```
OCaml LLVM bindings tutorial, part 1
Date Tue 09 June 2015 Tags Programming OCaml LLVM Compiler

This is the first part of a tutorial series on how to use the OCaml bindings for LLVM. Why use the OCaml bindings? Because you can avoid using the C++ API, spending huge amounts of time compiling Clang sources, then your plugin, then debugging the segfaults again and again. The bindings are stable, cover most of the API, and are quite simple to use, thanks to the Debian packages.

This tutorial is based on Debian Sid; things may differ on other distributions, but should stay similar.

The objectives of this first part are:
- install the required packages
- setup a build environment for ocamlbuild
- build a simple application that reads an LLVM bitcode file and prints it
Installation

The required packages are:
- llvm-3.5-dev
- libllvm-3.5-ocaml-dev
- the LLVM and OCaml compilers (llvm-3.5, ocaml)
- optionally, clang
The current LLVM version is 3.6, however the OCaml bindings are currently disabled (See Debian bug #783919), because of changes in the required dependencies.

Project Layout

The sources are organized as follows:
```
part1/
├── build
├── Makefile
└── src
    └── tutorial01.ml
```
First application

First, create file src/tutorial01.ml:
```
let _ =
  let llctx = Llvm.global_context () in
  let llmem = Llvm.MemoryBuffer.of_file Sys.argv.(1) in
  let …
```
Materials for my talk at SSTIC 2015 - PICON : Control Flow Integrity on LLVM IR
Date Mon 08 June 2015 Tags Programming LLVM Compiler Security

Here are the materials for the talk PICON : Control Flow Integrity on LLVM IR, given during SSTIC 2015. While SSTIC is a French-speaking conference, I publish here in English because my other posts are also in English.

Here is the summary, from the website:

Control flow integrity has been a well explored field of software security for more than a decade.

However, most of the proposed approaches are stalled in a proof of concept state - when the implementation is publicly available - or have been designed with a minimal performance overhead as their primary objective, sacrificing security.

Currently, none of the proposed approaches can be used to fully protect real-world programs compiled with most common compilers (e.g. GCC, Clang/LLVM).

In this paper we describe a control flow integrity enforcement mechanism whose main objective is security. Our approach is based on compile-time code instrumentation, making the program communicate with its external execution monitor. The program is terminated by the monitor as soon as a control flow integrity violation is detected.

Our approach is implemented as an LLVM plugin and is working on LLVM’s Intermediate Representation.
- Article (EN)
- Slides (FR)
- Video (FR)
Code is currently being published (with an opensource …
Python scripts in GDB
Date Mon 20 December 2010 Tags Programming gdb debug Python

Since version 7.0, gdb has gained the ability to execute Python scripts. This makes it very easy to write gdb extensions and commands, or to manipulate data. It can also be used to manipulate graphic data (by spawning commands in threads), change the program, or even write a firewall (ahem ..). I’ll assume you’re familiar with both gdb commands and basic Python scripts.

The first, very basic test is to run a simple command:
```
(gdb) python print("Hello, world !")
Hello, world !
```
So far so good. Yet, printing hello world won’t help us to debug our programs :)

The reference documentation can be found here, but does not help much when it comes to actually manipulating data. I’ll try to give a few examples here.

The Python script

The first thing to do is to write a script (we’ll call it gdb-wzdftpd.py) containing the Python commands.

We will define a command to print GLib’s GList type, with its nodes and content (which is stored as a void*).

To define a new command, we have to create a new class inheriting from gdb.Command. This class has two mandatory methods, __init__ and invoke.

Gdb redirects stdout and stderr to …
animated charts in python and Qt
Date Sun 04 October 2009 Tags Programming Python Graphs Visualization

I’m currently trying to generate interactive (and animated) charts in Python + Qt. The ideal library would be:
- portable: this is one of the reasons of the choice of PyQt
- simple: same reason
- interactive: I want to be able to select, for example, the slices of a pie chart. A signal of events like Qt’s would be perfect
- animated: this is useless, but looking at things like AnyChart or FusionCharts, the result is really nice!
- light on dependencies: relying on tons of libraries makes the project hard to maintain and not portable, especially on Windows, where there is no packaging and dependency system.
- free software
A quick search gave me the following products:
- matplotlib: mostly for scientific plots, but there is a nice number of options, a well-documented API.
- pyQwt: Python bindings for Qwt. Again, it’s more scientific plot than charts
- cairoplot: the project looks dead (or suffers from the "yeah, the project’s not finished, but we’re recoding it in $LANG to be faster" syndrome, which is more or less the same). It generates images, though item maps can be extracted. As the name suggests, it uses Cairo.
- pyCha: some nice charts, uses Cairo. Very simple API (not …
libnetfilter-{queue,log} bindings release
Date Sun 03 May 2009 Tags Netfilter Python Perl Programming

I just released nfqueue-bindings 0.2 and nflog-bindings 0.1. Despite the difference of versions, functions are almost the same :)

Here is a short diff since previous version:
```
Add af_family argument to bind operations (allow IPv6 binds)
Add notes on set_queue_maxlen requiring a kernel >= 2.6.20
bugfix: use queue number when creating queue
bugfix: really link Perl binding to Perl library 
Fix cmake warning
```
Get them on nfqueue-bindings and nflog-bindings.

Git rocks
Date Wed 05 November 2008 Tags Programming Git

No news here, this post is mostly a note for myself, to remember some commands for git:

Creating a repository to be shared between several hosts (with an existing project)

On the server:
```
mkdir project.git
cd project.git
git --bare init
```
On the remote host:
```
cd project
git init
git remote add origin ssh://server/var/git/project
git config branch.master.remote origin
git config branch.master.merge refs/heads/master
```
Now you can make the first commit:
```
git add .
git commit -m "First commit"
git push
```
Fix a mistake in a previous commit
1. Save your work so far.
2. Stash your changes away for now: git stash
3. Now your working copy is clean at the state of your last commit.
4. Use ‘git rebase -i’, and use the ‘edit’ command on the commit you want to edit
5. Make the fixes. (If you just want to change the log, skip this step.)
6. Commit the changes in “amend” mode: git commit --all --amend
7. Your editor will come up asking for a log message (by default, the old log message). Save and quit the editor when you’re happy with it.
8. The new changes are added on to the old commit. See …

Pollux's corner Programming
