The Magic behind Feedback-based Fuzzing

What's all the fuzz about?

You are a developer or an IT project manager? You are wondering why fuzzing is gathering so much popularity recently and looking for more information? You still think that fuzzing is just feeding random inputs into the program? You have come to the right place!

This article will introduce you to new cutting-edge fuzzing technologies and show how fuzzing can help you build more secure software. You will learn about recent developments in fuzzing from first-hand sources and will be able to compare different types of fuzzing engines based on their technical features.

So let's start with an overview of fuzzing as compared to other software testing methods and an introduction into code coverage and feedback-based fuzzing.

Unit, integration, system and acceptance testing

If you are a software developer, you most likely already know the difference between unit, integration, system and acceptance tests (Diagram 1). If that is the case, you can skip reading this section and move on to the more “fuzzy” parts. If not, read on for a quick recap.

Diagram 1: Software testing types (Source: Wikipedia)

In case you are a fan of test-driven development (like we are), you know that for every new feature that you add to your software, you should first create a corresponding unit test. A unit test verifies the correctness of a relatively isolated software unit, such as a class, by testing every method that it offers. It usually does so by executing the functions under test with pre-defined, limited sets of inputs, which should lead to defined behaviors or outputs. If these expected outputs do not match the actual output during the execution, the unit test fails.

Here is a simple example in C++ (taken from the documentation of the unit test framework Catch ). Say you have written a function to calculate factorials and now you want to test it.

TEST_CASE creates a new unit test case, that can later be executed form the framework. REQUIRE defines an assertion that has to evaluate to true in order for the test to pass.

An integration test does something very similar but on a higher level of software architecture. While unit tests should have no dependencies to code outside of the unit under test, these dependencies are the focus of integration testing. But just like in unit testing, sets of inputs and expected outputs have to be defined by the developer to ensure that the different software modules interact like they are designed to.

System testing is the next level of software testing which is conducted on a complete and fully integrated software system. There are different types of system testing, including usability, hardware/software, and functional testing.

Acceptance testing is the last stage of testing done by actual users of the software. The software is checked against technical, functional and business requirements.

Limitations of unit and integration tests

Let us come back to unit and integration tests, which play an essential role in developing bug-free, secure and functional software. As we stated earlier, unit and integration tests are done by manually defining a set of inputs and monitoring the output of the system.

Here come two major limitations of these tests: firstly, they only cover a limited set of inputs and may omit some inputs that would cause the software to fail. On the other hand, if a developer wanted to write unit tests that cover all execution paths, those would take more code lines than the software itself.

Secondly, performing unit testing requires a lot of discipline throughout the software development process, since you need to keep track of test results and if necessary, update them on every code change. This may become extremely difficult if you consider the time pressure and changing requirements that are common in most software development projects.

Alternative: feedback-based fuzzing

Early fuzzing that started in the 1980s was a type of automated software testing that, unlike the pre-defined inputs of unit testing, generates random inputs with the aim of failing the program being tested. The program being fuzzed was then monitored for exceptions such as crashes, or failing built-in code assertions, or memory leaks. The problem with brute-force fuzzing using just random inputs is that it can take a very long time to find more complex bugs or bugs hidden deep within the program.

Modern fuzzing engines (such as AFL) do not just use random inputs but smart algorithms tailoring the input to increase the amount of code which is tested/covered by the fuzzer. The term which is often used for this is feedback-driven or feedback-based fuzzing.

In comparison with unit and integration tests, the advantage of feedback-based fuzzing is that it works not just with a predefined set of inputs but is able to evolve these inputs effectively through mutation.

What is code coverage?

Code coverage, also called test coverage, measures the amount/share of code executed during a given test. The most common use case for code coverage is to supply additional data to software tests. Below you will find some common coverage metrics:

Statement coverage – the number of statements in the software source that have been executed during the test divided by the total number of statements in the code
Line coverage – the number of lines of code that have been executed divided by the total number of lines
Function coverage – the share of defined functions that have been called during the test
Basic block coverage – execution of code lines within basic blocks (not considering edges)
Edge coverage – all the edges of the control graph that are being followed during the test divided by all edges (this also includes branch coverage)
Path coverage - the number of logical paths in the program that were taken during execution divided by the number of all possible paths

Higher code coverage through instrumentation

To trace the input flow throughout program execution, for example, comparisons, divisions and some array operations can be instrumented. Instrumentation is a technique to measure code coverage by adding markers to code blocks. Basically, the instrumentation is injected into a compiled program.

Every fuzzing engine has its own method for injecting instrumentation into the target application. But the essential steps are the same. The compiler adds different functions on clearly defined positions like the beginning of a basic block or above an ‘if’-statement. The following image describes this process.

Diagram 2. Instrumentation during fuzzing

Sanitizers are detection tools based on compiler instrumentation. In this example, we use two types of Sanitizers, trace_cmp_function, and trace_pc. In order to log coverage, the function trace_pc will log the program counter. With this information, the fuzzer knows, which paths are traversed on the given input values. Each fuzzing engine runs through this process differently.

Taking the example of AFL, the instrumentation is injected into a compiled program to capture branch (edge) coverage (edge coverage provides considerably more insight into the execution paths of the program than basic block coverage). AFL logs edge coverage, which can be described as follows: transitions from one basic block to another are tracked by XOR’ing the two random values, using the result as an address and incrementing the counter at that address. To be able to differentiate between the transitions A→B and B→A, the address of the originating basic blocks gets shifted by one bit.

LibFuzzer uses another approach. It uses callback functions, which are inserted during the compiling process. One of the callback functions provides a coverage counter and tells the fuzzer which paths are traversed. Another example is trace_cmp which saves the values within a comparison (e.g. ‘if’- statement). Some coverage Sanitizers are solely implemented in libFuzzer.

When a mutated input produces a new state transition within the program, i.e. reaches a new block of code, it is added to the input queue and used as a starting point for future rounds of fuzzing. It supplements but does not automatically replace existing finds.

Inputs that do not trigger new state transitions are discarded, even if their overall control flow sequence is unique.

What is next?

Now that you have learned how fuzzing can greatly enhance your unit and integration testing results, you are wondering how you can use this instrument for yourself. Furthermore, you might be wondering how you can integrate it into your development process without spending too much time and effort.

Given that fuzzers, such as AFL, can only be used by experts, we developed a tool that is both technologically advanced (for example, it combines fuzzing technology of three fuzzing engines: AFL, libFuzzer, and honggfuzz) and easy to use. It integrates directly into your IDE and you will be able to run fuzz tests on every code change quickly and easily. You can learn more about it here.

Summary

In this article, we reviewed what unit and integration tests are and learned how fuzzing relates to those tests. We also learned about code coverage, how it is measured and what it is used for. Finally, we had a look at the process of feedback-based fuzzing using instrumentation techniques. See you in the next post!