import random
from typing import Tuple, List, Callable, Set, Any
from urllib.parse import urlparse
def delete_random_character(s: str) -> str:
if s == "":
return s
= random.randint(0, len(s) - 1)
pos return s[:pos] + s[pos + 1:]
def insert_random_character(s: str) -> str:
= random.randint(0, len(s))
pos = chr(random.randrange(32, 127))
random_character return s[:pos] + random_character + s[pos:]
def flip_random_character(s: str) -> str:
if s == "":
return s
= random.randint(0, len(s) - 1)
pos = s[pos]
c = 1 << random.randint(0, 6)
bit = chr(ord(c) ^ bit)
new_c return s[:pos] + new_c + s[pos + 1:]
Mutation-Based Fuzzing
Overview
This article discusses the Mutation-Based Fuzzing chapter from The Fuzzing Book and how its content could potentially be useful to the development of our tool, chasten. This chapter builds on the concepts that were covered in the Fuzzing: Breaking Things with Random Inputs chapter, which is explored by another article on this blog. Okay, let’s dive into the technical details!
Summary
A mutation-based fuzzing tool slightly modifies valid inputs of a functional piece of code we are trying to test in order to see how the code handles it. This increases the possibility of having a valid input, as opposed to traditional fuzzing, which supplies completely randomized inputs. Mutation-based fuzzing may only flip a bit or randomize a single byte of an input, hence the use of the word mutation. While this may reduce the scope of testing, it helps to avoid some of the redundant testing that happens when running a traditional fuzzer, given that the majority of traditionally-fuzzed inputs will be invalid regardless and not be able to make it past the parser.
As an example, this article walks us through how to create a mutation-based fuzzer for the input of a URL parser. First, we create three functions that “mutate” the input, which slightly modifies the input in some way. Here are more details about the following functions:
delete_random_character(s: str) -> str
: Randomly delete a character froms
and return the mutateds
.insert_random_character(s: str) -> str
: Randomly insert a character intos
and return the mutateds
.flip_random_character(s):
: Randomly flip one bit in the ASCII representation of one of the bytes ins
and return the mutateds
. Note that this is an interesting function even though, organically, it’s virtually impossible with today’s error correction techniques!
With those points in mind we can now investigate the implementation of these functions!
We can then create a function mutate
, which utilizes the functions above:
def mutate(s: str) -> str:
= [
mutators
delete_random_character,
insert_random_character,
flip_random_character
]= random.choice(mutators)
mutator return mutator(s)
Now, let’s create a function to add n
number of mutations to an input:
def multi_mutate(s: str, n: int) -> str:
= s
res for i in range(n):
= mutate(res)
res return res
print("Example:")
print(f"input: 'Testing 123!!', output: {multi_mutate('Testing 123!!',3)}")
Example:
input: 'Testing 123!!', output: Therting 12+3!!
Now we have everything we need to test a basic function with our mutation-based fuzzing strategy we just created! As done in The Fuzzing Book, we will test the function http_program
:
def http_program(url: str) -> bool:
= ["http", "https"]
supported_schemes = urlparse(url)
result if result.scheme not in supported_schemes:
raise ValueError("Scheme must be one of " +
repr(supported_schemes))
if result.netloc == '':
raise ValueError("Host must be non-empty")
return True
# test `http_program()`
= "http://www.google.com/search?q=fuzzing"
seed_input = set()
valid_inputs = 20
trials = 15
n_mutations
for i in range(trials):
input = multi_mutate(seed_input, n_mutations)
try:
= http_program(input)
result # input is valid, we can add it
# to the set of valid inputs
input)
valid_inputs.add(except ValueError:
# input is invalid, do not add it
# to the set of valid inputs
pass
print("Output:")
print(f"{len(valid_inputs) / trials * 100}% of the trials were valid inputs.")
Output:
0.0% of the trials were valid inputs.
This fuzzer poses as a realistic way to simulate things such as typos or the data corruption of URLs. Because of its close alignment with valid inputs, this method can give us a more in-depth look into how a piece of code handles its input, in comparison to traditional fuzzers.
Reflection
The Mutation-Based Fuzzing chapter of the Fuzzing Book combines previous chapters into a single concept of mutation-based fuzzing as a practical technique for software engineering testing. Creating these manual functions that allow for certain subtle details of inputs to be changed allows for similar, but different inputs that can give us an idea of our function’s reactions to different scenarios. Mutation-based fuzzing allows for a large amount of variation that would be difficult to create by testing inputs. Thus, it allows us to save time and ensures a large variation of coverage without much hassle. At the same time, this process also allows more control than traditional fuzzing, as it builds a variation of inputs based off a valid input, rather than random inputs of a certain length like in traditional fuzzing. Finally, with this data, we can make sure that our function can handle inputs that may be tricky to handle.
This form of testing should be kept in the back of our team’s heads as it will allow for many variations in the testing of inputs for chasten
. Implementing these ideas will allow a greater level of efficiency and security when it comes to testing chasten
with many inputs. Ultimately, this will help us to better establish a confidence in the correctness of the functions in chasten
.
Use Cases
We could use the mutation-based fuzzing in the testing of our tool. Chasten
’s command-line arguments and input files must be formatted in a specific way. This means that the use of randomly generated inputs would only be useful for testing our parser. Mutation-based fuzzing would help us to adequately fuzz test all aspects of our program that process input.
This could be done by implementing the same methods the chapter uses. First, we would need to find a valid input that an input parser accepts. Then, we would need to run that input through a function such as mutate
or multi_mutate
to add mutations to the input. Then we would be able run the mutated version of the input through the program to test for any errors.