1.5 Testing Your Work

The last step of the Function Design Recipe is to test your code, but how? In this section, we discuss the different strategies for testing code that you’ll use during the term and beyond. As you write more and more complex programs in this course, it will be vital to maintain good habits that support your programming. One of these habits is developing good tests that check that your code is correct and, often overlooked, using good tools to make those tests as easy to run as possible. You want to get in the habit of writing tests early in the process of programming, and running them as often as possible to detect coding errors as soon as you make them.

Doctests: basic examples in docstrings

Often, beginners test their code by importing their function into the Python interpreter, and then manually copy-and-pasting their examples one at a time and comparing the output with the expected output in the docstring. This approach is both time-consuming and error-prone. It may be good for a quick sanity check, but we can certainly do better.

Our first improvement is to use the Python library doctest, which looks for examples in docstrings and converts them automatically into runnable tests! To use doctest, you can add the following code to the bottom of any Python file:

```
if __name__ == '__main__':
    import doctest     # import the doctest library
    doctest.testmod()  # run the tests
```

Then when you run the file, all of the doctest examples are automatically run, and you receive a report about which tests failed.
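As a minimal sketch of what such a file might look like, here is a small function with three doctest examples in its docstring, followed by the code above. The function count_vowels is a hypothetical example chosen for illustration; it does not come from the course materials.

```python
def count_vowels(s: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in s, ignoring case.

    >>> count_vowels('hello')
    2
    >>> count_vowels('')
    0
    >>> count_vowels('XYZ')
    0
    """
    # Lowercase the string once so that 'A' and 'a' are both counted.
    return sum(1 for char in s.lower() if char in 'aeiou')


if __name__ == '__main__':
    import doctest     # import the doctest library
    doctest.testmod()  # run every example found in the docstrings
```

When every example produces the output written beneath it, running the file prints nothing; doctest only reports the examples whose actual output differs from the expected output.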

Creating test suites with `pytest`

The problem with doctest and putting examples in our docstrings is that we can’t include all of the test cases we want to without making the docstrings far too long for the reader.

So while you should continue to put a few basic doctests in your docstrings, in this course you will primarily use the pytest library to test your code. This library allows us to write our tests in a separate file, and so include an exhaustive set of tests without cluttering our code files. You see an example of pytest in your first lab, and will be seeing plenty more throughout the term. There are two important points we want to remind you of when using pytest:

Each function whose name starts with “test” is a separate test. The tests are run independently of each other, so no test should rely on another test having run before it.
Tests use the assert statement as the actual action that verifies the correctness of your code. The assert statement is used as follows:
```
assert <expression>
```
The <expression> should be a boolean expression (e.g., x == 3) that tests something about your function. We say that an assertion succeeds (or passes) when its expression evaluates to True, and it fails when its expression evaluates to False.

A single test function in pytest can contain multiple assert statements; the test passes if all of the assert statements pass, but it fails when one or more of the assert statements fail.
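To make these points concrete, here is a small sketch of a pytest-style test file. The function absolute_difference and the test names are hypothetical, invented for this illustration; in a real project the function would live in its own file and be imported into the test file rather than defined alongside the tests.

```python
def absolute_difference(x: int, y: int) -> int:
    """Return the absolute difference between x and y.

    (Hypothetical function under test; normally imported from another file.)
    """
    return abs(x - y)


def test_first_larger() -> None:
    """Test the case where the first argument is larger than the second."""
    assert absolute_difference(10, 3) == 7


def test_second_larger() -> None:
    """Test the case where the second argument is larger than the first."""
    assert absolute_difference(3, 10) == 7


def test_equal_arguments() -> None:
    """Test equal arguments. This test has two assert statements,
    and it passes only if both of them pass.
    """
    assert absolute_difference(5, 5) == 0
    assert absolute_difference(0, 0) == 0
```

Each of the three functions is reported as a separate test. If the first assert in test_equal_arguments fails, the test stops there and is marked as failed without reaching the second assert.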

Choosing test cases

We said earlier that keeping our tests in separate files from our source code enables us to write an exhaustive set of tests without worrying about length. But what exactly do we mean by “exhaustive?” In general, it is actually a pretty hard problem to choose test cases to verify the correctness of your program. You want to capture every possible scenario, while avoiding writing redundant tests. A good rule of thumb is to structure your tests around properties of the inputs. For example:

integers: 0, 1, positive, negative, “small”, “large”
lists: empty, length 1, no duplicates, duplicates, sorted, unsorted
strings: empty, length 1, alphanumeric characters only, special characters like punctuation marks

For functions that take in multiple inputs, we often also choose properties based on the relationships between the inputs. For example, for a function that takes two numbers as input, we might have a test for when the first is larger than the second, and another for when the second is larger than the first. For an input of one object and a list, we might have a test for when the object is in the list, and another for when the object isn’t.

And finally, keep in mind that these are rules of thumb only; none of these properties will always be relevant to a given function. For a complete set of tests, you must understand exactly what the function does, to be able to identify what properties of the inputs really matter.
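As one illustration of these rules of thumb, consider a function that takes a list and an object. The sketch below uses a hypothetical function count_occurrences (not part of the course materials) and picks one test per input property: list length, presence of duplicates, and whether the object is in the list at all.

```python
def count_occurrences(lst: list, item: object) -> int:
    """Return the number of times item appears in lst.

    (Hypothetical function, used only to illustrate test case selection.)
    """
    return sum(1 for element in lst if element == item)


def test_empty_list() -> None:
    # Property of the list: empty.
    assert count_occurrences([], 1) == 0


def test_length_one_item_present() -> None:
    # Property of the list: length 1. Relationship: item is in the list.
    assert count_occurrences([4], 4) == 1


def test_length_one_item_absent() -> None:
    # Property of the list: length 1. Relationship: item is not in the list.
    assert count_occurrences([4], 5) == 0


def test_no_duplicates_item_present() -> None:
    # Property of the list: longer, no duplicates.
    assert count_occurrences([1, 2, 3], 2) == 1


def test_duplicates_of_item() -> None:
    # Property of the list: contains duplicates of the item being counted.
    assert count_occurrences([2, 7, 2, 2], 2) == 3


def test_longer_list_item_absent() -> None:
    # Relationship: item is not in a longer list that has other duplicates.
    assert count_occurrences([7, 7, 8], 2) == 0
```

Whether the list is sorted is deliberately left out here: sorting does not change how many times an item appears, so for this particular function that property is not relevant.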

Property-based testing

The kinds of tests we’ve discussed so far involve defining input-output pairs: for each test, we write a specific input to the function we’re testing, and then use assert statements to verify the correctness of the corresponding output. (For a function that mutates its input, we use assert statements to verify the correctness of the new state of the input after the function executes.) These tests have the advantage that writing any one individual test is usually straightforward, but the disadvantage that choosing and implementing test cases can be challenging and time-consuming.

There is another way of constructing tests that we will explore in this course: property-based testing, in which a single test typically consists of a large set of possible inputs that is generated in a programmatic way. Such tests have the advantage that it is usually straightforward to cover a broad range of inputs in a short amount of code (using a library like hypothesis, as we’ll see); but it isn’t always easy to specify exactly what the corresponding outputs should be. If we were to write code to compute the correct answer, how would we know that that code is correct?

So instead, property-based tests use assert statements to check for properties that the function tested should satisfy. In the simplest case, these are properties that every output of the function should satisfy, regardless of what the input was. For example:

The type of the output: “the function str should always return a string.”
Allowed values of the output: “the function len should always return an integer that is greater than or equal to zero.”
Relationships between the input and output: “the function max(x, y) should return something that is greater than or equal to both x and y.”

These properties may seem a little strange, because they do not capture precisely what each function does; for example, str should not just return any string, but a string that represents its input. This is the trade-off that comes with property-based testing: in exchange for being able to run our code on a much larger range of inputs, we write tests which are imprecise characterizations of the function’s behaviour. The challenge with property-based testing, then, is to come up with good properties that narrow down as much as possible the behaviour of the function being tested.
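The sketch below shows the idea using only Python's built-in random module, so that it runs without installing anything. This is a hand-rolled stand-in and not the hypothesis library: a tool like hypothesis would take over the job of generating the inputs. The test names, the seeds, and the input ranges are assumptions made for this illustration.

```python
import random


def test_max_properties() -> None:
    """Check properties of max(x, y) on many generated pairs of integers."""
    rng = random.Random(0)  # fixed seed, so a failure can be reproduced
    for _ in range(1000):
        x = rng.randint(-10**6, 10**6)
        y = rng.randint(-10**6, 10**6)
        result = max(x, y)
        # Relationship between input and output: at least as large as both.
        assert result >= x and result >= y
        # A stronger property that narrows the behaviour further:
        # the result must be one of the two inputs.
        assert result in (x, y)


def test_len_properties() -> None:
    """Check properties of len on many generated lists of integers."""
    rng = random.Random(1)
    for _ in range(1000):
        size = rng.randint(0, 20)
        lst = [rng.randint(-5, 5) for _ in range(size)]
        # Type of the output: always an integer.
        assert isinstance(len(lst), int)
        # Allowed values of the output: never negative.
        assert len(lst) >= 0
```

Notice that the first property of max alone would also be satisfied by a function that always returns a very large number. Adding the second property rules that out, which is an example of narrowing down the behaviour of the function being tested.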

Putting it all together

Ideally, we use all three of these types of testing in combination:

doctest is used to test basic functionality, as well as to communicate what the correct behaviour of the function is.
test suites (developed using a tool like pytest) are used to fully assess the correctness of our function in a range of carefully chosen test cases that we generate by hand.
property-based tests (developed using a tool like hypothesis) are used for a more shallow assessment of correctness but on a much larger number of automatically generated test cases.
