3.4 Analysing Program Running Time

Here is an alternate way of implementing the Stack ADT based on lists, using the front of the list to represent the top of the stack.

from typing import Any, List


class Stack2:
    """Alternate stack implementation.

    This implementation uses the *front* of the Python list to represent
    the top of the stack.
    """
    # === Private Attributes ===
    # _items:
    #     The items stored in the stack. The front of the list represents
    #     the top of the stack.
    _items: List

    def __init__(self) -> None:
        """Initialize a new empty stack."""
        self._items = []

    def is_empty(self) -> bool:
        """Return whether this stack contains no items.

        >>> s = Stack2()
        >>> s.is_empty()
        True
        >>> s.push('hello')
        >>> s.is_empty()
        False
        """
        return self._items == []

    def push(self, item: Any) -> None:
        """Add a new element to the top of this stack."""
        self._items.insert(0, item)

    def pop(self) -> Any:
        """Remove and return the element at the top of this stack.

        Raise an EmptyStackError if this stack is empty.

        >>> s = Stack2()
        >>> s.push('hello')
        >>> s.push('goodbye')
        >>> s.pop()
        'goodbye'
        """
        if self.is_empty():
            raise EmptyStackError
        else:
            return self._items.pop(0)

Even though this implementation seems to be conceptually the same as the first (one uses the back of the list, the other uses the front), it turns out that their runtime performance is quite different!

A simple time profiler

By making use of a library called timeit that is included with Python, we can easily get rough estimates of how long our code takes to run. The key function we import is also called timeit. (Yes, this function has the same name as the library itself; this is actually fairly common.) It takes in a piece of Python code to execute, and returns a float representing the amount of time it took to execute it. We illustrate the use of the timeit function in the following example:

def push_and_pop(s: Stack) -> None:
    """Push and pop a single item onto <stack>.

    This is simply a helper for the main timing experiment.
    """
    s.push(1)
    s.pop()


if __name__ == '__main__':
    # Import the main timing function.
    from timeit import timeit

    # The stack sizes we want to try.
    STACK_SIZES = [1000, 10000, 100000, 1000000, 10000000]
    for stack_size in STACK_SIZES:
        # Uncomment the stack implementation that we want to time.
        stack = Stack()
        # stack = Stack2()

        # Bypass the Stack interface to create a stack of size <stack_size>.
        # This speeds up the experiment, but we know this violates encapsulation!
        stack._items = list(range(stack_size))

        # Call push_and_pop(stack) 1000 times, and store the time taken in <time>.
        # The globals=globals() is used for a technical reason that you can ignore.
        time = timeit('push_and_pop(stack)', number=1000, globals=globals())

        # Finally, report the result. The :>8 is used to right-align the stack size
        # when it's printed, leading to a more visually-pleasing report.
        print(f'Stack size {stack_size:>8}, time {time}')

Running this code on a Stack and a Stack2 instance illustrates a stark difference between these two classes. While the Stack instance seems to take the same amount of time per operation regardless of how many items are on the stack, the time per operation for the Stack2 instance seems to grow with the number of items on the stack. In fact, the amount of time required by a Stack2 operation is roughly proportional to the size of the stack, while the amount of time required by a Stack operation is independent of the size of the stack!

Memory allocation for lists in Python

To understand why there’s such a dramatic difference, we really need to understand how Python lists are stored in memory.

Recall that a variable in Python stores a reference to an object. A Python list is a special type of object that contains an ordered sequence of references to other objects, which we call the elements of the list. Importantly, these references are stored in consecutive blocks of memory—just as we’ve been drawing them in our memory model diagrams so far. This is what makes accessing list elements so fast: getting the i-th list item can be done just by calculating its address (i.e., location in the computer’s memory), based on the address where the list starts, and offset by i addresses. Think about it like this: suppose you’re walking down a hallway with numbered rooms on just one side and room numbers going up by one. If you see that the first room number is 11, and you’re looking for room 15, you can be confident that it is the fifth room down the hall.

To preserve this characteristic, lists must always be contiguous; there can’t be any “gaps”, or else Python couldn’t calculate the memory address of the i-th list item. But this makes insertion and deletion less efficient: for an item to be deleted, all items after it have to be moved down one block in memory, and similarly, for an item to be inserted, all items after the insertion point have to be moved up one block. We have a trade-off: we give up fast insertion and deletion in order to gain fast lookup by index.

There is one more important feature of Python lists that you should know, and it makes adding elements at the end of the list very fast: when you create a new list in Python, it actually allocates (assigns) more memory to the list than the list actually requires. If you create a list of 4 elements, you might get enough space to hold 8 elements. The exact numbers are implementation-specific and not important here; the general idea is that there is usually free space at the end of the list that the list can “expand” into. In particular, if you want to add an object to the end of the list, you can simply add a reference to it into that spot. On the other hand, if you want to add a new item to the list and there is no more free space, a new and larger chunk of memory is allocated for the list and every item is copied into it. You’ll learn more about the details of such an implementation in our data structures course, CSC263.
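
To make this over-allocation visible, here is a small sketch using the standard sys.getsizeof function. The exact byte counts you see depend on your Python version and platform, but you should see that the memory allocated to the list grows only occasionally, in jumps, rather than on every append.

import sys

lst = []
prev_size = sys.getsizeof(lst)
print(f'length {len(lst):>3}, allocated {prev_size} bytes')

for i in range(64):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != prev_size:
        # The allocation changes only occasionally: the list grabs a larger
        # block than it currently needs, then "expands" into the free space.
        print(f'length {len(lst):>3}, allocated {size} bytes')
        prev_size = size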

The net effect of these implementation details is that it’s much faster to add and remove at the end of the list than at its front! Adding at the end usually requires only expanding into the extra space; occasionally there won’t be any extra space left, and some time-consuming work has to be done. But on balance, adding at the end is much less time-consuming than adding at the beginning, which always requires shifting every element down a spot. (Removing items is analogous.) Now the reason for the speed difference in our stack example is clear!
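
As a rough check of this claim, here is a sketch that uses timeit directly to compare the two ends of a list. The list size and repetition count are arbitrary choices, and the exact timings will vary from machine to machine, but the gap between the two measurements should be dramatic.

from timeit import timeit

# An arbitrary large list to operate on.
big_list = list(range(100000))

# Add and remove 1000 times at the end of the list...
end_time = timeit('big_list.append(1); big_list.pop()',
                  number=1000, globals=globals())
# ...and 1000 times at the front of the list.
front_time = timeit('big_list.insert(0, 1); big_list.pop(0)',
                    number=1000, globals=globals())

print(f'At the end:   {end_time}')
print(f'At the front: {front_time}')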

Analysing algorithm running time

In evaluating the quality of our programs, we have talked about two metrics so far. The first is correctness: does our code actually work, even in special corner cases, and handle errors appropriately? The second is design: is the code carefully designed and well-documented, so that it is easy for both clients and implementers of the code to understand and work with?

From what we’ve seen about the two different stack implementations, there is certainly another way in which we can determine the quality of our code: how quickly the code runs. We will now study how to assess the running time efficiency of our programs rigorously, and communicate these ideas with other computer scientists.

Observations about runtime

First, recall that for most algorithms, the running time depends on the size of the input: as the input numbers or lists get larger, for example, we expect the algorithms operating on them to take longer as well. So when we measure efficiency, we really care about the amount of time an algorithm takes to run as a function of the size of its input. We can write something like T(n) to denote the runtime of an algorithm on an input of size n (though the input size won’t always be called n).

How best to measure runtime? We might try to use tools like the timeit function, but there are many factors that influence the time it takes for code to run: how powerful your machine is, how many other programs are running at the same time, and so on. As with Schrödinger’s cat, even the act of observing the runtime can affect performance!

What about counting the number of basic steps an algorithm takes? This is a bit better, but still subtly misleading: do all “basic” operations take the same amount of time? What counts as a basic operation in the first place?

This is where Big-Oh comes in: it gives us an elegant way of roughly characterizing the type of growth of the runtime of an algorithm, without worrying about details like how different CPUs implement different operations or whether a for loop is faster than a while loop. Not that these things aren’t important; they are simply at another level of detail. There is no point fussing about this level of detail until we know the vastly larger kinds of differences in growth rate that Big-Oh is designed to describe.

When we characterise the Big-Oh property of a function, we really are thinking about general terms like linear, quadratic, or logarithmic growth. For example, when we see a loop through a list like this:

for item in lst:
    # do something with item

we know that the runtime is proportional to the length of the list. If the list gets twice as long, we’d expect this algorithm to take twice as long. The runtime grows linearly with the size of the list, and we write that the runtime is O(n), where n is the length of the list.
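
For contrast, here is a hypothetical function (count_matches is our own made-up name for illustration) in which, for every item, we loop over the whole list again. The number of steps is roughly n × n, so its runtime grows quadratically with the length of the list, and we would write that it is O(n²).

def count_matches(lst: list) -> int:
    """Return the number of ordered pairs of positions in <lst> whose items are equal."""
    count = 0
    for item1 in lst:        # This outer loop runs n times...
        for item2 in lst:    # ...and for each of those, this loop runs n more times.
            if item1 == item2:
                count += 1
    return count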

Ignoring constants, focusing on behaviour as problem size grows

In CSC165/CSC240, you learn about the formal mathematical definition of Big-Oh notation, but this is not covered in this course. Intuitively, Big-Oh notation allows us to analyse the running time of algorithms while ignoring two details:

  1. The constants and lower-order terms involved in the step counting: 5n, n + 10, 19n − 10, 0.05n are all O(n)—they all have linear growth.
  2. The algorithm’s running time on small inputs. The key idea here is that an algorithm’s behaviour as the input size gets very large is much more important than how quickly it runs on small inputs.

Instead, Big-Oh notation allows us to capture how running time grows as the problem size grows. We say that Big-Oh deals with asymptotic runtime.

Note that these points mean that Big-Oh notation is not necessarily suitable for all purposes. For example, even though the sorting algorithm mergesort runs in time O(n log n) and the algorithm insertion sort runs in time O(n²), for small inputs (e.g., lists of size <= 10), insertion sort runs significantly faster than mergesort in practice! And sometimes the constants are important too: even though yet another algorithm called quicksort is (on average) an O(n log n) algorithm, it has smaller constants than mergesort and so typically runs faster in practice. Neither of these practical observations is captured by the sorting algorithms’ respective Big-Oh classes!

Terminology and mathematical intuition

Here is a table that summarizes some of the more common Big-Oh classes, and the English terminology that we use to describe their growth:

Big-Oh      Growth term
O(log n)    logarithmic
O(n)        linear
O(n²)       quadratic
O(2ⁿ)       exponential (with base 2)

Notice that we say growth is “exponential” when the variable is in the exponent, as in 2ⁿ (but not n²).

There is a very nice mathematical intuition that describes these classes too. Suppose we have an algorithm which has running time N₀ when given an input of size n, and a running time of N₁ on an input of size 2n. We can characterize the rates of growth in terms of the relationship between N₀ and N₁:

Big-Oh      Relationship
O(log n)    N₁ ≈ N₀ + c
O(n)        N₁ ≈ 2N₀
O(n²)       N₁ ≈ 4N₀
O(2ⁿ)       N₁ ≈ (N₀)²
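
We can turn this doubling intuition into a rough experiment. The sketch below is our own illustration (the name doubling_ratio and the choice of number=10 repetitions are arbitrary): it times a function on inputs of size n and 2n, and reports the ratio N₁ / N₀. A ratio near 2 suggests linear growth, near 4 suggests quadratic growth, and so on.

from timeit import timeit
from typing import Callable


def doubling_ratio(f: Callable[[int], object], n: int) -> float:
    """Return the ratio of the time f takes on an input of size 2 * n
    to the time it takes on an input of size n.
    """
    # timeit can also time a zero-argument function directly.
    time_n = timeit(lambda: f(n), number=10)
    time_2n = timeit(lambda: f(2 * n), number=10)
    return time_2n / time_n


if __name__ == '__main__':
    # Summing the numbers from 0 to n - 1 takes linear time,
    # so we expect a ratio of roughly 2.
    print(doubling_ratio(lambda n: sum(range(n)), 1000000))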

Constant time

There is one more use of Big-Oh notation that we require, which is to capture the case of a function whose asymptotic growth does not depend on its input! For example, consider the constant function f(n) = 10. This function doesn’t depend on its input, and so we say that it has constant asymptotic behaviour, writing O(1) to represent this.

In the context of running time, we would say that a particular function or method “runs in constant time” to say that its runtime doesn’t depend on the size of its input.

For example, our above discussion about index lookup in array-based Python lists can be summarized by saying that it is a constant time operation: looking up lst[i] takes time that does not depend on either len(lst) or i itself!
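
As a final sketch (again with arbitrary list sizes and repetition counts), we can check this with timeit: looking up an item near the end of a very large list takes roughly the same time as looking up an item in a tiny one.

from timeit import timeit

small = list(range(100))
large = list(range(10000000))

# Both lookups should take roughly the same amount of time,
# regardless of the length of the list or the index being accessed.
print(timeit('small[50]', number=10000, globals=globals()))
print(timeit('large[9999999]', number=10000, globals=globals()))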