6.7 Binary Search Trees and Running Time

Now we return to the reason we started talking about binary search trees in the first place: we wanted a more efficient implementation of the Collection ADT, which supports search, insertion, and deletion.

The implementations of __contains__, insert, and delete for BSTs all have the same structure: each makes just one recursive call inside its recursive step, using the BST property to decide which subtree to recurse into. Let's focus on __contains__ here.

    def __contains__(self, item: Any) -> bool:
        """Return whether <item> is in this BST."""
        if self.is_empty():
            return False
        elif item == self._root:
            return True
        elif item < self._root:
            # By the BST property, <item> can only be in the left subtree.
            return item in self._left   # or, self._left.__contains__(item)
        else:
            # Likewise, <item> can only be in the right subtree.
            return item in self._right  # or, self._right.__contains__(item)
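
As a quick check, here is how searching might look in the Python console. This is a sketch: it assumes the insert method from the previous section, and it assumes that passing None to the BinarySearchTree initializer creates an empty tree.

    >>> bst = BinarySearchTree(None)   # assumption: None gives an empty BST
    >>> for item in [30, 10, 50, 20]:
    ...     bst.insert(item)
    >>> 20 in bst    # 20 < 30, go left; 20 > 10, go right; found
    True
    >>> 40 in bst    # 40 > 30, go right; 40 < 50, go left; empty subtree
    False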

Each recursive call goes down one level into the tree, so the maximum number of recursive calls that can be made when we perform a search is equal to the height of the tree plus 1; the extra call comes from our implementation recursing into an empty subtree of a leaf when the item isn't found.

Since each line of code inside __contains__ except the recursive call runs in constant time (i.e., doesn’t depend on the size of the tree), the total running time is proportional to the number of recursive calls made.

Because we argued that the maximum number of recursive calls is roughly the height of the tree, we could say that the running time is O(h), where h is the height of the tree. However, this is only part of the story. If we "get lucky" and search for the root item of the tree, it doesn't matter how tall the tree is: we'll only ever make one comparison before returning True!
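
One way to make this concrete is to count the calls directly. The following helper is a sketch for experimentation only, not part of the BST interface: it mirrors __contains__, reaches into the private attributes _root, _left, and _right from the implementation above, and returns the total number of calls (the initial one plus all recursive ones) instead of a boolean.

    def count_search_calls(bst: BinarySearchTree, item: Any) -> int:
        """Return the total number of calls made when searching for <item>
        in <bst>: the initial call plus all recursive calls.

        A hypothetical helper for experimentation only.
        """
        if bst.is_empty():
            return 1
        elif item == bst._root:
            return 1
        elif item < bst._root:
            return 1 + count_search_calls(bst._left, item)
        else:
            return 1 + count_search_calls(bst._right, item)

Searching for the root item always reports 1 call, no matter how big the tree is, while an unsuccessful search can report as many as h + 1 calls.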

Worst-case vs. best-case running time

So far in this course, we have mainly focused on how the running time of a function or method depends on the size of its inputs. However, it is very often the case that even for a fixed input size, the running time varies depending on other properties of the inputs: searching for the root item of a BST vs. searching for an item that is very deep in the BST, for example. It is incorrect to say that the time taken to search for an item in a BST is always equal to h + 1 (where h is the height of the tree); really, h + 1 is just the maximum of a set of possible running times.

We define the worst-case running time of an algorithm as a function WC(n), which maps an input size n to the maximum possible running time for all inputs of size n. What we colloquially refer to as the “worst case” for an algorithm is actually a family of inputs, one per input size, where each input is one that results in the maximum running time for its size.

For example, we could say that the “worst case” for BST __contains__ is when “the item we’re searching for causes us to recurse down to the deepest leaf in the tree, and then search one of its empty subtrees.” This is a description of not just one input of a fixed size, but rather a set of inputs that all have a property in common.
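
Here is a sketch of one such family, one input per size n: insert 1 through n in sorted order (which, as we discuss below, produces a chain of height n), then search for n + 1, which recurses past the deepest leaf into an empty subtree. It reuses count_search_calls from above; keep n modest (below Python's default recursion limit of about 1000) when trying it out.

    def worst_case_calls(n: int) -> int:
        """Return the number of calls made by the worst-case search
        described above, on a chain-shaped BST of size n.

        A hypothetical sketch for experimentation only.
        """
        bst = BinarySearchTree(None)   # assumption: None gives an empty BST
        for item in range(1, n + 1):   # sorted order -> chain of height n
            bst.insert(item)
        return count_search_calls(bst, n + 1)   # always returns n + 1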

Since the worst-case running time is a function, we can describe it using our Big-Oh notation. We can say that for BST search, the worst-case running time is O(h), where h is the height of the tree.

Similarly, the best-case running time is a function that maps input size to the minimum possible running time for that input size. We can say that the “best case” for BST search is when we search for the root item in the BST; note again that we are not limiting this description to one input size. Since __contains__ returns immediately if it verifies that the root is equal to the item we’re searching for, we can say that the best-case running time of the method is O(1), i.e., independent of the height of the tree.
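
Continuing the sketch from above, we can observe both extremes on the same chain-shaped tree:

    >>> bst = BinarySearchTree(None)   # assumption: None gives an empty BST
    >>> for item in range(1, 101):     # chain of height 100, rooted at 1
    ...     bst.insert(item)
    >>> count_search_calls(bst, 1)     # best case: 1 is the root item
    1
    >>> count_search_calls(bst, 101)   # worst case: past the deepest leaf
    101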

When defining a worst-case or best-case situation for an algorithm, don't make any assumptions about the size of the input! Students often say that "the best case is when the BST is empty", but this is incorrect, since it considers only one input size (0). Whatever description or properties you give for the "worst case" or "best case" should make sense for any input size.

Tree height and size

You might look at O(h) and recall that we said searching through an unsorted list takes O(n) time, where n is the size of the list. Since both of these expressions look linear, it might seem that BSTs are no better (in terms of Big-Oh) than unsorted lists.

This is where our choice of variables really matters. We can say that BST search is, in the worst case, proportional to the height of the BST; but remember that the height of a BST can be much smaller than its size!

In fact, if we consider a BST with n items, its height can be as large as n (in this case, the BST just looks like a list). However, it can also be as small as roughly log(n)! Why? A tree of height h can have at most 2^h - 1 items (draw some examples to convince yourself of this), so if we have n items to store, we need a height of at least log2(n + 1), which is approximately log(n).
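
We can translate this argument directly into code. This short sketch defines the two quantities from the argument above; the function names are ours, for illustration only.

    import math

    def max_items(h: int) -> int:
        """Return the maximum number of items a binary tree of height h
        can store."""
        return 2 ** h - 1

    def min_height(n: int) -> int:
        """Return the smallest height h such that max_items(h) >= n."""
        return math.ceil(math.log2(n + 1))

For example, max_items(20) is 1,048,575, so min_height(1_000_000) is just 20: a million items fit in a tree of height twenty, provided the tree is shaped well.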

So if we can guarantee that BSTs always have height roughly log(n), then in fact all three Collection operations (search, insert, delete) have a worst-case running time of O(h) = O(log n), where h is the height of the BST and n is its size.

Sorted lists support binary search, which finds items in O(log n) time in the worst case, but they are still limited by insertion and deletion near the front of the list, as we discussed earlier in the course. BSTs aren't: what matters is not where the item needs to be inserted, but rather the overall height of the tree.

Unfortunately, neither the insertion nor the deletion algorithm we have covered in this course guarantees that when we modify the tree, its height remains roughly logarithmic in its size. (One example you explored in the lab is inserting items into a BST in sorted order; the sketch below shows how badly this degrades the height.) However, in the later course CSC263, Data Structures and Analysis, you will explore more sophisticated insertion and deletion algorithms that do ensure that the height is always logarithmic, thus guaranteeing the efficiency of these operations!
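
To see this concretely, here is one last sketch. It assumes a height helper like the one below (again reaching into the private attributes, for experimentation only), and compares the heights that result from inserting the same 15 items in two different orders.

    def bst_height(bst: BinarySearchTree) -> int:
        """Return the height of <bst>.

        A hypothetical helper for experimentation only.
        """
        if bst.is_empty():
            return 0
        else:
            return 1 + max(bst_height(bst._left), bst_height(bst._right))

    # Sorted order produces a chain: the height equals the size.
    chain = BinarySearchTree(None)
    for item in range(1, 16):
        chain.insert(item)
    print(bst_height(chain))      # 15

    # A "balanced" order: 15 items fit in height 4, since 2 ** 4 - 1 == 15.
    balanced = BinarySearchTree(None)
    for item in [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]:
        balanced.insert(item)
    print(bst_height(balanced))   # 4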