6.5 Binary Search Tree Implementation and Search

Our implementation of a BinarySearchTree class is heavily based on Tree, but with a few important differences. First, because we know there are only two subtrees, and the left/right ordering matters, we use explicit attributes to refer to the left and right subtrees:

class BinarySearchTree:
    """Binary Search Tree class.

    This class represents a binary tree satisfying the Binary Search Tree
    property: for every node, its value is >= all items stored in its left
    subtree, and <= all items stored in its right subtree.
    """
    # === Private Attributes ===
    # The item stored at the root of the tree, or None if the tree is empty.
    _root: Optional[Any]
    # The left subtree, or None if the tree is empty.
    _left: Optional[BinarySearchTree]
    # The right subtree, or None if the tree is empty.
    _right: Optional[BinarySearchTree]

Another difference between BinarySearchTree and Tree is in how we distinguish between empty and non-empty trees. In the Tree class, an empty tree has a _root value of None, and an empty list [] for its list of subtrees. In the BinarySearchTree class, an empty tree also has a _root value of None, but its _left and _right attributes are set to None as well. Moreover, for BinarySearchTree, an empty tree is the only case where any of the attributes can be None; when we represent a non-empty tree, we do so by storing the root item (which isn’t None) at the root, and storing BinarySearchTree objects in the _left and _right attributes. The attributes _left and _right might refer to empty binary search trees, but this is different from them being None!

Any method we add to the BinarySearchTree class (a) can rely upon these properties, and (b) must maintain these properties, since the other methods rely upon them. This is so important that we document them in our representation invariants, along with the BST property itself.

    # === Representation Invariants ===
    #  - If _root is None, then so are _left and _right.
    #    This represents an empty BST.
    #  - If _root is not None, then _left and _right are BinarySearchTrees.
    #  - (BST Property) All items in _left are <= _root,
    #    and all items in _right are >= _root.

Here are the initializer and is_empty methods for this class, which is based on the corresponding methods for the Tree class:

    def __init__(self, root: Optional[Any]) -> None:
        """Initialize a new BST containing only the given root value.

        If <root> is None, initialize an empty BST.
        """
        if root is None:
            self._root = None
            self._left = None
            self._right = None
        else:
            self._root = root
            self._left = BinarySearchTree(None)   # self._left is an empty BST
            self._right = BinarySearchTree(None)  # self._right is an empty BST

    def is_empty(self) -> bool:
        """Return whether this BST is empty.
        """
        return self._root is None

Note that we do not allow client code to pass in left and right subtrees as parameters to the initializer. This is because binary search trees have a much stronger restriction on where values can be located in the tree, and so a separate method is used to insert new values into the tree that will ensure the BST property is always satisfied.

But before we get to the BST mutating methods (inserting and deleting items), we’ll first study the most important BST non-mutating method: searching for an item.

Searching a binary search tree

Recall that the key insight of the binary search algorithm in a sorted list is that by comparing the target item with the middle of the list, we can immediately cut in half the remaining items to be searched. An analogous idea holds for BSTs.

For general trees, the standard search algorithm is to compare the item against the root, and then search in each of the subtrees until either the item is found, or all the subtrees have been searched. When item is not in the tree, every item must be searched.

In stark contrast, for BSTs the initial comparison to the root tells you which subtree you need to check. That is, only one recursive call needs to be made, rather than two!

    def __contains__(self, item: Any) -> bool:
        """Return whether <item> is in this BST.
        """
        if self.is_empty():
            return False
        else:
            if item == self._root:
                return True
            elif item < self._root:
                return item in self._left   # or, self._left.__contains__(item)
            else:
                return item in self._right  # or, self._right.__contains__(item)

While this code structure closely matches the empty-check for the general Tree class, we can also combine the two levels of nested ifs to get a slightly more concise version:

    def __contains__(self, item: Any) -> bool:
        """Return whether <item> is in this BST.
        """
        if self.is_empty():
            return False
        elif item == self._root:
            return True
        elif item < self._root:
            return item in self._left   # or, self._left.__contains__(item)
        else:
            return item in self._right  # or, self._right.__contains__(item)