6.3 Mutating Trees

Now that we have some experience working with trees, let’s talk about mutating them. There are two fundamental mutating operations that we want to perform on trees: insertion and deletion. We’ll only cover deletion in this section; you’ll implement an insertion algorithm in this week’s lab.

Our goal is to implement the following method:

def delete_item(self, item: Any) -> bool:
    """Delete *one* occurrence of <item> from this tree.

    Return True if <item> was deleted, and False otherwise.
    Do not modify this tree if it does not contain <item>.
    """

We’ll start by filling in the code template, as usual. For this case, we’ll use the three-branch version, which explicitly separates the size-one case. As we work through the code for each case, draw an example tree so that you can trace what happens to it.

def delete_item(self, item: Any) -> bool:
    """Delete *one* occurrence of <item> from this tree.

    Return True if <item> was deleted, and False otherwise.
    Do not modify this tree if it does not contain <item>.
    """
    if self.is_empty():
        ...
    elif self._subtrees == []:
        ...
    else:
        ...
        for subtree in self._subtrees:
            ... subtree.delete_item(item) ...
        ...

The base cases of when this tree is empty and when it has a single value are rather straightforward to implement:

def delete_item(self, item: Any) -> bool:
    if self.is_empty():
        return False              # item is not in the tree
    elif self._subtrees == []:
        if self._root != item:    # item is not in the tree
            return False
        else:                     # resulting tree should be empty
            self._root = None
            return True
    else:
        ...
        for subtree in self._subtrees:
            ... subtree.delete_item(item) ...
        ...

In the recursive step, we’re going to first check whether the item is equal to the root; if it is, then we only need to remove the root, and if not, we need to recurse on the subtrees to look further for the item.

def delete_item(self, item: Any) -> Bool:
    if self.is_empty():
        return False              # item is not in the tree
    elif self._subtrees == []:
        if self._root != item:    # item is not in the tree
            return False
        else:                     # resulting tree should be empty
            self._root = None
            return True
    else:
        if self._root == item:
            self._delete_root()   # delete the root
            return True
        else:
            for subtree in self._subtrees:
                subtree.delete_item(item)

Deleting the root when there are subtrees is a little bit challenging, so we’ll defer that until later. We can use the common strategy of writing a call to a helper method (_delete_root) that doesn’t actually exist yet. The call will remind us to implement the helper later.

The final else branch may look done, but it has serious problems:

  1. It doesn’t return anything, violating this method’s type contract.
  2. If one of the recursive calls successfully finds and deletes the item, no further subtrees should be modified (or even need to be recursed on).

The solution to both of these problems lies in the fact that our current loop doesn’t store the value of the recursive calls anywhere. The key insight is that we should use the return value of each recursive call to determine whether an item was deleted, and whether to continue on to the next subtree:

def delete_item(self, item: Any) -> Bool:
    if self.is_empty():
        return False              # item is not in the tree
    elif self._subtrees == []:
        if self._root != item:    # item is not in the tree
            return False
        else:                     # resulting tree should be empty
            self._root = None
            return True
    else:
        if self._root == item:
            self._delete_root()   # delete the root
            return True
        else:
            for subtree in self._subtrees:
                deleted = subtree.delete_item(item)
                if deleted:
                    # One occurrence of the item was deleted, so we're done.
                    return True
                else:
                    # No item was deleted. Continue onto the next iteration.
                    # Note that this branch is unnecessary; we've only shown it
                    # to write comments.
                    pass

            # If we don't return inside the loop, the item is not deleted from
            # any of the subtrees. In this case, the item does not appear
            # in <self>.
            return False

Next, let’s deal with the one piece we deferred: implementing _delete_root. Note that all it needs to do is delete the root value of the tree, and restructure the tree so that the root value is not None. Why mustn’t we leave a None behind? Hint: Look at the representation invariants for the Tree class.

There are many, many ways of doing this. Here’s one where we just pick the rightmost subtree, and “promote” its root and subtrees by moving them up a level in the tree.

def _delete_root(self) -> None:
    """Remove the root item of this tree.

    Precondition: this tree has at least one subtree.
    """
    # Get the last subtree in this tree.
    chosen_subtree = self._subtrees.pop()

    self._root = chosen_subtree._root
    self._subtrees.extend(chosen_subtree._subtrees)

This maybe isn’t very satisfying, because while the result certainly is still a tree, it feels like we’ve changed around a lot of the structure of the original tree just to delete a single element. We encourage you to explore other ways to delete the root of a tree.

The problem of empty trees

We’re not quite done.

In our current implementation of delete_item, suppose we delete an item that is a leaf of the given tree. We’ll successfully delete that item, but the result of doing so is an empty tree—so its parent will contain an empty tree in its subtrees list! For example:

>>> t = Tree(10, [Tree(1, []), Tree(2, []), Tree(3, [])])  # A tree with leaves 1, 2, and 3
>>> t.delete_item(1)
True
>>> t.delete_item(2)
True
>>> t.delete_item(3)
True
>>> t._subtrees
[<__main__.Tree object at 0x081B4770>, <__main__.Tree object at 0x081B49F0>, <__main__.Tree object at 0x0845BB50>]
>>> t._subtrees[0].is_empty() and t._subtrees[1].is_empty() and t._subtrees[2].is_empty()
True

Our tree t now has three empty subtrees! This is certainly unexpected, and depending on how we’ve written our Tree methods, this may cause errors in our code. At the very least, these empty subtrees are taking up unnecessary space in our program, and make it slower to iterate through a subtree list.

Fixing the problem

So instead, if we detect that we deleted a leaf, we should remove the now-empty subtree from its parent’s subtree list. This actually involves only a very small code change in delete_item:

def delete_item(self, item: Any) -> Bool:
    if self.is_empty():
        return False              # item is not in the tree
    elif self._subtrees == []:
        if self._root != item:    # item is not in the tree
            return False
        else:                     # resulting tree should be empty
            self._root = None
            return True
    else:
        if self._root == item:
            self._delete_root()   # delete the root
            return True
        else:
            for subtree in self._subtrees:
                deleted = subtree.delete_item(item)
                if deleted and subtree.is_empty():
                    # The item was deleted and the subtree is now empty.
                    # We should remove the subtree from the list of subtrees.
                    # Note that mutating a list while looping through it is
                    # EXTREMELY DANGEROUS!
                    # We are only doing it because we return immediately
                    # afterwards, and so no more loop iterations occur.
                    self._subtrees.remove(subtree)
                    return True
                elif deleted:
                    # The item was deleted, and the subtree is not empty.
                    return True
                else:
                    # No item was deleted. Continue onto the next iteration.
                    # Note that this branch is unnecessary; we've only shown it
                    # to write comments.
                    pass

            # If we don't return inside the loop, the item is not deleted from
            # any of the subtrees. In this case, the item does not appear
            # in <self>.
            return False

Note that the code for removing a now-empty subtree is within a loop that iterates through the list of subtrees. In general it is extremely dangerous to remove an object from a list as you iterate through it, because this interferes with the iterations of the loop that is underway. We avoid this problem because immediately after removing the subtree, we stop the method by returning True.

Implicit assumptions are bad! Representation invariants are good!

Up to this point, you’ve probably wondered why we need a base case for an empty tree, since it seems like if we begin with a non-empty tree, our recursive calls would never reach an empty tree. But this is only true if we assume that each _subtrees list doesn’t contain any empty trees! While this may seem like a reasonable assumption, if we don’t make it explicit, there is no guarantee that this assumption will always hold for our trees.

Even though we recognized and addressed this issue in our implementation of delete_item, this is not entirely satisfying—what about other mutating methods? Rather than having to always remember to worry about removing empty subtrees, we can make this assumption explicit as a representation invariant for our Tree class:

class Tree:
    # === Private Attributes ===
    _root: Optional[Any]
    _subtrees: List[Tree]

    # === Representation Invariants ===
    # - If self._root is None then self._subtrees is an empty list.
    #   This setting of attributes represents an empty tree.
    #
    #   Note: self._subtrees may be empty when self._root is not None.
    #   This setting of attributes represents a tree consisting of just one value.
    #
    # - (NEW) self._subtrees does not contain any empty trees.

With this representation invariant written down, future people working on the Tree class won’t have to remember a special rule about empty subtrees—instead, they’ll just need to remember to consult the class’ representation invariants.

Exercises

  1. Currently, the size-one case in delete_item is not redundant; however, it is possible to modify _delete_root so that this case and the recursive step can be merged, by allowing _delete_root to take a non-empty tree that has no subtrees. Modify the current implementations of _delete_root and delete_item to achieve this.
  2. Write a method delete_item_all that deletes every occurrence of the given item from a tree. Think carefully about the order in which you check the root vs. other subtrees here.