6.2 A Tree Implementation

Here is a simple implementation of a tree in Python. As usual, we’ll start with a very bare-bones implementation, and then develop more and more methods for this class throughout the course.

class Tree:
    """A recursive tree data structure.
    """
    # === Private Attributes ===
    # The item stored at this tree's root, or None if the tree is empty.
    _root: Optional[Any]
    # The list of all subtrees of this tree.
    _subtrees: List[Tree]

    # === Representation Invariants ===
    # - If self._root is None then self._subtrees is an empty list.
    #   This setting of attributes represents an empty tree.
    #
    #   Note: self._subtrees may be empty when self._root is not None.
    #   This setting of attributes represents a tree consisting of just one
    #   node.

    def __init__(self, root: Optional[Any], subtrees: List[Tree]) -> None:
        """Initialize a new tree with the given root value and subtrees.

        If <root> is None, this tree is empty.
        Precondition: if <root> is None, then <subtrees> is empty.
        """
        self._root = root
        self._subtrees = subtrees

    def is_empty(self) -> bool:
        """Return whether this tree is empty.

        >>> t1 = Tree(None, [])
        >>> t1.is_empty()
        True
        >>> t2 = Tree(3, [])
        >>> t2.is_empty()
        False
        """
        return self._root is None

Our initializer here always creates either an empty tree (when root is None), or a tree with a value and the given subtrees. Note that it is possible for root to not be None, but subtrees to still be empty: this represents a tree with a single root value, and no subtrees. As we’ll soon see, the empty tree and single value cases are generally the base cases when writing recursive code that operates on trees.

Recursion on trees

There’s a reason we keep asking the same question “What’s the relationship between a tree’s X and the X of its subtrees?” Understanding the relationship between a tree and its subtrees—that is, its recursive structure—allows us to write extremely simple and elegant recursive code for processing trees, just as it did with nested lists earlier in the course.

Here’s a first example: “the size of a non-empty tree is the sum of the sizes of its subtrees, plus 1 for the root; the size of an empty tree is 0.” This single observation immediately lets us write the following recursive function for computing the size of a tree. Again, note the similarity to nested lists. This will be a consistent refrain throughout this section.

def __len__(self) -> int:
    """Return the number of items contained in this tree.

    >>> t1 = Tree(None, [])
    >>> len(t1)
    0
    >>> t2 = Tree(3, [Tree(4, []), Tree(1, [])])
    >>> len(t2)
    3
    """
    if self.is_empty():
        return 0
    else:
        size = 1  # count the root
        for subtree in self._subtrees:
            size += subtree.__len__()  # could also do len(subtree) here
        return size

We can generalize this nicely to a template for recursive methods on trees:

def f(self) -> ...:
    if self.is_empty():
        ...
    else:
        ...
        for subtree in self._subtrees:
            ... subtree.f() ...
        ...

Of course, often the ellipses will contain some reference to self._root as well!

An explicit size-one case

Often when first dealing with trees, students like to think explicitly about the case where the tree consists of just a single item. We can modify our __len__ implementation to handle this case separately by adding an extra check:

def __len__(self):
    if self.is_empty():         # tree is empty
        return 0
    elif self._subtrees == []:  # tree is a single item
        return 1
    else:                       # tree has at least one subtree
        size = 1  # count the root
        for subtree in self._subtrees:
            size += subtree.__len__()
        return size

Sometimes, this check will be necessary: we’ll want to do something different for a tree with a single item than for either an empty tree or one with at least one subtree. And sometimes, this check will be redundant: the action performed by this case is already handled by the recursive step.

In the case of __len__, the latter situation applies. The single-item case is already correctly handled by the recursive step, which will simply return 1 when there are no subtrees, because the loop does not execute.

However, the possibility of having a redundant case shouldn’t discourage you from starting off by including this case. Treat the detection and coalescing of redundant cases as part of the code editing process. Your first draft might have some extra code, but that can be removed once you are confident that your implementation is correct. For your reference, here is the three-case recursive Tree code template:

def f(self) -> ...:
    if self.is_empty():         # tree is empty
        ...
    elif self._subtrees == []:  # tree is a single value
        ...
    else:                       # tree has at least one subtree
        ...
        for subtree in self._subtrees:
            ... subtree.f() ...
        ...

Traversing a tree

Because the elements of a list have a natural order, lists are pretty straightforward to traverse, meaning (among other things) that it’s easy to write a __str__ method that will produce a str containing all of the elements. With trees, there is a non-linear ordering on the elements. How might we write a __str__ method for trees?

Here’s an idea: start with the value of the root, then recursively add on the __str__ for each of the subtrees. That’s pretty easy to implement. The base case is when the tree is empty, and in this case the method returns an empty string.

def __str__(self) -> str:
    """Return a string representation of this tree.
    """
    if self.is_empty():
        return ''
    else:
        # We use newlines (\n) to separate the different values.
        s = f'{self._root}\n'
        for subtree in self._subtrees:
            s += str(subtree)  # equivalent to subtree.__str__()
        return s

Consider what happens when we run this on the following tree structure:

>>> t1 = Tree(1, [])
>>> t2 = Tree(2, [])
>>> t3 = Tree(3, [])
>>> t4 = Tree(4, [t1, t2, t3])
>>> t5 = Tree(5, [])
>>> t6 = Tree(6, [t4, t5])
>>> print(t6)
6
4
1
2
3
5

We know that 6 is the root of the tree, but it’s ambiguous how many children it has. In other words, while the items in the tree are correctly included, we lose the structure of the tree itself.

Drawing inspiration from how PyCharm (among many other programs) display the folder structure of our computer’s files, we’re going to use indentation to differentiate between the different levels of a tree. For our example tree, we want __str__ to produce this:

>>> # (The same t6 as defined above.)
>>> print(t6)
6
  4
    1
    2
    3
  5

In other words, we want __str__ to return a string that has 0 indents before the root value, 1 indent before its children’s values, 2 indents before their children’s values, and so on. But how do we do this? We need the recursive calls to act differently—to return strings with more indentation the deeper down in the tree they are working. In other words, we want information from where a method is called to influence what happens inside the method. This is exactly the problem that parameters are meant to solve!

So we’ll pass in an extra parameter for the depth of the current tree, which will be used to add a corresponding number of indents before each value in the str that is returned. We can’t change the API of the __str__ method itself, but we can define a helper method that has this extra parameter:

def _str_indented(self, depth: int) -> str:
    """Return an indented string representation of this tree.

    The indentation level is specified by the <depth> parameter.
    """
    if self.is_empty():
        return ''
    else:
        s = '  ' * depth + str(self._root) + '\n'
        for subtree in self._subtrees:
            # Note that the 'depth' argument to the recursive call is
            # modified.
            s += subtree._str_indented(depth + 1)
        return s

Now we can implement __str__ simply by making a call to _str_indented:

def __str__(self) -> str:
    """Return a string representation of this tree.
    """
    return self._str_indented(0)

>>> t1 = Tree(1, [])
>>> t2 = Tree(2, [])
>>> t3 = Tree(3, [])
>>> t4 = Tree(4, [t1, t2, t3])
>>> t5 = Tree(5, [])
>>> t6 = Tree(6, [t4, t5])
6
  4
    1
    2
    3
  5

Technical note: optional parameters

One way to customize the behaviour of functions is to make a parameter optional by giving it a default value. This can be done for any function, recursive or non-recursive, inside or outside a class. The syntax for doing so is quite simple; we use it in this revised version of _str_indented to give a default value for depth:

def _str_indented(self, depth: int=0) -> str:
    """Return an indented string representation of this tree.

    The indentation level is specified by the <depth> parameter.
    """
    if self.is_empty():
        return ''
    else:
        s = '  ' * depth + str(self._root) + '\n'
        for subtree in self._subtrees:
            # Note that the 'depth' argument to the recursive call is
            # modified.
            s += subtree._str_indented(depth + 1)
        return s

In this version of _str_indented, depth is an optional parameter that can either be included or not included when this method is called.

So we can call t._str_indented(5), which sets its depth parameter to 5, as we would expect. However, we can also call t._str_indented() (no argument for depth given), in which case the method is called with the depth parameter set to 0.

Optional parameters are a powerful Python feature that allows us to write more flexible functions and methods to be used in a variety of situations. Two important points to keep in mind, though:

All optional parameters must appear after all of the required parameters in the function header.
Do NOT use mutable values like lists for your optional parameters. (If you do, the code will appear to work, until it mysteriously doesn’t. Feel free to ask more about this during office hours.) Instead, use optional parameters with immutable values like integers, strings, and None.