6.1 Introduction to Trees

While the List abstract data type is extremely common and useful, not all data has a natural linear order. Family trees, corporate organization charts, classification schemes like “Kingdom, Phylum, etc.” and even file storage on computers all follow a hierarchical structure, in which each entity is linked to multiple entities “below” it.

In computer science, we use a tree data structure to represent this type of data. Trees are a recursive data structure, with the following definition: A tree is eitherNote the similarity between this definition and the one for nested lists.

empty, or
has a root value connected to any number of other trees, called the subtrees of the tree.

We generally draw the root at the top of the tree; the rest of the tree consists of subtrees that are attached to the root. Note that a tree can contain a root value but not have any subtrees: this occurs in a tree that contains just a single item.

Tree Terminology

Tree diagram

A tree is either empty or non-empty. Every non-empty tree has a root, which is connected to zero or more subtrees. Because subtrees are themselves trees, each one has its own subtrees. This sometimes leads to confusion! The term “subtree” is always relative to an outer tree, where each subtree is connected to the root of that outer tree. The root value of the above tree is labeled A; it is connected to three subtrees.

The size of a tree is the number of values in the tree. What’s the relationship between the size of a tree and the size of its subtrees?

A leaf is a value with no subtrees. The leaves of the above tree are labeled E, F, G, J, and I. What’s the relationship between the number of leaves of a tree and the number of leaves of its subtrees?

The height of a tree is the length of the longest path from its root to one of its leaves, counting the number of values on the path. The height of the above tree is 4. What’s the relationship between the height of a tree and the heights of its subtrees?

The children of a value are all values directly connected underneath that value. The children of A are B, C, and D. Note that the number of children of a value is equal to the number of its subtrees, but that these two concepts are quite different. The descendants of a value are its children, the children of its children, etc. This can be defined recursively as “the descendants of a value are its children, and the descendants of its children.” What’s the relationship between the number of descendants of a value and the number of descendants of its children?

Similarly, the parent of a tree value is the value immediately above and connected to it; each value in a tree has exactly one parent, except the root, which has no parent. The ancestors of a value are its parent, the parent of its parent, etc. This too can be defined recursively: “the ancestors of a value are its parent, and the ancestors of its parent.”

Note: sometimes, it will be convenient to say that descendants/ancestors of a value include the value itself; we’ll make it explicit whether to include the node or not when it comes up. Note that a value is never a child of itself, nor a parent of itself.