\( \newcommand{\NOT}{\neg} \newcommand{\AND}{\wedge} \newcommand{\OR}{\vee} \newcommand{\XOR}{\oplus} \newcommand{\IMP}{\Rightarrow} \newcommand{\IFF}{\Leftrightarrow} \newcommand{\TRUE}{\text{True}\xspace} \newcommand{\FALSE}{\text{False}\xspace} \newcommand{\IN}{\,{\in}\,} \newcommand{\NOTIN}{\,{\notin}\,} \newcommand{\TO}{\rightarrow} \newcommand{\DIV}{\mid} \newcommand{\NDIV}{\nmid} \newcommand{\MOD}[1]{\pmod{#1}} \newcommand{\MODS}[1]{\ (\text{mod}\ #1)} \newcommand{\N}{\mathbb N} \newcommand{\Z}{\mathbb Z} \newcommand{\Q}{\mathbb Q} \newcommand{\R}{\mathbb R} \newcommand{\C}{\mathbb C} \newcommand{\cA}{\mathcal A} \newcommand{\cB}{\mathcal B} \newcommand{\cC}{\mathcal C} \newcommand{\cD}{\mathcal D} \newcommand{\cE}{\mathcal E} \newcommand{\cF}{\mathcal F} \newcommand{\cG}{\mathcal G} \newcommand{\cH}{\mathcal H} \newcommand{\cI}{\mathcal I} \newcommand{\cJ}{\mathcal J} \newcommand{\cL}{\mathcal L} \newcommand{\cK}{\mathcal K} \newcommand{\cN}{\mathcal N} \newcommand{\cO}{\mathcal O} \newcommand{\cP}{\mathcal P} \newcommand{\cQ}{\mathcal Q} \newcommand{\cS}{\mathcal S} \newcommand{\cT}{\mathcal T} \newcommand{\cV}{\mathcal V} \newcommand{\cW}{\mathcal W} \newcommand{\cZ}{\mathcal Z} \newcommand{\emp}{\emptyset} \newcommand{\bs}{\backslash} \newcommand{\floor}[1]{\left \lfloor #1 \right \rfloor} \newcommand{\ceil}[1]{\left \lceil #1 \right \rceil} \newcommand{\abs}[1]{\left | #1 \right |} \newcommand{\xspace}{} \newcommand{\proofheader}[1]{\underline{\textbf{#1}}} \)

13.1 Introduction to Linked Lists

Back in Section 9.7, we learned that Python lists are an array-based implementation of the List abstract data type. This means they have some efficiency drawbacks: inserting and deleting items in a Python list can require shifting many elements in computer memory. In the extreme case, inserting and deleting at the front of a Python list takes \(\Theta(n)\) time, where \(n\) is the length of the list, because every item in the list needs to be shifted.

In this chapter, we’re going to study a different implementation of the List ADT that will attempt to address this efficiency shortcoming. To do so, we’ll use a new data structure called the linked list. Our goal will be to create a new Python class that behaves exactly the same as the built-in list class, changing only the private implementation of the class. This will mean that, ultimately, code such as this:

for i in range(0, n):
    nums.append(i)

print(nums)

will work whether nums is a Python list or an instance of the class we are going to write. We’ll even learn how to make list indexing operations such as nums[0] and nums[3] = 'spider' work on instances of our class!

The reason why Python list operations often require elements to be shifted back and forth is that the (ids of) the list’s elements are stored in contiguous locations in computer memory. What if we didn’t attempt to have this contiguity? If we had a variable referring to the first element of a list, how would we know where the rest of the elements were? We can solve this if we store along with each element a reference to the next element in the list.

This bundling of data—an element plus a reference to the next element–should suggest something familiar to you: the need for a new data type whose instance attributes are exactly these pieces of data. We’ll call this data type a node, and implement it in Python as follows: We use a preceding underscore for the class name to indicate that this entire class is private: it shouldn’t be accessed by client code directly, but instead is only used by the “main” class described later on in this section.

from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class _Node:
    """A node in a linked list.

    Instance Attributes:
      - item: The data stored in this node.
      - next: The next node in the list, if any.
    """
    item: Any
    next: Optional[_Node] = None  # By default, this node does not link to any other node

An instance of _Node represents a single element of a list; to represent a list of \(n\) elements, we need \(n\) _Node instances. The references in all of their next attributes link the nodes together into a sequence, even though they are not stored in consecutive locations in memory. It is these references, or “links”, which give linked lists their name.

For example, here are three _Node objects that could represent the sequences of numbers 111, -5, 9000.

Memory model diagram of three nodes

Given a bunch of _Node objects, we can follow each next attribute to recover the sequence of items that these nodes represent. But how do we know where the sequence starts? Or put another way, in our above example how do we know there isn’t another _Node object in memory whose next attribute refers to id3?

Aside: what is __future__?

You might have wondered about the top-level import statement in the above code, from __future__ import annotations. This is a technical import for Python that allows us to refer to a class in a type annotation within the body of that class. This is necessary in our case because the _Node class has an attribute, next, whose type annotation includes _Node.

The LinkedList class

The second class we’ll use in our list implementation is LinkedList, which will represent the list itself. This class is the one we want client code to use, and in it we’ll implement methods that obey the same interface as the built-in list class. Our LinkedList class has a private attribute _first that refers to the first node in the list—this is how we know where the list starts, in answer to our question above.

Our first version of this class has a very primitive initializer that always creates an empty list.

class LinkedList:
    """A linked list implementation of the List ADT.
    """
    # Private Instance Attributes:
    #   - _first: The first node in this linked list, or None if this list is empty.
    _first: Optional[_Node]

    def __init__(self) -> None:
        """Initialize an empty linked list.
        """
        self._first = None

Of course, in order to do anything interesting with linked lists, we need to be able to create arbitrarily long linked lists! We’ll see more sophisticated ways of doing this in subsequent sections, but for practice here we’ll violate privacy concerns and just manipulate the private attributes directly.

>>> linky = LinkedList()  # linky is an empty linked list
>>> linky._first is None
True
>>> node1 = _Node(111)   # New node with item 111
>>> node2 = _Node(-5)    # New node with item -5
>>> node3 = _Node(9000)  # New node with item 900
>>> node1.item
111
>>> node1.next is None   # By default, new nodes do not link to another node
True
>>> node1.next = node2   # Let's set some links
>>> node2.next = node3
>>> node1.next is node2  # Now node1 links to node2!
True
>>> node1.next.item
-5
>>> node1.next.next is node3
True
>>> node1.next.next.item
9000
>>> linky._first = node1   # Finally, set linky's first node to node1
>>> linky._first.item      # linky now represents the list [111, -5, 9000]
111
>>> linky._first.next.item
-5
>>> linky._first.next.next.item
9000

Warning: The most common mistake students make when first starting out with linked lists is confusing an individual _Node object with the item it stores. So in the example above, there’s a big difference between node3 and node3.item: the former is a _Node object containing the value 111, while the latter is the int value 111 itself!

>>> node3
_Node(item=9000, next=None)
>>> node3.item
9000

As you start writing code with linked lists, you’ll sometimes want to operate on nodes, and sometimes want to operate on items. Making sure you always know exactly which type you’re working with is vital to your success.

Linked list diagrams

The following is a diagram illustrating our above linked list example, with a variable linky referring to a LinkedList object representing the sequence 111, -5, 9000. (We’ve omitted the other variables node1, node2, and node3.)

Memory model diagram of a linked list

Because each element of a linked list is wrapped in a _Node object, complete memory model diagrams of linked lists are quite a bit larger than those corresponding to Python’s array-based lists. While memory model diagrams are always a useful tool for understanding subtle memory errors—which certainly come up with linked lists!—they can be overkill if you want a quick and dirty linked list diagram. So below we show a stripped down version of this memory model diagram, writing the values of each item within the node boxes and using arrows to represent next attribute references. Note that we use a separate arrow from a smaller box representing the _first attribute of a LinkedList.

Abstract memory model diagram of a linked list