Build the Forest in Python Series: AVL Tree

Use Python to build an AVL tree.
In this article, we look at how the AVL tree introduces some complexity in its insertion and deletion operations to keep itself balanced, and how that self-balancing ability gives the basic operations O(lg n) time complexity, better performance than the regular binary search tree.

Introduction

After the Red-Black Tree discussion, this article will implement another variant of the self-balancing binary search tree: the AVL Tree.

Project Setup

Following the same style and assumptions as the other articles in the Build the Forest series, the implementation assumes Python 3.9 or newer. This article adds two modules to our project: avl_tree.py for the AVL tree implementation and test_avl_tree.py for its unit tests. After adding these two files, our project layout becomes the following:

forest-python
├── forest
│   ├── __init__.py
│   ├── binary_trees
│   │   ├── __init__.py
│   │   ├── avl_tree.py
│   │   ├── binary_search_tree.py
│   │   ├── double_threaded_binary_tree.py
│   │   ├── red_black_tree.py
│   │   ├── single_threaded_binary_trees.py
│   │   └── traversal.py
│   └── tree_exceptions.py
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── test_avl_tree.py
    ├── test_binary_search_tree.py
    ├── test_double_threaded_binary_tree.py
    ├── test_red_black_tree.py
    ├── test_single_threaded_binary_trees.py
    └── test_traversal.py

(The complete code is available at forest-python.)

What is an AVL Tree?

An AVL tree (named after the inventors Adelson-Velsky and Landis) is a self-balancing binary search tree. In addition to the binary-search-tree-property, an AVL tree maintains the AVL-tree-property to keep it balanced:

  • For every node in an AVL tree, the heights of its left subtree and its right subtree differ by at most one.

The property is also expressed as a balance factor, i.e., balance_factor(node) = height(node.left) - height(node.right), and can be written as the following formula:

Image 1

If a node’s balance factor > 0, we call it left-heavy. If a node’s balance factor < 0, we call it right-heavy. If a node’s balance factor = 0, it is called balanced.

(Note that some people define the balance factor as the height of its right subtree – the height of its left subtree. In this case, left-heavy becomes a node’s balance factor < 0, whereas right-heavy happens when a node’s balance factor > 0. However, no matter which definition is used, the concept of the AVL tree is the same.)

A typical AVL tree can be visualized in the picture below:

Image 2

In the picture above, BF indicates a node's balance factor, and the numbers under the nodes are the nodes' heights. If a node is empty, i.e., None, its height is -1. This convention makes it easier to calculate nodes' balance factors.

Build AVL Tree

This section will walk through the implementation of the AVL tree with some thoughts behind the implementation choices.

Node

Instead of computing a node’s height every time we need it, we store the height in each node. Therefore, the node’s structure has one more field than the binary search tree node.

Image 3

Storing the height saves computation time: we do not need to calculate the height every time we check the balance factor. However, it comes at a cost: we must keep the height up to date whenever the AVL tree is modified, such as when inserting or deleting a node. More details about the height update come in the insertion and deletion sections.

Like the other binary tree nodes, we utilize the dataclass to define the AVL tree node.

Python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    """AVL Tree node definition."""

    key: Any
    data: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None
    height: int = 0

Class Overview

The AVL tree class provides the same core functionality as the other binary trees in the Build the Forest project.

Python
class AVLTree:

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def __repr__(self) -> str:
        """Provie the tree representation to visualize its layout."""
        if self.root:
            return (
                f"{type(self)}, root={self.root}, "
                f"tree_height={str(self.get_height(self.root))}"
            )
        return "empty tree"

    def search(self, key: Any) -> Optional[Node]:
        …

    def insert(self, key: Any, data: Any) -> None:
        …

    def delete(self, key: Any) -> None:
        …

    @staticmethod
    def get_leftmost(node: Node) -> Node:
        …

    @staticmethod
    def get_rightmost(node: Node) -> Node:
        …

    @staticmethod
    def get_successor(node: Node) -> Optional[Node]:
        …

    @staticmethod
    def get_predecessor(node: Node) -> Optional[Node]:
        …

    @staticmethod
    def get_height(node: Optional[Node]) -> int:
        …

    def _get_balance_factor(self, node: Optional[Node]) -> int:
        …

    def _left_rotate(self, node_x: Node) -> None:
        …

    def _right_rotate(self, node_x: Node) -> None:
        …

    def _insert_fixup(self, new_node: Node) -> None:
        …

    def _transplant(self, deleting_node: Node, replacing_node: Optional[Node]) -> None:
        …

    def _delete_no_child(self, deleting_node: Node) -> None:
        …

    def _delete_one_child(self, deleting_node: Node) -> None:
        …

    def _delete_fixup(self, fixing_node: Node) -> None:
        …

We can implement most of the AVL tree functions exactly as in the regular binary search tree, such as search and most of the auxiliary functions. We can also use the traversal functions from Binary Tree Traversal to traverse an AVL tree.

Insert and delete are the two operations that may unbalance an AVL tree. Therefore, the AVLTree class has methods that help keep the tree balanced, including _left_rotate(), _right_rotate(), _insert_fixup(), and _delete_fixup(). Since these helper methods exist mainly to keep an AVL tree balanced, we define them as private functions hidden from the client code.

Rotations

The method to restore the violated AVL-tree-property after insertion or deletion is rotation (similar to the Red-Black Tree: Rotations).

The picture below demonstrates the four cases in which the AVL-tree-property can be broken: left-left, left-right, right-left, and right-right.

Image 4

When we deal with an unbalanced AVL tree, we always start from the bottommost unbalanced node (i.e., a node whose BF > 1 or BF < -1). For example, node 23 in the picture above is the bottommost unbalanced node. We then check the balance factor of its child with the higher height. If the bottommost unbalanced node is left-heavy (i.e., BF > 0) and its child with the higher height is also left-heavy, it is the left-left case (the leftmost case in the picture above). If the child with the higher height is right-heavy (i.e., BF < 0), it is the left-right case (the second leftmost case in the picture above). The right-left and right-right cases in the picture are symmetric to the left-right and left-left cases, respectively.

The following subsections show how rotations rebalance the imbalance cases.

Right Rotation (Left-Left Case)

For the left-left case, we perform the right rotation on the bottommost unbalanced node (node 23 in this case). After the rotation, node 17 becomes the parent of node 23, and the heights and balance factors of both node 17 and node 23 change.

Image 5

Left-Right Rotation (Left-Right Case)

If the bottommost unbalanced node is left-heavy but its child with the higher height is right-heavy, we perform a left rotation first on the child (node 17 in this case) and then a right rotation on the bottommost unbalanced node (node 23).

Image 6

Note that only the nodes involved in a rotation change their heights and balance factors. For example, after the left rotation, node 17 and node 21 change their height and balance factor (highlighted in yellow). After the right rotation, only node 21 and node 23 change their height and balance factor (highlighted in green).

Right-Left Rotation (Right-Left Case)

The right-left case is symmetric to the left-right case. Therefore, we perform the right rotation first and the left rotation after.

Image 7

Left Rotation (Right-Right Case)

The right-right case is symmetric to the left-left case, so we can perform a left rotation to rebalance it.

Image 8
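
The class overview lists _left_rotate() and _right_rotate() without showing their bodies. Below is a minimal sketch of _left_rotate: the standard binary-search-tree left rotation plus the height updates described above. _right_rotate is symmetric (swap every left and right). The ValueError guard is an illustrative assumption; during a valid fixup, the rotated node always has the required child.

Python
def _left_rotate(self, node_x: Node) -> None:
    node_y = node_x.right
    if node_y is None:
        # Illustrative guard: a valid left rotation always has a right child.
        raise ValueError("Left rotation requires a right child.")

    # Turn node_y's left subtree into node_x's right subtree.
    node_x.right = node_y.left
    if node_y.left:
        node_y.left.parent = node_x

    # Link node_x's parent to node_y.
    node_y.parent = node_x.parent
    if node_x.parent is None:
        self.root = node_y
    elif node_x == node_x.parent.left:
        node_x.parent.left = node_y
    else:
        node_x.parent.right = node_y

    # Put node_x on node_y's left.
    node_y.left = node_x
    node_x.parent = node_y

    # Only the rotated nodes change height; update bottom-up (node_x first).
    node_x.height = 1 + max(
        self.get_height(node_x.left), self.get_height(node_x.right)
    )
    node_y.height = 1 + max(
        self.get_height(node_y.left), self.get_height(node_y.right)
    )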

Summary

The table below summarizes the unbalanced cases and their solutions.

Image 9

Insert

Insertion into the AVL tree has one substantial side effect: it may update heights. When we insert a node into an AVL tree, the new node may change the tree's height and thereby violate the AVL-tree-property. When that happens, we perform specific rotations to rebalance the tree.

Height Update

Before we implement the insert function, we need to understand how insertion changes heights. First, note that the new node must become a leaf node after the insertion, so its height is 0 and its balance factor is also 0. Moreover, if the new node's parent already had a child before the insertion, the heights of the parent and of the overall tree remain the same: no height changes, no AVL-tree-property violation.

Image 10

Second, the height change only happens when the new node's parent did not have a child before the insertion. In this case, we update the heights of the new node's ancestors all the way up to the root (as in the picture below); only the new node's ancestors get a height update. A height change signals that an AVL-tree-property violation may have occurred.

Image 11

(The picture above does not violate the AVL-tree-property. However, we handle the case where an AVL tree becomes imbalanced after the insertion in the following sections.)

The insert algorithm is modified from the regular binary search tree insert.  

  1. Insert the new node with height 0 in the same way as the binary search tree insertion: find the proper location (i.e., the parent of the new node) by walking through the tree from the root and comparing the new node's key with each node's key along the way.
  2. Update the heights and check for an AVL-tree-property violation by tracing back from the new node to the root, updating each node's height along the way if necessary. If we find an unbalanced node, perform the appropriate rotation(s) to rebalance it; after the rotation, the insertion concludes. If no unbalanced node is found, the insertion completes after reaching the root and updating its height.
Python
def insert(self, key: Any, data: Any) -> None:
    new_node = Node(key=key, data=data)
    parent: Optional[Node] = None
    current: Optional[Node] = self.root
    while current:
        parent = current
        if new_node.key < current.key:
            current = current.left
        elif new_node.key > current.key:
            current = current.right
        else:
            raise tree_exceptions.DuplicateKeyError(key=new_node.key)
    new_node.parent = parent
    # If the tree is empty, set the new node to be the root.
    if parent is None:
        self.root = new_node
    else:
        if new_node.key < parent.key:
            parent.left = new_node
        else:
            parent.right = new_node

        # After the insertion, fix any broken AVL-tree-property.
        # If the parent has two children after inserting the new node,
        # it means the parent had one child before the insertion.
        # In that case, no height changes, so the AVL-tree-property
        # cannot be broken and no fixup is required.
        if not (parent.left and parent.right):
            self._insert_fixup(new_node)

Fixup

As the rotations section mentioned, there are four potential unbalanced cases, and we perform specific rotations to restore the AVL-tree-property. Once the AVL-tree-property is fixed, the AVL tree is balanced again, and there is no need to keep tracking the ancestors' balance factors. The following subsections describe the fixup for each case.

Left-Left Case

Image 12

In the picture above, we add node 19. After the insertion, we check the balance factor of node 19's ancestors, starting from node 17 and going up, and find that node 37 is unbalanced and left-heavy. We also need to check its child with the higher height to determine which rotation to perform. In the insertion case, the child with the higher height must be on the path that contains the new node, i.e., node 23 in this case, since node 23's balance factor is 1 after the insertion. Hence, we identify this as a left-left case.

So we perform the right rotation on node 37. After the right rotation, node 23 becomes the new root, and node 37 becomes node 23's right child. Note that only the nodes involved in the rotation change their heights and balance factors: in this case, only node 23 and node 37.

Why Is There No Need to Keep Checking the Ancestors' Balance Factors After the Rotation?

The height of node 37 was 2 and node 23's height was 1 before the insertion. After the insertion, node 37's height became 3 and node 23's height became 2. Then we performed the rotation: node 23 took node 37's original position, node 23's height became 2, and node 37's height became 1. Therefore, the height at that position remains what it was before the insertion, i.e., the root (node 37) had height 2 before the insertion, and the root (node 23) still has height 2 after the rotation. In other words, if node 37 had a parent (say x) before the rotation, then after the rotation x's left child becomes node 23, but x's height is not affected. So, we do not need to keep checking x's ancestors' heights after the rotation.

Left-Right Case

Image 13

After we identify that this is a left-right case, by checking the balance factors of the bottommost unbalanced node (node 37) and of its child with the higher height (node 23), we perform the left rotation on node 23 so that it becomes a left-left case. Then, we perform the right rotation on the unbalanced node (node 37). After that, the violated AVL-tree-property is restored.

Right-Left Case

Image 14

This case is symmetric to the left-right case, so we perform the right rotation on the unbalanced node's (node 37) right child (node 43) so that it becomes the right-right case. Then, we perform the left rotation on the unbalanced node (node 37). After that, the violated AVL-tree-property is fixed.

Right-Right Case

Image 15

The right-right case is symmetric to the left-left case, so we can restore its AVL-tree property by performing the left rotation.

After the fixup analysis, we can implement the _insert_fixup function as follows. Note that we always update a node's height while walking up the tree and before any rotation.

Python
def _insert_fixup(self, new_node: Node) -> None:
    parent = new_node.parent

    while parent:
        parent.height = 1 + max(
            self.get_height(parent.left), self.get_height(parent.right)
        )
        grandparent = parent.parent
        # grandparent is unbalanced
        if grandparent:
            if self._get_balance_factor(grandparent) > 1:
                # Case Left-Left
                if self._get_balance_factor(parent) >= 0:
                    self._right_rotate(grandparent)
                # Case Left-Right
                elif self._get_balance_factor(parent) < 0:
                    self._left_rotate(parent)
                    self._right_rotate(grandparent)
                # Since the fixup does not affect the ancestor of the unbalanced
                # node, exit the loop to complete the fixup process.
                break
            elif self._get_balance_factor(grandparent) < -1:
                # Case Right-Right
                if self._get_balance_factor(parent) <= 0:
                    self._left_rotate(grandparent)
                # Case Right-Left
                elif self._get_balance_factor(parent) > 0:
                    self._right_rotate(parent)
                    self._left_rotate(grandparent)
                # Since the fixup does not affect the ancestor of the unbalanced
                # node, exit the loop to complete the fixup process.
                break
        parent = parent.parent

The search function is identical to the Binary Search Tree: Search.
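
For completeness, a minimal sketch of search is shown below; it is the standard iterative binary-search-tree lookup and returns None when the key is not found.

Python
def search(self, key: Any) -> Optional[Node]:
    """Look for a node by the given key; return None if the key is absent."""
    current = self.root
    while current:
        if key < current.key:
            current = current.left
        elif key > current.key:
            current = current.right
        else:  # key == current.key
            return current
    return None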

Delete

The basic idea of AVL tree deletion is similar to the regular binary search tree. There are three cases for the node to be deleted: no child, only one child, and two children. We also use the same transplant method to replace the subtree rooted at the deleting_node with the subtree rooted at the replacing_node.

Transplant

The transplant method is identical to the Binary Search Tree: Delete.

Python
def _transplant(self, deleting_node: Node, replacing_node: Optional[Node]) -> None:
    if deleting_node.parent is None:
        self.root = replacing_node
    elif deleting_node == deleting_node.parent.left:
        deleting_node.parent.left = replacing_node
    else:
        deleting_node.parent.right = replacing_node

    if replacing_node:
        replacing_node.parent = deleting_node.parent

Similar to the insertion, the AVL tree deletion may update the tree height and potentially violate the AVL-tree-property, so we need to check if the AVL tree becomes unbalanced and fix it after the deletion.

The overall delete procedure is like a regular binary search tree with some modifications.

  1. Find the node to be deleted (deleting_node).
  2. If the deleting_node has no child, use the transplant method to replace the deleting_node with None. Then, perform the fixup operation.
  3. If the deleting_node has only one child, use the transplant method to replace the deleting_node with its only child. Then, perform the fixup operation.
  4. If the deleting_node has two children, find the deleting_node's successor as the replacing_node. Then, copy the replacing_node's key and data into the deleting_node, so the deleting_node is replaced by the replacing_node while keeping its original balance factor and height (no height or balance factor changes means no AVL-tree-property violation). After that, delete the replacing_node, which falls into either step 2 (no child) or step 3 (only one child).

To make the implementation clear, we define the _delete_no_child method for the case where the node to be deleted has no child, and the _delete_one_child method for the case where it has only one child. If the node to be deleted has two children, we reuse _delete_no_child and _delete_one_child accordingly. Since the two-children case goes through _delete_no_child and _delete_one_child, only these two methods need to call the _delete_fixup function to fix unbalanced nodes. Thus, we can implement the delete, _delete_no_child, and _delete_one_child functions as follows:

Python
def delete(self, key: Any) -> None:
    if self.root and (deleting_node := self.search(key=key)):
        # Case: no child
        if (deleting_node.left is None) and (deleting_node.right is None):
            self._delete_no_child(deleting_node=deleting_node)
        # Case: Two children
        elif deleting_node.left and deleting_node.right:
            replacing_node = self.get_leftmost(node=deleting_node.right)
            # Replace the deleting node with the replacing node,
            # but keep the replacing node in place.
            deleting_node.key = replacing_node.key
            deleting_node.data = replacing_node.data
            if replacing_node.right:  # The replacing node cannot have a left child.
                self._delete_one_child(deleting_node=replacing_node)
            else:
                self._delete_no_child(deleting_node=replacing_node)
        # Case: one child
        else:
            self._delete_one_child(deleting_node=deleting_node)

def _delete_no_child(self, deleting_node: Node) -> None:
    parent = deleting_node.parent
    self._transplant(deleting_node=deleting_node, replacing_node=None)
    if parent:
        self._delete_fixup(fixing_node=parent)

def _delete_one_child(self, deleting_node: Node) -> None:
    parent = deleting_node.parent
    replacing_node = (
        deleting_node.right if deleting_node.right else deleting_node.left
    )
    self._transplant(deleting_node=deleting_node, replacing_node=replacing_node)
    if parent:
        self._delete_fixup(fixing_node=parent)

Fixup

We know the delete operation may change heights and violate the AVL-tree-property. As with insertion, there are four potential unbalanced cases, and we perform specific rotations to restore the AVL-tree-property. We also start the fixup procedure from the bottommost unbalanced node, which must be one of the ancestors of the deleted node. However, unlike the insertion fixup, the imbalance may propagate above the node we rotate. Therefore, after we rebalance the bottommost unbalanced node, we need to check its parent; if the parent becomes unbalanced, we fix it too. We repeat this process until we reach the root and the root is also balanced.

No Child

As mentioned in the rotations section, four cases can violate the AVL-tree-property. The following picture shows these four cases and how to restore the balance when the node to be deleted has no child.

Image 16

One Child

The following two pictures demonstrate the four cases.

The unbalanced node is left-heavy.

Image 17

The unbalanced node is right-heavy.

Image 18

Two Children

Like other binary trees' deletion, we can split the two-children case into two sub-cases: the replacing_node is the deleting_node's direct child, or the replacing_node is the leftmost node of the deleting_node's right subtree. In either case, we can reduce the two-children case to either the no-child case or the one-child case by copying the replacing_node's key and data into the deleting_node while keeping the deleting_node's original height and balance factor. By doing that, we do not change any heights or balance factors. After that, we delete the replacing_node, which becomes either the no-child case or the one-child case. The following pictures show how the two-children deletion works and how to fix the resulting imbalance.

The replacing_node is the deleting_node’s direct child.

Image 19

The replacing_node is the deleting_node’s leftmost node.

Image 20

The implementation of the complete fixup operation is as follows. As with insertion, we update each node's height while walking up the tree.

Python
def _delete_fixup(self, fixing_node: Node) -> None:
    while fixing_node:
        fixing_node.height = 1 + max(
            self.get_height(fixing_node.left), self.get_height(fixing_node.right)
        )

        if self._get_balance_factor(fixing_node) > 1:
            # Case Left-Left
            if self._get_balance_factor(fixing_node.left) >= 0:
                self._right_rotate(fixing_node)
            # Case Left-Right
            elif self._get_balance_factor(fixing_node.left) < 0:
                # The fixing node's left child cannot be empty
                self._left_rotate(fixing_node.left)
                self._right_rotate(fixing_node)
        elif self._get_balance_factor(fixing_node) < -1:
            # Case Right-Right
            if self._get_balance_factor(fixing_node.right) <= 0:
                self._left_rotate(fixing_node)
            # Case Right-Left
            elif self._get_balance_factor(fixing_node.right) > 0:
                # The fixing node's right child cannot be empty
                self._right_rotate(fixing_node.right)
                self._left_rotate(fixing_node)

        fixing_node = fixing_node.parent

Auxiliary Functions

The auxiliary functions, such as getting the leftmost node and getting the successor of a node, are identical to the Binary Search Tree: Auxiliary Functions, and the implementation is available at the GitHub repository (get_leftmost is sketched below for reference). The only auxiliary function that differs from the binary search tree is the one that gets the height.
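
For reference, get_leftmost is the auxiliary function that delete uses to locate the in-order successor. A minimal sketch matching the binary-search-tree version:

Python
@staticmethod
def get_leftmost(node: Node) -> Node:
    """Return the leftmost (minimum-key) node of the given subtree."""
    current_node = node
    while current_node.left:
        current_node = current_node.left
    return current_node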

Get the Height

Since each node stores its height, getting a node's height becomes very straightforward: just return the stored value.

Python
@staticmethod
def get_height(node: Optional[Node]) -> int:
    if node:
        return node.height
    # None has height -1
    return -1
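
The _get_balance_factor helper declared in the class overview follows directly from get_height. A minimal sketch is below; returning -1 for an empty node is an assumption that mirrors the height convention (the fixup code never depends on that value).

Python
def _get_balance_factor(self, node: Optional[Node]) -> int:
    if node:
        return self.get_height(node.left) - self.get_height(node.right)
    # Mirror the height convention for None; the fixups never rely on this.
    return -1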

Traversal

Although the AVL tree node has one more field (i.e., height) than the normal binary search tree node, we can still use the exact implementation from Binary Tree Traversal to traverse an AVL tree. The only modification we need to make is to add the AVL tree as a supported type.

Python
# Alias for the supported node types. For type checking.
SupportedNode = Union[None, binary_search_tree.Node, avl_tree.Node]

SupportedTree = Union[binary_search_tree.BinarySearchTree, avl_tree.AVLTree]
"""Alias for the supported tree types. For type checking."""

(See traversal.py for the complete source code.)

The supported type is for type checking, as we discussed at Binary Tree Traversal: Function Interface.
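
With the aliases in place, traversing an AVL tree looks identical to traversing a binary search tree. A short usage sketch (the keys are arbitrary sample values):

Python
from forest.binary_trees import avl_tree
from forest.binary_trees import traversal

tree = avl_tree.AVLTree()
for key in [23, 4, 30, 11, 7, 34, 20, 24, 22, 15, 1]:
    tree.insert(key=key, data=str(key))

# In-order traversal of a binary search tree yields keys in sorted order.
for key, data in traversal.inorder_traverse(tree=tree):
    print(key, data)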

Test

As always, we should have as many unit tests for our code as possible. Check test_avl_tree.py for the complete unit tests.
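
As a flavor of what such tests can look like, the sketch below inserts random keys and then verifies the AVL-tree-property at every node. The test name and key range are illustrative and not taken from test_avl_tree.py.

Python
import random
from typing import Optional

from forest.binary_trees import avl_tree

def test_avl_tree_remains_balanced() -> None:
    """Insert random keys; every node's balance factor must stay in [-1, 1]."""
    tree = avl_tree.AVLTree()
    for key in random.sample(range(1000), k=100):
        tree.insert(key=key, data=str(key))

    def check(node: Optional[avl_tree.Node]) -> None:
        if node is None:
            return
        balance = tree.get_height(node.left) - tree.get_height(node.right)
        assert -1 <= balance <= 1
        check(node.left)
        check(node.right)

    check(tree.root)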

Analysis

The AVL tree is a self-balancing binary search tree with height O(lg n), where n is the number of nodes (this can be proved by leveraging Fibonacci numbers; see the AVL Tree Wikipedia page for more detail). Therefore, the time complexity of AVL tree operations can be summarized in the table below:

Image 21
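
A quick way to see the O(lg n) height in practice is to insert keys in sorted order, the worst case for a plain binary search tree. A small sketch; the node count 1023 = 2^10 - 1 is chosen so the comparison is easy to read:

Python
from forest.binary_trees import avl_tree

tree = avl_tree.AVLTree()
for key in range(1023):  # Sorted input degenerates a plain BST into a list.
    tree.insert(key=key, data=str(key))

# A plain binary search tree would have height 1022 here; the AVL tree
# stays near lg n (sorted insertion of 2^10 - 1 keys yields height 9).
print(tree.get_height(tree.root))  # 9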

Example

Like the red-black tree, the AVL tree is widely used in software because of its self-balancing ability. As an example, this section uses the AVL tree we implemented here to build a key-value Map.

Python
from typing import Any, Optional

from forest.binary_trees import avl_tree
from forest.binary_trees import traversal

class Map:
    """Key-value Map implemented using AVL Tree."""

    def __init__(self) -> None:
        self._avlt = avl_tree.AVLTree()

    def __setitem__(self, key: Any, value: Any) -> None:
        """Insert (key, value) item into the map."""
        self._avlt.insert(key=key, data=value)

    def __getitem__(self, key: Any) -> Optional[Any]:
        """Get the data by the given key."""
        node = self._avlt.search(key=key)
        if node:
            return node.data
        return None

    def __delitem__(self, key: Any) -> None:
        """Remove a (key, value) pair from the map."""
        self._avlt.delete(key=key)

    def __iter__(self) -> traversal.Pairs:
        """Iterate the data in the map."""
        return traversal.inorder_traverse(tree=self._avlt)

    @property
    def empty(self) -> bool:
        """Return `True` if the map is empty; `False` otherwise."""
        return self._avlt.empty

if __name__ == "__main__":

    # Initialize the Map instance.
    contacts = Map()

    # Add some items.
    contacts["Mark"] = "mark@email.com"
    contacts["John"] = "john@email.com"
    contacts["Luke"] = "luke@email.com"

    # Retrieve an email
    print(contacts["Mark"])

    # Delete one item.
    del contacts["John"]

    # Check the deleted item.
    print(contacts["John"])  # This will print None

    # Iterate the items.
    for contact in contacts:
        print(contact)

(The complete example is available at avlt_map.py.)

Summary

Like the red-black tree, the AVL tree introduces some complexity in its insertion and deletion operations to keep its balance, but its self-balancing ability provides O(lg n) time complexity for the basic operations, better performance than the regular binary search tree.

(Originally published at Shun's Vineyard on June 6, 2021.)

History

  • 7th June, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)