
Build the Forest in Python Series: AVL Tree vs Red-Black Tree

3 Jul 2021 · CPOL · 13 min read
Build a Python Metrics library to compare the AVL tree and the Red-Black Tree

Introduction

A good software engineer not only needs to know what tools (e.g., data structures and algorithms) are available but also how to choose the right ones. In addition, a good software engineer needs to know how to verify that the chosen tool works as expected. This article does not build a new tree data structure like the previous articles. Instead, we will build a tool, a Python library called Metrics, that can monitor software behavior and performance, and then use it to verify that our AVL tree and red-black tree work as they should. Although we use the AVL tree and the red-black tree as examples, the way we measure these trees can be applied to any software and real-world situation.

The AVL tree and the red-black tree are often compared with each other because they both offer the same time complexity for their core operations: O(lg n), where n is the number of nodes (in the rest of the article, unless stated otherwise, n means the number of nodes). Many textbooks and articles have analyzed these two trees' performance and complexity in theory. However, computer software (or computer science in general) is a practical subject. There are many ways to implement the same idea, algorithm, and data structure. Moreover, we often need to adapt our implementation to practical constraints, such as memory limitations, so the implementation may not strictly follow the original definition. With this in mind, this article looks at the two trees in a more practical way: by measuring them and comparing them using the Metrics library we are going to build.

What do we measure?

The key to keeping the AVL tree and the red-black tree balanced is rotation, so we will measure the number of rotations that insertions and deletions trigger. In addition, we want to monitor how the tree height changes, so we know the trees stay balanced all the time.

How to do that?

As mentioned above, we will use the Metrics library to track our trees. To measure our software's behavior, we can inject some code into the software, and that piece of code keeps track of the software's behavior. The Metrics library we are about to build is that piece of code.
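As a tiny illustration of the injection idea (this decorator is an illustration of mine, not part of the Metrics library), we can wrap a function so that every call is counted without touching the function body:

```python
import functools

def counted(func):
    """Wrap func so each call increments a counter attached to the wrapper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@counted
def rotate():
    pass  # Placeholder for a real rotation

rotate()
rotate()
print(rotate.calls)  # 2
```

The Metrics library we build below follows the same principle, just in a more structured way: the measuring code lives next to the measured code but can be switched off.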

AVL Tree vs. Red-Black Tree in Theory

Before we build the Metrics library, let us analyze the AVL tree and the red-black tree based on their definitions, and use the conclusions as the source of truth to verify our implementations against the results generated by the Metrics library.

Search

To find a node, in the worst case we need to walk the tree from the root down to the leaf level, which takes O(h) time, where h is the tree height (unless stated otherwise, h means the tree height).

The red-black tree ensures that no root-to-leaf path is more than twice as long as any other such path, whereas the AVL tree guarantees that, for every node, the heights of its left and right subtrees differ by at most one. Therefore, the AVL tree is more rigidly balanced than the red-black tree. In other words, the AVL tree generally has faster lookups than the red-black tree. If our use case involves many lookups, the AVL tree may be a better choice than the red-black tree.
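A quick back-of-the-envelope check of this claim uses the well-known worst-case height bounds: about 1.44 lg(n + 2) for an AVL tree and 2 lg(n + 1) for a red-black tree. A sketch for n = 1000:

```python
import math

n = 1000
avl_bound = 1.44 * math.log2(n + 2)  # Fibonacci-tree bound for AVL height
rbt_bound = 2 * math.log2(n + 1)     # Black-height argument for red-black height

print(f"AVL worst-case height bound:       {avl_bound:.1f}")  # ~14.4
print(f"Red-black worst-case height bound: {rbt_bound:.1f}")  # ~19.9
```

Both are O(lg n), but the AVL constant is smaller, which is exactly the "more balanced, faster lookup" intuition we will try to confirm with measurements later.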

Insert

To insert a new node, we first find the new node's parent, which takes O(h) time, and then perform rotations to keep the tree balanced, which takes constant time. However, the red-black tree requires fewer rotations on average than the AVL tree, so we can say that the red-black tree has faster insertion than the AVL tree.

Delete

To delete a node, we first spend O(h) time finding the node to be deleted and then perform rotations to keep the tree balanced. As with insertion, the AVL tree needs more rotations to rebalance than the red-black tree. Worse than insertion, the red-black tree needs at most a constant number of rotations per deletion, but the AVL tree may need up to O(h) rotations: in the worst case, it has to rotate at every level from the node that violates the AVL-tree property up to the root. Therefore, the red-black tree has faster deletion than the AVL tree. If the scenario involves many insertions and deletions, the red-black tree performs better than the AVL tree.

Space Usage

Both trees use O(n) space. Both store extra information in their nodes beyond what a general binary search tree node holds: the red-black tree stores color information in its nodes, whereas the AVL tree stores each node's height. However, the actual space cost differs slightly between the two trees. Since the color can only be red or black, it can be stored as a Boolean value or even a single bit (i.e., 0 or 1).

In contrast, a node's height is an integer, so we have to store it as an integer. This fact does not make much difference in a Python implementation because the Boolean type and the integer type have the same size. We can use the built-in sys.getsizeof function to check their sizes in bytes.

Python
>>> import sys
>>> color: bool = True  # True indicates Red; False means Black
>>> height: int = 10  # The tree height is an integer
>>> sys.getsizeof(color)
28
>>> sys.getsizeof(height)
28

However, in other languages, the sizes can differ considerably. For example, in C++, we can store the color as a char, whose size is 8 bits, and store the height as a 32-bit integer (int32_t) (see Fundamental types for more detail about C++ types). Therefore, an AVL tree node could consume four times more space than a red-black tree node in a C++ implementation.

Since space usage makes little difference in a Python implementation, we will not track it in our experiments.

Project Setup

In addition to the same style and assumptions (Python 3.9 or newer) used in the Build the Forest series, we also assume that the AVL tree and the red-black tree are used from a single thread and that the whole tree fits in RAM. This article adds two modules to our project: metrics.py, the Metrics library, and test_metrics.py, its unit tests. After adding these two files, our project layout becomes the following:

forest-python
├── forest
│   ├── __init__.py
│   ├── binary_trees
│   │   ├── __init__.py
│   │   ├── avl_tree.py
│   │   ├── binary_search_tree.py
│   │   ├── double_threaded_binary_tree.py
│   │   ├── red_black_tree.py
│   │   ├── single_threaded_binary_trees.py
│   │   └── traversal.py
│   ├── metrics.py
│   └── tree_exceptions.py
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── test_avl_tree.py
    ├── test_binary_search_tree.py
    ├── test_double_threaded_binary_tree.py
    ├── test_metrics.py
    ├── test_red_black_tree.py
    ├── test_single_threaded_binary_trees.py
    └── test_traversal.py

Metrics

The Metrics library we are going to build is inspired by the Metrics project, and it has two major components: MetricRegistry and the supported metric types, including Counter and Histogram.

MetricRegistry

MetricRegistry is the starting point for using the Metrics library: it is a collection of all the metrics used to track the application. For example, if we need to track one operation, say the number of rotations, we can define a Counter metric for monitoring rotations and register it with the MetricRegistry under a readable name such as avlt.rotations. With this approach, we can manage the metrics easily, and client code can retrieve any metric from the MetricRegistry by its name.

The implementation is straightforward. We use a Python dictionary to keep the registered metrics. The key is the metric’s name, and the value is the metric instance. We also define MetricType for the supported types for type safety.

Python
from typing import Dict, Union

MetricType = Union[Counter, Histogram]
"""Alias for the supported metric types."""

class MetricRegistry:
    """A registry for metric instances."""

    def __init__(self) -> None:
        self._registry: Dict[str, MetricType] = dict()

    def register(self, name: str, metric: MetricType) -> None:
        """Given a metric, register it under the given name.

        Parameters
        ----------
        name: `str`
            The name of the metric

        metric: `MetricType`
            The type of the metric
        """
        self._registry[name] = metric

    def get_metric(self, name: str) -> MetricType:
        """Return the metric by the given name.

        Parameters
        ----------
        name: `str`
            The name of the metric

        Returns
        -------
        `MetricType`
            The metric instance by the given name.
        """
        return self._registry[name]

Counter

A Counter is a simple metric that counts something, i.e., increments and decrements a count. Anything that needs counting can use a Counter, for example, the number of rotations.

Python
class Counter:
    """An incrementing and decrementing counter metric."""

    def __init__(self) -> None:
        self._count = 0

    def increase(self, n: int = 1) -> None:
        """Increment the counter.

        Parameters
        ----------
        n: `int`
            The count to be increased.
        """
        self._count += n

    def decrease(self, n: int = 1) -> None:
        """Decrement the counter.

        Parameters
        ----------
        n: `int`
            The count to be decreased.
        """
        self._count -= n

    @property
    def count(self) -> int:
        """Return the current count."""
        return self._count

Histogram

By definition, a histogram is a graphical representation of the distribution of numerical data; it is an estimate of the probability distribution of a continuous (quantitative) variable. The Histogram in the Metrics library measures the distribution of a data set, including the min, mean, max, standard deviation, and percentiles.

Since we assume that the software we measure fits in RAM, we can use a Python list to hold all the values and use NumPy to compute the data set's distribution.

Python
from typing import Dict, List

import numpy as np

class Histogram:
    """A metric which calculates the distribution of a value."""

    def __init__(self) -> None:
        self._values: List[int] = list()

    def update(self, value: int) -> None:
        """Add a recorded value.

        Parameters
        ----------
        value: `int`
            value to be updated
        """
        self._values.append(value)

    def report(self) -> Dict:
        """Return the histogram report."""
        array = np.array(self._values)
        return {
            "min": array.min(),
            "max": array.max(),
            "medium": np.median(array),
            "mean": array.mean(),
            "stdDev": array.std(),
            "percentile": {
                "75": np.percentile(array, 75),
                "95": np.percentile(array, 95),
                "99": np.percentile(array, 99),
            },
        }

(The complete implementation is available at metrics.py)
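To see how the pieces fit together before wiring them into the trees, here is a standalone sketch. The Counter and MetricRegistry below are condensed restatements of the classes above so the snippet runs on its own, and the metric name avlt.rotations is just an example:

```python
from typing import Dict

class Counter:
    """Condensed version of the Counter above."""
    def __init__(self) -> None:
        self._count = 0
    def increase(self, n: int = 1) -> None:
        self._count += n
    @property
    def count(self) -> int:
        return self._count

class MetricRegistry:
    """Condensed version of the MetricRegistry above."""
    def __init__(self) -> None:
        self._registry: Dict[str, Counter] = {}
    def register(self, name: str, metric: Counter) -> None:
        self._registry[name] = metric
    def get_metric(self, name: str) -> Counter:
        return self._registry[name]

registry = MetricRegistry()
rotations = Counter()
registry.register(name="avlt.rotations", metric=rotations)

for _ in range(3):  # Pretend three rotations happened
    rotations.increase()

print(registry.get_metric(name="avlt.rotations").count)  # 3
```

The registry decouples the code that records a metric from the code that reads it: the tree only holds a Counter, while any client can look the metric up by name.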

Test

As always, we should have unit tests for our code as much as possible. Check test_metrics.py for the complete unit test.
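To give a flavor of what such a test looks like, here is a sketch of a pytest-style test for the Histogram report (the Histogram below is a condensed restatement so the snippet runs standalone; the real suite lives in test_metrics.py):

```python
import numpy as np

class Histogram:
    """Condensed restatement of the Histogram above."""
    def __init__(self) -> None:
        self._values: list = []
    def update(self, value: int) -> None:
        self._values.append(value)
    def report(self) -> dict:
        array = np.array(self._values)
        return {
            "min": int(array.min()),
            "max": int(array.max()),
            "medium": float(np.median(array)),
            "mean": float(array.mean()),
        }

def test_histogram_report() -> None:
    histogram = Histogram()
    for value in [1, 2, 3, 4, 5]:
        histogram.update(value=value)
    report = histogram.report()
    assert report["min"] == 1
    assert report["max"] == 5
    assert report["medium"] == 3.0
    assert report["mean"] == 3.0

test_histogram_report()  # pytest would discover and run this automatically
```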

Place Metrics in the trees

Red-Black Tree

After implementing the Metrics library, we can place metrics in the red-black tree class. Since we want to measure the number of rotations, we use a Counter for that; to track the change of tree height, we use a Histogram. We also need a MetricRegistry for the Counter and Histogram to register with. Because the MetricRegistry manages the metrics for the entire program, the higher level should pass the MetricRegistry instance into the RBTree class. With that, we modify RBTree's __init__ function as follows.

Python
class RBTree:
    def __init__(self, registry: Optional[metrics.MetricRegistry] = None) -> None:
        self._NIL: Leaf = Leaf()
        self.root: Union[Node, Leaf] = self._NIL
        self._metrics_enabled = True if registry else False
        if self._metrics_enabled and registry:
            self._rotate_counter = metrics.Counter()
            self._height_histogram = metrics.Histogram()
            registry.register(name="rbt.rotate", metric=self._rotate_counter)
            registry.register(name="rbt.height", metric=self._height_histogram)

(See red_black_tree.py for the complete implementation)

One thing to be aware of: although it is essential to gain insight into software behavior via a mechanism such as the Metrics library, it does come with a cost. It adds complexity to the code and may slow down performance because the code has more work to do. Therefore, we make the registry parameter optional. When we are not measuring the code, we can disable the metrics to avoid potential side effects.
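The pattern can be seen in isolation with a toy sketch (the Tracked class here is hypothetical, not part of the forest project): when no registry is supplied, the hot path pays only a cheap boolean check.

```python
from typing import Optional

class Tracked:
    """Toy sketch of the optional-registry pattern used by RBTree above."""

    def __init__(self, registry: Optional[dict] = None) -> None:
        # Metrics are enabled only when a registry is supplied.
        self._metrics_enabled = registry is not None
        if self._metrics_enabled:
            self._ops = 0  # Stand-in for a real Counter

    def do_work(self) -> None:
        # When metrics are off, this branch is the only overhead.
        if self._metrics_enabled:
            self._ops += 1

plain = Tracked()                # metrics disabled
measured = Tracked(registry={})  # metrics enabled
measured.do_work()
print(plain._metrics_enabled, measured._ops)  # False 1
```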

Counter for Rotation

As the name implies, we need to increment the counter when a rotation occurs. Therefore, we call the increase() function at the end of the rotation functions, as follows.

Python
def _left_rotate(self, node_x: Node) -> None:
    node_y = node_x.right  # Set node y
    if isinstance(node_y, Leaf):  # Node y cannot be a Leaf
        raise RuntimeError("Invalid left rotate")
    # Turn node y's subtree into node x's subtree
    node_x.right = node_y.left
    if isinstance(node_y.left, Node):
        node_y.left.parent = node_x
    node_y.parent = node_x.parent
    # If node's parent is a Leaf, node y becomes the new root.
    if isinstance(node_x.parent, Leaf):
        self.root = node_y
    # Otherwise, update node x's parent.
    elif node_x == node_x.parent.left:
        node_x.parent.left = node_y
    else:
        node_x.parent.right = node_y
    node_y.left = node_x
    node_x.parent = node_y

    if self._metrics_enabled:
        self._rotate_counter.increase()

def _right_rotate(self, node_x: Node) -> None:
    node_y = node_x.left  # Set node y
    if isinstance(node_y, Leaf):  # Node y cannot be a Leaf
        raise RuntimeError("Invalid right rotate")
    # Turn node y's subtree into node x's subtree
    node_x.left = node_y.right
    if isinstance(node_y.right, Node):
        node_y.right.parent = node_x
    node_y.parent = node_x.parent
    # If node's parent is a Leaf, node y becomes the new root.
    if isinstance(node_x.parent, Leaf):
        self.root = node_y
    # Otherwise, update node x's parent.
    elif node_x == node_x.parent.right:
        node_x.parent.right = node_y
    else:
        node_x.parent.left = node_y
    node_y.right = node_x
    node_x.parent = node_y

    if self._metrics_enabled:
        self._rotate_counter.increase()

(See red_black_tree.py for the complete implementation)

Histogram for Height

The height may change after insertion and deletion, and what we want to track is the trend of the tree height. Therefore, we update the Histogram at the end of insertion and deletion, as below.

Python
def insert(self, key: Any, data: Any) -> None:
    # Color the new node as red.
    new_node = Node(key=key, data=data, color=Color.RED)
    parent: Union[Node, Leaf] = self._NIL
    current: Union[Node, Leaf] = self.root
    while isinstance(current, Node):  # Look for the insert location
        parent = current
        if new_node.key < current.key:
            current = current.left
        elif new_node.key > current.key:
            current = current.right
        else:
            raise tree_exceptions.DuplicateKeyError(key=new_node.key)
    new_node.parent = parent
    # If the parent is a Leaf, set the new node to be the root.
    if isinstance(parent, Leaf):
        new_node.color = Color.BLACK
        self.root = new_node
    else:
        if new_node.key < parent.key:
            parent.left = new_node
        else:
            parent.right = new_node
        # After the insertion, fix the broken red-black-tree-properties.
        self._insert_fixup(new_node)

    if self._metrics_enabled:
        self._height_histogram.update(value=self.get_height(self.root))

def delete(self, key: Any) -> None:
    if (deleting_node := self.search(key=key)) and (
        isinstance(deleting_node, Node)
    ):
        original_color = deleting_node.color
        # Case 1: no children or Case 2a: only one right child
        if isinstance(deleting_node.left, Leaf):
            replacing_node = deleting_node.right
            self._transplant(
                deleting_node=deleting_node, replacing_node=replacing_node
            )
            # Fixup
            if original_color == Color.BLACK:
                if isinstance(replacing_node, Node):
                    self._delete_fixup(fixing_node=replacing_node)
        # Case 2b: only one left child
        elif isinstance(deleting_node.right, Leaf):
            replacing_node = deleting_node.left
            self._transplant(
                deleting_node=deleting_node, replacing_node=replacing_node
            )
            # Fixup
            if original_color == Color.BLACK:
                if isinstance(replacing_node, Node):
                    self._delete_fixup(fixing_node=replacing_node)
        # Case 3: two children
        else:
            replacing_node = self.get_leftmost(deleting_node.right)
            original_color = replacing_node.color
            replacing_replacement = replacing_node.right
            # If the replacing node is the direct child of the deleting node
            if replacing_node.parent == deleting_node:
                if isinstance(replacing_replacement, Node):
                    replacing_replacement.parent = replacing_node
            else:
                self._transplant(replacing_node, replacing_node.right)
                replacing_node.right = deleting_node.right
                replacing_node.right.parent = replacing_node
            self._transplant(deleting_node, replacing_node)
            replacing_node.left = deleting_node.left
            replacing_node.left.parent = replacing_node
            replacing_node.color = deleting_node.color
            # Fixup
            if original_color == Color.BLACK:
                if isinstance(replacing_replacement, Node):
                    self._delete_fixup(fixing_node=replacing_replacement)

        if self._metrics_enabled:
            self._height_histogram.update(value=self.get_height(self.root))

(See red_black_tree.py for the complete implementation)

With the Histogram in place, after some insertions and deletions, we can get a report of the maximum, mean, and median tree height. Combining these numbers with the standard deviation and percentiles, also provided by the Histogram, we can get a clear picture of how the height changes. For example, if the 99th percentile is close to the median with a relatively low standard deviation, we can say the tree height is quite consistent; in other words, the tree is fairly balanced.
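As a toy illustration of how to read these statistics (the height samples below are made up, not produced by the trees):

```python
import numpy as np

# Hypothetical height samples: a balanced tree hovers near one height,
# while an unbalancing tree's height keeps growing.
steady = np.array([10, 10, 11, 10, 11, 11, 10, 11])
growing = np.array([3, 6, 9, 12, 15, 18, 21, 24])

for name, heights in [("steady", steady), ("growing", growing)]:
    print(name,
          "median:", np.median(heights),        # steady: p99 near the median
          "p99:", np.percentile(heights, 99),   # growing: p99 far above it
          "stdDev:", round(float(heights.std()), 2))
```

For the steady series, the 99th percentile sits next to the median and the standard deviation is small, which is the signature of a consistently balanced tree; the growing series shows the opposite pattern.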

AVL Tree

Similar to the red-black tree, we use a Counter to track rotation and a Histogram to monitor the tree height.

Python
class AVLTree:
    def __init__(self, registry: Optional[metrics.MetricRegistry] = None) -> None:
        self.root: Optional[Node] = None
        self._metrics_enabled = True if registry else False
        if self._metrics_enabled and registry:
            self._rotate_counter = metrics.Counter()
            self._height_histogram = metrics.Histogram()
            registry.register(name="avlt.rotate", metric=self._rotate_counter)
            registry.register(name="avlt.height", metric=self._height_histogram)

(See avl_tree.py for the complete implementation)

We also increase the Counter at the end of the rotation functions and update the Histogram after insertion and deletion.

Python
def insert(self, key: Any, data: Any) -> None:
    new_node = Node(key=key, data=data)
    parent: Optional[Node] = None
    current: Optional[Node] = self.root
    while current:
        parent = current
        if new_node.key < current.key:
            current = current.left
        elif new_node.key > current.key:
            current = current.right
        else:
            raise tree_exceptions.DuplicateKeyError(key=new_node.key)
    new_node.parent = parent
    # If the tree is empty, set the new node to be the root.
    if parent is None:
        self.root = new_node
    else:
        if new_node.key < parent.key:
            parent.left = new_node
        else:
            parent.right = new_node
        if not (parent.left and parent.right):
            self._insert_fixup(new_node)

    if self._metrics_enabled:
        self._height_histogram.update(value=self.get_height(self.root))

def delete(self, key: Any) -> None:
    if self.root and (deleting_node := self.search(key=key)):
        # Case: no child
        if (deleting_node.left is None) and (deleting_node.right is None):
            self._delete_no_child(deleting_node=deleting_node)
        # Case: Two children
        elif deleting_node.left and deleting_node.right:
            replacing_node = self.get_leftmost(node=deleting_node.right)
            # Replace the deleting node with the replacing node,
            # but keep the replacing node in place.
            deleting_node.key = replacing_node.key
            deleting_node.data = replacing_node.data
            if replacing_node.right:  # The replacing node cannot have left child.
                self._delete_one_child(deleting_node=replacing_node)
            else:
                self._delete_no_child(deleting_node=replacing_node)
        # Case: one child
        else:
            self._delete_one_child(deleting_node=deleting_node)

        if self._metrics_enabled:
            self._height_histogram.update(value=self.get_height(self.root))

def _left_rotate(self, node_x: Node) -> None:
    node_y = node_x.right  # Set node y
    if node_y:
        # Turn node y's subtree into node x's subtree
        node_x.right = node_y.left
        if node_y.left:
            node_y.left.parent = node_x
        node_y.parent = node_x.parent
        # If node's parent is a Leaf, node y becomes the new root.
        if node_x.parent is None:
            self.root = node_y
        # Otherwise, update node x's parent.
        elif node_x == node_x.parent.left:
            node_x.parent.left = node_y
        else:
            node_x.parent.right = node_y
        node_y.left = node_x
        node_x.parent = node_y
        node_x.height = 1 + max(
            self.get_height(node_x.left), self.get_height(node_x.right)
        )
        node_y.height = 1 + max(
            self.get_height(node_y.left), self.get_height(node_y.right)
        )

        if self._metrics_enabled:
            self._rotate_counter.increase()

def _right_rotate(self, node_x: Node) -> None:
    node_y = node_x.left  # Set node y
    if node_y:
        # Turn node y's subtree into node x's subtree
        node_x.left = node_y.right
        if node_y.right:
            node_y.right.parent = node_x
        node_y.parent = node_x.parent
        # If node's parent is a Leaf, node y becomes the new root.
        if node_x.parent is None:
            self.root = node_y
        # Otherwise, update node x's parent.
        elif node_x == node_x.parent.right:
            node_x.parent.right = node_y
        else:
            node_x.parent.left = node_y
        node_y.right = node_x
        node_x.parent = node_y
        node_x.height = 1 + max(
            self.get_height(node_x.left), self.get_height(node_x.right)
        )
        node_y.height = 1 + max(
            self.get_height(node_y.left), self.get_height(node_y.right)
        )

        if self._metrics_enabled:
            self._rotate_counter.increase()

(See avl_tree.py for the complete implementation)

Binary Search Tree

Although this article focuses on the AVL tree and the red-black tree, it is easy to apply the same method to the binary search tree. Since the binary search tree has no rotation operation, we only need a Histogram to measure the tree height. In the next section, we add the binary search tree to the comparison to better understand how much better the self-balancing trees perform.

(See binary_search_tree.py for the implementation by adding metrics to the binary search tree)

Compare Red-Black Tree and the AVL Tree with Metrics

Experiment 1

In the first experiment, we use the Python built-in random sampling function (random.sample) to generate a list (insert_data) of 1000 unique random integers chosen from the range 1 to 2000, and use the same random.sample function to shuffle insert_data into delete_data, so the nodes will be deleted from the trees in random order.

Python
import random

insert_data = random.sample(range(1, 2000), 1000)
delete_data = random.sample(insert_data, 1000)

Then, we create a MetricRegistry and pass it in when we construct the trees. Here we also create a binary search tree so we can compare the performance of a plain binary search tree with the self-balancing trees.

Python
from forest import metrics
from forest.binary_trees import avl_tree
from forest.binary_trees import binary_search_tree
from forest.binary_trees import red_black_tree

registry = metrics.MetricRegistry()
bstree = binary_search_tree.BinarySearchTree(registry=registry)
avltree = avl_tree.AVLTree(registry=registry)
rbtree = red_black_tree.RBTree(registry=registry)

Once the trees and metrics are ready, we first insert insert_data into the trees and then delete nodes from the trees in the order of delete_data.

Python
for key in insert_data:
    bstree.insert(key=key, data=str(key))
    avltree.insert(key=key, data=str(key))
    rbtree.insert(key=key, data=str(key))

for key in delete_data:
    bstree.delete(key=key)
    avltree.delete(key=key)
    rbtree.delete(key=key)

Finally, we can retrieve the metrics by their names and print out the number of rotations and the height’s histogram.

Python
print("Binary Search Tree:")
bst_report = registry.get_metric(name="bst.height").report()
print(f"  Height:   {bst_report}")

print("AVL Tree:")
avlt_rotation_count = registry.get_metric(name="avlt.rotate").count
print(f"  Rotation: {avlt_rotation_count}")
avlt_report = registry.get_metric(name="avlt.height").report()
print(f"  Height:   {avlt_report}")

print("Red-Black Tree")
rbt_rotation_count = registry.get_metric(name="rbt.rotate").count
print(f"  Rotation: {rbt_rotation_count}")
rbt_report = registry.get_metric(name="rbt.height").report()
print(f"  Height:   {rbt_report}")

(The complete code is available at avlt_vs_rbt.py)

Since the input data is randomly generated, the results may differ slightly from run to run, but they should be very close. The output may look like the following:

Python
Binary Search Tree:
  Height:   {'min': 0, 'max': 21, 'medium': 17.0, 'mean': 16.52976488244122, 'stdDev': 3.621953722665703, 'percentile': {'75': 20.0, '95': 20.0, '99': 21.0}}
AVL Tree:
  Rotation: 1053
  Height:   {'min': -1, 'max': 11, 'medium': 10.0, 'mean': 9.1765, 'stdDev': 1.7370514528936674, 'percentile': {'75': 10.0, '95': 11.0, '99': 11.0}}
Red-Black Tree
  Rotation: 647
  Height:   {'min': 0, 'max': 12, 'medium': 11.0, 'mean': 9.9605, 'stdDev': 1.78295814589126, 'percentile': {'75': 11.0, '95': 12.0, '99': 12.0}}

From the output, we can see that both the AVL tree and the red-black tree are quite balanced: the max height is 11 for the AVL tree and 12 for the red-black tree, and the median and mean are very close to the max heights. Also, the percentiles are close to the median, and the standard deviation is relatively low. Therefore, we can say both trees have consistent heights.

In contrast, the binary search tree has a worse height: 21 max and 17 median. We know from the theorem mentioned in Binary Search Tree: Analysis that the expected height of a randomly built binary search tree on n distinct keys is O(lg n). Even though we build the binary search tree from random input, its height is still worse than the AVL tree's and the red-black tree's. Moreover, the binary search tree's standard deviation is larger than that of the AVL tree and the red-black tree. This shows that the AVL tree and the red-black tree are more balanced than the binary search tree even when the binary search tree is built randomly.

Regarding the number of rotations, the result shows that the AVL tree needs more rotations to keep balanced than the red-black tree, matching our expectations.

Experiment 2

In the second experiment, we will be more aggressive than in the first: we generate the insertion and deletion data randomly and also perform the insertions and deletions in random order. Again, we use the random.sample function to generate a list (insert_data) of 2000 unique random integers chosen from the range 1 to 3000. This time, however, we generate delete_data with 1000 unique random integers from the same range, so the insertion and deletion data are not identical. That means when we try to delete a node, the node may not exist, but that is not a problem: as long as we feed all three trees the same insert_data and delete_data in the same order, the experiment is valid. The following code shows how we generate the test data.

Python
import random

sample_data = random.sample(range(1, 3000), 2000)
insert_data = [("insert", item) for item in sample_data]

sample_data = random.sample(range(1, 3000), 1000)
delete_data = [("delete", item) for item in sample_data]

test_data = random.sample(
    (insert_data + delete_data), len(insert_data) + len(delete_data)
)

We combine insert_data and delete_data into a single test_data via the random.sample function, so the order is random. Each item of insert_data and delete_data is a pair containing the operation type (insert or delete) and the key to be inserted or deleted, so as we walk through test_data, the operation type tells us which operation to perform. That is how we make sure we apply the same data set in the same order to all three trees. The following code demonstrates the experiment.

Python
from forest import metrics
from forest.binary_trees import avl_tree
from forest.binary_trees import binary_search_tree
from forest.binary_trees import red_black_tree


registry = metrics.MetricRegistry()
bstree = binary_search_tree.BinarySearchTree(registry=registry)
avltree = avl_tree.AVLTree(registry=registry)
rbtree = red_black_tree.RBTree(registry=registry)

for operation, key in test_data:
    if operation == "insert":
        bstree.insert(key=key, data=str(key))
        avltree.insert(key=key, data=str(key))
        rbtree.insert(key=key, data=str(key))
    if operation == "delete":
        bstree.delete(key=key)
        avltree.delete(key=key)
        rbtree.delete(key=key)

print("Binary Search Tree:")
bst_report = registry.get_metric(name="bst.height").report()
print(f"  Height:   {bst_report}")

print("AVL Tree:")
avlt_rotation_count = registry.get_metric(name="avlt.rotate").count
print(f"  Rotation: {avlt_rotation_count}")
avlt_report = registry.get_metric(name="avlt.height").report()
print(f"  Height:   {avlt_report}")

print("Red-Black Tree")
rbt_rotation_count = registry.get_metric(name="rbt.rotate").count
print(f"  Rotation: {rbt_rotation_count}")
rbt_report = registry.get_metric(name="rbt.height").report()
print(f"  Height:   {rbt_report}")

(The complete code is available at avlt_vs_rbt_2.py)

The test data is also randomly generated, so the results may differ slightly from run to run. Example output may look like the following:

Python
Binary Search Tree:
  Height:   {'min': 0, 'max': 23, 'medium': 22.0, 'mean': 20.394120153387302, 'stdDev': 3.249885374276778, 'percentile': {'75': 22.0, '95': 23.0, '99': 23.0}}
AVL Tree:
  Rotation: 1455
  Height:   {'min': 0, 'max': 12, 'medium': 11.0, 'mean': 10.175117170856412, 'stdDev': 1.6120322559944114, 'percentile': {'75': 11.0, '95': 12.0, '99': 12.0}}
Red-Black Tree
  Rotation: 1136
  Height:   {'min': 0, 'max': 13, 'medium': 11.0, 'mean': 10.826161056668086, 'stdDev': 1.8772598891928807, 'percentile': {'75': 12.0, '95': 13.0, '99': 13.0}}

We get a similar result to experiment 1: the red-black tree needs fewer rotations than the AVL tree, the AVL tree is slightly more balanced than the red-black tree, and both the AVL tree and the red-black tree are much more balanced than the binary search tree.

Summary

Both the AVL tree and the red-black tree are widely used in computer software. Each has strengths and weaknesses, and choosing the one that best fits our use case is critical for software development. Furthermore, knowing how to measure our choice and implementation is also essential for the software to be successful. This article introduced a method to measure software behavior, using the AVL tree and the red-black tree we implemented in the previous articles as examples. Although the scope of the example in this article is small, the concept applies to any software. Hopefully, this article provides valuable knowledge that people can benefit from.

(Originally published at Shun's Vineyard on July 2, 2021)


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)