Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C

Bubble Tree Balanced Binary Tree (for all)

3.30/5 (8 votes)
17 May 2016CPOL5 min read 13.5K  
Fast binary tree algorithmus

Introduction

In computer science, one of the most important data structures for insert, select and update is the binary tree. In the single insert and lookup probably (for sure), a hashtable is faster, then why is a BST more important?

Because a Balanced binary tree allows you to perform query also with comparer, we can select greater from some value, smaller from some value or between some other values. So it is a powerful instrument to handle data, a big amount of data.

But to achieve his job efficiently, the binary tree must be balanced, or we just simulate a quasi linear research in an array. I do not want to explain what a balanced binary tree is. You can find better explanation here. Wikipedia is always the better start for learning something.

Background

Let us do some consideration:

A perfectly balanced tree with its first n-1 depth of leafs plenty filled is just an academic exercise. Practically, it is hard to keep a binary tree perfectly balanced, if "not optimized for read". One of the extreme algorithm is to put the whole tree in an array, that now is sorted, and then insert it starting from the halves. They justify this algorithm with "it is optimized for read". I would tell them, leave it in the array, perform a perfect normal binary search, and if you insert something new, shift the array or if you insert a lot of things, call qsort(), I think it is the better solution in this case.

A binary tree must be fast to insert, must be fast to delete and must be fast to select. If you think of an exercise to do with a binary tree, take a look at my article here. Instead of a hashtable, you use a binary tree. This is the purpose of a balanced binary tree. In this case, it must be fast to insert, delete and find. No weaknesses are allowed. Between you and the memory, there is a datastructure. If you don't remind there is a datastructure, it means it is a good algorithm.

Let us consider something else:

What are the costs of a search?

time=(comparison * depth)

So if you program in C, you know that for the comparison of strings, you must call a function called strcmp(), and if you do not program in C, you should know that your compiler calls a similar function at your place. So the comparison costs for the tree are perfectly equitable to a level of depth more.

The algorithm that may be reached also a quasi-perfect balance, but they must perform two (or more) comparisons for each leaf, they act like they had twice (or more) depths.

So the target is: a perfect binary with just right or left leaf, that with a comparison can run inside its braces, but almost perfectly balanced.

It is a problem with an optimus. And we must count each variable of the system.

The trick is perform rotations: but when, why and where it was a mystery.

After some experiments, I found it.

Using the Code

Let us build our tree:

C++
struct bt_node
{
 	unsigned index;
	void *value;
	long left;
	long right;
	unsigned depth;
}
struct bt_tree
{
 	unsigned n;
	unsigned buffer;
	long current_root;
	struct bt_node* nodes;
	i_stack freeslots;
}
//

The structures are quite easy to understand. I need to explain depth. After several attempts, I found out that the best way to balance the tree is in function of the depth differences. The depth is reversed. That means that the leaves have depth= 1 and the root contains the whole depth. This is because if I perform a rotation, it is more easy to propagate the new depth at just my parent(s), instead of my 200.000 leafs. And if this seems like a good reason, in fact, it is nothing if compared with the true reason. If I start from the root, all brothers have some depth, and I lose my parameter to trigger the rotation. This means that we don't know when the tree requires to be balanced. Practically, we are blind in a motorway. For this reason, we set depth=1 for leaves and propagate ascendants to the new depth.

The add function is here:

C++
// the function add() is the same for each case, and data type, it works just 
//with slots and returns the index of the new value if inserted
// otherwise the index of the existing value. The handling of the "error" or whatelse
// get done from the add_if_not_exists or other interface function (in my version I add also a pointer to function, one
//for compare, for simplicity I don't put it here)
long add(struct bt_tree*bt, unsigned root, unsigned newitemix);
// the function below save first the data in a new slot. this get done
//because I would generalize the function of the tree. So, when found, it 
//finds all the information in the proper slot. Without to modify the 
//parameter list of the core functions. So with different data type I just have to modify my 
//interface function. 

unsigned add_if_not_exists(struct bt_tree* bt, char *name, void*value)
{
	//the function get_new_slot() pop a value from freeslots, if it has
	// otherwise increase the counter of the tree, check if the buffer
	//can satisfy the new request, otherwise realloc() the vector
	 res=get_new_slot(bt->freeslots);
	unsigned tmp=add(bt, cur,res);	
	bt->node[res].name=name; //I just paste the address of the name 
				    //i don't duplicate yet 
	bt->node[res].value=value;
	if (tmp==cur)	//successfully inserted
	{
		bt->node[res].name=strdup(name); 
		bt->node[res].value=value;	

	}
	else 
	{  //already existing
		stack_push(bt->freeslots,res);
		bt->node[res].name=NULL; 
		bt->node[res].value=NULL;
	}
	return tmp;
}
//this is a core function, it returns "newitemix" 
//if successfully inserted otherwise the index of the existing item
//with the same key. What do with it is a matter of interface function (if raise an exception or
//sum, or just add_if_not_exists) 
long add(struct bt_tree*bt,  unsigned newitemix)
{
	int cmp;
	long root=bt->cuurent_root;
	while (root>0)
	{
		if ((cmp=strcmp(bt->nodes[root].name,bt->nodes[newitemix].name)==0)
		{
			return root;
		}else if(cmp<0)
		{
			if(bt->nodes[root].right<0)
			{
				bt->nodes[root].right=newitemix;
				bt->nodes[newitemix].parent=root;
				break;
			}
			else 
				root=bt->nodes[root].right;
		}
		else
		{
			if(bt->nodes[root].left<0)
			{
				bt->nodes[root].left=newitemix;
				bt->nodes[newitemix].parent=root;
				break;			
			}
			else 
				root=bt->nodes[root].left;
		}

	}

	propagate_depth(bt,newitemix);
	bubble(bt, newitemix);
	return newitemix; 
}

//

The function of insertion is a perfect binary search. Faster is not possible to implement I think. There is something after the main loop. First a propagate_depth() to propagate the depth to ascendents. The new inserted leaf probably increases the depth of the tree. And then bubble(). The function that gives the name to the algorithm. Here they are:

C++
void propagate_depth(bt,newitemix)
{
   assert(newitemix>=0);
   unsigned i, depth;
   for(i=newitemix;i>=0;i=bt->nodes[i].parent)
   {
          depth=bt->nodes[i].right<0?0:bt->nodes[bt->nodes[i].right].depth;
          depth=bt->nodes[i].left<0?depth:max(depth,bt->nodes[bt->nodes[i].left].depth);
          depth++;
          bt->nodes[i].depth=depth;
   } 
}

void bubble(struct bt_tree*bt, unsigned index)
{
	unsigned parent=bt->nodes[index].parent;
	struct bt_node*right,*left;
	while (parent>=0)
	{	right=bt->nodes[index].right<0?NULL:&bt->nodes[bt->nodes[index].right];
		left=bt->nodes[index].left<0?NULL:&bt->nodes[bt->nodes[index].left];
		if ((!left && right && right->depth>1)||
		   (!right &&  left && left->depth>1)||
		(left && right && abs(right->depth-left->depth)>1))
			rotate(bt,index);
		index=parent;
		parent=bt->nodes[index].parent;
	}
}

//

What does this function do? The first takes the greater from his leaves, increases it and propagates ascendently the depth. The second one traverses ascendently the tree, and if finds some discrepancies between the depths, calls a rotation. The rotation routine is below, I just wrote for the left side. The right is exactly mirrored.

C++
unsigned rotate(struct bt_tree*bt,unsigned ix)
{  //we save first all the neighborhood in some comfortable variables
	asser (ix>=0 && ix<bt->n);
   struct bt_node *me,*parent,*left,*right;
  char side; //and defines comfortable which side my parent keeps me
  me=&bt->nodes[ix];
  parent=me->parent<0?NULL:&bt->nodes[me->parent];
  left=me->left<0?NULL:&bt->nodes[me->left];
  right=me->right<0?NULL:&bt->nodes[me->right];
   assert(left || right);
  if (parent)
   side=parent->left==me->index?'l':'r';
  else 
   side =0;
   if (!right || (left && left->depth>right->depth)
   {
     me->parent=left->index;
     me->left=left->right;
     left->right=me->index;
     if (me->left>=0)
        bt->nodes[me->left].parent=me->index;
     if (side='l')
        parent->left=left->index;
     else if(side='r')
        parent->right=left->index;
     else
        bt->current_root=left->index;
     propagate_depth(bt, me->index);
    }
    else
   {
      //mirror routine 

   }
}

//

Points of Interest

The points of interest are several: it is easy to implement and it is very fast. The speed comes from its lightness. The find routine gives an idea from its speed:

C++
void* find(struct bt_tree*bt,char *key)
{
	long current_root=bt->current_root;
	int cmp;	
	while (current_root>=0)
	{	if ( (cmp=strcmp(bt->nodes[current_root].name,key)==0)
			return bt->nodes[current_root].value;
		else if (cmp<0)
			current_root=bt->nodes[current_root].right;
		else
			current_root=bt->nodes[current_root].left;
	
	}	
	printf ("item %s not present in the tree\n",key);
	return NULL; 
}

//

But of course, if not right balanced, it can be fast as hell, but it doesn't helps if it has 1000000 leaves to traverse. So I did some tests: from a table I had in my program containing all the variable for the six degree of freedom of a system, it means: a, ta(derivative of a in the time:speed), tta,Rx(the couple a) b,tb,ttb,c... x...fx (force of x direction)y...z... 24 labels. Enough random and enough unbalanced in the "t" zone, to expect some apocalithical scenario. I put a double for and inserted 2400000 items in the tree. Result: the tree reaches a depth of 25 (despite the theoretic 22 ) with the 99% of leaves inside the 23th depth, and the remaining 24000 in the remaining 2 depths. I used it as allocated memory management, where the new insertion are quasi sorted, and repeats constantly this distributions. Here are some statistical tools:

C++
void  fill_ratio(struct bt_tree*bt)
{
	unsigned depth=0:	
	long i,n,buf, fix;
	buf=max(bt->n/10,100);
	n=0;
	 long long *reg= (long long*)calloc(buf,sizeof(long long));
	reg[n++]=bt->current_root; 	
	while (n>0)
	{
		depth ++;
		printf ("depth %d count %d fillratio %g\n",depth, n, (double)n/
      for (i=0;i<fix;i++)
     {
        if (bt<<depth-1)); fix="n;" for="">nodes[reg[i]].left<0 && bt->nodes[reg[i]].right<0)
			{
               if (fix<n)
               {
                 reg[i--]=reg[--fix];
                 reg[fix]=reg[--n];
               }
               else
               {
                 reg[i--]=reg[--fix];
                 n--;
                }
				n--;
		else if (i<n) reg="">nodes[reg[i]].left<0 && bt->nodes[reg[i]].right>0)
				reg [i]=bt->nodes[reg[i]].right;
			else if (bt->nodes[reg[i]].left>=0 && bt->nodes[reg[i]].right<0)
				reg [i]=bt->nodes[reg[i]].left;
			else
			{
				n++;
				if (n>=buf)
				{
					buf+=buf;
					reg=(long long*)realloc(reg, buf*sizeof(long long));
				}
				reg [i]=bt->nodes[reg[i]].right;
				reg [n]=bt->nodes[reg[i]].left;
			}				
		}	
	}
}

//

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)