Building on My SQL Storage Engines and APIs to Conduct Complex Analysis Using SQL Query Language: Part 4

SarmadAsghar

4.60/5 (3 votes)

13 Jul 2014CPOL10 min read

12.4K

124

NOSQL data on MYSQL console

Download source - 152.6 KB

Introduction

In Part 3 of this article series, I presented the idea of mysql storage engine which exposes facebook friends on mysql console for you to conduct complex analysis. This allows doing complex analysis not only on your friends but the friends of your friends and in turns their friends and so on i.e. your complete friend’s space. You can also limit the analysis to few friends. This article series continues on, this time exposing NOSQL data on mysql console for you to conduct complex analysis. Specifically, this allows doing complex analysis on data from Parse.com.

Background

A conventional approach could be to export data from parse and insert into mysql.

One drawback is that each time you have to manually export data from parse and insert into mysql
Another drawback is that you cannot access the up to date data from parse.

Another conventional approach could be to replicate data from parse.com to mysql.

One drawback of this approach is that you have to insert and delete rows into the table which are time consuming.
Another drawback is that you need to trigger that logging application which inserts and delete rows from the table before querying the table
Another drawback is that even if you need to retrieve few rows, all data is maintained in table
Another drawback is the table may be locked during insertion or deletion which can create performance bottlenecks for the retrieval entity.
Another drawback is duplication of data
Another drawback is double storage required

Instead the proposed approach avoids these problems as follows:

No need to insert and delete rows from the table
No need to trigger the logging application before querying the table
No need to get all data when few rows are required
No bottlenecks, as there are no insertion and deletion.

ParseDotComEngine

This mysql storage engine is different from storage engine presented in part 3 because it retrieves data from parse.com and feeds to mysql. A table named Employee with engine ParseDotComEngine is created.

SQL

create table Employee
(id VARCHAR(255),
name VARCHAR(255),
salary INT unsigned
)engine=ParseDotComEngine;

When user enters a select query on Employee table, the control passes to ParseDotComEngine in rnd_init method. This method checks that table name is really “Employee”. Then it checks that where clause parse tree is present or not. If yes, then where clause parse tree is traversed to be converted to parse.com rest api calls. Otherwise all records in employee class on parse are fed to MySQL.
Before continuing to rnd_init following data structures need to be explained.

C++

vector<Employee> m_EmployeeVector; 
//(belongs to ha_example class and maintains all employees for the query)
int m_EmployeeVectorIndex; 
//(belongs to ha_example class and maintains the index to m_EmployeeVector)
struct Employee
{
	string id; 
	string name; 
	int salary;
};

C++

int ha_example::rnd_init(bool scan)
{
	DBUG_ENTER("ha_example::rnd_init");
	
	if(_stricmp(table_share->table_name.str,"Employee")==0)
	{
		THD *thd=this->ha_thd();
		SELECT_LEX *select_lex=&thd->lex->select_lex;
		m_EmployeeVectorIndex=0;

		if(select_lex->where==0)
		{
			GetEmployeesFromParse("",m_EmployeeVector);
		}
		else
		{
			stack<ParseStackElement> parseStack;
			select_lex->where->traverse_cond(My_Cond_traverser,(void*)&parseStack,Item::traverse_order::POSTFIX);
			if(!parseStack.empty()&&parseStack.size()==1)
			{
				ParseStackElement topElem=parseStack.top();
				GetEmployees(topElem.employeeWhereClauseUnitVector,m_EmployeeVector);
			}
			else
			{
				GetEmployeesFromParse("",m_EmployeeVector);
			}
		}
	}
	DBUG_RETURN(0);
}

The stack of ParseStackElement struct is created and maintained during post fix traversal of where clause parse tree.

C++

struct ParseStackElement
{
	vector<EmployeeWhereClauseUnit> employeeWhereClauseUnitVector;
	Item *item;
};

The ParseStackElement contains vector of type EmployeeWhereClauseUnit:

C++

struct EmployeeWhereClauseUnit
{
	string id;
	string name;
	unsigned int salaryRangeStart;
	unsigned int salaryRangeEnd;
};

The EmployeeWhereClauseUnit is a basic unit of where clause and contains id, name, salaryRangeStart and salaryRangeEnd. For example consider following where clause:

SQL

Select * from employee where id=1 and name="sarmad" and salary between 1000000 and 10000000;

The equivalent EmployeeWhereClauseUnit is:

id=1;
pname=sarmad
salaryRangeStart=1000000
salaryRangeEnd=10000000

The rnd_init calls traverse_cond on where clause parse tree root node in order to traverse it in postfix order. The My_Cond_traverser function is specified to be called during traversal. The parse stack is passed to this function. This function checks for the following:

In operator: In this case, Item_func_in_case is called to get the set of id or name or salary specified in the “in clause” and pushed to the stack
Equality operator: In this case, Item_func_eq_case is called to get the id or name or salary specified and pushed to stack
Less or Less and equal operator: In this case, Item_func_less_case is called to get the salary range specified which starts with zero and pushed to stack
Greater or greater and equal operator: In this case, Item_func_greater_case is called to get the salary range specified which ends with 4294967295 and pushed to stack
between operator: In this case, Item_func_between_case is called to get the salary range specified and pushed to stack
And/Or operator: In this case, stack is popped until the children of item (My_Cond_traverser first argument) are coming. During pop, the vectors of type EmployeeWhereClauseUnit are Anded and Ored to get the resulting set.

C++

void My_Cond_traverser (const Item *item, void *arg)
{
	stack<ParseStackElement> *parseStack=(stack<ParseStackElement> *)arg;
	vector<EmployeeWhereClauseUnit> employeeWhereClauseUnitVector;
	if(dynamic_cast<const Item_func_in*>(item))
	{
		employeeWhereClauseUnitVector=Item_func_in_case(item);
		ParseStackElement elem;
		elem.employeeWhereClauseUnitVector=employeeWhereClauseUnitVector;
		elem.item=(Item *)item;
		parseStack->push(elem);
	}
	else if(dynamic_cast<const Item_func_eq*>(item))
	{
		employeeWhereClauseUnitVector=Item_func_eq_case(item);
		ParseStackElement elem;
		elem.employeeWhereClauseUnitVector=employeeWhereClauseUnitVector;
		elem.item=(Item *)item;
		parseStack->push(elem);
	}
	else if(dynamic_cast<const Item_func_ge*>(item)||dynamic_cast<const Item_func_gt*>(item))
	{
		employeeWhereClauseUnitVector=Item_func_greater_case(item);
		ParseStackElement elem;
		elem.employeeWhereClauseUnitVector=employeeWhereClauseUnitVector;
		elem.item=(Item *)item;
		parseStack->push(elem);
	}
	else if(dynamic_cast<const Item_func_le*>(item)||dynamic_cast<const Item_func_lt*>(item))
	{
		employeeWhereClauseUnitVector=Item_func_less_case(item);
		ParseStackElement elem;
		elem.employeeWhereClauseUnitVector=employeeWhereClauseUnitVector;
		elem.item=(Item *)item;
		parseStack->push(elem);
	}
	else if(dynamic_cast<const Item_func_between*>(item))
	{
		employeeWhereClauseUnitVector=Item_func_between_case(item);
		ParseStackElement elem;
		elem.employeeWhereClauseUnitVector=employeeWhereClauseUnitVector;
		elem.item=(Item *)item;
		parseStack->push(elem);
	}
	else if(dynamic_cast<const Item_cond*>(item))
	{
		const Item_cond *itemCondC=dynamic_cast<const Item_cond*>(item);
		Item_cond *itemCond=(Item_cond *)itemCondC;
		
		vector<EmployeeWhereClauseUnit> result;
		bool isAnd=false;
		if(dynamic_cast<const Item_cond_and*>(item))
		{
			isAnd=true;
		}
		if (!parseStack->empty()&&isChildOf((Item_cond*)item,parseStack->top().item))
		{
			result=parseStack->top().employeeWhereClauseUnitVector;
			parseStack->pop();
		}
		while (!parseStack->empty()&&isChildOf((Item_cond*)item,parseStack->top().item))    
		{
			if(result.empty()&&!parseStack->top().employeeWhereClauseUnitVector.empty())
			{
				result=parseStack->top().employeeWhereClauseUnitVector;
			}
			else if(!result.empty()&&parseStack->top().employeeWhereClauseUnitVector.empty())
			{
				result=result;
			}
			else
			{
				if(isAnd)
				{
					result=And(result,parseStack->top().employeeWhereClauseUnitVector);
				}
				else
				{
					result=Or(result,parseStack->top().employeeWhereClauseUnitVector);
				}
			}
			parseStack->pop();			
		}
		ParseStackElement elem;
		elem.employeeWhereClauseUnitVector=result;
		elem.item=(Item *)item;
		parseStack->push(elem);
	}
}

The following function extracts the id or name or salary entered by user in the equal clause. The code checks that item is of Item_func type. Then it checks that the first argument is field and second is not field. Then it checks that the field is id or name or salary.

C++

vector<EmployeeWhereClauseUnit> Item_func_eq_case(const Item *item)
{
	vector<EmployeeWhereClauseUnit> employeeWhereClauseUnitVector;
	const Item_func * itemFunction=dynamic_cast<const Item_func*>(item);
	Item **arguments=0;
	if(itemFunction)
	{
		arguments=itemFunction->arguments();
	}
	else
	{
		return employeeWhereClauseUnitVector;
	}
	Item *field=0;
	Item *value=0;
	if(dynamic_cast <Item_field*>(arguments[0]))
	{
		field = arguments[0];
	}
	else
	{
		return employeeWhereClauseUnitVector;
	}
	if(dynamic_cast <Item_field*>(arguments[1]))
	{
		return employeeWhereClauseUnitVector;
	}
	else
	{
		value = arguments[1];
	}
	Item_field *f;

	if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"id")==0)
	{
		EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
		myEmployeeWhereClauseUnit.id=value->item_name.ptr();
		myEmployeeWhereClauseUnit.name="";
		myEmployeeWhereClauseUnit.salaryRangeStart=MYSQL_UNSIGNED_INT_MIN_VALUE;
		myEmployeeWhereClauseUnit.salaryRangeEnd=MYSQL_UNSIGNED_INT_MAX_VALUE;
		employeeWhereClauseUnitVector.push_back(myEmployeeWhereClauseUnit);
	}
	else if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"name")==0)
	{
		EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
		myEmployeeWhereClauseUnit.id="";
		myEmployeeWhereClauseUnit.name=value->item_name.ptr();
		myEmployeeWhereClauseUnit.salaryRangeStart=MYSQL_UNSIGNED_INT_MIN_VALUE;
		myEmployeeWhereClauseUnit.salaryRangeEnd=MYSQL_UNSIGNED_INT_MAX_VALUE;
		employeeWhereClauseUnitVector.push_back(myEmployeeWhereClauseUnit);
	}
	else if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"salary")==0)
	{
		EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
		myEmployeeWhereClauseUnit.id="";
		myEmployeeWhereClauseUnit.name="";
		myEmployeeWhereClauseUnit.salaryRangeStart=value->val_int();
		myEmployeeWhereClauseUnit.salaryRangeEnd=value->val_int();
		employeeWhereClauseUnitVector.push_back(myEmployeeWhereClauseUnit);
	}

	return employeeWhereClauseUnitVector;
}

The following function extracts the set of ids or names or salaries specified in “in clause”. First check that item is of Item_func type. Then check that first argument is field and the following arguments are not field type.

C++

vector<EmployeeWhereClauseUnit> Item_func_in_case(const Item *item)
{
	vector<EmployeeWhereClauseUnit> employeeWhereClauseUnitVector;
	const Item_func * itemFunction=dynamic_cast<const Item_func*>(item);
	Item **arguments=0;
	int inArgcount=0;
	if(itemFunction)
	{
		arguments=itemFunction->arguments();
		inArgcount=itemFunction->arg_count;
	}
	else
	{
		return employeeWhereClauseUnitVector;
	}
	Item *field=0;
	Item *value=0;
	if(dynamic_cast <Item_field*>(arguments[0]))
	{
		field = arguments[0];
	}
	else
	{
		return employeeWhereClauseUnitVector;
	}
	if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"id")==0)
	{
		LONG index;
		for (index = 1; index < inArgcount; index++)
		{
			if(dynamic_cast <Item_field*>(arguments[index]))
			{
				continue;
			}
			else
			{
				value = arguments[index];
			}
			EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
			myEmployeeWhereClauseUnit.id=value->item_name.ptr();
			myEmployeeWhereClauseUnit.name="";
			myEmployeeWhereClauseUnit.salaryRangeStart=MYSQL_UNSIGNED_INT_MIN_VALUE;
			myEmployeeWhereClauseUnit.salaryRangeEnd=MYSQL_UNSIGNED_INT_MAX_VALUE;
			employeeWhereClauseUnitVector.push_back(myEmployeeWhereClauseUnit);
		}
	}
	else if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"name")==0)
	{
		LONG index;
		for (index = 1; index < inArgcount; index++)
		{
			if(dynamic_cast <Item_field*>(arguments[index]))
			{
				continue;
			}
			else
			{
				value = arguments[index];
			}
			EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
			myEmployeeWhereClauseUnit.id="";
			myEmployeeWhereClauseUnit.name=value->item_name.ptr();
			myEmployeeWhereClauseUnit.salaryRangeStart=MYSQL_UNSIGNED_INT_MIN_VALUE;
			myEmployeeWhereClauseUnit.salaryRangeEnd=MYSQL_UNSIGNED_INT_MAX_VALUE;
			employeeWhereClauseUnitVector.push_back(myEmployeeWhereClauseUnit);
		}
	}
	else if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"salary")==0)
	{
		LONG index;
		for (index = 1; index < inArgcount; index++)
		{
			if(dynamic_cast <Item_field*>(arguments[index]))
			{
				continue;
			}
			else
			{
				value = arguments[index];
			}
			EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
			myEmployeeWhereClauseUnit.id="";
			myEmployeeWhereClauseUnit.name="";
			myEmployeeWhereClauseUnit.salaryRangeStart=value->val_int();
			myEmployeeWhereClauseUnit.salaryRangeEnd=value->val_int();
			employeeWhereClauseUnitVector.push_back(myEmployeeWhereClauseUnit);
		}
	}
	return employeeWhereClauseUnitVector;
}

The following function extracts the salary specified in “less or less than equal clause”. First check that item is of Item_func type. Then check that first argument is field and the following arguments are not field type. The less than clause is converted to where clause unit where salaryRangeStart is zero and salaryRangeEnd is equal to what is specified in less clause.

C++

vector<EmployeeWhereClauseUnit> Item_func_less_case(const Item *item)
{
	vector<EmployeeWhereClauseUnit> employeeWhereClauseUnitVector;
	const Item_func * itemFunction=dynamic_cast<const Item_func*>(item);
	Item **arguments=0;
	if(itemFunction)
	{
		arguments=itemFunction->arguments();
	}
	else
	{
		return employeeWhereClauseUnitVector;
	}
	Item *field=0;
	Item *value=0;
	if(dynamic_cast <Item_field*>(arguments[0]))
	{
		field = arguments[0];
	}
	else
	{
		return employeeWhereClauseUnitVector;
	}
	if(dynamic_cast <Item_field*>(arguments[1]))
	{
		return employeeWhereClauseUnitVector;
	}
	else
	{
		value = arguments[1];
	}
	Item_field *f;
	if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"salary")==0)
	{
		EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
		myEmployeeWhereClauseUnit.id="";
		myEmployeeWhereClauseUnit.name="";
		myEmployeeWhereClauseUnit.salaryRangeStart=MYSQL_UNSIGNED_INT_MIN_VALUE;
		myEmployeeWhereClauseUnit.salaryRangeEnd=value->val_int();
		employeeWhereClauseUnitVector.push_back(myEmployeeWhereClauseUnit);
	}

	return employeeWhereClauseUnitVector;
}

The following function extracts the salary specified in “greater or greater than equal clause”. First check that item is of Item_func type. Then check that first argument is field and the following arguments are not field type. The greater than clause is converted to where clause unit where salaryRangeStart is what is specified in greater clause and salaryRangeEnd is equal to max value of unsigned int.

C++

vector<EmployeeWhereClauseUnit> Item_func_greater_case(const Item *item)
{
	vector<EmployeeWhereClauseUnit> employeeWhereClauseUnitVector;
	const Item_func * itemFunction=dynamic_cast<const Item_func*>(item);
	Item **arguments=0;
	if(itemFunction)
	{
		arguments=itemFunction->arguments();
	}
	else
	{
		return employeeWhereClauseUnitVector;
	}
	Item *field=0;
	Item *value=0;
	if(dynamic_cast <Item_field*>(arguments[0]))
	{
		field = arguments[0];
	}
	else
	{
		return employeeWhereClauseUnitVector;
	}
	if(dynamic_cast <Item_field*>(arguments[1]))
	{
		return employeeWhereClauseUnitVector;
	}
	else
	{
		value = arguments[1];
	}
	Item_field *f;
	if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"salary")==0)
	{
		EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
		myEmployeeWhereClauseUnit.id="";
		myEmployeeWhereClauseUnit.name="";
		myEmployeeWhereClauseUnit.salaryRangeStart=value->val_int();
		myEmployeeWhereClauseUnit.salaryRangeEnd=MYSQL_UNSIGNED_INT_MAX_VALUE;
		employeeWhereClauseUnitVector.push_back(myEmployeeWhereClauseUnit);
	}

	return employeeWhereClauseUnitVector;
}

The following function extracts the salary specified in “between clause”. First check that item is of Item_func type. Then check that first argument is field and the following arguments are not field type. The between clause is converted to where clause unit where salaryRangeStart and salaryRangeEnd is what is specified in between clause.

C++

vector<EmployeeWhereClauseUnit> Item_func_between_case(const Item *item)
{
	vector<EmployeeWhereClauseUnit> employeeWhereClauseUnitVector;
	const Item_func * itemFunction=dynamic_cast<const Item_func*>(item);
	Item **arguments=0;
	if(itemFunction)
	{
		arguments=itemFunction->arguments();
	}
	else
	{
		return employeeWhereClauseUnitVector;
	}
	Item *field=0;
	Item *value1=0;
	Item *value2=0;
	if(dynamic_cast <Item_field*>(arguments[0]))
	{
		field = arguments[0];
	}
	else
	{
		return employeeWhereClauseUnitVector;
	}
	if(dynamic_cast <Item_field*>(arguments[1]))
	{
		return employeeWhereClauseUnitVector;
	}
	else
	{
		value1 = arguments[1];
	}
	if(dynamic_cast <Item_field*>(arguments[2]))
	{
		return employeeWhereClauseUnitVector;
	}
	else
	{
		value2 = arguments[2];
	}
	Item_field *f;
	if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"salary")==0)
	{
		EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
		myEmployeeWhereClauseUnit.id="";
		myEmployeeWhereClauseUnit.name="";
		myEmployeeWhereClauseUnit.salaryRangeStart=value1->val_int();
		myEmployeeWhereClauseUnit.salaryRangeEnd=value2->val_int();;
		employeeWhereClauseUnitVector.push_back(myEmployeeWhereClauseUnit);
	}

	return employeeWhereClauseUnitVector;
}

The following is a helper function used by My_Cond_traverser to check the parent child relationship

C++

bool isChildOf(Item_cond *parent,Item *child)
{
	List_iterator<Item> li(*(parent->argument_list()));
	Item *it= NULL;
	while ((it= li++))    
	{
		if(child==it)
			return true;
	}
	return false;
}

The following function Ands the vector A of EmployeeWhereClauseUnit and vector B of EmployeeWhereClauseUnit. In this function, both vectors are traversed in a nested loop. If one struct of type EmployeeWhereClauseUnit from A matches other struct of same type from B, then this EmployeeWhereClauseUnit is added to the resultant vector. The match is found by checking id and name and salary from struct (EmployeeWhereClauseUnit) belonging to A and B. If both id are not empty then they should match otherwise skip them. Similarly if both name are not empty then they should match otherwise skip them. If id in one is empty and other is not empty then common id is equal to the non empty one. Similarly if name in one is empty and other is not empty then common name is equal to the non empty one. Similarly if there is overlap between salary range in EmployeeWhereClauseUnit from A and B, then the common value is inserted into the result vector.

C++

vector<EmployeeWhereClauseUnit> And(vector<EmployeeWhereClauseUnit> A,vector<EmployeeWhereClauseUnit> B)
{
	vector<EmployeeWhereClauseUnit> result;

	for(vector<EmployeeWhereClauseUnit>::iterator iter1=A.begin();iter1!=A.end();iter1++)
	{
		for(vector<EmployeeWhereClauseUnit>::iterator iter2=B.begin();iter2!=B.end();iter2++)
		{
			string id="";
			if(iter1->id!=""&&iter2->id!="")
			{
				if(iter1->id==iter2->id)
				{
					id=iter1->id;
				}
				else
				{
					continue;
				}
			}
			else if(iter1->id!=""&&iter2->id=="")
			{
				id=iter1->id;
			}
			else if(iter1->id==""&&iter2->id!="")
			{
				id=iter2->id;
			}
			else if(iter1->id==""&&iter2->id=="")
			{
			}

			string name="";
			if(iter1->name!=""&&iter2->name!="")
			{
				if(iter1->name==iter2->name)
				{
					name=iter1->name;
				}
				else
				{
					continue;
				}
			}
			else if(iter1->name!=""&&iter2->name=="")
			{
				name=iter1->name;
			}
			else if(iter1->name==""&&iter2->name!="")
			{
				name=iter2->name;
			}
			else if(iter1->name==""&&iter2->name=="")
			{
			}
			unsigned int salaryRangeStart;
			unsigned int salaryRangeEnd;
			bool common=FindAndBetweenTwoRanges(iter1->salaryRangeStart,iter1->salaryRangeEnd,iter2->salaryRangeStart,iter2->salaryRangeEnd,salaryRangeStart,salaryRangeEnd);
			if(common==false)
			{
				continue;
			}
			EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
			myEmployeeWhereClauseUnit.id=id;
			myEmployeeWhereClauseUnit.name=name;
			myEmployeeWhereClauseUnit.salaryRangeStart=salaryRangeStart;
			myEmployeeWhereClauseUnit.salaryRangeEnd=salaryRangeEnd;
			result.push_back(myEmployeeWhereClauseUnit);
		}
	}
	return result;
}

The following function finds and between the two salary ranges. If range 1 start is less than range 2 start and range 2 start is less than range 1 end then there is an overlap. The resultant overlap is from range 2 start and to range 1 end or rang 2 end depending upon which is lesser and vice versa.

C++

bool FindAndBetweenTwoRanges(unsigned int range1Start,unsigned int range1End,unsigned int range2Start,unsigned int range2End,unsigned int &resRangeStart,unsigned int &resRangeEnd)
{
	bool success=false;
	if(range1Start<range2Start)
	{
		if(range2Start<=range1End)
		{
			resRangeStart=range2Start;
			if(range1End<range2End)
			{
				resRangeEnd=range1End;
			}
			else
			{
				resRangeEnd=range2End;
			}
			success=true;
		}
	}
	else
	{
		if(range1Start<=range2End)
		{
			resRangeStart=range1Start;
			if(range2End<range1End)
			{
				resRangeEnd=range2End;
			}
			else
			{
				resRangeEnd=range1End;
			}
			success=true;
		}
	}
	return success;
}

The following function Ors the vector A of EmployeeWhereClauseUnit and vector B of EmployeeWhereClauseUnit. In this function, both vectors are traversed in a nested loop. If one struct of type EmployeeWhereClauseUnit from A has common with other struct of same type from B, then the common value is added to the resultant vector. The common is found by matching id name and salary from struct (EmployeeWhereClauseUnit) belonging to A and B. If both id are not empty then they should match otherwise skip them. Similarly if both name are not empty then they should match otherwise skip them. If id in one is empty and other is not empty then common id is empty. Similarly if name in one is empty and other is not empty then common name is empty. Similarly, if there is an overlap between salary ranges then the common value is inserted into the result vector and erased from vector A and B. After nested loop completes, vector A is traversed and all elements are added to result vector. Similarly vector B is traversed and all elements are added to result vector.

C++

vector<EmployeeWhereClauseUnit> Or(vector<EmployeeWhereClauseUnit> A,vector<EmployeeWhereClauseUnit> B)
{
	vector<EmployeeWhereClauseUnit> result;
	
	for(vector<EmployeeWhereClauseUnit>::iterator iter1=A.begin();A.size()>0&&iter1!=A.end();iter1++)
	{
		for(vector<EmployeeWhereClauseUnit>::iterator iter2=B.begin();B.size()>0&&iter2!=B.end();iter2++)
		{
			bool commonFound=false;
			string id="";
			if(iter1->id!=""&&iter2->id!="")
			{
				if(iter1->id==iter2->id)
				{
					id=iter1->id;
					commonFound=true;
				}
				else
				{
					continue;
				}
			}
			string name="";
			if(iter1->name!=""&&iter2->name!="")
			{
				if(iter1->name==iter2->name)
				{
					name=iter1->name;
					commonFound=true;
				}
				else
				{
					continue;
				}
			}
			unsigned int salartRangeStart;
			unsigned int salaryRangeEnd;
			if(FindOrBetweenTwoRanges(iter1->salaryRangeStart,iter1->salaryRangeEnd,iter2->salaryRangeStart,iter2->salaryRangeEnd,salartRangeStart,salaryRangeEnd))
			{
				commonFound=true;
			}
			if(commonFound==true)
			{
				EmployeeWhereClauseUnit myEmployeeWhereClauseUnit;
				myEmployeeWhereClauseUnit.id=id;
				myEmployeeWhereClauseUnit.name=name;
				myEmployeeWhereClauseUnit.salaryRangeStart=salartRangeStart;
				myEmployeeWhereClauseUnit.salaryRangeEnd=salaryRangeEnd;
				result.push_back(myEmployeeWhereClauseUnit);
				A.erase(iter1);
				B.erase(iter2);
				iter1=A.begin();
				iter2=B.begin();
			}
		}
	}
	for(vector<EmployeeWhereClauseUnit>::iterator iter1=A.begin();iter1!=A.end();iter1++)
	{
		result.push_back(*iter1);
	}
	for(vector<EmployeeWhereClauseUnit>::iterator iter2=B.begin();iter2!=B.end();iter2++)
	{
		result.push_back(*iter2);
	}
	return result;
}

The following function finds or between the two salary ranges. If range 1 start is less than range 2 start and range 2 start is less than range 1 end then there is an overlap. The resultant overlap is from range 1 start and to range 1 end or rang 2 end depending upon which is greater and vice versa.

C++

bool FindOrBetweenTwoRanges(unsigned int range1Start,unsigned int range1End,unsigned int range2Start,unsigned int range2End, unsigned int &resRangeStart,unsigned int &resRangeEnd)
{
	bool success=false;
	if(range1Start<range2Start)
	{
		if(range2Start<=range1End)
		{
			resRangeStart=range1Start;
			if(range1End>=range2End)
			{
				resRangeEnd=range1End;
			}
			else
			{
				resRangeEnd=range2End;
			}
			success=true;
		}
	}
	else
	{
		if(range1Start<=range2End)
		{
			resRangeStart=range2Start;
			if(range2End>=range1End)
			{
				resRangeEnd=range2End;
			}
			else
			{
				resRangeEnd=range1End;
			}
			success=true;
		}
	}
	return success;
}

When traverse_cond function completes, a check is made that parse stack contains only one element of type ParseStackElement. The employeeWhereClauseUnitVector vector present in ParseStackElement is passed to GetEmployees function which iterates through that vector and prepares parse where clause and finally calls GetEmployeesFromParse.

C++

void GetEmployees(vector<EmployeeWhereClauseUnit> employeeWhereClauseUnitVector,	vector<Employee> &employees)
{
	vector<EmployeeWhereClauseUnit>::iterator iter=employeeWhereClauseUnitVector.begin();
	string parseWhereClause="?where={\"$or\":[";
	char num[100];
	int i=0;
	for(;iter!=employeeWhereClauseUnitVector.end();iter++,i++)
	{
		parseWhereClause=parseWhereClause+"{";
		if(iter->id!="")
		{
			parseWhereClause=parseWhereClause+"\"employeeid\":\""+iter->id+"\",";
		}
		if(iter->name!="")
		{
			parseWhereClause=parseWhereClause+"\"name\":\""+iter->name+"\",";
		}

		_ultoa(iter->salaryRangeStart,num,10);

		parseWhereClause=parseWhereClause+"\"salary\":{\"$gte\":"+num;
		_ultoa(iter->salaryRangeEnd,num,10);
		parseWhereClause=parseWhereClause+",\"$lte\":"+num+"}}";
		if(i<employeeWhereClauseUnitVector.size()-1)
		{
			parseWhereClause=parseWhereClause+",";
		}
	}
	parseWhereClause=parseWhereClause+"]}";
	GetEmployeesFromParse(parseWhereClause,employees);
}

The GetEmployeesFromParse function forms a rest api call to retrieve data from Employee class on parse.com. The resultant json is traversed to retrieve individual employee records and pushed to myEmployees.

C++

void GetEmployeesFromParse(string parseWhereClause,vector<Employee> &myEmployees)
{

	string url=commonUrl+"Employee";
	url=url+parseWhereClause;
	string employees=CurlClientResponse(url);
	int index=0;
	char num[100];
	itoa(index,num,10);
	string count=num;
	string employeeId=GetJsonPathString(employees,"$.results["+count+"].employeeid");
	index++;
	while(employeeId!="")
	{
		Employee e;
		e.id=employeeId;
		e.name=GetJsonPathString(employees,"$.results["+count+"].name");
		e.salary=GetJsonPathNum(employees,"$.results["+count+"].salary");
		myEmployees.push_back(e);
		itoa(index,num,10);
		count=num;
		employeeId=GetJsonPathString(employees,"$.results["+count+"].employeeid");
		index++;
	}

}

The following function uses curl library to perform a rest api request to retrieve objects from Parse.com. Please provide your X-Parse-Application-Id and X-Parse-REST-API-Key.

C++

string CurlClientResponse(string url)
{
	string response="";
	CURL *curl;
	CURLcode res; 
	curl_global_init(CURL_GLOBAL_DEFAULT);
	curl = curl_easy_init();
	if(curl) 
	{
		struct curl_slist *headers = NULL;
		headers=curl_slist_append(headers, "X-Parse-Application-Id:Be2UmXVLHRcHFHd7Tkjy2ih2rB756DD0NWb4ZDX8");  
		headers=curl_slist_append( headers, "X-Parse-REST-API-Key:NyNHbYuWyyYxKBdfC9bMcfnxDd1ToKXgPegM5SKD");
		//curl_easy_setopt(curl, CURLOPT_HTTPGET,1);
		curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
		curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
		curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
		curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
		curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback_func);
		char *responsePtr = NULL;
		curl_easy_setopt(curl, CURLOPT_WRITEDATA, &responsePtr);
		/* Perform the request, res will get the return code */ 
		res = curl_easy_perform(curl);

		/* Check for errors */ 
		if(res != CURLE_OK)
			fprintf(stderr, "curl_easy_perform() failed: %s\n",
				curl_easy_strerror(res));
	
		response=responsePtr;
		free(responsePtr);
		/* always cleanup */ 
		curl_easy_cleanup(curl);
	}
  curl_global_cleanup();
  return response;
}

The rnd_next is called by MySQL to feed the next row. If m_EmployeeVectorIndex is less than size of m_EmployeeVector which means there are rows to feed then call field -> store on id, name and salary. After that increment m_EmployeeVectorIndex. If there are no more employees to feed then return HA_ERR_END_OF_FILE otherwise 0

C++

int ha_example::rnd_next(uchar *buf)
{
	int rc;
	DBUG_ENTER("ha_example::rnd_next");
	MYSQL_READ_ROW_START(table_share->db.str, table_share->table_name.str,
					TRUE);
	if(m_EmployeeVectorIndex<m_EmployeeVector.size())
	{
		Field **field=table->field;	  
		bitmap_set_bit(table->write_set, (*field)->field_index);
		(*field)->set_notnull();
		(*field)->store(m_EmployeeVector[m_EmployeeVectorIndex].id.c_str(),strlen(m_EmployeeVector[m_EmployeeVectorIndex].id.c_str()), system_charset_info);
		field++;

		bitmap_set_bit(table->write_set, (*field)->field_index);
		(*field)->set_notnull();
		(*field)->store(m_EmployeeVector[m_EmployeeVectorIndex].name.c_str(),strlen(m_EmployeeVector[m_EmployeeVectorIndex].name.c_str()), system_charset_info);
		field++;

		bitmap_set_bit(table->write_set, (*field)->field_index);
		(*field)->set_notnull();
		(*field)->store(m_EmployeeVector[m_EmployeeVectorIndex].salary,true);
		field++;

		m_EmployeeVectorIndex++;
		rc=0;
	}
	else
	{
		rc= HA_ERR_END_OF_FILE;
	}
	MYSQL_READ_ROW_DONE(rc);
	DBUG_RETURN(rc);
}

How to Run

Create an account on parse.com

Click create new app

Give app name

Note the Application ID and REST API Key.

Go to Data Browser

Click new class

Give class name Employee and click create class

Add column employeeid (string)

Add column name (string)

Add column salary (number)

Add rows to class

Download MySQL 5.6.19 from the following link:
- http://dev.mysql.com/downloads/mysql/
Make sure that you download the 32 bit version (mysql-5.6.19-win32.zip).
Copy ParseDotComEngine.dll from ParseDotComEngineBinaries.zip
Go to the directory (mysql-5.6.19-win32) where you extracted MYSQL.
Find subdirectory lib and open it.
Find and open the plugin directory inside.
Paste ParseDotComEngine.dll.
Copy curllib.dll, libeay32.dll, libsasl.dll, openldap.dll and ssleay32.dll from ParseDotComEngineBinaries.zip
Find subdirectory bin and open it.
Paste copied dlls
Launch mysqld.exe (run as administrator)
Launch a command prompt (run as administrator)
Change directory to mysql-5.6.19-win32/bin
Run the following command:
mysql –uroot
This will launch the MYSQL console
Run the following command.
Install plugin ParseDotComEngine soname 'ParseDotComEngine.dll';
This will install the ParseDotComEngine.
Create a database name test.
Now create a table named Employee and specify ParseDotComEngine as the storage engine.

SQL

create table Employee
(id VARCHAR(255),
name VARCHAR(255),
salary INT unsigned
)engine=ParseDotComEngine;

Before continuing if you are behind a firewall then allow mysqld.exe through it.

Enter the following query:

select * from Employee;

Now enter any query of your choice and see the results.

Next we demonstrate how you can do complex analysis about your employees using Tableau.

In the above gui click Connnect to data.

Select MySQL from “On a server” list

Type server name, user name and password in MySQL Connection dialog and click Connect

Select test database on the server

Select Employee table

Select Connect live

Drag id to Columns.

Drag name to rows

Now experiment with different ways to analyze data.

Where to Get the Source Code

Download MYSQL source code from the following link:
- http://dev.mysql.com/downloads/mysql/
Extract the source code to C:/mysql-5.6.19

How to Build the Source Code

Please follow the instruction given at:
- http://dev.mysql.com/doc/refman/5.6/en/installing-source-distribution.html
I build the source code using Visual Studio 2012 with the following instruction:
- cmake . -G "Visual Studio 11"
Download the attachment named ParseDotComEngineSourceCode.zip with this article.
It contains the source code of ParseDotComEngine in folder example.
Copy the example folder to mysql-5.6.19\storage

Conclusion

We have found a way to harness the power of mysql parser to do analysis on data from parse.com which was previously never done. This opens up the horizon for new kinds of analysis This approach can be generalized. I will build on this approach in the coming parts of this article series.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)