Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / desktop / Win32

Building on My SQL Storage Engines and APIs to Conduct Complex Analysis Using SQL Query Language: Part 2

4.60/5 (3 votes)
16 Jul 2014CPOL4 min read 18.5K   259  
Building on My SQL Storage Engines and APIs to Conduct Complex Analysis Using SQL Query Language: Part 2

Introduction

In Part 1 of this article series, I presented the idea of mysql storage engines which retrieve data from APIs and feed it to MYSQL. Specifically, I needed an up to date list of running processes in the system on mysql console. This article builds on that first part by providing a solution to this problem without needing any change to mysql parser.

ProcessInfoEngine

This mysql storage engine is different from normal storage engines like MyISAM and InnoDB in a way that it retrieves data from Windows Process related APIs and feeds to mysql. A table runningprocesslist with engine ProcessInfoEngine is created.

SQL
CREATE TABLE `runningprocesslist` (
  `PROCESSNAME` varchar(2048) DEFAULT NULL,
  `ProcessID` int(11) DEFAULT NULL,
  `Threadcount` int(11) DEFAULT NULL,
  `ParentprocessID` int(11) DEFAULT NULL,
  `Priorityclass` int(11) DEFAULT NULL
) ENGINE= ProcessInfoEngine

When user enters a select query on runningprocesslist, the control passes to ProcessInfoEngine in rnd_init method. This method checks that table name is really “RunningProcessList”. Then it checks that where clause parse tree is present or not. If yes, then where clause parse tree is traversed to be converted to OpenProcess calls. Otherwise processes snapshot is created and all process detail are fed to MySQL.

C++
int ha_example::rnd_init(bool scan)
{
	DBUG_ENTER("ha_example::rnd_init");
	
	if(_stricmp(table_share->table_name.str,"RunningProcessList")==0)
	{
		THD *thd=this->ha_thd();
		SELECT_LEX *select_lex=&thd->lex->select_lex;
		m_processListIndex=0;

		if(select_lex->where==0)
		{
			GetAllProcessList(m_processList);
		}
		else
		{
			stack<parsestackelement> parseStack;
			select_lex->where->traverse_cond(My_Cond_traverser,(void*)&parseStack,Item::traverse_order::POSTFIX);
			if(!parseStack.empty()&&parseStack.size()==1)
			{
				GetProcessListSpecifiedInWhereClause(m_processList,parseStack.top().setOfProcessIds);
			}
			else
			{
				GetAllProcessList(m_processList);
			}
		}
	}
	DBUG_RETURN(0);
}

The stack of ParseStackElement struct is created and maintained during post fix traversal of where clause parse tree.

C++
struct ParseStackElement
{
	set<int> setOfProcessIds;
	Item *item;
};

The rnd_init calls traverse_cond on where clause parse tree root node in order to traverse it in postfix order. The My_Cond_traverser function is specified to be called during traversal. The parse stack is passed to this function. This function checks for the following:

 

  • In operator: In this case, Item_func_in_case is called to get the set of process ids specified in the “in clause” and pushed to the stack
  • Equality operator: In this case, Item_func_eq_case is called to get the process id specified and pushed to stack
  • And/Or operator: In this case, stack is popped until the children of item (My_Cond_traverser first argument) are coming. During pop, the process ids sets are Anded and Ored to get the resulting set.
C++
void My_Cond_traverser (const Item *item, void *arg)
{
	stack<ParseStackElement> *parseStack=(stack<ParseStackElement> *)arg;
	set<int> setOfProcessIds;
	if(dynamic_cast<const Item_func_in*>(item))
	{
		setOfProcessIds=Item_func_in_case(item);
		ParseStackElement elem;
		elem.setOfProcessIds=setOfProcessIds;
		elem.item=(Item *)item;
		parseStack->push(elem);
	}
	else if(dynamic_cast<const Item_func_eq*>(item))
	{
		setOfProcessIds=Item_func_eq_case(item);
		ParseStackElement elem;
		elem.setOfProcessIds=setOfProcessIds;
		elem.item=(Item *)item;
		parseStack->push(elem);
	}
	else if(dynamic_cast<const Item_cond*>(item))
	{
		const Item_cond *itemCondC=dynamic_cast<const Item_cond*>(item);
		Item_cond *itemCond=(Item_cond *)itemCondC;
		
		set<int> result;
		bool isAnd=false;
		if(dynamic_cast<const Item_cond_and*>(item))
		{
			isAnd=true;
		}
		if (!parseStack->empty()&&isChildOf((Item_cond*)item,parseStack->top().item))
		{
			result=parseStack->top().setOfProcessIds;
			parseStack->pop();
		}
		while (!parseStack->empty()&&isChildOf((Item_cond*)item,parseStack->top().item))    
		{
			if(isAnd)
			{
				result=And(result,parseStack->top().setOfProcessIds);
			}
			else
			{
				result=Or(result,parseStack->top().setOfProcessIds);
			}
			parseStack->pop();			
		}
		ParseStackElement elem;
		elem.setOfProcessIds=result;
		elem.item=(Item *)item;
		parseStack->push(elem);
	}
}

The following function extracts the process id entered by user in the equal clause. The code checks that item is of Item_func type. Then it checks that the first argument is field and second is not field. Then it checks that the field is ProcessID.

C++
set<int> Item_func_eq_case(const Item *item)
{
	set<int> setOfProcessIds;
	const Item_func * itemFunction=dynamic_cast<const Item_func*>(item);
	Item **arguments=0;
	if(itemFunction)
	{
		arguments=itemFunction->arguments();
	}
	else
	{
		return setOfProcessIds;
	}
	Item *field=0;
	Item *value=0;
	if(dynamic_cast <Item_field*>(arguments[0]))
	{
		field = arguments[0];
	}
	else
	{
		return setOfProcessIds;
	}
	if(dynamic_cast <Item_field*>(arguments[1]))
	{
		return setOfProcessIds;
	}
	else
	{
		value = arguments[1];
	}
	
	if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"ProcessID")==0)
	{
		setOfProcessIds.insert(value->val_int());
	}
	return setOfProcessIds;
}

The following function extracts the set of process ids specified in “in clause”. First check that item is of Item_func type. Then check that first argument is field and ProcessID and the following arguments are not field type.

C++
set<int> Item_func_in_case(const Item *item)
{
	set<int> setOfProcessIds;
	const Item_func * itemFunction=dynamic_cast<const Item_func*>(item);
	Item **arguments=0;
	int inArgcount=0;
	if(itemFunction)
	{
		arguments=itemFunction->arguments();
		inArgcount=itemFunction->arg_count;
	}
	else
	{
		return setOfProcessIds;
	}
	Item *field=0;
	Item *value=0;
	if(dynamic_cast <Item_field*>(arguments[0]))
	{
		field = arguments[0];
	}
	else
	{
		return setOfProcessIds;
	}
	if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"ProcessID")==0)
	{
		LONG index;
		for (index = 1; index < inArgcount; index++)
		{
			if(dynamic_cast <Item_field*>(arguments[index]))
			{
				continue;
			}
			else
			{
				value = arguments[index];
			}
			setOfProcessIds.insert(value->val_int());
		}
	}
	return setOfProcessIds;
}

The following is a helper function used by My_Cond_traverser to check the parent child relationship.

C++
bool isChildOf(Item_cond *parent,Item *child)
{
	List_iterator<Item> li(*(parent->argument_list()));
	Item *it= NULL;
	while ((it= li++))    
	{
		if(child==it)
			return true;
	}
	return false;
}

The following function Ands the set A and B of process ids.

C++
set<int> And(set<int> setAOfProcessIds,set<int> setBOfProcessIds)
{
	set<int> setOfProcessIds;
	set_intersection(setAOfProcessIds.begin(),setAOfProcessIds.end(),
	setBOfProcessIds.begin(),setBOfProcessIds.end(),inserter(setOfProcessIds,setOfProcessIds.end()));
	return setOfProcessIds;
}

The following function Ors the set A and B of process ids.

C++
set<int> Or(set<int> setAOfProcessIds,set<int> setBOfProcessIds)
{
	set<int> setOfProcessIds;
	set_union(setAOfProcessIds.begin(),setAOfProcessIds.end(),setBOfProcessIds.begin(),
	setBOfProcessIds.end(),inserter(setOfProcessIds,setOfProcessIds.end()));
	return setOfProcessIds;
}

Consider an example of the following query:

SQL
select * from  runningprocesslist where ProcessID=0x1C8C or ProcessID in (0xD6C,0x1D3C);

Following is the parse tree of the above query:

Image 1

This parse tree is converted to the following set of process ids. {0x1C8C,0xD6C,0x1D3C}

How to Run

  • Install MySQL 5.6.16 from the following link:
  • Make sure that during installation, you select the 32 bit version.
  • Copy ha_example.dll from ProcessInfoEngineBinaries.zip
  • Go to the directory where you installed MYSQL.
  • Find subdirectory lib and open it.
  • Find and open the plugin directory inside.
  • Paste ha_example.dll.
  • Go to MySQL console and run the following command.
  • Install plugin ProcessInfoEngine soname 'ha_example.dll';
  • This will install the ProcessInfoEngine.
  • Create a database name test.
  • Now create a table named runningprocesslist and specify ProcessInfoEngine as the storage engine.
    SQL
    CREATE TABLE `runningprocesslist` (
      `PROCESSNAME` varchar(2048) DEFAULT NULL,
      `ProcessID` int(11) DEFAULT NULL,
      `Threadcount` int(11) DEFAULT NULL,
      `ParentprocessID` int(11) DEFAULT NULL,
      `Priorityclass` int(11) DEFAULT NULL
    ) ENGINE= ProcessInfoEngine
  • Enter the following query:
    SQL
    select * from runningprocesslist; 
  • This will show all the processes running in the system.
  • Now enter any query of your choice and see the results.

Where to Get the Source Code

How to Build the Source Code

  • Please follow the instruction given at:
  • I build the source code using Visual Studio 2012 with the following instruction:
    • cmake . -G "Visual Studio 11"
  • Download the attachment named ProcessInfoEngineSourceCode.zip with this article.
  • It contains the source code of ProcessInfoEngine in folder example.
  • Copy the example folder to mysql-5.6.16\storage

Conclusion

You might be thinking why parse the where clause, why not just feed all the processes to MySQL, but when data is very big then this approach will benefit. The next part will expose the Facebook data on MySQL console. Specifically, you will be able to do complex analysis about your friends.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)