Introduction
MySQL provides support for many storage engines which manage storage and retrieval. MyISAM and InnoDB are common. On table creation engine can be specified. A basic storage engine allows read only tables. Most engines read data from files and feed to MYSQL. On the other hand, our idea is to retrieve data from API’s and feed it to MYSQL. This will allow us to conduct complex analysis using sql on data retrieved from API’s. The idea is to write an engine for each set of related API’s but this does not mean that we have to write each engine from scratch but we are proposing a generic component which can be used by all storage engine and it will simplify the task of writing engine very much.
Background
The idea came into my mind when I was thinking about doing analysis on running processes in the system. I needed up to date list of running processes in the system on mysql console. Also I needed to conduct complex analysis like count of modules in the process. In order to provide simplistic view of data, we expose tables and write an engine which calls the API’s to retrieve data. Here API’s were already available. A conventional approach could be to create a table and insert into it row for new process and delete row for terminated process.
- One drawback of this approach is that you have to insert and delete rows into the table which are time consuming.
- Another drawback is that you need to trigger that logging application which inserts and delete rows from the table before querying the table
- Another drawback is that even if you need the information about a single process, all processes information is maintained in table
- Another drawback is that you can get conflict in process id’s in case one process was terminated but a new process with the same id came for insertion before the deletion of the terminated process.
- Another drawback is the table may be locked during insertion or deletion which can create performance bottlenecks for the retrieval entity.
Instead the proposed approach avoids these problems as follows:
- No need to insert and delete rows from the table
- No need to trigger the logging application before querying the table
- No need to get info about all processes when single process info is required
- No inconsistency in process id’s as it represents a state of the system
- No bottlenecks, as there are no insertion and deletion.
Why is this topic important to our audience and how will they benefit?
A specific problem which the proposed approach solves better than the conventional techniques is that we have a time series data in tables. When a table size reaches a certain limit a new table is created. The requirement was to do analysis on that data using Tableau Business Intelligence tool. The tableau generates sql queries. For convenience of the user a single table was required instead of the underlying many tables. Programmatically queries can be generated to retrieve data for certain duration and display in web pages but tableau cannot utilize those programs. One conventional technique is to replicate data in a single table but this will void the concept behind maintaining multiple tables. Another conventional technique is to use views but the query behind view need to be programmatically generated and this will void the use of view. The proposed technique exposes a single table and uses available API’s to retrieve data from multiple tables. The solution is to write an engine. The simplest engine is to feed all the data and let MySQL to filter but this is highly inefficient. The proposed storage engine builds on the power of mysql query parser. During MYSQL query parsing the where clause is converted to functions/API calls to retrieve only required data to some extent. The idea can be generalized to any other API’s without having to rewrite the core logic of generating API calls.
What are the key points the reader should remember after reading this article?
This idea will solve problems like joining data from completely disparate sources. This will provide opportunity to calculate aggregates, joins on the data retrieved from API’s which was never done. This will allow simplistic picture of data and hide the complex details of database design. This will allow us to avoid time consuming joins by combining two tables into one. This will build on already written API’s to hide the complex queries behind and provide simple tables to user for even more complex analysis.
How our Storage Engine is different from others
Now I will tell how our proposed engines are different from generally available storage engines. Blackhole engine as the name suggests is used when the applications want the information to disappear automatically. Federated storage engine provides a view of remote MYSQL table. Maria storage engine provides millisecond support. XtraDB storage engine is a replacement for innodb. Falcon aims as a better transactional storage engine. PrimeBase XT does not provide durability in ACID compliance. AWSS3 storage engine is used to access data from Amazon Web Services S3 and exposes the data as tables. What is different in our proposed engines as compared to AWSS3 is that we provide a generic component which can be used by all the engines and simplify writing a storage engine. Mdbtools storage engine can be used to move data from Microsoft Access to mysql. Ritmark FS storage allows to upload files on ftp. Q4M provides message queuing functionality.
Proposed Storage Engines
Our proposed storage engine support tables that are populated on the fly and don’t retain any data. The proposed storage engine builds on the power of mysql query parser. During MYSQL query parsing the where clause is converted to functions/API calls. The functions/APIs which don’t use global or static variables and behave like mathematical functions are the candidate for complex analysis. Those functions/APIs return data structures, xml etc. when traversed can be converted into relations (tables). Different functions can be combined to feed data to a table .The functions input parameters (not necessarily all) are passed in the where-clause and functions result is shown as table. For example a storage engine named ProcessInfoEngine is created along with following tables:
Create table RunningProcessList(
PROCESS NAME varchar(2048),
Process ID int,
Thread count int,
Parent process ID int,
Priority Class int,
)engine= ProcessInfoEngine;
Create table ProcessModulesList
(
Process ID int,
MODULE NAME varchar(2048),
Executable varchar(2048),
Baseaddress varchar (255),
Basesize int
) engine= ProcessInfoEngine;
When the user issues a query select * from RunningProcessList the control passes to ProcessInfoEngine which checks the table name and where clause.
int ha_example::rnd_init(bool scan)
{
DBUG_ENTER("ha_example::rnd_init");
if(_stricmp(table_share->table_name.str,"RunningProcessList")==0)
{
THD *thd=this->ha_thd();
SELECT_LEX *select_lex=&thd->lex->select_lex;
m_processListIndex=0;
if(select_lex->where==0||select_lex->where->RunningProcessListSize==0)
{
GetAllProcessList(m_processList);
}
else
{
GetProcessListSpecifiedInWhereClause(m_processList,select_lex->where->RunningProcessList,select_lex->where->RunningProcessListSize);
}
}
DBUG_RETURN(0);
}
In this case where clause is empty therefore the engine takes a snapshot of all running processes using following win32 api:
- CreateToolhelp32Snapshot
- Process32First
- Process32Next
Note: Following code is adapted from [1]
BOOL GetAllProcessList(vector< runningprocess> &procList )
{
HANDLE hProcessSnap;
HANDLE hProcess;
PROCESSENTRY32 pe32;
DWORD dwPriorityClass;
hProcessSnap = CreateToolhelp32Snapshot( TH32CS_SNAPPROCESS, 0 );
if( hProcessSnap == INVALID_HANDLE_VALUE )
{
return( FALSE );
}
pe32.dwSize = sizeof( PROCESSENTRY32 );
if( !Process32First( hProcessSnap, &pe32 ) )
{
CloseHandle( hProcessSnap ); return( FALSE );
}
do
{
dwPriorityClass = 0;
hProcess = OpenProcess( PROCESS_ALL_ACCESS, FALSE, pe32.th32ProcessID );
if( hProcess == NULL )
{
}
else
{
dwPriorityClass = GetPriorityClass( hProcess );
CloseHandle( hProcess );
}
struct RunningProcess myRunningProcess;
myRunningProcess.ProcessName=pe32.szExeFile;
myRunningProcess.ParentProcessID=pe32.th32ParentProcessID;
myRunningProcess.PriorityClass=dwPriorityClass;
myRunningProcess.ProcessID=pe32.th32ProcessID;
myRunningProcess.ThreadCount=pe32.cntThreads;
procList.push_back(myRunningProcess);
} while( Process32Next( hProcessSnap, &pe32 ) );
CloseHandle( hProcessSnap );
return( TRUE );
}
The engine feeds the process information rows to mysql one by one. MySQL shows the table RunningProcessList data
int ha_example::rnd_next(uchar *buf)
{
int rc;
DBUG_ENTER("ha_example::rnd_next");
MYSQL_READ_ROW_START(table_share->db.str, table_share->table_name.str,
TRUE);
if(m_processListIndex<m_processlist.size()) **field="table-">field;
bitmap_set_bit(table->write_set, (*field)->field_index);
(*field)->set_notnull();
(*field)->store(m_processList[m_processListIndex].ProcessName.c_str(),strlen(m_processList[m_processListIndex].ProcessName.c_str()), system_charset_info);
field++;
bitmap_set_bit(table->write_set, (*field)->field_index);
(*field)->set_notnull();
(*field)->store(m_processList[m_processListIndex].ProcessID,true);
field++;
bitmap_set_bit(table->write_set, (*field)->field_index);
if(m_processList[m_processListIndex].ThreadCount!=-1)
{
(*field)->set_notnull();
(*field)->store(m_processList[m_processListIndex].ThreadCount,true);
}
else
{
(*field)->set_null();
}
field++;
bitmap_set_bit(table->write_set, (*field)->field_index);
(*field)->set_notnull();
(*field)->store(m_processList[m_processListIndex].ParentProcessID,true);
field++;
bitmap_set_bit(table->write_set, (*field)->field_index);
(*field)->set_notnull();
(*field)->store(m_processList[m_processListIndex].PriorityClass,true);
field++;
m_processListIndex++;
rc=0;
}
else
{
rc= HA_ERR_END_OF_FILE;
}
MYSQL_READ_ROW_DONE(rc);
DBUG_RETURN(rc);
}
PROCESS NAME | Process ID | Thread count | Parent process ID | Priority base |
[System Process] | 0x00000000 | 4 | 0x00000000 | 0 |
smss.exe | 0x00000124 | 3 | 0x00000004 | 11 |
csrss.exe | 0x00000194 | 9 | 0x0000018C | 13 |
wininit.exe | 0x00000274 | 3 | 0x0000018C | 13 |
services.exe | 0x000002B4 | 7 | 0x00000274 | 9 |
lsass.exe | 0x000002CC | 7 | 0x00000274 | 9 |
explorer.exe | 0x0000062C | 30 | 0x000005CC | 8 |
When user issues a query like select * from RunningProcessList where processid=0x00000000
. One option is to feed all running processes to mysql and let mysql filter the other process id’s. Another option is to call OpenProcess API
and pass processid=0x00000000 as parameter and feed only one row.
void GetProcessListSpecifiedInWhereClause(vector < RunningProcess > & procList, int *RunningProcessList,int RunningProcessListSize)
{
for(int i=0;i < RunningProcessListSize;i++)
{
HANDLE hProcess;
hProcess = OpenProcess( PROCESS_ALL_ACCESS, FALSE,RunningProcessList[i] );
if( hProcess != NULL )
{
char lpExeName[MAX_PATH];
DWORD dwSize=MAX_PATH;
dwSize=QueryFullProcessImageName(hProcess,0,lpExeName,&dwSize);
if(dwSize==0) continue;
ULONG_PTR ppid=GetParentProcessId(hProcess);
DWORD dwPriorityClass;
dwPriorityClass = GetPriorityClass( hProcess );
struct RunningProcess myRunningProcess;
basic_string < char > str=lpExeName;
basic_string < char >::size_type index;
static const basic_string < char > ::size_type npos = -1;
index=str.find_last_of('\\');
if ( index != npos )
{
myRunningProcess.ProcessName=str.substr(index+1,str.length()-index);
}
else
{
myRunningProcess.ProcessName=str;
}
myRunningProcess.ParentProcessID=ppid;
myRunningProcess.PriorityClass=dwPriorityClass;
myRunningProcess.ProcessID=RunningProcessList[i];
myRunningProcess.ThreadCount=-1;
procList.push_back(myRunningProcess);
CloseHandle( hProcess );
}
}
}
Note: Following code is taken from [2]
ULONG_PTR GetParentProcessId(HANDLE hProcess) {
ULONG_PTR pbi[6];
ULONG ulSize = 0;
LONG (WINAPI *NtQueryInformationProcess)(HANDLE ProcessHandle, ULONG ProcessInformationClass,
PVOID ProcessInformation, ULONG ProcessInformationLength, PULONG ReturnLength);
*(FARPROC *)&NtQueryInformationProcess =
GetProcAddress(LoadLibraryA("NTDLL.DLL"), "NtQueryInformationProcess");
if(NtQueryInformationProcess)
{
if(NtQueryInformationProcess(hProcess, 0,
&pbi, sizeof(pbi), &ulSize) >= 0 && ulSize == sizeof(pbi))
return pbi[5];
}
return (ULONG_PTR)-1;
}
The result of the query is:
PROCESS NAME | Process ID | Thread count | Parent process ID | Priority base |
[System Process] | 0x00000000 | 4 | 0x00000000 | 0 |
Now the question is from where process list specified in where clause come from? The answer is that during SQL query parsing the where clause is converted to running process list. The SQL grammar is augmented with semantic rules. The changes in semantic rules are highlighted in bold below:
expr:
expr or expr %prec OR_SYM
{
Item_cond_or *item1;
Item_cond_or *item3;
if (is_cond_or($1))
{
item1= (Item_cond_or*) $1;
if (is_cond_or($3))
{
item3= (Item_cond_or*) $3;
SemanticRule($1,$3,$3,false);
item3->add_at_head(item1->argument_list());
$$ = $3;
}
else
{
HandleCase($3);
SemanticRule($1,$3,$1,false);
item1->add($3);
$$ = $1;
}
}
else if (is_cond_or($3))
{
item3= (Item_cond_or*) $3;
HandleCase($1);
SemanticRule($1,$3,$3,false);
item3->add_at_head($1);
$$ = $3;
}
else
{
$$ = new (YYTHD->mem_root) Item_cond_or($1, $3);
if ($$ == NULL)
MYSQL_YYABORT;
HandleCase($1);
HandleCase($3);
SemanticRule($1,$3,$$,false);
}
}
| expr XOR expr %prec XOR
{
$$ = new (YYTHD->mem_root) Item_func_xor($1, $3);
if ($$ == NULL)
MYSQL_YYABORT;
}
| expr and expr %prec AND_SYM
{
Item_cond_and *item1;
Item_cond_and *item3;
if (is_cond_and($1))
{
item1= (Item_cond_and*) $1;
if (is_cond_and($3))
{
item3= (Item_cond_and*) $3;
SemanticRule($1,$3,$3,true);
item3->add_at_head(item1->argument_list());
$$ = $3;
}
else
{
HandleCase($3);
SemanticRule($1,$3,$1,true);
item1->add($3);
$$ = $1;
}
}
else if (is_cond_and($3))
{
item3= (Item_cond_and*) $3;
HandleCase($1);
SemanticRule($1,$3,$3,true);
item3->add_at_head($1);
$$ = $3;
}
else
{
$$ = new (YYTHD->mem_root) Item_cond_and($1, $3);
if ($$ == NULL)
MYSQL_YYABORT;
HandleCase($1);
HandleCase($3);
SemanticRule($1,$3,$$,true);
}
}
| NOT_SYM expr %prec NOT_SYM
{
$$= negate_expression(YYTHD, $2);
if ($$ == NULL)
MYSQL_YYABORT;
}
| bool_pri IS TRUE_SYM %prec IS
{
$$= new (YYTHD->mem_root) Item_func_istrue($1);
if ($$ == NULL)
MYSQL_YYABORT;
}
| bool_pri IS not TRUE_SYM %prec IS
{
$$= new (YYTHD->mem_root) Item_func_isnottrue($1);
if ($$ == NULL)
MYSQL_YYABORT;
}
| bool_pri IS FALSE_SYM %prec IS
{
$$= new (YYTHD->mem_root) Item_func_isfalse($1);
if ($$ == NULL)
MYSQL_YYABORT;
}
| bool_pri IS not FALSE_SYM %prec IS
{
$$= new (YYTHD->mem_root) Item_func_isnotfalse($1);
if ($$ == NULL)
MYSQL_YYABORT;
}
| bool_pri IS UNKNOWN_SYM %prec IS
{
$$= new (YYTHD->mem_root) Item_func_isnull($1);
if ($$ == NULL)
MYSQL_YYABORT;
}
| bool_pri IS not UNKNOWN_SYM %prec IS
{
$$= new (YYTHD->mem_root) Item_func_isnotnull($1);
if ($$ == NULL)
MYSQL_YYABORT;
}
| bool_pri
;
where_clause:
{ Select->where= 0; }
| WHERE
{
Select->parsing_place= IN_WHERE;
}
expr
{
SELECT_LEX *select= Select;
select->where= $3;
HandleCase($3);
select->parsing_place= NO_MATTER;
if ($3)
$3->top_level_item();
}
;
The following function extracts process id from equal expression For example processid=0x00000000
.
void Item_func_eq_case(Item *item)
{
Item_func * itemFunction=dynamic_cast<item_func*>(item);
Item **arguments=0;
if(itemFunction)
{
arguments=itemFunction->arguments();
}
else
{
return;
}
Item *field=0;
Item *value=0;
if(dynamic_cast <item_field*>(arguments[0]))
{
field = arguments[0];
}
else
{
return;
}
if(dynamic_cast <item_field*>(arguments[1]))
{
return;
}
else
{
value = arguments[1];
}
if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"ProcessID")==0)
{
item->RunningProcessList=(int*)sql_alloc(sizeof(int)*1);
item->RunningProcessListSize=1;
item->RunningProcessList[0]=value->val_int();
}
}
The following function extracts list of process ids from in expression. For example processid in (0x00000000,0x00000124)
void Item_func_in_case(Item *item)
{
Item_func * itemFunction=dynamic_cast<item_func*>(item);
Item **arguments=0;
int inArgcount=0;
if(itemFunction)
{
arguments=itemFunction->arguments();
inArgcount=itemFunction->arg_count;
}
else
{
return;
}
Item *field=0;
Item *value=0;
if(dynamic_cast <item_field*>(arguments[0]))
{
field = arguments[0];
}
else
{
return;
}
if(field&&field->item_name.ptr()&&_stricmp(field->item_name.ptr(),"ProcessID")==0)
{
item->RunningProcessList=(int*)sql_alloc(sizeof(int)*inArgcount);
item->RunningProcessListSize=0;
LONG index;
for (index = 1; index < inArgcount; index++)
{
if(dynamic_cast < item_field* >(arguments[index]))
{
continue;
}
else
{
value = arguments[index];
}
item->RunningProcessList[item->RunningProcessListSize]=value->val_int();
item->RunningProcessListSize++;
}
}
}
The following function calls one of the above functions depending upon type of Item
void HandleCase(Item *item)
{
if(dynamic_cast<item_func_in*>(item))
{
Item_func_in_case(item);
}
else if(dynamic_cast<item_func_eq*>(item))
{
Item_func_eq_case(item);
}
}
The following function performs and between the arrays containing running processes. The process id's common to both arrays are in the resulting running process array.
void CombineAnd(Item *arg1,Item *arg2,int *&resultRunningProcessList,int &resultRunningProcessListSize)
{
if(arg1->RunningProcessListSize+arg2->RunningProcessListSize>0)
{
resultRunningProcessList=(int*)sql_alloc(sizeof(int)*(arg1->RunningProcessListSize+arg2->RunningProcessListSize));
resultRunningProcessListSize=0;
for(int i=0;i<arg1->RunningProcessListSize;i++)
{
for(int j=0;j<arg2->RunningProcessListSize;j++)
{
if(arg1->RunningProcessList[i]==arg2->RunningProcessList[j])
{
resultRunningProcessList[resultRunningProcessListSize]=arg1->RunningProcessList[i];
resultRunningProcessListSize++;
break;
}
}
}
}
}
The following function performs or between the arrays containing running processes. The process id's common to both arrays are included one time and rest are also the part of the resulting running process array.
void CombineOR(Item *arg1,Item *arg2,int *&resultRunningProcessList,int &resultRunningProcessListSize)
{
if(arg1->RunningProcessListSize+arg2->RunningProcessListSize>0)
{
resultRunningProcessList=(int*)sql_alloc(sizeof(int)*(arg1->RunningProcessListSize+arg2->RunningProcessListSize));
resultRunningProcessListSize=0;
for(int i=0;i<arg1->RunningProcessListSize;i++)
{
resultRunningProcessList[resultRunningProcessListSize]=arg1->RunningProcessList[i];
resultRunningProcessListSize++;
}
for(int j=0;j<arg2->RunningProcessListSize;j++)
{
resultRunningProcessList[rultRunningProcessListSize]=arg2->RunningProcessList[j];
resultRunningProcessListSize++;
}
}
}
The following function calls above functions depending upon the type of the operator (And,Or).
void SemanticRule(Item *arg1,Item *arg2,Item *result,bool isAnd)
{
int *resultRunningProcessList;
int resultRunningProcessListSize;
if(isAnd)
{
CombineAnd(arg1,arg2,resultRunningProcessList,resultRunningProcessListSize);
}
else
{
CombineOR(arg1,arg2,resultRunningProcessList,resultRunningProcessListSize);
}
result->RunningProcessList=resultRunningProcessList;
result->RunningProcessListSize=resultRunningProcessListSize;
}
The user can enter complex queries. During parsing where clause is converted to calls to OpenProcess API and CreateToolhelp32Snapshot, Process32First, Process32Next when needed. For example if a portion of where clause contains process id the OpenProcess API should be called otherwise CreateToolhelp32Snapshot, Process32First, Process32Next should be called. Similarly, When the user issues a query select * from ProcessModulesList the control passes to ProcessInfoEngine which checks the table name and where clause. In this case where clause is empty therefore the engine takes a snapshot of all running processes using following win32 api
CreateToolhelp32Snapshot
Process32First
Process32Next
Module32First
Module32Next
The engine feeds the process module information rows to mysql one by one. MySQL shows the table ProcessModulesList data
Process ID | MODULE NAME | Executable | Base address | Base size |
0x00000000 | ntdll.dll | C:\Windows\SysWOW64\ntdll.dll | 0x77430000 | 1572864 |
0x00000000 | kernel32.dll | C:\Windows\syswow64\kernel32.dll | 0x75000000 | 1048576 |
0x00000ED0 | EXCEL.EXE | C:\Program Files (x86)\Microsoft Office\Office12\EXCEL.EXE | 0x2F8F0000 | 18411520 |
0x00000ED0 | ntdll.dll | C:\Windows\SysWOW64\ntdll.dll | 0x77430000 | 1572864 |
0x00000ED0 | kernel32.dll | C:\Windows\syswow64\kernel32.dll | 0x75000000 | 1048576 |
0x00000ED0 | ADVAPI32.dll | C:\Windows\syswow64\ADVAPI32.dll | 0x76480000 | 655360 |
0x00000ED0 | msvcrt.dll | C:\Windows\syswow64\msvcrt.dll | 0x765E0000 | 704512 |
The user can also join RunningProcessList and ProcessModulesList. For example select * from RunningProcessList, ProcessModulesList where RunningProcessList .Process ID= ProcessModulesList .Process ID The engine has nothing to do with the join it is done byMYSQL.
The idea can be extended to API’s available:
- Flickr API(almost all the functionality that runs flickr.com is available through the API)
- Graph API
- foursquare API
- Tumblr API
- The Remember The Milk API
- Google+ API
- FriendFeed API
- Geolocation API
- The Goodreads API
For each source of API’s/functions a storage engine can be created like Flick Storage engine. Flick Storage engine will call Flickr API [3] to retrieve data from Flickr. The user will write MySQL queries to retrieve data from Flickr. For example: Create a table named “Group” with engine=Flickr_Storage_Engine having following columns:
- Nsid
- Name
- Iconfarm
- Iconserver
- Admin
- Eighteenplus
- invitation_only
- members
- pool_count
- api_key
- user_id
- extras
The user writes following query: select * from Group where api_key=’ABCD’ and user_id=1; During MYSQL parsing the above query generates calls to Flickr API. Flickr storage engine calls api flickr.people.getGroups passing api_key and user_id as parameter. The API returns following xml
<group nsid="17274427@N00" name="Cream of the Crop - Please read the rules" iconfarm="1" iconserver="1" admin="0" eighteenplus="0" invitation_only="0" members="11935" pool_count="12522">
<group nsid="20083316@N00" name="Apple" iconfarm="1" iconserver="1" admin="0" eighteenplus="0" invitation_only="0" members="11776" pool_count="62438">
<group nsid="34427469792@N01" name="FlickrCentral" iconfarm="1" iconserver="1" admin="0" eighteenplus="0" invitation_only="0" members="168055" pool_count="5280930">
<group nsid="37718678610@N01" name="Typography and Lettering" iconfarm="1" iconserver="1" admin="0" eighteenplus="0" invitation_only="0" members="17318" pool_count="130169">
</group></group> </group></group></groups>
Flickr engine parses the XML and feeds the rows to mysql. The result of the query is:
Nsid<o:p>
| Name<o:p>
| Iconfarm<o:p>
| Iconserver<o:p>
| Admin<o:p>
| Eighteenplus<o:p>
| invitation_only<o:p>
| members<o:p>
| pool_count<o:p>
|
17274427@N00<o:p> | Cream of the Crop - Please read the rules<o:p>
<o:p>
| 1<o:p>
| 1<o:p>
| 0<o:p>
| 0<o:p>
| 0<o:p>
| 11935<o:p>
<o:p>
| 12522<o:p>
<o:p>
|
20083316@N00<o:p>
| Apple<o:p>
<o:p>
| 1<o:p>
| 1<o:p>
| 0<o:p>
| 0<o:p>
| 0<o:p>
| 11776<o:p>
<o:p>
| 62438<o:p>
<o:p>
|
34427469792@N01<o:p>
<o:p>
| FlickrCentral<o:p>
<o:p>
| 1<o:p>
| 1<o:p>
| 0<o:p>
| 0<o:p>
| 0<o:p>
| 168055<o:p>
<o:p>
| 5280930<o:p>
<o:p>
|
37718678610@N01<o:p>
<o:p>
| Typography and Lettering<o:p>
<o:p>
| 1<o:p>
| 1<o:p>
| 0<o:p>
| 0<o:p>
| 0<o:p>
| 17318<o:p>
<o:p>
| 130169<o:p>
<o:p>
|
This is a simple query but user can write complex queries involving joins and aggregates. The user can even join the data retrieved from different API’s.
Where to get the source code:
Download the attachment with this article. It contains following files:
- ha_example.cc
- ha_example.h
- item.cc
- item.h
- sql_yacc.yy
Download the MySQL source code from the following link: http://sourceforge.net/projects/mysql.mirror/files/MySQL%205.6.13/
- Copy ha_example.cc to mysql-5.6.13\storage\example
- Copy ha_example.h to mysql-5.6.13\storage\example
- Copy item.cc to mysql-5.6.13\sql
- Copy item.h to mysql-5.6.13\sql
- Copy sql_yacc.yy to mysql-5.6.13\sql
How to build the source code:
Please follow the instruction given at: http://dev.mysql.com/doc/refman/5.6/en/installing-source-distribution.html
Conclusion
We have found a way to harness the power of mysql parser to do analysis on data that was previously never done. This opens up the horizon for new kinds of analysis This approach can be generalized. I want to write a series of articles on the topic
Tutorial on Storage Engines:
For more information: http://dev.mysql.com/doc/internals/en/custom-engine.html
References:
- [1]http://msdn.microsoft.com/en-us/library/windows/desktop/ms686701(v=vs.85).aspx
- [2] http://stackoverflow.com/questions/185254/how-can-a-win32-process-get-the-pid-of-its-parent
- [3]http://www.flickr.com/services/developer/