Introduction
In my previous article (bencodelex.aspx), I presented a somewhat naked lexer for bencode streams. All it can do is scan the incoming stream and notify the client code about each token being parsed. In this article, I'm taking it to the next level and presenting a bencode parser. The difference is that now it puts together a whole data-structure the client can browse, iterate, dump, whatever...
Using the code
Another significant difference between this parser and the lexer presented in my previous article is the way the client code utilizes it. The approach of throwing arbitrary chunks of data at the parser has not changed. However, there's no more callback function. You just feed the parser until your data is fully eaten, or an error occurs.
Once you're done, the parser instance will give you access to the root node where you can begin your searches.
How a BENCODE node looks like
The internal data-structure of the bencode_parser
class is based on the bencode_variant
structure, which looks like this:
typedef struct _bencode_variant
{
bntype vt;
union
{
longlong i64;
bnstring str;
struct
{
struct _bencode_variant* first;
struct _bencode_variant* last;
} lst;
} v;
bnstring key;
struct _bencode_variant* next;
} bencode_variant;
It's actually a typed union with additional information to allow it to link to other nodes and be associated with names. bntype
and bnstrings
are defined in the header file.
The following code is a snippet from a sample program I used to test the class, which also demonstrates a typical use of the class:
FILE* pf = fopen(argv[1], "rb");
if( pf )
{
bencode::bencode_parser bed;
char chBuf[1024];
for(;;)
{
int cbRead = fread(chBuf, 1, sizeof(chBuf), pf);
if( cbRead == 0 )
{
dump(bed.top());
break;
}
int cbParsed = bed.lex_chunk(chBuf, cbRead);
if( cbParsed < cbRead )
break;
}
fclose(pf);
}
The code above invokes a function called dump()
that recursively dumps the data collected by the parser, as a simple XML. Here is how it might look like:
void dump(const bencode::bencode_variant* pvar)
{
int i;
switch( pvar->vt )
{
case bencode::bntInteger:
{
if( pvar->key.len )
fprintf(stdout, "<integer name=\"%s\">%lu</integer>\n",
pvar->key.pch, pvar->v.i64);
else
fprintf(stdout, "<integer>%lu</integer>\n", pvar->v.i64);
break;
}
case bencode::bntString:
{
if( pvar->key.len )
fprintf(stdout, "<string length=\"%lu\" name=\"%s\">",
pvar->v.str.len, pvar->key.pch);
else
fprintf(stdout, "<string length=\"%lu\">", pvar->v.str.len);
dump_string(pvar->v.str.pch, pvar->v.str.len);
fprintf(stdout, "</string>\n");
break;
}
case bencode::bntList:
{
if( pvar->key.len )
fprintf(stdout, "<list name=\"%s\">\n", pvar->key.pch);
else
fprintf(stdout, "<list>\n");
for(const bencode::bencode_variant* p = pvar->v.lst.first; p; p = p->next)
dump(p);
fprintf(stdout, "</list>\n");
break;
}
case bencode::bntDictionary:
{
if( pvar->key.len )
fprintf(stdout, "<dictionary name=\"%s\">\n", pvar->key.pch);
else
fprintf(stdout, "<dictionary>\n");
for(const bencode::bencode_variant* p = pvar->v.lst.first; p; p = p->next)
dump(p);
fprintf(stdout, "</dictionary>\n");
break;
}
}
}
Points of interest
Somehow it seems to me that the next step could be converting the gathered data into a DOM structure so searches can be done using XPath...what could be more elegant than that, huh...