BEncode Lexing in C++ - Second Stage

airlobster

5.00/5 (1 vote)

22 Jun 2009CPOL1 min read

19.5K

138

Taking my previous lexer one step ahead.

Download source code - 2.34 KB

Introduction

In my previous article (bencodelex.aspx), I presented a somewhat naked lexer for bencode streams. All it can do is scan the incoming stream and notify the client code about each token being parsed. In this article, I'm taking it to the next level and presenting a bencode parser. The difference is that now it puts together a whole data-structure the client can browse, iterate, dump, whatever...

Using the code

Another significant difference between this parser and the lexer presented in my previous article is the way the client code utilizes it. The approach of throwing arbitrary chunks of data at the parser has not changed. However, there's no more callback function. You just feed the parser until your data is fully eaten, or an error occurs.

Once you're done, the parser instance will give you access to the root node where you can begin your searches.

How a BENCODE node looks like

The internal data-structure of the bencode_parser class is based on the bencode_variant structure, which looks like this:

C++

typedef struct _bencode_variant
{
 bntype vt;
 union
 {
  longlong i64;
  bnstring str;
  struct
  {
   struct _bencode_variant* first;
   struct _bencode_variant* last;
  } lst;
 } v;
 bnstring key;
 struct _bencode_variant* next;
} bencode_variant;

It's actually a typed union with additional information to allow it to link to other nodes and be associated with names. bntype and bnstrings are defined in the header file.

The following code is a snippet from a sample program I used to test the class, which also demonstrates a typical use of the class:

C++

FILE* pf = fopen(argv[1], "rb");
 if( pf )
 {
  bencode::bencode_parser bed;
  char chBuf[1024];

  for(;;)
  {
   int cbRead = fread(chBuf, 1, sizeof(chBuf), pf);
   if( cbRead == 0 )
   {
    dump(bed.top());
    break;
   }
   int cbParsed = bed.lex_chunk(chBuf, cbRead);
   if( cbParsed < cbRead )
    break;
  }

  fclose(pf);
}

The code above invokes a function called dump() that recursively dumps the data collected by the parser, as a simple XML. Here is how it might look like:

C++

void dump(const bencode::bencode_variant* pvar)
{
 int i;
 switch( pvar->vt )
 {
  case bencode::bntInteger:
  {
   if( pvar->key.len )
    fprintf(stdout, "<integer name=\"%s\">%lu</integer>\n", 
            pvar->key.pch, pvar->v.i64);
   else
    fprintf(stdout, "<integer>%lu</integer>\n", pvar->v.i64);
   break;
  }
  case bencode::bntString:
  {
   if( pvar->key.len )
    fprintf(stdout, "<string length=\"%lu\" name=\"%s\">", 
            pvar->v.str.len, pvar->key.pch);
   else
    fprintf(stdout, "<string length=\"%lu\">", pvar->v.str.len);
   dump_string(pvar->v.str.pch, pvar->v.str.len);
   fprintf(stdout, "</string>\n");
   break;
  }
  case bencode::bntList:
  {
   if( pvar->key.len )
    fprintf(stdout, "<list name=\"%s\">\n", pvar->key.pch);
   else
    fprintf(stdout, "<list>\n");
   for(const bencode::bencode_variant* p = pvar->v.lst.first; p; p = p->next)
    dump(p);
   fprintf(stdout, "</list>\n");
   break;
  }
  case bencode::bntDictionary:
  {
   if( pvar->key.len )
    fprintf(stdout, "<dictionary name=\"%s\">\n", pvar->key.pch);
   else
    fprintf(stdout, "<dictionary>\n");
   for(const bencode::bencode_variant* p = pvar->v.lst.first; p; p = p->next)
    dump(p);
   fprintf(stdout, "</dictionary>\n");
   break;
  }
 }
}

Points of interest

Somehow it seems to me that the next step could be converting the gathered data into a DOM structure so searches can be done using XPath...what could be more elegant than that, huh...

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)