Introduction
BENCODE is a data encoding method which is strongly associated with .torrent files. This article introduces a C++ class that can decode streams in that format.
Background
Actually, I needed a BENCODE lexing code in a project I'm working on, and no matter how extensive my web searches were, all I could come
up with was either non-C/C++ code, the code was too dependant on other files which I couldn't locate, or the code took care
of things I didn't want it to. So, as usual, the only sensible solution for me was to spend couple of hours and do it myself, the way I want it.
Using the code
Before we go to the actual usage part which is totally straightforward, I would like to emphasis a few points about the bencode_lexer
class:
- The
bencode_lexer
is stream-aware, which means you can feed it arbitrarily-sized chunks of data in the order they are read in by your code. - The class acts as the parsing context as well. In other words, a single
bencode_lexer
instance cannot process more than one content stream at a time. - The class is not thread-safe, but since it's not supposed to run on parallel streams, that is perfectly acceptable in my opinion.
Here is a simple code that dumps the content of a bencode stream onto the standard output device. It might not be too useful in the real world, but it shows how to use class:
#include <stdio.h>
#include "bencode.h"
int g_iIdent = 0;
bool Callback(bencode_lexer::Event e, const void* pv, int cb, long lParam)
{
switch( e )
{
case bencode_lexer::eBeginDictionary:
{
for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
fputs("dictionary {\n", stdout);
++g_iIdent;
break;
}
case bencode_lexer::eEndDictionary:
{
--g_iIdent;
for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
fputs("}\n", stdout);
break;
}
case bencode_lexer::eBeginList:
{
for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
fputs("list {\n", stdout);
++g_iIdent;
break;
}
case bencode_lexer::eEndList:
{
--g_iIdent;
for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
fputs("}\n", stdout);
break;
}
case bencode_lexer::eInteger:
{
for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
fprintf(stdout, "integer=%lu\n", *(int*)pv);
break;
}
case bencode_lexer::eString:
{
for(int i=0; i < g_iIdent; ++i) fputc('\t', stdout);
fprintf(stdout, "string=%s\n", (const char*)pv);
break;
}
}
return true;
}
int main(int argc, char* argv[])
{
if( argc < 2 )
return 1;
FILE* pf = fopen(argv[1], "rb");
if( pf )
{
bencode_lexer bed(Callback);
char chBuf[1024];
for(;;)
{
int cbRead = fread(chBuf, 1, sizeof(chBuf), pf);
if( cbRead < 1 )
break;
int cbParsed = bed.lex_chunk(chBuf, cbRead, 0);
if( cbParsed < cbRead )
break;
}
fclose(pf);
}
return 0;
}
Again, since bencode_lexer
doesn't create any data-structures, it is up to you to construct the extracted data according to your project's needs.
Points of Interest
The bencode_lexer
uses a hand-written stack class for managing its internal state machine. This class mimics the STL's stack class
in a lot more limited but efficient way. In a real project, you will have to replace the hard-coded breakpoints I put into the ASSERT
calls.