
XEndian: Fast and Extensible Header-Only Endian-Aware Serializer (or The Fight for DRY)

12 Sep 2014, Apache license, 2 min read
This tip presents XEndian, a header-only, endian-aware serialization library for C++.

Introduction

Endianness becomes a problem mostly when a program has to deal with raw data. Until now, common practice has been to roll your own functions or to rely on non-standard compiler extensions (__builtin_bswapXX) or functions (htoleXX, htobeXX). However, this approach quickly leads to code duplication, a great amount of boilerplate, or both.
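
To make the boilerplate concrete, here is a minimal sketch of the hand-rolled approach, assuming the input is a plain byte buffer. The helper names (load_le16, load_be16, load_le32, load_be32) are hypothetical and not part of XEndian. Note that every combination of width and byte order needs its own near-identical function, and each structure field then needs its own call.

```cpp
#include <cstdint>

// Hypothetical hand-rolled helpers (not part of XEndian).
// Each one assembles an integer from individual bytes, so the result
// does not depend on the endianness of the host machine.
inline std::uint16_t load_le16(const unsigned char* p)
{
  return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint16_t load_be16(const unsigned char* p)
{
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t load_le32(const unsigned char* p)
{
  return  static_cast<std::uint32_t>(p[0])
       | (static_cast<std::uint32_t>(p[1]) << 8)
       | (static_cast<std::uint32_t>(p[2]) << 16)
       | (static_cast<std::uint32_t>(p[3]) << 24);
}

inline std::uint32_t load_be32(const unsigned char* p)
{
  return (static_cast<std::uint32_t>(p[0]) << 24)
       | (static_cast<std::uint32_t>(p[1]) << 16)
       | (static_cast<std::uint32_t>(p[2]) << 8)
       |  static_cast<std::uint32_t>(p[3]);
}
```

The matching store functions would double the amount of code again, which is the kind of duplication XEndian is meant to remove.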

Other solutions exist, such as the great Boost.Serialization library, but they tackle much more complex problems than XEndian does: versioning, multiple input/output formats, and so on. Those features make them comparatively heavy and are out of scope for XEndian.

Background

Design Rationale

XEndian is part of libhdbg, a work-in-progress library that aims to offer a cross-platform debugging interface. As such, the main focus of XEndian has always been to remove code duplication when loading and storing mostly fixed structures (think of the ELF file format). It was never meant for serializing ever-changing complex objects, although it can be used that way.
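
As an illustration of why a format like ELF calls for endian-aware loading, the sketch below reads the 16-bit e_type field of an ELF header according to the byte order the file declares for itself. The offsets and constants come from the ELF specification; the function name read_elf_type is hypothetical and belongs neither to XEndian nor to libhdbg. The caller is assumed to pass at least 18 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// From the ELF specification: e_ident[EI_DATA] (index 5) is
// 1 (ELFDATA2LSB, little-endian) or 2 (ELFDATA2MSB, big-endian), and the
// 16-bit e_type field directly follows the 16-byte e_ident array.
constexpr std::size_t   EI_DATA       = 5;
constexpr unsigned char ELFDATA2LSB   = 1;
constexpr unsigned char ELFDATA2MSB   = 2;
constexpr std::size_t   E_TYPE_OFFSET = 16;

// Hypothetical helper: reads e_type from the start of an ELF header,
// honouring the byte order declared by the file itself.
// Returns false if the data-encoding byte is not recognised.
inline bool read_elf_type(const unsigned char* header, std::uint16_t& out)
{
  const unsigned char* p = header + E_TYPE_OFFSET;
  switch (header[EI_DATA]) {
    case ELFDATA2LSB:
      out = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
      return true;
    case ELFDATA2MSB:
      out = static_cast<std::uint16_t>((p[0] << 8) | p[1]);
      return true;
    default:
      return false;
  }
}
```

A real header has dozens of such fields, so writing both branches by hand for each of them is exactly the repetition described above.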

Using the Code

Say you have a custom structure named Foo, such as:

C++
struct Foo {
  std::uint32_t a;
  std::uint16_t b;
  std::uint8_t  c;
};

You only have to (partially) specialize the xe_impl_for_type template class like this:

C++
template <class XeImpl>
struct xe_impl_for_type<Foo, XeImpl>
{
  template <class Rw, class Self, class Mem>
  static void serialize(Self & self, Mem * mem)
  {
    Rw::field( self.a, mem + offsetof(Self, a) );
    Rw::field( self.b, mem + offsetof(Self, b) );
    Rw::field( self.c, mem + offsetof(Self, c) );
  }
};

The XeImpl parameter encodes the selected endianness, while the Rw parameter encodes the operation (load or store). The Self and Mem parameters absorb the const/non-const differences between the two operations: when loading, the object is writable and the memory is read-only; when storing, it is the other way round. Now you can use Foo with the {le/be}_load, {le/be}_load_from, {le/be}_load_into and {le/be}_store families of functions like this:

C++
int main()
{
  static const unsigned char foo_bytes[] = {
    /* Foo::a */ 0xdd, 0xcc, 0xbb, 0xaa,
    /* Foo::b */ 0x11, 0x22,
    /* Foo::c */ 0xff
  };
  
  const auto be_foo = be_load<Foo>(foo_bytes); // loaded as big-endian
  const auto le_foo = le_load<Foo>(foo_bytes); // loaded as little-endian
  
  const auto foo_p = reinterpret_cast<const Foo *>(foo_bytes);
  const auto be_foo_a = be_load_from(foo_p->a); // loaded as big-endian
  const auto le_foo_a = le_load_from(foo_p->a); // loaded as little-endian
  
  Foo into_foo; // xe_load_into is also valid with arrays
  be_load_into(foo_bytes, into_foo); // loaded as big-endian
  le_load_into(foo_bytes, into_foo); // loaded as little-endian
  
  const Foo foo { 0x11223344, 0xaabb, 0xff };
  unsigned char buffer[ sizeof(Foo) ];
  be_store(foo, buffer); // stored as big-endian
  le_store(foo, buffer); // stored as little-endian
}
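
The way a single field list can serve both loading and storing can be sketched with two small policy classes. Everything below is hypothetical and only mimics the roles of the Rw, Self and Mem parameters described earlier; XEndian's real classes may look quite different. The names be_load_op, be_store_op, Pair and serialize_pair are assumptions made for this illustration, and only 16-bit big-endian fields are handled to keep it short.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical operation policies (not XEndian's actual classes).
struct be_load_op {
  // Loading: the object is writable, the memory is read-only.
  static void field(std::uint16_t& value, const unsigned char* mem)
  {
    value = static_cast<std::uint16_t>((mem[0] << 8) | mem[1]);
  }
};

struct be_store_op {
  // Storing: the object is read-only, the memory is writable.
  static void field(const std::uint16_t& value, unsigned char* mem)
  {
    mem[0] = static_cast<unsigned char>(value >> 8);
    mem[1] = static_cast<unsigned char>(value & 0xff);
  }
};

struct Pair {
  std::uint16_t first;
  std::uint16_t second;
};

// The field list is written once. Self deduces to Pair or const Pair,
// and Mem to const unsigned char or unsigned char, depending on the call.
template <class Rw, class Self, class Mem>
void serialize_pair(Self& self, Mem* mem)
{
  Rw::field(self.first,  mem + offsetof(Pair, first));
  Rw::field(self.second, mem + offsetof(Pair, second));
}
```

Calling serialize_pair<be_load_op>(pair, bytes) fills a Pair from a const byte buffer, while serialize_pair<be_store_op>(const_pair, buffer) writes a const Pair into a writable buffer, both through the same function body.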

Disassembly

The following code:

C++
int main()
{
  static const unsigned char foo_bytes[] = {
    /* Foo::a */ 0xdd, 0xcc, 0xbb, 0xaa,
    /* Foo::b */ 0x22, 0x11,
    /* Foo::c */ 0xff
  };

  const auto be_foo = be_load<Foo>(foo_bytes);
  if(be_foo.a != 0xddccbbaa || be_foo.b != 0x2211 || be_foo.c != 0xff)
    return EXIT_FAILURE;

  const auto le_foo = le_load<Foo>(foo_bytes);
  if(le_foo.a != 0xaabbccdd || le_foo.b != 0x1122 || le_foo.c != 0xff)
    return EXIT_FAILURE;
}

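...compiled with g++ with optimizations enabled, gives the following disassembly. Everything is inlined into main: the big-endian load reduces to a single bswap plus a rol, and the little-endian load needs no byte swapping at all on this x86-64 target: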

ASM
0000000000400690 <main>:
  400690: 8b 05 da 01 00 00     mov    eax,DWORD PTR [rip+0x1da] # 400870 <main::foo_bytes>
  400696: 0f b7 0d d7 01 00 00  movzx  ecx,WORD PTR [rip+0x1d7]  # 400874 <main::foo_bytes+0x4>
  40069d: 89 c2                 mov    edx,eax
  40069f: 0f ca                 bswap  edx
  4006a1: 66 c1 c1 08           rol    cx,0x8
  4006a5: 81 fa aa bb cc dd     cmp    edx,0xddccbbaa
  4006ab: 74 06                 je     4006b3 <main+0x23>
  4006ad: b8 01 00 00 00        mov    eax,0x1
  4006b2: c3                    ret
  4006b3: 0f b7 c9              movzx  ecx,cx
  4006b6: 81 c9 00 00 ff 00     or     ecx,0xff0000
  4006bc: 81 f9 11 22 ff 00     cmp    ecx,0xff2211
  4006c2: 75 e9                 jne    4006ad <main+0x1d>
  4006c4: 3d dd cc bb aa        cmp    eax,0xaabbccdd
  4006c9: 75 e2                 jne    4006ad <main+0x1d>
  4006cb: 0f b7 05 a2 01 00 00  movzx  eax,WORD PTR [rip+0x1a2]  # 400874 <main::foo_bytes+0x4>
  4006d2: 48 ba 00 00 00 00 00  movabs rdx,0xff000000000000
  4006d9: 00 ff 00
  4006dc: 48 c1 e0 20           shl    rax,0x20
  4006e0: 48 09 d0              or     rax,rdx
  4006e3: 48 c1 e8 20           shr    rax,0x20
  4006e7: 3d 22 11 ff 00        cmp    eax,0xff1122
  4006ec: 0f 95 c0              setne  al
  4006ef: 0f b6 c0              movzx  eax,al
  4006f2: c3                    ret    

License

Licensed under the Apache License, Version 2.0

History

  • 12/09/2014 - Published XEndian header, samples and unit tests
  • 18/09/2014 - Fewer macros and even more DRY
  • 20/12/2014 - Simplified interface, improved naming and added more examples
