Introduction
Before I settled on the methodology introduced in this article, I often asked myself:
- Everything is ultimately stored and transported in binary form, so why is it so complicated to deal with types (especially in OOP)?
- A markup language like XML handles structured text elegantly (its form is a context-free grammar, so it is lexer/parser-friendly). But why do things get so messy once you go beyond text? Consider <![CDATA[]]>: text-encoded data embedded inside a well-formed hierarchical structure. There must be something wrong.
We know that binary is essential and XML is powerful, yet mainstream technology does not marshal data using binary XML. There has been some progress on industrial standards, but developers do not pay much attention to it. Why? I suspect there are methodological reasons. I would like to talk about a couple of real-world scenarios that you are probably very familiar with.
1. Database or XML?
After many projects, I have found it inconvenient to design and maintain an RDBMS for small projects. Typically, a small project contains fewer than 20 tables and stores some pictures and files. If only text-based information needs to be saved, XML is a good choice. But what about pictures and files? Storing them in the file system and recording the file paths in XML nodes or attributes is even more annoying.
A database provides various types. However, it is relational, which matters especially for applications with complicated business data. The relational model is strong and good for persistence, but it is heavyweight and not easy to maintain. Programming is quite different. Efficient programming is based on in-memory models, usually various data types together with arrays or collections of those types. XML is more convenient to manipulate in programming scenarios. However, XML is a "language" that is essentially text-based.
If our application needs to save and load various types of data, and the data can be organized hierarchically like a markup language, the application can be very lightweight. We will discuss later how to make this possible using a simple binary XML processing technique.
2. RFC-style Protocol vs XML-based WebAPI
Have you ever been involved in developing a communication or distributed system? Conventionally, you need to specify the byte offsets and meanings of every field, as in those big RFC documents. Let's look at a piece of the RFC 793 (TCP) specification:
Reading and implementing that protocol made for some really difficult days. I will not explain what the figure means. The only reason I show it here is to give you an intuition for how painful data serialization is in a typical protocol design process. When a bug is caused by an invalid byte offset, debugging can be a nightmare.
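To make the offset bookkeeping concrete, here is a minimal sketch of reading a few fixed-position fields of a TCP header by hand. The field positions follow the RFC 793 header layout. The class and method names (TcpHeaderSketch, ReadUInt16BE, and so on) are hypothetical and are not part of this article's source code.

```csharp
using System;

// Hypothetical helper, for illustration only: every field is located by a
// hard-coded byte offset taken from the RFC 793 header layout.
public static class TcpHeaderSketch
{
    // Reads a big-endian (network order) 16-bit value at the given offset.
    public static ushort ReadUInt16BE(byte[] buf, int offset)
    {
        return (ushort)((buf[offset] << 8) | buf[offset + 1]);
    }

    // Reads a big-endian (network order) 32-bit value at the given offset.
    public static uint ReadUInt32BE(byte[] buf, int offset)
    {
        return ((uint)buf[offset] << 24) | ((uint)buf[offset + 1] << 16)
             | ((uint)buf[offset + 2] << 8) | (uint)buf[offset + 3];
    }

    // Bytes 0-1: source port.
    public static ushort SourcePort(byte[] header) { return ReadUInt16BE(header, 0); }

    // Bytes 2-3: destination port.
    public static ushort DestinationPort(byte[] header) { return ReadUInt16BE(header, 2); }

    // Bytes 4-7: sequence number.
    public static uint SequenceNumber(byte[] header) { return ReadUInt32BE(header, 4); }

    // Upper 4 bits of byte 12: data offset, counted in 32-bit words.
    public static int HeaderLengthInBytes(byte[] header) { return (header[12] >> 4) * 4; }
}
```

Every constant here (0, 2, 4, 12, the shift by 4) has to match the specification exactly. A single wrong number still compiles and still returns a value, just the wrong one, which is why offset bugs are so hard to track down.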
In contrast, let's see how web services implement a protocol. Here is a segment of the Google Maps API:
So simple: no byte alignment, no offset calculations. You see the hierarchical structure at first glance. It is a programmer-friendly implementation. However, what about transferring messages that contain multimedia rather than text-only XML?
With binary XML tools, we don't need any byte offset specification. Protocol design becomes as easy as the XML-based designs that usually appear in web service APIs. The difference is significant: on the one hand, programmers get communication as convenient as XML; on the other hand, any binary data can be carried within the binary XML message without additional code.
A Light Weight Binary XML Solution
Now for our solution. The binary XML is represented by a simple tree structure with child nodes and attributes. Node content and attribute values can hold binary objects. I implemented many predefined types, such as int, double, time, bitmap, and so on. They are saved in binary form, but the developer accesses them through commonly known types. Using this tool, we accelerated the design and coding of small projects with simple data structures.
Using the Code
1. IDump
Before we get to binary XML, let's look at {IDump.cs}. Here, we define a simple interface. Any class that implements IDump can be saved to or loaded from a list of bytes. The binary XML uses IDump frequently.
{IDump.cs}
public interface IDump
{
    int AppendToBytes(ref List<byte> byte_segments, ref int index);
    void LoadFromBytes(byte[] bytes, ref int index);
}
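As an illustration, here is a minimal sketch of a class implementing IDump. The Point2D type is hypothetical and not part of the source code. Two assumptions are made: index is advanced by the number of bytes written or read (consistent with how the Content(IDump) constructor later in this article updates index), and the return value of AppendToBytes is the number of bytes appended, which the article does not state.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the interface in IDump.cs, repeated so the sketch is self-contained.
public interface IDump
{
    int AppendToBytes(ref List<byte> byte_segments, ref int index);
    void LoadFromBytes(byte[] bytes, ref int index);
}

// Hypothetical example type, for illustration only.
public class Point2D : IDump
{
    public int X;
    public double Y;

    // Appends X (4 bytes) and Y (8 bytes) and advances index past them.
    public int AppendToBytes(ref List<byte> byte_segments, ref int index)
    {
        int start = index;
        byte_segments.AddRange(BitConverter.GetBytes(X)); index += 4;
        byte_segments.AddRange(BitConverter.GetBytes(Y)); index += 8;
        return index - start; // assumption: returns the number of bytes written
    }

    // Reads the fields back in the same order they were written.
    public void LoadFromBytes(byte[] bytes, ref int index)
    {
        X = BitConverter.ToInt32(bytes, index); index += 4;
        Y = BitConverter.ToDouble(bytes, index); index += 8;
    }
}
```

The only contract the class has to honor is symmetry: LoadFromBytes must consume exactly the bytes that AppendToBytes produced, in the same order.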
2. Content
Every element is a content. A content is a few bytes that hold the actual data, plus additional bytes that specify its type. We have several predefined types, as follows. You can define your own for your specific projects.
{Content.cs}
public enum EnumType
{
    Null,
    RawBin,
    IDump,
    String,
    Int32,
    Int64,
    Double,
    DateTime,
    TimeSpan,
    Bmp,
    Boolean
}
With a type specification, we can handle virtually anything in binary form. A Content can be created simply through:
{Content.cs}
// Empty content; type and bytes are set later.
public Content()
{
}

// Strings are stored as UTF-8 bytes.
public Content(string val)
{
    this.type = EnumType.String;
    this.content = System.Text.Encoding.UTF8.GetBytes(val);
}

public Content(int val)
{
    this.type = EnumType.Int32;
    this.content = System.BitConverter.GetBytes(val);
}

public Content(long val)
{
    this.type = EnumType.Int64;
    this.content = System.BitConverter.GetBytes(val);
}

public Content(double val)
{
    this.type = EnumType.Double;
    this.content = System.BitConverter.GetBytes(val);
}

// DateTime and TimeSpan are both stored as their 64-bit tick counts.
public Content(DateTime val)
{
    this.type = EnumType.DateTime;
    this.content = System.BitConverter.GetBytes(val.Ticks);
}

public Content(TimeSpan val)
{
    this.type = EnumType.TimeSpan;
    this.content = System.BitConverter.GetBytes(val.Ticks);
}

// Layout: [type name length][type name as UTF-8][bytes written by the object itself].
public Content(IDump val)
{
    this.type = EnumType.IDump;
    List<byte> ls = new List<byte>();
    int index = 0;
    byte[] encoded = System.Text.Encoding.UTF8.GetBytes(val.GetType().FullName);
    ls.AddRange(System.BitConverter.GetBytes(encoded.Length)); index += 4;
    ls.AddRange(encoded); index += encoded.Length;
    val.AppendToBytes(ref ls, ref index);
    this.content = ls.ToArray(); // store the assembled bytes (missing from the original listing)
}

// Bitmaps are stored as a complete BMP file image.
public Content(System.Drawing.Bitmap bmp)
{
    this.type = EnumType.Bmp;
    using (System.IO.MemoryStream ms = new System.IO.MemoryStream())
    {
        bmp.Save(ms, System.Drawing.Imaging.ImageFormat.Bmp);
        this.content = ms.ToArray();
    }
}

// Raw bytes are stored as-is (the array is not copied).
public Content(byte[] val)
{
    this.type = EnumType.RawBin;
    this.content = val;
}
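The constructors above only show the encoding direction. As a rough sketch of the reverse direction, the helpers below decode a payload produced by the corresponding constructor. ContentDecodeSketch and its method names are hypothetical. The real Content class presumably has its own accessors in the source code, which are not shown in this article.

```csharp
using System;
using System.Text;

// Hypothetical decoding helpers, for illustration only. Each method assumes
// the byte layout produced by the matching Content constructor.
public static class ContentDecodeSketch
{
    // String payloads are UTF-8 bytes.
    public static string ToText(byte[] content) { return Encoding.UTF8.GetString(content); }

    // Numeric payloads are the BitConverter representation of the value.
    public static int ToInt32(byte[] content) { return BitConverter.ToInt32(content, 0); }
    public static long ToInt64(byte[] content) { return BitConverter.ToInt64(content, 0); }
    public static double ToDouble(byte[] content) { return BitConverter.ToDouble(content, 0); }

    // DateTime and TimeSpan payloads are 64-bit tick counts.
    public static DateTime ToDateTime(byte[] content)
    {
        return new DateTime(BitConverter.ToInt64(content, 0));
    }

    public static TimeSpan ToTimeSpan(byte[] content)
    {
        return new TimeSpan(BitConverter.ToInt64(content, 0));
    }
}
```

Two details are worth keeping in mind. Storing only Ticks drops the DateTime.Kind (UTC or local), so the decoded value comes back with an unspecified kind. Also, BitConverter uses the byte order of the machine it runs on, so data exchanged between machines with different endianness would need an explicit byte order.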
3. Binary XML
Now we have the Content class to save binary data of any type. Next, we will build a hierarchical structure using Content as elements.
In XML, we have:
<Master ID="1" Name="William Shakespeare" Icon="image-src">
  <Works>
    <Work>Hamlet</Work>
    <Work>Othello</Work>
    <Work>King Lear</Work>
  </Works>
</Master>
First, we need a stronger type for ID. Let ID="1" take the stronger form of a 64-bit long integer. Second, we need the binary content of the Icon attribute embedded directly inside the XML, rather than an external link. Let's see how binary XML does it:
{Program.cs}
BinTree bt = new BinTree("Master");
bt["ID"] = new Content(1L);
bt["Name"] = new Content("William Shakespeare");
// Placeholder icon: a blank 100 x 100 bitmap filled with Aqua.
Bitmap bmp = new Bitmap(100, 100);
using (Graphics g = Graphics.FromImage(bmp))
{
    g.Clear(Color.Aqua);
}
bt["Icon"] = new Content(bmp);
// Child nodes: a "Works" node holding three "Work" elements.
BinTree bt_tags = bt.FindChildOrAppend("Works");
bt_tags.children.Add(new BinTree("Work") { content = new Content("Hamlet") });
bt_tags.children.Add(new BinTree("Work") { content = new Content("Othello") });
bt_tags.children.Add(new BinTree("Work") { content = new Content("King Lear") });
The line bt["ID"] = new Content(1L); specifies the long type using the L suffix, i.e., the value is stored in its 64-bit binary representation. Then, since we don't have an icon of William Shakespeare at hand, we paint a blank bitmap with an Aqua background. The icon bitmap is saved directly in the Icon attribute. If you like, you can also save it as the content of a separate child node, as we usually do in XML.
The sample code above also shows how the binary XML populates child nodes and manipulates collections. Miscellaneous tasks like saving, transferring, and displaying are just as simple:
{Program.cs}
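// Serialize the whole tree into a byte list (for saving or transferring).
List<byte> bin = new List<byte>();
int index = 0;
(bt as IDump).AppendToBytes(ref bin, ref index);
// Render the tree as text-based XML for display.
StringBuilder sb = new StringBuilder();
bt.PopulateXml(sb, 0, System.Environment.NewLine, " ");
Console.WriteLine(sb.ToString());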
The output is a text-based XML rendering derived from the binary XML. Since Icon is a binary bitmap, it is displayed as Icon = "{100 x 100}".
Points of Interest
Okay, we are almost done. What is left for you is to simplify your application. If you are using a lightweight relational database, consider moving it to a concise binary XML file. If you are designing a binary communication protocol, consider specifying its details in a markup-language format and transferring the entire binary XML data, without a lengthy byte-level specification.
Another issue is performance. Relational databases and RFC-style protocols usually involve many performance considerations. I have not discussed in this article how the binary XML is handled internally. You can find that in the source code. If you are interested, here are some tips:
- Text XML has to go through a lexer and parser. From the perspective of computational complexity, binary XML has an advantage thanks to the linear indexing mechanism implemented in the source code.
- When the binary XML grows to a certain scale, we can use multiple binary offsets (pointers) to access different addresses within the file in parallel. When dealing with I/O, we can also split the file or use similar parallelized methods. High-performance databases access physical files in much the same way.
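To illustrate the first tip, here is a sketch of how a tree can be written with length prefixes so that a reader walks the bytes in a single forward pass, with no tokenizing. This is not the BinTree implementation from the source code. SketchNode and its byte layout are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical node type, for illustration only.
// Assumed layout: [name length][name UTF-8][payload length][payload][child count][children...]
public class SketchNode
{
    public string Name = "";
    public byte[] Payload = new byte[0];
    public List<SketchNode> Children = new List<SketchNode>();

    // Appends this node and, recursively, all of its children to the output.
    public void Write(List<byte> output)
    {
        byte[] name = Encoding.UTF8.GetBytes(Name);
        output.AddRange(BitConverter.GetBytes(name.Length));
        output.AddRange(name);
        output.AddRange(BitConverter.GetBytes(Payload.Length));
        output.AddRange(Payload);
        output.AddRange(BitConverter.GetBytes(Children.Count));
        foreach (SketchNode child in Children)
        {
            child.Write(output);
        }
    }

    // Reads one node starting at index and advances index past it.
    // Every byte is visited once; no searching for delimiters is needed.
    public static SketchNode Read(byte[] bytes, ref int index)
    {
        SketchNode node = new SketchNode();
        int nameLength = BitConverter.ToInt32(bytes, index); index += 4;
        node.Name = Encoding.UTF8.GetString(bytes, index, nameLength); index += nameLength;
        int payloadLength = BitConverter.ToInt32(bytes, index); index += 4;
        node.Payload = new byte[payloadLength];
        Array.Copy(bytes, index, node.Payload, 0, payloadLength); index += payloadLength;
        int childCount = BitConverter.ToInt32(bytes, index); index += 4;
        for (int i = 0; i < childCount; i++)
        {
            node.Children.Add(Read(bytes, ref index));
        }
        return node;
    }
}
```

A variant that also stored each subtree's total byte length could skip whole branches without reading them, which is one possible reading of the "multiple offsets" idea in the second tip.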
History
- 21st September, 2015: Initial post