Introduction
First off: This article is made for intermediate to advanced programmers. If you don't understand generic classes or how to implement non-generic base-classes you either want to learn it first or try to understand this article (as I won't explain much how the generic pattern works). Same applies to things like serialization, XML and binary writing and reading.
You can download and use the source, though. It'd be nice if you mentioned this article if you use it, but I don't require it. Feel free to use the classes, it should provide anything you need and is perfectly easy to extend.
Arrays, Lists, Stacks, Queues and Trees all have something in common: They store data and can be enumerated. Unlike the first four, trees don't have a linear one-dimensional data structure. Each element can have various sub-elements. Enumeration of one-dimensional, linear collections is easy. This article will cover enumeration in a basic abstract tree. The implementation I will use is held abstract and simple and is far from a perfect solution. But it's a good start and can be easily extended with everything your project might require. As of version 3 of this article, I also implemented ISerializable
and a custom method of serializing/deserializing the tree as XML file, which both work with each kind of implementation. As of version 4 there is also a custom binary de-/serialization implementation which works exactly like XML. This implementation boosts performance and output size compared to legacy binary serialization. Legacy serialization is implemented for compatibility reasons but you probably want to avoid using it after you've seen the performance comparisions at the bottom of the article.
The whole thing consists of the following 8 lightweight classes:
Tree
Tree<T>
TreeBranch
DepthTreeEnumerator
DepthTreeEnumerator<T>
BreadthTreeEnumerator
BreadthTreeEnumerator<T>
This article is based upon .NET 4.5 and C# 6 language features. The code is not downwards compatible with older .NET versions since it uses IReadOnlyList<T>
which was introduced first in 4.5.
Table of contents
The tree we will implement is not based on interfaces. Most implementations are bound to interfaces, but I personally prefer base classes which provide some functionality which is useful for any kind of specific tree.
First, we create a class called Tree
. As simple as that. This class will hold the most basic functionality and thus keep the more generic implementation clean. This class sticks to some very simple conventions:
- The tree has a root element
Root
of base type TreeBranch
.
- The tree allows adding and removing of elements via two methods
AddBranch(TreeBranch)
and bool RemoveBranch(TreeBranch)
.
- The tree implements
IEnumerable<TreeBranch>
for enumeration.
- The tree implements
ISerializable
for binary serialization.
- The tree implements a
protected
serialization constructor: .ctor(SerializationInfo, StreamingContext)
which has to be inherited by every class derived from Tree
.
- The tree implements the
SerializableAttribute
.
- The tree implements methods for writing down and parsing a XML file of itself.
- The tree implements a method
WriteXml(string)
which serializes the tree into a XML file. This method is fixed and will not be overridden by any derived type.
- The tree implements a
static
method ParseXml(string)
that allows for re-creation of any tree from an XML file.
- The tree implements methods for writing down and parsing a binary file of itself.
- The tree implements a method
WriteBinary(string, Encoding)
which serializes the tree into a binary file. This method is fixed and will not be overriden by any derived type.
- The tree implements a
static
method ParseBinary(string, Encoding)
that allows for re-creation of any tree from a binary file.
- The tree has a property
EnumerationMode
that will allow for enumeration mode switching.
- (Optional) For debugging reasons (and maybe other practical applications), it has a method
Dump(string)
which allows creation of a text dump of the whole tree.
[Serializable]
public abstract class Tree : IEnumerable<TreeBranch>, ISerializable
{
private readonly TreeBranch _rootBranch;
public TreeBranch Root => _rootBranch;
public TreeEnumerationMode EnumerationMode { get; set; }
protected Tree(TreeBranch rootBranch, TreeEnumerationMode mode = TreeEnumerationMode.DepthFirst)
{
if (rootBranch == null) throw new ArgumentNullException(nameof(rootBranch));
_rootBranch = rootBranch;
rootBranch.Tree = this;
EnumerationMode = mode;
}
protected Tree(SerializationInfo info, StreamingContext context)
{
EnumerationMode = (TreeEnumerationMode) info.GetValue("EnumerationMode", typeof (TreeEnumerationMode));
_rootBranch = (TreeBranch) info.GetValue("Root", typeof (TreeBranch));
_rootBranch.Tree = this;
}
public void AddBranch(TreeBranch branch)
{
Root.AddBranch(branch);
}
public bool RemoveBranch(TreeBranch branch)
{
return Root.RemoveBranch(branch);
}
public virtual void Dump(string file)
{
using (FileStream fs = new FileStream(file, FileMode.OpenOrCreate))
{
using (StreamWriter writer = new StreamWriter(fs))
{
Root.Dump(writer, 0);
}
}
}
public IEnumerator<TreeBranch> GetEnumerator()
{
return EnumerationMode == TreeEnumerationMode.DepthFirst
? new DepthTreeEnumerator(Root, null)
: (IEnumerator<TreeBranch>) new BreadthTreeEnumerator(Root, null);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
[SecurityPermission(SecurityAction.Demand, SerializationFormatter = true)]
public void GetObjectData(SerializationInfo info, StreamingContext context)
{
info.AddValue("EnumerationMode", EnumerationMode);
info.AddValue("Root", Root);
}
public void WriteXml(string file)
{
XmlDocument document = new XmlDocument();
XmlElement docElement = document.CreateElement("Tree");
docElement.SetAttribute("Type", GetType().FullName);
docElement.SetAttribute("BranchType", Root.GetType().FullName);
docElement.SetAttribute("Mode", ((int) EnumerationMode).ToString());
document.AppendChild(docElement);
Root.WriteXml(document, docElement);
using (FileStream fs = new FileStream(file, FileMode.Create))
document.Save(fs);
}
public static Tree ParseXml(string file)
{
XmlDocument document = new XmlDocument();
using (FileStream fs = new FileStream(file, FileMode.Open))
document.Load(fs);
Type treeType = Type.GetType(document.DocumentElement?.GetAttribute("Type") ?? "", true);
Type branchType = Type.GetType(document.DocumentElement?.GetAttribute("BranchType") ?? "", true);
TreeEnumerationMode mode =
(TreeEnumerationMode) int.Parse(document.DocumentElement?.GetAttribute("Mode") ?? "");
TreeBranch branch = (TreeBranch) Activator.CreateInstance(branchType);
Tree tree = (Tree) Activator.CreateInstance(treeType, branch);
XmlElement rootElement = document.DocumentElement?.ChildNodes[0] as XmlElement;
branch.ParseXml(rootElement, null);
return tree;
}
public static TTree ParseXml<TTree>(string file) where TTree : Tree
{
return (TTree) ParseXml(file);
}
public void WriteBinary(string file, Encoding textEncoding = null)
{
using (FileStream fs = new FileStream(file, FileMode.Create, FileAccess.Write, FileShare.None))
{
BinaryWriter writer = new BinaryWriter(fs, textEncoding ?? Encoding.Unicode);
writer.Write(GetType().FullName);
writer.Write(Root.GetType().FullName);
writer.Write((byte) EnumerationMode);
Root.WriteBinary(writer);
}
}
public static Tree ParseBinary(string file, Encoding textEncoding = null)
{
Tree tree;
using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read))
{
BinaryReader reader = new BinaryReader(fs, textEncoding ?? Encoding.Unicode);
Type treeType = Type.GetType(reader.ReadString(), true);
Type branchType = Type.GetType(reader.ReadString(), true);
TreeEnumerationMode mode = (TreeEnumerationMode) reader.ReadByte();
TreeBranch branch = (TreeBranch) Activator.CreateInstance(branchType);
tree = (Tree) Activator.CreateInstance(treeType, branch);
tree.EnumerationMode = mode;
branch.ParseBinary(reader, null);
}
return tree;
}
public static TTree ParseBinary<TTree>(string file, Encoding textEncoding = null) where TTree : Tree
{
return (TTree) ParseBinary(file, textEncoding);
}
}
Because we want that our non-generic Tree
behaves equivalent to any generic version, we need to define these basics. The generic Tree<TBranch>
class will only override those properties and methods that use TreeBranch
as type. Using the EnumerationMode
property is necessary since foreach
loops do always call IEnumerable.GetEnumerator()
. This cannot be prevented nor can you give it the IEnumerator<T>
directly. So we're using this property which returns the corresponding IEnumerator<T>
when IEnumerable.GetEnumerator()
is called.
Our goal is to make some sort of serialization logic that accepts any tree type, not only specific types. Therefore, our method WriteXml(string)
creates a XmlDocument
which first element is called Tree
by convention. This element has two attributes: Type which is the full name of the tree's type and BranchType
which is the full name of the branch-type that is used in this tree. These are required to re-create the tree in our static
method ParseXml(string)
. After that, it calls the same method on our root branch. The WriteXml(XmlDocument, XmlElement)
method will be covered in the next part of the article.
In reverse with ParseXml(string)
, an XmlDocument
is being loaded. The attributes of the document node are being read out, a new TreeBranch
is being instantiated as well as a Tree
. These are trees and branches of the actual implementation, of course. It calls ParseXml(XmlElement, TreeBranch)
and provides the root branches' corresponding XmlElement
. More about the ParseXml(XmlElement, TreeBranch)
method is covered in the next part of the article.
We implement the same for binary files. With WriteBinary(string, Encoding)
and ParseBinary(string, Encoding)
we simply create a BinaryWriter
and de-/serialize the tree into/from a binary file.
As of version 4.1 of the article I also implemented two more neat parser methods: The generic versions ParseXml<TTree>(...)
and ParseBinary<TTree>(...)
. These are the easiest and best-to-use methods. Because we define that TTree
has to be of type Tree
we ensure that you can only deserialize specific tree implementations (as Tree
and Tree<T>
are abstract and thus cannot be directly used).
To be able to serialize the tree using a BinaryFormatter
and thus using the serialization utilities .NET gives us, we need three things:
- The class has to implement
ISerializable
which brings us to
- The class has to implement a method
GetObjectData(SerializationInfo, StreamingContext)
which writes down data as well as a serialization constructor .ctor(SerializationInfo, StreamingContext)
which reads data. The GetObjectData(...)
method must be marked with a SecurityPermission
attribute. For a more detailed explanation, see the MSDN article about Custom Serialization.
- The class has to be marked with the
SerializableAttribute
in order to be able to serialize this class.
In our serialization constructor, we simply read out the used enumeration mode and the root node. Because our TreeBranch
will implement the same stuff, we can simply read out one by using info.GetValue(string, Type)
. In our writer method, we do the same thing vice versa. As simple as that.
Every element of our tree will be called Branch
. Therefore, our base class will be TreeBranch
. This class sticks to the following conventions:
- The branch has a
protected List<TreeBranch>
which stores all branches of this current branch.
- The branch has a
public IReadOnlyList<TreeBranch>
which wraps the branch list as read-only collection (will be explained in a bit why).
- The branch has a property
Tree
which stores the instance of the belonging tree.
- The branch has a property
Parent
which stores the parent branch.
- The branch has a property
EnumerationMode
that allows for enumeration mode switching.
- The branch has two methods
AddBranch(TreeBranch)
and bool RemoveBranch(TreeBranch)
which allow for adding/removing of further branches.
- The branch implements
IEnumerable<TreeBranch>
for enumeration (so you can use
foreach
even on single branches, not just the tree).
- The branch implements
ISerializable
for binary serialization.
- The branch implements a
protected
serialization constructor: .ctor(SerializationInfo, StreamingContext)
which has to be inherited by every class derived from Tree
.
- The branch implements the
SerializableAttribute
.
- (Optional) The branch implements a method
Dump(StreamWriter, int)
which allows for data dumping.
[Serializable]
public abstract class TreeBranch : IEnumerable<TreeBranch>, ISerializable
{
protected List<TreeBranch> BranchList;
public Tree Tree { get; internal set; }
public TreeBranch Parent { get; private set; }
public TreeEnumerationMode EnumerationMode { get; set; }
public IReadOnlyList<TreeBranch> Branches => BranchList.AsReadOnly();
protected TreeBranch(TreeEnumerationMode mode = TreeEnumerationMode.DepthFirst)
{
BranchList = new List<TreeBranch>();
EnumerationMode = mode;
}
public void AddBranch(TreeBranch branch)
{
if (!BranchList.Contains(branch))
{
BranchList.Add(branch);
branch.Parent = this;
branch.Tree = Tree;
}
}
public bool RemoveBranch(TreeBranch branch)
{
return BranchList.Remove(branch);
}
public void Dump(string file)
{
using (FileStream fs = new FileStream(file, FileMode.OpenOrCreate))
{
using (StreamWriter writer = new StreamWriter(fs))
{
Dump(writer, 0);
}
}
}
public abstract void Dump(StreamWriter writer, int level);
protected string TabFromLevel(int level)
{
string tabs = "";
for (int i = 1; i <= level; i++)
tabs += '\t';
return tabs;
}
public IEnumerator<TreeBranch> GetEnumerator()
{
return EnumerationMode == TreeEnumerationMode.DepthFirst
? new DepthTreeEnumerator<TreeBranch>(this, null)
: (IEnumerator<TreeBranch>) new BreadthTreeEnumerator<TreeBranch>(this, null);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
[SecurityPermission(SecurityAction.Demand, SerializationFormatter = true)]
public abstract void GetObjectData(SerializationInfo info, StreamingContext context);
public abstract void WriteXml(XmlDocument document, XmlElement parent);
public abstract void ParseXml(XmlElement element, TreeBranch parent);
public abstract void WriteBinary(BinaryWriter writer);
public abstract void ParseBinary(BinaryReader reader, TreeBranch parent);
}
We're using read-only collections because we require that every TreeBranch
has defined a Tree
to which it belongs as well as a parent branch (except for the root branch of a tree). These will be set (better inherited) in our AddBranch()
and RemoveBranch()
methods. So we force using these methods by providing only a read-only collection of all branches to the public.
The Dump()
method takes in a StreamWriter
as well as an int
. The method basically writes the current branch's data on a new line which is indented by level
. The resulting text file is a visual representation of the tree (or branch) which was dumped. TabFromLevel(int)
simply constructs a string
with x (level
) tab chars ('\t'). Sidenote: I fixed this method - in Version 1 of this article, the method had a little bug (first level was not indented).
All five of our serialization methods are made abstract
. This is because we cannot say which kind of data and how much any implementation actually requires to write / read. These methods provide a base framework, though, which allows any implementation to read and write whatever it requires while our base class Tree
still does most work automatically. A serialization constructor is not necessary in our TreeBranch
class because unlike our Tree
it does not have to read anything. The abstract
branch contains no data that makes sense to be saved or loaded. Our Tree
otherwise does deserialize the root branch and the enumeration mode. Because of inheritance, it does load the correct type as any implementation is of type TreeBranch
and .NET stores the data type it writes down (you will probably see the type name if you open the binary file with a text editor). Our custom implementation of binary serialization does the same - though it does save it only once. The Tree
base-class writes it similiar to XML at the very beginning of the file. Because our tree has a fixed branch type, it is enough to save it once. That's the difference between .Net serialization and custom serialization. No overhead.
Now, after we've built our abstract
base tree and an abstract
branch type, we will implement the generic version of our Tree
which now allows maximum flexibility. The generic Tree<TBranch>
class only overrides certain methods and properties by using the new
keyword. Additionally, it implements IEnumerable<TBranch>
. So with this tree, you can iterate through its elements by using the base type TreeBranch
or the generic type TBranch
.
[Serializable]
public abstract class Tree<TBranch> : Tree, IEnumerable<TBranch> where TBranch : TreeBranch
{
public new TBranch Root => (TBranch) base.Root;
protected Tree(TBranch root, TreeEnumerationMode mode = TreeEnumerationMode.DepthFirst) : base(root, mode)
{
}
protected Tree(SerializationInfo info, StreamingContext context) : base(info, context)
{
}
public void AddBranch(TBranch branch)
{
Root.AddBranch(branch);
}
public bool RemoveBranch(TBranch branch)
{
return Root.RemoveBranch(branch);
}
public new IEnumerator<TBranch> GetEnumerator()
{
return EnumerationMode == TreeEnumerationMode.DepthFirst
? new DepthTreeEnumerator<TBranch>(Root, null)
: (IEnumerator<TBranch>) new BreadthTreeEnumerator<TBranch>(Root, null);
}
public new static Tree<TBranch> ParseXml(string file)
{
return (Tree<TBranch>) Tree.ParseXml(file);
}
public new static Tree<TBranch> ParseBinary(string file, Encoding textEncoding = null)
{
return (Tree<TBranch>) Tree.ParseBinary(file, textEncoding);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
As you might have already seen, we have the same scheme with our enumerators. These will be next to be implemented. The tree does have to implement a serialization constructor, because our base class Tree
contains logic in its serialization constructor. So we need to chain the constructors. Any derived type needs to call the base-constructor in order to successfully deserialize the whole tree. Moreover we do override the parser methods with new
so we have methods that give back the right type when using for example Tree<FileSystemTreeBranch>.ParseXml(...)
.
public enum TreeEnumerationMode
{
DepthFirst,
BreadthFirst
}
This enumeration should be pretty much self-explanatory.
Once again, we have a non-generic base enumerator which does most of the work. It is compatible to our abstract
base implementation Tree
. This enumerator iterates depth-first (see: Wikipedia 'Depth-first search').
1 public class DepthTreeEnumerator : IEnumerator<TreeBranch>
2 {
3 protected TreeBranch Leaf { get; set; }
4 protected DepthTreeEnumerator SubEnumerator { get; private set; }
5 private DepthTreeEnumerator ParentEnumerator { get; set; }
6 private int _currentIndex;
7
8 public DepthTreeEnumerator(TreeBranch leaf, DepthTreeEnumerator parent)
9 {
10 Leaf = leaf;
11 ParentEnumerator = parent;
12 }
13
14 public bool MoveNext()
15 {
16 if (SubEnumerator != null) return SubEnumerator.MoveNext();
17
18 if (Current.Branches.Count == 0) return ParentEnumerator.IndicateEndOfLevel();
20
21 SubEnumerator = new DepthTreeEnumerator(Current.Branches[0], this);
23 return true;
24 }
25
26 private bool IndicateEndOfLevel()
27 {
28 SubEnumerator = null;
29 if (Leaf.Branches.Count <= _currentIndex + 1)
31 return ParentEnumerator != null && ParentEnumerator.IndicateEndOfLevel();
32 SubEnumerator = new DepthTreeEnumerator(Leaf.Branches[++_currentIndex], this);
33 return true;
34 }
35
36 public void Reset()
37 {
38 _currentIndex = 0;
39 SubEnumerator?.Dispose();
40 SubEnumerator = null;
41 }
42
43 public TreeBranch Current => SubEnumerator != null ? SubEnumerator.Current : Leaf;
44
45 object IEnumerator.Current => Current;
46
47 public void Dispose()
48 {
49 SubEnumerator?.Dispose();
50 SubEnumerator = null;
51 ParentEnumerator = null;
52 }
53 }
Now this enumerator works by basically creating a tree on its own. Each enumerator has a SubEnumerator
and a ParentEnumerator
. Each enumerator is allocated towards a certain leaf in the tree. The _currentIndex
defines on which of the sub-branches the enumeration pointer currently lies at.
The system is rather easy. There are two cases which can occur:
SubEnumerator
is null
then there are no more sub-branches on the current branch (enumerator)
SubEnumerator
is not null
then there are sub-branches and the current element is the current of the sub-enumerator (inheritance).
The enumeration will be started by calling GetEnumerator
on a branch or the tree. The tree anon does call the method on its root branch. The first element is pointed at (depth level 1, branch 1). Once you call MoveNext()
(which is being done by foreach
automatically), the topmost enumerator checks if SubEnumerator
is null
(line 16). If it is null
and the current (root) branch has further sub-branches, a sub-enumerator is being instantiated (line 22). This one now points to the first branch on the next level.
Another call to MoveNext
is now passed through to the sub-enumerator (line 16). This anon has no sub-enumerator (in the current cycle at least). We pretend the branch on level 2 has no sub-branches. The IndicateEndOfLevel()
-method is being called on its parent enumerator (enumerator of level 1; line 19). This call now sets the sub-enumerator of the level 1 enumerator to null
. It then checks if the current branch (level 1, branch 1) has another branch on the same level. If so, CurrentIndex
is being increased by one and another sub-enumerator gets instantiated for branch 2 on level 1. This continues until the last branch, we pretend now it is branch 2 on level 1, has no more branches on the same level. If ParentEnumerator
is null
, no more IndicateEndOfLevel()
call is possible (line 30). This indicates that all branches have been iterated through, MoveNext()
returns false
and therefore ends the enumeration process.
Information: I currently don't have the chance to create some kind of visualization for this description. I know it is a little bit hard to understand, but you will eventually get it by analyzing the code (which is not that much and difficult) and reading the description. If someone could provide some graphics for the states described, I'd be thankful. Nonetheless, I reference wikipedia articles about depth- and breadth-first search algorithms which should explain it perfectly.
public sealed class DepthTreeEnumerator<TBranch> : DepthTreeEnumerator, IEnumerator<TBranch> where TBranch : TreeBranch
{
public DepthTreeEnumerator(TBranch leaf, DepthTreeEnumerator parent) : base(leaf, parent)
{
}
public new TBranch Current => (TBranch) base.Current;
}
This generic version of the enumerator allows enumeration of the generic Tree<TBranch>
. Once again, it simply wraps TreeBranch
properties with the actual generic type. The only required here is the Current
property which again just uses the base class' data.
This enumerator has the same structure as our DepthTreeEnumerator
. The only difference is that this one works breadth-first (see Wikipedia 'Breadth-first search').
public class BreadthTreeEnumerator : IEnumerator<TreeBranch>
{
protected TreeBranch Leaf { get; set; }
protected BreadthTreeEnumerator SubEnumerator { get; private set; }
private BreadthTreeEnumerator ParentEnumerator { get; set; }
private int _currentIndex = -1;
public BreadthTreeEnumerator(TreeBranch leaf, BreadthTreeEnumerator parent)
{
Leaf = leaf;
ParentEnumerator = parent;
}
public bool MoveNext()
{
if (SubEnumerator != null) return SubEnumerator.MoveNext();
if (Leaf.Branches.Count > _currentIndex + 1)
{
_currentIndex++;
return true;
}
_currentIndex = -1;
return PointToNext();
}
private bool PointToNext()
{
int start = _currentIndex < 0 ? 0 : _currentIndex;
for (_currentIndex = start + 1; _currentIndex < Leaf.Branches.Count; _currentIndex++)
{
if (Leaf.Branches[_currentIndex].Branches.Count <= 0) continue;
SubEnumerator = new BreadthTreeEnumerator(Leaf.Branches[_currentIndex], this);
SubEnumerator._currentIndex = 0;
return true;
}
SubEnumerator = null;
return ParentEnumerator != null && ParentEnumerator.PointToNext();
}
public void Reset()
{
_currentIndex = -1;
SubEnumerator = null;
}
public TreeBranch Current =>
SubEnumerator != null ? SubEnumerator.Current : Leaf.Branches[_currentIndex];
object IEnumerator.Current => Current;
public void Dispose()
{
SubEnumerator?.Dispose();
SubEnumerator = null;
ParentEnumerator = null;
}
}
This enumerator has slight differences, though. First our _currentIndex
starts at -1
. This is because the first call to MoveNext()
iterates already through the branches of the current level. DepthTreeEnumerator
first creates SubEnumerator
. First in the next cycle, when the current SubEnumerator
has no more sub-branches, it goes up the tree and increments _currentIndex
.
Basically, the breadth enumerator works like this: Each enumerator cycles through the branches on its current level. Once through, _currentIndex
is being reset to -1
and PointToNext()
is being called. PointToNext()
uses the _currentIndex
and cycles through all branches on the current
level and checks if those branches have sub-branches. This way, it skips branches that are a dead end. If there are no more branches left, it either calls ParentEnumerator.PointToNext()
or returns false
. The latter occurs only when we are at the topmost branch which ends the enumeration. Important here is that newly created sub-enumerators have their index set to 0
. This is because after the cycle that creates a new sub-enumerator, the Current
property already points at this enumerator. If it is set to -1
, it would return the first file (we would have to implement a check in the Current
-property, though, which returns the 0-element when the index is -1
). The next cycle would increase the index to 0
and points again to the same branch. The first entries are doubled. Sidenote: Fix for Version 2 of the article.
public class BreadthTreeEnumerator<T> : BreadthTreeEnumerator,
IEnumerator<T> where T : TreeBranch
{
public BreadthTreeEnumerator(TreeBranch leaf,
BreadthTreeEnumerator parent) : base(leaf, parent)
{
}
public T Current => (T)base.Current;
}
Now if we want to implement binary serialization based on the .NET ISerializable
-interface, we need to actually implement some kind of TreeBranch
. I will use the FileSystemTreeBranch
that is being used in the sample application. The class looks like that:
[Serializable]
public sealed class FileSystemTreeBranch : TreeBranch
{
public string Name { get; private set; }
public bool IsDirectory { get; private set; }
public new Tree<FileSystemTreeBranch> Tree => (Tree<FileSystemTreeBranch>) base.Tree;
public FileSystemTreeBranch()
{
}
public FileSystemTreeBranch(string name, bool isDirectory)
{
Name = name;
IsDirectory = isDirectory;
}
private FileSystemTreeBranch(SerializationInfo info, StreamingContext context)
{
}
public override void Dump(StreamWriter writer, int level)
{
writer.WriteLine($"{TabFromLevel(level)}{Name} |
" + (IsDirectory ? "Directory" : "File"));
foreach (var branch in Branches) branch.Dump(writer, level + 1);
}
[SecurityPermission(SecurityAction.Demand, SerializationFormatter = true)]
public override void GetObjectData(SerializationInfo info, StreamingContext context)
{
}
public override void WriteXml(XmlDocument document, XmlElement parent)
{
}
public override void ParseXml(XmlElement element, TreeBranch parent)
{
}
public override void WriteBinary(BinaryWriter writer)
{
}
public override void ParseBinary(BinaryReader reader, TreeBranch parent)
{
}
public override string ToString()
{
return Name;
}
}
Now, if we want to serialize our branch type, we need to implement the GetObjectData(SerializationInfo, StreamingContext)
method. This will be our implementation:
[SecurityPermission(SecurityAction.Demand, SerializationFormatter = true)]
public override void GetObjectData(SerializationInfo info, StreamingContext context)
{
info.AddValue("Name", Name);
info.AddValue("IsDirectory", IsDirectory);
info.AddValue("Branches", BranchList.Count);
for (int i = 0; i < Branches.Count; i++)
info.AddValue("Branch" + i, Branches[i]);
}
First, we set our name and the IsDirectory
property, then the amount of branches this one has. It is not as easy as just serializing the list itself, so we have to improve a little bit. Once we have written down the amount of branches, we iterate them and write them down. Once again, as our implementation is serializable, we can just pass through the instance itself. This is basically a recursive method. The objects are named like 'Branch0
', 'Branch1
', and so on.
If we want to deserialize it, we need the constructor:
private FileSystemTreeBranch(SerializationInfo info, StreamingContext context)
{
Name = info.GetString("Name");
IsDirectory = info.GetBoolean("IsDirectory");
int branchCount = info.GetInt32("Branches");
if (branchCount == 0) return;
for (int i = 0; i < branchCount; i++)
{
FileSystemTreeBranch branch = (FileSystemTreeBranch)info.GetValue
("Branch" + i, typeof(FileSystemTreeBranch));
AddBranch(branch);
}
}
First, we read our properties, then the amount of branches we have and eventually we run a for
-loop and read out all sub-branches. Once again, this is basically recursive. The constructor can be private
. The formatter does work with any constructor with this exact signature, whatever accessor it has.
In order to write down an XML file of our whole tree, the base class Tree
creates a XmlDocument
and calls WriteXml(XmlDocument, XmlElement)
on the root branch. Now, we implement this abstract
method:
public override void WriteXml(XmlDocument document, XmlElement parent)
{
XmlElement element = document.CreateElement(IsDirectory ? "Directory" : "File");
element.SetAttribute("Name", Name);
foreach (var branch in Branches)
branch.WriteXml(document, element);
parent.AppendChild(element);
}
First, we create a new element with our document
which we call either "File
" or "Directory
" depending on the type. We set the name as attribute and then iterate through all sub-branches. We recursively work us through the tree by calling WriteXml(...)
on the sub-branches. Last but not least, we add the element to the parent one. Using this on the sample folder in the sample gives us this neat little file:
<Tree Type="TreeIterator.FileSystemTree"
BranchType="TreeIterator.FileSystemTreeBranch">
<Directory Name="Test Folder">
<File Name="Level 1 Document 1.txt" />
<File Name="Level 1 Document 2.txt" />
<File Name="Level 1 Document 3.txt" />
<File Name="Level 1 Document 4.txt" />
<Directory Name="Level 1 Folder 1">
<File Name="Level 1 Folder 1 Document 1.txt" />
<File Name="Level 1 Folder 1 Document 2.txt" />
<Directory Name="Level 1 Folder 1 Folder 1">
<File Name="Level 1 Folder 1 Folder 1 Document 1.txt" />
<File Name="Level 1 Folder 1 Folder 1 Document 2.txt" />
<File Name="Level 1 Folder 1 Folder 1 Document 3.txt" />
</Directory>
</Directory>
<Directory Name="Level 1 Folder 2">
<File Name="Level 1 Folder 2 Document 1.txt" />
<File Name="Level 1 Folder 2 Document 2.txt" />
<File Name="Level 1 Folder 2 Document 3.txt" />
</Directory>
</Directory>
</Tree>
In order to re-create a whole tree out of this file, we need to implement the ParseXml(XmlElement, TreeBranch)
method:
public override void ParseXml(XmlElement element, TreeBranch parent)
{
Name = element.GetAttribute("Name");
IsDirectory = element.Name == "Directory";
foreach (XmlElement sub in element.ChildNodes.OfType<XmlElement>())
{
FileSystemTreeBranch branch = new FileSystemTreeBranch();
branch.ParseXml(sub, this);
}
parent?.AddBranch(this);
}
We just read out the name, set the IsDirectory
property and then iterate through all elements of this one, which is equal to sub-branches. Important here is that we use the LINQ-method OfType<XmlElement>()
. Else if you would insert comments or other XML node types, it would throw an InvalidCastException
. We create a branch for each element and call the ParseXml(...)
method on the newly created branch, passing through the current branch as parent. Last but not least, we add the current to the parent. Here, it is important that we use the null
propagation operator (?.
) because parent can be null
for the root node.
In order to serialize a branch, we need to implement WriteBinary(writer)
. Because binary serialization is sequential, we do not need any parent element. The tree will be "morphed" into a one-dimensional format.
public override void WriteBinary(BinaryWriter writer)
{
writer.Write(Name);
writer.Write(IsDirectory);
writer.Write(Branches.Count);
foreach (var branch in Branches)
branch.WriteBinary(writer);
}
What's important here is that we don't forget to serialize the amount of branches contained. We cannot get this information anymore afterwards. Wrong values would end fatally with an exception or reconstruction of only a part of the actual tree.
public override void ParseBinary(BinaryReader reader, TreeBranch parent)
{
Name = reader.ReadString();
IsDirectory = reader.ReadBoolean();
int branches = reader.ReadInt32();
for (int i = 0; i < branches; i++)
{
FileSystemTreeBranch branch = new FileSystemTreeBranch();
branch.ParseBinary(reader, this);
}
parent?.AddBranch(this);
}
The parser method does the same vice-versa. We read out the necessary information and set it again. We get the number of branches that are children of this one, create a new branch and call the ParseBinary(...)
method on it, providing the current instance this
as parent. After reading all sub-branches, we add it to the parent. Once again: Null propagation as the root node won't have any parent.
First off, you need to implement a tree that inherits from either Tree
or Tree<T>
. Then, you can use it like in the sample project:
tree.EnumerationMode = TreeEnumerationMode.BreadthFirst;
foreach (var branch in tree)
{
Console.WriteLine(branch);
}
tree.EnumerationMode = TreeEnumerationMode.DepthFirst;
foreach (var branch in tree)
{
Console.WriteLine(branch);
}
Serialization
BinaryFormatter formatter = new BinaryFormatter();
using (FileStream fs = new FileStream(Path.Combine(Environment.CurrentDirectory, "tree.legacybin"), FileMode.OpenOrCreate))
{
formatter.Serialize(fs, tree);
}
Deserialization
FileSystemTree fsTree;
using (FileStream fs = new FileStream(file, FileMode.Open))
fsTree = (FileSystemTree) formatter.Deserialize(fs);
Serialization
tree.WriteXml(Path.Combine(Environment.CurrentDirectory, "tree.xml"));
Deserialization
FileSystemTree fsTree = (FileSystemTree) Tree.ParseXml(file);
Serialization
tree.WriteBinary(Path.Combine(Environment.CurrentDirectory, "tree.bin"));
Deserialization
fsTree = (FileSystemTree)Tree<FileSystemTreeBranch>.ParseBinary(file);
Sidenote: You can use both classes' parsing methods:
fsTree = (FileSystemTree)Tree.ParseBinary(file);
is equal to
fsTree = (FileSystemTree)Tree<FileSystemTreeBranch>.ParseBinary(file);
is equal to
fsTree = Tree.ParseBinary<FileSystemTree>(file);
It is recommended though, that you use the Tree.Parse*<TTree>(...)
or Tree.Parse*(...)
methods as those only cast once. The Tree<T>.Parse*(...)
version casts, in worst case, twice (once to Tree<T>
and that form you will likely cast to CustomTreeImplementation
). This method exists for those rare usage cases where you actually have generic base-definition, e.g. in an interface which can be used by certain at compile-time unknown classes who all use some kind of Tree<MyDataBranch>
. You've seen that most logic is placed inside TreeBranch
implementations, not in the tree.
For more details, download the sample project.
I haven't tested the performance yet. In the test application, I've built in a simple test, but no real performance test. If anyone likes to do a test on it, I'd be glad to add your results here. What I've seen so far really depends on the size of the tree. Using the sample folder (very small - 15 items), the results are as follows:
- Iteration Speeds
- Breadth-first: ~ 2.454 ms (cold) / 0.021 ms (warm)
- Depth-first: ~ 1.111 ms (cold) / 0.008 ms (warm)
- Output Sizes
- Dump: 1 KB
- Legacy binary: 3 KB
- XML: 1 KB
- Binary: 2 KB
- Serialization Speeds
- Dump: ~ 3.880 ms
- Legacy binary: ~ 1.530 ms
- XML: ~ 11.31 ms
- Binary: ~ 3.351 ms
- Deserialization Speeds
- Legacy binary: ~ 1.960 ms (cold) / ~ 0.151 ms (warm) / ~ 300 KB memory
- XML: ~ 4.544 ms (cold ) / ~ 0.228 ms (warm) / ~ 300 KB memory
- Binary: ~ 1.980 ms (cold) / ~ 0.168 ms (warm) / ~ 100 KB memory
Whereas a structure like my whole system drive (664093 items) results in these timings:
- Iteration Speeds
- Breadth-first: ~ 17.56 ms (cold) / 13.72 ms (warm)
- Depth-first: ~ 60.45 ms (cold) / 59.59 ms (warm)
- Output Sizes
- Dump: 28.133 KB / ~ 27.4 MB
- Legacy binary: 108.041 KB / ~ 105 MB
- XML: 44.190 KB / ~ 43.1 MB
- Binary: 36.161 KB / ~ 35.3 MB
- Serialization Speeds
- Dump: ~ 323.8 ms (~ 0.324 s)
- Legacy binary: ~ 5161 ms (~ 5.161 s)
- XML: ~ 1425 ms (~ 1.425 s)
- Binary: ~ 203.5 ms (~ 0.203 s)
- Deserialization Speeds
- Legacy binary: ~ 3538 * 10^1 ms [~ 35.38 s] (cold) / ~ 3677 * 10^1 ms [~ 36.77 s] (warm) / ~ 1.328 MB memory
- XML: ~ 3547 ms [~ 3,547 s] (cold) / ~ 3582 ms [~ 3.582 s] (warm) / ~ 268 MB memory
- Binary: ~ 2947 ms [~ 2.947 s] (cold) / ~ 2.990 [~ 2.990 s] (warm) / ~ 125 MB memory
Sidenote: Enumeration timings were wrong previously to article version 3.1. Console.WriteLine()
was distorting the results, of course. Performance results should be a lot more accurate now (but still no thorough, correct performance test).
Conclusion
If you don't require to iterate depth-first, then you should probably stick to breadth-first, as it is way fast, especially with big trees >100.000 items.
The best choice for saving any tree is probably the custom binary implementation. Not only does it beat all other implementations in output size but also in serialization and deserialization speeds. And this not just slightly as you can see in the charts (based on the data above):
(Lower is better)
(Lower is better / cold timings)
Info: Cold timings because you probably will rarely deserialize in warm state. Moreover the ratios are nearly the same.
Using these classes, you can implement any kind of tree-based data structure. The sample I provided demonstrates this with an implementation of a file system tree.
History
- 18th January, 2016: Version 1
- Update: Information about compatibility (.NET Framework) and used C# language version; Conclusion about
TreeEnumerator
functionality
- 19th January, 2016: Version 2
- Added breadth-first search
- Fixed a bug in
TreeBranch.TabFromLevel()
- Added wikipedia links for depth-first and breadth-first searches (as explanation for how these enumerators work)
- Improved test application (with sample folder, output and timings)
- 19th January, 2016: Version 3
- Fixed a bug in the
BreadthTreeEnumerator
that made the first sub-branch of a deeper branch being enumerated twice
- Added
ISerializable
interface and custom XML serialization method
- Minor refactoring (accessors, overrides, ... that were unnecessary)
- 19th January, 2016: Version 3.1
- Added
SecurityPermission
on ISerializable
classes where it was missing, made serialization constructors private
- Updated
DepthTreeEnumerator<T>
as some parts where redundant
- Added some more detailed performance results (warm and cold)
- Updated performance tests and results (made a big mistake, results were totally wrong - sorry for that)
- 19th January, 2016: Version 4
- Added information in introduction
- Added Table of contents
- Added custom binary implementation
- Added overrides for parsing methods to
Tree<T>
- Added more detailed performance tests and charts
- 20th January, 2016: Version 4.1
- Added generic parser methods to
Tree
- Expanded deserialization code usage description
- Minor changes