Introduction
In the course of my work, I had to process user error reports for one of our company's products. These reports included call stack information intended to help us find the causes of errors.
We run an obfuscation tool on our production code, so the call stack in an error report requires some “hopping around” with the obfuscation map and manual text search. This “hopping” is not always easy: the obfuscation map is a huge XML file of more than 25 MB, and most text editors do not cope well with that much data. That is understandable, since a typical hand-written file rarely exceeds 1 MB.
Things get worse when you need syntax highlighting, let alone XML tree parsing and navigation. Another big problem is that many XML elements share the same obfuscated name, so to tell what kind of item each one is, you have to inspect its parent elements manually.
Task definition
Facing these problems, I decided to help our support team by automating the name resolving process. The automation task requirements were the following:
- The tool should find original class or class member names from an obfuscated name.
- The UI should be as simple as possible – I think that simple tasks should not require complex user manipulations.
- It should use LINQ to XML, a convenient and easy way to handle XML data. Perhaps the main factor, though, was that I finally had a chance to use this technology in practice.
Let’s move to the concrete steps. First, we define the input data.
Here is an example call stack:
Type: System.ArgumentException
Stack:
at System.ThrowHelper.ThrowArgumentException(ExceptionResource resource)
at System.Collections.Generic.Dictionary.Insert(TKey key, TValue value, Boolean add)
at ne.c(IError A_0)
at ne.c(ErrorListEventArgs A_0)
at ne.c.a()
at ne.c(Object A_0, EventArgs A_1)
at System.Windows.Forms.Timer.OnTick(EventArgs e)
at System.Windows.Forms.Timer.TimerNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
Our company uses Dotfuscator, whose Community Edition ships with Visual Studio. Obfuscation maps have the same format in all Dotfuscator editions, so anyone can try this name resolving tool on their own code. The obfuscation map is an XML file with the following structure:
<dotfuscatorMap version="1.1">
<header />
<mapping>
<module>
<name>ModuleName.dll</name>
<type />
...
<type/>
</module>
</mapping>
<statistics />
</dotfuscatorMap>
The <type> element has the following structure:
<type>
<name>type_name</name>
<newname>obfuscated_name</newname>
<methodlist>
<method>
<signature>void(object, System.EventArgs)</signature>
<name>method_name</name>
<newname>obfuscated_name</newname>
</method>
...
</methodlist>
<fieldlist>
<field>
<signature>System.Windows.Forms.Button</signature>
<name>field_name</name>
<newname>obfuscated_name</newname>
</field>
...
</fieldlist>
</type>
Note that the obfuscated name is always placed in an optional <newname> element. If this element is omitted, the item keeps its original name.
Next, we should define the user input. Suppose we need to find a type with the obfuscated name “a”. Usually, we would search for the string “<newname>a</newname>”, but this finds all the types, methods, and fields with the obfuscated name ‘a’, which in complex projects means several thousand results. To narrow the search down to types, we have to check whether the parent of each match is a <type> element.
Thus, the user normally provides two parameters: the path to the obfuscation map file and the obfuscated name. There is also a third, implicit parameter, the kind of item to find (type, method, or field), but we will try to infer it from the obfuscated name. Given the UI simplicity requirement, I think this is enough.
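The two explicit parameters could come straight from the command line. The sketch below is only an illustration: the ToolArguments class, its Parse method, and the “resolver” usage text are hypothetical names, and inferring the kind of item to search for is deliberately not attempted here.

```csharp
using System;
using System.IO;

// Hypothetical holder for the two explicit parameters described above:
// the obfuscation map path and the obfuscated name to resolve.
public sealed class ToolArguments
{
    public string MapPath { get; }
    public string ObfuscatedName { get; }

    private ToolArguments(string mapPath, string obfuscatedName)
    {
        MapPath = mapPath;
        ObfuscatedName = obfuscatedName;
    }

    // Expects exactly two arguments: <map file> <obfuscated name>.
    // Prints a usage line and returns null when the input is unusable.
    public static ToolArguments Parse(string[] args, TextWriter error)
    {
        if (args.Length != 2 || string.IsNullOrWhiteSpace(args[1]))
        {
            error.WriteLine("Usage: resolver <map file> <obfuscated name>");
            return null;
        }
        // Names copied from a call stack often carry stray spaces.
        return new ToolArguments(args[0], args[1].Trim());
    }
}
```

Keeping the input to two positional arguments matches the “as simple as possible” UI requirement; anything more elaborate can be added later without changing the search code.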
Type name search
The first task is to find an original type name from its obfuscated name. First, we need to list all types in the map file. This part is easy:
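using System.Linq;
using System.Xml.Linq;
XElement map = XElement.Load("sample.xml");   // loads the entire map into memory
var types = map.Descendants("type");           // every <type> element, at any depth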
The main operation here is the map.Descendants("type") call, which returns all the <type> elements within the XElement content.
The Descendants() method returns a flat collection of descendant XML elements: children, grandchildren, and so on. So, if we write map.Descendants(), we get an enumeration of all XML elements in the map document. The method also has an overload that filters the output by element name; I used this overload to filter out all elements except <type>.
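A small experiment shows the difference between the unfiltered overload and the filtered one. The XML string below is a hypothetical, trimmed-down map, and DescendantsDemo is an illustrative name.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DescendantsDemo
{
    // A trimmed-down, hypothetical map: one module containing two types.
    private const string Xml =
        "<dotfuscatorMap><mapping><module><name>App.dll</name>" +
        "<type><name>App.Form1</name><newname>a</newname></type>" +
        "<type><name>App.Helper</name></type>" +
        "</module></mapping></dotfuscatorMap>";

    // Counts every element below the root, at any depth
    // (mapping, module, name, type, newname, ...).
    public static int CountAll() => XElement.Parse(Xml).Descendants().Count();

    // Counts only the elements named "type".
    public static int CountTypes() => XElement.Parse(Xml).Descendants("type").Count();
}
```

Here CountAll() returns 8 and CountTypes() returns 2. Descendants() does not include the element it is called on, which is why the root <dotfuscatorMap> is not counted; DescendantsAndSelf() would include it.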
Note: The filter is an XName, so it must be fully qualified: if the elements you are looking for belong to a namespace, the filter name must include that namespace too.
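The map structure shown above declares no namespace, so plain strings work. For a document that does declare a default namespace, the filter has to be built from an XNamespace. The sketch below uses an invented namespace URI (urn:example:map) purely for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Hypothetical document whose elements live in a default namespace.
    private const string Xml =
        "<map xmlns='urn:example:map'><type/><type/></map>";

    // A plain string converts to an XName with no namespace,
    // so it matches nothing in this document.
    public static int CountWithLocalName() =>
        XElement.Parse(Xml).Descendants("type").Count();

    // Combining the namespace with the local name yields the fully
    // qualified XName "{urn:example:map}type", which does match.
    public static int CountWithQualifiedName()
    {
        XNamespace ns = "urn:example:map";
        return XElement.Parse(Xml).Descendants(ns + "type").Count();
    }
}
```

CountWithLocalName() returns 0 while CountWithQualifiedName() returns 2, which is exactly the pitfall the note warns about.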
Note: Keep in mind that Descendants uses deferred execution: the underlying XML is actually traversed when you first enumerate the Descendants result, not when you call the method.
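A minimal sketch of this behavior: the query is created before an element is added, yet the added element is counted, because the tree is only walked during enumeration. DeferredDemo is an illustrative name.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DeferredDemo
{
    // Builds the query first, changes the tree, and only then enumerates.
    // Returns the number of <type> elements the query actually sees.
    public static int CountAfterModification()
    {
        XElement module = XElement.Parse("<module><type/></module>");

        // Nothing is scanned yet: this only describes the query.
        var types = module.Descendants("type");

        // Modify the tree after the query was created.
        module.Add(new XElement("type"));

        // The tree is walked now, so the added element is included.
        return types.Count();
    }
}
```

The method returns 2, not 1. In practice this means that a Descendants result enumerated several times is re-evaluated each time; calling ToList() once caches it.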
map.Descendants("type") scans the whole XML tree for elements with the specified name. This is not the most efficient solution, but it is the simplest one. Navigating directly along the known element path avoids scanning the whole tree and is faster. For example, we can use this expression:
var types = map.Elements("mapping").Elements("module").Elements("type");
Depending on the XML content, this expression can be roughly ten times faster than the Descendants call. For this application, however, I prefer the simplicity of Descendants.
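On a map with the structure shown earlier, both expressions should return the same <type> elements; they differ only in how much of the tree is visited. The sketch below checks this on a small hypothetical map (NavigationDemo is an illustrative name); it does not measure performance.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NavigationDemo
{
    // Hypothetical map with the structure shown earlier: two modules.
    private const string Xml =
        "<dotfuscatorMap><header/><mapping>" +
        "<module><name>A.dll</name><type><name>A.X</name></type></module>" +
        "<module><name>B.dll</name><type><name>B.Y</name></type>" +
        "<type><name>B.Z</name></type></module>" +
        "</mapping><statistics/></dotfuscatorMap>";

    // Scans the whole tree for <type> elements.
    public static string[] ViaDescendants() =>
        XElement.Parse(Xml).Descendants("type")
            .Select(t => (string)t.Element("name")).ToArray();

    // Follows the known path root/mapping/module/type and skips
    // everything else (header, statistics, member lists).
    public static string[] ViaElements() =>
        XElement.Parse(Xml).Elements("mapping").Elements("module").Elements("type")
            .Select(t => (string)t.Element("name")).ToArray();
}
```

Both methods return A.X, B.Y, and B.Z in document order. The results would diverge only if a map contained <type> elements at some other depth, in which case only Descendants would find them.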
Now that we have all the <type> elements, we need to find the ones that match the obfuscated name. I implemented this with a LINQ query:
string obfuscatedName = "a";
var found = from type in types
where type.Element("newname").Value == obfuscatedName
select type;
The types collection is filtered by comparing the content of each type's child <newname> element with the given obfuscated name. The same can be done with the Where extension method and a lambda expression:
var found = types.Where(t => t.Element("newname").Value == obfuscatedName);
As stated before, the <newname> element is optional, so Element("newname") returns null when the type is not obfuscated, and reading .Value from it throws a NullReferenceException (NRE). To avoid that, I changed the LINQ query to the following:
var found = from type in map.Descendants("type")
let name = type.Element("newname") ?? type.Element("name")
where name.Value == obfuscatedName
select type;
This query compares obfuscatedName with each type's obfuscated name, or with its original name when the type was not renamed.
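To see what the fallback does, the sketch below runs the same query over a small hypothetical module (FallbackDemo and the type names are illustrative). One consequence worth noticing: searching for the original name of a renamed type finds nothing, because its effective name is the obfuscated one.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class FallbackDemo
{
    // Hypothetical module: one obfuscated type and one that kept
    // its original name (no <newname> element).
    private const string Xml =
        "<module><name>App.dll</name>" +
        "<type><name>App.Form1</name><newname>a</newname></type>" +
        "<type><name>App.PublicApi</name></type>" +
        "</module>";

    // Returns the original names of types whose effective name
    // (<newname> if present, otherwise <name>) equals the search name.
    public static string[] Find(string obfuscatedName)
    {
        XElement map = XElement.Parse(Xml);
        var found = from type in map.Descendants("type")
                    let name = type.Element("newname") ?? type.Element("name")
                    where name.Value == obfuscatedName
                    select type;
        return found.Select(t => t.Element("name").Value).ToArray();
    }
}
```

Find("a") returns App.Form1, Find("App.PublicApi") returns App.PublicApi, and Find("App.Form1") returns an empty array.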
The let keyword introduces a new range variable, name, that holds the <newname> element, or the <name> element when no <newname> element is present. Behind the scenes, the compiler implements let by projecting each item into an anonymous type that pairs the current <type> element with its <name>/<newname> element, something like this:
new { Type = type, Name = type.Element("newname") ?? type.Element("name") };
The whole query can be written in C# method syntax as:
IEnumerable<XElement> found = map.Descendants("type")
    .Select(type => new { Type = type, Name = type.Element("newname") ?? type.Element("name") })
    .Where(tn => tn.Name.Value == obfuscatedName)
    .Select(tn => tn.Type);
As we can see, there is a second Select call, which (together with the anonymous type projection) adds some performance overhead, so I rewrote the query as follows:
var found = from type in map.Descendants("type")
where (type.Element("newname") ?? type.Element("name")).Value ==
obfuscatedName
select type;
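The three forms above should be interchangeable in terms of results. The sketch below puts them side by side on a hypothetical module (QueryFormsDemo is an illustrative name); the third type deliberately has no <newname>, so its original name “a” collides with the obfuscated name of App.Form1.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class QueryFormsDemo
{
    private static readonly XElement Map = XElement.Parse(
        "<module><name>App.dll</name>" +
        "<type><name>App.Form1</name><newname>a</newname></type>" +
        "<type><name>App.Form2</name><newname>b</newname></type>" +
        "<type><name>a</name></type>" +
        "</module>");

    // Query syntax with 'let' (the compiler adds a Select over an anonymous type).
    public static List<string> WithLet(string n) =>
        (from type in Map.Descendants("type")
         let name = type.Element("newname") ?? type.Element("name")
         where name.Value == n
         select type.Element("name").Value).ToList();

    // Hand-written method syntax equivalent of the 'let' query.
    public static List<string> WithProjection(string n) =>
        Map.Descendants("type")
           .Select(type => new { Type = type, Name = type.Element("newname") ?? type.Element("name") })
           .Where(tn => tn.Name.Value == n)
           .Select(tn => tn.Type.Element("name").Value)
           .ToList();

    // Rewritten query: the fallback is evaluated inline, no projection needed.
    public static List<string> Inline(string n) =>
        (from type in Map.Descendants("type")
         where (type.Element("newname") ?? type.Element("name")).Value == n
         select type.Element("name").Value).ToList();
}
```

All three return App.Form1 and a for the name “a”. The inline version simply avoids allocating one anonymous object per <type> element.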
The next step is to handle nested type names. In the map, nesting levels are separated by ‘/’ instead of ‘.’; for example, the name “MyClass.MyInternalClass” is stored as the string “MyClass/MyInternalClass”. We just need to replace “.” with “/” in the obfuscatedName variable to get a match:
obfuscatedName = obfuscatedName.Replace('.', '/');
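The sketch below applies the replacement before matching, using the “ne.c” name from the sample call stack. The module contents are hypothetical, and storing the obfuscated nested name as “ne/c” is an assumption made for this sketch, following the ‘/’ convention described above.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NestedNameDemo
{
    // Hypothetical module with an obfuscated nested type; nesting levels
    // are assumed to be separated by '/' rather than '.'.
    private const string Xml =
        "<module><name>App.dll</name>" +
        "<type><name>MyClass</name><newname>ne</newname></type>" +
        "<type><name>MyClass/MyInternalClass</name><newname>ne/c</newname></type>" +
        "</module>";

    // Converts a name as it appears in a call stack (e.g. "ne.c")
    // into the map notation before matching.
    public static string[] Find(string obfuscatedName)
    {
        string mapName = obfuscatedName.Replace('.', '/');
        return XElement.Parse(Xml).Descendants("type")
            .Where(t => (t.Element("newname") ?? t.Element("name")).Value == mapName)
            .Select(t => t.Element("name").Value)
            .ToArray();
    }
}
```

Find("ne.c") returns MyClass/MyInternalClass, while Find("ne") returns only the outer MyClass.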
Finally, we project the results onto an anonymous type, which makes them easier to process in C#:
var results = from type in found
              select new {
                  ModuleName = type.Parent.Element("name").Value,  // the enclosing <module>'s <name>
                  TypeName = type.Element("name").Value            // the original type name
              };
After that, you can process the search results as you wish, for example, print them to the console:
foreach (var result in results) {
    Console.WriteLine("Module: {0}", result.ModuleName);
    Console.WriteLine("Type: {0}", result.TypeName);
    Console.WriteLine();
}
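For reference, the steps of this article can be strung together into one helper. This is a sketch, not the tool described here: TypeNameResolver, Match, Resolve, and Print are hypothetical names, and a named Match class replaces the anonymous type because anonymous types cannot conveniently be returned from a method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper that combines the steps shown above.
public static class TypeNameResolver
{
    // One search hit: the module that declares the type and its original name.
    public sealed class Match
    {
        public string ModuleName { get; set; }
        public string TypeName { get; set; }
    }

    // Finds types whose effective name (<newname>, or <name> when the type
    // was not renamed) equals the obfuscated name taken from a call stack.
    public static List<Match> Resolve(XElement map, string obfuscatedName)
    {
        string mapName = obfuscatedName.Replace('.', '/');   // nested types use '/'
        return (from type in map.Descendants("type")
                where (type.Element("newname") ?? type.Element("name")).Value == mapName
                select new Match
                {
                    ModuleName = type.Parent.Element("name").Value,
                    TypeName = type.Element("name").Value
                }).ToList();
    }

    // Loads a map file and writes the matches in the format used above.
    public static void Print(string mapPath, string obfuscatedName, TextWriter output)
    {
        XElement map = XElement.Load(mapPath);
        foreach (Match m in Resolve(map, obfuscatedName))
        {
            output.WriteLine("Module: {0}", m.ModuleName);
            output.WriteLine("Type: {0}", m.TypeName);
            output.WriteLine();
        }
    }
}
```

From a console entry point, this could be called as TypeNameResolver.Print(args[0], args[1], Console.Out). Taking a TextWriter instead of writing to Console directly keeps the output easy to redirect or test.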
Summary
That’s it. We can now find types by their obfuscated names. In the next part, I will go deeper into LINQ queries and show how to resolve field and method names.
Thanks for your time, and you are welcome to post any questions or suggestions.