Introduction
This article describes a practical usage of LINQ to XML.
This is the second and final part of the "Leveraging LINQ to XML: Querying an obfuscation map" article series. I recommend reading the previous part first, since it gives you a better understanding of what is described below.
In the previous part, we used LINQ to XML queries to search for the original type name in an obfuscation map by providing its obfuscated name. Now, I want to show you how we can use more advanced LINQ queries to perform more complex search tasks, such as searching for the original name of a type member.
Task definition
As previously defined, the user interface should be as simple as possible, so I decided to limit the UI to two editors. The first is used to provide the obfuscation map file name, and the other one is for user search requests.
Let's take a look at three possible user inputs used as search criteria:
- a: this is definitely a search request for the obfuscated type "a".
- a.b: this is less clear. It could be a complex type name "a.b", or a field named "b" of type "a". We can't distinguish these cases, so we will search for both, a type and a field.
- a.b(System.String, int): this is definitely a method search request with the signature (string, int).
Note: The complete method signature contains a return type, but since C# doesn't allow overloads that differ only by return type, we don't need it here.
The member search task can be divided into two steps:
- Type search.
- Member search based on first step results.
We already have the code for the first step. To use it, we just need to provide the obfuscated type name. Thus, a more detailed list of steps is:
- Detect what kind of input is used;
- Parse the input to separate type names and a member name;
- Parse the member name to acquire the method signature;
- Search for the types with the parsed type names;
- Search for the field or method among the found type members that match the parsed member name.
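The first step, detecting the kind of input, can be sketched as follows. This is only an illustration of the three cases listed earlier; the SearchKind enum and the SearchInput.Classify helper are hypothetical names that do not come from the article's code sample.

```csharp
using System;

// Hypothetical names, introduced only for this illustration.
public enum SearchKind { TypeOnly, TypeOrField, Method }

public static class SearchInput
{
    // Classifies a raw search string by its shape.
    public static SearchKind Classify(string input)
    {
        if (input.IndexOf('(') >= 0)
            return SearchKind.Method;      // e.g. "a.b(System.String, int)"
        if (input.IndexOf('.') >= 0)
            return SearchKind.TypeOrField; // e.g. "a.b": type "a.b" or field "b" of type "a"
        return SearchKind.TypeOnly;        // e.g. "a"
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var s in new[] { "a", "a.b", "a.b(System.String, int)" })
            Console.WriteLine("{0} -> {1}", s, SearchInput.Classify(s));
    }
}
```

The check for "(" comes first because a method signature may itself contain dots.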
Member name search
To separate a type name from a member name, we can split the input string using "." as a separator and take the last part as the member name. But dots can also appear inside a method signature, so I decided to replace the signature's dots with another character, which lets me split on the dots of the type name only. The code below replaces "." with "/" within the method signature, if there is one:
    int sigIndex = obfuscatedName.IndexOf('(');
    if (sigIndex > 0)
        obfuscatedName = new string(((IEnumerable<char>)obfuscatedName)
            .Select((c, i) => i <= sigIndex ?
                c :
                c == '.' ? '/' : c)
            .ToArray());
This code detects the presence of a signature by searching for the "(" symbol and replaces all "." with "/" inside the signature using the indexed Select extension method.
Now, we can split the obfuscated name apart by simply calling the obfuscatedName.Split('.') method. The last substring can be the method signature or the field name.
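To make the result of the masking and splitting described above concrete, here is a small self-contained sketch. The NameSplitter class and its Unmask helper are hypothetical; in particular, restoring the dots afterwards assumes that "/" does not otherwise occur in the member part.

```csharp
using System;
using System.Linq;

// Hypothetical helper class, not part of the article's code sample.
public static class NameSplitter
{
    // Masks dots inside the signature with '/', then splits on the remaining dots.
    public static string[] SplitName(string obfuscatedName)
    {
        int sigIndex = obfuscatedName.IndexOf('(');
        if (sigIndex > 0)
            obfuscatedName = new string(obfuscatedName
                .Select((c, i) => i > sigIndex && c == '.' ? '/' : c)
                .ToArray());
        return obfuscatedName.Split('.');
    }

    // Restores the masked dots (assumes '/' was only introduced by SplitName).
    public static string Unmask(string part)
    {
        return part.Replace('/', '.');
    }
}

public static class Program
{
    public static void Main()
    {
        string[] parts = NameSplitter.SplitName("a.b(System.String, int)");
        Console.WriteLine(string.Join(" | ", parts));                    // a | b(System/String, int)
        Console.WriteLine(NameSplitter.Unmask(parts[parts.Length - 1])); // b(System.String, int)
    }
}
```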
The method signature requires some additional parsing to detect the argument types, as well as an equality routine that compares signatures. I implemented this logic in the Signature class. To keep the code uniform, I also represent a field search request with the Signature class, so any type member search request is a Signature instance. Signature has three main members:
- bool IsMethod - indicates whether this is a method signature.
- string MemberName - the name of the type member.
- bool SygEquals(string sig) - checks whether the passed signature string equals the instance.
I will not provide the Signature class implementation details, because they are quite trivial and you can get them from the code sample at the start of the article.
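For readers without the code sample at hand, the following is a minimal sketch of what such a class might look like. It is not the author's implementation. It assumes that a method signature in the map has the form "returnType(arg1, arg2)", that a field signature contains no parentheses, and that argument types are compared as plain strings (so "string" and "System.String" are treated as different).

```csharp
using System;
using System.Linq;

// Hypothetical sketch; the class in the downloadable sample may differ.
public class Signature
{
    private readonly string[] argTypes; // stays null for a field request

    // member is the last part of the split input, e.g. "b" or "b(System/String, int)".
    public Signature(string member)
    {
        int open = member.IndexOf('(');
        if (open < 0)
        {
            MemberName = member;
            return;
        }
        MemberName = member.Substring(0, open);
        int close = member.LastIndexOf(')');
        if (close < open) close = member.Length; // tolerate a missing ")"
        argTypes = ParseArgs(member.Substring(open + 1, close - open - 1));
    }

    public bool IsMethod { get { return argTypes != null; } }

    public string MemberName { get; private set; }

    // Assumed map format: "returnType(arg1, arg2)" for methods, no "(" for fields.
    public bool SygEquals(string sig)
    {
        int open = sig.IndexOf('(');
        if (!IsMethod) return open < 0;
        if (open < 0) return false;
        int close = sig.LastIndexOf(')');
        if (close < open) return false;
        return ParseArgs(sig.Substring(open + 1, close - open - 1))
            .SequenceEqual(argTypes);
    }

    // Splits an argument list, trims each entry and restores masked dots.
    private static string[] ParseArgs(string list)
    {
        return list.Split(',')
                   .Select(a => a.Trim().Replace('/', '.'))
                   .Where(a => a.Length > 0)
                   .ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        var method = new Signature("b(System/String, int)");
        Console.WriteLine(method.MemberName);                            // b
        Console.WriteLine(method.SygEquals("void(System.String, int)")); // True
        Console.WriteLine(new Signature("b").SygEquals("string"));       // True
    }
}
```

The return type is ignored during comparison, in line with the note in the task definition.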
There are two kinds of types to be searched:
- The first is a complete type that covers the whole search string (e.g., the type name is "a.b" and the input is "a.b").
- The other one is an incomplete type that covers the input string except for its last name element (e.g., the type name is "a" and the input is "a.b"). This type is required for the member search, as we defined earlier.
Note: One more thing that I want to show is how to declare an anonymous type array before you fill it with data. For example, you may want to declare an anonymous type array, fill it with data depending on some conditions, and then process the array. In such a case, I use the following pattern:
    var arr = new[] { new { Value = 0, Text = "" } };
    if (SomeCondition)
        arr = new[] { new { Value = 1, Text = "First" },
                      new { Value = 2, Text = "Second" } };
    else
        arr[0] = new { Value = -1, Text = "Failed" };
    foreach (var e in arr)
        Console.WriteLine(e.Text + ", " + e.Value);
Such an approach allows you not to define named types when you don't need them. I use this pattern to define an array of types to search with a flag that indicates if this type is a complete one.
    var typeNames = new[] { new { Name = "", IsComplete = false } };
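As an illustration, the array could then be filled from the split input parts roughly like this. The TypeNameCandidates helper is a hypothetical name, and the sketch assumes a type-or-field input such as "a.b.c"; for a method input, the last part is a signature, so only the incomplete candidate would make sense.

```csharp
using System;

// Hypothetical helper, not part of the article's code sample.
public static class TypeNameCandidates
{
    // The whole input treated as a type name.
    public static string Complete(string[] parts)
    {
        return string.Join(".", parts);
    }

    // Everything except the last element, which is then the member name.
    public static string Incomplete(string[] parts)
    {
        return string.Join(".", parts, 0, parts.Length - 1);
    }
}

public static class Program
{
    public static void Main()
    {
        string[] parts = "a.b.c".Split('.');

        var typeNames = new[] { new { Name = "", IsComplete = false } };
        typeNames = new[]
        {
            new { Name = TypeNameCandidates.Complete(parts), IsComplete = true },
            new { Name = TypeNameCandidates.Incomplete(parts), IsComplete = false }
        };

        foreach (var t in typeNames)
            Console.WriteLine("{0} (complete: {1})", t.Name, t.IsComplete);
        // a.b.c (complete: True)
        // a.b (complete: False)
    }
}
```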
Now we need to rewrite the original type search query to use the typeNames array.
    var types = from type in map.Descendants("type")
                join tname in typeNames on
                    (type.Element("newname") ?? type.Element("name")).Value
                    equals tname.Name
                select new { TypeElement = type, IsComplete = tname.IsComplete };
The join operation used here gives me an easy way to filter by the typeNames array. The query result is projected to an anonymous type that contains the found type element and a flag indicating type completeness. To get all the found complete types, we can use a simple expression:
    types.Where(t => t.IsComplete);
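The query can be tried out on a small in-memory map. In the sketch below, the map fragment is made up for illustration: only the "type", "name" and "newname" element names come from the queries in this article, while the "map" and "module" wrappers and all type names are assumptions.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TypeSearchDemo
{
    // Made-up map fragment; a type without "newname" was not renamed.
    static readonly XElement Map = XElement.Parse(
        "<map><module><name>Demo.dll</name>" +
        "<type><name>Demo.Core.Parser</name><newname>a</newname></type>" +
        "<type><name>Demo.Core.Lexer</name><newname>a.b</newname></type>" +
        "<type><name>Demo.Program</name></type>" +
        "</module></map>");

    // Returns the original names of the types that match the complete candidate.
    public static string[] CompleteMatches(string complete, string incomplete)
    {
        var typeNames = new[]
        {
            new { Name = complete, IsComplete = true },
            new { Name = incomplete, IsComplete = false }
        };

        var types = from type in Map.Descendants("type")
                    join tname in typeNames on
                        (type.Element("newname") ?? type.Element("name")).Value
                        equals tname.Name
                    select new { TypeElement = type, IsComplete = tname.IsComplete };

        return types.Where(t => t.IsComplete)
                    .Select(t => t.TypeElement.Element("name").Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Input "a.b": complete candidate "a.b", incomplete candidate "a".
        foreach (string name in CompleteMatches("a.b", "a"))
            Console.WriteLine(name); // Demo.Core.Lexer
    }
}
```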
Note: LINQ queries use deferred execution. This means that the execution of a query is deferred until the moment you access the data, and moreover, the query is executed again each time you enumerate it. The following code demonstrates this behavior:
    XElement xml = new XElement("parent",
        new XElement("child1")
    );
    var qry = from e in xml.Elements()
              select e;
    xml.Add(new XElement("child2"));
    foreach (var q in qry)
        Console.Write(q.Name + " ");   // prints: child1 child2
    Console.WriteLine();
    xml.Add(new XElement("child3"));
    foreach (var q in qry)
        Console.Write(q.Name + " ");   // prints: child1 child2 child3
To avoid redundant executions, you can cache the query result by using the .ToArray() or .ToList() extension methods. I intend to use the result of the types query both in the complete type search and in the type member search, so in the worst case, the types query would be executed twice. To avoid this, I cache the query result using the .ToArray() method.
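The difference between a live query and a cached result can be seen in a small sketch. The DeferredDemo class is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DeferredDemo
{
    // Returns { live result, cached result } after the tree has been modified.
    public static string[] Run()
    {
        var xml = new XElement("parent", new XElement("child1"));

        // Deferred: nothing is executed yet.
        var live = xml.Elements().Select(e => e.Name.LocalName);

        // ToArray() runs the query once, right now, and stores the result.
        string[] cached = live.ToArray();

        xml.Add(new XElement("child2"));

        // Enumerating "live" again re-runs the query against the modified tree.
        return new[]
        {
            string.Join(" ", live.ToArray()),
            string.Join(" ", cached)
        };
    }

    public static void Main()
    {
        string[] result = Run();
        Console.WriteLine("live:   " + result[0]); // live:   child1 child2
        Console.WriteLine("cached: " + result[1]); // cached: child1
    }
}
```

Note that a cached array is a snapshot: later changes to the XML tree are not reflected in it.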
Now that we have found the types, we can proceed to the type member search. The straightforward approach is to use a nested from clause:
    var members =
        from type in types
        where !type.IsComplete
        from member in
            type.TypeElement.Element("methodlist").Elements("method")
        where member.Element("newname").Value == signature.MemberName &&
              signature.SygEquals(member.Element("signature").Value)
        select member;
Here, we select methods from incomplete types and filter them by matching the Signature objects. This query will be compiled to something like this:
    var members =
        types.Where(type => !type.IsComplete)
             .SelectMany(
                 type => type.TypeElement.Element("methodlist").Elements("method"),
                 (type, member) => new { type = type, member = member }
             )
             .Where(typeMember =>
                 typeMember.member.Element("newname").Value == signature.MemberName &&
                 signature.SygEquals(typeMember.member.Element("signature").Value)
             )
             .Select(typeMember => typeMember.member);
Here, we can see the SelectMany call with an anonymous type projection. In the general case, each type element from the types array contains a collection of methods, so it looks like a two-dimensional collection that consists of types where each type holds a collection of methods. The SelectMany call flattens this two-dimensional collection into a one-dimensional one, and then we filter it.
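The flattening itself is easier to see on plain in-memory data. The following sketch uses made-up type and method names and a hypothetical FlattenDemo class; it only illustrates the SelectMany overload with a result selector.

```csharp
using System;
using System.Linq;

public static class FlattenDemo
{
    public static string[] Flatten()
    {
        // A "two-dimensional" collection: each type holds its own methods.
        var types = new[]
        {
            new { Name = "A", Methods = new[] { "a", "b" } },
            new { Name = "B", Methods = new[] { "c" } }
        };

        // SelectMany walks every (type, method) pair and yields one flat sequence.
        return types
            .SelectMany(t => t.Methods, (t, m) => t.Name + "." + m)
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Flatten())); // A.a, A.b, B.c
    }
}
```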
This is the general case, but in our case, each type will yield at most one matching method (because of our search filter). So, we have a redundant SelectMany call and an anonymous type projection that can impact performance. We can't fix this issue using pure LINQ query syntax, but we can combine LINQ with extension methods and lambda expressions to achieve the desired result:
    var members = (
        from type in types
        where !type.IsComplete
        select type.TypeElement.Element("methodlist").Elements("method")
            .SingleOrDefault(member =>
                member.Element("newname").Value == signature.MemberName &&
                signature.SygEquals(member.Element("signature").Value)
            )
        ).Where(m => m != null);
Here, I have combined the LINQ select with the SingleOrDefault extension method call, which allows me to remove the SelectMany call and the anonymous type projection. The last Where method call filters out the default values from the result, which gives us an empty enumeration if nothing is found. Here is what it will be compiled to:
    var members =
        types.Where(type => !type.IsComplete)
             .Select(type =>
                 type.TypeElement.Element("methodlist").Elements("method").SingleOrDefault(
                     member =>
                         member.Element("newname").Value == signature.MemberName &&
                         signature.SygEquals(member.Element("signature").Value)
                 )
             ).Where(m => m != null);
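One property of SingleOrDefault is worth keeping in mind here: it returns the default value (null for XElement) when nothing matches, but it throws an InvalidOperationException when more than one element matches. The sketch below shows both outcomes on a made-up method list; the SingleOrDefaultDemo class and the element values are assumptions for illustration only.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SingleOrDefaultDemo
{
    // Made-up fragment; two methods deliberately share the obfuscated name "b".
    static readonly XElement MethodList = XElement.Parse(
        "<methodlist>" +
        "<method><name>Run</name><newname>a</newname></method>" +
        "<method><name>Stop</name><newname>b</newname></method>" +
        "<method><name>Reset</name><newname>b</newname></method>" +
        "</methodlist>");

    public static string Find(string newName)
    {
        try
        {
            XElement m = MethodList.Elements("method")
                .SingleOrDefault(e => e.Element("newname").Value == newName);
            return m == null ? "not found" : m.Element("name").Value;
        }
        catch (InvalidOperationException)
        {
            // More than one element satisfied the predicate.
            return "ambiguous";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Find("a")); // Run
        Console.WriteLine(Find("x")); // not found
        Console.WriteLine(Find("b")); // ambiguous
    }
}
```

In the member query, the signature comparison is what keeps overloads that share an obfuscated name from being ambiguous.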
Finally, we can project the members query result to an anonymous type that makes further processing easier.
    var membersPrj = from member in members
                     select new {
                         ModuleName = member.Parent.Parent.Parent.Element("name").Value,
                         TypeName = member.Parent.Parent.Element("name").Value,
                         MemberName = member.Element("name").Value,
                         Signature = member.Element("signature").Value
                     };
There is no difference between the method search and the field search thanks to the unified Signature class solution, so I generalized this approach for both fields and methods. Now, you can process the found members as you wish, for example, output them to the console:
    foreach (var member in membersPrj) {
        Console.WriteLine("Module: {0}", member.ModuleName);
        Console.WriteLine("Type: {0}", member.TypeName);
        Console.WriteLine("Member: {0}", member.MemberName);
        Console.WriteLine();
    }
The complete solution can be downloaded from the link at the top of the article.
Summary
That's all. There are still many things to do. For example, the application could process a whole call stack and produce its de-obfuscated version. It would also be handy to have some external API, e.g., a command line interface, but this is outside the scope of this article.
I should also mention some issues that I faced during development. The first is that it is hard to decompose the code, because anonymous types can't be used as method parameters. The other is that debugging LINQ queries is quite difficult (but thanks to LINQPad, not as hard as it could be). The remaining issues are minor, so I don't think they are worth mentioning here.
This article describes my first practical experience with the LINQ to XML technology. I hope you enjoyed reading it, and that the material will give you some new and useful experience that you can apply in your own practice.