Leveraging LINQ to XML: Querying an obfuscation map: Part 2

Alexander Yegorov

5 Oct 2009 | CPOL | 7 min read

A practical use of the LINQ to XML technology.

Introduction

This article describes a practical usage of LINQ to XML.

This is the second and final part of the "Leveraging LINQ to XML: Querying an obfuscation map" series. I recommend reading the previous part first, as it will give you a better understanding of the things described below.

In the previous part, we used LINQ to XML queries to find the original type name in an obfuscation map from its obfuscated name. Now, I want to show how more advanced LINQ queries can handle more complex search tasks, such as finding the original name of a type member.

Task definition

As previously defined, the user interface should be as simple as possible. So I decided to limit the UI to two input fields. The first takes the obfuscation map file name, and the other takes the user's search request.

Let's take a look at three possible user inputs used as search criteria:

a - this is, definitely, a search request for the obfuscated type "a".
a.b - this is less clear: it could be a dotted type name "a.b", or a field named "b" of type "a". We can't distinguish these cases, so we will search for both a type and a field.
a.b(System.String, int) - this is, definitely, a method search request with a signature (string, int).

Note: The complete method signature also contains a return type, but since C# doesn't allow overloads that differ only by return type, we don't need it for the search.

The member search task can be divided into two steps:

Type search.
Member search based on first step results.

We already have the code for the first step. To use it, we just need to provide the obfuscated type name. Thus, a more detailed list of steps is:

Detect what kind of input is used;
Parse the input to separate type names and a member name;
Parse the member name to acquire the method signature;
Search for the types with the parsed type names;
Search for the field or method among the found type members that match the parsed member name.

Member name search

To separate a type name from a member name, we can split the input string on the "." separator and take the last part as the member name. But dots can also appear inside a method signature, so I decided to replace the signature's dots with another character, which lets me split the dotted type name safely. The code below replaces "." with "/" within the method signature, if there is one:

// Replace dots within method parameters with '/' to correctly split names.
int sigIndex = obfuscatedName.IndexOf('(');
if (sigIndex > 0) // Signature is present.
{
  // The cast is not strictly required (string implements IEnumerable<char>),
  // but it makes the extension method call explicit.
  obfuscatedName = new string(((IEnumerable<char>)obfuscatedName)
    .Select((c, i) => i <= sigIndex ?  // Up to the signature start,
        c :                  // pass characters through unchanged.
        c == '.' ? '/' : c)  // Inside the signature, replace '.'
                             // with '/' and keep everything else.
    .ToArray());
}

This code detects the signature presence by searching for the "(" symbol and replaces all "." with "/" inside the signature using the indexed Select extension method.

Now, we can split the obfuscated name apart by simply calling the obfuscatedName.Split('.') method. The last substring can be the method signature or the field name.
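As a rough illustration of this splitting step, the sketch below masks the dots in the parameter list and then separates the type part from the member part. The NameSplitter class and its method names are hypothetical and are not taken from the downloadable sample. For brevity, the sketch uses Substring and Replace instead of the indexed Select shown above.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
static class NameSplitter
{
    // Replaces dots inside the parameter list with '/' so that a later
    // Split('.') only cuts the type and member names.
    public static string MaskSignatureDots(string name)
    {
        int sigIndex = name.IndexOf('(');
        if (sigIndex <= 0)
            return name; // No signature present.
        return name.Substring(0, sigIndex) +
               name.Substring(sigIndex).Replace('.', '/');
    }

    // Splits the input into a type part (empty if there is none)
    // and a member part with the original dots restored.
    public static void Split(string input,
                             out string typeName, out string memberName)
    {
        string[] parts = MaskSignatureDots(input).Split('.');
        typeName = string.Join(".",
            parts.Take(parts.Length - 1).ToArray());
        memberName = parts[parts.Length - 1].Replace('/', '.');
    }

    static void Main()
    {
        string typeName, memberName;
        Split("a.b(System.String, int)", out typeName, out memberName);
        Console.WriteLine(typeName + " | " + memberName);
        // Prints: a | b(System.String, int)
    }
}
```

For an input without dots, such as "a", the type part comes back empty and the whole input is returned as the last part.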

The method signature requires some additional parsing to detect the argument types, as well as an equality routine that compares signatures. I implemented this logic in the Signature class. To keep the code uniform, I also represent a field search request with the Signature class, so any type member search request is a Signature instance. Signature has three main members:

bool IsMethod - indicates whether this is a method signature.
string MemberName - the name of the type member.
bool SygEquals(string sig) - checks whether the passed signature string matches this instance.

I will not provide the Signature class implementation details, because they are quite trivial and you can get them from the code sample at the start of the article.
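For readers who don't have the download at hand, here is a minimal sketch of what a class with these three members could look like. It is not the implementation from the sample: the name SignatureSketch, the parsing rules, and the assumption that the map stores method signatures in a form like "void(System.String, int)" are all mine. It compares parameter type names literally, so "int" and "System.Int32" are treated as different.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the article's Signature class.
sealed class SignatureSketch
{
    private readonly string[] parameters;

    public bool IsMethod { get; private set; }
    public string MemberName { get; private set; }

    // member is the last part of the search request,
    // e.g. "b" or "b(System.String, int)".
    public SignatureSketch(string member)
    {
        int open = member.IndexOf('(');
        IsMethod = open >= 0;
        MemberName = IsMethod ? member.Substring(0, open) : member;
        parameters = ParseParameters(member);
    }

    // Extracts trimmed parameter type names from "...(t1, t2)".
    private static string[] ParseParameters(string text)
    {
        int open = text.IndexOf('(');
        int close = text.LastIndexOf(')');
        if (open < 0 || close < open)
            return new string[0];
        return text.Substring(open + 1, close - open - 1)
                   .Split(',')
                   .Select(p => p.Trim())
                   .Where(p => p.Length > 0)
                   .ToArray();
    }

    // Assumption: a field request matches any signature that has no
    // parameter list; a method request must match parameter by parameter.
    public bool SygEquals(string sig)
    {
        bool sigIsMethod = sig.IndexOf('(') >= 0;
        if (!IsMethod)
            return !sigIsMethod;
        return sigIsMethod &&
               parameters.SequenceEqual(ParseParameters(sig));
    }

    static void Main()
    {
        var request = new SignatureSketch("b(System.String, int)");
        Console.WriteLine(request.MemberName);
        Console.WriteLine(request.SygEquals("void(System.String, int)"));
        Console.WriteLine(request.SygEquals("void(int)"));
    }
}
```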

There are two kinds of types to be searched:

The first is a complete type, which covers the whole search string (e.g., type name "a.b" for the input "a.b").
The other is an incomplete type, which covers the input string except for its last name element (e.g., type name "a" for the input "a.b"). This type is required for the member search, as we defined earlier.

Note: One more thing that I want to show is how to declare an array of an anonymous type before you fill it with data. For example, you may want to declare such an array, fill it with data depending on some conditions, and then process it. In such a case, I use the following pattern:

// Declare an array of an anonymous type.
var arr = new[] { new { Value = 0, Text = "" } };

// Fill array with data.
if (SomeCondition)
  // You can fill it with new instance.
  arr = new[] { new { Value = 1, Text = "First" }, 
                new { Value = 2, Text = "Second" } };
else
  // Or use existing elements.
  arr[0] =  new { Value = -1, Text = "Failed" };

// Process array data.
foreach (var e in arr)
  Console.WriteLine(e.Text + ", " + e.Value);

Such an approach allows you not to define named types when you don't need them. I use this pattern to define an array of types to search with a flag that indicates if this type is a complete one.

// Define an array of anonymous types - tricky, but it works.
var typeNames = new[] { new { Name = "", IsComplete = false } };
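The placeholder array still has to be filled with the names to search for. The sketch below shows one way this could be done for a dotted input without a signature. The TypeNameDemo class and the Describe method are hypothetical names used only for this illustration. It relies on the fact that anonymous types with the same property names and types, in the same order, are the same type within an assembly.

```csharp
using System;
using System.Linq;

// Hypothetical demo class, for illustration only.
static class TypeNameDemo
{
    // Builds the list of type names to search for a dotted input and
    // returns it as "Name:IsComplete" strings so it can be printed.
    public static string[] Describe(string dottedName)
    {
        string[] parts = dottedName.Split('.');

        // Placeholder declaration that fixes the anonymous type.
        var typeNames = new[] { new { Name = "", IsComplete = false } };

        if (parts.Length > 1)
        {
            // "a.b" may be a type "a.b" or a member "b" of type "a".
            string incomplete = string.Join(".",
                parts.Take(parts.Length - 1).ToArray());
            typeNames = new[] {
                new { Name = dottedName, IsComplete = true },
                new { Name = incomplete, IsComplete = false }
            };
        }
        else
        {
            // A single name can only be a complete type name.
            typeNames[0] = new { Name = dottedName, IsComplete = true };
        }

        return typeNames.Select(t => t.Name + ":" + t.IsComplete)
                        .ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Describe("a.b")));
        // Prints: a.b:True, a:False
    }
}
```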

Now we need to rewrite the original type search query to use the typeNames array.

// Define type query.
var types = from type in map.Descendants("type")
           join tname in typeNames on 
             (type.Element("newname") ?? type.Element("name")).Value 
             equals tname.Name
           select new { TypeElement = type, IsComplete = tname.IsComplete };

The join used here gives me simple filtering against the typeNames array. The query result is projected to an anonymous type that holds the found type element and a flag indicating whether the type is complete. To get only the complete types, that is, the results of the plain type search, we can use a simple expression:

types.Where(t => t.IsComplete);

Note: LINQ queries use deferred execution: a query is not executed until you access its data, and, what's more, it is executed again each time you enumerate it. The following code demonstrates this behavior:

XElement xml = new XElement("parent",
               new XElement("child1")
             );

// Query that retrieves all child elements.
var qry = from e in xml.Elements()
        select e;

// Add a child element to show deferred execution.
xml.Add(new XElement("child2"));


foreach(var q in qry)
    Console.Write(q.Name + " ");
    // Print: "child1 child2".
Console.WriteLine();

// Add new child element to show "each time" execution.
xml.Add(new XElement("child3"));

foreach (var q in qry)
    Console.Write(q.Name + " ");
    // Print: "child1 child2 child3".

To avoid redundant executions, you can cache the query result by using the .ToArray() or .ToList() extension methods. I intend to use the result of the types query both in the complete type search and the type member search, so in the worst case, the types query will be executed twice. To avoid this, I cache the query result using the .ToArray() function.
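The difference between a live query and a cached result can be seen in a small sketch like the one below. The CachingDemo class and the Counts method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class, for illustration only.
static class CachingDemo
{
    // Returns { element count seen by the live query,
    //           element count in the cached array }.
    public static int[] Counts()
    {
        XElement xml = new XElement("parent", new XElement("child1"));

        // Deferred: evaluated every time it is enumerated.
        var live = xml.Elements();

        // Cached: ToArray() runs the query once and keeps a snapshot.
        var cached = xml.Elements().ToArray();

        // Change the tree after both were created.
        xml.Add(new XElement("child2"));

        return new[] { live.Count(), cached.Length };
    }

    static void Main()
    {
        int[] counts = Counts();
        Console.WriteLine(counts[0] + " " + counts[1]);
        // Prints: 2 1
    }
}
```

The cached array does not see later changes to the document, which is fine here because the obfuscation map is not modified during a search.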

Now that we have found the types, we can proceed to the type member search. The straightforward approach is to use a nested from clause:

var members = // For each incomplete type
from type in types
where !type.IsComplete

// Select all type methods that match the name and signature.
from member in
     type.TypeElement.Element("methodlist").Elements("method")
where member.Element("newname").Value == signature.MemberName &&
      signature.SygEquals(member.Element("signature").Value)
select member;

Here, we select methods from incomplete types and filter them by matching the Signature objects. This query will be compiled to something like this:

var members =
// Filter incomplete types.
types.Where(type => !type.IsComplete)

// Flatten all types members in one enumeration.
.SelectMany(
   type => type.TypeElement.Element("methodlist").Elements("method"),
    // Projecting result to temporary anonymous type.
   (type, member) => new { type = type, member = member }   
)

// Filter out type members by signature matching.
.Where(typeMember => 
   typeMember.member.Element("newname").Value == signature.MemberName &&
   signature.SygEquals(typeMember.member.Element("signature").Value)
)

// Select only member XElement from result
.Select(typeMember => typeMember.member);

Here, we can see the SelectMany call with an anonymous type projection. In the general case, each type element in the types array contains a collection of methods, so the data looks like a two-dimensional collection: a list of types, each holding a list of methods. The SelectMany call flattens this two-dimensional collection into a one-dimensional one, and then we filter it.
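To make the flattening easier to picture, here is a small sketch that uses in-memory data instead of the obfuscation map. The data, the FlattenDemo class, and the Flatten method are made up for this illustration.

```csharp
using System;
using System.Linq;

// Hypothetical demo class, for illustration only.
static class FlattenDemo
{
    public static string[] Flatten()
    {
        // Made-up data: two types, each with its own method list.
        var types = new[] {
            new { Name = "A", Methods = new[] { "x", "y" } },
            new { Name = "B", Methods = new[] { "z" } }
        };

        return types
            // Flatten the (type, methods) pairs into one sequence of
            // (type, method) pairs, as the compiler does for nested from.
            .SelectMany(t => t.Methods,
                        (t, m) => new { Type = t.Name, Method = m })
            // Filter the flattened sequence.
            .Where(p => p.Method != "y")
            // Project each remaining pair to a printable string.
            .Select(p => p.Type + "." + p.Method)
            .ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Flatten()));
        // Prints: A.x, B.z
    }
}
```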

That is the general case, but in our case each type contributes at most one matching method (because of our search filter). So the SelectMany call and the anonymous type projection are redundant and add some overhead. We can't fix this with pure query syntax, but we can combine query syntax with extension methods and lambda expressions to achieve the desired result:

var members = ( // For each incomplete type
 from type in types
 where !type.IsComplete

 // Select the single method that matches the name and signature.
 select type.TypeElement.Element("methodlist").Elements("method")
  .SingleOrDefault(member => 
      member.Element("newname").Value == signature.MemberName &&
      signature.SygEquals(member.Element("signature").Value)
   )
).Where(m => m != null);

Here, I have combined LINQ select with the SingleOrDefault extension method call that allows me to remove the SelectMany call and the anonymous type projection. The last Where method call filters out the default values from the result - this will give us an empty enumeration if nothing is found. Here is what it will be compiled to:

var members =
    // Filter incomplete types.
    types.Where(type => !type.IsComplete)

    // Select the method that matches the name and signature.
    .Select(type =>
        type.TypeElement.Element("methodlist").Elements("method").SingleOrDefault(
          member =>
            member.Element("newname").Value == signature.MemberName &&
            signature.SygEquals(member.Element("signature").Value)
        )
    ).Where(m => m != null);
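Two details of this lookup are worth keeping in mind. SingleOrDefault returns null when nothing matches, but throws an InvalidOperationException when more than one element matches. Also, Element("newname").Value throws a NullReferenceException if a method has no newname element. The sketch below shows a more defensive variant that uses the explicit string conversion of XElement, which yields null for a missing element. The sample XML, the MemberLookupDemo class, and its method names are assumptions for this illustration; only the element names come from the queries above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class, for illustration only.
static class MemberLookupDemo
{
    // Returns the obfuscated name, or the original name when the
    // member has no "newname" element.
    public static string EffectiveName(XElement member)
    {
        return (string)member.Element("newname")
            ?? (string)member.Element("name");
    }

    // Returns the matching method element, or null if none matches.
    public static XElement FindMethod(XElement type, string name)
    {
        return type.Element("methodlist").Elements("method")
                   .SingleOrDefault(m => EffectiveName(m) == name);
    }

    static void Main()
    {
        // Made-up fragment shaped like the elements queried above.
        XElement type = XElement.Parse(
            "<type><name>Foo</name><methodlist>" +
            "<method><name>Run</name><newname>a</newname></method>" +
            "<method><name>ToString</name></method>" +
            "</methodlist></type>");

        Console.WriteLine(FindMethod(type, "a").Element("name").Value);
        Console.WriteLine(FindMethod(type, "ToString").Element("name").Value);
        Console.WriteLine(FindMethod(type, "missing") == null);
        // Prints: Run, ToString, True (one per line)
    }
}
```

If duplicate matches are possible in a map, FirstOrDefault could be used in place of SingleOrDefault to take the first match instead of throwing.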

Finally, we can project the members query result to an anonymous type that makes further processing easier.

var membersPrj = from member in members
                 select new {
                   ModuleName = member.Parent.Parent.Parent.Element("name").Value,
                   TypeName = member.Parent.Parent.Element("name").Value,
                   MemberName = member.Element("name").Value,
                   Signature = member.Element("signature").Value
                 };

There is no difference in the method and field search due to the unified Signature class solution, so I generalized this approach for both the fields and methods search. Now, you can process the found members as you wish, for example, output them to the console:

foreach (var member in membersPrj) {
    Console.WriteLine("Module:      {0}", member.ModuleName);
    Console.WriteLine("Type:        {0}", member.TypeName);
    Console.WriteLine("Member:      {0}", member.MemberName);
    Console.WriteLine();
}

The complete solution can be downloaded from the link at the top of the article.

Summary

That's all. There are still many things to do. For example, the application could process a whole call stack and produce its de-obfuscated version. It would also be handy to have some external interface, such as a command line, but this is outside the scope of the article.

I should also mention some issues that I faced during development. The first is that it is hard to decompose the code, because anonymous types can't be used as method parameter types. The other is that LINQ query debugging is quite difficult (but thanks to LINQPad, not as hard as it could be). The remaining issues are minor, so I don't think they are worth mentioning here.

This article describes my first experience of a practical use of the LINQ to XML technology. I hope you enjoyed reading it, and that the material will give you some new and useful experience that you can apply in your practice.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)