TinyObfuscate - A Tiny String Obfuscator for C / C++

Michael Haephrati


13 Aug 2022, CPOL, 7 min read


TinyObfuscate is a simple tool you can use when you need to obfuscate or conceal strings in your program, so that they don't show up when your executable is examined with Strings or a hex editor.

Obfuscation makes data, such as a string, unintelligible. It is mainly done to protect proprietary source code and any intellectual property it contains.

Download Obfuscated and original EXE - 28.1 KB

Get the commercial version

This article won second prize: Best C++ Article of October 2017

Introduction

The purpose of an obfuscator is to hide parts of a program's code, flow, and functionality by making them unintelligible. Obfuscation makes a program harder to reverse engineer, so if your program uses an algorithm that is a trade secret, its 'secret sauce' becomes harder to reveal. Sometimes you only need to obfuscate the strings in your program and don't want to use expensive and complex obfuscation tools (and there are quite a few out there). In such cases, the following source-code-level string obfuscator can help.

Background

If you take a typical executable and examine it with a hex editor, Strings, or even Notepad :), you may find many strings among the binary data that reveal trade secrets, IP addresses, or other information you don't want to give away.

The purpose of TinyObfuscate is to hide these strings.

Before we begin, remember that obfuscation is NOT a form of encryption. Keep in mind that every lock can be broken: at some point, anything encrypted must be decrypted to be used. Encrypting the strings and decrypting them at runtime gives you stronger security, yet in some cases the encrypted string can be located easily and then simply decrypted. This is where obfuscation's advantage comes into play. When we obfuscate, we do not encrypt; we are hiding in plain sight, like a needle in a haystack. With obfuscation, finding the 'needle' may take longer and require more resources than decrypting an encrypted string that is easy to locate.

Security always requires several methods used in conjunction, so that if one fails (or is hacked), the others still provide effective protection. Obfuscation should come last, after everything else is implemented. Once you have added layers of encryption and thoroughly debugged the program, it's an excellent time to obfuscate it. Note that obfuscated source code is hard to maintain and update, so it's recommended to keep the non-obfuscated version and obfuscate it before deploying each new version.

The purpose of the TinyObfuscate tool is to obfuscate, not to encrypt. The advantage of obfuscation is that nothing is encrypted, so nothing needs to be decrypted: the data remains as is, only obscured.

Note that the tool shown in this article is a limited, elementary version intended for learning purposes only. Obfuscation systems sell for $10K+, and my tool is intended only to give you a small taste of what obfuscation is. Furthermore, the article describes only one aspect of obfuscation: string obfuscation. When obfuscation is performed in commercial products, it also covers functions, API calls, and more.

Using the Tool

I have written about string obfuscation in the past, but what makes this article unique is the easy method for obfuscating strings in your source code. There is no need to run a tool over your entire project or scan it. Instead, copy and paste your sensitive string (for example, "my secret string") and enter the name of the variable you plan to use (by default, to support UNICODE, its type is wchar_t). You will get initialization source code to use in its place.

Then instead of using this code:

C++

wchar_t m_Variable[] = L"my secret string";

You run this tool, enter the string and the variable name, and then copy the result:

C++

// (15.08.2020 19:15:21) Obfuscated: 'My secret string'
wchar_t* s_2411641058()
{
  wchar_t* _2411641058 = new wchar_t[32];
  _2411641058[0x9] = L'g' - 0x47;
  _2411641058[0x3] = L'4' + 077;
  _2411641058[0x1e] = L'h' - 04;
  _2411641058[0x4] = 0101 + 0x24;
  _2411641058[0x1] = L'h' + 021;
  _2411641058[0xc] = L's' - 0x1;
  _2411641058[0x16] = 119;
  _2411641058[0xe] = L'8' + 066;
  _2411641058[0x12] = 0x7a - 023;
  _2411641058[0x10] = 0;
  _2411641058[0x11] = 0x7e - 011;
  _2411641058[0x1b] = L'G' + 033;
  _2411641058[0xb] = 0150 + 0xc;
  _2411641058[0x13] = L'6' + 067;
  _2411641058[0x6] = 114;
  _2411641058[0x1c] = L')' + 0110;
  _2411641058[0x7] = 0106 + 0x1f;
  _2411641058[0x19] = L'g' - 05;
  _2411641058[0xd] = 105;
  _2411641058[0x17] = 0116 + 0x16;
  _2411641058[0x8] = 116;
  _2411641058[0x1f] = L'x' - 0x7;
  _2411641058[0x1a] = 107;
  _2411641058[0x1d] = 076 + 0x2d;
  _2411641058[0x18] = 0141 + 0x1; 
  _2411641058[0x5] = 062 + 0x31;
  _2411641058[0x14] = 0137 + 0x1b;
  _2411641058[0x15] = 057 + 0x41;
  _2411641058[0xa] = L'w' - 04;
  _2411641058[0x0] = 77;
  _2411641058[0xf] = 0133 + 0xc;
  _2411641058[0x2] = 0x3b - 033;
  _2411641058[0x1f] = '\0';
  return _2411641058;
}
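The generated function allocates its buffer with new[] and returns a raw pointer, so each call leaks memory unless the caller frees it. One way to handle this, sketched here as an assumption rather than as part of TinyObfuscate, is to hand the pointer to std::unique_ptr<wchar_t[]>, which calls delete[] automatically. TakeObfuscated and s_demo are hypothetical names; s_demo is a tiny stand-in written in the same shape as the generated code so the example compiles on its own.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in in the same style as the generated code:
// assignments out of order, values as formulas, junk after the terminator.
wchar_t* s_demo()
{
    wchar_t* buf = new wchar_t[4];
    buf[0x1] = L'a' + 010;   // 97 + 8 = 105 = 'i'
    buf[0x3] = L'q';         // junk after the terminator
    buf[0x0] = 0x50 - 010;   // 80 - 8 = 72 = 'H'
    buf[0x2] = 0;            // terminating NULL
    return buf;
}

// Hypothetical helper: takes ownership of a buffer allocated with new[]
// by a generated s_XXXX() function and copies it into a std::wstring.
std::wstring TakeObfuscated(wchar_t* buffer)
{
    std::unique_ptr<wchar_t[]> owner(buffer); // delete[] runs on scope exit
    return std::wstring(owner.get());         // copies up to the first L'\0'
}
```

With the article's example, the call would be `std::wstring secret = TakeObfuscated(s_2411641058());`. The plaintext still exists in memory at runtime; the goal is only to keep it out of the executable file.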

You can test each option by building an executable and searching it for the string "My secret string" (it is best to use Strings and pipe its output to findstr). When the obfuscated version is used, the string won't be found. Let's say your software connects to a remote server whose IP address is stored in the program, and you don't want that address revealed. This way, you can mask and hide such sensitive data. Note, however, that the data is hidden only within the executable file: once your program communicates with the remote server, sniffing tools will show the IP address and anything sent or received.

There is a way to hide the IP address and data from sniffing tools (for example, Wireshark). We developed such a POC for a large government agency several years ago. Even though we could hide all communication between our program and a server, including the server's IP address, we still needed to develop an end-to-end encrypted communication protocol and to obfuscate the IP address (along with other sensitive data) inside the program's file itself.

Large corporations use obfuscation for sensitive software. For example, Microsoft Windows' PatchGuard is fully obfuscated, making it harder to reverse engineer. The methods used to obfuscate sensitive Windows components go well beyond obfuscating strings; they also include obfuscating function names, variables, etc.

The Samples Provided

I have created a small console application named CodeProjectTest.exe which has one line:

C++

wchar_t m_Variable[] = L"my secret string";

Then I obfuscated it with TinyObfuscate and built the result as obf_CodeProjectTest.exe.

Both can be downloaded here (they are both code signed using an EV Code Signing Certificate).

I checked both with Strings, piping its output to findstr:

Batch

strings CodeProjectTest.exe | findstr /i "secret"

and:

Batch

strings obf_CodeProjectTest.exe | findstr /i "secret"

The results are shown in the following screenshots:

Before obfuscation, the string "my secret string" was found.
After obfuscation, the string "my secret string" was not found.
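If you want the same check inside an automated build test, a rough programmatic equivalent of `strings | findstr` can be sketched as follows. This is an assumption-laden sketch, not part of TinyObfuscate: ReadFileBytes and ContainsPlainString are hypothetical names, the search text must be ASCII, and, unlike `findstr /i`, the comparison is case-sensitive. It looks for both 8-bit characters and UTF-16LE, the layout MSVC uses for L"..." literals on Windows.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file (for example, an .exe) into memory as raw bytes.
std::vector<unsigned char> ReadFileBytes(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(file),
                                      std::istreambuf_iterator<char>());
}

// Returns true if the ASCII `text` appears in `data` either as 8-bit
// characters or as UTF-16LE (each character followed by a zero byte).
bool ContainsPlainString(const std::vector<unsigned char>& data, const std::string& text)
{
    if (text.empty())
        return false;
    std::vector<unsigned char> narrow(text.begin(), text.end());
    std::vector<unsigned char> wide;
    for (unsigned char c : text)
    {
        wide.push_back(c);   // low byte
        wide.push_back(0);   // high byte is zero for ASCII characters
    }
    auto contains = [&data](const std::vector<unsigned char>& needle) {
        return std::search(data.begin(), data.end(),
                           needle.begin(), needle.end()) != data.end();
    };
    return contains(narrow) || contains(wide);
}
```

A build step could then fail if `ContainsPlainString(ReadFileBytes("obf_CodeProjectTest.exe"), "my secret string")` returns true.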

The Source Code - The Building Blocks

Random Characters and Digits

Let's begin by noting that using rand() is not recommended (see this article for the reasons). Instead, I used the code from the following article by Arvid Gerstmann.

Using this code, I created a simple function that returns a random number within a given range.

C++

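// Requires <random> and the pcg class from Arvid Gerstmann's article
int RandomIntFromRange(int From, int To)
{
    // Create and seed the generator once, not on every call
    static std::random_device rd;
    static pcg rng(rd);
    // Uniformly distributed integer in the inclusive range [From, To]
    std::uniform_int_distribution<int> u(From, To);
    return u(rng);
}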
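If Arvid Gerstmann's pcg class is not at hand, the standard library's std::mt19937 can be swapped in. The sketch below is such a substitution (RandomIntFromRangeStd is a hypothetical name, and the choice of engine is an assumption, not the article's). It keeps the engine in a static so that it is seeded only once, instead of constructing a std::random_device on every call.

```cpp
#include <random>

// Variant of RandomIntFromRange() that uses std::mt19937 instead of pcg.
// Returns a uniformly distributed integer in the inclusive range [From, To].
int RandomIntFromRangeStd(int From, int To)
{
    // Seeded once per program run from the OS entropy source
    static std::mt19937 engine{ std::random_device{}() };
    std::uniform_int_distribution<int> dist(From, To);
    return dist(engine);
}
```

For example, `RandomIntFromRangeStd(L'a', L'z')` could back the RANDOM_WCHAR macro.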

We need to be able to generate random characters and random digits. I have created the following macros:

C++

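// Random value helpers built on RandomIntFromRange().
// Arguments are parenthesized so that expressions can be passed safely.
#define RANDOM_DIGIT (RandomIntFromRange(1, 9))
#define RANDOM_WCHAR ((WCHAR)RandomIntFromRange(L'a', L'z'))
// A value in [n, 122] (122 is 'z'); n itself when n >= 122
#define RANDOM_INT_LARGER_THAN(n) ((int)(((n) < 122) ? RandomIntFromRange((n), 122) : (n)))
// A value in [48, n] (48 is '0'); n itself when n <= 48
#define RANDOM_INT_SMALLER_THAN(n) ((int)(((n) > 48) ? RandomIntFromRange(48, (n)) : (n)))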

Handle Escape Characters

When the input string contains escape sequences, they must be treated differently; otherwise, they will not be encoded correctly. The following function replaces an escape sequence such as \n (which the user types as the two characters '\' and 'n', i.e. "\\n" in C++ notation) with the character it stands for, 0x0A.

C++

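// Converts escape sequences typed by the user (e.g. '\' followed by 'n')
// into the characters they represent (e.g. 0x0A). Requires MFC/ATL CString.
CString ProcessEscapeString(CString p_szOriginalStr)
{
    CString w_szProcessStr;
    // Escape letters and the characters they map to (same order in both arrays)
    const wchar_t w_pESC_char[] = { L'\a', L'\b', L'\f', L'\n',
                                    L'\r', L'\t', L'\v', L'\\', L'\0' };
    const wchar_t w_pESC_str[]  = { L'a', L'b', L'f', L'n',
                                    L'r', L't', L'v', L'\\', L'0' };
    const int w_nEscCount = _countof(w_pESC_str);
    int i, j;
    int w_nLength = p_szOriginalStr.GetLength();

    // parse escape characters
    for (i = 0; i < w_nLength; i++)
    {
        // A backslash at the very end has no following character: copy it as-is
        if (p_szOriginalStr.GetAt(i) == L'\\' && i + 1 < w_nLength)
        {
            for (j = 0; j < w_nEscCount; j++)
            {
                if (p_szOriginalStr.GetAt(i + 1) == w_pESC_str[j])
                {
                    w_szProcessStr += w_pESC_char[j];
                    i++;   // skip the escape letter we just consumed
                    break;
                }
            }
            // Unknown escape sequence: keep the backslash unchanged
            if (j >= w_nEscCount)
            {
                w_szProcessStr += p_szOriginalStr.GetAt(i);
            }
        }
        else
        {
            w_szProcessStr += p_szOriginalStr.GetAt(i);
        }
    }

    return w_szProcessStr;
}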
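To experiment without MFC, the same idea can be expressed with std::wstring. The sketch below is a portable illustration, not the tool's function (ProcessEscapes is a hypothetical name). It uses a switch instead of the two lookup arrays, which keeps each escape letter next to the character it produces.

```cpp
#include <cstddef>
#include <string>

// Turns sequences such as '\' 'n' into the real character (here, L'\n').
std::wstring ProcessEscapes(const std::wstring& input)
{
    std::wstring output;
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        // Ordinary character, or a trailing backslash with nothing to escape
        if (input[i] != L'\\' || i + 1 == input.size())
        {
            output += input[i];
            continue;
        }
        wchar_t decoded = L'\0';
        switch (input[i + 1])
        {
            case L'a':  decoded = L'\a'; break;
            case L'b':  decoded = L'\b'; break;
            case L'f':  decoded = L'\f'; break;
            case L'n':  decoded = L'\n'; break;
            case L'r':  decoded = L'\r'; break;
            case L't':  decoded = L'\t'; break;
            case L'v':  decoded = L'\v'; break;
            case L'\\': decoded = L'\\'; break;
            case L'0':  decoded = L'\0'; break;
            default:
                // Unknown sequence: keep the backslash; the next character
                // is then copied normally on the following iteration
                output += input[i];
                continue;
        }
        output += decoded;
        ++i; // skip the escape letter we just consumed
    }
    return output;
}
```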

Shuffle Elements

When we convert the string into an array, we want to shuffle it so that the order is (almost) random, making it harder to analyze. One way to decode obfuscated data is to look for the logical order you would expect; shuffling the order makes it harder to guess what the obfuscated data is.

C++

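#include <algorithm>
#include <vector>

// Reorders array[0..size) randomly, picking each source index exactly once.
// Uses RandomIntFromRange() from above instead of rand().
void shuffle(int array[], const int size)
{
    if (size <= 0)
        return;

    // Copy of the input (a vector cannot overflow the way a fixed-size buffer can)
    std::vector<int> temp(array, array + size);
    // Source indices already used, so that no element is picked twice
    std::vector<int> indices;

    for (int i = 0; i < size; ++i)
    {
        int index = RandomIntFromRange(0, size - 1);
        // Draw again until we hit an index that has not been used yet
        while (std::find(indices.begin(), indices.end(), index) != indices.end())
            index = RandomIntFromRange(0, size - 1);

        indices.push_back(index);
        array[i] = temp[index];
    }
}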
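What the generated code actually needs is a random order of positions: the example output above assigns _2411641058[0x9] first, then [0x3], and so on. The standard library can produce such an order directly. The sketch below is an alternative, not the article's code (ShuffledIndices is a hypothetical name): std::shuffle produces a uniformly random permutation in linear time (implementations typically use Fisher-Yates), avoiding the retry loop used above.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Returns the indices 0..count-1 in random order.
// Emitting one assignment per index in this order hides the
// string's natural left-to-right order.
std::vector<int> ShuffledIndices(int count, std::mt19937& engine)
{
    if (count <= 0)
        return {};
    std::vector<int> order(count);
    std::iota(order.begin(), order.end(), 0);       // 0, 1, 2, ..., count-1
    std::shuffle(order.begin(), order.end(), engine);
    return order;
}
```

For example, iterating over `ShuffledIndices(32, engine)` and printing one `_2411641058[0x..] = ...;` line per index yields assignments in random order, like the generated code shown earlier.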

Adding Junk

Another way to conceal the content is to add random junk data alongside the real data. Since the result is a NULL-terminated array, that's easy: place the NULL at the end of the string and the junk after it. The junk is not interleaved with the real characters, but since we later convert each value into a formula (instead of "72" we may write "100 - 28"), this method is good enough for our purpose.

C++

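// Placeholder at index Length; in the output it becomes the terminating 0
// (compare _2411641058[0x10] = 0 for the 16-character example above)
TextWithJunk += (CString)L" ";
// Fill the remaining positions, up to Length * 2 characters, with random letters
for (i = Length + 1; i < Length * 2; i++)
{
    WCHAR result = RANDOM_WCHAR;
    TextWithJunk += (CString)(result);
}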
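The same layout can be expressed with standard types. In this sketch, AddJunk is a hypothetical helper and std::mt19937 stands in for the article's RANDOM_WCHAR macro. It shows why the junk stays invisible to any code that reads the result as a C string: reading stops at the terminator.

```cpp
#include <random>
#include <string>

// Builds the buffer the generated function fills in: the real text, a
// terminating L'\0', then random lowercase letters as junk, for a total of
// 2 * text.size() characters (32 for a 16-character string, as in the example).
std::wstring AddJunk(const std::wstring& text, std::mt19937& engine)
{
    std::uniform_int_distribution<int> letter(L'a', L'z');
    std::wstring buffer = text;
    buffer += L'\0';                                    // real data ends here
    while (buffer.size() < text.size() * 2)
        buffer += static_cast<wchar_t>(letter(engine)); // junk after the terminator
    return buffer;
}
```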

Replacing Values With Formulas

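Then we randomly replace values with different types of formulas, such as x = z - y or x = z + y.

When the formula is x = z - y, we need z to be a random value no smaller than x, so that the difference y = z - x is not negative. That's why we use RANDOM_INT_LARGER_THAN(). Likewise, for x = z + y, z must not exceed x, which is why RANDOM_INT_SMALLER_THAN() is used. In the code below, d holds y.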

C++

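// A value can be written inside L'...' only if it is printable and is not a
// quote or backslash (those would need an escape sequence). Requires <cwctype>.
auto isSafeLiteral = [](int c) { return iswprint(c) && c != L'\'' && c != L'\\'; };

switch (choice)
{
    case 10:
    case 1:
        // x = z - y, with z >= x so that the difference is not negative
        z = RANDOM_INT_LARGER_THAN(x);
        d = z - x;
        Formula.Format(L"%d - %d", z, d);
        break;
    case 2:
    case 3:
        // x = z + y, with z <= x so that the difference is not negative
        z = RANDOM_INT_SMALLER_THAN(x);
        d = x - z;
        Formula.Format(L"%d + %d", z, d);
        break;
    case 4:
    case 5:
        // x = 'z' - y, writing z as a character literal when that is safe
        z = RANDOM_INT_LARGER_THAN(x);
        d = z - x;
        if (isSafeLiteral(z))
            Formula.Format(L"L'%c' - %d", z, d);
        else
            Formula.Format(L"%d - %d", z, d);
        break;
    case 6:
    case 7:
        // x = 'z' + y, writing z as a character literal when that is safe
        z = RANDOM_INT_SMALLER_THAN(x);
        d = x - z;
        if (isSafeLiteral(z))
            Formula.Format(L"L'%c' + %d", z, d);
        else
            Formula.Format(L"%d + %d", z, d);
        break;
    case 8:
    case 9:
        // x written as a plain decimal number
        Formula.Format(L"%d", x);
        break;
}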
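The switch above only formats the formula text. A small portable sketch that keeps the operands alongside the text lets a generated formula be checked before it is emitted. The Formula struct, MakeFormula, and the range of up to 100 around x are hypothetical choices, not the tool's code; x is assumed to be a non-negative character code.

```cpp
#include <random>
#include <string>

// One obfuscated value: x is rewritten as "z - d" or "z + d".
struct Formula
{
    int z;          // random base value
    int d;          // difference that restores x
    bool subtract;  // true for "z - d", false for "z + d"

    int Value() const { return subtract ? z - d : z + d; }
    std::wstring Text() const
    {
        return std::to_wstring(z) + (subtract ? L" - " : L" + ") + std::to_wstring(d);
    }
};

// Picks z above x (then x = z - d) or below x (then x = z + d),
// so that d is never negative.
Formula MakeFormula(int x, std::mt19937& engine)
{
    std::bernoulli_distribution coin(0.5);
    if (coin(engine))
    {
        std::uniform_int_distribution<int> above(x, x + 100);
        int z = above(engine);
        return { z, z - x, true };
    }
    std::uniform_int_distribution<int> below(x >= 100 ? x - 100 : 0, x);
    int z = below(engine);
    return { z, x - z, false };
}
```

For the article's own example, `Formula{ 100, 28, true }.Text()` gives `100 - 28`, which evaluates to 72.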

TinyObfuscate - the Advanced Version

Since the initial publication of this article, we have continued enhancing the project, and the recent version of TinyObfuscate is used as part of our day-to-day development, including in several commercial products.

The recent version has two modes:

Project Mode
Immediate Mode

The Immediate Mode resembles the original version mentioned in this article but has additional features and enhancements.

You can select the type of string (UNICODE or wide char, const).
The obfuscated code is wrapped inside a new function that is generated.
Optionally, the function code and prototype are inserted into a given .cpp and .h file, after first checking whether a function that obfuscates the same string already exists.
The function call is copied to the Clipboard (either the newly generated function or, if the string was obfuscated before, the existing one), so the user can simply paste it in place of the original string.
The generated function is automatically tested to verify that it returns the given string.
Escape sequences (such as \n and \t) and format specifiers (such as %s and %d) are handled.

Comments are automatically added to keep track of the original obfuscated string and when it was obfuscated.
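The article does not say how the tool detects that a string was already obfuscated. One possible approach, sketched here purely as an assumption, is to search the existing .cpp file for the comment that precedes each generated function (in the format of the example output earlier) and read the function name from the next line. FindExistingFunction is a hypothetical name and requires C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Looks for a comment such as
//   // (15.08.2020 19:15:21) Obfuscated: 'My secret string'
// followed by a line such as
//   wchar_t* s_2411641058()
// and returns the function name, e.g. s_2411641058.
std::optional<std::wstring> FindExistingFunction(const std::wstring& source,
                                                 const std::wstring& text)
{
    const std::wstring marker = L"Obfuscated: '" + text + L"'";
    for (std::size_t pos = source.find(marker); pos != std::wstring::npos;
         pos = source.find(marker, pos + 1))
    {
        const std::size_t end = pos + marker.size();
        // Require the marker to end its line, so a longer string that merely
        // begins with text followed by a quote is not mistaken for a match
        if (end < source.size() && source[end] != L'\n' && source[end] != L'\r')
            continue;
        // The generated function's signature is on the next line
        const std::size_t lineStart = source.find(L'\n', end);
        if (lineStart == std::wstring::npos)
            return std::nullopt;
        const std::size_t lineEnd = source.find(L'\n', lineStart + 1);
        const std::size_t count = (lineEnd == std::wstring::npos)
                                      ? std::wstring::npos
                                      : lineEnd - (lineStart + 1);
        const std::wstring line = source.substr(lineStart + 1, count);
        // Extract the name between "s_" and "("
        const std::size_t name = line.find(L"s_");
        const std::size_t paren = line.find(L'(');
        if (name != std::wstring::npos && paren != std::wstring::npos && name < paren)
            return line.substr(name, paren - name);
    }
    return std::nullopt;
}
```

If a name is found, the existing call can be copied to the Clipboard instead of generating a new function.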

Example of Obfuscation

Here is an example of obfuscating a string that contains a format specifier ("%d").

The code line used in this example is:

C++

wprintf(L"The result is %d", result);

We want to replace the string literal in the call to wprintf with a call to a generated function, so that literal is what we obfuscate. The string is entered in the "String to obfuscate" field, and the user presses ENTER.

As a result, the following takes place:

1. A "balloon alert" appears

2. The following code appears (and is inserted into the project's source and header files).

Now you can paste the function call into the source code line, which will then look like this:

C++

wprintf(s_1111865989(), result);

Points of Interest

The project was created with Visual Studio 2019 using MFC.

The attached executable has been code signed using our Extended Validation (EV) Code Signing Certificate.

History

12th October, 2017: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)