Introduction
Metamorphic techniques give an application the ability to physically change its own code without abandoning its objectives. The actual code and structure of the application are dynamic and subject to change, but what the application does remains unchanged. If your program's objective is, for example, to perform some mathematical operations or manage I/O streams, it will continue to do that and produce the same results… but how it reaches this objective changes.
Why should an application perform such a transformation on itself? Well, this technique was developed many years ago, primarily to fool antivirus software. It is one of the techniques that raise the level of harm of viral applications, because it undermines antivirus signature databases, which are the most important information used to locate, quarantine or delete viral infections on your PC. Together with polymorphic engines (see my introduction to them here), it is one of the main methods that allow an application to run undetected on an Operating System.
Background
All the concepts found in this article are for advanced users; using this technique without a deep knowledge of CPU instruction sets will likely result in the total failure of your application. C programming skills are a must, since this is the language chosen to develop the engine (my favourite language and my first choice in terms of programming, actually). Knowledge of the common build tools (like GCC and linkers) is mandatory, together with assemblers (NASM in this article) and additional utilities provided by the Linux command line (objdump or readelf).
You must also know how an application is compiled and packed to be loaded and run by an Operating System. This means understanding the PE and ELF file formats, for Windows and Linux platforms respectively. Since the technology developed here is valid for the x86 processor family, basically all Operating Systems running on this platform can be affected by the following software.
As always, I’ll try to keep the description of the application internals as simple as I can, to allow a wider range of readers to understand the following concepts.
Warning: discretion is required.
These techniques can be embedded in any application to metamorphose a piece of code, an entire application, or even the engine itself. As I said, they are mainly used in malware development, but they can be used more generally for both bad and good purposes. It is up to you to decide in which direction you want to proceed. You can design applications that hide in plain sight, or antivirus utilities that escape detection and destruction by malware (some viruses disable antivirus software, or make it harmless).
Basically, what is presented in the next chapters is similar to a weapon in the real world, especially in these days of Virtual Wars.
How does metamorphism work?
The main objective of a Metamorphic Engine is to change part of the code of the routines in certain areas of the application so that they change their “shape”, but not their job. Since this concept is not immediate to grasp, let’s introduce it with a metaphor: metamorphism in natural language. Applying metamorphism to a sentence written in English means finding synonyms with which you can change the shape of the sentence while keeping its meaning unchanged.
If we take the following sentence, for example:
The device will operate under water
and we instruct our metamorphic engine with the rule:
device <--> apparatus
what we end up with after the transformation is something like:
The apparatus will operate under water
Now notice that, even though the words of the sentence changed, the meaning remains the same. I'm not a native English speaker, so please try to understand the concept behind the previous example; it is not so easy for me to play with metaphors or English grammar.
If we transpose the previous sentence into CPU instructions, so that every word is an opcode that the processing unit can understand and execute, what we end up with is something in the following format:
Thedevicewilloperateunderwater
This is because CPUs do not need any spaces, since their vocabulary is much smaller than the human one. Every word does exactly one job, which ultimately results in performing operations on the registers. This is a rough picture of how instruction sets work in the CPU, and I have greatly reduced the complexity behind it. It allows me to introduce an important problem with CPU instructions and metamorphism: the loss of alignment with the original code.
Losing the alignment with the original code means starting to read the first executable opcode from the wrong point of the sentence. If, for example, the initial 'T' is removed from the sentence, what we will find is something like:
hedevicewilloperateunderwater
which is then translated as the following sentence:
hed evicew illo perateu nderw ater
While your brain, the result of millions of years of evolution, realizes that something is wrong and that all the words are "shifted" by one, a CPU does not. It will really try to run the operation "hed", and in the worst case this will lead to an exception. At this point the OS usually handles the exception by terminating the application and reporting the error to the user, who will realize that something nasty is happening on their computer.
The ability to interpret the Instruction Set correctly, together with the location of a good starting point in the procedures to modify, are essential requirements for this technique. Usually metamorphic engines are "specially crafted" directly in assembly language, in such a way that they can only change their own behavior or swap parts of their own code to perform metamorphism.
I did not like such restrictions, so I wrote a general purpose metamorphic engine.
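The word-shift effect can be reproduced with a few lines of C. This is only a sketch of the metaphor, not code from the project: the function name split_by_lengths is made up, and the fixed list of word lengths stands in for the way a CPU consumes opcodes back to back with no separators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split `text` into words using a fixed list of word lengths and write
 * them to `out`, separated by single spaces. The last word is cut short
 * if the text ends early. Returns the number of words emitted.
 * (Illustrative helper; nothing here comes from the M1 sources.)
 */
size_t split_by_lengths(const char *text, const size_t *lens, size_t nlens,
                        char *out, size_t outsz)
{
    size_t pos = 0, used = 0, words = 0;
    size_t total = strlen(text);

    assert(outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < nlens && pos < total; i++) {
        size_t n = lens[i];
        if (n > total - pos)
            n = total - pos;          /* text ran out: truncate the word */
        if (used + n + 2 > outsz)
            break;                    /* no room for space + word + NUL */
        if (words > 0)
            out[used++] = ' ';
        memcpy(out + used, text + pos, n);
        used += n;
        out[used] = '\0';
        pos += n;
        words++;
    }
    return words;
}
```

With the lengths 3, 6, 4, 7, 5, 5 the full sentence splits back into its real words, while the same lengths applied to the sentence without its first letter give the shifted words shown above.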
ISA
The Instruction-Set Analyzer (ISA) is a compact x86 scanner which provides the ability to navigate through CPU instructions without losing alignment with them. This means that, once a reliable starting point is located, you can compute how long the next x86 instruction is. Knowing that, you can locate the right "spaces" between the CPU words, and thus follow the code correctly.
ISA is neither a disassembler nor a CPU emulator (but it can be extended to cover such roles); its only job is to determine the length of the opcodes provided, in order not to lose alignment with the code while scanning for patterns to transform into other instructions. ISA evaluates opcode sizes and formats for 32 and 64 bit architectures, following the Intel manual 325462 (search for it on the Intel web site for more information or to download your copy).
You can find the ISA sources in the isa folder, located in the project root. The source basically contains some tables which summarize the logic of the Intel instruction set. By using these tables the analyzer can detect whether a byte is a prefix, whether the opcode has ModR/M or SIB bytes, whether displacement bytes follow them, and whether the instruction requires an immediate value.
The prefix table, for example, is organized as follows:
#define TT (X86_ARCH_32)
#define SF (X86_ARCH_64)
#define AL (TT | SF)
static char x86_pre[256] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, AL,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, AL, 0, 0, 0, 0, 0, 0, 0, AL, 0,
0, 0, 0, 0, 0, 0, AL, 0, 0, 0, 0, 0, 0, 0, AL, 0,
0, 0, 0, 0, 0, 0, 0, 0, SF, SF, SF, SF, SF, SF, SF, SF,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, AL, AL, AL, AL, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
AL, 0, AL, AL, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
and identifies that 0x26 (row 2, column 6, counting from zero) is a valid prefix for both 32 and 64 bit architectures. Similar tables (for the ModR/M byte and the immediates of one-, two- and three-byte opcodes) can be found in the same source file, together with comments that help to understand how the overall logic has been organized.
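To show how such a table might be consulted, here is a minimal sketch that counts the leading prefix bytes of an instruction. It is an assumption-laden illustration and not ISA's code: ARCH_32 and ARCH_64 are hypothetical stand-ins for X86_ARCH_32 and X86_ARCH_64 (whose real values are not shown here), and a switch replaces the 256-entry array while flagging the same bytes as the table above, 0x0F included.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical architecture flags, standing in for X86_ARCH_32/64. */
enum { ARCH_32 = 1, ARCH_64 = 2, ARCH_ALL = ARCH_32 | ARCH_64 };

/* Architectures for which byte `b` is flagged in the prefix table. */
unsigned prefix_flags(unsigned char b)
{
    switch (b) {
    case 0x0f:                                  /* flagged in row 0 */
    case 0x26: case 0x2e: case 0x36: case 0x3e: /* segment overrides */
    case 0x64: case 0x65: case 0x66: case 0x67: /* FS, GS, size overrides */
    case 0xf0: case 0xf2: case 0xf3:            /* LOCK, REPNE, REP */
        return ARCH_ALL;
    default:
        if (b >= 0x48 && b <= 0x4f)             /* flagged for 64 bits only */
            return ARCH_64;
        return 0;
    }
}

/* Number of consecutive prefix bytes at the start of `buf`. */
size_t count_prefixes(const unsigned char *buf, size_t len, unsigned arch)
{
    size_t n = 0;

    assert(arch == ARCH_32 || arch == ARCH_64);
    while (n < len && (prefix_flags(buf[n]) & arch))
        n++;
    return n;
}
```

For the bytes 48 83 ec 10 this returns 1 when called with ARCH_64 and 0 with ARCH_32, since 0x48 is only a prefix in 64-bit mode.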
The only public procedure in the x86.h header is the following:
int x86_decode(unsigned char * buf, x86arch arch, x86op * op);
and allows you to decode the opcode pointed to by buf, given the architecture to take into account and a pointer to an opcode structure that will be filled with the decoded information. The return value is negative on error; otherwise it is the length of the evaluated opcode.
ISA has been debugged to make sure it won't lose alignment while evaluating opcodes, but since this is a project made during my free time, I am not 100% sure that it covers all the possible opcodes without errors. In the test directory you can find the tests carried out on ISA, which are composed of specially crafted assembly source files for both 32 and 64 bit architectures. These files list all the valid operations (one per opcode entry) that a CPU can perform, each aligned to 16 bytes and padded with int3.
Iseta is a debugging utility that you can use to inspect the raw binary created by make; it uses ISA to navigate through the instructions and test whether something is wrong with the analyzer's internal mechanisms. You can invoke it without any arguments to see a quick help, or with a command line similar to the following to debug one of the compiled binaries:
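To make the idea of length decoding concrete, here is a minimal sketch of a length decoder and of the loop that walks a buffer with it. It is not ISA's code and does not use x86_decode: the names toy_length64 and toy_count are made up, and the decoder only understands a handful of 64-bit opcodes (push/pop rbp, ret, leave, call/jmp rel32, mov with ModR/M, mov immediate and the 0x83 group). Anything else is reported as an error rather than guessed.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes following a ModR/M byte (SIB + displacement), or -1 if truncated. */
static int modrm_extra(const unsigned char *p, size_t avail)
{
    unsigned mod, rm;
    int extra = 0;

    if (avail < 1)
        return -1;
    mod = p[0] >> 6;
    rm = p[0] & 7;
    if (mod == 3)
        return 0;                       /* register operand, nothing extra */
    if (rm == 4) {                      /* a SIB byte follows */
        if (avail < 2)
            return -1;
        extra = 1;
        if (mod == 0 && (p[1] & 7) == 5)
            extra += 4;                 /* SIB without base: disp32 */
    } else if (mod == 0 && rm == 5) {
        extra = 4;                      /* RIP-relative disp32 */
    }
    if (mod == 1)
        extra += 1;                     /* disp8 */
    else if (mod == 2)
        extra += 4;                     /* disp32 */
    return extra;
}

/* Length of one instruction in 64-bit mode, -1 if unsupported or truncated. */
int toy_length64(const unsigned char *buf, size_t len)
{
    size_t i = 0, need;
    int rex_w = 0;
    unsigned char op;

    assert(buf != NULL);
    if (i < len && (buf[i] & 0xf0) == 0x40) {   /* REX prefix */
        rex_w = (buf[i] & 0x08) != 0;
        i++;
    }
    if (i >= len)
        return -1;
    op = buf[i++];
    if (op == 0x55 || op == 0x5d || op == 0xc3 || op == 0xc9) {
        need = 0;                               /* push/pop rbp, ret, leave */
    } else if (op == 0xe8 || op == 0xe9) {
        need = 4;                               /* call/jmp rel32 */
    } else if (op >= 0xb8 && op <= 0xbf) {
        need = rex_w ? 8 : 4;                   /* mov reg, imm32 or imm64 */
    } else if (op == 0x83 || op == 0x89 || op == 0x8b) {
        int extra = modrm_extra(buf + i, len - i);
        if (extra < 0)
            return -1;
        need = 1 + (size_t)extra + (op == 0x83 ? 1 : 0);  /* 0x83 has imm8 */
    } else {
        return -1;                              /* outside this toy subset */
    }
    if (need > len - i)
        return -1;
    return (int)(i + need);
}

/* Walk `buf` one instruction at a time; count them or fail with -1. */
int toy_count(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        int n = toy_length64(buf + off, len - off);
        if (n < 0)
            return -1;                  /* stop instead of walking misaligned */
        off += (size_t)n;
        count++;
    }
    return count;
}
```

The design choice worth noting is that the walker stops at the first byte it cannot decode. Continuing past an unknown byte would mean guessing a length, which is exactly how alignment gets lost.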
piku@HAL:/M1/tests$ ./iseta x86/iset32.bin 0 4128 32
The expected result of this execution is something like:
Istruction Set Analizer debugging utility.
Operating in 32 bits mode.
Analyzing file chunk at 0:
00 d8 cc cc cc cc cc cc cc cc cc cc cc cc cc cc
OPCODE RESUME:
OP code:00
OP size:2
N.of prefixes: 0
1-byte operation!
ModR/M detected: d8
Press ENTER for next opcode...
Analyzing file chunk at 16:
01 d9 cc cc cc cc cc cc cc cc cc cc cc cc cc cc
OPCODE RESUME:
OP code:01
OP size:2
N.of prefixes: 0
1-byte operation!
ModR/M detected: d9
Press ENTER for next opcode...
and so on until the end of the binary file. If everything is fine, you should always see exactly one opcode per ENTER key press, starting from 0x00 and arriving at 0xFF. There are some holes in the tables, but they are there according to the Intel manuals, since some opcodes are not allowed and are probably reserved for future use.
M1 simple metamorphic engine
Having introduced the mechanisms necessary to understand the problems and behaviors of metamorphic engines, let's see how they actually work. M1 is a simple, general purpose metamorphic engine which uses ISA to scan opcodes and change operations according to hardcoded rules. Since I want to keep this simple, the only rule built into M1 is arithmetic operation swapping for addition and subtraction. This means that additions of a value n are changed into subtractions of the value -n, and subtractions of a value n are translated into additions of the value -n. Along with register swapping, this is one of the most basic operations that a metamorphic engine can perform.
To restrict the case even more, I will focus on additions/subtractions operating on 1-byte immediate data, and only on certain registers. This means that the metamorphic engine will scan for the 0x83 opcode family, and will affect only operations of this type that occur on the selected registers.
If you invoke make in the project root, two applications will be generated: one is M1 and the other is our dummy. Dummy is the initial target on which to test the metamorphosis. This application doesn't do anything smart: it only asks for a value, which is incremented and printed on the standard output five times by five different routines (the "something" family of routines, see the sources).
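The swap rule can be sketched as follows. This is a minimal illustration under stated assumptions, not M1's actual code: the function name swap_add_sub_imm8 is made up, the caller is assumed to pass a pointer to the 0x83 opcode byte (after any prefix), and only register operands are handled. In the 0x83 family the operation is selected by the reg field of the ModR/M byte, where /0 is add and /5 is sub.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Swap "add reg, imm8" <-> "sub reg, imm8" in place, negating the
 * immediate so the result stored in the register is unchanged.
 * `ins` points at the 0x83 opcode byte; `len` is the bytes available.
 * Returns 1 if the instruction was rewritten, 0 if it was left alone.
 */
int swap_add_sub_imm8(unsigned char *ins, size_t len)
{
    unsigned char modrm, reg;

    assert(ins != NULL);
    if (len < 3 || ins[0] != 0x83)
        return 0;
    modrm = ins[1];
    if ((modrm >> 6) != 3)
        return 0;                     /* memory operand: not handled here */
    reg = (modrm >> 3) & 7;
    if (reg != 0 && reg != 5)
        return 0;                     /* neither add (/0) nor sub (/5) */
    if (ins[2] == 0x80)
        return 0;                     /* -(-128) does not fit in imm8 */
    ins[1] = (unsigned char)(modrm ^ 0x28);   /* reg field 000 <-> 101 */
    ins[2] = (unsigned char)(0u - ins[2]);    /* two's complement negate */
    return 1;
}
```

Applied to 83 c0 01 (add $0x1,%eax) this yields 83 e8 ff (sub $0xffffffff,%eax), and applying it again restores the original bytes. One caveat this sketch ignores: the two forms store the same value in the register but leave the carry flag in different states, so the swap is only safe where no later instruction depends on that flag.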
If you try it just after the make, you can see the following output:
piku@HAL:/M1$ ./dummy
Give me a value!!!
piku@HAL:/M1$ ./dummy 1
Value is now 1
Value is now 2
Value is now 3
Value is now 4
Value is now 5
Another text document, called dummy.pre.txt, is generated alongside the applications. This file contains the disassembled code section of our dummy application, which is necessary to understand what actually happens after M1 is run on it. If you concentrate on the something procedure, located at address 40066e (line 197 of the file), you will see in detail how the C code has been compiled and then disassembled back from CPU opcodes.
000000000040066e <something>:
40066e: 55 push %rbp
40066f: 48 89 e5 mov %rsp,%rbp
400672: 48 83 ec 10 sub $0x10,%rsp
400676: 89 7d fc mov %edi,-0x4(%rbp)
400679: 8b 45 fc mov -0x4(%rbp),%eax
40067c: 89 c6 mov %eax,%esi
40067e: bf 74 07 40 00 mov $0x400774,%edi
400683: b8 00 00 00 00 mov $0x0,%eax
400688: e8 03 fe ff ff callq 400490 <printf@plt>
40068d: 8b 45 fc mov -0x4(%rbp),%eax
400690: 83 c0 01 add $0x1,%eax
400693: 89 c7 mov %eax,%edi
400695: e8 a6 ff ff ff callq 400640 <something_1>
40069a: c9 leaveq
40069b: c3 retq
Here the instructions at 400672 and 400690 are the ones that M1 will change.
NOTE: For this and the following opcode listings, I will assume you are operating on a 64-bit architecture. The case for 32 bits is a little different and is not covered in this document.
Now it's time to run M1 while pointing to the unfortunate dummy utility. If you run the metamorphic engine with the following arguments:
piku@HAL:/M1$ ./m1 /M1/dummy 1646 45 2
M1 will evaluate dummy starting at file offset 1646 (0x66e in hex) for 45 bytes (0x2d in hex, so up to 0x66e + 0x2d = 0x69b). This is exactly where our "something" procedure is located! By invoking make dump after having run M1 you will dump another text file with the disassembled target application. By scrolling the newly created file to the "something" procedure (again at 40066e), you will find that the two operations we pointed out, at 400672 and 400690, are now different:
000000000040066e <something>:
40066e: 55 push %rbp
40066f: 48 89 e5 mov %rsp,%rbp
400672: 48 83 c4 f0 add $0xfffffffffffffff0,%rsp
400676: 89 7d fc mov %edi,-0x4(%rbp)
400679: 8b 45 fc mov -0x4(%rbp),%eax
40067c: 89 c6 mov %eax,%esi
40067e: bf 74 07 40 00 mov $0x400774,%edi
400683: b8 00 00 00 00 mov $0x0,%eax
400688: e8 03 fe ff ff callq 400490 <printf@plt>
40068d: 8b 45 fc mov -0x4(%rbp),%eax
400690: 83 e8 ff sub $0xffffffff,%eax
400693: 89 c7 mov %eax,%edi
400695: e8 a6 ff ff ff callq 400640 <something_1>
40069a: c9 leaveq
40069b: c3 retq
This shows that M1 did its job and swapped the operations correctly. But does the affected application still work? Well, why don't you invoke the previous command on dummy again and see for yourself?
piku@HAL:/M1$ ./dummy 1
Value is now 1
Value is now 2
Value is now 3
Value is now 4
Value is now 5
Exactly the same output is produced, but the application has now physically changed! Congratulations, you just performed your first metamorphic operation. :-)
What if you want to mutate more than one procedure? Well, nothing stops you from selecting a bigger area, but you always have to be sure that you start evaluating from a valid opcode and that nothing in the middle destroys your alignment. M1 has been left simple intentionally, since I just wanted to demonstrate how metamorphic engines work, not build a really powerful one. It happened to me a few times that I modified sections of code which contained, somewhere in the middle, what I believe was data of some sort (well, invalid opcodes); this misaligned ISA's computation from that point on, corrupting the procedures modified afterwards (which crashed the application with a Segmentation Fault).
If you rebuild dummy by invoking make again, and run M1 using the command line:
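If you prefer to check the change at byte level rather than by reading two disassembly dumps, a small comparison helper is enough. The sketch below is an illustration only (diff_offsets is a made-up name, not part of M1) and assumes both versions of the region have already been read into memory buffers of the same size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compare two equally sized buffers and store the offsets of the bytes
 * that differ in `out` (at most `max` of them). Returns the total number
 * of differing bytes, which may be larger than `max`.
 */
size_t diff_offsets(const unsigned char *before, const unsigned char *after,
                    size_t len, size_t *out, size_t max)
{
    size_t found = 0;

    assert(before != NULL && after != NULL);
    for (size_t i = 0; i < len; i++) {
        if (before[i] != after[i]) {
            if (found < max)
                out[found] = i;     /* remember where the change is */
            found++;
        }
    }
    return found;
}
```

For the "something" procedure, the two listings differ in four bytes: the ModR/M and immediate bytes of each of the two swapped instructions, at offsets 6, 7, 35 and 36 from the start of the procedure.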
piku@HAL:/M1$ ./m1 /M1/dummy 1469 407 2
You will perform the metamorphosis starting from the procedure something4 and including main. If you dump the dummy application again now, you will notice that all the compatible operations have been swapped according to the rule built into M1. Again, if you invoke the dummy utility after its metamorphosis, you will still get valid output like:
piku@HAL:/M1$ ./dummy 1
Value is now 1
Value is now 2
Value is now 3
Value is now 4
Value is now 5
Moving forward
Now that we have an application which is capable of applying metamorphosis to other applications without destroying them, what is the next step? What else can we do to go on and validate M1 further?
Well, obviously modifying an existing, real-world application. :-)
During my tests I tried, successfully, to modify the Explorer.exe application, but even though the code inside its text section was still legal, Windows (version 10) detected the changes and prevented it from running (InPageError, 0xc0000428, the NT status code for an invalid image hash). While looking inside the Windows folder I then copied and tried to run other applications, to find out which ones were able to run outside the classic "C:\Windows" domain, and I found write.exe.
Write.exe is the simple WordPad, an application halfway between Notepad and Microsoft Office Word (well, way more on the Notepad side, I would say). To perform this test you will have to copy it to another folder, so that the original file is not affected by the changes (so from C:\Windows to wherever you want).
As a first operation I scanned write.exe for its headers, and saved (as for dummy) a backup objdump trace by invoking the following commands (note that I saved the copy of write.exe in the project root):
piku@HAL:/M1$ objdump -x write.exe
piku@HAL:/M1$ objdump -S write.exe > write.pre.dump
The first command is necessary to obtain the information with which you will find the right locations of the procedures to modify, while the second is used to keep a copy of the original internals to compare against after the metamorphosis. The header scan gave me the following output:
piku@HAL:/M1$ objdump -x write.exe | more
write.exe: file format pei-x86-64
write.exe
architecture: i386:x86-64, flags 0x0000012f:
HAS_RELOC, EXEC_P, HAS_LINENO, HAS_DEBUG, HAS_LOCALS, D_PAGED
start address 0x0000000140001420
Characteristics 0x22
executable
large address aware
Time/Date Sat Jul 16 04:28:49 2016
Magic 020b (PE32+)
MajorLinkerVersion 14
MinorLinkerVersion 0
SizeOfCode 00000a00
SizeOfInitializedData 00002200
SizeOfUninitializedData 00000000
AddressOfEntryPoint 0000000000001420
BaseOfCode 0000000000001000
ImageBase 0000000140000000
SectionAlignment 0000000000001000
FileAlignment 0000000000000200
MajorOSystemVersion 10
MinorOSystemVersion 0
MajorImageVersion 10
MinorImageVersion 0
MajorSubsystemVersion 10
MinorSubsystemVersion 0
Win32Version 00000000
SizeOfImage 00007000
SizeOfHeaders 00000400
CheckSum 00011d73
Subsystem 00000002 (Windows GUI)
DllCharacteristics 0000c160
SizeOfStackReserve 0000000000080000
SizeOfStackCommit 0000000000002000
SizeOfHeapReserve 0000000000100000
SizeOfHeapCommit 0000000000001000
LoaderFlags 00000000
NumberOfRvaAndSizes 00000010
In particular, what we need here are the AddressOfEntryPoint and BaseOfCode values. By looking at the .text section header you will find that the section is located, as usual, at offset 0x400 from the start of the file. Armed with this information you can finally inspect the file and locate an area on which to test the metamorphic engine. Since you can never know whether the modified part is exercised during the normal startup of the application (where the OS loader loads it into memory and starts its execution from the entry point), we will aim to modify a part which is very close to the Entry Point (in terms of code path), so that if something is wrong the application will fail immediately and you will know about it.
First, we need to read what is actually at location 0x1420. You can quickly do that by opening the write.pre.dump file and searching for location 140001420; again, for the laziest ones I'll report the output here:
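The same two values can also be read programmatically instead of through objdump. The following is a minimal sketch, assuming the whole file has already been loaded into a memory buffer; the struct pe_info and the function pe_read_info are hypothetical names introduced here for illustration. It relies on the standard PE layout: the offset of the PE signature is stored at 0x3c, the signature is followed by the 20-byte COFF header, and AddressOfEntryPoint and BaseOfCode sit at offsets 16 and 20 of the optional header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pe_info {                /* hypothetical container for this sketch */
    uint16_t magic;             /* 0x010b = PE32, 0x020b = PE32+ */
    uint32_t entry_rva;         /* AddressOfEntryPoint */
    uint32_t base_of_code;      /* BaseOfCode */
};

/* Little-endian readers, independent of host byte order. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is not a PE image or is too short. */
int pe_read_info(const unsigned char *img, size_t len, struct pe_info *out)
{
    uint32_t pe_off;
    const unsigned char *opt;

    assert(img != NULL && out != NULL);
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;
    pe_off = rd32(img + 0x3c);                  /* e_lfanew */
    if (pe_off > len || len - pe_off < 4 + 20 + 24)
        return -1;
    if (img[pe_off] != 'P' || img[pe_off + 1] != 'E' ||
        img[pe_off + 2] != 0 || img[pe_off + 3] != 0)
        return -1;
    opt = img + pe_off + 4 + 20;    /* skip signature and COFF header */
    out->magic = rd16(opt);
    out->entry_rva = rd32(opt + 16);
    out->base_of_code = rd32(opt + 20);
    return 0;
}
```

For the write.exe copy described above, this would be expected to report the magic 0x020b, an entry point of 0x1420 and a code base of 0x1000, matching the objdump output.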
14000141c: cc int3
14000141d: cc int3
14000141e: cc int3
14000141f: cc int3
140001420: 48 83 ec 28 sub $0x28,%rsp
140001424: e8 5b 02 00 00 callq 0x140001684
140001429: 48 83 c4 28 add $0x28,%rsp
14000142d: e9 7e fd ff ff jmpq 0x1400011b0
140001432: cc int3
140001433: cc int3
140001434: cc int3
140001435: cc int3
As you can see, it does nothing special and immediately calls another location. So let's move to 140001684 and see what we can find there:
140001682: cc int3
140001683: cc int3
140001684: 48 89 5c 24 20 mov %rbx,0x20(%rsp)
140001689: 55 push %rbp
14000168a: 48 8b ec mov %rsp,%rbp
14000168d: 48 83 ec 20 sub $0x20,%rsp
140001691: 48 83 65 18 00 andq $0x0,0x18(%rbp)
140001696: 48 bb 32 a2 df 2d 99 movabs $0x2b992ddfa232,%rbx
14000169d: 2b 00 00
1400016a0: 48 8b 05 61 19 00 00 mov 0x1961(%rip),%rax # 0x140003008
1400016a7: 48 3b c3 cmp %rbx,%rax
1400016aa: 0f 85 8f 00 00 00 jne 0x14000173f
1400016b0: 48 8d 4d 18 lea 0x18(%rbp),%rcx
1400016b4: ff 15 76 0a 00 00 callq *0xa76(%rip) # 0x140002130
1400016ba: 48 8b 45 18 mov 0x18(%rbp),%rax
1400016be: 48 89 45 10 mov %rax,0x10(%rbp)
1400016c2: ff 15 78 0a 00 00 callq *0xa78(%rip) # 0x140002140
1400016c8: 8b c0 mov %eax,%eax
1400016ca: 48 31 45 10 xor %rax,0x10(%rbp)
1400016ce: ff 15 64 0a 00 00 callq *0xa64(%rip) # 0x140002138
1400016d4: 8b c0 mov %eax,%eax
1400016d6: 48 31 45 10 xor %rax,0x10(%rbp)
1400016da: ff 15 48 0a 00 00 callq *0xa48(%rip) # 0x140002128
1400016e0: 8b c0 mov %eax,%eax
1400016e2: 48 c1 e0 18 shl $0x18,%rax
1400016e6: 48 31 45 10 xor %rax,0x10(%rbp)
1400016ea: ff 15 38 0a 00 00 callq *0xa38(%rip) # 0x140002128
1400016f0: 8b c0 mov %eax,%eax
1400016f2: 48 8d 4d 10 lea 0x10(%rbp),%rcx
1400016f6: 48 33 45 10 xor 0x10(%rbp),%rax
1400016fa: 48 33 c1 xor %rcx,%rax
1400016fd: 48 8d 4d 20 lea 0x20(%rbp),%rcx
140001701: 48 89 45 10 mov %rax,0x10(%rbp)
140001705: ff 15 3d 0a 00 00 callq *0xa3d(%rip) # 0x140002148
14000170b: 8b 45 20 mov 0x20(%rbp),%eax
14000170e: 48 b9 ff ff ff ff ff movabs $0xffffffffffff,%rcx
140001715: ff 00 00
140001718: 48 c1 e0 20 shl $0x20,%rax
14000171c: 48 33 45 20 xor 0x20(%rbp),%rax
140001720: 48 33 45 10 xor 0x10(%rbp),%rax
140001724: 48 23 c1 and %rcx,%rax
140001727: 48 b9 33 a2 df 2d 99 movabs $0x2b992ddfa233,%rcx
14000172e: 2b 00 00
140001731: 48 3b c3 cmp %rbx,%rax
140001734: 48 0f 44 c1 cmove %rcx,%rax
140001738: 48 89 05 c9 18 00 00 mov %rax,0x18c9(%rip) # 0x140003008
14000173f: 48 8b 5c 24 48 mov 0x48(%rsp),%rbx
140001744: 48 f7 d0 not %rax
140001747: 48 89 05 c2 18 00 00 mov %rax,0x18c2(%rip) # 0x140003010
14000174e: 48 83 c4 20 add $0x20,%rsp
140001752: 5d pop %rbp
140001753: c3 retq
140001754: cc int3
140001755: cc int3
OK, now things are getting interesting, since this is a proper procedure. Some basic math gives us its location in the file on the hard disk and the size of the area to scan: 0x1684 - 0x1000 + 0x400 = 0xa84, which in decimal is 2692, while the area we scan is 206 bytes long (from 0x140001684 up to and including the add at 0x14000174e; the trailing pop and retq bring the whole procedure to 208 bytes). You can verify that this is right by opening the executable file with vim and switching to hex view, like this:
piku@HAL:/M1$ vim ./write.exe
:%!xxd -c16
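The offset arithmetic used for write.exe can be written down as a one-line helper. This is just a sketch: the function name rva_to_file_offset is made up, and it assumes that you already know which section contains the address, along with that section's virtual address and raw data offset (0x1000 and 0x400 for the .text section here).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Translate a relative virtual address into a file offset.
 * sect_rva: virtual address of the section containing `rva`
 * sect_raw: file offset where the section's raw data starts
 */
uint32_t rva_to_file_offset(uint32_t rva, uint32_t sect_rva, uint32_t sect_raw)
{
    assert(rva >= sect_rva);    /* rva must lie inside the section */
    return rva - sect_rva + sect_raw;
}
```

Calling rva_to_file_offset(0x1684, 0x1000, 0x400) gives 0xa84, that is 2692 in decimal, which is the start offset passed to M1.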
What is left is to invoke M1 with the right arguments to modify the procedure, as follows (this time I also report the output of M1):
piku@HAL:/M1$ ./m1 /M1/write.exe 2692 206 2
Read 206 bytes from location a84
Found one at 0x9
To add: 8b --> c0
Found one at 0xd
Found one at 0xca
Written 206 bytes to location a84
File mutated!
As you can see, only one change is made, and the two other candidates have been dropped because they do not respect the limits we imposed on the metamorphic engine.
Time now to repeat the objdump operation and check what has changed:
piku@HAL:/M1$ objdump -S write.exe > write.dump
Now, if you compare the write.pre.dump file with the write.dump file at location 0x140001684, what you will get is the following procedure (the morphed operation is at position 0x14000168d):
140001682: cc int3
140001683: cc int3
140001684: 48 89 5c 24 20 mov %rbx,0x20(%rsp)
140001689: 55 push %rbp
14000168a: 48 8b ec mov %rsp,%rbp
14000168d: 48 83 c4 e0 add $0xffffffffffffffe0,%rsp
140001691: 48 83 65 18 00 andq $0x0,0x18(%rbp)
140001696: 48 bb 32 a2 df 2d 99 movabs $0x2b992ddfa232,%rbx
14000169d: 2b 00 00
1400016a0: 48 8b 05 61 19 00 00 mov 0x1961(%rip),%rax # 0x140003008
1400016a7: 48 3b c3 cmp %rbx,%rax
1400016aa: 0f 85 8f 00 00 00 jne 0x14000173f
1400016b0: 48 8d 4d 18 lea 0x18(%rbp),%rcx
1400016b4: ff 15 76 0a 00 00 callq *0xa76(%rip) # 0x140002130
1400016ba: 48 8b 45 18 mov 0x18(%rbp),%rax
1400016be: 48 89 45 10 mov %rax,0x10(%rbp)
1400016c2: ff 15 78 0a 00 00 callq *0xa78(%rip) # 0x140002140
1400016c8: 8b c0 mov %eax,%eax
1400016ca: 48 31 45 10 xor %rax,0x10(%rbp)
1400016ce: ff 15 64 0a 00 00 callq *0xa64(%rip) # 0x140002138
1400016d4: 8b c0 mov %eax,%eax
1400016d6: 48 31 45 10 xor %rax,0x10(%rbp)
1400016da: ff 15 48 0a 00 00 callq *0xa48(%rip) # 0x140002128
1400016e0: 8b c0 mov %eax,%eax
1400016e2: 48 c1 e0 18 shl $0x18,%rax
1400016e6: 48 31 45 10 xor %rax,0x10(%rbp)
1400016ea: ff 15 38 0a 00 00 callq *0xa38(%rip) # 0x140002128
1400016f0: 8b c0 mov %eax,%eax
1400016f2: 48 8d 4d 10 lea 0x10(%rbp),%rcx
1400016f6: 48 33 45 10 xor 0x10(%rbp),%rax
1400016fa: 48 33 c1 xor %rcx,%rax
1400016fd: 48 8d 4d 20 lea 0x20(%rbp),%rcx
140001701: 48 89 45 10 mov %rax,0x10(%rbp)
140001705: ff 15 3d 0a 00 00 callq *0xa3d(%rip) # 0x140002148
14000170b: 8b 45 20 mov 0x20(%rbp),%eax
14000170e: 48 b9 ff ff ff ff ff movabs $0xffffffffffff,%rcx
140001715: ff 00 00
140001718: 48 c1 e0 20 shl $0x20,%rax
14000171c: 48 33 45 20 xor 0x20(%rbp),%rax
140001720: 48 33 45 10 xor 0x10(%rbp),%rax
140001724: 48 23 c1 and %rcx,%rax
140001727: 48 b9 33 a2 df 2d 99 movabs $0x2b992ddfa233,%rcx
14000172e: 2b 00 00
140001731: 48 3b c3 cmp %rbx,%rax
140001734: 48 0f 44 c1 cmove %rcx,%rax
140001738: 48 89 05 c9 18 00 00 mov %rax,0x18c9(%rip) # 0x140003008
14000173f: 48 8b 5c 24 48 mov 0x48(%rsp),%rbx
140001744: 48 f7 d0 not %rax
140001747: 48 89 05 c2 18 00 00 mov %rax,0x18c2(%rip) # 0x140003010
14000174e: 48 83 c4 20 add $0x20,%rsp
140001752: 5d pop %rbp
140001753: c3 retq
140001754: cc int3
140001755: cc int3
As you can see, M1 again morphed a subtraction into an addition which has the same effect on the CPU registers. Does write.exe work now? Why don't you run it and see for yourself?
Summary
Binary metamorphism is an interesting technique which allows you to instruct an engine to morph a desired target set of instructions. The techniques range from the simplest changes, like the example shown in the previous chapters, to more complex ones which require .text section extension, relocation recomputation and other kinds of adjustment to avoid destroying the application.
These techniques allow the creator to hide an application in plain sight, because any attempt to identify it through a hash is defeated by the morphing of the internal instructions. Of course, this does not stop Operating Systems and antivirus software from performing integrity checks, as I saw in the test on the explorer.exe application. As you have seen, this is anything but simple, because it requires precise Instruction-Set analysis software, correct morphing rules and alignment detection routines that avoid changing the wrong part of the application.
That is all, and I hope you enjoyed this simple introduction to metamorphism. :-)