Introduction
A paper by Stephen Dolan which proved that the x86 mov
instruction is Turing-complete was released in 2013 and followed by a movfuscator project which made an entire compiler out of the project. See the paper: https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf. Inside the paper, a Universal Turing Machine (UTM) is proposed as the proof of the Turing-completeness using only a single jump instruction.
This code also serves as a reference for some details on how to achieve x86 and x86-64 conditional compilation, with both inline assembly and external assembly and some semantics unique perhaps to the Windows platform though applicable to any.
Background
Knowing of models of computation, Turing-completeness, Universal Turing Machines, Turing Machines and the C and x86 assembly programming languages is preferable along with inline assembler and Windows technologies or knowledge of how to port them to other platforms such as Linux and Mac (which should not be too difficult).
Using the Code
2 extra Turing machine models one which is not general and state limited, the other which is not but both of which are input tape size limited are included in the download package for additional examples in both C and assembly formats.
The definition followed by the code are presented which allows for a full C emulation of the research paper. Some cheating using a cmp
and je
instructions to gracefully abort the code in C without crashing is used as the original research paper allowed a segmentation fault with bad memory address 0 access to halt.
Deductions based on the code had to be made to figure out the data structure and format of the Turing machine, as well as the input tape and how to properly initialize them. Defining registers based on looking at mutual exclusion, and frequency of use was done to do it in the lowest impact way. The C prolog before the assembly adapts a simple encoding of a Turing machine which is assumed to be in current state sorted order into a memory addressed encoded version.
The code compiles and runs successfully on Microsoft Visual Studio 2015.
First, a Turing Machine which does some computation must be codified based on having the following format:
char turingMachine[][5] = {
{ '1', '0', '0', 'R', '1' },
{ '1', '1', '1', 'R', '1' },
{ '1', '_', '_', 'R', '3' },
{ '3', '0', '0', 'R', '3' },
{ '3', '1', '1', 'R', '4' },
{ '3', '_', '_', 'R', '2' },
{ '4', '0', '0', 'R', '4' },
{ '4', '1', '1', 'R', '4' },
{ '4', '_', '_', 'L', '5' },
{ '5', '0', '1', 'L', '5' },
{ '5', '1', '0', 'L', '6' },
{ '6', '0', '0', 'L', '6' },
{ '6', '1', '1', 'L', '6' },
{ '6', '_', '_', 'L', '7' },
{ '7', '1', '0', 'L', '7' },
{ '7', '0', '1', 'R', '1' },
{ '7', '_', '1', 'R', '1' }
}; char Input[] = "0100_1001_";
This is a Turing machine to perform binary addition using a decrement, then increment until 0 strategy. The initial input is encoded as 4 and 5 so the result should be 9 and 0. It must be in sorted order by current state value so it can be processed so if not presorted then quick sort or another option can be used. The following function is used to do binary search lookups on the new state:
int TMCmp(const void* l, const void* r)
{
if (*(unsigned char*)l == *(unsigned char*)r) return 0;
return *(unsigned char*)l > *(unsigned char*)r ? 1 : -1;
}
For external assembler support which is required for x64:
#include <basetsd.h>
extern "C" void __cdecl MovTM
(ULONG_PTR, ULONG_PTR, ULONG_PTR, ULONG_PTR, ULONG_PTR, ULONG_PTR);
And the rest of the code is as follows:
ULONG_PTR* TransitionTape = (ULONG_PTR*)malloc((sizeof(turingMachine) / 5)
* 9 * sizeof(ULONG_PTR)); ULONG_PTR* InputTape = (ULONG_PTR*)malloc(sizeof(Input) * sizeof(ULONG_PTR) * 2);
ULONG_PTR* InputTapeLeft = (ULONG_PTR*)malloc(sizeof(Input) * sizeof(ULONG_PTR) * 2);
for (int i = 0; i < sizeof(turingMachine) / 5; i++) {
TransitionTape[i * 9] = (ULONG_PTR)(TransitionTape + i * 9) + sizeof(ULONG_PTR) * 2;
TransitionTape[i * 9 + 3] = (ULONG_PTR)(TransitionTape + i * 9) + sizeof(ULONG_PTR) * 4;
TransitionTape[i * 9 + 5] = (ULONG_PTR)(TransitionTape + i * 9) + sizeof(ULONG_PTR) * 6;
TransitionTape[i * 9 + 7] = (ULONG_PTR)(TransitionTape + i * 9) + sizeof(ULONG_PTR) * 8;
char* ptr = (char*)bsearch(&turingMachine[i][4], turingMachine,
sizeof(turingMachine) / 5, 5, TMCmp);
int idx;
if (ptr != NULL) {
idx = (ptr - (char*)turingMachine) / 5;
while (idx != 0 && turingMachine[idx - 1][0] == turingMachine[i][4]) idx--;
}
TransitionTape[i * 9 + 8] = ptr == NULL ? (turingMachine[i][4] == '2' ?
(ULONG_PTR)InputTapeLeft : (ULONG_PTR)InputTapeLeft) :
(ULONG_PTR)(TransitionTape + (idx * 9));
TransitionTape[i * 9 + 2] = turingMachine[i][1];
TransitionTape[i * 9 + 4] = turingMachine[i][2];
TransitionTape[i * 9 + 6] = turingMachine[i][3] == 'R' ? sizeof(ULONG_PTR) : 0;
TransitionTape[i * 9 + 1] = ((i == sizeof(turingMachine) / 5 - 1) ||
turingMachine[i + 1][0] != turingMachine[i][0]) ? (ULONG_PTR)InputTapeLeft :
(ULONG_PTR)(TransitionTape + (i + 1) * 9);
}
for (int i = 0; i < sizeof(Input); i++) {
InputTape[i * 2] = Input[i];
InputTapeLeft[i * 2] = '_';
InputTape[i * 2 + 1] = (ULONG_PTR)(InputTape + (i + 1) * 2);
InputTapeLeft[i * 2 + 1] = (ULONG_PTR)(InputTapeLeft + (i + 1) * 2);
}
ULONG_PTR scratch[256]; ULONG_PTR TransitionTapePtr = (ULONG_PTR)TransitionTape;
ULONG_PTR InputTapePtr =
(ULONG_PTR)InputTape + sizeof(ULONG_PTR) * 2; ULONG_PTR InputTapeLeftPtr =
(ULONG_PTR)InputTapeLeft; MovTM((ULONG_PTR)TransitionTapePtr, (ULONG_PTR)InputTape,
(ULONG_PTR)InputTapeLeft, (ULONG_PTR)scratch, (ULONG_PTR)InputTapePtr, (ULONG_PTR)InputTapeLeftPtr);
#if !defined(_WIN64)
for (int i = 0; i < sizeof(Input); i++) {
InputTape[i * 2] = Input[i];
InputTapeLeft[i * 2] = '_';
InputTape[i * 2 + 1] = (ULONG_PTR)(InputTape + (i + 1) * 2);
InputTapeLeft[i * 2 + 1] = (ULONG_PTR)(InputTapeLeft + (i + 1) * 2);
}
#define ADDRPLIER 4
#define SIZEWORD dword
#define X eax
#define Y ebx
#define M ecx
#define Z Y
#define D Z
#define H M
#define L edi
#define R L
#define T R
#define S edx
#define N esi
__asm {
mov T, TransitionTapePtr
mov S, InputTape
mov N, InputTapeLeft
start:
mov X, [T]
mov X, [X]
mov Y, [S]
mov SIZEWORD ptr [scratch + Y * ADDRPLIER], 0
mov SIZEWORD ptr [scratch + X * ADDRPLIER], 1 * ADDRPLIER
mov M, SIZEWORD ptr [scratch + Y * ADDRPLIER]
mov X, [T]
mov X, [X + 1 * ADDRPLIER]
mov X, [X]
mov Y, [S]
mov [N], Y
mov [N + 1 * ADDRPLIER], X
mov Z, [N + M]
mov [S], Z
mov D, [T]
mov D, [D + 1 * ADDRPLIER]
mov D, [D + 1 * ADDRPLIER]
mov D, [D]
mov R, InputTapePtr
mov [N], R
mov L, InputTapeLeftPtr
mov [N + 1 * ADDRPLIER], L
mov X, [N + D]
mov [S + 1 * ADDRPLIER], X
mov [N], L
mov [N + 1 * ADDRPLIER], S
mov L, [N + D]
mov InputTapeLeftPtr, L
mov [N], S
mov R, InputTapePtr
mov [N + 1 * ADDRPLIER], R
mov R, [N + D]
mov InputTapePtr, R
mov SIZEWORD ptr [N], 1 * ADDRPLIER
mov SIZEWORD ptr [N + 1 * ADDRPLIER], 0
mov X, [N + D]
mov [N], X
mov [N + 1 * ADDRPLIER], D
mov D, [N + M]
mov L, InputTapeLeftPtr
mov [N], L
mov R, InputTapePtr
mov [N + 1 * ADDRPLIER], R
mov S, [N + D]
mov X, [S + 1 * ADDRPLIER]
mov [N], X
mov L, InputTapeLeftPtr
mov [N + 1 * ADDRPLIER], L
mov L, [N + D]
mov InputTapeLeftPtr, L
mov R, InputTapePtr
mov [N], R
mov [N + 1 * ADDRPLIER], X
mov R, [N + D]
mov InputTapePtr, R
mov T, TransitionTapePtr
mov X, [T + 1 * ADDRPLIER]
mov Y, [T]
mov Y, [Y + 1 * ADDRPLIER]
mov Y, [Y + 1 * ADDRPLIER]
mov Y, [Y + 1 * ADDRPLIER]
mov Y, [Y]
mov [N], X
mov [N + 1 * ADDRPLIER], Y
mov T, [N + M]
mov TransitionTapePtr, T
mov X, [T]
mov SIZEWORD ptr [N], 1 * ADDRPLIER
mov SIZEWORD ptr [T], 0
mov H, [N]
mov [T], X
mov SIZEWORD ptr [N], 0
mov [N + 1 * ADDRPLIER], N
mov X, [N + H]
cmp X, 0
je finish
mov X, [X]
jmp start
finish:
}
#endif
free(TransitionTape);
free(InputTape);
free(InputTapeLeft);
x86 Pre-Link Event command (safeseh likely only needed on the release build): ml.exe /safeseh /Fo "$(OutDir)tm.obj" /c tm.asm
x64 Pre-Link Event command: ml64.exe /DX64 /Fo "$(OutDir)tm.obj" /c tm.asm
x86/64 Linker Input Additional Dependency: $(OutDir)\tm.obj
x64 Post-Build Event for debug build only (C runtime issue for assembly linkage): copy "$(MSBuildProgramFiles32)\Microsoft SDKs\Windows Kits\10\ExtensionSDKs\Microsoft.UniversalCRT.Debug\10.0.14393.0\Redist\Debug\x64\ucrtbased.dll" "$(OutDir)" copy "$(MSBuildProgramFiles32)\Microsoft Visual Studio 14.0\VC\redist\debug_nonredist\x64\Microsoft.VC140.DebugCRT\vcruntime140d.dll" "$(OutDir)"
Assembly x64 prologue:
ifndef X64
.model flat, C
endif
option casemap :none
.code
ifndef X64
ADDRPLIER EQU 4
SIZEWORD TEXTEQU <dword>
X TEXTEQU <eax>
Y TEXTEQU <ebx>
M TEXTEQU <ecx>
Z EQU Y
D EQU Z
H EQU M
L TEXTEQU <edi>
R EQU L
T EQU R
S TEXTEQU <edx>
N TEXTEQU <esi>
MovTM PROC USES ebx edi esi TransitionTapePtr:DWORD, InputTape:DWORD, InputTapeLeft: DWORD,
scratch: DWORD, InputTapePtr: DWORD, InputTapeLeftPtr: DWORD
else
ADDRPLIER EQU 8
SIZEWORD TEXTEQU <qword>
X TEXTEQU <rax>
Y TEXTEQU <rbx>
M TEXTEQU <rdi>
Z EQU Y
D EQU Z
H EQU M
L TEXTEQU <r10>
R EQU L
T TEXTEQU <rcx>
S TEXTEQU <rdx>
N TEXTEQU <r8>
scratch TEXTEQU <r9>
TransitionTapePtr TextEQU <T>
InputTape TextEQU <S>
InputTapeLeft TextEQU <N>
MovTM PROC USES rbx rdi Dummy1: QWORD, Dummy2: QWORD, Dummy3: QWORD,
Dummy4: QWORD, InputTapePtr: QWORD, InputTapeLeftPtr: QWORD
endif
The code inside the __asm {}
block above should be put in between followed by this assembly x64 epilogue:
ret
MovTM ENDP
END
For the sake of simplicity of understanding the dual linked list stack and memory pointer transition table constructs, the following C pseudo code which introduces control flow simplification represents the general functionality:
ULONG_PTR CurInputTapePtr = (ULONG_PTR)InputTape;
TransitionTapePtr = (ULONG_PTR)TransitionTape;
InputTapePtr = (ULONG_PTR)InputTape + sizeof(ULONG_PTR) * 2;
InputTapeLeftPtr = (ULONG_PTR)InputTapeLeft;
do {
if (*(ULONG_PTR*)(*(ULONG_PTR*)TransitionTapePtr) == *(ULONG_PTR*)CurInputTapePtr) {
*(ULONG_PTR*)CurInputTapePtr = *(ULONG_PTR*)(*(ULONG_PTR*)
(*(ULONG_PTR*)TransitionTapePtr + sizeof(ULONG_PTR)));
if (*(ULONG_PTR*)(*(ULONG_PTR*)(*(ULONG_PTR*)(*(ULONG_PTR*)TransitionTapePtr +
sizeof(ULONG_PTR)) + sizeof(ULONG_PTR))) == 0) {
*(ULONG_PTR*)(CurInputTapePtr + sizeof(ULONG_PTR)) = InputTapePtr;
InputTapePtr = CurInputTapePtr;
CurInputTapePtr = InputTapeLeftPtr;
InputTapeLeftPtr = *(ULONG_PTR*)(CurInputTapePtr + sizeof(ULONG_PTR));
} else {
*(ULONG_PTR*)(CurInputTapePtr + sizeof(ULONG_PTR)) = InputTapeLeftPtr;
InputTapeLeftPtr = CurInputTapePtr;
CurInputTapePtr = InputTapePtr;
InputTapePtr = *(ULONG_PTR*)(CurInputTapePtr + sizeof(ULONG_PTR));
}
TransitionTapePtr = *(ULONG_PTR*)(*(ULONG_PTR*)(*(ULONG_PTR*)(*(ULONG_PTR*)
(*(ULONG_PTR*)TransitionTapePtr + sizeof(ULONG_PTR)) +
sizeof(ULONG_PTR)) + sizeof(ULONG_PTR)));
} else {
TransitionTapePtr = *(ULONG_PTR*)(TransitionTapePtr + sizeof(ULONG_PTR));
}
} while (TransitionTapePtr != (ULONG_PTR)InputTapeLeft);
Points of Interest
The data structure of the Turing machine encoding format which uses memory pointers and lists of state transitions within a given state was deduced, and from the typical encoding derived with some simple C code. The linked list format of the input tape and its left part were also deduced and simulated.
2 errors in the research paper were found, namely around here, we see R was improperly placed instead of L: mov [N], X ;; select new value for L \\ mov [N+1], R \\ mov L, [N + D]
and we see that 'H is 1, we must halt the machine' but then in fact the code selects to halt if H is 0 not 1. To correct it in the preceding code, we swap the 0 and 1, mov [N], 0 \\ mov [T], 1
Now if H is 0, we must halt the machine which is done with 2 cheating instructions, a compare and conditional jump simply for successful simulation in C with graceful termination.
Research papers can have small mistakes which one must carefully peruse and spend time fixing. We have tried to remain as exactly faithful to the pseudo code of the paper as possible while only adjusting for register additions, transition from C, and graceful termination which is noted with tabification.
The x86 limited number of registers requires looking for mutual exclusivity and saving or reloading them in a careful manner. The esp
and ebp
should not be used especially from C where recovering them or accessing the data would become more difficult, though with great effort and care, it is indeed possible. 64-bit code has access to the r8-r15 registers making it possible to have a register for every needed variable. The calling semantics are peculiar and must be very specially handled.
An example of an equivalent UTM in C, as well as a 64-bit conditionally compilable version make a more general and well understood implementation.
Conclusion is that indeed the research paper UTM which although more theoretical and not practically implemented as much as the result of the paper and the ability to use mov
instructions for obfuscation, is correct.
History
- Initial version
- x64 and external assembly file support