Linux SO File Injection

Asif Bahrainwala

4 Jun 2016 | CPOL | 4 min read

Inject a library file in a running process

Download SOFileInjection.zip - 7 KB

Introduction

This article introduces live patching of a running process on Linux. Readers will be able to inject an SO file (shared object) into a remote process running on Linux (x86-64 process, tested on Ubuntu 16.04, kernel 4.4.0-22-generic), provided they have the required access rights. Along the way, we revisit debugging on Linux and work around Linux's version of ASLR (https://en.wikipedia.org/wiki/Address_space_layout_randomization).

This is most useful when you don't have the source code but want the process to load your SO file anyway. In the constructor of a global object in the SO file, you can then perform all sorts of operations, including API hooking.
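The "constructor of a global object" idea can be illustrated with a minimal sketch of what a lib.cpp might contain. The names InjectionEntry, g_entry and injectionRan are hypothetical and not taken from the attached sample; the only assumption is that the dynamic loader runs constructors of global objects when the SO is loaded.

```cpp
#include <cstdio>

// Hypothetical stand-in for the article's lib.cpp; all names are illustrative.
static bool g_injected = false;

struct InjectionEntry {
    InjectionEntry() {
        // Runs when the dynamic loader finishes loading the SO (or before
        // main() if this file is linked into an ordinary program).
        g_injected = true;
        std::puts("lib.so: constructor of global object ran");
        // API hooks or other setup would be installed here.
    }
};

// The global object whose constructor does the work.
static InjectionEntry g_entry;

// Lets other code check that the constructor has already executed.
bool injectionRan() { return g_injected; }
```

Because the work happens in a constructor, the injector only has to get dlopen called in the target; it never needs to call a function inside the library itself.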

Background

I recommend that readers refer to my previous article: http://www.codeproject.com/Articles/1073879/Write-Your-Own-Linux-Debugger. On Windows, developers can use the CreateRemoteThread API, as described in http://www.codeproject.com/Articles/535677/Memory-Analyzer-x-bit-a-Free-Detour.

Unfortunately, Linux developers have no equally direct way to do this, which is where this article comes in :-). Since we are in effect writing a debugger of sorts (using ptrace), you could also achieve the same result with gdb.

Readers will need Qt Creator: sudo apt-get install -y qtcreator qt5-default

Using the Code

As with all my previous articles, keep the code at hand while reading.

We start by spawning the target process into which the SO file will be injected (as in the attached sample). The approach also works with the PID of an already running process. To spawn the target, fork followed by exec does the trick.

C++

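switch(rpid = fork())  //spawn a process
{
case 0://child process
{
    //execlp needs argv[0] and a terminating null pointer after the path
    execlp("../build-QTUI_App-Desktop-Debug/QTUI_App","QTUI_App",(char*)NULL);
    //execlp("../build-Test-Desktop-Debug/Test","Test",(char*)NULL);
    perror("execlp");exit(-1);   //reached only if execlp fails
}
case -1:
    printf("error spawning process\n");exit(-1);
    break;
    //parent continues execution
}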

The next step is to attach your process to the target process. Your process is responsible for manipulating the target so that it loads the required SO file.

ptrace is the single most crucial API that the OS provides to assist in debugging.

C++

int status=0;
if(ptrace(PTRACE_ATTACH,rpid,NULL,NULL)==-1)   //let's attach to the process
    printf("error %u\n",errno);                //errno is only meaningful on failure
pid_t tid=wait(&status);
ptrace(PTRACE_SETOPTIONS, tid, NULL, PTRACE_O_TRACEFORK |
       PTRACE_O_TRACEVFORK | PTRACE_O_TRACECLONE | PTRACE_O_TRACEEXIT);

Now comes the best part...

We have to manipulate the target process by injecting opcodes that trick it into loading our module (lib.so). The opcodes used here are for x86 (64-bit) only; I may port them to 32-bit x86.

When the debugger attaches to the process, a SIGSTOP is sent to the debuggee. When this happens, we get busy. We want the tracee to stop with a SIGTRAP, so that it is halted for good and we can query its register state, which is why we need the additional code below.

C++

ptrace(PTRACE_SINGLESTEP, tid, 0, 0);
tid=waitpid(-1, &status, __WALL);

This causes the processor to execute the next instruction and then raise SIGTRAP. This is what a debugger uses to single-step each instruction when debugging in disassembly view.
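The single-step behaviour described above can be observed in isolation. The following is a minimal sketch, not the article's code: it assumes a forked child that volunteers to be traced with PTRACE_TRACEME (instead of PTRACE_ATTACH), and the function name singleStepOnce is hypothetical.

```cpp
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns 1 if one single step ended in SIGTRAP, 0 if it ended some other
// way, and -1 if tracing could not be set up (e.g. ptrace is restricted).
int singleStepOnce() {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        // Child: ask to be traced, then stop so the parent can take over.
        if (ptrace(PTRACE_TRACEME, 0, nullptr, nullptr) == -1) _exit(2);
        raise(SIGSTOP);
        _exit(0);
    }
    int status = 0;
    // Wait for the SIGSTOP; if the child exited instead, tracing failed.
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status)) return -1;

    int result = -1;
    // Execute exactly one instruction; data == 0 suppresses the pending SIGSTOP.
    if (ptrace(PTRACE_SINGLESTEP, child, nullptr, nullptr) != -1 &&
        waitpid(child, &status, 0) == child) {
        result = (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP) ? 1 : 0;
    }
    kill(child, SIGKILL);          // the demo child is no longer needed
    waitpid(child, &status, 0);    // reap it
    return result;
}
```

Once the tracee is stopped with SIGTRAP like this, it is safe to read its registers and memory, which is the state the injector wants before it starts patching.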

On x86, calling ptrace(PTRACE_SINGLESTEP, ...) causes the trace flag to be set (https://en.wikipedia.org/wiki/FLAGS_register). While the trace flag is set, a debug exception is raised after every instruction executed. The processor clears the flag on entry to the exception handler (ISR), so the handler itself is not single-stepped.
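For reference, the trace flag (TF) is bit 8 of the x86 flags register, i.e. mask 0x100. The helpers below are a small illustrative sketch for working with a flags value such as uregs.regs.eflags; the function names are hypothetical, and no claim is made here about whether the kernel exposes a TF it set itself through PTRACE_GETREGS.

```cpp
#include <cstdint>

// TF is bit 8 of FLAGS/EFLAGS/RFLAGS.
constexpr std::uint64_t kTrapFlag = 1ull << 8;

// True if the trace flag is set in the given flags value.
constexpr bool trapFlagSet(std::uint64_t flags) {
    return (flags & kTrapFlag) != 0;
}

// Returns the flags value with TF switched on.
constexpr std::uint64_t withTrapFlag(std::uint64_t flags) {
    return flags | kTrapFlag;
}

// Returns the flags value with TF switched off.
constexpr std::uint64_t withoutTrapFlag(std::uint64_t flags) {
    return flags & ~kTrapFlag;
}
```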

We must save the state of the process before changing it:

The API process_vm_readv(rpid,&Originaliovec,1,&remote_iov,1,0) is used to copy the target process's memory so that it can be restored later. We must also save the state of the registers; ptrace comes to the rescue.

C++

ptrace(PTRACE_GETREGS,tid,NULL,&uregs);
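The memory backup mentioned above can be tried on its own. This is a minimal sketch under the assumption that the caller is allowed to read the given process (the same permission ptrace needs); the helper name snapshotMemory is hypothetical and is not part of the attached sample.

```cpp
#include <cstddef>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <vector>

// Copies len bytes starting at remoteAddr in process pid.
// Returns an empty vector if the kernel refuses or the read comes up short.
std::vector<unsigned char> snapshotMemory(pid_t pid, const void* remoteAddr,
                                          std::size_t len) {
    std::vector<unsigned char> backup(len);
    iovec local{backup.data(), len};                    // where the copy lands
    iovec remote{const_cast<void*>(remoteAddr), len};   // what to copy
    ssize_t got = process_vm_readv(pid, &local, 1, &remote, 1, 0);
    if (got != static_cast<ssize_t>(len)) backup.clear();
    return backup;
}
```

In the injector, remoteAddr would be the target's RIP (from PTRACE_GETREGS) and len the size of the opcode buffer, so the same bytes can be written back afterwards.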

Now that we have enough to restore the process to its original state, we can change values in the address space of the target process.

The WriteProcessMemory function in the sample does just this.

We move the required parameters into RDI and RSI and then call dlopen. All of this is done by injecting the required opcodes into the target process's address space. The function is commented with step numbers (1), (2), ...

ptrace(PTRACE_POKETEXT...) is used to write memory to the target process.
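PTRACE_POKETEXT stores one machine word (a long) per call rather than a single byte. One way to deal with that, sketched here as an alternative and not as the article's method, is to pack the payload into words first and fill the unused tail of the last word with the bytes already present in the target (for example, obtained with PTRACE_PEEKTEXT). The helper name packWords is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Packs bytes into long-sized words suitable for PTRACE_POKETEXT.
// tailOriginal is the word currently stored where the last (possibly partial)
// word will go; its bytes survive in the part the payload does not cover.
std::vector<long> packWords(const std::vector<unsigned char>& bytes,
                            long tailOriginal) {
    const std::size_t w = sizeof(long);
    std::vector<long> words;
    for (std::size_t off = 0; off < bytes.size(); off += w) {
        long word = tailOriginal;   // fully overwritten unless this is a partial word
        std::size_t n = std::min(w, bytes.size() - off);
        std::memcpy(&word, bytes.data() + off, n);
        words.push_back(word);
    }
    return words;
}
```

Each element words[i] would then be written with ptrace(PTRACE_POKETEXT, pid, addr + i * sizeof(long), words[i]), so nothing beyond the payload is disturbed.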

C++

void WriteProcessMemory(const unsigned int rpid,user uregs={})
{
    char *str = libName;
    memcpy(data_opcodes, str,strlen(str)+1);  //copied the name of the so

    unsigned char MovRaxtoRDI[] = { 0x48, 0x8B, 0xf8 };  //these are the opcodes for move RAX=>RDI
    unsigned char Mov1toRBX[] = { 0x48, 0xc7, 0xc3, 01, 0, 0, 0 };   //move 1=>RBX
    unsigned char MovRBXtoRSI[] = { 0x48, 0x8B, 0xF3 };              //move RBX=>RSI
    unsigned char CallRax[] = {0xff, 0xd0, 0xcc };  //Call RAX and then break (int 3)

    //combine all the opcodes
    unsigned char opcodes[50];

    //copy the address of the lib file to RAX, we are placing the lib.so file 
    //after all the opcodes (including the breakpoint)
    //so the flow is (intel assembly format):
    /*
     * mov rax,address of the so file       (1)
     * mov rdi,rax                          (2)
     * mov rbx,1                            (3)
     * mov rsi,rbx                          (4)
     * mov rax,function address of dlopen   (5)
     * call RAX                             (6)
     * breakpoint
     * .
     * .
     * /
     * /
     * l
     * i
     * b
     * .
     * s
     * o
     */

    /*(1)*/unsigned char MovtoRax[2 + 8] = { 0x48, 0xb8 };
    void *p = (void*)(uregs.regs.rip+sizeof(MovtoRax) + sizeof(MovRaxtoRDI) + 
              sizeof(Mov1toRBX) + sizeof(MovRBXtoRSI) + sizeof(MovtoRax) + sizeof(CallRax));
    memcpy(&MovtoRax[2], &p, 8);
    memcpy(opcodes, MovtoRax, sizeof(MovtoRax));  //move first parameter to RAX-> then to RDI

    /*(2)*/memcpy(opcodes + sizeof(MovtoRax), MovRaxtoRDI, sizeof(MovRaxtoRDI));
    /*(3)*/memcpy(opcodes + sizeof(MovtoRax) + sizeof(MovRaxtoRDI), 
          Mov1toRBX, sizeof(Mov1toRBX));  //move second parameter to RBX  -> then to RSI
    /*(4)*/memcpy(opcodes + sizeof(MovtoRax) + sizeof(MovRaxtoRDI) + 
          sizeof(Mov1toRBX), MovRBXtoRSI, sizeof(MovRBXtoRSI));

    /*(5)*/
    p = FindFuncAddr("libdl",(void*)dlopen,rpid); //find out where libdl is loaded in the remote process, 
                                           //this is randomly loaded for every process (thanks to ASLR)
    memcpy(&MovtoRax[2], &p, 8);  //move function address to RAX->call RAX (in this case dlopen)
    memcpy(opcodes + sizeof(MovtoRax) + sizeof(MovRaxtoRDI) + sizeof(Mov1toRBX) + 
                     sizeof(MovRBXtoRSI), MovtoRax, sizeof(MovtoRax));

    /*(6)*/memcpy(opcodes + sizeof(MovtoRax) + sizeof(MovRaxtoRDI) + 
            sizeof(Mov1toRBX) + sizeof(MovRBXtoRSI) + sizeof(MovtoRax), CallRax, sizeof(CallRax));
    memcpy(data_opcodes, opcodes, sizeof(MovtoRax) + sizeof(MovRaxtoRDI) + 
            sizeof(Mov1toRBX) + sizeof(MovRBXtoRSI) + sizeof(MovtoRax) + sizeof(CallRax));

    memcpy(data_opcodes+sizeof(MovtoRax) + sizeof(MovRaxtoRDI) + 
           sizeof(Mov1toRBX) + sizeof(MovRBXtoRSI) + sizeof(MovtoRax) + sizeof(CallRax),
           str,strlen(str)+1);

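    //now write these opcodes to the remote process.
    //note: PTRACE_POKETEXT stores a whole word per call; writing at every byte
    //offset still leaves each byte correct because the next call overwrites the rest
    for(int i=0;i<sizeof(data_opcodes);++i){
        ptrace(PTRACE_POKETEXT,rpid,uregs.regs.rip+i,data_opcodes[i]);
    }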
}
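The layout built by WriteProcessMemory can also be expressed more compactly. The sketch below assembles the same six steps into a byte vector; it is an illustrative rewrite, not the code from the download, and the names buildDlopenStub and appendImm64 are hypothetical. The second dlopen argument, 1, corresponds to RTLD_LAZY on glibc.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Appends a 64-bit immediate in host byte order (little-endian on x86-64).
static void appendImm64(std::vector<unsigned char>& out, std::uint64_t v) {
    unsigned char raw[8];
    std::memcpy(raw, &v, 8);
    out.insert(out.end(), raw, raw + 8);
}

// loadAddr: address in the target where the stub will be written (its RIP).
// dlopenAddr: address of dlopen in the target. soPath: library to load.
std::vector<unsigned char> buildDlopenStub(std::uint64_t loadAddr,
                                           std::uint64_t dlopenAddr,
                                           const char* soPath) {
    const std::size_t codeLen = 10 + 3 + 7 + 3 + 10 + 3;   // 36 bytes of code
    std::vector<unsigned char> out;
    out.insert(out.end(), {0x48, 0xb8});                    // (1) mov rax, imm64
    appendImm64(out, loadAddr + codeLen);                   //     = address of the path string
    out.insert(out.end(), {0x48, 0x8b, 0xf8});              // (2) mov rdi, rax
    out.insert(out.end(), {0x48, 0xc7, 0xc3, 1, 0, 0, 0});  // (3) mov rbx, 1
    out.insert(out.end(), {0x48, 0x8b, 0xf3});              // (4) mov rsi, rbx
    out.insert(out.end(), {0x48, 0xb8});                    // (5) mov rax, imm64
    appendImm64(out, dlopenAddr);                           //     = address of dlopen
    out.insert(out.end(), {0xff, 0xd0, 0xcc});              // (6) call rax; int 3
    out.insert(out.end(), soPath, soPath + std::strlen(soPath) + 1);  // path + NUL
    return out;
}
```

Keeping the path string directly after the int 3 byte is what allows step (1) to compute its address as the load address plus the fixed code length.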

As mentioned earlier, to call dlopen we need to know its address in the target process. This is done by reading the file /proc/<PID>/maps (since you have access to the target process, you can open it). From there, we get the load address of the module that holds the function dlopen. To get the function's address, note that you know the function's address in your own process, and hence its offset from the base address of the same module loaded in your process. Apply the same offset to the module's base address in the remote process.

The function below finds where libdl is loaded.

C++

void *FindSoAddress(const char *strLibName,pid_t pid)

The function below finds the address of dlopen in the target process.

C++

void *FindFuncAddr(const char *strLibName,const void *pLocalFuncAddr,pid_t pid)
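The bodies of these two functions are in the download. As a rough sketch of the same idea, and not the author's implementation, the helpers below parse the text of a maps file for a module's lowest mapping and then apply the offset arithmetic described above. The names findModuleBase and remoteFunctionAddress are hypothetical. Note also that on glibc 2.34 and later dlopen is provided by libc itself, so the module name to search for may need to be "libc" rather than "libdl".

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// maps: the contents of /proc/<PID>/maps. Returns the lowest start address of
// any mapping whose line mentions libName, or 0 if there is none.
std::uint64_t findModuleBase(std::istream& maps, const std::string& libName) {
    std::string line;
    std::uint64_t best = 0;
    while (std::getline(maps, line)) {
        if (line.find(libName) == std::string::npos) continue;
        // Every maps line begins with "start-end" in hex; stoull stops at '-'.
        std::uint64_t start = std::stoull(line, nullptr, 16);
        if (best == 0 || start < best) best = start;
    }
    return best;
}

// Same offset from the module base in both processes, different base (ASLR).
std::uint64_t remoteFunctionAddress(std::uint64_t localFunc,
                                    std::uint64_t localBase,
                                    std::uint64_t remoteBase) {
    return remoteBase + (localFunc - localBase);
}
```

In real use the stream would be a std::ifstream opened on "/proc/" + std::to_string(pid) + "/maps", once for the injector's own PID and once for the target's.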

Now that the opcodes to load the SO file are in place (along with the desired breakpoint after call RAX), let the target process continue.

C++

ptrace(PTRACE_CONT, tid, NULL,0);

This causes the target process to load the SO file and then break (with int 3: 0xcc); see unsigned char CallRax[] = {0xff, 0xd0, 0xcc }; //Call RAX and then break (int 3) in the function WriteProcessMemory.

Once this is done, commence restoration and then exit your process.

C++

if(siginfo.si_signo==SIGTRAP && bFirst)  //SIGTRAP (5) is raised by the injected int 3
{
    bFirst=false;
    RestoreMemory(rpid,originalRegs);
    ptrace(PTRACE_DETACH,tid,0,0);
    exit(0);//your work is done
}

RestoreMemory uses ptrace(PTRACE_POKETEXT...) to write the saved bytes back to the target process and ptrace(PTRACE_SETREGS,rpid,NULL,&originalRegs) to restore the original register context. The injector then detaches and exits.

Don't forget to build the lib.so file: gcc lib.cpp -shared -fpic -o lib.so

Points of Interest

Armed with this, readers can now implement an equivalent of CreateRemoteThread on Linux systems, as well as API hooks for remote processes.

History

4th June, 2016: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)