
Code Injection into Running Linux Application

Gregory Shpitalnik


12 Feb 2009

How to inject code into a running Linux application

Introduction

Suppose your program runs on Linux and is not supposed to terminate for a long time, like a UNIX daemon. You want to upgrade it in some simple way, but without stopping it. One idea is to upgrade a known function in the program so that it does some additional work, without compromising the function's usual behavior and without terminating the program. To do that, you inject new code into the running program so that it is triggered whenever an existing function is called. The example may be somewhat contrived, but it demonstrates why it is sometimes necessary to inject code into a running program. The same technique is also relevant to how viruses inject themselves into running code.

In this article, I'll explain how to inject a C function into a running program on Linux without terminating it. Along the way we'll talk about the Executable and Linkable Format (ELF) used by Linux object files, and about object file sections, symbols and relocations.

Working Example Overview

I will explain step by step the code injection technique using the following simple example. The example consists of 3 components:

Dynamic (shared) library libdynlib.so that is built from dynlib.hpp and dynlib.cpp C++ source files.
Application app that is built from app.cpp source file and is linked with libdynlib.so library.
The injection function located in injection.cpp file.

Let us review the components code.

// dynlib.hpp

extern "C" void print();

The dynlib.hpp header declares the print() function.

// dynlib.cpp

#include <stdlib.h>
#include <unistd.h>   // getpid()
#include <iostream>
#include "dynlib.hpp"

using namespace std;


extern "C" void print()
{
    static unsigned int counter = 0;
    ++counter;

    cout << counter << ": PID " << getpid() << ": In print() " << endl;
}

The dynlib.cpp implements the print() function that just prints a counter (that is incremented at every function call), the program process id and a message.

// app.cpp

#include <unistd.h>   // sleep()
#include <iostream>
#include "dynlib.hpp"

using namespace std;


int main()
{
    while (1)
    {
        print();
        cout << "Going to sleep ..." << endl;
        sleep(3);
        cout << "Waked up ..." << endl;
    }

    return 0;
}

The application app.cpp calls the print() function (from the libdynlib.so dynamic library), sleeps for a few seconds, and repeats this in an infinite loop.

// injection.cpp

#include <stdlib.h>   // system()

extern "C" void print();

extern "C" void injection()
{
    print(); // do the original job, call the function print()
    system("date"); // do some additional job
}

The injection() function call is going to replace the print() function call in the application's main() function. The injection() function first calls the original print() function and then does some additional work. For example, it can run an external executable using the system() function call, or just print the current date as I do here.

Compile and Run the Application

Let us first build the components with the g++ and gcc compiler drivers.

g++ -ggdb -Wall dynlib.cpp -fPIC -shared -o libdynlib.so
g++ -ggdb app.cpp -ldynlib -L./ -o app
gcc  -Wall injection.cpp -c -o injection.o

-rwxr-xr-x  1 gregory ftp  52248 Feb 12 02:05 app
-rw-r--r--  1 gregory ftp   1088 Feb 12 02:05 injection.o
-rwxr-xr-x  1 gregory ftp  52505 Feb 12 02:05 libdynlib.so

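Note that the dynamic library libdynlib.so is compiled and linked with the -fPIC flag, which produces position-independent code, while the injection object is compiled with the gcc driver and without -fPIC. Because the source file has a .cpp extension, gcc still compiles it as C++, which is why the __gxx_personality_v0 symbol shows up in injection.o later. We can now run the app executable.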

[lnx63:code_injection] ==> ./app
1: PID 4184: In print()
Going to sleep ...
Waked up ...
2: PID 4184: In print()
Going to sleep ...
Waked up ...
3: PID 4184: In print()
Going to sleep ...

Getting into Debugger

The application has gone through only a few loop iterations, but let us pretend that it has already been running for a few weeks, so it is now time to inject our new code without terminating the application. We'll use the gdb debugger for the injection. First we need to attach gdb to the application process 4184 (see the PID printed above).

[lnx63:code_injection] ==> gdb app 4184
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Attaching to program: /store/fileril104/project/gregory/code_injection/app, process 4184
Reading symbols from /store/fileril104/project/gregory/code_injection/libdynlib.so...done.
Loaded symbols for /store/fileril104/project/gregory/code_injection/libdynlib.so
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x006e17a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb)

Loading the Injection Code into the Executable Process Memory

The injection.o object file is not part of the app executable's process image. We first need to load injection.o into the process address space. This can be done with the mmap() system call, which maps the injection.o file into the app process address space. Let us do it from the debugger.

(gdb) call open("injection.o", 2)
$1 = 3
(gdb) call mmap(0, 1088, 1 | 2 | 4, 1, 3, 0)
$2 = 1073754112
(gdb)

We first open the injection.o file with the O_RDWR flag (value 2) for reading and writing. We need write access because later we'll modify the loaded injection code. The file descriptor returned for the opened file is 3. Then we bring the file into the process address space with the mmap() call. Its arguments are an address hint of 0, the file size (1088 bytes), the mapping protection PROT_READ | PROT_WRITE | PROT_EXEC (1 | 2 | 4, for reading, writing and executing), the MAP_SHARED flag (value 1), the open file descriptor (3) and a file offset of 0. It returns the starting address of the mapped file within the process address space: 1073754112. We can verify that injection.o was indeed mapped by looking at /proc/[pid]/maps (where pid is the process id, 4184 in our example), the file that on Linux describes the memory layout of a running process.

[lnx63:code_injection] ==> cat /proc/4184/maps
006e1000-006f6000 r-xp 00000000 fd:00 394811     /lib/ld-2.3.4.so
006f6000-006f7000 r-xp 00015000 fd:00 394811     /lib/ld-2.3.4.so
006f7000-006f8000 rwxp 00016000 fd:00 394811     /lib/ld-2.3.4.so
006ff000-00824000 r-xp 00000000 fd:00 394812     /lib/tls/libc-2.3.4.so
00824000-00825000 r-xp 00124000 fd:00 394812     /lib/tls/libc-2.3.4.so
00825000-00828000 rwxp 00125000 fd:00 394812     /lib/tls/libc-2.3.4.so
00828000-0082a000 rwxp 00828000 00:00 0
00832000-00853000 r-xp 00000000 fd:00 394813     /lib/tls/libm-2.3.4.so
00853000-00855000 rwxp 00020000 fd:00 394813     /lib/tls/libm-2.3.4.so
0096e000-00975000 r-xp 00000000 fd:00 394816     /lib/libgcc_s-3.4.6-20060404.so.1
00975000-00976000 rwxp 00007000 fd:00 394816     /lib/libgcc_s-3.4.6-20060404.so.1
00978000-00a38000 r-xp 00000000 fd:00 45535      /usr/lib/libstdc++.so.6.0.3
00a38000-00a3d000 rwxp 000bf000 fd:00 45535      /usr/lib/libstdc++.so.6.0.3
00a3d000-00a43000 rwxp 00a3d000 00:00 0
08048000-08049000 r-xp 00000000 00:34 30468731   /store/fileril104/project/gregory/code_injection/app
08049000-0804a000 rwxp 00000000 00:34 30468731   /store/fileril104/project/gregory/code_injection/app
0804a000-0806b000 rwxp 0804a000 00:00 0
40000000-40001000 r-xp 00000000 00:34 30468725   /store/fileril104/project/gregory/code_injection/libdynlib.so
40001000-40002000 rwxp 00000000 00:34 30468725   /store/fileril104/project/gregory/code_injection/libdynlib.so
40002000-40003000 rwxp 40002000 00:00 0
40003000-40004000 rwxs 00000000 00:34 30468724   /store/fileril104/project/gregory/code_injection/injection.o
4000f000-40011000 rwxp 4000f000 00:00 0
bfffe000-c0000000 rwxp bfffe000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0

You can verify that /store/fileril104/project/gregory/code_injection/injection.o starts at address 0x40003000 (decimal 1073754112) and ends at address 0x40004000 within the process address space. The mappings of the other dynamic libraries are also shown in the above output. We now have all the components loaded in the process memory.
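For readers who prefer to see the two gdb calls as ordinary code, here is a minimal C++ sketch of the same open() and mmap() sequence. The Mapping struct and the map_object_file() helper are hypothetical names introduced only for this illustration, and the sketch queries the file size with fstat() instead of hard-coding 1088. The numeric values in the comments are the Linux values of the constants, which is why the raw numbers work in the gdb session.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Result of mapping a file: start address and length, or {nullptr, 0} on failure.
struct Mapping {
    void*       addr;
    std::size_t length;
};

// Maps the whole file read/write and shared, mirroring the gdb calls
// open(path, 2) and mmap(0, size, prot, 1, fd, 0).
// 'prot' defaults to PROT_READ | PROT_WRITE | PROT_EXEC (1 | 2 | 4).
Mapping map_object_file(const char* path,
                        int prot = PROT_READ | PROT_WRITE | PROT_EXEC) {
    assert(path != nullptr);

    int fd = open(path, O_RDWR);               // O_RDWR == 2 on Linux
    if (fd < 0) return {nullptr, 0};

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return {nullptr, 0};
    }

    std::size_t length = static_cast<std::size_t>(st.st_size);
    void* addr = mmap(nullptr, length, prot,
                      MAP_SHARED, fd, 0);      // MAP_SHARED == 1 on Linux
    close(fd);                                 // the mapping outlives the descriptor
    if (addr == MAP_FAILED) return {nullptr, 0};
    return {addr, length};
}
```

Because the mapping is MAP_SHARED, anything written through the returned address is also written back to the file on disk, so the patches applied later would end up in injection.o itself.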

Relocations

Now it's time to inspect the application's ELF executable from the inside. We'll use the readelf utility, which displays information from ELF files (that is, any object, library or executable file on Linux). Let us look at the symbol relocations in the app executable. We are interested in the relocation for the print() function call.

[lnx63:code_injection] ==> readelf -r app

Relocation section '.rel.dyn' at offset 0x5ec contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
08049d58  00001706 R_386_GLOB_DAT    00000000   __gmon_start__
08049d60  00000305 R_386_COPY        08049d60   _ZSt4cout

Relocation section '.rel.plt' at offset 0x5fc contains 13 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
08049d24  00000107 R_386_JUMP_SLOT   0804868c   print
08049d28  00000207 R_386_JUMP_SLOT   0804869c   _ZNSt8ios_base4InitC1E
08049d2c  00000507 R_386_JUMP_SLOT   080486ac   _ZStlsISt11char_traits
08049d30  00000607 R_386_JUMP_SLOT   080486bc   _ZNSolsEPFRSoS_E
08049d34  00000707 R_386_JUMP_SLOT   08048664   _init
08049d38  00000807 R_386_JUMP_SLOT   080486dc   sleep
08049d3c  00000907 R_386_JUMP_SLOT   080486ec   _ZNKSsixEj
08049d40  00000b07 R_386_JUMP_SLOT   080486fc   _ZNKSs4sizeEv
08049d44  00000c07 R_386_JUMP_SLOT   0804870c   __libc_start_main
08049d48  00000d07 R_386_JUMP_SLOT   08048ae4   _fini
08049d4c  00001307 R_386_JUMP_SLOT   0804872c   _ZSt4endlIcSt11char_tr
08049d50  00001507 R_386_JUMP_SLOT   0804873c   __gxx_personality_v0
08049d54  00001607 R_386_JUMP_SLOT   0804874c   _ZNSt8ios_base4InitD1E

As you can see, the print symbol relocation is located at the absolute (virtual) address 0x08049d24 (the Offset column) in the app executable, and its type is R_386_JUMP_SLOT. The relocation address is an absolute virtual address in the executable after it has been loaded into memory. Note that this relocation resides in the .rel.plt section of the executable binary image. PLT stands for Procedure Linkage Table, a table that provides indirect function calls. When you call such a function you do not jump directly to its location; you first jump to an entry in the PLT, and from there to the actual function code.

This indirection is necessary when you call a function that resides in a dynamic library (libdynlib.so in our example), because you do not know in advance at what address in the process address space the dynamic libraries will be loaded, or in which dynamic library the required function (print() in our example) will be found first. This knowledge is available only when the application is loaded into memory, and at that time it is the job of the dynamic linker (ld-linux.so on Linux) to resolve relocations so that the requested function is called correctly. In our example the dynamic linker loads the libdynlib.so library into the process address space, finds the address of the print() function in the library and stores this address at the relocation address 0x08049d24.

Our goal is to replace the address of the print() function with the address of function injection() from the injection.o object file that was not initially included in the executable process image when it started running.
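The effect of overwriting a jump slot can be modelled in plain C++ with a function pointer. This is only an analogy under my own assumptions, not the real PLT machinery: original_print(), injected(), print_slot and call_print() are hypothetical names, and the counters stand in for the visible output of the real functions.

```cpp
#include <cassert>

// Hypothetical stand-ins for the article's print() and injection().
static int print_calls = 0;
static int extra_calls = 0;

void original_print() { ++print_calls; }

// Models one jump slot: a writable pointer that every call goes through.
using JumpSlot = void (*)();
JumpSlot print_slot = original_print;

// Replacement that still does the original job, then something extra.
// It calls original_print() directly, not through the slot, so it does
// not call itself once the slot points at it.
void injected() {
    original_print();
    ++extra_calls;
}

// What a call site effectively does when it calls print() through the PLT.
void call_print() {
    assert(print_slot != nullptr);
    print_slot();
}
```

Assigning `print_slot = injected;` plays the role of writing a new address to 0x08049d24: the call site is unchanged, yet the next call_print() runs the replacement.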

More information on the ELF format, relocations and the dynamic linker can be found in the Executable and Linkable Format (ELF) specification.

We can check that the address 0x08049d24 currently contains the address of the print() function.

(gdb) p & print
$4 = (void (*)(void)) 0x40000be8 <print>
(gdb) p/x * 0x08049d24
$5 = 0x40000be8
(gdb)

The address of the injection() function can be found by running readelf -s (displays object file symbol table) on the injection.o file.

[lnx63:code_injection] ==> readelf -s injection.o

Symbol table '.symtab' contains 13 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS injection.cpp
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000000     0 SECTION LOCAL  DEFAULT    6
     7: 00000000     0 SECTION LOCAL  DEFAULT    8
     8: 00000000     0 SECTION LOCAL  DEFAULT    9
     9: 00000000    25 FUNC    GLOBAL DEFAULT    1 injection
    10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND system
    11: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND print
    12: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND __gxx_personality_v0

The function (symbol) injection is located at offset 0 within the .text section of the injection.o object file (section index 1 in the Ndx column). The .text section itself, however, starts at offset 0x000034 in the injection.o file.

[lnx63:code_injection] ==> readelf -S injection.o
There are 13 section headers, starting at offset 0x104:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000019 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 000418 000018 08     11   1  4
  [ 3] .data             PROGBITS        00000000 000050 000000 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 000050 000000 00  WA  0   0  4
  [ 5] .rodata           PROGBITS        00000000 000050 000005 00   A  0   0  1
  [ 6] .eh_frame         PROGBITS        00000000 000058 000038 00   A  0   0  4
  [ 7] .rel.eh_frame     REL             00000000 000430 000010 08     11   6  4
  [ 8] .note.GNU-stack   NOTE            00000000 000090 000000 00      0   0  1
  [ 9] .comment          PROGBITS        00000000 000090 000012 00      0   0  1
  [10] .shstrtab         STRTAB          00000000 0000a2 00005f 00      0   0  1
  [11] .symtab           SYMTAB          00000000 00030c 0000d0 10     12   9  4
  [12] .strtab           STRTAB          00000000 0003dc 00003b 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
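The .text offset that readelf -S reports can also be read from the section headers of the mapped file. The following is a minimal sketch, assuming a Linux system that provides <elf.h>, a 32-bit ELF image whose byte order matches the host, and a complete, well-formed file (bounds checks are omitted). find_section_offset() is a hypothetical helper written for this illustration, not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <elf.h>

// Returns the file offset (sh_offset) of the named section in a 32-bit ELF
// image held in memory, or -1 if the image is not ELF32 or has no such section.
long find_section_offset(const unsigned char* image, const char* name) {
    assert(image != nullptr && name != nullptr);

    Elf32_Ehdr eh;
    std::memcpy(&eh, image, sizeof eh);
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32) {
        return -1;
    }

    // Header of the section that stores the section names (.shstrtab).
    Elf32_Shdr strtab;
    std::memcpy(&strtab,
                image + eh.e_shoff + eh.e_shstrndx * eh.e_shentsize,
                sizeof strtab);
    const char* names = reinterpret_cast<const char*>(image + strtab.sh_offset);

    // Walk all section headers and compare each name with the requested one.
    for (unsigned i = 0; i < eh.e_shnum; ++i) {
        Elf32_Shdr sh;
        std::memcpy(&sh, image + eh.e_shoff + i * eh.e_shentsize, sizeof sh);
        if (std::strcmp(names + sh.sh_name, name) == 0) {
            return static_cast<long>(sh.sh_offset);
        }
    }
    return -1;
}
```

Called on the mapped injection.o with the name ".text", this should return the same value as the Off column above, 0x34, provided the assumptions hold.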

Replacing the print() Function with injection() Function

Recall that the injection.o file was loaded into the process memory at address 0x40003000 (see above). So the final absolute address of the injection() function within the process is 0x40003000 + 0x000034 = 0x40003034.

We now set this address into the print() function relocation address 0x08049d24.

(gdb) set * 0x08049d24 = 0x40003000 + 0x000034
(gdb)

At this point, we have successfully replaced the call to print() with a call to the injection() function.

Resolving injection() Function Relocations

However, there is still some work to do. The code of the injection() function is not ready to run yet because it has 3 unresolved relocations.

[lnx63:code_injection] ==> readelf -r injection.o

Relocation section '.rel.text' at offset 0x418 contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000009  00000501 R_386_32          00000000   .rodata
0000000e  00000a02 R_386_PC32        00000000   system
00000013  00000b02 R_386_PC32        00000000   print

Relocation section '.rel.eh_frame' at offset 0x430 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000011  00000c01 R_386_32          00000000   __gxx_personality_v0
00000024  00000201 R_386_32          00000000   .text

The first relocation, .rodata, points to the "date" constant string stored in the .rodata read-only data section; the second, system, refers to the system() function call; and the third, print, refers to the print() function call. Note that all three relocations reside in the .rel.text section, that is, their offsets are relative to the beginning of the .text section.
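The Info column in the readelf -r output packs two values into one 32-bit word: in ELF32, the upper 24 bits are the symbol table index and the low 8 bits are the relocation type. A small sketch of the decoding follows; the function and constant names are mine, while the type numbers are the i386 values of R_386_32, R_386_PC32 and R_386_JUMP_SLOT.

```cpp
#include <cstdint>

// ELF32 packs the symbol index and the relocation type into one Info word.
constexpr std::uint32_t rel_symbol_index(std::uint32_t info) { return info >> 8; }
constexpr std::uint32_t rel_type(std::uint32_t info)         { return info & 0xffu; }

// i386 relocation type numbers that appear in this article.
constexpr std::uint32_t kR386_32       = 1;
constexpr std::uint32_t kR386_PC32     = 2;
constexpr std::uint32_t kR386_JumpSlot = 7;

// Info values copied from the readelf -r listings of injection.o and app.
static_assert(rel_symbol_index(0x00000501) == 5 &&
              rel_type(0x00000501) == kR386_32, ".rodata");
static_assert(rel_symbol_index(0x00000a02) == 10 &&
              rel_type(0x00000a02) == kR386_PC32, "system");
static_assert(rel_symbol_index(0x00000b02) == 11 &&
              rel_type(0x00000b02) == kR386_PC32, "print in injection.o");
static_assert(rel_symbol_index(0x00000107) == 1 &&
              rel_type(0x00000107) == kR386_JumpSlot, "print in app");
```

The decoded indices 5, 10 and 11 match the Num column of the readelf -s listing for injection.o: the .rodata section symbol, system and print.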

We resolve all the above three relocations manually and set appropriate addresses to these three memory locations. The addresses of these relocations within the executable process address space are calculated by summing up:

The injection.o starting address (0x40003000) within the process address space.
The .text section starting offset 0x000034 within the injection.o object file.
The relocation offset relative to the .text section (0x00000009 for .rodata, 0x0000000e for system and 0x00000013 for print).
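Worked out for the three relocations, the sum above gives the following addresses. The sketch only restates the arithmetic with the numbers of this particular run; the constant and function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint32_t kMapBase    = 0x40003000;  // where injection.o was mapped
constexpr std::uint32_t kTextOffset = 0x000034;    // .text offset inside injection.o

// Address, inside the process, of the 4-byte field that a relocation patches.
constexpr std::uint32_t relocation_address(std::uint32_t rel_offset) {
    return kMapBase + kTextOffset + rel_offset;
}

static_assert(relocation_address(0x09) == 0x4000303d, ".rodata");
static_assert(relocation_address(0x0e) == 0x40003042, "system");
static_assert(relocation_address(0x13) == 0x40003047, "print");
```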

Note that the system and print relocations are of type R_386_PC32. This means that the value (resolved address) to be stored at the relocation location is calculated relative to the program counter (PC), that is, relative to the relocation location itself. R_386_PC32 also requires that the value stored at the relocation location before resolution (the addend) be added to the resolved address. The R_386_32 .rodata relocation likewise adds the addend to its resolved address, but is not PC-relative.

(gdb) p & system
$7 = (<text> *) 0x733650 <system>  // Address of the system() function
(gdb) p * (0x40003000 + 0x000034 + 0x0000000e)
$8 = -4                              // Addend of the system relocation
(gdb) set * (0x40003000 + 0x000034 + 0x0000000e) = 0x733650 - (0x40003000 + 0x000034 + 0x0000000e) - 4
(gdb) p & print
$9 = (void (*)(void)) 0x40000be8 <print>    // Address of the print() function
(gdb) p * (0x40003000 + 0x000034 + 0x00000013)
$10 = -4                             // Addend of the print relocation
(gdb) set * (0x40003000 + 0x000034 + 0x00000013) = 0x40000be8 - (0x40003000 + 0x000034 + 0x00000013) - 4
(gdb) p * (0x40003000 + 0x000034 + 0x00000009)
$11 = 0                              // Addend of the .rodata relocation
(gdb) set * (0x40003000 + 0x000034 + 0x00000009) = 0x40003000 + 0x000050  // 0x000050 is the offset of the .rodata section within injection.o
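The manual patching in the gdb session follows two formulas: R_386_32 stores S + A and R_386_PC32 stores S + A - P, where S is the symbol address, A the addend already stored in the field and P the address of the field itself. Below is a minimal sketch of both, operating on a local copy of the 4-byte field; the helper names are hypothetical and the arithmetic wraps modulo 2^32, as it does on i386.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read and write a 32-bit value without assuming the pointer is aligned.
std::uint32_t read_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

void write_u32(unsigned char* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof v);
}

// R_386_32: S + A. The addend A is whatever the field currently holds.
void apply_r386_32(unsigned char* field, std::uint32_t symbol) {
    assert(field != nullptr);
    write_u32(field, symbol + read_u32(field));
}

// R_386_PC32: S + A - P. 'field_address' is the address P that the field
// has inside the target process, which may differ from the 'field' pointer
// when the bytes are patched in a local copy.
void apply_r386_pc32(unsigned char* field, std::uint32_t field_address,
                     std::uint32_t symbol) {
    assert(field != nullptr);
    write_u32(field, symbol + read_u32(field) - field_address);
}
```

With the values from the session (S = 0x733650, A = -4, P = 0x40003042), the system field becomes 0xc073060a. The CPU adds this displacement to the address of the next instruction, P + 4, which wraps around to 0x733650, the address of system(). That is also why the compiler left an addend of -4 in the field.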

We have now resolved all three relocations within the injection() function code, and we are done. We exit the debugger; the application continues running and now does the additional job of printing the current date.

(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /store/fileril104/project/gregory/code_injection/app, process 4184
[lnx63:code_injection] ==>

// The application execution continues

Waked up ...
Thu Feb 12 20:09:40 IST 2009
4: PID 4184: In print()
Going to sleep ...
Waked up ...
Thu Feb 12 20:09:43 IST 2009
5: PID 4184: In print()
Going to sleep ...
Waked up ...
Thu Feb 12 20:09:46 IST 2009
6: PID 4184: In print()
Going to sleep ...
Waked up ...
Thu Feb 12 20:09:49 IST 2009
7: PID 4184: In print()
Going to sleep ...
Waked up ...

That's it.

Conclusion

I showed how one can inject a C function into a running program on Linux without terminating it. Note that the process memory manipulations demonstrated here are allowed only for processes that you own or for which you have appropriate permissions.

History

12th February, 2009: Initial post

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
