Motivation
Recently I had a nasty worm/rootkit problem, and naturally I wanted to know what it had changed in my system. So I started looking for a tool to detect registry changes: some simple tool that dumps the complete registry content to a text file before and after the infection, so that a plain text diff shows the changes quickly. I was not very lucky, though. All the registry tools I found used the Win32 API to get their data, and that clever rootkit redirected those calls to itself and thus stayed hidden. As I found out later, malware doesn't even need to be that clever to hide things in the registry from the standard API.
So now I had the clean physical registry files from a system restore point and the dirty ones from my infected system. I didn't stop poking around in the hives until I came up with a simple tool to dump and compare their real contents in a simple text format. I also needed the full registry path on each entry, so that when I run a text diff on those dumps I can see where the change happened.
Hive format
NT/XP registry files (the binary hives, not the textual .reg files) are actually very simple. They are just a bunch of 4 KB blocks, where each block contains variable-sized blocks. Each of those starts with the usual 4-byte size and a 2-byte type.
And that's about it. That's the MS registry hive format. Oh, and I nearly forgot: the first 4 KB block of the file is the hive header (the code in this article skips 0x1000 bytes to get past it), and it holds nothing this dumper needs, as far as I know.
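To make the layout above concrete, here is a minimal sketch in C that walks the blocks of a hive already loaded into memory and counts those of a given type. Several details are assumptions that go beyond the description above: that the hive header takes 0x1000 bytes (the dumper code in this article uses the same constant), that every 4 KB block starts with a 0x20-byte header carrying an "hbin" signature and the block's size at offset 8, and that a negative size marks an inner block that is in use. The names count_cells and rd32 are hypothetical helpers, not part of the dumper.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Read a little-endian 32-bit signed value without alignment assumptions. */
static int32_t rd32(const unsigned char *p) {
    return (int32_t)((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                     (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

/* Count the in-use inner blocks whose 2-byte type equals `type` ("nk", "vk").
 * Assumed layout: 0x1000-byte hive header, then 4 KB blocks that each start
 * with a 0x20-byte header ("hbin" signature, block size at offset 8). */
size_t count_cells(const unsigned char *hive, size_t len, const char *type) {
    size_t found = 0, bin = 0x1000;
    while (bin + 0x20 <= len && memcmp(hive + bin, "hbin", 4) == 0) {
        int32_t bin_size = rd32(hive + bin + 8);
        if (bin_size < 0x20 || (size_t)bin_size > len - bin) break;
        size_t cell = bin + 0x20, end = bin + (size_t)bin_size;
        while (cell + 6 <= end) {
            int32_t raw = rd32(hive + cell);
            /* Assumption: a negative size marks a block that is in use. */
            size_t size = (size_t)(raw < 0 ? -(int64_t)raw : (int64_t)raw);
            if (size < 4 || size > end - cell) break;  /* corrupt size: give up on this 4 KB block */
            if (raw < 0 && size >= 6 && memcmp(hive + cell + 4, type, 2) == 0)
                found++;
            cell += size;
        }
        bin = end;
    }
    return found;
}
```

For example, count_cells(hive, len, "nk") gives the number of key blocks in use and count_cells(hive, len, "vk") the number of value blocks, which is a quick sanity check before trusting a full dump.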
Now, what's inside those variable-sized blocks?
The simplest way to describe the registry is to think of it as a file system where keys are directories and values are files. As we already know, both directories and files have names, but only a file can also contain data.
So there are two basic blocks, one for keys and one for values. What's nice is that MS decided to use human-readable two-character strings in the block type field I mentioned earlier, so if you open a hive in a hex viewer you can clearly see "nk" for a key block and "vk" for a value block.
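As a small illustration of the type field, the sketch below names a block from its two type characters. Only "nk" and "vk" come from the description above. The "lf", "lh", "li" and "ri" types are the subkey lists that the dumper code recognizes by their second letter; writing them out in full here is an assumption, and block_kind is a hypothetical helper name.

```c
#include <string.h>

/* Name the kind of block from the 2-character type stored right after the
 * 4-byte size. `block` points at the start of the block (its size field). */
const char *block_kind(const unsigned char *block) {
    const char *type = (const char *)block + 4;
    if (memcmp(type, "nk", 2) == 0) return "key";
    if (memcmp(type, "vk", 2) == 0) return "value";
    /* Assumed full names of the subkey list types. */
    if (memcmp(type, "lf", 2) == 0 || memcmp(type, "lh", 2) == 0 ||
        memcmp(type, "li", 2) == 0 || memcmp(type, "ri", 2) == 0)
        return "subkey list";
    return "other";
}
```

In the file system picture, "key" blocks play the role of directories, "value" blocks the role of files, and the subkey lists are the directory listings that tie them together.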
Using the code
And here is the actual code to dump registry hives. I used portable C code, so it should compile on Unix too without much change.
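#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-width types keep the on-disk layout intact even where long is
   8 bytes (for example on 64-bit Unix). */

/* Subkey list block. */
struct offsets {
    int32_t block_size;
    char    block_type[2];
    int16_t count;          /* number of entries in the list */
    int32_t first;          /* first entry; the others follow in memory */
    int32_t hash;
};

/* Key block ("nk"). */
struct key_block {
    int32_t block_size;
    char    block_type[2];
    char    dummya[18];
    int32_t subkey_count;
    char    dummyb[4];
    int32_t subkeys;        /* offset of the subkey list block */
    char    dummyc[4];
    int32_t value_count;
    int32_t offsets;        /* offset of the block listing the value offsets */
    char    dummyd[28];
    int16_t len;            /* length of the key name */
    int16_t du;
    char    name;           /* first character of the key name */
};

/* Value block ("vk"). */
struct value_block {
    int32_t block_size;
    char    block_type[2];
    int16_t name_len;
    int32_t size;           /* data size; high bit set means data is inline */
    int32_t offset;         /* offset of the data, or the data itself */
    int32_t value_type;
    int16_t flags;
    int16_t dummy;
    char    name;           /* first character of the value name */
};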
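/* Print every value under key with its full path, then recurse into subkeys. */
void walk(char* path, struct key_block* key) {
    /* Set once, on the first (root) call: all offsets in the hive are
       relative to root, and full is the start of the path buffer. */
    static char *root = NULL, *full = NULL;
    if (!root) { root = (char*)key - 0x20; full = path; }

    /* Append "/keyname" to the path. */
    memcpy(path++, "/", 2);
    memcpy(path, &key->name, key->len);
    path += key->len;

    for (int o = 0; o < key->value_count; o++) {
        /* The value list is a block too: skip its 4-byte size. */
        int* values = (int*)(root + key->offsets + 4);
        struct value_block* val = (struct value_block*)(root + values[o]);
        char* data;
        if ((unsigned)val->size & 0x80000000u) {
            data = (char*)&val->offset;        /* data stored in the offset field */
        } else if (val->offset) {
            data = root + val->offset + 4;     /* skip the data block's size */
        } else {
            continue;                          /* no data at all */
        }
        /* The value name follows the key path; the unnamed default value
           is marked with a space instead of a slash. */
        *path = val->name_len ? '/' : ' ';
        memcpy(path + 1, &val->name, val->name_len);
        path[val->name_len + 1] = 0;
        printf("%s [%d] = ", full, (int)val->value_type);
        /* Only the low 16 bits of the size are used as the length. */
        for (int i = 0; i < (int)(val->size & 0xffff); i++) {
            if (val->value_type == 1 || val->value_type == 7) {
                if (data[i]) putchar(data[i]); /* strings: drop the zero bytes */
            } else {
                /* Cast so bytes above 0x7F still print as two hex digits. */
                printf("%02X", (unsigned char)data[i]);
            }
        }
        printf("\n");
    }

    if (key->subkey_count <= 0) return;        /* no subkey list to follow */
    struct offsets* item = (struct offsets*)(root + key->subkeys);
    int* subs = (int*)&item->first;
    for (int i = 0; i < item->count; i++) {
        if (item->block_type[0] == 'l') {
            /* "lf"/"lh" entries are (offset, hash) pairs, "li" entries are bare offsets. */
            walk(path, (struct key_block*)(root + subs[item->block_type[1] == 'i' ? i : i * 2]));
        } else {
            /* "ri": every entry points to another subkey list. */
            struct offsets* subitem = (struct offsets*)(root + subs[i]);
            int* keys = (int*)&subitem->first;
            for (int j = 0; j < subitem->count; j++) {
                walk(path, (struct key_block*)(root + keys[subitem->block_type[1] == 'i' ? j : j * 2]));
            }
        }
    }
}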
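int main(int argc, char** argv) {
    char path[0x1000] = {0}, *data;
    FILE* f;
    long size;
    if (argc < 2 || !(f = fopen(argv[1], "rb"))) {
        fprintf(stderr, "hive path err\n");
        return 1;
    }
    fseek(f, 0, SEEK_END);
    size = ftell(f);
    rewind(f);
    /* Need the 0x1000-byte header plus at least one 4 KB block. */
    if (size < 0x2000 || !(data = (char*)malloc(size)) || fread(data, size, 1, f) != 1) {
        fprintf(stderr, "empty or unreadable file\n");
        fclose(f);
        return 1;
    }
    fclose(f);
    /* The root key is taken to be the first block after the 0x20-byte header at 0x1000. */
    walk(path, (struct key_block*)(data + 0x1020));
    free(data);
    return 0;
}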
Points of Interest
Remember that it will dump values you normally don't even have access to, so be careful.
It's perfect for dumping the hives before and after a software installation and comparing the changes with a text diff (for example, the command-line version from UnixUtils is great).