Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / desktop / MFC

MsAccess MdbTools with MFC and .NET

4.82/5 (9 votes)
13 Jan 2012LGPL310 min read 71.7K   10K  
Viewer of MsAccess databases directly from MFC and .NET - Repair corrupt databases
Imagen Imagen

Introduction

I used to have problems with some Access 97 databases that went corrupt. Also I had instalation problems with the JET Engine for MsAccess. I found MdbTools, and could repair databases or at least get data out. Then I ported MdbTools to MFC and .NET (only read, not write support). I updated the sources to the latest version of mdbtools.

The MsAccess file structure

Pages

All the info of the mdb file is organized as pages. A page is a region in the file of 2k (Jet3) and 4k (Jet4).
So, the file len of MsAccess databases is a multiple of 2,048 (Jet3) or 4,096 (Jet4). For example, the Nwind.mdb file sample contains 1,548,288 / 2048 = 756 pages.
All pages are stored secuentially, and, as they are all the same size, page n starts at n * 2048 + 1 or n * 4096 + 1 in the file (considering n starting at 0).

For example, for a Jet3 database containing 5 pages (including 0 which is the Database Definition page):

Positions differ from Jet3 to Jet4, so all tables will be referred to Jet3.
Page NumberStarts at position (dec)Size (dec)
002048
120482048
240962048
361442048
481922048

Page 0 - Database Definition Page

There is not much information about this page. What is known:

From (hex)To (hex)Len (dec)FormatDescription
0x00x01BytePage type - For the database definition page type is 0
0x10x1319Unknown
0x140x174LE-IntJet version of the database (0 for Jet3, 1 for Jet4, 2 for Jet5, 3 for Access 2010
0x180x3D38Unknown
0x3E0x414LE-IntRC4 Key used to encrypt pages (if the database is encrypted)
0x420x5014BytePassword (masked)
0x510x7FF1967Unknown

LE-Int: Refers to Little Endian Int which means that the number is stored from the least significant bit to the most significant bit (LSB-MSB). For example, the number 0xA1B2C3D4 is stored as: D4 C3 B2 A1.

The first byte of each page identifies the page type:
  • 0x00 Database definition page. (Always page 0)
  • 0x01 Data page
  • 0x02 Table definition
  • 0x03 Intermediate Index pages
  • 0x04 Leaf Index pages
  • 0x05 Page Usage Bitmaps (extended page usage)

Page 1

Keeps track of allocated pages in the database.

Page 2 - Table definition page for MSysObjects

MsAccess databases include system tables that contain information about the database itself. They all start with the prefix MSys.
Page 2 contains the Table definition (Page type 0x02) of the table MSysObjects. This table contains all the objects names and (more important), the page number location of them.

The structure of a Table Definition page is:

From (hex)To (hex)Len (dec)FormatDescription
0x00x01BytePage type - For the table definition page type is 2
0x10x78Unknown
0x80x92LE-IntPage len
0xC0xF4LE-IntNumber of rows
0x170x182LE-IntNumber of variable columns
0x190x1A2LE-IntNumber of columns
0x1B0x1E4LE-IntNumber of indexes
0x1F0x224LE-IntNumber of index entries
0x230x264Pages usage bitmask
0x270x2A4LE-IntPages with free space bitmask

Column definitions start at 0x2B + (Number of index entries * 8). Column properties go first. Then go column names (in the same order that properties).
Column properties have a fixed size of 18 bytes.

Assuming "Number of index entries" = 2 (for MSysObjects), column properties start at 0x3B. The number of columns was retrieved before, so you must read 18 * numcols bytes.

From (hex)To (hex)Len (dec)FormatDescription
0x3B0x3B1ByteColumn type
0x3C0x3C1ByteColumn Number
0x3D0x3E2LE-IntVariable col number
0x3F0x402LE-IntRow col number
0x410x455Unknown
0x460x461BytePrecision
0x470x471ByteScale
0x480x481ByteFlags. Include isfixed, is long autonumeric, uuid auto
0x490x4A2LE-IntColumn fixed offset
0x4B0x4C2LE-IntColumn size offset

After the properties, the name of the columns appear. The first byte contains the size and then you read that amount of bytes (the name). So, column names canĀ“t be bigger than 256 chars
Column names are stored in compressed unicode.

For example, for MSysObjects there are 17 columns, so 0x3B adds to 17 * 18 = 16D The first column is Id (len 2)

From (hex)To (hex)Len (dec)FormatDescription
0x16D0x16D1ByteColumn name size
0x16E0x16F2Unicode TextColumn Name

After the columns, the indexes definition follow.

Data allocation

Data is stored in data pages (page type 1) with the structure:

From (hex)To (hex)Len (dec)FormatDescription
0x00x01BytePage Type (1 for this page)
0x10x11ByteUnknown
0x20x32LE-IntFree space in this page
0x40x74LE-IntPage pointer to page definition
0x80x92LE-IntNumber of records

After that information, there are offsets of 2 bytes to records (there are 2 * num_records bytes of this items). After that, records start.
To know record len you substract next offset to previous offset.

In the table definition there is a field called Pages usage bitmask:
The 3 first bytes of that item are the page and the last byte the row of the page usage bitmap.
For example, in the sample database, the bitmap is located in Page 6, row 0.

Bitmap data is 133 bytes long and contain:
From (hex)To (hex)Len (dec)FormatDescription
0x00x01ByteMap type (0 or 1)
0x10x44ByteBase page for which this map applies
0x50x85128ByteEach bit contains allocation of this table (boolean). Allocation can be set to 128 pages after base page

So, this basic allocation cannot hold more than 128 pages. As pages are of size 2048, that gives: 128 * 8 * 2048 = 2Mb. For bigger databases the field Pages usage bitmask points to a record containing an map identifier (1) of 1 byte and secuentially pointers of 4 bytes to pages of type 5 that contain bitmaps.

The code

Solution structure

The solution contains 5 projects:
  • MdbToolsLib: Contains the GLib (C Libraries) and libmdb (Access). It also contains classes to interface with projects MFCMdbTools and NETMdbTools. It uses STL.
  • MFCMdbTools: The idea of this Dll is to act as a layer between MFC and MdbToolsLib. Most of the code is CString conversion
  • NETMdbTools: The idea of this Dll is to act as a layer between .NET and MdbToolsLib. It uses C++/CLI.
  • MFCMdbToolsTestApp: MFC test app that uses MFCMdbTools.
  • NETMdbToolsTestApp: C# test app that uses NETMdbTools. It includes export functionality.

Naming conventions

There are 3 projects which share the same functionality (MdbToolsLib, MFCMdbTools, NETMdbTools) for different libraries (C/C++, MFC, .NET).
So, there are 3 prefixes, one for each library:
  • MdbToolsLib: The prefix is MdbLib. For example, MdbLibDatabase
  • MFCMdbTools: The prefix is MFCMdb. For example, MFCMdbDatabase
  • NETMdbTools: The prefix is NETMdb. For example, NETMdbDatabase

MdbToolsLib

Class MdbLibDatabase

CMdbLibDatabase() Initializes the MdbTools handler as 0 so that you can check if a database is loaded.
void* mdb; Private variable that contains the MdbHandle (struct) of the database. It is declared as void so that you don't need the MdbTools.h header.
int Open(const char* strname); Opens the database and assigns Mdbhandle. Returns a 0 for error, other for success.
void Close(); If there is an open database, it closes all tables handles and closes the Mdbhandle database.
void LoadTables(); Loads the database catalog and creates CMdbLibTable objects for each one in the tables array.
map<char*, CMdbLibTable, cmp_str> tables; This is the array of tables. cmp_str is a comparison to find items sorted by char*

Class MdbLibTable

CMdbLibTable() Initializes the MdbTableDef object (tdef) and the name (char*) as 0.
char* Name; Contains the name of the table. It is allocated and assigned in CMdbLibDatabase::LoadTables() and deleted in ClearTableColumns()
map<char*, CMdbLibColumn, cmp_str> columns; This is the array with the columns of the table. It is loaded in LoadTableColumns()
map<char*, CMdbLibValue, cmp_str> cvalues; This is the array with the values for iterating the table rows. It is loaded and allocated in LoadTableColumns() but values are assigned in NextItem()
void LoadTableColumns(); Loads columns and values (columns and cvalues arrays). Allocates memory for both. Binds cvalue data to col so that it is assigned when row changes
void ClearTableColumns(); Frees the MdbTableDef tdef variable, and all cvalues allocated space for row data.
int NextItem(); Executes the mdb_fetch_row and returns if eof. Values are assigned inside mdb_fetch_row by the bind_ptr.
void MoveFirst(); Executes the mdb_rewind_table so that record cursor moves to the first row.
void* mdb; MdbHandle variable of the database. Declared as void to avoid Mdbtools.h dependency.
void* tdef; This is the MdbTools handle for the table. It is assigned in LoadTableColumns().
BOOL IsSystem; If the table is system table (name starts with MSys).
void* GetEntryByName(char* name); Finds the entry in the catalog by table name. This entry is neccesary to perform the mdb_read_table.

Class MdbLibColumn

enum MdbLibColType Contain the column type constants. It mimics MdbTools column type constants. They exist to provide independence from MdbTools.h.
char* Name; Column name. It is allocated and assigned in CMdbLibTable::LoadTableColumns().
MdbLibColType Type; Column type.
int Size; Column fiexd size.
int IsInt(); Boolean used to group all integer types for formatting porpuses.
int IsNumDec(); Boolean used to group all float types for formatting porpuses.
int IsBinary(); Boolean used to group all binary (memo, OLE, etc) types for formatting porpuses.
int IsDate(); Returns if the column type is date (for formatting porpuses).

Class MdbLibValue

char* value; This is the buffer where the column value is stored when iterating rows. It is assigned to bind_ptr in CMdbLibTable::LoadTableColumns().
int len; This is real len of the data stored in value. It is assigned to len_ptr in CMdbLibTable::LoadTableColumns().
int GetIntValue(); Returns the value stored in "value" variable as int.
double GetDoubleValue(); Returns the value stored in "value" variable as double. To do so it eliminates the exponent and multiplies accordingly.
void GetTimeValue(SYSTEMTIME* st); Returns the date in SYSTEMTIME format.

Libraries

The original MdbTools uses GLib, so, there are dependencies and libglib is needed to run the application.
I had the idea of replacing GLib with STL and native windows functions, but then, it would be a lot of work and besides, when MdbTools sourcecode improves, I would have a difficult time upgrading.

The project is in VS 2010 and uses the latest MFC and .NET 4 but it can be easily ported the older versions of Visual Studio.
To compile MFC and .NET apps, first compile GLib and copy the .lib in the glib folder of the app.

History

  • 2012-01-13: Improve article with some MsAccess format detail.
  • 2011-12-20:

    • Made a common .lib interface (including GLib) for both .NET and MFC and put it all into one single project.
    • Added support for Binary fields (OLE). Improved support for numbers and dates
    • In the .NET app added export to DAO database to repair corrupt databases
  • 2011-11-21. GLib library now links into dlls.
  • 2011-11-14. First version that uses glib dlls.

Licence and sourcecode

The licence for this project is LGPL which basically means that you can use the library in any project (even commercial apps) but if you make modifications to the library itself you must publish them. The sources were taken from https://github.com/brianb/mdbtools.

To do

  • Add support for other database objects (queries, modules, forms, etc).
  • Add ATL/COM interface for MdbTools.
  • Add write support.

Acknowledgements

To the authors of mdbtools: Brian Bruns, Karl Nyberg, Georg Bauer, Carl Seutter, Trevor Harrison, Brent Johnson, Tim Nelson, David Mansfield, Jeff Smith, Steve Langasek, Rene Engelhard, Vincent Fourmond, Tim Retout, Nirgal Vourgere

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)