Steganography with OOXML for Office (Zip)

22 Jan 2013 | CPOL | 3 min read
Hide data inside the zip structure of any zip-based file.

Introduction

This tool suite lets you hide data inside the zip structure of any zip-based file. There are a few similar ways to hide data in a zip structure, but they trigger error messages when Office opens the resulting OOXML documents (OOXML is the newer, zip-based Office document format).

I settled on using the Extra Field and the Comment Field for the injected data, since this works with most zip-dependent software (I have not seen a case where it fails), and MS Office is by far the pickiest of them. Other OOXML files, created by OpenOffice for instance, are still zip files and will thus also work.
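As a rough illustration of the two carrier fields (this is not the tool's own code), the Python sketch below uses the standard `zipfile` module to write a one-entry archive with a payload in the entry's Extra Field and Comment Field, and then reads both back. The header ID `0xCAFE`, the entry name, and all function names are hypothetical choices made for this example. Whether Office accepts a container built this way is not tested here.

```python
import io
import struct
import zipfile

HYPOTHETICAL_ID = 0xCAFE  # made-up Extra Field header ID, for illustration only


def make_extra_record(payload: bytes, header_id: int = HYPOTHETICAL_ID) -> bytes:
    """Wrap payload as one Extra Field record: 2-byte ID, 2-byte size, data."""
    return struct.pack("<HH", header_id, len(payload)) + payload


def build_archive(secret: bytes, comment: bytes) -> bytes:
    """Create an in-memory zip with one entry carrying both hidden fields."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("word/document.xml")
        info.extra = make_extra_record(secret)  # Extra Field of the entry
        info.comment = comment                  # per-entry Comment Field
        zf.writestr(info, "<w:document/>")
    return buf.getvalue()


def read_hidden(archive: bytes):
    """Return (Extra Field payload, comment) of the first entry."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.infolist()[0]
        _header_id, size = struct.unpack("<HH", info.extra[:4])
        return info.extra[4:4 + size], info.comment


archive = build_archive(b"secret", b"more secret")
hidden_extra, hidden_comment = read_hidden(archive)
```

The archive remains a valid zip: ordinary readers skip Extra Field records whose ID they do not recognize, which is what makes the field usable as a carrier.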

Some details

The Extra Field can exist in the Local File Headers (LFH), the Central Directory Headers (CDS), or both. Its size is restricted to a word (0xFFFF), which equates to roughly 65 KB per Extra Field entry. In a regular zip archive with one file, it would thus be possible to hide roughly 65 * 2 KB. Luckily, OOXML files contain quite a lot of files by default. A minimal docx can hold about 700 KB, and a pptx is capable of far more. In general, a zip archive with n files can hold about n * 65 KB per header type.
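The capacity arithmetic can be sketched as follows. This is a simplified upper bound that assumes every entry can carry a full 0xFFFF-byte Extra Field and ignores the space taken by the other fields of each entry. The function names and the eleven-entry demo archive are hypothetical; eleven is only an assumed entry count that happens to land near the 700 KB figure.

```python
import io
import zipfile

MAX_EXTRA_FIELD = 0xFFFF  # the Extra Field length is a 16-bit word


def estimate_capacity(container, headers_per_entry: int = 1) -> int:
    """Upper bound in bytes: entries * headers used per entry * 0xFFFF."""
    with zipfile.ZipFile(container) as zf:
        return len(zf.infolist()) * headers_per_entry * MAX_EXTRA_FIELD


def demo_container(entry_count: int) -> io.BytesIO:
    """Build a throwaway in-memory zip with the given number of entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for i in range(entry_count):
            zf.writestr(f"part{i}.xml", "<x/>")
    buf.seek(0)
    return buf


print(estimate_capacity(demo_container(11)))    # 720885, one header per entry
print(estimate_capacity(demo_container(1), 2))  # 131070, LFH and CDS of one file
```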

Note that Office applications using OOXML place certain restrictions on the use of the Extra Field (EF) in the LFH, which means that only some of those fields can contain data, and only of limited size. The good news is that no such restrictions exist for the CDS. However, hidden data is far easier to detect in the CDS than in the LFH. According to the OOXML specification, this field can be as large as 65535 bytes per entry (minus the size of the other fields of that entry). In practice, only the entries in the CDS can hold that much data without Office complaining.

The EF in the LFH appears to handle a maximum of 256 or 512 bytes, and only for certain files. MS Office uses its own signature inside the EF, which always starts with "20 A2", followed by the total size of the EF minus 4 bytes. Only these first 4 bytes of the header are actually used, leaving the next 4 bytes of the header free, as well as the whole of the following block of either 256 or 512 bytes.
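A minimal sketch of how such an Office-style EF could be recognized is shown below. It assumes that the size following "20 A2" is stored as a little-endian 16-bit word; the function name and the sample buffer are hypothetical and are not taken from the tool.

```python
import struct

OFFICE_SIGNATURE = b"\x20\xA2"  # the "20 A2" bytes described above


def office_padding_free_bytes(extra: bytes):
    """Return the byte count after the 4 used header bytes, or None.

    None means the field does not match the layout described in the text.
    """
    if len(extra) < 4 or extra[:2] != OFFICE_SIGNATURE:
        return None
    (declared,) = struct.unpack("<H", extra[2:4])  # assumed little-endian word
    if declared != len(extra) - 4:                 # size must be total minus 4
        return None
    return declared


# Hypothetical sample: 4 used bytes, 4 unused header bytes, one 256-byte block.
sample = OFFICE_SIGNATURE + struct.pack("<H", 260) + bytes(260)
free_bytes = office_padding_free_bytes(sample)
```

For this sample the function reports 260 bytes: the 4 unused header bytes plus the 256-byte block.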

To make it possible to extract the hidden data, a custom header is created that records the size of the data and its relative offset.
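The article does not give the exact layout of this custom header, so the following is only a sketch under assumptions: a 4-byte magic value, a 32-bit data size, and a 32-bit offset measured from the end of the header. The magic `STEG`, the field widths, and the function names are all hypothetical.

```python
import struct

HEADER_FORMAT = "<4sII"  # hypothetical layout: magic, data size, relative offset
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes, no padding with "<"
MAGIC = b"STEG"  # made-up marker used to recognize the header


def pack_header(data_size: int, relative_offset: int) -> bytes:
    """Serialize the header that precedes the hidden data."""
    return struct.pack(HEADER_FORMAT, MAGIC, data_size, relative_offset)


def unpack_header(blob: bytes):
    """Return (data size, relative offset); raise if the magic is missing."""
    magic, data_size, relative_offset = struct.unpack_from(HEADER_FORMAT, blob)
    if magic != MAGIC:
        raise ValueError("no hidden-data header found")
    return data_size, relative_offset


def extract(blob: bytes) -> bytes:
    """Use the header to slice the hidden data out of the carrier bytes."""
    data_size, relative_offset = unpack_header(blob)
    start = HEADER_SIZE + relative_offset  # offset assumed relative to header end
    return blob[start:start + data_size]


carrier = pack_header(5, 3) + b"xxx" + b"hello" + b"tail"
message = extract(carrier)
```

Storing the size and offset up front means the extractor does not need to guess where the payload ends, even when the carrier field contains padding or other bytes around it.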

How to use

Hide part

  1. Start by browsing for the file to be hidden.
  2. Click browse and select a container (docx, xlsx, etc) where the secret data will be injected.
  3. Based on the size of the secret data, each of methods 1 to 6 will be either enabled or grayed out.
  4. Click Hide.

Extract part

  1. Click browse and select a container (docx, xlsx, etc) that carries the secret data.
  2. Click Extract.

The message will then be stored in extractMsg.txt.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)