Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Spoofing the Wily Zip CRC

0.00/5 (No votes)
31 May 2003 1  
How Zip CRCs work and how to create a specific file CRC

Introduction

CRC's are designed for error detection and the are extremely good at that. One often sees warnings that CRC's should not be used for security purposes yet it is rarely shown just how incredibly easy it is to "spoof" a CRC. This short article will show how true that warning is in addition to providing a simplified algorithm for Zip file CRC calculations.

Background

CRC's are used for error detection and are optimal for this purpose. For instance the CRC in Zip will detect all single error bursts of up to any 32 consecutive bits no matter where they occur and the CRC itself is only 32 bits! Is that cool or what? This would seem to be an ideal candidate for a one way hash function but it's not. In fact, the well understood and regular math behind CRC's is what makes them so easy to spoof. CRC's are one component of a fascinating filed of mathematics called "Galois Field Theory", a cornerstone of error correcting codes used just about everywhere. This is the stuff that keeps your CD's playing true even when severely scratched. A great text for those wishing to delve deeply into the math is Peterson's classic "Error Correcting Codes" which will explain in math terms why the algorithms shown here work. Also, perhaps the great tutorial and in depth description of CRC's is Ross N. Williams' classic "crc_v3.txt" widely available on the net.

Here is the Zip CRC algorithm in un-obfuscated terms:

  1. A 32 bit value, the residual, is set 0xffffffff
  2. The data stream is viewed as a series of bits starting with the lsb of the first byte and ending with the msb of the last byte.
  3. Each bit is xor'ed with the lsb of the residual and if the result is one the generating (gf) value, 0xdb710641 is xor'ed into the residual.
  4. The residual is then rotated 1 bit to the right
  5. Steps 3 and 4 are repeated for each bit in the stream
  6. The ending crc is the bit inverted residual.

Finding the CRC of a Zipped File

Zip CRC's can be seen by using the verbose mode. With unzip.exe I use this command to view crc's:

unzip -lv zipfile

Which produces this listing from running the sample code included. (and zipping the result)

Archive: x.zip
Method Size Ratio Date Time CRC-32 Name
------ ------ ---- ----- ---- ---- ------ ----
41 Stored 41 0% 06-01-03 00:22 01234567 crc01234567.txt
------ ------ --- -------

The CrcZip Class

class CrcZip {
public:
    explicit CrcZip(UI32 a_r=~0) : r(a_r){}
    enum {gf=0xdb710641};  // This is the generator

    UI32 r;                // residual, polynomial mod gf

    void clk(int i=0);     // clock in data, bit at a time

    void update(char *pbuf, int len); // update crc residual 

    operator UI32(){return ~r;};// object yields current CRC

    void clkrev();        // clk residual backwards

};

// This reverses CrcZip::clk(0)

// in Galois terms: R := i + R*X

void CrcZip::clkrev()
{
    if (r & 0x80000000)
        r = (r << 1) ^ gf;
    else
        r = (r << 1);
}

// this is the basic "gut's" of crc calcuation.

// and updates the crc's 32 bit residual for a

// single bit, in terms of galois theory

// this is: R := R*X^(-1)

void CrcZip::clk(int i)
{
    r ^= i;
    // rotation combined with possible xor of "gf"

    if (1 & r)
        r = ((r^gf) >> 1) | 0x80000000;
    else
        r >>= 1;
}

// This does a byte at a time by calculating the

// crc bit by bit, lsb first. Slow, but clear.

void CrcZip::update(char *pbuf, int len)
{
    while (len--)
    {
        int bitcnt, byte = *pbuf++;
        for (bitcnt=0; bitcnt < 8; bitcnt++)
        {
            clk(byte & 1);
            byte >>= 1;
        }
    }
}

The Test Program

The sample code calculates, "fixes", and writes a file which when zipped, creates a CRC of 0x01234567. After you see how easy spoofing CRC's is you will never again use them to ID stamp virus detectors or other code designed as one way functions.

Here we calculate the initial CRC of a test string.

    CrcZip crc;
    char s[41]={"This is a 0 crc file\x1a" "xxxx" "arbitrary stuff"};
    crc.update(s, sizeof(s));
    cout << "The CRC for this string is " << hex << crc << endl;

Then we initialize a fixup crc with the desired value xor'ed into the orignal strings crc:

   CrcZip crcFixup(crc^0x01234567);  
                 // 0x01234567 instead of

                 // "0" makes it more interesting

Now we back up the fixup residual by the number of bits from the patch location to the file end and xor it with whatever is in the patch.

// Then we back step the residual to the point we want

// to stuff a 32 bit value. To interpret "(41-21)*8".

// 41 is the length of the string. 21 is the position

// of the 4 byte place we want to stuff

// a fixup value, 8 bits per byte

for (int i =0; i < (41-21)*8; i++)
    crcFixup.clkrev();
    
// since crc's are "linear" (super-position applies)

// we xor the fixup residual with the 4 bytes "xxxx"

// already there.

*(reinterpret_cast<UI32 *>(s+21)) ^= crcFixup.r;

Applications

There are several applications for this in hardware (and possibly software) design. Hardware design is even more notorious for boundary condition sensitivity than software. Error detection protocols should always be well tested. Such things as all zero and all ones as well as a single crc bits set with all other bits zero must be tested in hardware designs. This technique provides a methodology for easily creating these patterns. Brute force can also be used. Exhaustive searches are easy with 32 bit CRC's but not at all feasible with extensions such as 64 bits.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here