Introduction
CRC's are designed for error detection and the are extremely good at that. One often sees warnings that CRC's should not be used for security purposes yet it is rarely shown just how incredibly easy it is to "spoof" a CRC. This short article will show how true that warning is in addition to providing a simplified algorithm for Zip file CRC calculations.
Background
CRC's are used for error detection and are optimal for this purpose. For instance the CRC in Zip will detect all single error bursts of up to any 32 consecutive bits no matter where they occur and the CRC itself is only 32 bits! Is that cool or what? This would seem to be an ideal candidate for a one way hash function but it's not. In fact, the well understood and regular math behind CRC's is what makes them so easy to spoof. CRC's are one component of a fascinating filed of mathematics called "Galois Field Theory", a cornerstone of error correcting codes used just about everywhere. This is the stuff that keeps your CD's playing true even when severely scratched. A great text for those wishing to delve deeply into the math is Peterson's classic "Error Correcting Codes" which will explain in math terms why the algorithms shown here work. Also, perhaps the great tutorial and in depth description of CRC's is Ross N. Williams' classic "crc_v3.txt" widely available on the net.
Here is the Zip CRC algorithm in un-obfuscated terms:
- A 32 bit value, the residual, is set 0xffffffff
- The data stream is viewed as a series of bits starting with the lsb of the first byte and ending with the msb of the last byte.
- Each bit is xor'ed with the lsb of the residual and if the result is one the generating (gf) value, 0xdb710641 is xor'ed into the residual.
- The residual is then rotated 1 bit to the right
- Steps 3 and 4 are repeated for each bit in the stream
- The ending crc is the bit inverted residual.
Finding the CRC of a Zipped File
Zip CRC's can be seen by using the verbose mode. With unzip.exe I use this command to view crc's:
unzip -lv zipfile
Which produces this listing from running the sample code included. (and zipping the result)
Archive: x.zip
Method Size Ratio Date Time CRC-32 Name
------ ------ ---- ----- ---- ---- ------ ----
41 Stored 41 0% 06-01-03 00:22 01234567 crc01234567.txt
------ ------ --- -------
The CrcZip Class
class CrcZip {
public:
explicit CrcZip(UI32 a_r=~0) : r(a_r){}
enum {gf=0xdb710641};
UI32 r;
void clk(int i=0);
void update(char *pbuf, int len);
operator UI32(){return ~r;};
void clkrev();
};
void CrcZip::clkrev()
{
if (r & 0x80000000)
r = (r << 1) ^ gf;
else
r = (r << 1);
}
void CrcZip::clk(int i)
{
r ^= i;
if (1 & r)
r = ((r^gf) >> 1) | 0x80000000;
else
r >>= 1;
}
void CrcZip::update(char *pbuf, int len)
{
while (len--)
{
int bitcnt, byte = *pbuf++;
for (bitcnt=0; bitcnt < 8; bitcnt++)
{
clk(byte & 1);
byte >>= 1;
}
}
}
The Test Program
The sample code calculates, "fixes", and writes a file which when zipped, creates a CRC of 0x01234567. After you see how easy spoofing CRC's is you will never again use them to ID stamp virus detectors or other code designed as one way functions.
Here we calculate the initial CRC of a test string.
CrcZip crc;
char s[41]={"This is a 0 crc file\x1a" "xxxx" "arbitrary stuff"};
crc.update(s, sizeof(s));
cout << "The CRC for this string is " << hex << crc << endl;
Then we initialize a fixup crc with the desired value xor'ed into the orignal strings crc:
CrcZip crcFixup(crc^0x01234567);
Now we back up the fixup residual by the number of bits from the patch location to the file end and xor it with whatever is in the patch.
for (int i =0; i < (41-21)*8; i++)
crcFixup.clkrev();
*(reinterpret_cast<UI32 *>(s+21)) ^= crcFixup.r;
Applications
There are several applications for this in hardware (and possibly software) design. Hardware design is even more notorious for boundary condition sensitivity than software. Error detection protocols should always be well tested. Such things as all zero and all ones as well as a single crc bits set with all other bits zero must be tested in hardware designs. This technique provides a methodology for easily creating these patterns. Brute force can also be used. Exhaustive searches are easy with 32 bit CRC's but not at all feasible with extensions such as 64 bits.