Introduction
A Check Digit is an alphanumeric character added to a number to detect errors. Check Digits are a simple and easy way to nuetralize the errors introduced by the human element of keying data. Check Digits are not Error Correcting Codes. Should the reader want to investigate robust Error Correcting Codes, he or she is referred to Error-Correcting Codes by Peterson and Weldon.
It is also notebale that CRC codes are not a check digit scheme, since more than one alphanumeric digit is generated to detect errors. If the reader desires a system for very long datasets or binary data sets, consider using a 32 bit CRC, Adler checksum, or a hash-based checksum. An Adler checksum is similar to a redundancy check, trading speed for reliability. A sample hashing application using CRC32, MD4, MD5, RIPEMD-160, SHA-1, Whirlpool, and SHA-2 family of hashes is presented in A File Checksum Shell Menu Extension Dll.
This article will discuss the following Check Digit Schemes:
- Mod 9 Scheme
- Mod 7 Scheme
- Mod 11 Scheme
- 3-Weight Method
- IBM Scheme
- UPC Scheme
- Verhoeff Algorithm
- ISO 7064 Mod N/N+1
In addition, a superficial treatment of Group Theory is presented following the algorithms.
Common Errors
According to Richard Hamming, the two most common errors are:
- 12 becomes 21 (adjacent characters are transposed)
- 112 becomes 122 (incorrect doubling of triples)
Jacobus Verhoeff presents the follow stastics in Error Detecting Decimal Codes:
Error | Approximate Percentage | Comment |
a → b | 60% - 90% | Single Error |
ab → aab | 10% - 20% | Adding a Digit |
aab → ab | 10% - 20% | Omitting a Digit |
ab → ba | 10% - 20% | Transposition Error |
aa → bb | 0.5% - 1.5% | Twin Errors |
acb → bca | > 1% | Jump Twin Error |
13 → 30 | 0.5% - 1.5% | Phonetic Error (similar pronunciations) |
Downloads
There are 5 downloads available with this article. They are presented at the end of the article.
Mod 9 Scheme
The mod 9 system is used by the United States Postal Service on money orders. If a money order number is 123456789 and c = 0, the number imprinted on the money order is 1234567890. The system is c = n % 9
. This system is also known as "casting out nines". Casting Out Nines suffers from the fact that 0 → 9 and 9 → 0 are not detected; and no transposition errors are detected (unless it invlolves the check digit).
Mod 7 Scheme
A more robust system than Casting Out Nines is a mod 7 system. Bressoud and Wagon prove the following in Computationl Number Theory:
- Single Error Detection Rate is 93.81%
- Transposition Error Detection Rate is 93.87%
Similar to the Mod 9 Scheme, c = n % 7
. However, only 3 transposition errors are undetectable: 70 ↔ 07, 81 ↔ 18, and 92 ↔ 29.
Mod 11 Scheme
The mod 11 scheme is a weighted scheme also known as International Standard Book Number (ISBN). The first 9 digits represent the unique identifier, the 10th digit is the check digit. Since the scheme is mod 11, the check digit can be 0-9 or X.
This mod 11 based system operates on a dot product (i.e., the individual digits of the number n). n · (10, 9, ..., 2, 1)
mod 10. Stated another way, let ai be the ith digit of n (most significant to least significant). Then
c = 10a1 + 9a2 + 8a3 + 7a4 + 6a5 + 5a6 + 4a7 + 3a8 + 2a9 mod 11
The 10th (a10) digit would be -c
so the following equation (called a check equation) holds true.
0 = 10a1 + 9a2 + 8a3 + 7a4 + 6a5 + 5a6 + 4a7 + 3a8 + 2a9 + 1a10 mod 11
This scheme detects all single error and the transposition of any two digits at any distance (assuming the overall number is 10 or fewer digits long).
_tstring number = _T("073560753");
_tcout << _T("ISBN: ") << number << endl;
int i = 10, c = 0;
for( _tstring::const_iterator it = number.begin();
it != number.end(); i--, it++ )
{
if( _T('X') == *it )
{ c += 10 * i; }
else
{ c += CharToNumber( *it ) * i; }
c %= 11;
}
c = 11 - c;
if( 10 == c )
{ _tcout << _T("Check digit is: ") << _T("X") << endl; }
else
{ _tcout << _T("Check digit is: ") << c << endl; }
3-Weight Method
The 3-Weight Methods is also known as the Electronic Funds Transfer Routing Number Scheme used by the banking industry. It is a mod 10 based system, but rather than operating on n
as a whole number, it operates on a dot product (i.e., the individual digits of the number n
).
So, given an 8 digit routing number, the check digit c = n · (7, 3, 9, 7, 3, 9, 7, 3)
, mod 10. Stated another way, let ai be the ith digit of n (most significant to least significant). Then
c = 7a1 + 3a2 + 9a3 + 7a4 + 3a5 + 9a6 + 7a7 + 3a8 mod 10
c
is then concatenated to the 8 digit route, forming a 9 digit routing number
This scheme is based on the fact that multiplication modulo 10 yields a permutation of the 10 decimal digits if the multiplication factor is a digit of the set { 1, 3, 7, 9 }; but only a subset of the decimal digits if the factor is 5 or an even digit. This system cannot detect adjacent transpositions of digits that differ by 5.
The 3-Weight method does not use simple modular reductions, so c = 10 - c
is not needed to hold the equality.
The reader should find the following check equation holds true (where a9 is -c):
0 = 7a1 + 3a2 + 9a3 + 7a4 + 3a5 + 9a6 + 7a7 + 3a8 + a9 mod 10
The 3-Weight Scheme is presented below. As the reader can see, it generalizes to a arbitrary length of digits easily. However, if the reader desires a system for very long datasets, or binary data sets consider using a CRC or Alder checksums, or a hash-based checksum.
_tstring number = "12345678";
int i = 0, c = 0;
for( _tstring::const_iterator it = number.begin();
it != number.end(); i++, it++ )
{
switch( i % 3 + 1 )
{
case 1:
c += 7 * CharToNumber( *it );
break;
case 2:
c += 3 * CharToNumber( *it );
break;
case 3:
c += 9 * CharToNumber( *it );
break;
default:
assert( false );
break;
}
c %= 10;
}
IBM Check Digit Scheme
The IBM Check Digit Scheme is as close to a perfect system as can be obtained while working over the Field of Integers (Z). This scheme is used by companies such as Mastercard and Visa, and Canadian Social Insurance Numbers. The system is also known as Luhn's Algorithm. The scheme detects all Single Errors, and all Transposition Errors except 0 ↔ 9.
The check digit c
is defined as the dot product:
-(a1, a2, a3, a4, ..., an-1, an) · (2, 1, 2, 1, ..., 1, 2) - r (mod 10)
r
is the number of odd indexed digits greater than 5. This means r = r + 1
if and only if ai > 5, where i
is odd.
In practice, the algorithm is implemented quite differently.
int i = 1, c = 0;
for( _tstring::reverse_iterator it = number.rbegin();
it != number.rend(); it++, i++)
{
if( 0 == i % 2 )
{
c += CharToNumber( *it );
}
else
{
int t = 2 * CharToNumber( *it );
if( t > 9 ) t -= 9;
c += t;
}
}
UPC Scheme
The Universal Product Code system is similar to IBM's system, except a weight of 3 is given for the even indexed digits. The implementation is left as an exercise to the reader.
Verhoeff Algorithm
Unlike previously exmamined methods (which operate over Z), Verhoeff's Algorithm operates over the dihedral group D5.
The algorithm is in use by companies such as ESB Networks of the Republic of Ireland. The algorithm is perfect in that it detects all Single Errors and all Transposition Errors. Additionally, it detects most (but not all) Twin Errors, Jump Twin Errors, Jump Transpositions Errors (abc → cba), and Phonetic Errors.
Verhorff's observation was as follows: one can use the 10 symmetries to encode the 10 digits by using the identity for 0; the rotations (in order) for 1, 2, 3, 4; and the reflections for 5, 6, 7, 8, 9.
The final step in preparation for the implementation is defining the permutation of digits.
Let δ be the permutation of digits as follows:
- δ performs no transformation on 0
- δ switches 1 and 4
- δ switches 2 and 3
- δ cyclically permutes 5, 8, 6, 9, 7
δ(0) = 0
δ(1) = 4
δ(4) = 1
δ(2) = 3
δ(3) = 2
δ(5) = 8
δ(8) = 6
δ(6) = 9
δ(9) = 7
δ(7) = 5
Unlike Z, D5 is not communative: 8 ° 7 ≠ 7 ° 8.
°
| 0
| 1
| 2
| 3
| 4
| 5
| 6
| 7
| 8
| 9
|
0
| 0
| 1
| 2
| 3
| 4
| 5
| 6
| 7
| 8
| 9
|
1
| 1
| 2
| 3
| 4
| 0
| 6
| 7
| 8
| 9
| 5
|
2
| 2
| 3
| 4
| 0
| 1
| 7
| 8
| 9
| 5
| 6
|
3
| 3
| 4
| 0
| 1
| 2
| 8
| 9
| 5
| 6
| 7
|
4
| 4
| 0
| 1
| 2
| 3
| 9
| 5
| 6
| 7
| 8
|
5
| 5
| 9
| 8
| 7
| 6
| 0
| 4
| 3
| 2
| 1
|
6
| 6
| 5
| 9
| 8
| 7
| 1
| 0
| 4
| 3
| 2
|
7
| 7
| 6
| 5
| 9
| 8
| 2
| 1
| 0
| 4
| 3
|
8
| 8
| 7
| 6
| 5
| 9
| 3
| 2
| 1
| 0
| 4
|
9
| 9
| 8
| 7
| 6
| 5
| 4
| 3
| 2
| 1
| 0
|
The reader should realize that D5 is a complex entity (dysfunctional) whose elements are somewhat arbitrarily mapped to the integers from 0 to 9, and the group operation is not at all like addition or multiplication. Also, there is no concept of order for D5: < and > have no meaning. The red shading of the table denotes the loss of the Communative Property when multiplying elements.
A small digression: should one desire to perform 6 ° 1, first locate 6th row, and then the 1st column. 6 ° 1 = 5.
c = δn(a1) ° δn-1(a2) ° δn-2(a2) ° ... ° δ2(an-1) ° δ(an)
So, to encode 1793, the reader would:
c = δ4(1) ° δ3(7) ° δ2(9) ° δ1(3)
δ4(1) = δ3(4) = δ2(1) = δ1(4) = 1
δ3(7) = δ2(5) = δ1(8) = 6
δ2(9) = δ1(7) = 5
δ1(3) = 2
1 ° 6 ° 5 ° 2
(1 ° (6 ° (5 ° 2))) = 4
As with other examples, the inverse of 4 is required so that the check equation hold true. Therefore, the check digit equals 1 (1 ° 4 = 0).
Below is the result of running checkdigit5 against 1000372996. Due to Multiplicative, Permutation, and Inverse Tables, the code is rather easy. Note that c is no longer a sum from a previous stage - it is used as a lookup, and then effectively discarded.
int i = 1, c = 0;
for( _tstring::reverse_iterator it = number.rbegin();
it != number.rend(); it++, i++)
{
unsigned int n = CharToNumber( *it );
unsigned int p = P[ i % 8 ][ n ];
c = M[ c ][ p ];
}
c = Inv[ c ];
ISO 7064 Mod N/N+1
This section presents an overview of a family of check digits. Implemetations are presented in Downloads.
The ISO specifications are designed for the following alphabets:
- numeric (10 digits: 0 to 9)
- alphabetic (26 letters: A to Z)
- alphanumeric (letters and digits)
The ISO specifications are designed to detect the following (taking from the ISO website):
- all single substitution errors (the substitution of a single character for another, for example 4234 for 1234);
- all or nearly all single (local) transposition errors (the transposition of two single characters, either adjacent or with one character between them, for example 12354 or 12543 for 12345);
- all or nearly all shift errors (shifts of the whole string to the left or right);
- a high proportion of double substitution errors (two separate single substitution errors in the same string, for example 7234587 for 1234567);
- a high proportion of all other errors.
| Comment
|
Mod 10/11
| European Blood Banks, Unique Identifier for People (U.S. NIH)
|
Mod 16/17
| European Bank Routing
|
Mod 36/37
| U.S. Livestock Identification
|
Group Theory
A Group is a set of objects and an associated operation. Groups are used everyday. For example, counting people in a room. The Group is over the Integers, and the operation is addition, denoted +. There are other operations that can be performed, such a as multiplication, denoted x.
Under addition, there is an identity element: 0. This provides the expected result: 1 + 0 = 1, 0 + 1 = 1, etc. Under multiplication, the identity element is 1. The same is true with multiplication: 1 x 1 = 1, 5 x 1 = 5, and so forth.
Just as one can operate over the Integers, operations can be performed over D5. The operations will be defined as follows.
The first operation that will be defined is ρ. This is the Greek letter rho, and it denotes rotate in D5. ρ0 is the identity element under rotate, as it leaves the element unchanged. ρ1 rotates the dihedral 72°. ρ2 rotates the dihedral 144°. And finally, ρ5 = ρ0.
The next operation that will be defined is σ. This is the Greek letter sigma. It follows rho in the alphabet. It will be used to denote the operation reflection. σ0 leaves the element unchanged, while σ1 'flips' the element over it's horizontal axis. So σ2 = σ0. The perceptive reader should realize ρ5 = σ2.
To visualize the observation, number the partitions of the dihedral as follows.
Now, perform operations on the element. For example, ρ7σ5. ρ7 = ρ2, so:
And next, σ5 = σ1:
The final visual representation is pictured below. Note that this was presented to the user as a visual demonstration. It is not meant to visually demonstrate Verhoeff 's Algorithm. Green denotes the new position of partition 2.
Summary
Though a perfect check digit system was presented, academia has provided much more. In an extraordinary 1978 paper, A. S. Sethi, V. Rajaraman, and P. S. Kenjale described a check system which can be used to correct any Single Error or Transposition Error. It is noteworthy that the system requires two check digits rather than one, and is only effective for certain bases.
The reader is encouraged to visit Error Detecting Decimal Digits (included with this article) for Wagner and Putter's experience in developing a check digit system for a commercial mail order house.
Downloads
Revisions
- 07.27.2007 Fixed Grammatical Errors
- 06.19.2007 Removed Sample 1 Fodder
- 06.02.2007 Reworked Introduction
- 06.02.2007 Added 'Common Error' Section
- 04.17.2007 Cleaned Formatting
- 12.19.2006 Added Reference to A File Checksum Shell Menu Extension Dll
- 12.10.2006 Added ISO 7064 Material
- 12.09.2006 Updated Example of Verhoeff (expanded hand calculation)
- 11.26.2006 Initial Release