(untagged)

Integer Base Converter

Adam Zgagacz

0.00/5 (No votes)

13 Dec 2013

Encode numbers in different positional numeral systems.

Introduction

If you are a programmer you know different positional numeral systems and deal with them on a daily basis. In everyday life we are usually satisfied with the base 10 numeral system, where we can express any number using 10 symbols (digits): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

But the computer world is quite different. The binary nature of computer storage, size of byte - elementary storage unit, makes binary (base 2) system natural when dealing with computers. The only problem is that numbers expressed as binary make quite long strings for large numbers and they are not easy to read. Who knows that binary number 100100110010110000001011010010 is equivalent of decimal 1,234,567,890?

Hexadecimal (base 16) system comes to rescue and makes numbers shorter. The same decimal number 1,234,567,890 expressed as hexadecimal looks like: 499602D2. It's much shorter than the corresponding binary representation, but I bet you - it's not easy to tell what this number really represents.

Truth is that to deal with numbers written in different positional numeral systems you need to have tools to help translate numbers from one base to another.

.NET Framework gives very little help.

Convert.ToString(int number, int base) allows only to convert decimal representation of an integer to a different base only if the base is 2, 8, 10 or 16, And of course it can deal only with 32-bit integers. Not a very generic solution.

Background

What I've decided to have is a library class that allows to:

Convert integer number from any base representation to another base representation.
Do not set limits to the size of the number.

Both statements are rather bold and sound unfeasible so they deserve a few words of explanation:

"Any base" - I said "any base" because there is nothing what in theory prevents us from encoding numbers using high base numbers. There is only one "but." To express the number as base-x representation we need to use x distinct characters to represent digits within the number. For base 10 we use 0,1,2,3,4,5,6,7,9,0 - no problem; for base-16 we just add 6 more: A, B, C, D, E, F. There is no problem to find 64 (or even little more) distinct printable characters in English based character set, but how to write base-x numbers where x is much greater than 64? One thing comes to mind: use Unicode character set for non-English language. Perhaps Chinese characters? Other options are: use graphical objects for characters or colors or mixture of ideas mentioned above. Point is, that we are able to express numbers with written in numeral systems with very large base.

"No size limit" - this promise is very easy to full fill. Net Framework 4.0 introduced new structure: BigInteger. It allows to create and manipulate an arbitrary large integer numbers. Only limit is the size of available memory. After deciding to use BigInteger to represent decimal representation of the number, conversion can be performed truly for number of any size.

Code Explained

I decided to wrap all functionality in static library class called IntegerBaseConverter.

At hearth of the class lay two simple algorithms.

One to convert BigInteger number to the string expressing same number in Base-n numeral system (where n is a parameter of the algorithm).

public static string Convert(BigInteger value, int radix, string digits)
{
     if ((radix > digits.Length) || (radix < 2))
         throw new ArgumentOutOfRangeException("radix", radix, 
           string.Format("Radix has to be within range <2, {0}>;", digits.Length));
 
     StringBuilder sb = new StringBuilder();
     do
     {
         BigInteger remainder;
         
         value = BigInteger.DivRem(value, radix, out remainder);
 
         sb.Insert(0, digits[(int)remainder]);
     } while (value > 0);
     return sb.ToString();
}

And another one that converts Base-n string representation of the number to the BigInteger value.

public static BigInteger Parse(string value, int radix, string digits)
{
    if ((radix > digits.Length) || (radix < 2))
        throw new ArgumentOutOfRangeException("radix", radix, 
          string.Format("Radix has to be within range <2, {0}>;", digits.Length));
 
    if (value == "")
        value = digits.Substring(0,1);
  
    BigInteger RetValue = 0;
    for (int i = 0; i < value.Length; i++)
    {
        int CharIdx = digits.IndexOf(value[i]);
        if ((CharIdx >= radix) || (CharIdx < 0))
        throw new ArgumentOutOfRangeException("Value", digits[CharIdx], "Invalid character in the input string.");

        RetValue = RetValue * radix + CharIdx ;
    }
    return RetValue;
}

Both methods are simple, self-explanatory and algorithms are well known. I didn't invent a wheel here. I don't think any more explanation that code itself is necessary.

Just one quick note: Since word base is reserved word in C#, I decided to use word radix instead in my source code with the meaning "numeral system base" attached to it. In this article words base and radix are interchangeable and have identical meaning.

Using the code

IntegerBaseConverter class exposes one public property:

DefaultDigitSet - Read-only property showing default character set representing digits used to write the number in numeral system. It returns 64 character string containing the following characters: "0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,+,/. Characters and their order are most commonly used characters for string representations of radix based numeral systems when base is less or equal 64. To write number in Base-n numeral system only first n characters form the string are used. For example: octal numbers (base-8) are written using only 0,1,2,3,4,5,6,7 characters (digits). while base 64 numbers uses all 64 digits from DefaultDigitSet.
Most methods exposed by IntegerBaseConverter accepts string parameter, where caller can supply own set of digits. This way it is possible to use different characters for encoding and if provided string is longer than 64, it allows to use encoding base greater than 64.

IntegerBaseConverter also exposes the following public methods:

CanUseCaseInsensitiveParsingMode(string digits) - Returns true when there are no duplicate characters in digits string regardless their case.
Compare(Char digit1, Char digit2, String digits) - Compares two digits (characters). Order of digits is determined by order of characters in digits parameter. Every character in digits string can occur only once.
Compare(String number1, String number2, String digits) - Compares two numbers represented as strings encoded using the same radix value. Order of digits is determined by order of characters in digits parameter. Every character in digits string can occur only once.
Convert(BigInteger value, int radix, string digits) - Converts value to the string representing this number written in radix-x numeral system (x is provided as the radix method parameter). Every character in digits string can occur only once.
GetPaddedValue(BigInteger value, int radix, Type type, string digits) - Converts value to the string representing this number written in radix-x numeral system (x is provided as the radix method parameter). If string length is less that maximum string length for given radix and type combination it is left-padded with characters representing zero digit up to maximum allowed length. Every character in digits string can occur only once.
IsValidDigitSet(string digits) - Returns true when there are no duplicate characters in digits string.
IsValidNumber(string number, string digits) - Verifies if number is a valid number, by checking all characters used to represent number against set of valid characters (digits). Every character in digits string can occur only once.
MaxValue(Type type) - Returns BigInteger value representing maximum value allowed for System.Type provided as method parameter.
MaxValue(Type type, int radix, string digits) - Returns string representation of maximum value of the number of type type encoded with base radix. digits string is used for encoding digits in encoded number. Every character in digits string can occur only once.
Parse(string value, int radix, string digits) - Converts string representation of a number (written as radix base string) to its BigInteger equivalent.
TryParse(string value, int radix, string digits, out BigInteger result, bool CaseSensitiveParsing = true) - Converts string representation of a number (written as radix base string) to its BigInteger equivalent. digits parameter contains set o characters uses as digits for encoding the number. Every character in digits string can occur only once. Method returns true if conversion was successful, otherwise returns false.
Parse(string number, int radix, string digits, bool CaseSensitiveParsing = true) - Parses string number representation of the number written in radix base numeral system. digits parameter contains set o characters uses as digits for encoding the number. Every character in digits string can occur only once. Method fails (throws an exception) if number contains characters not present in first radix characters of digits set.

Demo program

Downloadable demo program, help file and source code are attached to this article. You can use it to investigate and test behavior of the library and as a starting point to implement your own solution with IntegerBaseConverter.

Help file note:

Since you are downloading help file from the Internet, you need to unblock the file before you use it for the first time. Form the Windows Explorer right-click on the file name , select 'Properties' and then click on 'Unblock' button.

History

September 10, 2013 - Initial version of the article.
September 11, 2013 - Source code improvements. Style correction. Use of default parameters. Addition of two extra methods: IsValidDigitSet and CanUseCaseInsensitiveParsingMode
September 17, 2013- Source code review and minor corrections (bug fixes).
December 13, 2013 - minor corrections of the source code

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here