Introduction
Converts a string of 8-bit characters into a packed string that uses only 4 bits per character (at most 15 different characters are allowed).
In other words: two 8-bit characters are packed into one 8-bit character.
Through this conversion, a string can be stored in roughly half the size of a usual string. This might be useful for a huge amount of data that uses at most 15 different characters (like phone numbers).
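The packing step can be sketched with plain bit arithmetic. The two codes below are illustrative assumptions: they suppose a table in which index 0 is reserved and the digits '0' to '9' occupy indices 1 to 10, so '6' maps to 7 and '7' maps to 8.

```python
# Assumed 4-bit codes for the characters '6' and '7' (see the note above).
high, low = 7, 8

# Pack: the first code goes into the upper 4 bits, the second into the lower 4 bits.
packed = (high << 4) | low

# Unpack: mask and shift to recover both codes from the single 8-bit value.
unpacked_high = (packed & 0xF0) >> 4
unpacked_low = packed & 0x0F

print(hex(packed), unpacked_high, unpacked_low)
```

Because each code must fit into 4 bits, only the values 0 to 15 are possible, which is where the limit on the number of symbols comes from.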
Background
I was thinking that storing telephone numbers in a database as strings is a waste of memory. But storing them as integers is not possible either, because leading zeros and separators such as '-' would be lost. My solution was to use an encoded string.
Using the Code
Below, you see the implementation of the class. At the bottom, there is a test() function that shows how to use the code.
To customize the symbols that can be encoded, change Encode4Bits._mappingTable. Index 0 of the table is reserved for the terminator, so never use more than 15 customized values.
class Encode4Bits:
    def __init__(self):
        # Index 0 is reserved as terminator/padding; indices 1-15 hold symbols.
        # The empty strings are unused slots that can take custom symbols.
        self._mappingTable = ['\0',
                              '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                              '-', '', '', '', '']

    def _encodeCharacter(self, char):
        # Return the 4-bit index of char, or None if it is not in the table.
        for p in range(len(self._mappingTable)):
            if char == self._mappingTable[p]:
                return p
        return None

    def encode(self, string):
        mappingIndices = []
        for char in string:
            index = self._encodeCharacter(char)
            if index is None:
                raise ValueError("ERROR: Could not encode '" + char + "'.")
            mappingIndices.append(index)
        # Append the terminator, then pad to an even number of 4-bit values.
        mappingIndices.append(0)
        if len(mappingIndices) % 2 != 0:
            mappingIndices.append(0)
        ret = ""
        for i in range(0, len(mappingIndices), 2):
            # Pack two 4-bit indices into one 8-bit character.
            mixed = (mappingIndices[i] << 4) | mappingIndices[i + 1]
            ret += chr(mixed)
        return ret

    def decode(self, string):
        ret = ""
        for char in string:
            index1 = (ord(char) & 0xF0) >> 4
            index2 = ord(char) & 0x0F
            for index in (index1, index2):
                # Index 0 marks the end of the encoded data.
                if index == 0:
                    return ret
                ret += self._mappingTable[index]
        return ret


def test():
    numberCompressor = Encode4Bits()
    encoded = numberCompressor.encode("067-845-512")
    decoded = numberCompressor.decode(encoded)
    print(len(decoded))  # 11 characters in the original string
    print(len(encoded))  # 6 characters in the encoded string


if __name__ == "__main__":
    test()
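As a variation on customizing the symbols, here is a minimal sketch that takes the alphabet as a parameter and returns a bytes object instead of a string. This is not part of the class above: the names pack, unpack and PHONE_ALPHABET are hypothetical, and the choice of symbols is only an example. Returning bytes is a deliberate change, since in Python 3 a str containing characters above 127 needs more than one byte per character once it is stored as UTF-8.

```python
def pack(text, alphabet):
    """Pack text into bytes, two 4-bit codes per byte (code 0 = terminator)."""
    if len(alphabet) > 15:
        raise ValueError("at most 15 symbols fit next to the terminator")
    # Codes start at 1 because 0 is reserved; str.index raises ValueError
    # for a character that is not in the alphabet.
    codes = [alphabet.index(ch) + 1 for ch in text]
    codes.append(0)              # terminator
    if len(codes) % 2 != 0:
        codes.append(0)          # pad so every byte holds two codes
    return bytes((codes[i] << 4) | codes[i + 1]
                 for i in range(0, len(codes), 2))


def unpack(data, alphabet):
    """Reverse pack(): read 4-bit codes until the terminator is found."""
    out = []
    for byte in data:
        for code in (byte >> 4, byte & 0x0F):
            if code == 0:
                return "".join(out)
            out.append(alphabet[code - 1])
    return "".join(out)


# Hypothetical alphabet: 10 digits plus '+', '-', '(', ')' and space = 15 symbols.
PHONE_ALPHABET = "0123456789+-() "

number = "+49 (30) 1234-567"
packed = pack(number, PHONE_ALPHABET)
restored = unpack(packed, PHONE_ALPHABET)
print(len(number), len(packed), restored == number)
```

The packed result always needs one 4-bit slot for the terminator, so a string of n characters takes (n + 1) / 2 bytes, rounded up.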
History
- 8th February, 2019: Initial version