Introduction
This project introduces a self-sustained .NET library (DLL) for in-memory data (as byte array and MemoryStream
) compression and encryption. Naturally, the processed output can be saved to hard drive, but the project concentrates on the effective way of in-memory data processing.
Compression
The obvious choice for the compression is in-built into .NET GZip functionality and LZM algorithm as provided by 7zip SDK C# implementation being the most effective compression function so far.
Since the purpose of the project is straight in-memory transformation, it does not provide for 7Zip files formation with their specific header structure. The library invokes GZip and LZM compression algorithms to deflate/inflate in-memory data using the parallel programming to take advantage of multiple CPU cores.
The library available compression options come from the following:
public enum CompressionAlgo {
GZip_All,
GZip1,
GZip2,
GZip4,
GZip8,
LZMA_All,
LZMA1,
LZMA2,
LZMA4,
LZMA8,
None
}
where 1, 2, 4, 8 stand for the number of processors to be engaged during compression and Gzip_All
and LZMA_All
- for all available machine processors (which usually equals to 8 so LZMA_All
and LZMA8
have the same meaning).
The main and only purpose of the functionality provided is in-memory manipulation. So the single-threaded LZMA1
compression result (byte array) saved to hard drive forms the correct 7z file that can be opened with 7zip application but multi-threaded options do not provide for well-formatted 7z file.
Encryption
For the encryption part, the library introduces the fast stream cipher based on BlakeB compression function as implemented by Samuel Neves, Christian Winnerlein and Uli Riehm.
The BlakeB
compression function works at the core of a generator that creates the output bytes array (pad) unique to the user in-putted key, salt and distribution values.
The produced pad is used in the following manner for the encryption/decryption:
- Encryption: Plain text (byte array p) XOR pad => Cipher text (byte array x)
- Decryption: Cipher text (x) XOR pad => Plain text (p)
Since encrypt
and decrypt
function are both just XORing the bytes with the pad, the cipher engine includes the single function Crypt
covering both cases and giving us the stream cipher implementation.
Both generator (BGen
class in the program) and stream cipher (Streamer
class in the program) work in the parallel mode utilizing multi-core processor.
For example, if machine has 8 processors, generator will produce 8 parallel arrays and glue them into the single output (pad). Needless to say that these pieces are guaranteed to be unique and not repeating.
Similarly, Streamer
will divide the input stream into 8 pieces, XOR them with the 8 corresponding parts of the pad and produce the summary result.
Let’s make some modification to the BlakeB
functionality.
The core compression function consists of blocks like these:
v0 = v0 + v4 + m0;
v12 = v12 ^ v0;
v12 = ((v12 >> 32) | (v12 << (32)));
v8 = v8 + v12;
v4 = v4 ^ v8;
v4 = ((v4 >> 24) | (v4 << (40)));
v0 = v0 + v4 + m1;
v12 = v12 ^ v0;
v12 = ((v12 >> 16) | (v12 << (48)));
v8 = v8 + v12;
v4 = v4 ^ v8;
v4 = ((v4 >> 63) | (v4 << (1)));
v1 = v1 + v5 + m2;
v13 = v13 ^ v1;
v13 = ((v13 >> 32) | (v13 << (32)));
v9 = v9 + v13;
v5 = v5 ^ v9;
v5 = ((v5 >> 24) | (v5 << (40)));
v1 = v1 + v5 + m3;
v13 = v13 ^ v1;
v13 = ((v13 >> 16) | (v13 << (48)));
v9 = v9 + v13;
v5 = v5 ^ v9;
v5 = ((v5 >> 63) | (v5 << (1)));
...
with repeating shift constants 32, 24, 16 and 63. Each of the constants appears 96 times in 12 rounds of processing totalling 384 times altogether.
To make the adversary’s life more interesting, current implementation initially generates byte array S[384] based on the user password (key), salt and distribution input. This allows for the following pattern in the compression function:
int i = 0;
v0 = v0 + v4 + m0;
v12 = v12 ^ v0;
v12 = RR(v12, S[i++]);
v8 = v8 + v12;
v4 = v4 ^ v8;
v4 = RR(v4, S[i++]);
v0 = v0 + v4 + m1;
v12 = v12 ^ v0;
v12 = RR(v12, S[i++]);
v8 = v8 + v12;
v4 = v4 ^ v8;
v4 = RR(v4, S[i++]);
...
where RR
is the right rotation function:
private ulong RR(ulong x, byte n) {
return (x >> n) | (x << (-n & 63));
}
Thus the compression function utilizes the unique set of 384 shift numbers (bytes) dependant on user inputted key, salt and distribution eliminating arbitrary constants from the calculations.
The only constants in the Generator are programmer-defined ulong
s:
private const ulong IV0 = 9111111111111111111;
private const ulong IV1 = 8222222222222222222;
private const ulong IV2 = 7333333333333333333;
private const ulong IV3 = 6444444444444444444;
private const ulong IV4 = 5555555555555555555;
private const ulong IV5 = 4666666666666666666;
private const ulong IV6 = 3777777777777777777;
private const ulong IV7 = 2888888888888888888;
private const ulong IV8 = 1999999999999999999;
private const ulong IV9 = 1111111111111111111;
private const ulong IV10 = 2222222222222222222;
private const ulong IV11 = 3333333333333333333;
private const ulong IV12 = 4444444444444444444;
private const ulong IV13 = 5555555555555555555;
private const ulong IV14 = 6666666666666666666;
private const ulong IV15 = 7777777777777777777;
private const ulong FNL0 = 123456789;
private const ulong FNL1 = 987654321;
Set unique constants in your implementation and enjoy the reliable and flexible personal stream cipher where nothing is predefined externally.
The user-provided salt and distribution input affect the initial state of the Generator and therefore add to the dispersion of the outcome.
Tango Library
The Tango library's Tang
class is a single entry point with interface for compression only, encryption only or both.
Here’s the constructors usage examples.
Compression
Tang tang1 = new Tang(CompressionAlgo.GZip1);
Tang tang2 = new Tang(CompressionAlgo.LZMA8);
Encryption
Tang tang3 = new Tang("key", "salt", "distr", 1);
Tang tang4 = new Tang(key, salt, distr);
Encryption with Compression
Tang tang5 = new Tang(CompressionAlgo.GZip8, "key",
"salt", "distr", 4);
Tang tang6 = new Tang
(CompressionAlgo.LZMA1, "key", "salt", "distr");
In Action
Tang tang1 = new Tang(CompressionAlgo.GZip1);
byte[] plain = ...
byte[] compressed = tang1.Zip(plain);
byte[] uncompressed = tang1.Unzip(compressed);
Tang tang 2 = new Tang("key", "salt", "distr");
byte[] encrypted= tang2.Crypt(plain);
tang2.Reset();
byte[] decrypted = tang2.Crypt (encrypted);
Tang tang3 = new Tang(CompressionAlgo.LZMA1, "key", "salt", "distr");
byte[] tangled = tang3.Tangle(plain);
tang3.Reset();
byte[] untangled = tang3.Untangle(tangled);
Also available are Tango functions with MemoryStream
input parameters instead of byte arrays.
Do not forget to put try
/catch
blocks around un-operations Unzip
and Untangle
.
Generation
After initialization, tang object can generate byte array of given length that is unique to provided key, salt and distribution:
Tang tang = new Tang("key", "salt", "distr");
byte[] hash = tang.Generate(int length);
Measurements
PC: i7 2GHz 64-bit 8 cores
Tangle()
Source: 9,653,618 B (fb2 file - allows for high compression)
| | time | compression rate |
GZip1 | 2,463,446 B | 1259 ms | 25.52% |
GZip8 | 2,472,430 B | 303 ms | 25.61% |
LZMA1 | 1,777,219 B | 30995 ms | 18.41% |
LZMA8 | 1,914,649 B | 6124 ms | 19.83% |
Compression Only
Source: 9653618 B
| | time | compression rate |
GZip1 | 2,463,446 B | 1169 ms | 3.7% 25.52% |
GZip8 | 2,472,430 B | 293 ms | 0.9% 25.61% |
LZMA1 | 1,777,219 B | 31377 ms | 100% 18.41% |
LZMA8 | 1,914,649 B | 6131 ms | 19.5% 19.83% |
The most effective compression algo as expected is LZMA1 which comes with the highest time cost.
GZip8 is 4 times faster than GZip1 and practically not inferior in compression.
Encryption Only
Source: 9653618 B
| time | |
1 core | 903 ms | 100% |
2 cores | 457 ms | 50.6% |
4 cores | 295 ms | 32.7% |
8 cores | 219 ms | 24.3% |
Generation
10,000,000 bytes generated within 250 ms
Initialization Speed
Tang tang = new Tang (CompressionAlgo.LZMA8, "key", "salt", "distr");
The Tang
construction time takes about 0.08 ms independently of the CompressionAlgo
applied.
Take your entanglement with Tango easy!
History
- 1st January, 2019: Initial version