|
Thanks for this gem! This helped me a LOT and saved my weekend!
Keep on truckin'!
|
|
|
|
|
This is exactly what I was looking for. Thank you!!
|
|
|
|
|
I haven't tested this yet, but if it works, it's just what I'm looking for. Cheers!
|
|
|
|
|
Yep, it works fine! Thanks very much!
|
|
|
|
|
Great job! I don't know if this is the right place for this, but I needed it in VB.NET, so I rewrote it.
The code is below.
Mike
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.IO
Namespace crcCheckSum
''' <summary>
''' Encapsulates a <see cref="System.IO.Stream" /> to calculate the CRC32 checksum on-the-fly as data passes through.
''' </summary>
Public Class CrcStream
Inherits Stream
Private Shared table As UInteger() = GenerateTable()
Private m_stream As Stream
Private m_readCrc As UInteger = 4294967295
Private m_writeCrc As UInteger = 4294967295
''' <summary>
''' Encapsulate a <see cref="System.IO.Stream" />.
''' </summary>
''' <param name="stream">The stream to calculate the checksum for.</param>
Public Sub New(ByVal stream As Stream)
Me.m_stream = stream
End Sub
''' <summary>
''' Gets the underlying stream.
''' </summary>
Public ReadOnly Property Stream() As Stream
Get
Return m_stream
End Get
End Property
Public Overloads Overrides ReadOnly Property CanRead() As Boolean
Get
Return m_stream.CanRead
End Get
End Property
Public Overloads Overrides ReadOnly Property CanSeek() As Boolean
Get
Return m_stream.CanSeek
End Get
End Property
Public Overloads Overrides ReadOnly Property CanWrite() As Boolean
Get
Return m_stream.CanWrite
End Get
End Property
Public Overloads Overrides Sub Flush()
m_stream.Flush()
End Sub
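Public Overloads Overrides ReadOnly Property Length() As Long
Get
' Forward to the wrapped stream instead of falling out of the getter.
Return m_stream.Length
End Get
End Property
Public Overloads Overrides Property Position() As Long
Get
' Forward to the wrapped stream instead of falling out of the getter.
Return m_stream.Position
End Get
Set(ByVal value As Long)
m_stream.Position = value
End Set
End Property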
Public Overloads Overrides Function Seek(ByVal offset As Long, ByVal origin As SeekOrigin) As Long
Return m_stream.Seek(offset, origin)
End Function
Public Overloads Overrides Sub SetLength(ByVal value As Long)
m_stream.SetLength(value)
End Sub
Public Overloads Overrides Function Read(ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer) As Integer
count = m_stream.Read(buffer, offset, count)
m_readCrc = CalculateCrc(m_readCrc, buffer, offset, count)
Return count
End Function
Public Overloads Overrides Sub Write(ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer)
m_stream.Write(buffer, offset, count)
m_writeCrc = CalculateCrc(m_writeCrc, buffer, offset, count)
End Sub
Private Function CalculateCrc(ByVal crc As UInteger, ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer) As UInteger
Dim i As Integer = offset, [end] As Integer = offset + count
While i < [end]
crc = (crc >> 8) Xor table((crc Xor buffer(i)) And &HFF)
i += 1
End While
Return crc
End Function
Private Shared Function GenerateTable() As UInteger()
Dim table As UInteger() = New UInteger(255) {}
Dim crc As UInteger
Const poly As UInteger = 3988292384
For i As UInteger = 0 To table.Length - 1
crc = i
For j As Integer = 8 To 1 Step -1
If (crc And 1) = 1 Then
crc = (crc >> 1)
crc = crc Xor poly
Else
crc >>= 1
End If
Next
table(i) = crc
Next
Return table
End Function
''' <summary>
''' Gets the CRC checksum of the data that was read by the stream thus far.
''' </summary>
Public ReadOnly Property ReadCrc() As UInteger
Get
Return m_readCrc Xor UInteger.MaxValue ' final XOR, matching the C# version
End Get
End Property
''' <summary>
''' Gets the CRC checksum of the data that was written to the stream thus far.
''' </summary>
Public ReadOnly Property WriteCrc() As UInteger
Get
Return m_writeCrc Xor UInteger.MaxValue ' final XOR, matching the C# version
End Get
End Property
''' <summary>
''' Resets the read and write checksums.
''' </summary>
Public Sub ResetChecksum()
m_readCrc = 4294967295
m_writeCrc = 4294967295
End Sub
End Class
End Namespace
|
|
|
|
|
How does it handle big files?
|
|
|
|
|
It works really well with big files, especially if you're already reading or writing them for other purposes. The main idea of this class is that everything is done on-the-fly, thus getting rid of any significant overhead and wait times.
|
|
|
|
|
Hello again,
Since ReadToEnd throws an OutOfMemoryException on large files, the code has to be used in another way, but how?
Regards
|
|
|
|
|
You're getting an OutOfMemoryException because you're reading in a huge stream all at once.
You have to read it in a little bit at a time, like this:
byte[] buffer = new byte[4096];
int length;
while((length = buffer.Read(buffer, 0, buffer.Length)) != 0)
{
}
Console.WriteLine("CRC: " + stream.ReadCrc.ToString("X8"));
The CRC is calculated each time you call Read, but it won't be the CRC of the complete file until you've read the entire file.
|
|
|
|
|
|
I think you buffer'ed when you should have stream'ed:
while((length = stream.Read(buffer, 0, buffer.Length)) != 0)
|
|
|
|
|
Oops, right you are.
Thanks.
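
For anyone landing here later, the corrected loop can be sketched end to end as below. This is only a minimal illustration: it computes the CRC directly inside the loop instead of going through CrcStream, and the names ChunkedCrcSketch and ComputeCrc32 are made up for the example. It assumes the same reflected CRC-32 polynomial (0xEDB88320) used elsewhere in this thread.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the article's CrcStream class.
static class ChunkedCrcSketch
{
    static readonly uint[] Table = BuildTable();

    // Builds the 256-entry lookup table for the reflected polynomial 0xEDB88320.
    static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint crc = i;
            for (int j = 0; j < 8; j++)
                crc = (crc & 1u) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
            table[i] = crc;
        }
        return table;
    }

    // Reads the stream 4096 bytes at a time, so memory use stays constant
    // no matter how large the stream is.
    public static uint ComputeCrc32(Stream stream)
    {
        var buffer = new byte[4096];
        uint crc = 0xFFFFFFFFu;
        int length;
        while ((length = stream.Read(buffer, 0, buffer.Length)) != 0)
        {
            for (int i = 0; i < length; i++)
                crc = (crc >> 8) ^ Table[(crc ^ buffer[i]) & 0xFF];
        }
        return crc ^ 0xFFFFFFFFu; // final XOR
    }

    static void Main()
    {
        // "123456789" is the usual CRC-32 check string.
        using (var stream = new MemoryStream(Encoding.ASCII.GetBytes("123456789")))
            Console.WriteLine("CRC: " + ComputeCrc32(stream).ToString("X8")); // CRC: CBF43926
    }
}
```

With a real file, the MemoryStream would be replaced by something like File.OpenRead(path); the loop itself does not change.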
|
|
|
|
|
|
Hi,
Great bit of code and very useful.
I wanted to be able to read sections of a file and calculate the CRC for only that part, so I added a ReadLine method. I thought I'd post it here in case anyone else finds it useful. It's a bit of a hack but it works!
public string ReadLine()
{
StringBuilder sb = new StringBuilder();
int b;
b = ReadByte();
while (b >= 0 && b != '\n' && b != '\r')
{
sb.Append((char)b);
b = ReadByte();
}
if (b == -1)
{
return null;
}
else
{
int nextChar = ReadByte();
if (nextChar != '\n' && nextChar != '\r')
{
Seek(-1, SeekOrigin.Current);
}
return sb.ToString();
}
}
The file I am checking contains blocks of text, separated with blank lines. I want a CRC of each block so I use the ReadLine method to read up to the next blank line, get the checksum, reset the CRC by calling ResetChecksum() and continue reading the file to the next blank line.
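
A rough sketch of that read-until-blank-line, take the checksum, reset, continue loop is shown below. It is not the ReadLine hack above: it works on a TextReader, leaves the line terminators out of the CRC, and assumes ASCII text. The names BlockCrcSketch and CrcPerBlock are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of CrcStream.
static class BlockCrcSketch
{
    // Bitwise CRC-32 update (reflected polynomial 0xEDB88320); no table, for brevity.
    static uint Update(uint crc, byte[] data)
    {
        foreach (byte b in data)
        {
            crc ^= b;
            for (int k = 0; k < 8; k++)
                crc = (crc & 1u) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return crc;
    }

    // Returns one CRC per block of text; blocks are separated by blank lines.
    // Assumption: line terminators are not included in the CRC.
    public static List<uint> CrcPerBlock(TextReader reader)
    {
        var result = new List<uint>();
        uint crc = 0xFFFFFFFFu;
        bool inBlock = false;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                // A blank line ends the current block: record its CRC and reset.
                if (inBlock) result.Add(crc ^ 0xFFFFFFFFu);
                crc = 0xFFFFFFFFu;
                inBlock = false;
            }
            else
            {
                crc = Update(crc, Encoding.ASCII.GetBytes(line));
                inBlock = true;
            }
        }
        if (inBlock) result.Add(crc ^ 0xFFFFFFFFu); // last block without a trailing blank line
        return result;
    }

    static void Main()
    {
        var text = "123456789\n\n123456789\n";
        foreach (uint crc in CrcPerBlock(new StringReader(text)))
            Console.WriteLine(crc.ToString("X8")); // prints CBF43926 twice
    }
}
```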
Anthony
|
|
|
|
|
|
Hi mate,
Very nice work on the on-the-fly CRC and on the polynomial.
With your class we can check any type of stream, but if the targets are always files, is there a reason why we can't derive from FileStream and implement the CRC directly on it?
That avoids using a FileStream to acquire the Stream and then encapsulating that Stream in your own class.
I've tested these changes, but I can't measure any performance improvement.
Could you check whether this is valid?
Here's the adaptation:
class FileStreamWithCRC : FileStream
{
    private uint _readCRC = unchecked(0xFFFFFFFF);
    private uint _writeCRC = unchecked(0xFFFFFFFF);

    private static uint[] GenerateTable()
    {
        unchecked
        {
            uint[] table = new uint[256];

            uint crc;
            const uint poly = 0xEDB88320;
            for (uint i = 0; i < table.Length; i++)
            {
                crc = i;
                for (int j = 8; j > 0; j--)
                {
                    if ((crc & 1) == 1)
                        crc = (crc >> 1) ^ poly;
                    else
                        crc >>= 1;
                }
                table[i] = crc;
            }

            return table;
        }
    }

    private static uint[] table = GenerateTable();

    public uint ReadCRC
    {
        get { return unchecked(this._readCRC ^ 0xFFFFFFFF); }
    }

    public uint WriteCRC
    {
        get { return unchecked(this._writeCRC ^ 0xFFFFFFFF); }
    }

    public FileStreamWithCRC(String filePath, FileMode fileMode, FileAccess fileAccess, FileShare fileShare): base(filePath, fileMode, fileAccess, fileShare)
    {
    }

    uint CalculateCRC(uint crc, byte[] buffer, int offset, int count)
    {
        unchecked
        {
            for (int i = offset, end = offset + count; i < end; i++)
                crc = (crc >> 8) ^ table[(crc ^ buffer[i]) & 0xFF];
        }
        return crc;
    }

    public void ResetChecksum()
    {
        this._readCRC = unchecked(0xFFFFFFFF);
        this._writeCRC = unchecked(0xFFFFFFFF);
    }

    public override int Read(byte[] array, int offset, int count)
    {
        count = base.Read(array, offset, count);
        this._readCRC = CalculateCRC(this._readCRC, array, offset, count);
        return count;
    }

    public override void Write(byte[] array, int offset, int count)
    {
        base.Write(array, offset, count);

        this._writeCRC = CalculateCRC(this._writeCRC, array, offset, count);
    }
}
|
|
|
|
|
Cool
The code works fine as far as I can tell. I just gave it a quick run through reading a file. Definitely a lot more convenient if you know you're going to be dealing with a file rather than some other type of stream.
A couple of reasons it won't affect performance:
1. When you override a member of a class, internally it does the same thing as when you call two methods one after another and pass parameters along; it just does that bookkeeping and method calling for you automatically. Even if it does make a slight difference, it'd only be a couple dozen CPU cycles per call at most. Each call probably takes at least a few million cycles, so the difference is immeasurable.
2. On an even broader scale, the time it takes to load a file off the disk generally dwarfs the time it takes to calculate the CRC (modern hard drives are that slow). CRC performance only becomes more significant after the first time you read a file, because Windows then caches the file in memory.
Thanks!
|
|
|
|
|
I was looking for some simple code to plug into my application and calculate checksums and this is exactly what I wanted! Thanks for sharing.
|
|
|
|
|
|
Thanks for your contribution! Just a detail: strictly speaking, CRC is not a checksum algorithm. While checksums (used in internet protocols like IPv4) are one class of error detection codes, CRC is another one (used in LAN protocols like Ethernet). Parity checks form yet another such class...
|
|
|
|
|
Hmm... what is it called then?
From what I know browsing around, it's called a checksum algorithm.
Here's the definition at Wikipedia:
"A cyclic redundancy check (CRC) is a type of hash function used to produce a checksum, which is a small number of bits, from a large block of data, such as a packet of network traffic or a block of a computer file, in order to detect errors in transmission or storage."
|
|
|
|
|
Thinking about it, I can't recall a better name for the generated bits. I checked some technical papers: those bits are often called "CRC bits" or just the "CRC". Like you mentioned, "checksum" is very popular too, maybe for lack of a better name. Wikipedia again, under Checksum: "This article is about checksums calculated using addition. The term 'checksum' is sometimes used in a more general sense to refer to any kind of redundancy check."
Since everyone understands what you mean by the term "checksum", it doesn't matter; please ignore my post above (I don't want to be captious).
By the way, I'd call CRC an error detection code.
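
To make the distinction a bit more concrete, here is a small sketch comparing an addition-based checksum with a CRC on the same data. The additive function is a simplified 16-bit ones'-complement sum in the style of the Internet checksum and assumes an even number of bytes; the CRC is the bitwise form of the reflected CRC-32 (polynomial 0xEDB88320) used in this thread. All names are invented for the example.

```csharp
using System;

// Hypothetical demo class, only to contrast the two kinds of error detection code.
static class ChecksumVsCrcSketch
{
    // 16-bit ones'-complement sum of big-endian 16-bit words.
    // Assumption: data.Length is even; a trailing odd byte would be ignored.
    public static ushort AdditiveChecksum(byte[] data)
    {
        uint sum = 0;
        for (int i = 0; i + 1 < data.Length; i += 2)
            sum += (uint)((data[i] << 8) | data[i + 1]);
        while ((sum >> 16) != 0)
            sum = (sum & 0xFFFFu) + (sum >> 16); // fold the carries back in
        return (ushort)(~sum & 0xFFFFu);
    }

    // Bitwise CRC-32, reflected polynomial 0xEDB88320.
    public static uint Crc32(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int k = 0; k < 8; k++)
                crc = (crc & 1u) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return crc ^ 0xFFFFFFFFu;
    }

    static void Main()
    {
        byte[] a = { 0x12, 0x34, 0x56, 0x78 };
        byte[] b = { 0x56, 0x78, 0x12, 0x34 }; // same 16-bit words, swapped

        // Addition is commutative, so reordering the words goes unnoticed.
        Console.WriteLine(AdditiveChecksum(a) == AdditiveChecksum(b)); // True
        // The CRC depends on the position of every bit, so the values differ.
        Console.WriteLine(Crc32(a) == Crc32(b)); // False
    }
}
```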
|
|
|
|
|
|