Introduction
SimplePack is a library for storing many files in a single archive file. It resembles an ordinary TAR archive in that both store a directory structure in one file without compression. SimplePack provides the following basic functionality:
Archive.cs
- Add file/directory into archive
- Delete file/directory from archive
- Extract file/directory from archive
- Perform these operations synchronously or asynchronously (with progress information)
ArchiveInfo.cs
- Calculate basic statistics about the archive
ArchiveFileStream.cs
- Read-only file stream that reads a file's content directly, without extracting it first
Note: Archive.cs can perform only one asynchronous operation at a time. Changing this behavior would be tricky and would decrease the performance of the Archive itself. All asynchronous operations in this class have the suffix Async.
Note 2: The SimplePack source code contains the documentation of the library.
Advantages and Disadvantages
Advantages
- Working without temporary files
- Synchronous and asynchronous way to work (no additional thread required)
- No data compression (faster archiving/extracting of data, possible direct access to specific file)
Disadvantages
Background
A SimplePack archive physically stores only the files and a serialized Footer. The directory structure inside the archive is represented purely by objects, and this hierarchical structure is serialized at the end of the archive. The root “directory” is represented by an object of type Footer. Class Footer implements the interface IArchiveStructure, just like the DirectoryArchive class. There are several differences between Footer and DirectoryArchive (e.g. Footer doesn’t have a parent directory), which is why Footer does not inherit from DirectoryArchive.
Footer and DirectoryArchive each hold two collections. The first contains the FileArchive objects that the Footer or directory directly contains. The second contains DirectoryArchive objects (the nested directories of the current directory or Footer). Class FileArchive holds information about a file stored in the archive. FileArchive and DirectoryArchive also carry the attributes of the original files or directories; these attributes are restored when a file or directory is extracted. FileArchive and DirectoryArchive also refer to their parent IArchiveStructure object (Footer or DirectoryArchive).
As you can see, the hierarchy is represented by nested objects, so recursion is often used to walk it, but I don’t think this slows the archive down. In any case, SimplePack also keeps a flat list of all files in the archive to speed up file operations (the main reason I did this: I’m too lazy to write a few additional recursive methods).
Deleted files are replaced with unused space (a Gap). Files are stored in the archive sequentially, one after another; wherever that is not the case, there is a gap between two files. Gaps are easy to detect: sort the list of all files by start position in the archive (StartSeek) and compare the end of each file with the start of the next one.
The objects stored in Footer (of types FileArchive and DirectoryArchive) hold a reference to their parentDirectory. This reference is not serialized into Footer (if you use the XML serializer), because it would create a cycle; it is recalculated from the hierarchical structure instead. Each directory also holds its Size. After a file or directory is added or removed, this Size is updated up the hierarchy to the root (another place where the parentDirectory reference is useful).
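The gap detection described above can be sketched in a few lines. This is only an illustration, not the library's code: FileEntry, its Length property, GapFinder and the dataEnd parameter are hypothetical names, and the sketch assumes that file data starts at offset 0 of the archive. Only StartSeek is a name taken from the article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the per-file record; only StartSeek is named in the article.
public sealed class FileEntry
{
    public long StartSeek { get; }
    public long Length { get; }

    public FileEntry(long startSeek, long length)
    {
        StartSeek = startSeek;
        Length = length;
    }
}

public static class GapFinder
{
    // Returns (start, length) of every unused region between offset 0 and dataEnd.
    // Assumption: file data begins at offset 0 and dataEnd is where the footer starts.
    public static List<(long Start, long Length)> FindGaps(IEnumerable<FileEntry> files, long dataEnd)
    {
        var gaps = new List<(long Start, long Length)>();
        long cursor = 0; // end of the last file seen so far

        foreach (var file in files.OrderBy(f => f.StartSeek))
        {
            // A file that starts after the previous one ended leaves a gap in between.
            if (file.StartSeek > cursor)
                gaps.Add((cursor, file.StartSeek - cursor));

            cursor = Math.Max(cursor, file.StartSeek + file.Length);
        }

        // Unused space may also trail the last file.
        if (dataEnd > cursor)
            gaps.Add((cursor, dataEnd - cursor));

        return gaps;
    }
}
```

Sorting dominates the cost, so the check is O(n log n) in the number of files and needs no recursion through the directory objects.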
I’ll explain how SimplePack works with a few examples. The examples only work with files; working with directories is analogous. All information about files and directories is stored in the Footer part of the Archive. Footer is in fact a serialized object of the Footer class. The default serializer for Footer is the XML serializer, but you can implement your own serializer and pass it to the constructor. This serializer must implement the IFooterStorage interface. Footer is written into the Archive only when the Close() method is called. This behavior can be changed by setting the Atomic attribute to TRUE; the footer is then written after every write operation on the archive, which also slows the Archive down a little. You can specify a virtual path for files and directories in a SimplePack archive. It’s possible to store several copies of the same file, as long as each copy is stored under a different virtual path (this is also demonstrated in the examples). The root of the Archive is arch:\\. You cannot delete the root directory (well, you can try, but you will be rewarded with an exception).
Add File
Figure 1.A: The archive before a new file is inserted. Footer is at the end of the file.
Figure 1.B: Footer is removed from the archive (it is still present in memory), because the new file will be written at the end of the archive.
Figure 1.C: The new file is inserted at the end of the archive.
Figure 1.D: The updated Footer is written at the end of the archive file.
Note: An archive can have only one Footer!
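The steps in Figures 1.A to 1.D can be sketched against a plain Stream. This is a sketch under assumptions, not SimplePack's implementation: AppendSketch, AppendFile and the footerStart parameter are hypothetical, the footer is treated as an opaque byte array, and the article does not say how the real library locates the footer on disk.

```csharp
using System;
using System.IO;

public static class AppendSketch
{
    // Appends fileData where the old footer started and writes the updated footer after it.
    // Returns the offset of the new file and the offset of the new footer.
    // Assumption: everything from footerStart to the end of the stream is the old footer.
    public static (long FileStart, long FooterStart) AppendFile(
        Stream archive, long footerStart, byte[] fileData, byte[] updatedFooter)
    {
        // Figure 1.B: drop the on-disk footer (the caller still holds it in memory).
        archive.SetLength(footerStart);

        // Figure 1.C: the new file goes to the end of the archive.
        archive.Seek(footerStart, SeekOrigin.Begin);
        archive.Write(fileData, 0, fileData.Length);
        long newFooterStart = archive.Position;

        // Figure 1.D: the updated footer is written last.
        archive.Write(updatedFooter, 0, updatedFooter.Length);
        archive.Flush();

        return (footerStart, newFooterStart);
    }
}
```

Because the footer is always rewritten at the very end, the file data already in the archive is never moved when a file is added.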
Remove File
Figure 2.A: The archive before removing a file.
Figure 2.B: The removed file is replaced with empty data; Footer was updated and written at the end of the file.
Note: The size of the archive remains unchanged (not counting the small change in the Footer record). Removing files this way does not require temporary files, because the deleted file is overwritten with empty bytes (0x00). SimplePack provides methods (synchronous and, of course, asynchronous) that remove unused space from the Archive.
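Overwriting a removed file in place might look roughly like this. RemoveSketch and ZeroFill are hypothetical names used for illustration only; the real library also has to update and rewrite the Footer, which is left out here.

```csharp
using System;
using System.IO;

public static class RemoveSketch
{
    // Overwrites 'length' bytes starting at 'startSeek' with 0x00, in chunks,
    // so that no temporary file and no large buffer are needed.
    public static void ZeroFill(Stream archive, long startSeek, long length)
    {
        var zeros = new byte[64 * 1024]; // a new byte array is already filled with 0x00
        archive.Seek(startSeek, SeekOrigin.Begin);

        long remaining = length;
        while (remaining > 0)
        {
            int chunk = (int)Math.Min(zeros.Length, remaining);
            archive.Write(zeros, 0, chunk);
            remaining -= chunk;
        }

        archive.Flush();
    }
}
```

The stream length never changes, which matches the note above: only the bytes of the removed file are touched.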
Removing Unused Space from Archive
Removing a file from the archive leaves unused space behind. This unused space is not reused when you add a new file to the archive, because the data would then become fragmented, which would slow down the whole of SimplePack. I decided to implement a VacuumArchive method (and VacuumArchiveAsync), which is very similar to the VACUUM command in SQLite. The unused space is shifted to the end of the archive, and the archive is then truncated.
Figure 3.A: An archive with 2 unused spaces before calling the VacuumArchive method.
Figure 3.B: File 2 is moved right behind file 1, so the unused space is moved behind file 2.
Figure 3.C: The archive is truncated to the sum of the sizes of all files.
Figure 3.D: Footer is written at the end of the Archive.
Note: As you can see here, Footer is unchanged, because Footer does not hold information about unused spaces.
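The compaction in Figures 3.A to 3.C could be sketched as follows. StoredFile, VacuumSketch and Compact are hypothetical names, the sketch assumes file data starts at offset 0, and it leaves writing the footer (Figure 3.D) to the caller. It is an illustration of the idea, not the VacuumArchive source.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in for the per-file record; only StartSeek is named in the article.
public sealed class StoredFile
{
    public long StartSeek { get; set; }
    public long Length { get; set; }
}

public static class VacuumSketch
{
    // Slides every file toward the start of the archive, then truncates the stream.
    // Returns the offset at which the footer should be written afterwards.
    public static long Compact(Stream archive, IList<StoredFile> files)
    {
        var buffer = new byte[64 * 1024];
        long writePos = 0; // assumption: file data starts at offset 0

        foreach (var file in files.OrderBy(f => f.StartSeek))
        {
            if (file.StartSeek != writePos)
            {
                // Copy front to back; the target is always below the source,
                // so overlapping regions are handled correctly.
                long moved = 0;
                while (moved < file.Length)
                {
                    int want = (int)Math.Min(buffer.Length, file.Length - moved);
                    archive.Seek(file.StartSeek + moved, SeekOrigin.Begin);
                    int read = archive.Read(buffer, 0, want);
                    if (read <= 0) throw new EndOfStreamException();

                    archive.Seek(writePos + moved, SeekOrigin.Begin);
                    archive.Write(buffer, 0, read);
                    moved += read;
                }
                file.StartSeek = writePos;
            }
            writePos += file.Length;
        }

        // Figure 3.C: cut the archive down to the sum of all file sizes.
        archive.SetLength(writePos);
        return writePos;
    }
}
```

Like the removal itself, this works in place and needs no temporary file; the price is that every file behind the first gap has to be rewritten.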
Using the Code
The first example demonstrates how to use the base class Archive synchronously.
using (Archive simpelPackTestArchive = new Archive(@"c:\myTestArchive.smp"))
{
    simpelPackTestArchive.Open();

    // The same source file can be stored under several virtual paths.
    simpelPackTestArchive.AddFile(@"c:\test1.txt", @"arch:\nameOfFileInArchive");
    simpelPackTestArchive.AddFile(@"c:\test1.txt", @"arch:\nameOfFileInArchive2");
    simpelPackTestArchive.AddFile(@"c:\test2.txt", @"arch:\archiveDirectory\myArchivedFile");
    simpelPackTestArchive.AddFile(@"c:\test3.txt", @"arch:\archiveDirectory\myArchivedFile2");
    simpelPackTestArchive.AddDirectory(@"c:\testDirectory\",
        @"arch:\someDirInsideArchive\Additional Directory\myArchivedDirectory\");

    simpelPackTestArchive.RemoveFile(@"arch:\nameOfFileInArchive2");
    simpelPackTestArchive.RemoveFile(@"arch:\archiveDirectory\myArchivedFile2");
    simpelPackTestArchive.RemoveDirectory(@"arch:\someDirInsideArchive\");

    // Removing the root directory throws an exception, so the call is commented out:
    // simpelPackTestArchive.RemoveDirectory(@"arch:\");

    // Reclaim the space left behind by the removed files.
    simpelPackTestArchive.VacuumArchive();
}
This is a demonstration of how to use the base class Archive asynchronously. The demonstration uses an ordinary Form class. All methods have synchronous and asynchronous versions. Most asynchronous methods also provide progress information (except OpenAsync and VacuumArchiveAsync). See the documentation for more information.
// Field of the form; kept alive for as long as the form is open.
private Archive testArchive;

private void Form2_Load(object sender, EventArgs e)
{
    testArchive = new Archive(@"c:\testArchive");
    testArchive.OpenCompleted += testArchive_OpenCompleted;
    testArchive.OpenAsync();
}

void testArchive_OpenCompleted(Archive sender,
    SimplePack.Events.OpenCompletedEventArgs openCompletedEventArgs)
{
    if (openCompletedEventArgs.Error == null)
    {
        MessageBox.Show("Archive was opened correctly");
    }
    else
    {
        MessageBox.Show(openCompletedEventArgs.Error.Message);
    }
}

private void Form2_FormClosing(object sender, FormClosingEventArgs e)
{
    if (!testArchive.IsBusy)
        return;

    // Keep the form open while an asynchronous operation is still running.
    e.Cancel = true;
    MessageBox.Show("Cannot close the form, an operation is in progress");
}
How to get basic statistics about an Archive:
ArchiveInfo archiveInfo = new ArchiveInfo(@"c:\testArchive");
MessageBox.Show(archiveInfo.BiggestFile.Length.ToString());
How to read data directly from an Archive (without extraction):
private void Form1_Load(object sender, EventArgs e)
{
    using (Archive testArchive = new Archive(@"c:\testArchive"))
    {
        testArchive.Open();
        using (ArchiveFileStream archiveFileStream =
            new ArchiveFileStream(testArchive, @"arch:\myPicture.jpg"))
        using (Image loaded = new Bitmap(archiveFileStream))
        {
            // GDI+ needs the source stream for the lifetime of a Bitmap created from it,
            // so copy the pixels before the stream and the archive are disposed.
            pictureBox1.Image = new Bitmap(loaded);
        }
    }
}
Points of Interest
I created this library because I needed to store several directories in one file. The asynchronous approach keeps the code easy to read, and you can display the progress of an operation with only a few lines of code. You don’t need to know anything about threads, and re-invoking events on the correct thread is not necessary either, so the resulting code is more readable. Data compression is not supported, and Archive will never be extended in this direction (but you can do it if you want :)). I decided this because the archive is primarily intended for storing multimedia data, where zipping can make the source data even bigger. SimplePack is limited only by the capabilities of the file system.
Benchmark
I made a simple benchmark of 3 libraries that support directory packing (SimplePack, ChilkatDotNet2 for tar packing, SharpZipLib for zip packing with compression level 0). The test program creates the same archive 20 times and records the minimum, maximum and average time in ms. The test directory contains 5230 files and 1004 subdirectories with a total size of 237,799,409 bytes. The results are displayed in the following graph. For SimplePack, the latest version 1.0.3 was used, with custom binary serialization of Footer, which is of course faster than XML serialization and uses fewer resources.
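A measurement loop of this kind can be written with Stopwatch. The sketch below is not the original test program; BenchmarkSketch and Measure are hypothetical names, and the action passed in stands for whatever creates the archive with the library under test.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class BenchmarkSketch
{
    // Runs 'action' the given number of times and returns the minimum,
    // maximum and average elapsed time in milliseconds.
    public static (long Min, long Max, double Average) Measure(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        var samples = new long[runs];
        for (int i = 0; i < runs; i++)
        {
            var watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            samples[i] = watch.ElapsedMilliseconds;
        }

        return (samples.Min(), samples.Max(), samples.Average());
    }
}
```

For the setup described above, the call would be Measure(createArchive, 20), with createArchive being a delegate that packs the test directory.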
History
- Version 1.0.3 – Custom binary serialization/deserialization of Footer
- Version 1.0.2.2 – Base exception ArchiveException was implemented + some small improvements
- Version 1.0.2 – ArchiveFileStream was implemented
- Version 1.0.1 – File and directory attributes are preserved
- Version 1.0.0 – Implementation of asynchronous method calls