Introduction
When I tried to implement a "download directory" function in my custom web application, all the solutions I could find created the ZIP file first and then sent it. In my case, this could result in large temporary files which, being mostly images, could hardly be compressed any further anyway.
So I came up with the idea to just create an uncompressed ZIP archive on-the-fly around the raw data - and as I found out, this is quite easy.
Background
For this purpose, it's enough to consider the minimum necessary structure of a ZIP archive: we won't need multi-part files and we won't need extra information stored for each file - and of course, we don't need any knowledge of compression algorithms.
The basic structure of a ZIP archive makes it easy to just assemble it on-the-fly:
File Entry 1 | File Header 1 | File Data 1 |
File Entry 2 | File Header 2 | File Data 2 |
...
File Entry n | File Header n | File Data n |
Central Directory Entry 1 |
Central Directory Entry 2 |
...
Central Directory Entry n |
End of Central Directory Entry |
In detail, let's dive into some bytes. This is a ZIP file containing an uncompressed text file "test.txt" with the text "The quick brown fox jumps over the lazy dog." I colored the regions from above and each value, and included a summary of each value's meaning:
General Data Types
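The tables below refer to a DateTime and a CRC32 type. As a minimal sketch, assuming DateTime is the 32-bit MS-DOS date/time value that ZIP uses and CRC32 is what PHP's built-in crc32() returns, both values could be produced like this. The name dosDateTime is a hypothetical helper for illustration, not part of BjSZipper:

```php
<?php
// Hypothetical helper (not part of BjSZipper): packs a Unix timestamp into
// the 32-bit MS-DOS date/time value. The date sits in the high 16 bits and
// the time in the low 16 bits, so writing it as a little-endian UInt32 puts
// the time first and the date second.
function dosDateTime(int $timestamp): int
{
    $d = getdate($timestamp); // uses PHP's default time zone
    if ($d['year'] < 1980) {
        // The format cannot represent anything before 1980-01-01.
        return (1 << 21) | (1 << 16);
    }
    $date = (($d['year'] - 1980) << 9) | ($d['mon'] << 5) | $d['mday'];
    // Seconds are stored divided by two, so the resolution is 2 seconds.
    $time = ($d['hours'] << 11) | ($d['minutes'] << 5) | ($d['seconds'] >> 1);
    return ($date << 16) | $time;
}

// Both values are written as little-endian UInt32 with pack('V', ...).
$timeBytes = pack('V', dosDateTime(time()));
$crcBytes  = pack('V', crc32('The quick brown fox jumps over the lazy dog.'));
```

Because getdate() works in the default time zone, the stored time is local time, which is what the MS-DOS format expects.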
File Entry
A file entry describes a file and contains its data. File entries are stacked one after another.
Name | Length | Data type | Description |
Signature | 4 | Signature | A file entry signature consisting of "PK" followed by the bytes 03 and 04 |
Version | 2 | UInt16 | The minimum required version for extracting - for this purpose, I just use 0x000A (version 1.0) but it really doesn't matter that much |
Flags | 2 | UInt16 | Options as to how to read this file - for this purpose, I use 0x0800, meaning UTF-8 encoded filename and comments and nothing else |
Compression method | 2 | UInt16 | The method the data was compressed with - for this purpose, 0x0000 is used, meaning "uncompressed" |
Filetime | 4 | DateTime | The last modification time of the file, no other time is saved, format see above |
Checksum | 4 | CRC32 | The CRC-32 checksum of the file data, format see above |
Compressed size | 4 | UInt32 | The size of the compressed file data - for this purpose, the same as the file size |
Uncompressed size | 4 | UInt32 | The size of the uncompressed file |
Filename length | 2 | UInt16 | The length of the filename |
Extra data length | 2 | UInt16 | The length of the extra data - for this purpose, no extra data is used, so this is always 0x0000 |
Filename | * | String | The filename in UTF-8 encoding |
Extra data | * | Special | Extra data, e.g., for creation time, attributes and more - for this purpose, not used |
File data | * | Bytes | The file data - usually compressed but in this case, just the raw data |
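To make the table concrete, here is a minimal sketch of how the header part of a file entry could be assembled with pack(). The function name localFileHeader and its parameters are hypothetical and chosen for illustration only; this is not the actual BjSZipper code. The date/time value is passed in as an already packed 32-bit number:

```php
<?php
// Hypothetical helper (not part of BjSZipper): builds the header of a file
// entry for stored (uncompressed) data. 'v' is a little-endian UInt16,
// 'V' is a little-endian UInt32.
function localFileHeader(string $name, int $size, int $crc, int $dosTime): string
{
    return "PK\x03\x04"
        . pack(
            'vvvVVVVvv',
            0x000A,         // version
            0x0800,         // flags: UTF-8 filename and comments
            0x0000,         // compression method: uncompressed
            $dosTime,       // filetime
            $crc,           // checksum
            $size,          // compressed size (same as file size here)
            $size,          // uncompressed size
            strlen($name),  // filename length in bytes, not characters
            0               // extra data length: no extra data
        )
        . $name;
}

// The sample file from above; 0x00210000 is a placeholder date (1980-01-01).
$data  = 'The quick brown fox jumps over the lazy dog.';
$entry = localFileHeader('test.txt', strlen($data), crc32($data), 0x00210000) . $data;
```

The fixed part of the header is always 30 bytes, so a complete file entry takes 30 bytes plus the filename length plus the file size.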
Central Directory Entry
A central directory entry contains more detailed data about a file entry. The central directory entries are stacked one after another and form a kind of table of contents.
Name | Length | Data type | Description |
Signature | 4 | Signature | A central directory entry signature consisting of "PK" followed by the bytes 01 and 02 |
OS version | 2 | UInt16 | The version the archive was made by - for this purpose, I just use 0x003F |
Version | 2 | UInt16 | The minimum required version for extracting - for this purpose, I just use 0x000A |
Flags | 2 | UInt16 | Options as to how to read this file - for this purpose, I use 0x0800, meaning UTF-8 encoded filename and comments and nothing else |
Compression method | 2 | UInt16 | The method the data was compressed with - for this purpose, 0x0000 is used, meaning "uncompressed" |
Filetime | 4 | DateTime | The last modification time of the file, no other time is saved, format see above |
Checksum | 4 | CRC32 | The CRC-32 checksum of the file data, format see above |
Compressed size | 4 | UInt32 | The size of the compressed file data - for this purpose, the same as the file size |
Uncompressed size | 4 | UInt32 | The size of the uncompressed file |
Filename length | 2 | UInt16 | The length of the filename |
Extra data length | 2 | UInt16 | The length of the extra data - for this purpose, no extra data is used, so this is always 0x0000 |
Comment length | 2 | UInt16 | The length of the file comment |
Disk | 2 | UInt16 | The disk number the file is on - for this purpose, the archive is a single file so this is always 0x0000 |
Internal attributes | 2 | UInt16 | Attributes for internal usage - for this purpose, this is not used and always 0x0000 |
External attributes | 4 | UInt32 | Attributes for external usage - for this purpose, this is not used and always 0x00000000 |
Offset of file entry | 4 | UInt32 | The offset inside the archive where the file entry belonging to this central directory entry starts |
Filename | * | String | The filename in UTF-8 encoding |
Extra data | * | Special | Extra data, e.g., for creation time, attributes and more - for this purpose, not used |
Comment | * | String | A comment for the described file |
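The central directory entry can be assembled in the same way. The following is a sketch under the assumptions of the table above (no extra data, a single disk, attributes unused); centralDirectoryEntry is a hypothetical name and not the actual BjSZipper implementation:

```php
<?php
// Hypothetical helper (not part of BjSZipper): builds one central directory
// entry for a stored (uncompressed) file. $offset is the position of the
// matching file entry, counted from the start of the archive.
function centralDirectoryEntry(
    string $name,
    int $size,
    int $crc,
    int $dosTime,
    int $offset,
    string $comment = ''
): string {
    return "PK\x01\x02"
        . pack(
            'vvvvVVVVvvvvvVV',
            0x003F,            // OS version (version made by)
            0x000A,            // version required for extracting
            0x0800,            // flags: UTF-8 filename and comments
            0x0000,            // compression method: uncompressed
            $dosTime,          // filetime
            $crc,              // checksum
            $size,             // compressed size
            $size,             // uncompressed size
            strlen($name),     // filename length in bytes
            0,                 // extra data length
            strlen($comment),  // comment length in bytes
            0,                 // disk
            0,                 // internal attributes
            0,                 // external attributes
            $offset            // offset of file entry
        )
        . $name
        . $comment;
}
```

The fixed part is 46 bytes. Since each entry needs the offset of its file entry, a streaming implementation has to keep a running count of the bytes already sent.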
End of Central Directory Entry
This entry only occurs once - at least for this purpose - and directly follows the last central directory entry.
Name | Length | Data type | Description |
Signature | 4 | Signature | An end of central directory signature consisting of "PK" followed by the bytes 05 and 06 |
Disk index | 2 | UInt16 | The index of this disk - for this purpose, I do not use multiple disks so this is always 0x0000 |
Start disk | 2 | UInt16 | The disk index this central directory starts on - for this purpose, I do not use multiple disks so this is always 0x0000 |
File count, disk | 2 | UInt16 | The number of files on this disk - for this purpose, this is always the total count of included files |
File count, central dir | 2 | UInt16 | The number of files in this central directory - for this purpose, this is always the total count of included files |
Size | 4 | UInt32 | The size of the central directory, excluding this entry |
Offset | 4 | UInt32 | The offset of the first central directory entry on this disk - for this purpose, this is always the offset of the first central directory entry in this file |
Comment length | 2 | UInt16 | The length of the archive comment |
Comment | * | String | The archive comment |
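The closing entry follows directly from this table. This is a minimal sketch with a hypothetical function name, assuming a single-disk archive as described above:

```php
<?php
// Hypothetical helper (not part of BjSZipper): builds the end of central
// directory entry for a single-disk archive.
// $count  - number of files in the archive
// $size   - total size in bytes of all central directory entries
// $offset - position of the first central directory entry in the archive
function endOfCentralDirectory(int $count, int $size, int $offset, string $comment = ''): string
{
    return "PK\x05\x06"
        . pack(
            'vvvvVVv',
            0,                // disk index
            0,                // start disk
            $count,           // file count, disk
            $count,           // file count, central dir
            $size,            // size of the central directory
            $offset,          // offset of the central directory
            strlen($comment)  // comment length in bytes
        )
        . $comment;
}

// For the single 44-byte "test.txt" from above: the file entry takes
// 30 + 8 + 44 = 82 bytes, the central directory entry 46 + 8 = 54 bytes.
$end = endOfCentralDirectory(1, 54, 82);
```

Without a comment, this entry is always 22 bytes long.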
Using the Code
The code is a PHP class named BjSZipper which includes static and instance functionality, depending on the method you choose to use. In both cases, only file information is stored in memory; the file data is streamed just-in-time.
1. Collect Information Then Send (Instance)
This method uses an instance of the class, collects information for each file to send (including calculating CRC-32 checksums) and then starts to send the archive. The benefit for the user is a progress bar, because the client gets to know the archive size in advance. The downside is a slightly later start of the download after requesting it - especially if there are many or big files to process.
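Because nothing is compressed, the archive size can be calculated from the filenames, comments and file sizes alone. The following sketch shows the arithmetic; the function name storedZipSize and the array layout are assumptions for illustration, not the way BjSZipper stores its file information:

```php
<?php
// Hypothetical helper (not part of BjSZipper): calculates the size of an
// uncompressed ZIP archive before any file data is read.
// Each element of $files is assumed to look like
// ['name' => 'dir/file.jpg', 'size' => 12345, 'comment' => ''].
function storedZipSize(array $files, string $archiveComment = ''): int
{
    // End of central directory entry: 22 fixed bytes plus the comment.
    $total = 22 + strlen($archiveComment);
    foreach ($files as $file) {
        $nameLength = strlen($file['name']); // bytes, not characters
        // File entry: 30 fixed bytes, the filename and the raw data.
        $total += 30 + $nameLength + $file['size'];
        // Central directory entry: 46 fixed bytes, filename and comment.
        $total += 46 + $nameLength + strlen($file['comment'] ?? '');
    }
    return $total;
}

// The single 44-byte "test.txt" from above results in 158 bytes.
$size = storedZipSize([['name' => 'test.txt', 'size' => 44]]);
```

A value like this could be sent as the Content-Length header before the first byte of the archive. Note that all size and offset fields are UInt32, so this layout only works for archives below 4 GiB.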
Methods
__construct($zipName = "download.zip", $comment = "")
The constructor of the BjSZipper. Takes two parameters:
$zipName - the filename of the ZIP archive sent to the client, optional, default is "download.zip"
$comment - an archive comment, optional, default is empty
AddDir($path, $recursive = true, $filter = null)
Prepares a path and its content for inclusion in the ZIP archive. File paths relative to $path become paths relative to the archive root. Takes three parameters:
$path - a directory path to take the files from
$recursive - a bool, if true the directory is scanned recursively, optional, default is true
$filter - a regular expression for files to include, optional, by default all files are included
AddFile($file, $name = null, $relativePath = "", $comment = "")
Prepares a single file to be included into the archive. Takes four arguments:
$file - a full file path
$name - the name of the file in the archive, optional, default is the base name of the file
$relativePath - the path of the file inside the archive, optional, default is the archive root, use slash '/' as path separator
$comment - a file comment, optional, default is empty
AddData($data, $name, $relativePath = '', $comment = '', $filetime = null)
Prepares a single file to be sent from raw data. Takes five parameters:
$data - the raw data of the file, stored in memory
$name - the name of the file in the archive
$relativePath - the path of the file inside the archive, optional, default is the archive root, use slash '/' as path separator
$comment - a file comment, optional, default is empty
$filetime - the last modification time of the file, optional, default is current time
Clear()
Resets the instance to start from scratch.
Send()
Sends the collected files in an assembled ZIP archive to the client.
Example
require_once('BjSZipper.php');
$zip = new BjSZipper('images.zip');
$zip->AddDir(dirname(__FILE__), true, '/\.(jpg|jpeg)/i');
$zip->AddFile('/var/www/html/testdata.bin');
$zip->AddData('All the JPEG images.', 'desc.txt');
$zip->Send();
2. Immediately Start Sending (Static)
This method uses a static approach. Each file is sent directly after collecting its data; only the file information is kept in memory for the final central directory. The benefit is a faster reaction time for the client because the download starts immediately after the first file is processed. Memory usage is also slightly better, as only archive-relevant data is stored and raw data that is added is not kept for later sending. The downside is that the script cannot know the resulting archive size, so there will be no progress display for the client.
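With the flags described above, the checksum is part of the file header and therefore has to be known before the file data goes out. One way to achieve this without loading a file into memory is to read it twice: once to hash it, once to copy it to the client. This is only a sketch of that idea with hypothetical function names; it makes no claim about how BjSZipper does it internally:

```php
<?php
// Hypothetical helper: CRC-32 of a file without loading it into memory.
// The 'crc32b' hash algorithm yields the same value as PHP's crc32().
function fileCrc32(string $path): int
{
    return (int) hexdec(hash_file('crc32b', $path));
}

// Hypothetical helper: copies the raw file data to an output stream and
// returns the number of bytes sent, which is needed to keep track of the
// offsets for the central directory.
function streamRawFile(string $path, $out): int
{
    $in = fopen($path, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $sent = stream_copy_to_stream($in, $out);
    fclose($in);
    return (int) $sent;
}
```

In a web request, the output stream would typically be opened with fopen('php://output', 'wb').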
Methods
static Begin($zipName = 'download.zip', $unlimitedTime = true)
Sends the download header to the client. Takes two parameters:
$zipName - the filename of the archive presented to the client, optional, default is 'download.zip'
$unlimitedTime - if true, set_time_limit(0) is used to disable the PHP execution time limit, optional, default is true
static SendFile($file, $name = null, $relativePath = '', $comment = '')
Appends a single file to the archive stream sent to the client. Takes four parameters:
$file - the full path of the file
$name - the name of the file in the archive, optional, default is the file's base name
$relativePath - the path of the file relative to the archive root, delimiter is a slash '/', optional, default is the archive root
$comment - a comment for this file, optional, default is empty
static SendDir($path, $recursive, $filter = null)
Appends all specified files from a directory to the archive stream sent to the client. File paths relative to $path become paths relative to the archive root. Takes three parameters:
$path - the full path of the directory to get the files from
$recursive - if true, subdirectories are searched as well, optional, default is true
$filter - a regular expression filtering files to add, optional, default is all files found
static SendData($data, $name, $relativePath = '', $comment = '', $filetime = null)
Appends a file from raw data to the archive stream sent to the client. Takes five parameters:
$data - the raw data of the file to append
$name - the name of the file in the archive
$relativePath - the path of the file in the archive relative to the archive root, optional, default is the archive root
$comment - a comment for this file, optional, default is empty
$filetime - the file modification time for the file in the archive, optional, default is the current time
static End($comment = '')
Sends the central directory and the end part to the client and thus ends the archive. Takes one parameter:
$comment - a comment for the archive, optional, default is empty
Example
require_once('BjSZipper.php');
BjSZipper::Begin('images.zip');
BjSZipper::SendDir(dirname(__FILE__), true, '/\.(jpg|jpeg)/i');
BjSZipper::SendFile('/var/www/html/testdata.bin');
BjSZipper::SendData('All the JPEG images.', 'desc.txt');
BjSZipper::End();
Points of Interest
I wrote this code with the aim of getting it to work - there are basically no security measures included and almost no exception handling. Please be aware of that when using it.
History
- Version 1.0: Instance and static functionality