Introduction
PDF is the go-to format for exchanging documents around the world, whether they are viewed on a PC, phone, or tablet. But if the author of a PDF isn’t careful when creating it, the file size can quickly balloon, and many PDFs end up far bigger than they need to be. Bloated files create real problems for storing, transmitting, and sharing content, for individual users, companies, and websites alike.
Bloated PDF documents take a long time to display, especially when viewed online or over a network, and they consume a lot of processing power as well. At a time when most content creators want their documents to be mobile-friendly, bloated PDFs can take an unacceptably long time to download to mobile devices and can tax those devices’ processors when displayed. Corporate repositories run out of storage when thousands or millions of stored PDF files are bigger than they need to be.
All of these problems can be addressed effectively by compressing PDF documents to an optimal size.
This whitepaper describes a proven method for conveniently compressing PDF documents to reduce their storage footprint and accelerate their transmission and display speed.
Compression is the Key
The majority of PDF bloat is due to embedded images in the PDF document. Many PDFs are nothing more than pages and pages of uncompressed scanned images, which can take up huge amounts of space. Even when they have been compressed, the images often still take up far more space than they should because non-optimal compression has been applied to them.
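To put a rough number on “huge amounts of space,” consider a single letter-size page (8.5 × 11 in) scanned at 300 dpi in 24-bit color and stored without compression. The page size, resolution, and color depth are illustrative assumptions, not figures from any particular repository:

```latex
\begin{aligned}
\text{pixels} &= (8.5 \times 300) \times (11 \times 300) = 2550 \times 3300 = 8{,}415{,}000 \\
\text{bytes}  &= 8{,}415{,}000 \times 3\ \text{bytes/pixel} = 25{,}245{,}000 \approx 25\ \text{MB per page}
\end{aligned}
```

At that rate, a 100-page document of such scans would approach 2.5 GB before any compression is applied.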
Besides compressing images, the method described in this whitepaper uses additional techniques to make documents even smaller: removing embedded thumbnails, unneeded fonts, and metadata, and down-sampling images to a lower resolution.
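Down-sampling shrinks an image by reducing its pixel count before compression is applied. As an illustrative assumption, take a letter-size page (8.5 × 11 in) scanned at 300 dpi in 24-bit color and down-sample it to 150 dpi. Halving the resolution in both dimensions cuts the pixel count, and therefore the uncompressed size, by a factor of four:

```latex
\begin{aligned}
\frac{\text{pixels at 150 dpi}}{\text{pixels at 300 dpi}} &= \left(\frac{150}{300}\right)^{2} = \frac{1}{4} \\
1275 \times 1650 &= 2{,}103{,}750\ \text{pixels} \\
2{,}103{,}750 \times 3\ \text{bytes/pixel} &= 6{,}311{,}250\ \text{bytes} \approx 6.3\ \text{MB, versus} \approx 25\ \text{MB at 300 dpi}
\end{aligned}
```

The trade-off is that fine detail is discarded, so the target resolution should match how the document will actually be viewed or printed.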
While any method for compressing PDF documents must first address file size, convenience is a factor, as well. For example, if you have a repository of millions of scanned medical authorization forms from the last decade, rescanning the actual hardcopy with better compression is simply impractical. Any workable solution must not only reduce file sizes and preserve document quality, but also be convenient to apply to a large number of documents in a single operation.
Deficiencies in Some Solutions
Although many PDF compression applications are available today, document quality is not their strong suit. Many are built on open-source PDF libraries that in turn rely on open-source compressors; these can typically complete the task, but often at the cost of quality and accuracy. The resulting PDF document is smaller, but not as small as it could be, and its appearance may have been significantly degraded in the process. When evaluating options for PDF compression, it’s important to consider both the compression ratio and the quality of the compressed documents.
Another common weakness of PDF compression software appears when it processes color images that use a color space other than DeviceRGB, such as DeviceCMYK or an ICC-based color space. Some compression applications can process only DeviceRGB; they typically convert images in other color spaces to DeviceRGB before compressing them, with unpredictable results. PDF producers intend the color in a document to look a certain way, and radically changing the intended color space for no good reason is tantamount to sacrilege in the publishing industry.
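To see why such a conversion can lose information, consider a common naive CMYK-to-RGB formula, with ink values C, M, Y, and K in the range [0, 1]. This formula is shown for illustration only. It is an assumption about how a simplistic tool might convert colors, not a description of any specific product, and it ignores ICC profiles entirely:

```latex
\begin{gathered}
R = 255\,(1 - C)(1 - K), \quad G = 255\,(1 - M)(1 - K), \quad B = 255\,(1 - Y)(1 - K) \\
(C, M, Y, K) = (0.5,\ 0.5,\ 0.5,\ 0) \;\mapsto\; (R, G, B) = (127.5,\ 127.5,\ 127.5) \\
(C, M, Y, K) = (0,\ 0,\ 0,\ 0.5) \;\mapsto\; (R, G, B) = (127.5,\ 127.5,\ 127.5)
\end{gathered}
```

A gray built from equal amounts of cyan, magenta, and yellow and a gray printed with black ink alone collapse to the same RGB value, so the distinction the producer intended cannot be recovered after conversion.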
Other common failings in some PDF compression tools include corrupted output with visible errors, an inability to compress secure PDFs, and compression that fails on certain types of documents in ways that actually make the file size larger, not smaller.
Example
To show what can be achieved with effective compression, the code sample below calls PDF Xpress, a software development kit for adding PDF functions to applications, including creation, modification, and compression. PDF Xpress handles color spaces other than DeviceRGB properly, compresses secure documents, and avoids other problems common in PDF compression applications.
PDF Xpress lets you customize compression to suit your needs. For example, the API lets you choose whether to target JPEG or JPEG 2000 compression for grayscale and color images. The toolkit applies JBIG2 compression to monochrome images and lets you control how aggressively those images are compressed. Without any customization, PDF Xpress automatically selects a compression ratio that yields visually lossless results in most cases. It can optionally apply lossless compression instead.
The following C# code opens a PDF document in PDF Xpress, compresses it, and saves it as a new, smaller PDF file:
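using Accusoft.PdfXpressSdk;

// Create and initialize the PDF Xpress toolkit; the using block disposes it when done.
using (PdfXpress pdf = new PdfXpress())
{
    pdf.Initialize();

    // Open the source document.
    using (Document doc = new Document(pdf, "document.pdf"))
    {
        Accusoft.PdfXpressSdk.SaveOptions saveOptions = new Accusoft.PdfXpressSdk.SaveOptions();
        saveOptions.Compress = true;              // compress the output file
        saveOptions.Linearized = true;            // linearize ("fast web view") for faster display over a network
        saveOptions.Overwrite = true;             // replace compressed.pdf if it already exists
        saveOptions.Filename = "compressed.pdf";  // save to a new file rather than the original
        doc.Save(saveOptions);
    }
}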
Summary
PDF document compression is a popular feature in PDF workflows, but it is often misunderstood and sometimes grossly mishandled. To ensure both effective compression and preservation of document quality, it’s important to select capable PDF compression tools and to apply them with settings tailored to your content management requirements.