…Using Multiple Compression Algorithms within Individual PDFs

PDF files are ubiquitous because they provide a standards-based way to store and disseminate content without being tied to specific viewing software. PDF files often make up a large percentage of an organization's files and occupy a large share of its total drive space.

One important fact that organizations using PDFs often overlook is that, ever since Adobe published PDF standard 1.5 in 2003, the standard has allowed an individual PDF file to be compressed using multiple compression algorithms. Using multiple compression algorithms can significantly reduce storage space and file transmission loads.

Using multiple algorithms means that each type of object within a file can be compressed with the algorithm best suited to it. For example, different algorithms do a better job on monochrome images, color images, vector graphics, or text. PDF creation software permits systems administrators to determine how many and which types of algorithms are used to compress the differing objects in the underlying files or documents. Some of the possibilities include:

  • CCITT G3/G4
  • Flate
  • JBIG2
  • JPEG
  • JPEG2000
  • LZW
  • RLE
  • ZIP
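
As a rough illustration of matching algorithms to object types, the Python sketch below maps kinds of content to PDF stream filter names. The filter names (FlateDecode, JBIG2Decode, DCTDecode) come from the PDF specification, but the mapping, the `choose_filter` and `flate_compress` helpers, and the sample content stream are hypothetical and exist only for illustration. Real PDF creation software exposes these choices through its own settings.

```python
import zlib

# Hypothetical policy: kind of object -> PDF stream filter name.
# The filter names are defined in the PDF specification; the mapping
# itself is an illustrative assumption, not a recommendation.
FILTER_BY_KIND = {
    "text": "FlateDecode",              # lossless, general purpose
    "vector": "FlateDecode",            # drawing operators are text-like
    "monochrome_image": "JBIG2Decode",  # bi-level images
    "color_image": "DCTDecode",         # JPEG, lossy
}


def choose_filter(kind: str) -> str:
    """Return the filter for an object kind, falling back to Flate."""
    return FILTER_BY_KIND.get(kind, "FlateDecode")


def flate_compress(data: bytes) -> bytes:
    """Flate is zlib/deflate, so the standard library can produce it."""
    return zlib.compress(data, 9)


if __name__ == "__main__":
    # A made-up, highly repetitive page content stream.
    page_text = b"BT /F1 12 Tf (Hello, PDF) Tj ET\n" * 200
    packed = flate_compress(page_text)
    print(choose_filter("text"), len(page_text), "->", len(packed), "bytes")
```

Only the Flate path is actually exercised here, because Flate is the one filter in the list that the Python standard library can produce directly. The image filters would need dedicated encoders.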

Benefits

The benefits of using multiple algorithms for different objects in a PDF are significant. The most obvious is reduced storage: more files fit in any given amount of space. The following table provides representative savings achieved under different options, with the usual caveat that results vary with individual content collections:

[Table: PDF file compression metrics]
Note that with standards-compliant compression there is no additional load on the computer or device used to view or print the compressed PDF files.

As an example of compression savings, this LINK is to a file that was over 460 MB completely uncompressed but was reduced to 4 MB using a single compression algorithm. This LINK is to the same file compressed with multiple compression algorithms, which is only 2.3 MB. The two files are indistinguishable when viewed or printed, but the second is 43% smaller.
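
The arithmetic behind these figures can be checked with a short Python sketch. The function names are hypothetical, and because the original file is described only as "over 460 MB", the ratios computed from 460 are lower bounds.

```python
def percent_smaller(before_mb: float, after_mb: float) -> float:
    """Percentage by which after_mb is smaller than before_mb."""
    return (before_mb - after_mb) / before_mb * 100.0


def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """How many times smaller the compressed file is than the original."""
    return original_mb / compressed_mb


if __name__ == "__main__":
    # Sizes quoted in the example above, in MB.
    uncompressed, single, multiple = 460.0, 4.0, 2.3
    print(f"multiple vs. single: {percent_smaller(single, multiple):.1f}% smaller")
    print(f"single-algorithm ratio: about {compression_ratio(uncompressed, single):.0f}:1")
    print(f"multi-algorithm ratio: about {compression_ratio(uncompressed, multiple):.0f}:1")
```

The 4 MB to 2.3 MB reduction works out to about 42.5%, which the text rounds to 43%.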

Of course, very few things in life or IT are free. As noted in the above table, using multiple compression algorithms requires more computing resources during the compression phase, making it less likely that providers who charge by the page or by the document will want to recommend it. Further, providers who charge by the gigabyte to store content may also be less apt to recommend the process.

More Information

For more information on how BeyondRecognition can help you maximize your IT/InfoGov expenditures, contact us using the contact form below or send an email to info@beyondrecognition.net.

References

If you’d like to dig deeper into some of the nuances of PDF file compression like lossy vs. lossless compression or the history of the initial Adobe standard and its conversion to an ISO standard, see the following links:
