PDF Manipulation for Programmers

12 Nov 2021
This tutorial will explain PDF manipulation, its importance, and its use cases.

PDF manipulation, in its simplest form, is creating, reading, and editing PDF files. As these files are widely used for many purposes across multiple industries, it’s important that you’re able to manipulate PDF files in your software or apps.

This tutorial will explain PDF manipulation, its importance, and its use cases. It will also demonstrate some challenges developers face when creating a PDF manipulation tool and the way these challenges can be handled easily with SDK libraries from Foxit.

Visit the Foxit PDF SDK Web Demo and see for yourself, by exploring the configurations and features.

Why Use PDF Manipulation

Storing documents in PDF has become essential because a PDF is an easily shareable copy of a file that can contain text, graphics, or other data. PDF files enable businesses and governments to store, read, and share documents while meeting their data retention policies.

Along with this growing reliance on PDF files comes the need to create, edit, and read these files. Businesses might also need to merge files or remove pages from them. These and other types of changes, such as the ability to add signatures, annotate, or add comments, are part of PDF manipulation.

As the required features and functionalities expand with each use case, PDF manipulation becomes a more essential skill for programmers.

Challenges of PDF Manipulation

The PDF format is designed so that files look the same wherever they are viewed and shared, not so that they can be edited easily. This makes building a PDF manipulation tool more difficult.

The following are some of the challenges involved and how they can be resolved with Foxit’s SDKs.

You can find the code for these examples in this GitHub repository.

Decode and Read PDF Files

An essential part of any PDF manipulation tool is to allow users to view their PDF files. This can be hard to implement, even with the most popular libraries.

Without Foxit

Let’s say we want to build a PDF viewer that allows users to open and view a PDF file. Using PDF.js, a popular JavaScript library, the code will look something like this:

JavaScript
<input type="file" name="pdf" id="pdf_input" />
<canvas id="pdf" style="display: block;"></canvas>
<script src=""></script>
<script>
  const fileInput = document.getElementById("pdf_input");
  const pdfElement = document.getElementById("pdf");
  pdfjsLib.GlobalWorkerOptions.workerSrc = '';

  //bind change event to file input
  fileInput.addEventListener('change', decodePDF)

  function decodePDF() {
      const fileReader = new FileReader();
      fileReader.readAsDataURL(fileInput.files[0]);
      fileReader.onloadend = function (event) {
          convertToBinary(event.target.result);
      }
  }

  const BASE64_MARKER = ';base64,';

  function convertToBinary(dataURI) {
      const base64Index = dataURI.indexOf(BASE64_MARKER) + BASE64_MARKER.length;
      const base64 = dataURI.substring(base64Index);
      const raw = window.atob(base64);
      const rawLength = raw.length;
      const array = new Uint8Array(new ArrayBuffer(rawLength));

      for (let i = 0; i < rawLength; i++) {
          array[i] = raw.charCodeAt(i);
      }
      getPDF(array);
  }

  function getPageText(pageNum, PDFDocumentInstance) {
      // Return a Promise that is resolved once the text of the page is retrieved
      return new Promise(function (resolve, reject) {
          PDFDocumentInstance.getPage(pageNum).then(function (pdfPage) {
              // The main trick to obtain the text of the PDF page, use the getTextContent method
              pdfPage.getTextContent().then(function (textContent) {
                  const textItems = textContent.items;
                  let finalString = "";

                  // Concatenate the string of the item to the final string
                  for (let i = 0; i < textItems.length; i++) {
                      const item = textItems[i];

                      finalString += item.str + " ";
                  }

                  resolve(finalString);
              });
          });
      });
  }

  function getPDF(pdfAsArray) {

      pdfjsLib.getDocument(pdfAsArray).promise.then(function (pdf) {

        // numPages is the public page count on the loaded document
        for (let i = 1; i <= pdf.numPages; i++) {
          pdf.getPage(i).then(function (page) {
            const scale = 1.5;
            const viewport = page.getViewport({ scale: scale, });

            const canvas = document.getElementById('pdf');
            const context = canvas.getContext('2d');
            canvas.height = viewport.height;
            canvas.width = viewport.width;

            const renderContext = {
              canvasContext: context,
              viewport: viewport
            };
            page.render(renderContext);
          })
        }

      }).catch(console.error);
  }
</script>

This code allows users to choose a PDF file by using the file input. The file is read as a base64-encoded data URL, decoded into an array of bytes, and passed to PDF.js’s getDocument method. The code then traverses the pages of the PDF and renders the file in a canvas element.
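The base64 round trip described above can be isolated into a small helper. The sketch below is one way to write it, assuming a runtime with global `atob` and `btoa` functions (browsers, or Node.js 16 and later). `dataUrlToBytes` and `looksLikePdf` are hypothetical names, not part of PDF.js. The signature check relies on the fact that PDF files begin with the bytes `%PDF-`.

```javascript
// Hypothetical helper: decode the base64 payload of a data URL into raw bytes.
function dataUrlToBytes(dataUrl) {
  const marker = ';base64,';
  const start = dataUrl.indexOf(marker);
  if (start === -1) {
    throw new Error('Expected a base64-encoded data URL');
  }
  const raw = atob(dataUrl.slice(start + marker.length));
  // Each character of the decoded string holds one byte (0-255).
  return Uint8Array.from(raw, (ch) => ch.charCodeAt(0));
}

// Hypothetical helper: PDF files start with the ASCII signature "%PDF-".
function looksLikePdf(bytes) {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d];
  return bytes.length >= signature.length &&
    signature.every((byte, i) => bytes[i] === byte);
}

// Build a tiny stand-in data URL instead of reading a real file.
const sample = 'data:application/pdf;base64,' + btoa('%PDF-1.7');
const bytes = dataUrlToBytes(sample);
console.log(bytes.length, looksLikePdf(bytes)); // 8 true
```

In a browser, reading the file with `FileReader.readAsArrayBuffer` (or `file.arrayBuffer()`) yields the bytes directly and avoids the base64 step altogether.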

As you can see, it correctly displays the PDF when the user chooses a file.

Viewer in PDF.js

Using Foxit’s SDK

Foxit’s SDK makes the viewing process easier. This is the code to create the reader:

JavaScript
<input type="file" name="pdf" id="pdf_input" />
<div id="pdf"></div>
<script src="/license-key.js"></script>
<script src="/lib/PDFViewCtrl.full.js"></script>
<script>
  const fileInput = document.getElementById("pdf_input");

  const pdfViewer = new PDFViewCtrl.PDFViewer({
      libPath: '/lib',
      jr: {
          licenseSN: licenseSN,
          licenseKey: licenseKey,
          tileSize: 300,
      },
  });
  pdfViewer.init('#pdf');


  function decodePDF (e) {
    pdfViewer.openPDFByFile(fileInput.files[0]);
  }

  fileInput.addEventListener('change', decodePDF);
</script>

Be sure to load the license-key.js and /lib/PDFViewCtrl.full.js scripts that you received when downloading the SDK. Then initialize the PDF viewer and specify the HTML element it should render the PDF into. Use the method openPDFByFile to view files that the user chooses.

When you run the code, observe that it not only renders the PDF successfully but also produces a noticeably clearer result. Foxit’s rendering engine is able to achieve stronger results because it’s been developed to quickly provide best-quality images, even with large, complex files.

Viewer with Foxit’s SDK

Adding and Viewing Annotations

Part of sharing or collaborating on PDF files is the ability to annotate or add comments to the file.

If you were to build annotation support into your PDF manipulation tool yourself, you’d need to draw the annotation into the PDF rather than add it as text. To add an annotation, you’d need to add a rectangle at certain coordinates (usually chosen by the user), then insert the text inside that rectangle.

This is because PDF files are meant to be viewed rather than manipulated, so simply adding text to the document will not work. What matters most in this format is how the output looks. For that reason, adding annotations can be challenging, especially since users need to be able to see those annotations in any supported PDF viewer after exporting the file.
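Part of the difficulty is coordinate conversion. A rectangle that the user drags on screen is measured in pixels from the top-left corner, while PDF’s default user space is measured in points (1/72 inch) from the bottom-left corner of the page. The sketch below shows that conversion under simplifying assumptions: the page is unrotated, its origin is at (0, 0), and the viewer applies a uniform scale. The function name and parameter shapes are hypothetical.

```javascript
// Hypothetical helper: convert a rectangle drawn on screen into a PDF
// annotation rectangle [llx, lly, urx, ury] in default user space.
// Assumes an unrotated page whose origin is (0, 0) and a uniform scale.
function screenRectToPdfRect(rect, pageHeightPt, scale) {
  // Undo the viewer zoom: pixels -> points.
  const x = rect.x / scale;
  const y = rect.y / scale;
  const width = rect.width / scale;
  const height = rect.height / scale;

  // Flip the y axis: screen y grows downward, PDF y grows upward.
  const upperY = pageHeightPt - y;
  const lowerY = upperY - height;

  return [x, lowerY, x + width, upperY];
}

// A US Letter page is 612 x 792 points; the viewer renders at 1.5x.
const pdfRect = screenRectToPdfRect(
  { x: 150, y: 75, width: 300, height: 60 },
  792,
  1.5
);
console.log(pdfRect); // [ 100, 702, 300, 742 ]
```

A production tool would also have to account for page rotation, crop boxes, and writing the annotation’s appearance into the file, which is where most of the effort goes.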

With Foxit’s SDK

Foxit’s SDK has a full editor that provides out-of-the-box support for adding, viewing, and properly exporting annotations in PDF files. To use annotations and comments, include all the scripts required from the SDK you downloaded and add an HTML element for the editor to be rendered in:

HTML
<head>
        <link rel="stylesheet" href="/lib/UIExtension.css">
        <script src="/lib/adaptive.js"></script>
</head>
<body>
<div id="pdf"></div>
<script src="/license-key.js"></script>
<script src="/lib/UIExtension.full.js"></script>
<script src="/lib/preload-jr-worker.js"></script>

Then add a script that first initializes the PDF UI:

JavaScript
const readyWorker = preloadJrWorker({
    workerPath: '/lib/',
    enginePath: '/lib/jr-engine/gsdk',
    licenseSN: licenseSN,
    licenseKey: licenseKey
});

const pdfui = new UIExtension.PDFUI({
    viewerOptions: {
        libPath: '/lib',
        jr: {
            readyWorker: readyWorker
        }
    },
    renderTo: '#pdf',
    appearance: UIExtension.appearances.adaptive,
    addons: UIExtension.PDFViewCtrl.DeviceInfo.isMobile ?
        '/lib/uix-addons/allInOne.mobile.js' :
        '/lib/uix-addons/allInOne.js'
});

// Open a ready file
pdfui.openPDFByHttpRangeRequest({
    range: {
        url: '/Sample.pdf',
    }
}, { fileName: 'Sample.pdf' });

This will initialize the PDF viewer with a UI provided by Foxit’s SDK, which adds features, including comments and annotations.

PDF UI

You can add and view comments inside the PDF viewer.

Manage comments

And when you download the PDF file, you’ll be able to view the annotation in other PDF viewers.

Using another PDF viewer

Try our SDK for Web Demo in your browser, no download or login required.

Digital Signatures

Many businesses rely on digital signatures, but implementing them in a way that ensures their security and validity can be complex.

There are a few steps required when signing a document:

  1. Open the document and add the UI to allow users to sign the document.
  2. Transform the signature into a filestream.
  3. Calculate the message digest of the document with the signature.
  4. Encrypt the message digest with the signer’s private key (stored, for example, in a p12 file) together with your certificate or keys.
  5. Write the encrypted signed data into a filestream.
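
Steps 3 and 4 can be illustrated with Node.js’s built-in `crypto` module. This is only a sketch of the underlying idea: it generates a throwaway RSA key pair instead of loading a p12 file, signs a stand-in buffer rather than a real PDF, and produces a raw RSA signature rather than the PKCS7 container that PDF signatures use.

```javascript
const crypto = require('crypto');

// Throwaway private key for illustration only; a real signer would load
// the private key and certificate from a p12/PFX file.
const { privateKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// Stand-in for the document bytes that the signature covers.
const documentBytes = Buffer.from('%PDF-1.7 stand-in document content');

// Step 3: calculate the message digest.
const digestHex = crypto
  .createHash('sha256')
  .update(documentBytes)
  .digest('hex');

// Step 4: sign with the private key. crypto.sign hashes the data with
// the named algorithm and signs the resulting digest.
const signature = crypto.sign('sha256', documentBytes, privateKey);

console.log('digest length (hex chars):', digestHex.length); // 64
console.log('signature length (bytes):', signature.length); // 256
```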

Then you need to verify the signed data on the server with the following steps:

  1. Get the original PDF document’s content with the signature’s byteRange, the signed data, and the signer’s info.
  2. Calculate the message digest of the content with the signature’s byteRange.
  3. Verify the calculated digest with the signed data.
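
The verification steps can be sketched the same way. A signature’s byteRange is an array of offset and length pairs covering the whole file except the gap where the signed data is stored, so the verifier concatenates those slices before checking the signature. The example below assumes Node.js’s `crypto` module, a throwaway key pair, and a fabricated buffer in place of a real PDF. `extractSignedContent` is a hypothetical helper, and a real verifier would parse a PKCS7 structure rather than a raw RSA signature.

```javascript
const crypto = require('crypto');

// Hypothetical helper: concatenate the byte ranges covered by the signature.
// byteRange has the form [offset1, length1, offset2, length2, ...].
function extractSignedContent(fileBytes, byteRange) {
  const parts = [];
  for (let i = 0; i < byteRange.length; i += 2) {
    const offset = byteRange[i];
    const length = byteRange[i + 1];
    parts.push(fileBytes.subarray(offset, offset + length));
  }
  return Buffer.concat(parts);
}

// Fabricated "file": content, a gap reserved for the signature, more content.
const before = Buffer.from('content before the signature;');
const gap = Buffer.alloc(16); // placeholder for the signed data
const after = Buffer.from(';content after the signature');
const fileBytes = Buffer.concat([before, gap, after]);
const byteRange = [0, before.length, before.length + gap.length, after.length];

// Throwaway key pair for illustration only.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// The signer signs the covered bytes...
const signedContent = extractSignedContent(fileBytes, byteRange);
const signature = crypto.sign('sha256', signedContent, privateKey);

// ...and the verifier recomputes the digest over the same ranges.
const valid = crypto.verify('sha256', signedContent, publicKey, signature);

// Changing a single covered byte invalidates the signature.
const tampered = Buffer.from(fileBytes);
tampered[0] ^= 0xff;
const tamperedValid = crypto.verify(
  'sha256',
  extractSignedContent(tampered, byteRange),
  publicKey,
  signature
);

console.log(valid, tamperedValid); // true false
```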

As you can see, this process is complex, and building it from the ground up can be troublesome.

With Foxit’s SDK

Foxit’s SDK provides ready-to-use APIs that can perform this entire process.

To add a digital signature feature in your PDF manipulation tool, first initialize Foxit’s PDF UI. (The code for this was included in the previous section.)

Among other features, the SDK provides a signature field for digital signatures.

Adding signature form

Then add a signature handler. When the user signs a signature form field, the handler packages the PDF content to be signed, together with the data related to the signature and the signer, and sends it to the server as a Blob. The server calculates the message digest, signs it, and returns the signed data.

JavaScript
pdfui.registerSignHandler({
    filter: 'Adobe.PPKLite',
    subfilter: 'adbe.pkcs7.sha1',
    flag: 0x100,
    distinguishName: 'e=support@yourcompany.com',
    location: 'FZ',
    reason: 'Test',
    signer: 'web sdk',
    showTime: true,
    sign: function(setting, buffer) {

        const formData = new FormData();
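        // Send the subfilter so the server can choose the matching signing mode
        formData.append('subfilter', setting.subfilter);
        formData.append('plain', new Blob([buffer]));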


        return fetch('https://webviewer-demo.foxitsoftware.com/signature/digest_and_sign', {
          method: 'POST',
          body: formData
        }).then((response) => response.arrayBuffer());
    }
});

Signing signature form field

The example above uses Foxit’s test servers. Using the example provided by Foxit, your server’s code for digest_and_sign might look something like this:

JavaScript
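// Assumes fs is Node's fs module, process is Node's child_process module
// (for example, const process = require('child_process')), and that router
// and koabody come from koa-router and koa-body.
router.post('/digest_and_sign', koabody({ multipart: true }), async (ctx) => {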
    fs.copyFileSync(ctx.request.files.plain.path, '.\\temp\\plain');
    let { filter, subfilter, signer, md } = ctx.request.body;
    if (!md) md = 'sha1';
    if (!subfilter) subfilter = 'adbe.pkcs7.detached';
    if (subfilter == 'adbe.pkcs7.sha1') {
        process.execSync(
            '.\\bin\\pkcs7.exe digest .\\temp\\plain .\\temp\\sha1'
        );
        process.execSync(
            '.\\bin\\pkcs7.exe sign .\\bin\\foxit_all.pfx 123456 .\\temp\\sha1 .\\temp\\signedData'
        );
    } else if (subfilter === 'adbe.pkcs7.detached') {
        switch (md) {
            case 'sha1':
                md = '0';
                break;
            case 'sha256':
                md = '1';
                break;
            case 'sha384':
                md = '2';
                break;
        }
        process.execSync(
            '.\\bin\\pkcs7.exe sign .\\bin\\foxit_all.pfx 123456 .\\temp\\plain .\\temp\\signedData Yes ' +
                md
        );
    }
    ctx.body = fs.createReadStream('.\\temp\\signedData');
    return;
});

Note that to run this code you need to supply your own PKCS7 tool and PFX key file in place of the sample pkcs7.exe, foxit_all.pfx, and password shown here.
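
Because the handler above builds shell commands from strings, one possible refinement is to build each command as an argument array and pass it to Node’s `child_process.execFileSync`, which does not go through a shell. The sketch below only constructs the arguments, so the mapping is easy to test. The `pkcs7.exe` subcommands and the numeric digest codes are taken from the example above, while `buildSignCommands`, `DIGEST_CODES`, and the `paths` object are hypothetical.

```javascript
// Digest names mapped to the numeric codes used in the example above.
const DIGEST_CODES = new Map([
  ['sha1', '0'],
  ['sha256', '1'],
  ['sha384', '2'],
]);

// Hypothetical helper: return the pkcs7.exe invocations for a request as
// [executable, args] pairs, ready for child_process.execFileSync.
function buildSignCommands(request, paths) {
  const subfilter = request.subfilter || 'adbe.pkcs7.detached';
  const md = request.md || 'sha1';
  const { exe, pfx, password, plain, digest, signedData } = paths;

  if (subfilter === 'adbe.pkcs7.sha1') {
    return [
      [exe, ['digest', plain, digest]],
      [exe, ['sign', pfx, password, digest, signedData]],
    ];
  }
  if (subfilter === 'adbe.pkcs7.detached') {
    const code = DIGEST_CODES.get(md);
    if (code === undefined) {
      throw new Error(`Unsupported digest algorithm: ${md}`);
    }
    return [[exe, ['sign', pfx, password, plain, signedData, 'Yes', code]]];
  }
  throw new Error(`Unsupported subfilter: ${subfilter}`);
}

const commands = buildSignCommands(
  { subfilter: 'adbe.pkcs7.detached', md: 'sha256' },
  {
    exe: 'bin/pkcs7.exe',
    pfx: 'bin/your-key.pfx', // placeholder path
    password: 'your-password', // placeholder; load from configuration
    plain: 'temp/plain',
    digest: 'temp/sha1',
    signedData: 'temp/signedData',
  }
);
console.log(commands[0][1].join(' '));
// sign bin/your-key.pfx your-password temp/plain temp/signedData Yes 1
```

Each pair can then be run with `execFileSync(exe, args)`. Unlike the original switch statement, this version rejects unknown digest names and subfilters instead of passing them through, which is a design choice rather than a requirement.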

Next is the verification step. To verify the signature of the PDF, send the original content and the signed data to the server, along with the data related to the signature and the signer:

JavaScript
pdfui.setVerifyHandler(function (signatureField, plainBuffer, signedData){
  const formData = new FormData();
  formData.append('filter', signatureField.getFilter());
  formData.append('subfilter', signatureField.getSubfilter());
  formData.append('signer', signatureField.getSigner());
  formData.append('plainContent', new Blob([plainBuffer]));
  formData.append('signedData', new Blob([signedData]));

  return fetch('https://webviewer-demo.foxitsoftware.com/signature/verify', {
          method: 'POST',
          body: formData
        }).then((response) => response.text());
});

This handler will be executed when the signed signature field is clicked in Foxit’s SDK UI:

Verification response

Again, this is using Foxit’s testing servers to verify the signature. If you want to verify the signature on your server, the code will be something like this:

JavaScript
router.post('/verify', koabody({ multipart: true }), async (ctx) => {
    let { filter, subfilter, signer } = ctx.request.body;


    fs.copyFileSync(
        ctx.request.files.plainContent.path,
        '.\\temp\\plainBuffer'
    );
    fs.copyFileSync(ctx.request.files.signedData.path, '.\\temp\\signedData');


    if (subfilter == 'adbe.pkcs7.sha1') {
        process.execSync(
            '.\\bin\\pkcs7.exe digest .\\temp\\plainBuffer .\\temp\\digest'
        );
        process.execSync(
            '.\\bin\\pkcs7.exe verify .\\temp\\signedData .\\temp\\digest .\\temp\\output'
        );
    } else if (subfilter === 'adbe.pkcs7.detached') {
        process.execSync(
            '.\\bin\\pkcs7.exe verify .\\temp\\signedData .\\temp\\plainBuffer .\\temp\\output'
        );
    }


    ctx.body = fs.createReadStream('.\\temp\\output');
});
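
Both server handlers write to fixed paths such as .\temp\plain, so two requests arriving at the same time could overwrite each other’s files. One way to avoid that, sketched below with Node.js’s built-in `fs`, `os`, and `path` modules, is to give each request its own temporary directory and remove it afterwards. `withRequestTempDir` is a hypothetical helper, not part of Foxit’s example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: run `work` with a fresh directory that is unique to
// this request, then clean it up even if `work` throws.
function withRequestTempDir(work) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pdf-sign-'));
  try {
    return work(dir);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Usage: stage the uploaded bytes, run the external tool, read the result.
const result = withRequestTempDir((dir) => {
  const plainPath = path.join(dir, 'plain');
  fs.writeFileSync(plainPath, 'uploaded bytes would go here');
  // ...invoke the signing or verification tool on plainPath here...
  return fs.readFileSync(plainPath, 'utf8');
});
console.log(result); // uploaded bytes would go here
```

Because the directory is removed before the function returns, the result has to be read into memory first rather than streamed with `fs.createReadStream` as in the handlers above.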

Conclusion

With the increased need to use, share, and collaborate on PDF files, it’s necessary to integrate a PDF manipulation tool into your system or app. However, the PDF format was designed to be easily viewable by multiple users, not necessarily editable.

For that reason, a PDF manipulation SDK like Foxit’s can be essential to your project. Foxit provides SDKs and APIs for all platforms and project types so that you can manipulate PDF files with ease.

Try Foxit PDF SDK’s advanced technology on your chosen platform(s): Web, Windows, Android, iOS, Linux, UWP, or Mac. Sign up for a free thirty-day trial today.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
