Reporting Services 2008 – Embedding a PDF Document

Steve MunLeeuw

3.63/5 (7 votes)

29 Mar 2009CPOL3 min read

132.6K

1.5K

Embedding the content of a PDF document is not supported, but we can add this feature with a SQL CLR function to parse the PDF into a list of images.

Download source code - 29.6 KB

Introduction

The users of our web application have the ability to upload supporting PDF documents, and then have them embedded in-line. This works great where the content author is uploading documents to support the content they have entered into the web application. One example of this use is an appendix where printing a separate document wouldn’t enable header/footer consistency, or page numbering. This example code uses TallComponents.PDF.Rasterizer, there are alternatives listed below.

Background

The controls available in the Reporting Services (RS) toolbar don’t allow you to add a PDF Document to a report. To meet the requirement, we reference TallComponents PDF Rasterizer in a C# assembly to parse the PDF into a list of images. We wrap the assembly with a SQL CLR table valued function to return a dataset of images. We are then able to use the Reporting Services Tablix and Image controls to render a list image in an RS report.

Using the code

Create a database to contain the PDF Documents and host the CLR assembly. I use Visual Studio Data dude:

SQL

CREATE TABLE [Document] (
   [Name] nvarchar(255) NOT NULL,
   [Extension] nvarchar(10) NOT NULL,
   [Document] image NOT NULL)
  
GO
--We will need this later in the SQL CLR function
CREATE PROCEDURE GetDocument
      @Name nvarchar(255)
AS
      SELECT [Document] FROM [Document] WHERE [Name] = @Name
 
--Instead of having the Web app load the test document we'll do it directly.
INSERT INTO [Document] ([Name], [Extension], [Document])
SELECT 'myPDF' AS [Name], 'pdf' AS [Extension], 
* FROM OPENROWSET(BULK 'C:\CodeCamp2009.pdf', SINGLE_BLOB) AS [Document]
 
--Test that we can get the document back out
EXEC GetDocument 'myPDF'
 
--enable CLR for the SQL instance
exec sp_configure 'clr enabled', '1'
GO
reconfigure
GO
 
--We will be using C# System.Drawing and TallComponent 
--assemblies which are not marked as safe for the SQL environment
ALTER DATABASE CodeCamp2009
SET TRUSTWORTHY ON
GO

Next up, create the assembly to parse the PDF into a list of images. SQL 2008 CLR is at .NET 2.0, so we set the PDFParser assembly to run with .NET 2.0.

I use the sample app ConvertToImage from TallComponents as a template for the parsing function. I then add a button to the sample to test the Parser assembly. Here is the PDFParser assembly.

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Text;
using TallComponents.PDF.Rasterizer;
using System.IO;
 
namespace PDFParser
{
    public class Parse
    {
        public List<Image> Split(byte[] document)
        {
            Document pdfDoc = new Document(new BinaryReader(new MemoryStream(document)));
            Page page = null;
            List<Image> returnVal = new List<Image>();
 
            for (int i = 0; i < pdfDoc.Pages.Count; i++)
            {
                page = pdfDoc.Pages[i];
                using (Bitmap bitmap = new Bitmap((int)page.Width, (int)page.Height))
                {
                    Graphics graphics = Graphics.FromImage(bitmap);
                    graphics.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.AntiAlias;
                    page.Draw(graphics);
                    returnVal.Add((Image)bitmap.Clone());
                }
            }
 
            return returnVal;
        }
    }
}

And, here is the code I use to test the above assembly:

private void cmdTestPDFParser_Click(object sender, EventArgs e)
{
  FileStream fs = new FileStream(@"C:\CodeCamp2009.pdf", FileMode.Open);
  byte[] pdf = new byte[fs.Length];
  fs.Read(pdf, 0, (int)fs.Length);

  PDFParser.Parse parser = new PDFParser.Parse();
  List<Image> images = parser.Split(pdf);
}

So now, we have an assembly that we can use to parse PDF documents into a list of images. Now, let’s wrap that in a SQL CLR function. For a SQL CLR function to make use of a reference, it needs to be registered in the SQL database along with all of its dependencies, so we’ll run the following SQL.

Copy TallComponents.PDF.Rasterizer.dll to 'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\MSSQL\Binn\TallComponents.PDF.Rasterizer.dll’.

SQL

CREATE ASSEMBLY [System.Drawing] FROM
'C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll'
WITH PERMISSION_SET = UNSAFE
 
CREATE ASSEMBLY [System.Web] FROM
'C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Web.dll'
WITH PERMISSION_SET = UNSAFE
 
CREATE ASSEMBLY [TallComponents.PDF.Rasterizer] FROM
'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\
          MSSQL\Binn\TallComponents.PDF.Rasterizer.dll'
WITH PERMISSION_SET = UNSAFE
 
CREATE ASSEMBLY [PDFParser.Parse] FROM
'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\
        MSSQL\Binn\ CodeCamp2009\PDFParser.dll'
WITH PERMISSION_SET = UNSAFE

On the SQL Project properties, make the following changes: Database tab, set to Unsafe, set owner to dbo.

Now that we have the SQL CLR environment set up and the dependencies created and registered, we will write the wrapper table valued function.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Security.Permissions;
using System.Collections;
using System.Drawing;
using System.Collections.Generic;
using System.IO;
using System.Drawing.Imaging;
 
[assembly: System.Security.AllowPartiallyTrustedCallers, 
  FileIOPermission(SecurityAction.RequestMinimum, Unrestricted = true)]
namespace SQLCLR
{
    [System.Security.Permissions.PermissionSet(
     System.Security.Permissions.SecurityAction.Assert, Unrestricted = true)]
    public partial class UserDefinedFunctions
    {
        [Microsoft.SqlServer.Server.SqlFunction(
             FillRowMethodName = "GetPDF_FillRow",
             TableDefinition = "PDFPageImage Varbinary(max)",
             DataAccess = DataAccessKind.Read)]
        public static IEnumerable GetPDF(SqlString DocumentName)
        {
            ArrayList items = new ArrayList();
            List<Image> pages = new List<Image>();
            object[] images;
            images = new object[1];
            MemoryStream pageStream = new MemoryStream();
            PDFParser.Parse pdfParser = new PDFParser.Parse();
 
            using (SqlConnection conn = new SqlConnection("context connection = true"))
            {
                conn.Open();
                SqlPipe pipe = SqlContext.Pipe;
                SqlCommand cmd = new SqlCommand("GetDocument", conn);
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add(new SqlParameter("@Name", DocumentName));
                SqlDataReader reader = cmd.ExecuteReader();
                byte[] pdfContent = null;
                while (reader.Read())
                  {
                    pdfContent = (byte[])reader.GetSqlBinary(0);
                    pages = pdfParser.Split(pdfContent);
                    for (int i = 0; i < pages.Count; i++)
                          {
                        MemoryStream ms = new MemoryStream();
                        pages[i].Save(ms, ImageFormat.Png);
                        items.Add((SqlBinary)ms.ToArray());
                          }
                  }
                reader.Close();
                reader = null;
                pdfContent = null;
            }
            return items;
        }
 
        private static void GetPDF_FillRow(Object obj, out SqlBinary sItem)
        {
            SqlBinary sTemp = (SqlBinary)obj;
            sItem = sTemp;
        }
    }
}

After deploying the SQL CLR project, you can test with the following query in SQL Management Studio: select * from dbo.GetPDF('myPDF').

You should get back something like this dataset of image data.

Now comes the easy part, using the dataset in a report.

Create an RS project and add a report. Use the wizard with the query select * from dbo.GetPDF('myPDF'). Replace the detail textbox with a rectangle followed by an Image control and resize to fill the page. The hierarchy should be Tablix \ Rectangle \ Image, use the Document Outline View (Ctrl + Alt + T) to check.

Set the image properties:

Image source = Database
Field = PDFPageImag
MimeType = png

Click Preview to see the PDF document translated into a set of images for consumption by the report. Yee-haw!

Points of interest

There are several approaches to parsing the PDF document into a list of images:

PDF TallComponents Rasterizer: Pro: Scales well. Con: Third party, unsure how well everything that can be placed in a PDF document can be rendered and how soon updates to Adobe will be integrated.

http://www.tallcomponents.com/pdfrasterizer2.aspx

Adobe SDK: Pro: confident that it will work on all PDF content. Con: Full SDK is expensive, using the desktop DLL is not supported in a server environment.

http://www.daniweb.com/forums/thread32369.html

Clipboard and thumbnail image: Pro: Easy to implement. Con: Clipboard, won’t work in a server environment, user A could see user B’s data.

http://ryanfarley.com/blog/archive/2006/01/31/15840.aspx

History

First CodeProject article, feels great.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)