Introduction
The users of our web application have the ability to upload supporting PDF documents, and then have them embedded in-line. This works great where the content author is uploading documents to support the content they have entered into the web application. One example of this use is an appendix where printing a separate document wouldn’t enable header/footer consistency, or page numbering. This example code uses TallComponents.PDF.Rasterizer, there are alternatives listed below.
Background
The controls available in the Reporting Services (RS) toolbar don’t allow you to add a PDF Document to a report. To meet the requirement, we reference TallComponents PDF Rasterizer in a C# assembly to parse the PDF into a list of images. We wrap the assembly with a SQL CLR table valued function to return a dataset of images. We are then able to use the Reporting Services Tablix
and Image
controls to render a list image in an RS report.
Using the code
Create a database to contain the PDF Documents and host the CLR assembly. I use Visual Studio Data dude:
CREATE TABLE [Document] (
[Name] nvarchar(255) NOT NULL,
[Extension] nvarchar(10) NOT NULL,
[Document] image NOT NULL)
GO
CREATE PROCEDURE GetDocument
@Name nvarchar(255)
AS
SELECT [Document] FROM [Document] WHERE [Name] = @Name
INSERT INTO [Document] ([Name], [Extension], [Document])
SELECT 'myPDF' AS [Name], 'pdf' AS [Extension],
* FROM OPENROWSET(BULK 'C:\CodeCamp2009.pdf', SINGLE_BLOB) AS [Document]
EXEC GetDocument 'myPDF'
exec sp_configure 'clr enabled', '1'
GO
reconfigure
GO
ALTER DATABASE CodeCamp2009
SET TRUSTWORTHY ON
GO
Next up, create the assembly to parse the PDF into a list of images. SQL 2008 CLR is at .NET 2.0, so we set the PDFParser assembly to run with .NET 2.0.
I use the sample app ConvertToImage from TallComponents as a template for the parsing function. I then add a button to the sample to test the Parser assembly. Here is the PDFParser assembly.
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Text;
using TallComponents.PDF.Rasterizer;
using System.IO;
namespace PDFParser
{
public class Parse
{
public List<Image> Split(byte[] document)
{
Document pdfDoc = new Document(new BinaryReader(new MemoryStream(document)));
Page page = null;
List<Image> returnVal = new List<Image>();
for (int i = 0; i < pdfDoc.Pages.Count; i++)
{
page = pdfDoc.Pages[i];
using (Bitmap bitmap = new Bitmap((int)page.Width, (int)page.Height))
{
Graphics graphics = Graphics.FromImage(bitmap);
graphics.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.AntiAlias;
page.Draw(graphics);
returnVal.Add((Image)bitmap.Clone());
}
}
return returnVal;
}
}
}
And, here is the code I use to test the above assembly:
private void cmdTestPDFParser_Click(object sender, EventArgs e)
{
FileStream fs = new FileStream(@"C:\CodeCamp2009.pdf", FileMode.Open);
byte[] pdf = new byte[fs.Length];
fs.Read(pdf, 0, (int)fs.Length);
PDFParser.Parse parser = new PDFParser.Parse();
List<Image> images = parser.Split(pdf);
}
So now, we have an assembly that we can use to parse PDF documents into a list of images. Now, let’s wrap that in a SQL CLR function. For a SQL CLR function to make use of a reference, it needs to be registered in the SQL database along with all of its dependencies, so we’ll run the following SQL.
Copy TallComponents.PDF.Rasterizer.dll to 'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\MSSQL\Binn\TallComponents.PDF.Rasterizer.dll’.
CREATE ASSEMBLY [System.Drawing] FROM
'C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll'
WITH PERMISSION_SET = UNSAFE
CREATE ASSEMBLY [System.Web] FROM
'C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Web.dll'
WITH PERMISSION_SET = UNSAFE
CREATE ASSEMBLY [TallComponents.PDF.Rasterizer] FROM
'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\
MSSQL\Binn\TallComponents.PDF.Rasterizer.dll'
WITH PERMISSION_SET = UNSAFE
CREATE ASSEMBLY [PDFParser.Parse] FROM
'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2008\
MSSQL\Binn\ CodeCamp2009\PDFParser.dll'
WITH PERMISSION_SET = UNSAFE
On the SQL Project properties, make the following changes: Database tab, set to Unsafe, set owner to dbo.
Now that we have the SQL CLR environment set up and the dependencies created and registered, we will write the wrapper table valued function.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Security.Permissions;
using System.Collections;
using System.Drawing;
using System.Collections.Generic;
using System.IO;
using System.Drawing.Imaging;
[assembly: System.Security.AllowPartiallyTrustedCallers,
FileIOPermission(SecurityAction.RequestMinimum, Unrestricted = true)]
namespace SQLCLR
{
[System.Security.Permissions.PermissionSet(
System.Security.Permissions.SecurityAction.Assert, Unrestricted = true)]
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction(
FillRowMethodName = "GetPDF_FillRow",
TableDefinition = "PDFPageImage Varbinary(max)",
DataAccess = DataAccessKind.Read)]
public static IEnumerable GetPDF(SqlString DocumentName)
{
ArrayList items = new ArrayList();
List<Image> pages = new List<Image>();
object[] images;
images = new object[1];
MemoryStream pageStream = new MemoryStream();
PDFParser.Parse pdfParser = new PDFParser.Parse();
using (SqlConnection conn = new SqlConnection("context connection = true"))
{
conn.Open();
SqlPipe pipe = SqlContext.Pipe;
SqlCommand cmd = new SqlCommand("GetDocument", conn);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(new SqlParameter("@Name", DocumentName));
SqlDataReader reader = cmd.ExecuteReader();
byte[] pdfContent = null;
while (reader.Read())
{
pdfContent = (byte[])reader.GetSqlBinary(0);
pages = pdfParser.Split(pdfContent);
for (int i = 0; i < pages.Count; i++)
{
MemoryStream ms = new MemoryStream();
pages[i].Save(ms, ImageFormat.Png);
items.Add((SqlBinary)ms.ToArray());
}
}
reader.Close();
reader = null;
pdfContent = null;
}
return items;
}
private static void GetPDF_FillRow(Object obj, out SqlBinary sItem)
{
SqlBinary sTemp = (SqlBinary)obj;
sItem = sTemp;
}
}
}
After deploying the SQL CLR project, you can test with the following query in SQL Management Studio: select * from dbo.GetPDF('myPDF')
.
You should get back something like this dataset of image data.
Now comes the easy part, using the dataset in a report.
Create an RS project and add a report. Use the wizard with the query select * from dbo.GetPDF('myPDF')
. Replace the detail textbox with a rectangle followed by an Image
control and resize to fill the page. The hierarchy should be Tablix
\ Rectangle
\ Image
, use the Document Outline View (Ctrl + Alt + T) to check.
Set the image properties:
- Image source = Database
- Field =
PDFPageImag
- MimeType = png
Click Preview to see the PDF document translated into a set of images for consumption by the report. Yee-haw!
Points of interest
There are several approaches to parsing the PDF document into a list of images:
- PDF TallComponents Rasterizer: Pro: Scales well. Con: Third party, unsure how well everything that can be placed in a PDF document can be rendered and how soon updates to Adobe will be integrated.
http://www.tallcomponents.com/pdfrasterizer2.aspx
- Adobe SDK: Pro: confident that it will work on all PDF content. Con: Full SDK is expensive, using the desktop DLL is not supported in a server environment.
http://www.daniweb.com/forums/thread32369.html
- Clipboard and thumbnail image: Pro: Easy to implement. Con: Clipboard, won’t work in a server environment, user A could see user B’s data.
http://ryanfarley.com/blog/archive/2006/01/31/15840.aspx
History
First CodeProject article, feels great.