Introduction
Converting HTML pages to other formats, especially PDF, has become a routine task for web developers. The process itself is fairly straightforward, because there are quite a lot of PDF libraries and services around the web. However, one day you may need not just to make a PDF copy of a page, but to automatically modify the resulting PDF output (for example, you may want to access SVG data on the page). In this article, I’m going to show a simple example of accomplishing this task in ASP.NET using a few .NET and XSL tips and the PD4ML PDF library.
Step 1: Searching for xHTML Markup
ASP.NET is great for easily creating complicated pages. However, the server controls and the rest of the page source have very little in common with the resulting xHTML markup, which is rendered and sent to the client. That’s why the first thing we are going to do is bring that markup to light. It is produced by the “Render” method, which is part of the page’s life cycle, so we need to override this method.
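// Requires: using System.IO; using System.Web.UI;
protected override void Render(HtmlTextWriter output)
{
    // Render the page into an in-memory writer instead of the response.
    StringWriter writer = new StringWriter();
    HtmlTextWriter htmlWriter = new HtmlTextWriter(writer);
    base.Render(htmlWriter);
    string htmlMarkup = writer.ToString();

    // Save the markup so that the XSL transformation can read it later.
    using (StreamWriter xmlWriter = new StreamWriter(Server.MapPath("HTMLoutput.xml")))
    {
        xmlWriter.Write(htmlMarkup);
    }

    // Send the unchanged markup to the client.
    output.Write(htmlMarkup);
}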
Step 2: Getting Ready for XSL Transformation
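Now we need to prepare our XSLT file. ASP.NET produces valid xHTML markup, so we only need to change it according to our needs, but there are still some problems you may face: the xHTML nodes carry no prefix, so XPath expressions need a custom namespace prefix to reach them; the output can end up with numerous xmlns="" declarations; and plain text that isn't wrapped by any element can slip into the result.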
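A typical stumbling block at this stage is that xHTML elements live in the http://www.w3.org/1999/xhtml namespace without a prefix, so a plain XPath such as //h1 matches nothing. A related one is that literal result elements written without a namespace get an xmlns="" declaration when they are nested inside copied xHTML elements. The following is a minimal sketch of one way to handle both, not the stylesheet used for this article: the class name XhtmlTransformSketch, the stylesheet content and the idea of turning headings into paragraphs are all assumptions made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, named for this sketch only.
public static class XhtmlTransformSketch
{
    // xmlns:xhtml binds a prefix to the xHTML namespace so that XPath can
    // match the non-prefixed input elements. The default xmlns puts the
    // literal result elements (<p> below) into the same namespace, so they
    // do not need an xmlns="" declaration inside the copied <body>.
    public const string Stylesheet = @"<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:xhtml='http://www.w3.org/1999/xhtml'
    xmlns='http://www.w3.org/1999/xhtml'
    exclude-result-prefixes='xhtml'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='/'>
    <xsl:apply-templates select='//xhtml:body'/>
  </xsl:template>
  <xsl:template match='xhtml:body'>
    <xsl:copy>
      <xsl:for-each select='.//xhtml:h1'>
        <p><xsl:value-of select='.'/></p>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    // Applies a stylesheet (given as a string) to xHTML markup (also a string).
    public static string Transform(string xhtml, string stylesheet)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            transform.Load(xsltReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xhtml)))
        using (StringWriter result = new StringWriter())
        {
            transform.Transform(input, null, result);
            return result.ToString();
        }
    }
}
```

Calling XhtmlTransformSketch.Transform(markup, XhtmlTransformSketch.Stylesheet) on a small xHTML string should return a body element that contains one paragraph per heading. Note that the sample reader uses default settings, so markup that starts with a DOCTYPE needs reader settings that allow DTDs, as in the Step 3 code.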
Step 3: Creating PDF File
This is where we reach our final goal. All we need to do is perform the XSL transformation and create the PDF file. I’ll use the PD4ML HTML-to-PDF converting library, because it can be used from different programming languages, like Java, PHP, Ruby, etc. I’m going to use MemoryStream because I don’t want to save any intermediate data to the hard drive.
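// Requires: using System; using System.IO; using System.Xml; using System.Xml.Xsl;
// plus a reference to the PD4ML library.
protected void MakePDFButton_Click(object sender, EventArgs e)
{
    string XSLTFile = Server.MapPath("XSLTFile.xslt");
    string XMLFile = Server.MapPath("HTMLoutput.xml");

    // The saved markup typically contains a DOCTYPE, so DTDs must be allowed.
    // (On .NET 4.0 and later ProhibitDtd is obsolete; set
    // settings.DtdProcessing = DtdProcessing.Parse instead.)
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ProhibitDtd = false;
    XmlReader reader = XmlReader.Create(XMLFile, settings);

    // Run the XSL transformation into memory rather than onto disk.
    XslCompiledTransform XSLTransform = new XslCompiledTransform();
    XSLTransform.Load(XSLTFile);
    MemoryStream memoryStream = new MemoryStream();
    XSLTransform.Transform(reader, null, memoryStream);
    memoryStream.Flush();
    memoryStream.Position = 0;
    reader.Close();

    // Show the transformed markup on the page (HTMLoutput is presumably a text control).
    StreamReader streamReader = new StreamReader(memoryStream);
    string output = streamReader.ReadToEnd();
    HTMLoutput.Text = Server.HtmlEncode(output);

    // Rewind the stream and let PD4ML render it to a PDF file.
    PD4ML PDFcreator = new PD4ML();
    PDFcreator.PageSize = PD4Constants.A4;
    PDFcreator.DocumentTitle = "The result PDF file";
    string path = Server.MapPath("Output.pdf");
    StreamWriter streamWriter = new StreamWriter(path);
    memoryStream.Position = 0;
    PDFcreator.render(memoryStream, streamWriter);

    // Closing the reader also closes the underlying memory stream.
    streamReader.Close();
    streamWriter.Close();
}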
Conclusion
That's it! Now let's come up with a short summary:
- Override the “Render” method to obtain and manipulate the xHTML markup.
- Use a custom XML namespace prefix to reach non-prefixed xHTML nodes.
- Use the little XSLT “xmlns=http://www.w3.org/1999/xhtml” hack to get rid of numerous xmlns="" declarations.
- Use <xsl:template match="xhtml:body//text()"> if you need to get rid of plain text that isn't wrapped by any element.
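The last tip can be sketched roughly as follows. XSLT's built-in rules copy every text node they reach, so an empty template for xhtml:body//text() drops stray text, while elements that have their own template still emit their content through xsl:value-of. This is only an illustration under assumptions: the class name TextSuppressionSketch, the choice to keep p elements and the sample markup in the usage note are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, named for this sketch only.
public static class TextSuppressionSketch
{
    public const string Stylesheet = @"<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:xhtml='http://www.w3.org/1999/xhtml'
    xmlns='http://www.w3.org/1999/xhtml'
    exclude-result-prefixes='xhtml'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <!-- Paragraphs are kept; value-of reads their text directly. -->
  <xsl:template match='xhtml:p'>
    <p><xsl:value-of select='.'/></p>
  </xsl:template>
  <!-- Empty template: text reached through the built-in rules is dropped. -->
  <xsl:template match='xhtml:body//text()'/>
</xsl:stylesheet>";

    // Applies the stylesheet above to xHTML markup given as a string.
    public static string Transform(string xhtml)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsltReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xhtml)))
        using (StringWriter result = new StringWriter())
        {
            transform.Transform(input, null, result);
            return result.ToString();
        }
    }
}
```

For a body such as <body>stray text<div><p>Kept paragraph</p></div></body> in the xHTML namespace, the paragraph should survive while the loose text is left out. Text outside the body, for example in the head, is not covered by this pattern and would need its own rule.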
I hope that the combination of valid xHTML markup, which Visual Studio developers take for granted, and the few easy tips described above will give you countless possibilities for manipulating your document's data.
History
- 1st March, 2011: Initial post