HTML to PDF HttpHandler

February 17, 2012

Recently, we were looking for a lightweight reporting tool to integrate into one of our ASP.NET projects, something that would make it easy to pull HTML and PDF versions of a report. There is no shortage of enterprise reporting frameworks, each with its own daunting learning curve, custom integration points, installation process, and export features. Of course, the word enterprise immediately throws us off track, and if we needed an earth mover maybe we’d consider products like:

Many of these solutions (Crystal in particular) have more features than anyone can learn in a lifetime, and the features you do learn could keep you up at night counting complicated designers and VB syntax instead of sheep. We were in the market for something simple, easy to maintain, and flexible enough to leverage our existing application architecture. Ideally it would be inexpensive (or free), easy to deploy, and easy to integrate with our applications. That basically excludes any of the options listed above.

Even better would be a solution that aligned with our core competencies in C#, ASP.NET, semantic HTML, CSS and the like. That thought led us to look for a WebKit-based HTML to PDF conversion tool that we could include in the project. There are many available, including open source projects. Based on reviews we settled on wkhtmltopdf, and the final product has proven extremely powerful.

The way this utility works is simple: give it a URL and it will use WebKit to render the page to PDF. This left us with only a .NET wrapper and an HttpHandler to implement. I created a simple interface for the wrapper so that the conversion utility is injectable (who knows when a better solution will come along and we’ll want to swap it out). The wrapper just starts the executable through a Process object from the System.Diagnostics namespace. In this case I am assuming we’ll be in a web context and that there is a wkhtmltopdf directory relative to the root of the application:

using System.Diagnostics;
using System.IO;
using System.Web;

public interface IPdfGenerator
{
	void ConvertToPdf(string url, Stream outputStream);
}

public class WkHtmlToPdfGenerator : IPdfGenerator
{
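	// size of the copy buffer in bytes (not a limit on the size of the pdf)
	private const int maxFileSize = 32768;
	// how long to wait for wkhtmltopdf to exit, in milliseconds
	private const int maxWaitTime = 60000;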

	public void ConvertToPdf(string url, Stream outputStream)
	{
		using (var proc = CreateProcess())
		{
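			var fileName = "-"; // "-" tells wkhtmltopdf to write the pdf to standard output
			// quote the url so that a url containing spaces stays a single argument
			proc.StartInfo.Arguments = "\"" + url + "\" " + fileName;
			proc.Start();
			// standard error is redirected, so drain it in the background;
			// otherwise a full pipe could block wkhtmltopdf
			proc.BeginErrorReadLine();
			WriteToOutput(proc, outputStream);
			// give it up to a minute to finish, and stop it if it is still running
			if (!proc.WaitForExit(maxWaitTime))
			{
				proc.Kill();
			}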
		}
	}

	private void WriteToOutput(Process proc, Stream outputStream)
	{
		byte[] buffer = new byte[maxFileSize];
		while (true)
		{
			int read = proc.StandardOutput.BaseStream.Read(buffer, 0, buffer.Length);

			if (read <= 0)
			{
				break;
			}
			outputStream.Write(buffer, 0, read);
		}
	}

	private Process CreateProcess()
	{
		var workingDirectory = HttpContext.Current.Server.MapPath("~/wkhtmltopdf");
		var wkexe = workingDirectory + "\\wkhtmltopdf.exe";
		Process proc = new Process();
		proc.StartInfo.CreateNoWindow = true;
		proc.StartInfo.RedirectStandardOutput = true;
		proc.StartInfo.RedirectStandardError = true;
		proc.StartInfo.RedirectStandardInput = true;
		proc.StartInfo.UseShellExecute = false;
		proc.StartInfo.FileName = wkexe;
		proc.StartInfo.WorkingDirectory = workingDirectory;
		return proc;
	}

}
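
Because the url ultimately comes from a query string and ends up on a command line, it may be worth validating it before the process is started. The following is only a minimal sketch of that idea: the class name WkHtmlToPdfArguments and its Build method are hypothetical and not part of the wrapper above, and it assumes that only absolute http and https URLs should be converted.

```csharp
using System;

// Hypothetical helper (not part of the original wrapper): builds the
// wkhtmltopdf argument string only for absolute http/https URLs.
public static class WkHtmlToPdfArguments
{
    public static string Build(string url)
    {
        Uri uri;
        bool isAbsolute = Uri.TryCreate(url, UriKind.Absolute, out uri);

        // Reject relative URLs and schemes such as file:// that could expose local files.
        if (!isAbsolute ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            throw new ArgumentException("Expected an absolute http or https URL.", "url");
        }

        // AbsoluteUri is the percent-encoded form of the URL; it is wrapped in quotes
        // so it is passed as one argument. The trailing "-" sends the pdf to standard output.
        return "\"" + uri.AbsoluteUri + "\" -";
    }
}
```

With something like this in place, ConvertToPdf could assign the result to proc.StartInfo.Arguments and let the exception bubble up to the handler, which can then answer with an error message instead of starting a process.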

Now for the handler. GetDescriptor expects a query parameter named “url” and optionally lets you name the downloaded file with “filename”. We then use a memory stream to capture the output of our IPdfGenerator implementation as a byte array. Finally we name the file and write the byte array out through a buffered response:

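// describes what to convert and what to call the download
// (shape inferred from how GetDescriptor uses it below)
public class HtmlDescriptor
{
	public string Url { get; set; }
	public string DisplayName { get; set; }
}

public class HtmlToPdfHandler : IHttpHandler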
{
	public void ProcessRequest(HttpContext context)
	{
		context.Response.Clear();
		context.Response.ClearHeaders();

		var info = GetDescriptor(context);
		if (info == null)
		{
			context.Response.Write("Please specify a url");
		}
		else
		{
			byte[] content = null;
			using (MemoryStream outputStream = new MemoryStream())
			{
				var pdfGen = GetPdfGenerator();
				pdfGen.ConvertToPdf(info.Url, outputStream);
				content = outputStream.ToArray();
			}

			string fileName = string.Format("{0}.pdf", CreateSafeFileName(info.DisplayName));
			WriteToResponse(context, fileName, content);
		}
		context.Response.End();
	}

	public bool IsReusable { get { return false; } }

	#region Helper methods

	private HtmlDescriptor GetDescriptor(HttpContext context)
	{
		// if no url is specified we exit with null
		string url = context.Request.QueryString["url"];
		if (string.IsNullOrEmpty(url))
			return null;

		// if no filename is specified we default to converted
		string displayName = context.Request.QueryString["filename"];
		if(string.IsNullOrEmpty(displayName))
			displayName = "converted";

		return new HtmlDescriptor() {
			DisplayName = displayName,
			Url = url
		};
	}

	private IPdfGenerator GetPdfGenerator()
	{
		// TODO: we'd use dependency injection for production use
		return new WkHtmlToPdfGenerator();
	}

	private void WriteToResponse(HttpContext context, string fileName, byte[] content)
	{
		context.Response.Buffer = true;
		//set the appropriate content type
		context.Response.ContentType = "application/pdf";
		//Add the appropriate headers
		context.Response.AddHeader("Content-Disposition", "attachment; filename=" + fileName);

		//Stream it out via a Binary Write and close the stream
		context.Response.BinaryWrite(content);
	}

	/// <summary>
	/// Method created to convert a title into a safe filename.
	/// </summary>
	/// <param name="name"></param>
	/// <returns></returns>
	private string CreateSafeFileName(string name)
	{
		// first trim the raw string
		string safe = name.Trim();

		// replace spaces with hyphens
		safe = safe.Replace(" ", "-").ToLower();

		// collapse any runs of hyphens into a single hyphen
		while (safe.IndexOf("--") > -1)
			safe = safe.Replace("--", "-");

		// trim out illegal characters
		safe = System.Text.RegularExpressions.Regex.Replace(safe, "[^a-z0-9\\-]", "");

		// trim the length to at most 50 characters
		if (safe.Length > 50)
			safe = safe.Substring(0, 50);

		// clean the beginning and end of the filename
		char[] replace = { '-', '.' };
		safe = safe.TrimStart(replace);
		safe = safe.TrimEnd(replace);

		return safe;
	}

	#endregion
}
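
The TODO in GetPdfGenerator hints at dependency injection. As a rough sketch of what that could look like, the conversion step can be pulled into a small class that receives its IPdfGenerator from outside, which also makes it testable without wkhtmltopdf installed. PdfExporter and FakePdfGenerator are hypothetical names introduced here for illustration; the interface is repeated only so the snippet stands on its own.

```csharp
using System;
using System.IO;
using System.Text;

public interface IPdfGenerator
{
    void ConvertToPdf(string url, Stream outputStream);
}

// Hypothetical test double: writes a recognizable marker instead of running wkhtmltopdf.
public class FakePdfGenerator : IPdfGenerator
{
    public void ConvertToPdf(string url, Stream outputStream)
    {
        byte[] bytes = Encoding.ASCII.GetBytes("%PDF-fake:" + url);
        outputStream.Write(bytes, 0, bytes.Length);
    }
}

// Hypothetical class holding the handler's conversion step, with the generator passed in.
public class PdfExporter
{
    private readonly IPdfGenerator generator;

    public PdfExporter(IPdfGenerator generator)
    {
        if (generator == null) throw new ArgumentNullException("generator");
        this.generator = generator;
    }

    // Captures whatever the generator writes and returns it as a byte array.
    public byte[] Export(string url)
    {
        using (var outputStream = new MemoryStream())
        {
            generator.ConvertToPdf(url, outputStream);
            return outputStream.ToArray();
        }
    }
}
```

In production the handler would hand a WkHtmlToPdfGenerator to such a class, while a unit test could pass the fake and simply inspect the returned bytes.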

There are a couple of options for wiring this up. The first is to register the handler in web.config. In IIS 7 (integrated mode) you add the following (note: in a website project you’ll use App_Code as MyAssemblyName):

<system.webServer>
  <handlers>
    <add name="HtmlToPdfHandler" verb="GET" path="GetPdf.ashx" type="MyNamespace.HtmlToPdfHandler, MyAssemblyName" />
  </handlers>
</system.webServer>

In IIS 6 (or IIS 7 in classic mode) you add it here:

<system.web>
   <httpHandlers>
      <add verb="GET" path="GetPdf.ashx" type="MyNamespace.HtmlToPdfHandler, MyAssemblyName" />
   </httpHandlers>
</system.web>

Or you can create a GetPdf.ashx file whose contents name the handler class:

<%@ WebHandler Language="C#" CodeBehind="HtmlToPdfHandler.ashx.cs" Class="MyNamespace.HtmlToPdfHandler" %>

Once you have all this wired into a web project, you can hit the handler with any URL, and the contents of that page get converted to PDF and streamed into the response as a download.

ex: http://localhost/temp/HtmlToPdfHandler.ashx?url=http://www.washingtonpost.com&filename=washingtonpost
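
One detail to watch when building such links: if the target page has a query string of its own, its ? and & characters need to be encoded, or the handler will read them as its own parameters. Below is a small sketch of a link builder, assuming a hypothetical PdfLinkBuilder class (not part of the handler) and using Uri.EscapeDataString for the encoding.

```csharp
using System;

// Hypothetical helper: builds a link to the pdf handler with both
// query parameters percent-encoded.
public static class PdfLinkBuilder
{
    public static string Build(string handlerPath, string targetUrl, string fileName)
    {
        return handlerPath
            + "?url=" + Uri.EscapeDataString(targetUrl)
            + "&filename=" + Uri.EscapeDataString(fileName);
    }
}
```

For example, PdfLinkBuilder.Build("GetPdf.ashx", "http://example.com/report?id=1&year=2012", "report") yields GetPdf.ashx?url=http%3A%2F%2Fexample.com%2Freport%3Fid%3D1%26year%3D2012&filename=report. ASP.NET decodes the value again when the handler reads context.Request.QueryString["url"].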

Now all we need is to handle authentication so that wkhtmltopdf doesn’t end up on the login page for auth-protected resources, and we are back to writing ASP.NET applications and getting a full night’s sleep. We’ll enjoy full-featured support for CSS 3, HTML 5, JavaScript, jQuery, you name it. There are no extra deployment considerations or additional services to install on the server. Any existing business or data access logic is readily at our disposal. You can even see how this opens the door for any page in the application to suddenly be downloadable as a PDF. All in the time it took to write this blog post.
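
One possible approach to the authentication piece, offered only as a sketch and not something we have built here, is to forward the caller's cookies to wkhtmltopdf through its repeatable --cookie <name> <value> option. The CookieArguments class below is hypothetical. It assumes the raw cookie names and values are passed in (for instance copied from context.Request.Cookies) and percent-encodes them, since wkhtmltopdf expects the value to be URL encoded.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: turns name/value pairs into repeated
// --cookie "name" "value" arguments for wkhtmltopdf.
public static class CookieArguments
{
    public static string Build(IEnumerable<KeyValuePair<string, string>> cookies)
    {
        var args = new StringBuilder();
        foreach (var cookie in cookies)
        {
            if (args.Length > 0)
            {
                args.Append(' ');
            }
            // Percent-encoding also removes quotes and spaces, so the quoting stays intact.
            args.Append("--cookie \"")
                .Append(Uri.EscapeDataString(cookie.Key))
                .Append("\" \"")
                .Append(Uri.EscapeDataString(cookie.Value))
                .Append('"');
        }
        return args.ToString();
    }
}
```

The resulting string would be placed in front of the url in the process arguments. Keep in mind that command line arguments can be visible to other processes on the server, so this is a trade-off to weigh before forwarding an authentication cookie this way.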

Software Engineer
Erik Nelsestuen

Passionate about timely delivery of elegant, object-oriented, test-driven web apps, balanced with a diet of mountain sports and plenty of fruits and vegetables.

Comments (1)

  1. Jim Ingwersen

    Jan 31, 2013

    @ 9:30 am

    Thanks for this!