
A Simple CSS Parser

24 Feb 2012
A simple CSS parser designed to work with iTextSharp for HTML to PDF generation

Introduction

Cascading Style Sheets (CSS) let developers style web user interfaces, and they are easy to build, use, and maintain. iTextSharp can take advantage of CSS in its built-in HTML to PDF functionality. To get the style information from a style sheet into iTextSharp, the developer has to read the CSS file and convert it to a Dictionary that iTextSharp can consume. This article illustrates a simple solution for that task. The accompanying solution includes unit tests and an ASP.NET project that demonstrate how to use the CSSParser.

Background

While working on an HTML to PDF utility, I found that I needed to parse Cascading Style Sheets. There are many CSS parsers on the internet, but none fit my needs, so I created this simple regular-expression-based CSS parser in C# to facilitate PDF generation in iTextSharp. The requirements for the CSS parser are as follows:

Requirements

  1. Read a CSS file
  2. Store CSS in a Collection
  3. Query for the classes and their properties
  4. Query for the elements and their properties
  5. Easy to maintain and enhance
  6. Easily feed the style information into iTextSharp to turn HTML into PDF
  7. It should be lean
  8. Something another developer can use

Using the code

The CSSParser inherits from a generic List of KeyValuePair. The key of each entry is the CSS selector, and the value is another list of key/value pairs in which the key is the CSS property name and the value is the CSS property value. I used a generic List instead of a Dictionary because a Cascading Style Sheet can list the same selector or property more than once.

public partial class CSSParser : List<KeyValuePair<String,List<KeyValuePair<String,String>>>>, ICSSParser
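
To make the shape of that collection concrete, here is a minimal sketch of the same nested list-of-pairs structure holding a selector that appears twice. The class StyleListSketch and its Build, Find, Decl, and Rule helpers are hypothetical names used only for illustration; they are not part of the CSSParser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the collection shape CSSParser inherits from.
public static class StyleListSketch
{
    public static List<KeyValuePair<string, List<KeyValuePair<string, string>>>> Build()
    {
        var rules = new List<KeyValuePair<string, List<KeyValuePair<string, string>>>>();

        // ".note" appears twice; a List keeps both entries, in source order.
        rules.Add(Rule(".note", Decl("color", "red")));
        rules.Add(Rule("p", Decl("margin", "0")));
        rules.Add(Rule(".note", Decl("color", "blue"), Decl("font-weight", "bold")));
        return rules;
    }

    // Returns every declaration list recorded for a selector, in source order.
    public static List<List<KeyValuePair<string, string>>> Find(
        List<KeyValuePair<string, List<KeyValuePair<string, string>>>> rules, string selector)
    {
        return rules.Where(r => r.Key == selector).Select(r => r.Value).ToList();
    }

    private static KeyValuePair<string, string> Decl(string name, string value)
    {
        return new KeyValuePair<string, string>(name, value);
    }

    private static KeyValuePair<string, List<KeyValuePair<string, string>>> Rule(
        string selector, params KeyValuePair<string, string>[] declarations)
    {
        return new KeyValuePair<string, List<KeyValuePair<string, string>>>(
            selector, declarations.ToList());
    }
}
```

Calling StyleListSketch.Find(StyleListSketch.Build(), ".note") returns both ".note" blocks, which a Dictionary keyed on the selector could not hold side by side.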

The core of the CSS parser is a regular expression that I found on Stack Overflow (http://stackoverflow.com/a/2694121/899290). Before the CSS is parsed, the CSSComments regular expression removes any CSS comments from the style sheet. The CSSGroups regular expression then breaks the style sheet up into the named groups selector, name, and value.

public const String CSSGroups = @"(?<selector>(?:(?:[^,{]+),?)*?)\{(?:(?<name>[^}:]+):?(?<value>[^};]+);?)*?\}";

//[\s\S] is used instead of '.' so that comments spanning several lines are matched as well
public const String CSSComments = @"(?<!"")\/\*[\s\S]*?\*\/(?!"")";

private Regex rStyles = new Regex(CSSGroups, RegexOptions.IgnoreCase | RegexOptions.Compiled);
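
The following sketch shows how the named groups of CSSGroups can be read back through the Captures collections of the .NET Regex class. The pattern is copied from the constant above; the class CssGroupsSketch and its Describe method are hypothetical helpers, the output format is made up for the example, and comment stripping is left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CssGroupsSketch
{
    // Copied from the CSSGroups constant above.
    private const string CSSGroups =
        @"(?<selector>(?:(?:[^,{]+),?)*?)\{(?:(?<name>[^}:]+):?(?<value>[^};]+);?)*?\}";

    // Returns one "selector|name=value;name=value" line per matched rule.
    public static List<string> Describe(string css)
    {
        var lines = new List<string>();
        foreach (Match rule in Regex.Matches(css, CSSGroups))
        {
            var parts = new List<string>();
            // Each repetition of the inner group adds one capture to "name" and one to "value".
            CaptureCollection names = rule.Groups["name"].Captures;
            CaptureCollection values = rule.Groups["value"].Captures;
            for (int i = 0; i < names.Count; i++)
            {
                string name = names[i].Value.Trim();
                string value = values[i].Value.Trim();
                // Whitespace between the last ';' and '}' can show up as a junk capture,
                // so pairs that are empty after trimming are skipped.
                if (name.Length > 0 && value.Length > 0)
                {
                    parts.Add(name + "=" + value);
                }
            }
            lines.Add(rule.Groups["selector"].Value.Trim() + "|" + string.Join(";", parts));
        }
        return lines;
    }
}
```

For the input ".note { color: red; font-weight: bold; } p { margin: 0 }", Describe should return the two lines ".note|color=red;font-weight=bold" and "p|margin=0". The check after trimming matters: the space between the final ';' and the '}' of the first rule is captured as an extra name/value pair whose value is blank.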

The Read method is responsible for parsing the values in the style sheet and filling the generic List. It uses the .NET Regex class to remove any comments and populate the collection.

public void Read(String CascadingStyleSheet)
{
    this.StyleSheet = CascadingStyleSheet;

    if (!String.IsNullOrEmpty(CascadingStyleSheet))
    {
        //Remove comments before parsing the CSS. Don't want any comments in the collection.
        MatchCollection MatchList = rStyles.Matches(Regex.Replace(CascadingStyleSheet, 
            RegularExpressionLibrary.CSSComments, String.Empty));
        foreach (Match item in MatchList)
        {
            //Check for nulls
            if (item != null && item.Groups != null && 
                item.Groups[SelectorKey] != null && 
                item.Groups[SelectorKey].Captures != null && 
                item.Groups[SelectorKey].Captures[0] != null && 
                !String.IsNullOrEmpty(item.Groups[SelectorKey].Value))
            {
                String strSelector = item.Groups[SelectorKey].Captures[0].Value.Trim();
                var style = new List<KeyValuePair<String,String>>();

                for (int i = 0; i < item.Groups[NameKey].Captures.Count; i++)
                {
                    String className = item.Groups[NameKey].Captures[i].Value;
                    String value = item.Groups[ValueKey].Captures[i].Value;
                    //Check for null values in the properties
                    if (!String.IsNullOrEmpty(className) && !String.IsNullOrEmpty(value))
                    {
                        className = className.TrimWhiteSpace();
                        value = value.TrimWhiteSpace();
                        //One more check to be sure we are only pulling valid css values
                        if (!String.IsNullOrEmpty(className) && !String.IsNullOrEmpty(value))
                        {
                            style.Add(new KeyValuePair<String,String>(className, value));
                        }
                    }
                }
                this.Add(new KeyValuePair<String,List<KeyValuePair<String,String>>>(strSelector, style));
            }
        }
    }
}

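Once the list is populated, it's a simple matter of using LINQ with lambda expressions to pull out the information you need. The Classes and Elements properties expose the contents of the style sheet as nested Dictionary objects that can be fed to iTextSharp: Classes holds the selectors that start with a period (with the period removed), and Elements holds all the others.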

public Dictionary<String, Dictionary<String,String>> Classes
{
    get
    {
        if (classes == null || classes.Count == 0)
        {
            this.classes = this.Where(cl => cl.Key.StartsWith("."))
                .ToDictionary(cl => cl.Key.Trim(new Char[] { '.' }), cl => cl.Value
                    .ToDictionary(p => p.Key, p => p.Value));
        }

        return classes;
    }
}

public Dictionary<String, Dictionary<String,String>> Elements
{
    get
    {
        if (elements == null || elements.Count == 0)
        {
            elements = this.Where(el => !el.Key.StartsWith("."))
                .ToDictionary(el => el.Key, el => el.Value
                    .ToDictionary(p => p.Key, p => p.Value));
        }
        return elements;
    }
}
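
One caveat with these properties: Enumerable.ToDictionary throws an ArgumentException when it meets a duplicate key, so as listed they appear to assume that no selector or property repeats. The sketch below shows one possible way to collapse the list when repeats do occur. It lets the later entry win, which is a simplification of the real cascade (specificity and !important are ignored) and an assumption of this example rather than the parser's behavior. MergeSketch and ToMergedDictionary are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class MergeSketch
{
    // Collapses a rule list into nested dictionaries. When a selector or
    // property repeats, the later entry overwrites the earlier one.
    public static Dictionary<string, Dictionary<string, string>> ToMergedDictionary(
        IEnumerable<KeyValuePair<string, List<KeyValuePair<string, string>>>> rules)
    {
        var merged = new Dictionary<string, Dictionary<string, string>>();
        foreach (var rule in rules)
        {
            Dictionary<string, string> properties;
            if (!merged.TryGetValue(rule.Key, out properties))
            {
                properties = new Dictionary<string, string>();
                merged[rule.Key] = properties;
            }
            foreach (var declaration in rule.Value)
            {
                // The indexer replaces an existing value instead of throwing,
                // which Add and ToDictionary would do for a duplicate key.
                properties[declaration.Key] = declaration.Value;
            }
        }
        return merged;
    }
}
```

Because CSSParser is itself a list of these pairs, a filtered sequence such as the one produced by the Where call in Classes could be passed to ToMergedDictionary in place of the ToDictionary chain.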

Using the CSS Parser

The CSSParser gives you two ways to read a Cascading Style Sheet: from a CSS file or from a string. The ReadCSSFile method reads a CSS file and populates the collection. To parse a String containing CSS, call the Read method or pass the CSS to the constructor.

void lnkParseCSSFile_Click(object sender, EventArgs e)
{
    CSSParser parser = new CSSParser();
    parser.ReadCSSFile(Server.MapPath("~/CSSParserStyle.css"));
    //Display the original CSS with some formatting for the web
    this.divOriginalCSS.InnerHtml = parser.StyleSheet.FixLineBreakForWeb().FixTabsForWeb().FixSpaceForWeb();
    //Display the parsed CSS
    this.divParsedCSS.InnerHtml = parser.ToString();
    this.spnOriginalCSSLength.InnerText = parser.StyleSheet.Length.ToString();
    this.spnParsedCSSLength.InnerText = this.divParsedCSS.InnerHtml.Length.ToString();
}

Points of Interest

The Elements and Classes properties of the CSSParser target iTextSharp version 5.x.

History

  1. Version 1.0 - Initial Release

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
