Writing Custom Markdig Extensions

Richard James Moss

8 Aug 2017, MIT License, 10 min read

How to write custom Markdig extensions

Downloads

Download MarkdigMantisLink.zip - 10.8 KB

Introduction

Markdig, according to its description, "is a fast, powerful, CommonMark compliant, extensible Markdown processor for .NET". While most of our older projects use MarkdownDeep (including an increasingly creaky cyotek.com), current projects use Markdig, and thus far it has proven to be an excellent library.

One of the many overly complicated aspects of cyotek.com is that, in addition to the markdown processing, every single block of content is also run through a byzantine number of regular expressions for custom transforms. When cyotek.com is updated to use Markdig, I definitely don't want these expressions to hang around. Enter Markdig extensions.

Markdig extensions allow you to extend Markdig with additional transforms, including things that don't conform to the CommonMark specification, such as YAML blocks or pipe tables.

using Markdig;

MarkdownPipeline pipeline;
string html;
string markdown;

markdown = "# Header 1";

pipeline = new MarkdownPipelineBuilder()
  .Build();

html = Markdown.ToHtml(markdown, pipeline); // <h1>Header 1</h1>

pipeline = new MarkdownPipelineBuilder()
  .UseAutoIdentifiers() // enable the Auto Identifiers extension
  .Build();

html = Markdown.ToHtml(markdown, pipeline); // <h1 id="header-1">Header 1</h1>

Example of using an extension to automatically generate id attributes for heading elements.

I recently updated our internal crash aggregation system so that it can create MantisBT issues via our MantisSharp library. In these issues, stack traces include the line number or IL offset in the format #<number>. To my vague annoyance, Mantis Bug Tracker treats these as hyperlinks to other issues in the system, in a similar fashion to how GitHub automatically links to issues or pull requests. It did, however, give me the idea of creating a Markdig extension that performs the same function.

Deciding on the Pattern

The first thing you need to do is decide on the markdown pattern that will trigger the extension. Our example is perhaps a bit too basic, as it is a simple #<number>, whereas for other issue systems such as JIRA it would be <string>-<number>. As well as the "body" of the pattern, you also need to consider the characters which surround it. For example, you might only allow white space, or perhaps brackets or braces - whenever I reference a JIRA issue, I tend to surround it with square brackets, e.g. [PRJ-1234].

The other thing to consider is the criteria of the core pattern. Using our example above, should we have a minimum number of digits before triggering, or a maximum? #999999999 is probably not a valid issue number!
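Before writing a parser, it can help to pin the rules down in a form that is quick to experiment with. The sketch below expresses both patterns discussed above as .NET regular expressions. `IssuePatterns` is a hypothetical helper, the six-digit limit is an arbitrary illustration rather than a MantisBT rule, and the extension built in this article does not use regular expressions at all.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for trying out trigger patterns before writing a parser;
// it is not part of Markdig or of the extension described in this article.
public static class IssuePatterns
{
    // Allowed surroundings: start of text, white space, ( or [ before the match,
    // and end of text, white space, ) or ] after it.
    private const string Before = @"(?<=^|[\s(\[])";
    private const string After = @"(?=$|[\s)\]])";

    // #<number>, limited to 1-6 ASCII digits (an arbitrary choice), so
    // something like #999999999 is not treated as an issue reference.
    public static readonly Regex Mantis =
        new Regex(Before + @"#(?<id>[0-9]{1,6})" + After);

    // <string>-<number>, e.g. PRJ-1234: an upper-case project key, a hyphen
    // and the issue number.
    public static readonly Regex Jira =
        new Regex(Before + @"(?<key>[A-Z][A-Z0-9]*)-(?<id>[0-9]{1,6})" + After);
}
```

Once the regular expressions behave the way you want, the same rules can be translated into the character-by-character checks an inline parser needs.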

Extension Components

A Markdig extension is made up of a few moving parts. Depending on how complicated your extension is, you may not need all of them, or could perhaps reuse existing ones.

The extension itself (always required)
A parser
A renderer
An object used to represent data in the abstract syntax tree (AST)
An object used to configure the extension functionality

In this plugin, I'll be demonstrating all of these parts.

Happily enough, Markdig already has a built-in extension for rendering JIRA links, which, along with Dave Clarke's original MarkdigJiraLinker extension, made a great starting point. As I mentioned at the start, Markdig has a lot of extensions, some simple, some complex - there's going to be a fair chunk of useful code in there to help you with your own.

Supporting Classes

I'm actually going to create the components in the reverse order of the list above, as each one depends on those created before it, and it would make for awkward reading if I were referencing things that don't yet exist.

To get started with some actual code, I'm going to need a couple of supporting classes: an options object for configuring the extension (at the bare minimum, we need to supply the base URI of a MantisBT installation), and a class to represent a link in the AST.

First, the options class. As well as that base URI, I'll also add an option that determines whether the generated links should open in a new window, via the target attribute.

public class MantisLinkOptions
{
  public MantisLinkOptions()
  {
    this.OpenInNewWindow = true;
  }

  public MantisLinkOptions(string url)
    : this()
  {
    this.Url = url;
  }

  public MantisLinkOptions(Uri uri)
    : this()
  {
    this.Url = uri.OriginalString;
  }

  public bool OpenInNewWindow { get; set; }

  public string Url { get; set; }
}
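The renderer later in this article builds each link by appending view.php?id= directly onto Url, so a base URL without a trailing slash would produce a broken link. The options class is a natural place to guard against that. The following is a minimal sketch under that assumption; `MantisUrls`, `NormaliseBaseUrl` and `BuildIssueUrl` are hypothetical names that are not part of the sample project.

```csharp
using System;

// Hypothetical helper, not part of the article's code. Appending
// "view.php?id=" to "https://example.com/mantis" would give
// "https://example.com/mantisview.php?id=1", so ensure a trailing slash first.
public static class MantisUrls
{
    public static string NormaliseBaseUrl(string url)
    {
        if (url == null)
        {
            throw new ArgumentNullException(nameof(url));
        }

        return url.EndsWith("/", StringComparison.Ordinal) ? url : url + "/";
    }

    public static string BuildIssueUrl(string baseUrl, string issueNumber)
    {
        // Escape the issue number in case anything other than digits slips through
        return NormaliseBaseUrl(baseUrl) + "view.php?id=" + Uri.EscapeDataString(issueNumber);
    }
}
```

Calling `NormaliseBaseUrl` from the `Url` setter of `MantisLinkOptions` would be one way to apply this to every link without touching the renderer.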

Next up is the object which will represent our link in the syntax tree. Much like HTML elements, Markdig nodes come in two flavours - block and inline. In this article, I'm only covering simple inline nodes.

I'm going to inherit from LeafInline and add a single property to hold the Mantis issue number.

There is actually a more specific LinkInline element which is probably a much better choice to use (as it also means you shouldn't need a custom renderer). However, I'm doing this example the "long way" so that when I move onto the more complex use cases I have for Markdig, I have a better understanding of the API.

[DebuggerDisplay("#{" + nameof(IssueNumber) + "}")]
public class MantisLink : LeafInline
{
  public StringSlice IssueNumber { get; set; }
}

String vs StringSlice

In the above class, I'm using the StringSlice struct offered by Markdig. You can use a normal string if you wish (or any other type for that matter), but StringSlice was specifically designed for Markdig to improve performance and reduce allocations. In fact, that's how I heard of Markdig to start with, when I read Alexandre's comprehensive blog post on the subject last year.
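To make the idea concrete, here is a simplified stand-in for a slice. `TextSlice` is a hypothetical, illustrative type, not Markdig's `StringSlice`. Like `StringSlice`, it keeps a reference to the original text plus a start index and an inclusive end index, so creating one copies no characters. A new string is only allocated when `ToString` is called.

```csharp
// Hypothetical, simplified stand-in for Markdig's StringSlice: it references
// the original text rather than copying it.
public readonly struct TextSlice
{
    public TextSlice(string text, int start, int end)
    {
        Text = text;
        Start = start;
        End = end;
    }

    public string Text { get; }

    public int Start { get; }

    // Inclusive, so a slice covering a single character has Start == End
    public int End { get; }

    public int Length => End - Start + 1;

    // Reads straight from the underlying string; no allocation
    public char this[int index] => Text[Start + index];

    // The only member that allocates a new string
    public override string ToString() => Text.Substring(Start, Length);
}
```

For example, `new TextSlice("See issue #123", 11, 13)` describes the characters "123" without creating a new string until `ToString` is called.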

Creating the Renderer

With the two supporting classes out of the way, I can now create the rendering component. A Markdig renderer takes an element from the AST and spits out some content. Easy enough - we create a class, inherit from HtmlObjectRenderer<T> (where T is the type of your AST node, e.g. MantisLink) and override the Write method. If you are using a configuration class, then creating a constructor to assign it is also a good idea.

public class MantisLinkRenderer : HtmlObjectRenderer<MantisLink>
{
  private MantisLinkOptions _options;

  public MantisLinkRenderer(MantisLinkOptions options)
  {
    _options = options;
  }

  protected override void Write(HtmlRenderer renderer, MantisLink obj)
  {
    StringSlice issueNumber;

    issueNumber = obj.IssueNumber;

    if (renderer.EnableHtmlForInline)
    {
      renderer.Write("<a href=\"").Write(_options.Url).Write("view.php?id=").Write(issueNumber).Write('"');

      if (_options.OpenInNewWindow)
      {
        // _blank (with the underscore) is the keyword for a new window or tab
        renderer.Write(" target=\"_blank\" rel=\"noopener noreferrer\"");
      }

      renderer.Write('>').Write('#').Write(issueNumber).Write("</a>");
    }
    else
    {
      renderer.Write('#').Write(issueNumber);
    }
  }
}

So how does this work? The Write method we're overriding supplies the HtmlRenderer to write to, and the MantisLink object to render.

First, we need to check whether we should be rendering HTML at all by checking the EnableHtmlForInline property. If this is false, then we output plain text, i.e., just the # prefix followed by the issue number.

If we are writing full HTML, then it's a matter of building an HTML a tag with the fully qualified URI generated from the base URI in the options object and the AST node's issue number. We also add a target attribute if the options state that links should open in a new window. If we do add a target attribute, I'm also adding a rel attribute as per MDN guidelines.

Notice how the HtmlRenderer's Write method happily accepts string, char or StringSlice arguments, meaning we can mix and match to suit our purposes.
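Because the renderer is just a sequence of writes, its output is easy to preview without building a pipeline. The sketch below mirrors those Write calls using a StringBuilder. `MantisLinkPreview` is a hypothetical helper. It uses the `_blank` keyword for the new-window case, and, unlike the renderer above, it HTML-encodes the href value, which would matter if a configured URL ever contained a quote or an ampersand.

```csharp
using System.Net;
using System.Text;

// Hypothetical mirror of MantisLinkRenderer.Write that uses a StringBuilder in
// place of Markdig's HtmlRenderer, so the markup can be checked in isolation.
public static class MantisLinkPreview
{
    public static string Render(string baseUrl, string issueNumber, bool openInNewWindow, bool enableHtml)
    {
        var sb = new StringBuilder();

        if (!enableHtml)
        {
            // Plain text: just the # prefix and the number
            return sb.Append('#').Append(issueNumber).ToString();
        }

        sb.Append("<a href=\"")
          .Append(WebUtility.HtmlEncode(baseUrl + "view.php?id=" + issueNumber))
          .Append('"');

        if (openInNewWindow)
        {
            sb.Append(" target=\"_blank\" rel=\"noopener noreferrer\"");
        }

        return sb.Append(">#").Append(issueNumber).Append("</a>").ToString();
    }
}
```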

Creating the Parser

With rendering out of the way, it's time for the most complex part of creating an extension - parsing it from a source document. For that, we need to inherit from InlineParser and override the Match method, as well as set up the characters that trigger the parse routine - that single # character in our example.

public class MantisLinkInlineParser : InlineParser
{
  private static readonly char[] _openingCharacters =
  {
    '#'
  };

  public MantisLinkInlineParser()
  {
    this.OpeningCharacters = _openingCharacters;
  }

  public override bool Match(InlineProcessor processor, ref StringSlice slice)
  {
    bool matchFound;
    char previous;

    matchFound = false;

    previous = slice.PeekCharExtra(-1);

    if (previous.IsWhiteSpaceOrZero() || previous == '(' || previous == '[')
    {
      char current;
      int start;
      int end;

      slice.NextChar();

      current = slice.CurrentChar;
      start = slice.Start;
      end = start;

      while (current.IsDigit())
      {
        end = slice.Start;
        current = slice.NextChar();
      }

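      // require at least one digit, followed by white space, the end of the text or a closing bracket
      if (slice.Start > start && (current.IsWhiteSpaceOrZero() || current == ')' || current == ']'))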
      {
        int inlineStart;

        inlineStart = processor.GetSourcePosition(slice.Start, out int line, out int column);

        processor.Inline = new MantisLink
                            {
                              Span =
                              {
                                Start = inlineStart,
                                End = inlineStart + (end - start) + 1
                              },
                              Line = line,
                              Column = column,
                              IssueNumber = new StringSlice(slice.Text, start, end)
                            };

        matchFound = true;
      }
    }

    return matchFound;
  }
}

In the constructor, we set the OpeningCharacters property to a character array. When Markdig is parsing inline content and comes across any of the characters in this array, it will call the parser's Match method.
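Here is a greatly simplified model of what the opening characters provide. The parser is only consulted at positions that hold one of those characters, so every # in the text, including the one in "C#", becomes a candidate that Match must then accept or reject. `OpeningCharDispatch` is a hypothetical illustration of the idea, not Markdig's implementation.

```csharp
using System.Collections.Generic;

// Hypothetical model of opening-character dispatch: list the positions at which
// a parser with the given opening characters would be asked to match.
public static class OpeningCharDispatch
{
    public static List<int> CandidatePositions(string text, char[] openingCharacters)
    {
        var triggers = new HashSet<char>(openingCharacters);
        var positions = new List<int>();

        for (int i = 0; i < text.Length; i++)
        {
            if (triggers.Contains(text[i]))
            {
                positions.Add(i);
            }
        }

        return positions;
    }
}
```

For "#1 and C# 2", both the leading # and the # in "C#" are candidates, which is why the previous-character check below is needed.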

This neatly leads us onto the meat of this class - overriding the Match method. Here, we scan the source document and try to build up our node. If we're successful, we update the processor and let Markdig handle the rest.

We know the current character is going to be # as this is our only supported opener. However, we need to check the previous character to make sure that we try to process a distinct entity, and not a # character that happens to be in the middle of another string.

previous = slice.PeekCharExtra(-1);

if (previous.IsWhiteSpaceOrZero() || previous == '(' || previous == '[')

Here, I use an extension method exposed by Markdig to check whether the previous character was either whitespace, or nothing at all, i.e., the start of the document. I'm also checking for ( or [ characters in case the issue number has been wrapped in round or square brackets.

If we pass this check, then it's time to parse the issue number. First, we advance the character stream (to discard the # opener) and initialize the values needed to create the final StringSlice if we're successful.

slice.NextChar();

current = slice.CurrentChar;
start = slice.Start;
end = start;

As our GitHub/MantisBT issue numbers are just that, plain numbers, we simply keep advancing the stream until we run out of digits.

while (current.IsDigit())
{
  end = slice.Start;
  current = slice.NextChar();
}

As I'm going to work exclusively with the StringSlice struct, I'm only recording where the new slice will end. Even if you wanted to use a more traditional string, it probably makes sense to keep the above construct and then build your string at the end.

Once we've run out of digits, we now essentially do a reverse of the check we made at the start - now we want to see if the next character is white space, the end of the stream, or a closing bracket/brace.

if (current.IsWhiteSpaceOrZero() || current == ')' || current == ']')

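I didn't add a check for this, but potentially you should also look for a matching pair - so if a bracket was used at the start, the corresponding closing bracket should be present at the end. It's also worth making sure that at least one digit was actually read; otherwise a lone # with white space on either side (as in "Press # to continue") passes both checks and is turned into a bogus link. Checking that slice.Start has moved past start before accepting the match is one way to handle that.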
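These rules can be tried out on plain strings without building a pipeline. The following hypothetical `MantisScanner` reworks the Match logic over a string and an index. On top of the article's rules, it requires at least one digit and, when the reference opens with ( or [, requires the matching closer. With no opening bracket it keeps the original behaviour. It accepts ASCII digits only to keep things simple.

```csharp
// Hypothetical string-based rework of MantisLinkInlineParser.Match, for
// experimenting with the rules outside Markdig.
public static class MantisScanner
{
    // Returns the digits of a "#<number>" reference whose '#' is at text[index],
    // or null if the reference or its surrounding characters don't qualify.
    public static string TryMatch(string text, int index)
    {
        if (text == null || index < 0 || index >= text.Length || text[index] != '#')
        {
            return null;
        }

        char previous = index > 0 ? text[index - 1] : '\0';

        if (previous != '\0' && !char.IsWhiteSpace(previous) && previous != '(' && previous != '[')
        {
            return null; // e.g. the '#' in "C#"
        }

        int start = index + 1;
        int position = start;

        while (position < text.Length && text[position] >= '0' && text[position] <= '9')
        {
            position++;
        }

        if (position == start)
        {
            return null; // a lone '#' with no digits
        }

        char next = position < text.Length ? text[position] : '\0';
        bool validEnd;

        if (previous == '(')
        {
            validEnd = next == ')'; // opened with ( so require )
        }
        else if (previous == '[')
        {
            validEnd = next == ']'; // opened with [ so require ]
        }
        else
        {
            // Same rule as the article's parser when no bracket was opened
            validEnd = next == '\0' || char.IsWhiteSpace(next) || next == ')' || next == ']';
        }

        return validEnd ? text.Substring(start, position - start) : null;
    }
}
```

Keeping the original rule when no bracket was opened means text such as "(see #12)" still produces a link, because the # itself is preceded by a space.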

Assuming this final check passes, that means we have a valid #<number> sequence, and so we create a new MantisLink object with the IssueNumber property populated with a brand new string slice. We then assign this new object to the Inline property of the processor.

inlineStart = processor.GetSourcePosition(slice.Start, out int line, out int column);

processor.Inline = new MantisLink
                    {
                      Span =
                      {
                        Start = inlineStart,
                        End = inlineStart + (end - start) + 1
                      },
                      Line = line,
                      Column = column,
                      IssueNumber = new StringSlice(slice.Text, start, end)
                    };

I'm not sure if the Line and Column properties are used directly by Markdig, or if they are only for debugging or advanced AST scenarios. I'm also uncertain what the purpose of setting the Span property is - even though I based this code on the code from the Markdig repository, it doesn't seem to quite match up should I print out its contents. This leaves me wondering if I'm setting the wrong values. So far, I haven't noticed any adverse effects though.
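One likely reason for the mismatch is that GetSourcePosition is passed slice.Start after the digit loop has run, so the span begins at the character following the number rather than at the #. The sketch below works through the arithmetic. It is hypothetical (`SpanCheck` is an illustrative name) and assumes that End is inclusive, as with Markdig's SourceSpan, and that source positions equal string indices, which holds for a single-line document.

```csharp
// Hypothetical arithmetic check of the span calculation for "#<digits>".
public static class SpanCheck
{
    // As in the full listing: the position is taken after the digit loop,
    // i.e. at the character following the number.
    public static (int Start, int End) AsWritten(int afterDigits, int start, int end)
    {
        return (afterDigits, afterDigits + (end - start) + 1);
    }

    // Capturing the index of '#' before the first NextChar call, and passing
    // that to GetSourcePosition (openerSource), covers exactly "#<digits>".
    public static (int Start, int End) FromOpener(int openerSource, int openerIndex, int end)
    {
        return (openerSource, openerSource + (end - openerIndex));
    }

    // The text covered by an inclusive span, or null if it runs off the end.
    public static string Covered(string text, (int Start, int End) span)
    {
        return span.End < text.Length
            ? text.Substring(span.Start, span.End - span.Start + 1)
            : null;
    }
}
```

For "See issue #123", the # is at index 10, the digits occupy indices 11 to 13 and the loop stops at 14. The listing's formula gives a span of 14 to 17, which runs past the end of the text, whereas starting from the # gives 10 to 13, exactly "#123". Storing slice.Start in a local before calling slice.NextChar() would be one way to apply this in the parser.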

Creating the Extension

Finally, there's the core extension itself. Markdig extensions implement the IMarkdownExtension interface. This simple interface exposes two overloads of a Setup method for configuring the parsing and rendering aspects of the extension.

One of these overloads is for customising the pipeline - we'll add our parser here. The second overload is for setting up the renderer. Depending on the nature of your extension, you may only need one or the other.

As this class is responsible for creating any renderers or parsers your extension needs, it also needs access to any configuration classes it has to pass down to them.

public class MantisLinkerExtension : IMarkdownExtension
{
  private readonly MantisLinkOptions _options;

  public MantisLinkerExtension(MantisLinkOptions options)
  {
    _options = options;
  }

  public void Setup(MarkdownPipelineBuilder pipeline)
  {
    OrderedList<InlineParser> parsers;

    parsers = pipeline.InlineParsers;

    if (!parsers.Contains<MantisLinkInlineParser>())
    {
      parsers.Add(new MantisLinkInlineParser());
    }
  }

  public void Setup(MarkdownPipeline pipeline, IMarkdownRenderer renderer)
  {
    HtmlRenderer htmlRenderer;
    ObjectRendererCollection renderers;

    htmlRenderer = renderer as HtmlRenderer;
    renderers = htmlRenderer?.ObjectRenderers;

    if (renderers != null && !renderers.Contains<MantisLinkRenderer>())
    {
      renderers.Add(new MantisLinkRenderer(_options));
    }
  }
}

Firstly, I make sure the constructor accepts an argument of the MantisLinkOptions class to pass to the renderer.

In the Setup overload that configures the pipeline, I first check to make sure the MantisLinkInlineParser parser isn't already present; if not I add it.

In a very similar fashion, in the Setup overload that configures the renderer, I first check to see if an HtmlRenderer was provided - after all, you could be using a custom renderer which isn't HTML based. If I do have an HtmlRenderer, then I perform a similar check to make sure a MantisLinkRenderer instance isn't already present, and if not, I create one using the provided options class and add it.
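Both Setup overloads, and the UseMantisLinks method that follows, rely on the same "check, then add" idiom. The following hypothetical `Registration.AddIfMissing` shows that idiom on an ordinary list. It is only similar in spirit to Markdig's OrderedList and is not its implementation. Because the lookup is by type, repeated calls leave a single instance in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the "check, then add" registration idiom.
public static class Registration
{
    public static bool AddIfMissing<TBase, TElement>(IList<TBase> items, Func<TElement> factory)
        where TElement : TBase
    {
        if (items.OfType<TElement>().Any())
        {
            return false; // already registered; the factory is not called
        }

        items.Add(factory());
        return true;
    }
}
```

One consequence worth knowing: because the check is by type, calling UseMantisLinks a second time with different options leaves the first MantisLinkerExtension, and therefore its options, in place.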

Adding an Initialisation Extension Method

Although you could register extensions by directly manipulating the Extensions property of a MarkdownPipelineBuilder, Markdig extensions generally include an extension method which performs the boilerplate of checking for and adding the extension. The extension method below checks to see if the MantisLinkerExtension has been registered with a given pipeline, and if not, adds it with the specified options.

// Extension methods must live in a non-generic static class; the class name is arbitrary
public static class MantisLinkerExtensions
{
  public static MarkdownPipelineBuilder UseMantisLinks(this MarkdownPipelineBuilder pipeline, MantisLinkOptions options)
  {
    OrderedList<IMarkdownExtension> extensions;

    extensions = pipeline.Extensions;

    if (!extensions.Contains<MantisLinkerExtension>())
    {
      extensions.Add(new MantisLinkerExtension(options));
    }

    return pipeline;
  }
}

Using the Extension

MarkdownPipeline pipeline;
string html;
string markdown;

markdown = "See issue #1";

pipeline = new MarkdownPipelineBuilder()
  .Build();

html = Markdown.ToHtml(markdown, pipeline); // <p>See issue #1</p>

pipeline = new MarkdownPipelineBuilder()
  .UseMantisLinks(new MantisLinkOptions("https://issues.cyotek.com/"))
  .Build();

html = Markdown.ToHtml(markdown, pipeline); // <p>See issue <a href="https://issues.cyotek.com/view.php?id=1" target="_blank" rel="noopener noreferrer">#1</a></p>

Example of using an extension to automatically generate links for MantisBT issue numbers.

Wrapping Up

In this article, I showed how to introduce new inline elements parsed from markdown. This example, at least, was straightforward; however, there is more that can be done. More advanced extensions, such as pipe tables, have much more complex parsers that generate a complete AST of their own.

Markdig supports other ways to extend itself too. For example, the Auto Identifiers extension shown at the start of the article doesn't parse markdown, but instead manipulates the AST even as it is being generated. The Emphasis Extra extension injects itself into another extension to add more functionality to it. There appear to be quite a few ways you can hook into the library in order to add your own custom functionality!

A complete sample project can be downloaded using the link at the top of this article, or from the GitHub page for the project.

Although I wrote this example with Mantis Bug Tracker in mind, it wouldn't take very much effort at all to make it cover innumerable other websites.

History

05/08/2017 - First published on cyotek.com
06/08/2017 - Updated
07/08/2017 - Published on CodeProject

License

This article, along with any associated source code and files, is licensed under The MIT License