I'm currently working on a project which will be a developer/tech community. I am building out a framework for users to post tutorials and general blog-type postings, and decided to use Markdown syntax on all user-created content.
If you don't know what Markdown is, it is a simple and efficient syntax for marking up text to HTML. It has become popular throughout the developer community, largely due to it's use on StackOverflow and GitHub.
Since the specification of Markdown was sort of defined in an ad-hoc manner by John Gruber, there have been several subtly different implementations of it. Jeff Atwood is proposing to standardize it, but whether or not that will happen any time soon is unknown.
Anyway, going against the very direction that Jeff Atwood is calling for above (sorry Jeff), I realized that a common desire of mine is to embed a jsFiddle into a post. Such a feature has been discussed for use in StackOverflow multiple times, and has had a fairly positive reception (first post got 67 upvotes...)
If you don't know what jsFiddle is, and you are a web developer, you are missing out. jsFiddle is a purely awesome service that makes communicating with code snippets significantly easier.
With Fully-capable markdown, embedding fiddles is somewhat simple, since full HTML syntax is allowed. However, on sites like StackOverflow and GitHub, it would be unwise to allow full HTML support. This would technically allow users to embed whatever nasty javascript they wanted onto a certain page, and because of this concern, one usually decides to whitelist only certain HTML elements so that a user cannot post any malicious content.
Of course, it seems clear that <iframe>
HTML elements should not be included in the whitelisted group of elements. I sought out, however, to include a simple syntax for including embedded jsFiddle iframes into the markdown specification to make writing about javascript, HTML, or CSS that much easier (not to mention interactive).
Convention
The simplest convention I could come up with was the following:
$[http:
$[http:
In addition, following similar convention to links and images, referencing the link by id will also be valid:
$[1]
$[1][result,js,css]
[1]: http:
This would result in the following iframes being rendered as the following:
<iframe src="http://jsfiddle.net/uzMPU/embedded/" allowfullscreen="allowfullscreen" frameborder="0"></iframe>
<iframe src="http://jsfiddle.net/uzMPU/embedded/result,js,css/" allowfullscreen="allowfullscreen" frameborder="0"></iframe>
(In reality, I have modified this slightly, and included a style tag: width: 100%; height: 300px;
by default)
In addition to the $[]
syntax, I have made it a requirement that the $[]
be entirely on it's own line. As a result, writing the following:
This is an inline $[http:
will not result in an iframe being rendered.
Implementation
Since I am working in .Net, I am using MarkdownSharp to process the markdown into HTML, and also using the same Sanitizer that (I think) is used on StackOverflow. Uncommon to most parsers, MarkdownSharp is implemented largely using Regular Expressions. I tried to follow similar conventions as were being used in the MarkdownSharp project, but I simply added the following code into the Markdown.cs
file:
private static Regex _fiddlesRef = new Regex(@"
^ # must be at start of line
( # wrap whole match in $1
\$\[
([^\]]*?) # id = $2, match anything except for endbracket
\]
(
\[
([a-zA-Z,]*) # tab options in $4
\]
)? # tab options are optional
)$ # must be entire line
", RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline | RegexOptions.Compiled);
private string DoFiddles(string text)
{
return _fiddlesRef.Replace(text, new MatchEvaluator(FiddleEvaluator));
}
private string FiddleEvaluator(Match match)
{
string idOrLink = match.Groups[2].Value;
string tabOptions = match.Groups[4].Length > 0 ? match.Groups[4].Value : null;
string linkID = idOrLink.ToLowerInvariant();
string url = _urls.ContainsKey(linkID) ? _urls[linkID] : idOrLink;
url = EncodeProblemUrlChars(url);
url = EscapeBoldItalic(url);
return CreateFiddle(url, tabOptions);
}
private static string CreateFiddle(string url, string tabOptions = null)
{
url = CreateFiddleUrl(url, tabOptions);
if (string.IsNullOrEmpty(url))
{
return "";
}
return String.Format(
@"<iframe style=""width: 100%; height: 300px"" src=""{0}"" allowfullscreen=""allowfullscreen"" frameborder=""0""></iframe>",
url
);
}
I wanted it to be simple to add a fiddle, and one way of making it simple is to make it so that I could be on any fiddle page in my browser, and simply Copy-and-paste the current browser url into a bracketed $[]
and have it work. In order to properly embed the fiddle, it is required that the url have the /embedded/
flag at the end of the url. In addition to wanting this to handle a default fiddle URL without the /embedded/
flag, I thought it would be good to make sure the URL was a proper fiddle URL, and not some malicious URL which could produce an unsafe iframe. This magic currently happens in the CreateFiddleUrl(url, tabOptions);
method.
public static string CreateFiddleUrl(string originalUrl, string tabOptions = null)
{
var match = _fiddlesUrl.Match(originalUrl);
if (!match.Success)
{
return "";
}
return string.Format(
@"http://jsfiddle.net/{0}/{1}/{2}{3}{4}{5}",
match.Groups[5].Value,
match.Groups[6].Value,
match.Groups[7].Length > 0 ? match.Groups[7].Value + "/" : "",
"embedded/",
tabOptions != null ?
tabOptions + "/" :
(match.Groups[9].Length > 0 ?
match.Groups[9].Value + "/" :
""),
match.Groups[10].Value
);
}
private static Regex _fiddlesUrl = new Regex(@"
^ # url starts string
( # wrap whole match in $1
(https?://)? # optional http:// or https:// in $2
(www\.)? # allows www. even though jsfiddle doesn't use it in $3
(jsfiddle\.net) # required jsfiddle.net in $4
/([a-zA-Z0-9_]*) # required username of fiddle creator in $5
/([a-zA-Z0-9]*) # required fiddle ID in $6
/([0-9]*)? # optional fiddle version number in $7
/?(embedded)? # embedded option - not required. will add manually if not present in $8
/?([a-zA-Z,]*)? # fiddle tabs options csv list of tabs in $9
/?(.*) # other options are possible, put them all in $10
)
", RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);
Thus, the following urls are all valid:
Markdown.CreateFiddleUrl("http://jsfiddle.net/username/AbCdEf/")
Markdown.CreateFiddleUrl("http://jsfiddle.net/AbCdEf/")
Markdown.CreateFiddleUrl("jsfiddle.net/AbCdEf/result,js/")
Markdown.CreateFiddleUrl("http://jsfiddle.net/AbCdEf/embedded/")
Markdown.CreateFiddleUrl("http://jsfiddle.net/AbCdEf/", "result,js")
Now I simply hook this into the main RunBlockGamut
method:
private string RunBlockGamut(string text, bool unhash = true)
{
text = DoHeaders(text);
text = DoHorizontalRules(text);
text = DoLists(text);
text = DoCodeBlocks(text);
text = DoBlockQuotes(text);
text = DoFiddles(text);
text = HashHTMLBlocks(text);
text = FormParagraphs(text, unhash: unhash);
return text;
}
and all is well!
Thus, the result is - by me writing the following in markdown on this current post:
$[http:
results in the following fiddle being embedded like so:
<iframe style="width: 100%; height: 300px" src="http://jsfiddle.net/nGE9q/embedded/result,html,css/" allowfullscreen="allowfullscreen" frameborder="0"></iframe>
Sanitization
If one is interested in including this syntax into the markdown spec, but still ensuring that safe HTML is always produced, one can use the following Sanitization code (borrowed and extended from here):
private static Regex _tags = new Regex("<[^>]*(>|$)",
RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled);
private static Regex _whitelist = new Regex(@"
^</?(b(lockquote)?|code|d(d|t|l|el)|em|h(1|2|3)|i|kbd|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)>$|
^<(b|h)r\s?/?>$",
RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
private static Regex _whitelist_a = new Regex(@"
^<a\s
href=""(\#\d+|(https?|ftp)://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+)""
(\stitle=""[^""<>]+"")?\s?>$|
^</a>$",
RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
private static Regex _whitelist_img = new Regex(@"
^<img\s
src=""https?://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+""
(\swidth=""\d{1,3}"")?
(\sheight=""\d{1,3}"")?
(\salt=""[^""<>]*"")?
(\stitle=""[^""<>]*"")?
\s?/?>$",
RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
private static Regex _whitelist_fiddle = new Regex(@"
^<iframe
(\sstyle=""width:\s?\d{1,3}(%|px);\s?height:\s?\d{1,3}(%|px)"")?
(\ssrc=""http://jsfiddle.net/[a-zA-Z0-9,_/]*"")
(\sallowfullscreen=""allowfullscreen"")?
(\sframeborder=""\d"")?
>$|^</iframe>$
",
RegexOptions.Multiline | RegexOptions.ExplicitCapture | RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
public static string Sanitize(string html)
{
if (String.IsNullOrEmpty(html)) return html;
string tagname;
Match tag;
MatchCollection tags = _tags.Matches(html);
for (int i = tags.Count - 1; i > -1; i--)
{
tag = tags[i];
tagname = tag.Value.ToLowerInvariant();
if (!(_whitelist.IsMatch(tagname) || _whitelist_a.IsMatch(tagname) || _whitelist_img.IsMatch(tagname) || _whitelist_fiddle.IsMatch(tagname)))
{
html = html.Remove(tag.Index, tag.Length);
System.Diagnostics.Debug.WriteLine("tag sanitized: " + tagname);
}
}
return html;
}
Synopsis
At the end of the day - I want to be clear: I am not necessarily proposing that this become a markdown standard. Similar to some additions/deviations from the current "standard" that sites like GitHub have taken, I saw a chance to significantly improve the convenience of an often (or what should be often) task in writing developer-focused content, and I took it. If other people have similar ideas, then perhaps they will read this first and choose to implement it in the same way.
The biggest issue with doing something like this is the markdown syntax becomes dependent on 3rd party services (ie, jsFiddle.net), which is generally not good. At this point, the specification becomes much less portable.
What this does do, however, is makes it a lot easier for a developer to write interactive (and thus, hopefully better) tutorials by including executable and/or editable code snippets. To me, this is worth it.
I would love to hear anyone's opinion on ways to make this particular convention better, or even other things to change about the markdown spec which could help improve it's convenience to the developer in this context.