
The Regular Expression Skidmarklet

12 May 2012, CPOL license, 7 min read
From Pins to Poops, your bookmark bar can do more. Read about the Skidmarklet... a JavaScript Bookmarklet and lessons in RegEx.

Introduction

Regular expressions are hard. Reading about them is confusing and boring at best. Until you need the functionality they provide, it’s hard to understand why they exist. You need to apply them to something you do every day.

Following these lessons, you’ll be taken through the assembly of a Skidmarklet: a JavaScript bookmarklet that uses regular expression matching and replacement to skidmark the crappy parts of any web page. Think about it this way: regular expressions (or regex) are hard, but everybody poops, and we can all relate to that.


Background

Recent forays into fatherhood have revitalized my once rampant infatuation with poop. So let’s make my poop obsession your regular expression!

I developed the poop game when I was 17, working at Blockbuster Video. Don’t worry, it happened in the store, not the toilet. To break up the postal monotony of shelving cassette tapes, I swapped poop for each word in a movie title. Using The Green Mile as an example, there were three potential poop game results:

  1. Poop Green Mile
  2. The Poop Mile
  3. The Green Poop 

One word at a time, I’d laugh at all the possible combinations, and one word at a time customers would distance themselves from my creepy giggles, since this all happened in my head. Poop made a crappy job a little less of a turd. Regular expressions have the power to smear joy and delight, if only we can understand them. So let’s get to know them better by playing the poop game on the internet.

Lesson 1

In this lesson we replace a list of words with another list of words. The code iterates through an array of regular expressions, each using a special token called a word boundary to make sure we match only whole words. That way it replaces only the word "go" and not the letters "go" in the middle of the word "engorge."
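To see the word boundary earn its keep, here is a minimal sketch that runs the same replacement with and without it. The sample sentence and the variable names are made up for illustration and are not part of the Skidmarklet.

```javascript
// Hypothetical sample text and variable names, for illustration only.
var sample = "I go to engorge on gossip";

// With \b on both sides, only the standalone word "go" matches.
var bounded = sample.replace(/\bgo\b/g, "poop");

// Without boundaries, every "go" letter pair matches, even inside other words.
var unbounded = sample.replace(/go/g, "poop");

console.log(bounded);   // I poop to engorge on gossip
console.log(unbounded); // I poop to enpooprge on poopssip
```

The second result is the clogged toilet we are trying to avoid.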

1. Encapsulate

We have to make sure we are only leaving a skidmark and not clogging up the toilet. In JavaScript terms, that means avoiding name collisions with the page we are visiting, and as a general guideline this means wrapping the main entry point in an anonymous function call. All the main functionality is squeezed out of a main function named “skidmark.”

JavaScript
(function() {
        !function skidmark(){
                //Do some crap.
        }();
})(); 
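As a rough illustration of what the wrapper buys us, the sketch below declares a variable inside an anonymous function and then checks that it never reaches the outside scope. The names hiddenPoop and leaked are hypothetical and exist only for this example.

```javascript
// Hypothetical names; this is not part of the Skidmarklet itself.
var leaked = (function () {
    var hiddenPoop = "Poop"; // visible only inside the wrapper
    return hiddenPoop.toLowerCase();
})();

console.log(leaked);            // poop
console.log(typeof hiddenPoop); // undefined
```

Only the value we chose to return escapes, so whatever the host page calls its own variables, ours will not stomp on them.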

2. Define some variables:

JavaScript
/* CONSTANTS */
var POOP = "Poop";
var PATTERNS_TO_GO = [/\bgo\b/g,/\bgoing\b/g,/\bwent\b/g];
var REPLACEMENTS_TO_POOP = [POOP, "Pooping", "Pooped"];

var P_TAGS = ["h2", "h3", "h4", "h5", "h6", "p"];

POOP is a string for the actual term “Poop”, defined as a pseudo-constant because, even though Poop is definitely a constant of life, JavaScript has no rules.

PATTERNS_TO_GO is an array of Regular Expression Patterns, each similarly structured. ‘/\bgo\b/g’. We first find a word boundary, ‘\b’, then the characters ‘go’, then another word boundary, ‘\b’. The global flag, ‘g’, ensures that we get every instance of the match in a string.

REPLACEMENTS_TO_POOP contains corresponding terms for each element of the PATTERNS_TO_GO array. This is the “replacement” for each “match” that the regular expressions will need.

P_TAGS contains a list of tag names covering the paragraphs and every heading except level 1. Like a dog, these are the elements we want to mark.

3. Simplify DOM Selection

The method pickOutUnderwearByTag will return DOM Elements in which to leave our mark. Just like potty training we learn to go on our own, without jQuery.

JavaScript
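function pickOutUnderwearByTag(tags) {

        var underwearSelectors = tags;
        var underwearEls = [];

        //index loop: for...in would also visit anything a host page added to Array.prototype
        for(var i = 0; i < underwearSelectors.length; i++){
                //copy the live HTMLCollection into a plain array
                var els = Array.prototype.slice.call(document.getElementsByTagName(underwearSelectors[i]));
                underwearEls = underwearEls.concat(els);
        }

        return underwearEls;
}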

4. Put it all together:

poopIfYouHaveToGo loops through each entry in the PATTERNS_TO_GO array to find a match somewhere in the underwear elements. Each match is replaced with the corresponding entry from REPLACEMENTS_TO_POOP.

JavaScript
function poopIfYouHaveToGo(){

        //find all paragraph elements
        var sourceEls = pickOutUnderwearByTag(P_TAGS);

        for(var i = 0; i < sourceEls.length; i++){
                var sourceEl = sourceEls[i];
                var searchStr = sourceEl.innerHTML;

                //identify matches of each form of to go
                for(var j = 0; j < PATTERNS_TO_GO.length; j++){

                        var toGo = PATTERNS_TO_GO[j];
                        var toPoop = REPLACEMENTS_TO_POOP[j];

                        //mirror the case of each matched word rather than of the whole element;
                        //the patterns are lower-case only, so this yields lower-case poop
                        //unless they are given the i flag
                        searchStr = searchStr.replace(toGo, function(match){
                                return match.match(/^[A-Z]/) ?
                                        toPoop :
                                        toPoop.toLowerCase();
                        });

                }

                sourceEl.innerHTML = searchStr;

        }
}
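If you want to poke at the matching logic without a browser, the same idea can be sketched against a plain string. This is a variation and not the Skidmarklet code: the function poopString and the lower-case array names are hypothetical, and the patterns carry an added i flag (an assumption on my part) so that a capitalized "Go" is caught too.

```javascript
// Hypothetical DOM-free variant; names and the added "i" flag are assumptions.
var patterns = [/\bgo\b/gi, /\bgoing\b/gi, /\bwent\b/gi];
var replacements = ["Poop", "Pooping", "Pooped"];

function poopString(text) {
    patterns.forEach(function (pattern, j) {
        // The callback receives each matched word, so case is decided per match.
        text = text.replace(pattern, function (match) {
            return /^[A-Z]/.test(match) ?
                replacements[j] :
                replacements[j].toLowerCase();
        });
    });
    return text;
}

console.log(poopString("Go home. We went out and are going to go."));
// Poop home. We pooped out and are pooping to poop.
```

Passing a function as the second argument to replace is what lets each match keep its own capitalization.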

5. Run it

Our first skidmark task is ready to go.

JavaScript
!function skidmark(){
        poopIfYouHaveToGo();
}();

Example 1 - And that’s how we leave skidmarks on the page.

Lesson 2

In Lesson 2 we expand on replacing several words within the text. Instead of using a list of regular expressions to find and replace individual words, we use the regex “or” (alternation) operator, “|”.

1. Define some variables

Okay, it’s only one variable: a regex named POOPY_TERMS that matches any of three words (loaf, duty and business).

JavaScript
var POOPY_TERMS = /\b(loaf|duty|business)\b/g;

2. Define the method

The method poopWhereYouSeeIt runs over the same underwear elements, replacing any near-turd POOPY_TERMS match very literally with POOP.

JavaScript
function poopWhereYouSeeIt() {
        var sourceEls = pickOutUnderwearByTag(P_TAGS);
        for (var i = 0; i< sourceEls.length; i++) {
                sourceEls[i].innerHTML = sourceEls[i].innerHTML.replace(
                        POOPY_TERMS,
                        POOP
                );
        }
}
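Here is a small browser-free sketch of the alternation at work. The sample sentences and variable names are invented for the example; the pattern is the same one as POOPY_TERMS.

```javascript
// Hypothetical sample text; the pattern mirrors POOPY_TERMS.
var poopyTerms = /\b(loaf|duty|business)\b/g;

var first = "Do your duty and mind your business".replace(poopyTerms, "Poop");
var second = "The loafer dropped a loaf".replace(poopyTerms, "Poop");

console.log(first);  // Do your Poop and mind your Poop
console.log(second); // The loafer dropped a Poop
```

One pattern covers all three words, and the word boundaries still keep "loafer" clean.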

3. Run it

I call them like I see ’em, and so too skidmark must poopWhereYouSeeIt.

JavaScript
!function skidmark(){
        poopWhereYouSeeIt();
}();

Example 2 - And that’s how we leave skidmarks on the page.

Lesson 3

Through our shared human experience, craps have found names. The corn poop, the butt pee and the rabbit dropping are a few that come to mind. Poop, like the poop game, is all about titles, so let’s concentrate our regex sphincter on squeezing out some work in the title areas of a web page. To name this page poo we are doing something a bit more, well, dangerous.

Before we knew what words we were replacing.  Now we’ll realize the essence of the poop game by replacing one randomly-picked word in the page title <title> and the main headline element <h1>.  In the previous lessons, if the page did not contain any of our pre-picked words, the page would still not have any poop on it. Now, however, the page can’t escape and we know it will wind up smeared somehow.

1. Define Some Variables

JavaScript
var TEST_CASE = /^[A-Z]/;
var POOP_BOUNDARY = /\b(\S+)\b/g;
var NAME_TAGS = ["title", "h1"];

TEST_CASE is a simple expression that tests whether a string begins with an uppercase letter, A through Z.

POOP_BOUNDARY matches what is considered a "word." In this instance, that is one or more consecutive non-whitespace characters between word boundaries.

NAME_TAGS is an array of the tags we are matching against. It is the spot in your underwear where you might write your name, so as not to lose it.
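To get a feel for what POOP_BOUNDARY considers a word, this short sketch runs it over two made-up strings. The variable names are hypothetical.

```javascript
// Hypothetical sample strings; the pattern mirrors POOP_BOUNDARY.
var poopBoundary = /\b(\S+)\b/g;

var titleWords = "The Green Mile".match(poopBoundary);
var punctuated = "Hello, world!".match(poopBoundary);

console.log(titleWords); // [ 'The', 'Green', 'Mile' ]
console.log(punctuated); // [ 'Hello', 'world' ]
```

Because each match has to start and end at a word boundary, punctuation hanging off the edge of a word gets left behind.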

2. Separate Re-usable Utilities

If you’ve read Clean Code I don’t care about your opinion. It is mine that code is more readable and re-usable if each method has a single purpose. The following methods encapsulate the individual tasks that together accomplish our goal.

JavaScript
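function insertPoopHere(str) {
        var word = randomWord(str);
        //leave the string alone if it holds no words
        return word ? poopInCase(str, word) : str;
}

function poopInCase(str, word) {
        //escape regex metacharacters so the word is matched literally
        var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        return str.replace(
                new RegExp('\\b(' + escaped + ')+\\b'),
                word.match(TEST_CASE) ? POOP : POOP.toLowerCase()
        );
}

function randomWord(str){
        //match returns null when nothing qualifies as a word
        var arr = str.match(POOP_BOUNDARY) || [];
        return arr[Math.floor((Math.random()*arr.length))];
}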

randomWord - Selects a random word from the words found in a string
poopInCase - Case-sensitive replacement of a word with "poop"
insertPoopHere - Picks a random word and replaces it with "poop"

3. Define the method 

JavaScript
function nameYourPoop() {
        var sourceEls = pickOutUnderwearByTag(NAME_TAGS);
        for (var i = 0; i< sourceEls.length; i++) {
                sourceEls[i].innerHTML = insertPoopHere(sourceEls[i].innerHTML);
        }
}

nameYourPoop grabs all the words from within the underwear name tags and picks a random one to replace with “poop”. Two regexes do the work: POOP_BOUNDARY grabs every string match that qualifies as a word, and a second pattern is built on the fly around the chosen word, with POOP as the substitution. Once we have an array of all the words individually, we pick a random one and inject that word into a new JavaScript RegExp object. Here again, the regex pattern sandwiches the word itself between word boundaries so that, if the word is “with”, we won’t also change the middle of the word “wherewithal”.
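The sketch below shows one way to build a pattern around a word at runtime. The helper replaceWholeWord is hypothetical, and it escapes regex metacharacters before building the pattern, so a word like "3.5" is treated as literal text.

```javascript
// Hypothetical helper, for illustration only.
function replaceWholeWord(str, word, replacement) {
    // Escape characters that would otherwise act as regex operators.
    var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // No g flag: only the first whole-word occurrence is replaced.
    return str.replace(new RegExp("\\b" + escaped + "\\b"), replacement);
}

console.log(replaceWholeWord("the wherewithal to go with it", "with", "poop"));
// the wherewithal to go poop it

console.log(replaceWholeWord("x 3b5 3.5", "3.5", "poop"));
// x 3b5 poop
```

Without the escaping step, the dot in "3.5" would match any character and the second call would stain "3b5" instead.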

4. Run it

JavaScript
!function skidmark(){
      nameYourPoop();
}();

Example 3 - And that’s how we leave skidmarks on the page.

Lesson 4

Lesson 3 gave us the insertPoopHere method, applied narrowly to just the page title elements. In this final lesson we will take that and make it a blowout: we are going to replace a word in every sentence of every paragraph throughout the page. In order to do so, we first need to identify a sentence and then stain it.

1. Define a variable

JavaScript
var POOP_SENTENCES = /(\S.+?[.!?])(?=\s+|$)/g;

POOP_SENTENCES is a regex pattern to match the structure of a sentence. In our language, which is called, I believe, English, sentences are predictable: they usually start with a “\S” non-whitespace character, followed by a bunch of other characters and words, ending in a period, exclamation point or question mark that is followed by whitespace or the end of the text. Funny enough, at least a couple of those are operators in regular expressions, but inside a character set they are used literally. That is why [.!?] does not require a “\” escape character for each punctuation mark.
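A quick way to check the pattern is to run it over a couple of made-up strings. The sample text and variable names below are hypothetical; the pattern is the same as POOP_SENTENCES.

```javascript
// Hypothetical sample text; the pattern mirrors POOP_SENTENCES.
var poopSentences = /(\S.+?[.!?])(?=\s+|$)/g;

var found = "Poop happens. Does it? Yes!".match(poopSentences);
var withDecimal = "It costs 3.5 dollars. Cheap!".match(poopSentences);

console.log(found);       // [ 'Poop happens.', 'Does it?', 'Yes!' ]
console.log(withDecimal); // [ 'It costs 3.5 dollars.', 'Cheap!' ]
```

The lookahead is what keeps the period in "3.5" from ending the sentence early, since it is followed by a digit and not by whitespace.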

2. Define the method

JavaScript
/**
* insert poop where you find p's
*/
function poopAndP(){
     var sourceEls = pickOutUnderwearByTag(P_TAGS);
     for (var i = 0; i< sourceEls.length; i++) {
          var text = sourceEls[i].innerHTML;
          var sentences = text.match(POOP_SENTENCES);
          //skip elements that contain no complete sentence
          if(!sentences)
               continue;
          for(var j = 0; j<sentences.length; j++) {
               var sentence = sentences[j];
               var poopySentence = insertPoopHere(sentence);
               text = text.replace(sentence, poopySentence);
          }
          //write the stained text back once every sentence is done
          sourceEls[i].innerHTML = text;
     }
}
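For testing outside a browser, the sentence-by-sentence idea can be sketched on a plain string. This is a variation and not the Skidmarklet code: poopEverySentence, randomIndex and lastIndex are hypothetical names, the word picker is passed in so the result can be made predictable, and it uses a replace callback in place of the match-then-replace loop above.

```javascript
// Hypothetical DOM-free variant with an injectable word picker.
var SENTENCES = /(\S.+?[.!?])(?=\s+|$)/g;
var WORDS = /\b(\S+)\b/g;

function poopEverySentence(text, pickIndex) {
    return text.replace(SENTENCES, function (sentence) {
        var words = sentence.match(WORDS);
        if (!words) return sentence; // nothing to stain
        var word = words[pickIndex(words.length)];
        var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
        var poop = /^[A-Z]/.test(word) ? "Poop" : "poop";
        // Replace the first whole-word occurrence of the chosen word.
        return sentence.replace(new RegExp("\\b" + escaped + "\\b"), function () {
            return poop;
        });
    });
}

// Random picker for real use, fixed picker for predictable output.
function randomIndex(n) { return Math.floor(Math.random() * n); }
function lastIndex(n) { return n - 1; }

console.log(poopEverySentence("The Green Mile is long. I liked it!", lastIndex));
// The Green Mile is poop. I liked poop!
```

Swap lastIndex for randomIndex to get the real poop game, one surprise per sentence.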

Once we have each sentence singled out, we can then pick a random word, just like we did in Lesson 3, and replace it with a poop. poopAndP does just this within the elements of our existing P_TAGS underwear.

3. Run it

JavaScript
/**
 * Smear poop on some underwear
 */
!function skidmark(){
     poopIfYouHaveToGo();
     poopWhereYouSeeIt();
     nameYourPoop();
     poopAndP();
}();

Example 4 - And that’s how we leave skidmarks on the page.

Conclusion

As with any successful pooping endeavor, let's finish things off by lighting a match and possibly igniting a conversation. The goal is to have a bit of fun with the tutorial and leave a mark on you, so say what you think! Leave a comment or flush it back down the intertubes, it's your call. Hopefully this was a somewhat educational diversion from the possibly dull results a typical tutorial search turns up.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)