Coding challenge: bad word filter

Question

4.90/5 (9 votes)

See more:

This is the first of our weekly coding challenges. Once a week we'll post a simple programming problem and the person who posts the best answer wins a T-shirt. The winner is decided by votes, or be comments, or be the audacity of their answer. Use any language in your answer except profanity.

Imagine that you allow users of your website to post comments, but some users, mainly the Australians, get a little colourful with their language. You decided to implement a Bad Word Filter that will replace certain words in a sentence with safer versions of that word.

The list of bad words and their replacements is

Bad: "poophead", "PHB", "gotten"
Replacement: "p**phead", "boss", "become"

So "My PHB is such a poophead. It's gotten worse since his promotion" should be "My boss is such a p**phead. It's become worse since his promotion".

We also have to allow shouting. So

"My PHB is such a POOPHEAD!" should become "My boss is such a P**PHEAD!"

Bonus points:

Let's make it harder. If the "bad" word starts with "*" then it means any word that ends with that word. If it ends with a star then any word starting with that. If it ends with an "!" then it means that it should do the match case sensitive.

Bad words: "poop*", "PHB!", "gotten"
Replacement: "p**p", "boss", "become"

"My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!"

should become

"My boss has started his new blog phblog.com. He's SUCH A MISBEGOTTEN P**PHEAD!"

What I have tried:

Remember: any programming language can be used.

Posted 25-Nov-16 3:23am

Chris Maunder775.6K

Updated 27-Jul-18 8:04am

v2

Add a Solution

Comments

Suvendu Shekhar Giri 25-Nov-16 10:33am

That's really an awesome initiative. Will come up with a solution soon. Thanks Chris :)

PIEBALDconsult 25-Nov-16 10:36am

"My PHB is such a POOPHEAD!" should become "My ~~PHB~~ boss is such a P**PHEAD!"
Right?

Chris Maunder 25-Nov-16 11:19am

Correct, and thanks for pointing that out. Updated

Jochen Arndt 25-Nov-16 11:03am

Deadline for posting answers?

Chris Maunder 25-Nov-16 11:20am

End of the day? Monday? We're pretty relaxed about it, but there's a limit to our attention span.

PIEBALDconsult 25-Nov-16 11:15am

Sooo... if I have a local variable to point to the The Word and I name it tw, that's OK, but I can't have another local variable pointing to The Word After That named twat?

Chris Maunder 25-Nov-16 11:21am

I'm ignoring you.

PIEBALDconsult 25-Nov-16 11:26am

Pffft. It's not like I asked about the First Unresolved Character Kernel.

Chris Maunder 25-Nov-16 12:02pm

If I could upvote comments I would.

Dave Kreskowiak 26-Nov-16 0:56am

Is adding that going to be the next coding "challenge"?

PIEBALDconsult 25-Nov-16 12:09pm

"For Christmas I had gotten a sugar plum fairy."

Chris Maunder 25-Nov-16 12:39pm

You see how horrible and unnecessary that word is? We're doing the world a favour here.

PIEBALDconsult 25-Nov-16 12:56pm

I'm sure he just hasn't gotten around to correcting his bad habit.

Aless Alessio 25-Nov-16 12:10pm

Not clear the distinction of behavior when '!' belongs to a word in the dictionary of the not allowed words (ex.: "PHB!") and when it occurs in the comment (ex.: ".. . POOPHEAD!")

Thanks

PIEBALDconsult 25-Nov-16 13:38pm

I call bullpoop.

Jörgen Andersson 25-Nov-16 14:33pm

What about handling perfectly legal words such as Nincompoop?

PIEBALDconsult 25-Nov-16 19:49pm

Did someone say "poop"?

PIEBALDconsult 26-Nov-16 15:09pm

Maybe you should have made it another "Code Lean and Mean" contest.

Jon McKee 26-Nov-16 19:12pm

Can't wait for another one. I love this sort of stuff =D

Graeme_Grant 28-Nov-16 10:32am

That was interesting... Thanks! Looking forward to the next one! :)

thewazz 28-Nov-16 15:23pm

2 typos in the 1st para [be -> by].

thewazz 28-Nov-16 15:23pm

2 typos in the 1st para [be -> by].

Robert g Blair 30-Nov-16 22:33pm

I enjoyed that. I haven't coded in COBOL for 30 years.

I will do the next one in UCSD Pascal, or RPG III, or maybe FORTRAN IV.
(depending on the problem).

Robert g Blair 1-Dec-16 19:42pm

Chris:

I believe it is appropriate to give an award for "Least Lines of Code".

A very objective metric, and serves the higher purpose of combating code-bloat.

17 solutions

Solution 1

Yay! A chance to use Regular Expressions without summoning the elder gods! :)

Start with a structure to convert a string to a bad word, taking the "bonus points" rules into account:

C#

public struct BadWord
{
    public BadWord(string word)
    {
        if (string.IsNullOrWhiteSpace(word)) throw new ArgumentNullException(nameof(word));
        
        int startIndex = 0;
        int length = word.Length;
        
        // Skip leading / trailing white-space:
        while (length > 0 && char.IsWhiteSpace(word[startIndex]))
        {
            startIndex++;
            length--;
        }
        while (length > 0 && char.IsWhiteSpace(word[startIndex + length - 1]))
        {
            length--;
        }
        
        // If the word ends with "!", then it's a case-sensitive match:
        if (length > 0 && word[startIndex + length - 1] == '!')
        {
            CaseSensitive = true;
            length--;
        }
        else
        {
            CaseSensitive = false;
        }
        
        // If the word ends with "*", filter anything starting with the word:
        if (length > 0 && word[startIndex + length - 1] == '*')
        {
            Suffix = "(?=\\w*\\b)";
            length--;
        }
        else
        {
            Suffix = "\\b";
        }
        
        // If the word starts with "*", filter anything ending with the word:
        if (length > 0 && word[startIndex] == '*')
        {
            Prefix = "(?<=\\b\\w*)";
            startIndex++;
            length--;
        }
        else
        {
            Prefix = "\\b";
        }
        
        Word = length != 0 ? word.Substring(startIndex, length) : null;
    }
    
    public string Word { get; }
    public string Prefix { get; }
    public string Suffix { get; }
    public bool CaseSensitive { get; }
    
    public Regex ToRegularExpression()
    {
        if (string.IsNullOrWhiteSpace(Word)) return null;
        
        string pattern = Prefix + Regex.Escape(Word) + Suffix;
        var options = CaseSensitive ? RegexOptions.ExplicitCapture : RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase;
        return new Regex(pattern, options);
    }
}

Then a class to represent a single bad word and its replacement:
EDIT: Now with the part of the spec the "customer" forgot to mention! :)

C#

public sealed class WordReplacement
{
    public WordReplacement(BadWord word, string replacement)
    {
        if (string.IsNullOrWhiteSpace(word.Word)) throw new ArgumentNullException(nameof(word));
        
        Pattern = word.ToRegularExpression();
        CaseSensitive = word.CaseSensitive;
        Replacement = replacement;
        
        if (CaseSensitive || replacement == null || replacement.Any(char.IsUpper))
        {
            Replacer = (Match m) => Replacement;
        }
        else
        {
            Replacer = (Match m) => MatchCase(m.Value, Replacement);
        }
    }
    
    public WordReplacement(string word, string replacement) : this(new BadWord(word), replacement)
    {
    }
    
    public Regex Pattern { get; }
    public string Replacement { get; }
    public bool CaseSensitive { get; }
    public MatchEvaluator Replacer { get; }
    
    public static string MatchCase(string wordToReplace, string replacement)
    {
        if (null == replacement) return string.Empty;
        if (wordToReplace.All(char.IsLower)) return replacement;
        if (wordToReplace.All(char.IsUpper)) return replacement.ToUpperInvariant();
        
        char[] result = replacement.ToCharArray();
        bool changed = false;
        
        if (wordToReplace.Length == replacement.Length)
        {
            for (int index = 0; index < result.Length; index++)
            {
                if (char.IsUpper(wordToReplace[index]))
                {
                    char c = result[index];
                    result[index] = char.ToUpperInvariant(c);
                    if (result[index] != c) changed = true;
                }
            }
        }
        else
        {
            if (char.IsUpper(wordToReplace[0]))
            {
                char c = result[0];
                result[0] = char.ToUpperInvariant(c);
                if (result[0] != c) changed = true;
            }
            if (char.IsUpper(wordToReplace[wordToReplace.Length - 1]))
            {
                int index = result.Length - 1;
                char c = result[index];
                result[index] = char.ToUpperInvariant(c);
                if (result[index] != c) changed = true;
            }
        }
        
        return changed ? new string(result) : replacement;
    }
    
    public string Replace(string input) => Pattern.Replace(input, Replacer);
}

And finally, a class to represent a list of bad word replacements:

C#

public sealed class Clbuttifier2000
{
    public Clbuttifier2000(IEnumerable<KeyValuePair<string, string>> replacements)
    {
        Replacements = replacements.Select(p => new WordReplacement(p.Key, p.Value)).ToList().AsReadOnly();
    }
    
    public IReadOnlyList<WordReplacement> Replacements { get; }
    
    public string Clbuttify(string message)
    {
        if (!string.IsNullOrWhiteSpace(message))
        {
            foreach (var replacement in Replacements)
            {
                message = replacement.Replace(message);
            }
        }
        
        return message;
    }
}

Sample usage:

C#

var filter = new Clbuttifier2000(new Dictionary<string, string>
{
    ["poop*"] = "p**p",
    ["PHB!"] = "boss",
    ["gotten"] = "become",
});

string input = "My PHB has gotten started on his new blog phblog.com. He's SUCH A MISBEGOTTEN Poophead!";
string expected = "My boss has become started on his new blog phblog.com. He's SUCH A MISBEGOTTEN P**phead!";
string actual = filter.Clbuttify(input);
Debug.Assert(actual == expected);

Posted 25-Nov-16 5:48am

Richard Deeming

Updated 25-Nov-16 6:32am

v3

Comments

Chris Maunder 25-Nov-16 12:00pm

Nicely done. Lots of string allocations though. Now: does it do "Poophead" -> "P**phead"? This wasn't in the specs, but I don't think it does.

Richard Deeming 25-Nov-16 12:02pm

Bl**dy customers! You follow the specs, and then they complain that it doesn't do something they forgot to mention in the specs, that's suddenly the most important part! :)

Richard Deeming 25-Nov-16 12:36pm

Updated to account for the missing requirement. :)

Still lots of string allocations in the Clbuttify loop, so an answer that doesn't use regular expressions might be better. But they should all be gen-0, so you might get away with it.

Chris Maunder 25-Nov-16 12:05pm

You should try writing surveys one day ;)

Your answer solves the problem. The point is to make the question a little loose to allow invention, improvement, and most importantly, pointless religious wars.

Nelek 25-Nov-16 16:56pm

allowing pointless religion wars outside the soapbox? That's going to create a precedent... :rolleyes:

Solution 8

Actually, stating that this challenge is easy is misleading. Handling text is never easy, if you want to work with Unicode. Here is a my take on it.

C#

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

class WordFilter
{
    private static Dictionary<string, string> badWords = null;

    private static List<KeyValuePair<string, string>> badWordsStart = new List<KeyValuePair<string, string>>();
    private static List<KeyValuePair<string, string>> badWordsEnd = new List<KeyValuePair<string, string>>();
    private static List<KeyValuePair<string, string>> badWordsMiddle = new List<KeyValuePair<string, string>>();
    private static Dictionary<string, string> badWordsCaseSensitive = new Dictionary<string, string>();
    private static Dictionary<string, string> badWordsCaseInsensitive = new Dictionary<string, string>();
        
    private static void Init()
    {
        if (badWords != null)
        {
            return;
        }

        badWords = new Dictionary<string, string>()
        {
            {"poop*", "p**p"},
            {"PHB!", "boss"},
            {"gotten", "become"},
            {"*crap*", "taco supreme"}
        };

        foreach (var item in badWords)
        {
            bool startWord = item.Key.EndsWith("*");
            bool endWord = item.Key.StartsWith("*");
            bool caseSensitive = item.Key.EndsWith("!");

            if (startWord && endWord)
            {
                badWordsMiddle.Add(new KeyValuePair<string, string>(item.Key.Trim('*').ToLower(CultureInfo.InvariantCulture), item.Value));
            }
            else if (startWord)
            {
                badWordsStart.Add(new KeyValuePair<string, string>(item.Key.TrimEnd('*').ToLower(CultureInfo.InvariantCulture), item.Value));
            }
            else if (endWord)
            {
                badWordsEnd.Add(new KeyValuePair<string, string>(item.Key.TrimStart('*').ToLower(CultureInfo.InvariantCulture), item.Value));
            }
            else if (caseSensitive)
            {
                badWordsCaseSensitive.Add(item.Key.TrimEnd('!'), item.Value);
            }
            else
            {
                badWordsCaseInsensitive.Add(item.Key.ToLower(CultureInfo.InvariantCulture), item.Value);
            }
        }
    }

    public WordFilter()
    {
        Init();
    }

    public string Filter(string s)
    {
        var word = new StringBuilder();
        var sout = new StringBuilder();

        foreach (var c in s.GetUTF32Chars())
        {
            if (c.IsLetter)
            {
                word.Append(c);
            }
            else 
            {
                if (word.Length > 0)
                {
                    var niceWord = Replace(word.ToString());
                    word.Clear();
                    sout.Append(niceWord);
                }
                sout.Append(c);
            }
        }
        return sout.ToString();
    }

    private string Replace(string word)
    {
        string newWord;
        if (badWordsCaseSensitive.TryGetValue(word, out newWord)) return newWord;

        var lword = word.ToLower(CultureInfo.InvariantCulture);
        if (badWordsCaseInsensitive.TryGetValue(word, out newWord)) return newWord;

        newWord = badWordsStart.Where(it => word.StartsWith(it.Key)).Select(it => word.Replace(it.Key, it.Value)).FirstOrDefault();
        if (newWord != null) return newWord;

        newWord = badWordsEnd.Where(it => word.EndsWith(it.Key)).Select(it => word.Replace(it.Key, it.Value)).FirstOrDefault();
        if (newWord != null) return newWord;

        newWord = badWordsMiddle.Where(it => word.Contains(it.Key)).Select(it => word.Replace(it.Key, it.Value)).FirstOrDefault();
        if (newWord != null) return newWord;

        return word;
    }
}

public static class StringExtensions
{
    public static System.Collections.Generic.IEnumerable<UTF32Char> GetUTF32Chars(this string s)
    {
        var tee = System.Globalization.StringInfo.GetTextElementEnumerator(s);

        while (tee.MoveNext())
        {
            yield return new UTF32Char(s, tee.ElementIndex);
        }
    }
}

public struct UTF32Char
{
    private string s;
    private int index;

    public UTF32Char(string s, int index)
    {
        this.s = s;
        this.index = index;
    }

    public override string ToString()
    {
        return char.ConvertFromUtf32(this.UTF32Code);
    }

    public int UTF32Code {  get { return char.ConvertToUtf32(s, index); } }
    public double NumericValue { get { return char.GetNumericValue(s, index); } }
    public UnicodeCategory UnicodeCategory { get { return char.GetUnicodeCategory(s, index); } } 
    public bool IsControl { get { return char.IsControl(s, index); } }
    public bool IsDigit { get { return char.IsDigit(s, index); } }
    public bool IsLetter { get { return char.IsLetter(s, index); } }
    public bool IsLetterOrDigit { get { return char.IsLetterOrDigit(s, index); } }
    public bool IsLower { get { return char.IsLower(s, index); } }
    public bool IsNumber { get { return char.IsNumber(s, index); } }
    public bool IsPunctuation { get { return char.IsPunctuation(s, index); } }
    public bool IsSeparator { get { return char.IsSeparator(s, index); } }
    public bool IsSurrogatePair { get { return char.IsSurrogatePair(s, index); } }
    public bool IsSymbol { get { return char.IsSymbol(s, index); } }
    public bool IsUpper { get { return char.IsUpper(s, index); } }
    public bool IsWhiteSpace { get { return char.IsWhiteSpace(s, index); } }
}

class Program
{
    static void Main(string[] args)
    {
        var wf = new WordFilter();

        var rawSentence = "My PHB is such a poophead. It's gotten worse since his promotion.  In fact you might call him a supercraphead!";
        var niceSentence = wf.Filter(rawSentence);
        Console.WriteLine(rawSentence);
        Console.WriteLine(niceSentence);
    }
}

Posted 28-Nov-16 0:57am

Andrei Bozantan

Comments

PIEBALDconsult 28-Nov-16 19:14pm

How does that handle p00p and cr@p?

Solution 9

Not sure if my Excel VBA solution posted, so I'm trying again

VB.NET

Public Function noBad(ByVal badstr As String) As String
    Dim badWords As Variant, goodWords As Variant
    Dim i As Integer, bw As String
    badWords = Array("poop*", "PHB", "gotten", "POOP*")
    goodWords = Array("p**p", "boss", "become", "P**P")
    For i = 0 To UBound(badWords)
        bw = badWords(i)
        With CreateObject("VBScript.RegExp")
            .Pattern = "\b" & bw & IIf(InStr(bw, "*"), "", "\b")
            .Global = True
            badstr = .Replace(badstr, goodWords(i))
        End With
    Next i
    noBad = badstr
End Function

Worth a go, I thought. Okay so the case thing isn't so good, but I only get a short lunch

Posted 28-Nov-16 2:22am

DarrenWheatley

Comments

Kornfeld Eliyahu Peter 28-Nov-16 8:46am

It has to be really boring there to go for VBA :-)

DarrenWheatley 29-Nov-16 3:27am

Why the downer on VBA? To be fair, it's as much as I expected, but it's not like we're a code shop or anything.

Kornfeld Eliyahu Peter 29-Nov-16 3:31am

:-) <-- did you saw it?
I was joking... I believe in the "right tool for..."

DarrenWheatley 29-Nov-16 3:36am

Well, I guess it was a quiet lunch...

Solution 14

COBOL has awesome string handling:

C#

identification division.
program-id. FixBadWords.

data division.
  
working-storage section.
   
	01 wsData.
	    03 wsComment occurs 5 times pic a(255).
	01 i         pic 9(1).

procedure division. 
	
	move "My PHBis such a poophead. It's gotten worse since his promotion" 
            to wscomment(1)
	move "My PHB is such a POOPHEAD!" to wscomment(2)

	perform varying i from 1 by 1 until i > 5
	    
		inspect wsComment(i) 
			replacing all 'poophead' by 'p**phead'
				      'POOPHEAD' BY 'P**PHEAD'
				      'PHB'      BY 'boss'
				      'gotten'   BY 'become'
		
		display wsComment(i)

	end-perform

stop run
.

Posted 29-Nov-16 20:38pm

Robert g Blair

Updated 1-Dec-16 16:59pm

v2

Comments

Chris Maunder 1-Dec-16 16:04pm

So easy it's almost cheating!

Robert g Blair 1-Dec-16 19:33pm

Well, you could do a lot to make it more gooderer:

- The string literals can be variables, which you could get from an input source, eg, a database, an input form, xml, whatever.

- You can also use TALLYING and BEFORE INITIAL ...

But case - case in COBOL is not so easy.
Sorry, but for COBOL solutions you have to choose a case. Upper or lower.

Case is merely a social construct y'know :)

Robert g Blair 1-Dec-16 23:02pm

You can test this code here: https://www.tutorialspoint.com/compile_cobol_online.php

Solution 16

EDIT:
* Figured out way to make 'poopoopoop' become 'p**p**p**p' (without recursion) and keep other design constraints.

* Figured out an 'Oh Duh - that's so simple!' way to keep 'nincompoop' as 'nincompoop' - just replace it with itself!

* Made class static, which resulted in about 8% inprovement. But the additional 'p**p**p**p' routine (replaceRepeatingStrings) dropped speed from 0.24ms to 0.37ms. Still 7 times faster than Solution 10 on my machine.
-END EDIT

Quote:
The point is to make the question a little loose to allow ... pointless religious wars

Good, I'll use some 'goto's :)

I'm not very familiar with regex, but believe it doesn't allow you to change the case of a word based on the cases of the surrounding words. (If I'm wrong, please enlighten me.) If that is the case, 'PHB' will always become 'boss' in the regex solutions, regardless of whether the input 'shouted' the phrase.

The following solution is C++ all the way, and is about 7 times faster than Solution 10 on my machine using Grant's newest timing routine. It does take the capitalization of the surrounding words into effect. It will also change 'pooppoop' to 'p**pp**p', which doesn't seem to have been specified by the specifications, although logically should occur. Additionally, it changes 'poopoopoop' into 'p**p**p**p' without recursion.

Quote:
A weekly simple programming problem that should take no more than half an hour or so

Yeah, right Chris... :rolleyes: This logic was a friggin' pain!

C++

#include <map>
#include <string>
#include <algorithm>
#include <vector>
#include <tuple>
#include <ctime>
#include <iostream>
#include "windows.h" //For QueryPerformanceCounter
using namespace std;

//I am going to use a tuple of: the input word, the small case output word, and the
//capped case output word, so they don't have to be calculated each time.
//So some defines are just for additional clarity:
#define GetTupleWord      get<0>
#define GetTupleSmallCase get<1>
#define GetTupleCapCase   get<2>


//Utility class for the timing...
class TimerLowRes {
   private:
      std::clock_t begC;
      std::clock_t avgTotC;
      std::clock_t diffC;
      int numTimesC;
   public:
      TimerLowRes() : avgTotC(0), numTimesC(0) { }

      void start() { begC = std::clock(); }

      std::clock_t stop() {
         diffC = std::clock() - begC;
         avgTotC = avgTotC + diffC;
         numTimesC++;
         return diffC;
         }

      std::clock_t getAvg() {
         if (numTimesC == 0) return 0;
         return avgTotC / numTimesC;
         }

      void reset() {
         numTimesC = 0;
         avgTotC = 0;
         }

      std::clock_t getLapTime() { return std::clock() - begC; }
   };




//High precision timer utility class for the timing.
//Derived from https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx
class TimerHighRes {
   private:
      LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds, Frequency;
      int numTimesC;
   public:
      TimerHighRes() : numTimesC(0) {
         QueryPerformanceFrequency(&Frequency);
         }

      void start() {
         QueryPerformanceCounter(&StartingTime);
         }

      LARGE_INTEGER stop() {
         QueryPerformanceCounter(&EndingTime);
         ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
         ElapsedMicroseconds.QuadPart *= 1000000;
         ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
         numTimesC++;
         return ElapsedMicroseconds;
         }
   };


class Reformatter {
   private:
      //Good chance that the implied design requirements involve UNICODE text, but for quick
      //prototype of similar design use wstring:

      typedef tuple<wstring, wstring, wstring> WordTuples;
      vector<WordTuples> repeatingStringsC;     //This will address items like 'poopoopoop'
      vector<WordTuples> caseSensitiveSubsC;    //This means that the inputted word must be
                                                //matched on exact case.
      vector<WordTuples> nonCaseSensitiveSubsC; //By this, it is meant that the inputted word
                                                //can be either caps or small case in the text.
      vector<WordTuples> asteriskSubsC;         //The CapCase will be ignored - only the
                                                //asterisks in the small case will be modified.

      wstring resultC;
      size_t  strLenC;
      wstring::const_iterator beginLocC;
      wstring::const_iterator curLocC;
      wstring::const_iterator endLocC;

      bool replaceAsterisks();
      bool replaceNonCaseSensitiveBegin();
      bool replaceCaseSensitive();
      bool replaceRepeatingStrings();
      bool previousWordIsCapitalized(wstring::const_iterator & thisWord, wstring::const_iterator & prevWord);
      bool nextWordIsCapitalized(wstring::const_iterator & thisWord, wstring::const_iterator & nextWord);
      void findBeginningOfPrevious(wstring::const_iterator & thisWord, wstring::const_iterator & prevWord);
      void findBeginningOfNext(wstring::const_iterator & thisWord, wstring::const_iterator & nextWord);

      bool isWhiteSpaceOrPunctuation(const wstring::const_iterator & temp);

   public:
      Reformatter() {
         repeatingStringsC.push_back(make_tuple(L"poo", L"p**", L"p"));

         caseSensitiveSubsC.push_back(make_tuple(L"PHB", L"boss", L"BOSS"));

         //The following 'gotten to be' must be defined before 'gotten':
         nonCaseSensitiveSubsC.push_back(make_tuple(L"gotten to be", L"become", L"BECOME"));
         nonCaseSensitiveSubsC.push_back(make_tuple(L"gotten", L"become", L"BECOME"));

         asteriskSubsC.push_back(make_tuple(L"nincompoop", L"nincompoop", L"nincomp##p"));
         asteriskSubsC.push_back(make_tuple(L"poop", L"p**p", L"P**P"));
         asteriskSubsC.push_back(make_tuple(L"p##p", L"p**p", L"P**P"));
         asteriskSubsC.push_back(make_tuple(L"ass", L"a**", L"A**"));
         }

      void reformat(const wstring & str);
      void outputResult();
   };


void Reformatter::outputResult() {
   wcout << L"OUTPUT: " << resultC << endl << endl;
   }


bool Reformatter::isWhiteSpaceOrPunctuation(const wstring::const_iterator & it) {
   if (*it == L' ' || *it == L'\t' || *it == L'.' || *it == L'!' || *it == L'_' ||
               *it == L'\r' || *it == L'\n') return true;
   return false;
   }


void Reformatter::findBeginningOfNext(wstring::const_iterator & thisWord, wstring::const_iterator & nextWord) {
   //There is the possibility that there is no next word, but there is whitespace.
   //If that is the case, return prevWord = thisWord = curLocC.

   //Go to the whitespace at the end of the current word:
   while (thisWord != endLocC && !isWhiteSpaceOrPunctuation(thisWord)) ++thisWord;
   //Move 'thisWord' back to the beginning of the whitespace:
   nextWord = thisWord;
   //Now skip any additional whitespace:
   while (nextWord != endLocC && isWhiteSpaceOrPunctuation(nextWord)) ++nextWord;
   if (nextWord == endLocC) {
      nextWord = endLocC;
      thisWord = endLocC;
      return;
      }
   }


void Reformatter::findBeginningOfPrevious(wstring::const_iterator & thisWord, wstring::const_iterator & prevWord) {
   //There is the possibility that there is no previous word, but there is whitespace.
   //If that is the case, return prevWord = thisWord = (beginning of word).

   //Go to the whitespace before the current word:
   while (thisWord != beginLocC && !isWhiteSpaceOrPunctuation(thisWord)) --thisWord;
   //Move 'thisWord' back to the beginning (one space forward), and set 'prevWord' current pos:
   prevWord = thisWord;
   ++thisWord;
   //Now skip any additional whitespace:
   while (prevWord != beginLocC && isWhiteSpaceOrPunctuation(prevWord)) --prevWord;
   if (prevWord == beginLocC) {
      if (isWhiteSpaceOrPunctuation(prevWord)) prevWord = thisWord;
      return;
      }
   //We are now on the last character of the previous word.  Iterate to the beginning of it:
   while (prevWord != beginLocC && !isWhiteSpaceOrPunctuation(prevWord)) --prevWord;
   //Check for the case where the user starts the input with a space character:
   if (isWhiteSpaceOrPunctuation(prevWord)) ++prevWord;
   }


bool Reformatter::previousWordIsCapitalized(wstring::const_iterator & thisWord,
               wstring::const_iterator & prevWord) {

   //We are working from the 'curLocC' position in the string.
   //Create a temporary iterator and find the beginning of the previous word.
   //If it reaches the beginning of the string, return 'true' so the 'shouting'
   //routines only rely on the following word.
   findBeginningOfPrevious(thisWord, prevWord);
   if (thisWord == prevWord) return true; //We will default to 'true' for the previous word
               //and 'false' for the next word, so next word processing can occur.
   //Now find the case of each letter until the next whitespace:
   while (!isWhiteSpaceOrPunctuation(prevWord)) {
      wchar_t temp = *prevWord;
      if (iswlower(*prevWord)) return false;
      ++prevWord;
      }
   return true;
   }


bool Reformatter::nextWordIsCapitalized(wstring::const_iterator & thisWord, wstring::const_iterator & nextWord) {
   //We are working from the 'curLocC' position in the string.
   //Create a temporary iterator and find the beginning of the previous word.
   //If it reaches the beginning of the string, return 'true' so the 'shouting'
   //routines only rely on the following word.
   findBeginningOfNext(thisWord, nextWord);
   if (thisWord == nextWord) return false;   //We are defaulting to 'true' for previous word
               //processing, and 'false' for this, so if there isn't any text before or
               //after the word, both processings will occur.
   //Now find the case of each letter until the next whitespace:
   while (nextWord != endLocC && !isWhiteSpaceOrPunctuation(nextWord)) {
      wchar_t temp = *nextWord;
      if (iswlower(*nextWord)) return false;
      ++nextWord;
      }
   return true;
   }


bool Reformatter::replaceCaseSensitive() {
   //This returns true if a replacement has been made.
   bool found = false;
   wstring::const_iterator inputIterator;
   for (auto it = caseSensitiveSubsC.begin(); it != caseSensitiveSubsC.end(); ++it) {
      const wstring & str = GetTupleWord(*it);
      inputIterator = curLocC;
      for (int pos=0, len=str.length(); pos<len; ++pos) {
         found = true;
         if (inputIterator == endLocC || *inputIterator != str[pos]) {
            //The string doesn't match the criteria
            found = false;
            break;
            }
         ++inputIterator;
         }
      if (found) {
         //There is a non-specified scenario to take care of.  If the user inputs something
         //like 'PHBblog.com', in reality we may not want this to substitute 'boss'.  Here
         //is logic to take care of that scenario:
         if (inputIterator == endLocC || !isWhiteSpaceOrPunctuation(inputIterator)) {
            return false;
            }
         //We now know that we need to replace the string, but we don't know if it needs to
         //be 'shouted'.
         //curLocC is still at the beginning of the text being checked against.  This can
         //only be at the beginning of a word.
         wstring::const_iterator thisWord = curLocC;
         wstring::const_iterator prevWord = curLocC;
         wstring::const_iterator nextWord = curLocC;
         //If someone has different length words for cap and small, this will need to be changed in the cases:
         if (previousWordIsCapitalized(thisWord, prevWord) && nextWordIsCapitalized(thisWord, nextWord)) {
            resultC.append(GetTupleCapCase(*it));
            }
         else {
            resultC.append(GetTupleSmallCase(*it));
            }
         //Now append any whitespace after the string:
         curLocC += str.length();
         while (curLocC != endLocC && isWhiteSpaceOrPunctuation(curLocC)) {
            resultC.append(1, *curLocC);
            ++curLocC;
            }
         return found;
         }
      }
   return found;
   }


bool Reformatter::replaceNonCaseSensitiveBegin() {
   //By 'non-case-sensitive', it is meant that the routine will check for either capped or
   //non-capped matches.
   bool found;
   vector<tuple<wstring, wstring, wstring>>::iterator it;
   wstring::const_iterator inputIterator;
   const wstring * str;

   //If there isn't a character before 'curLocC' we don't need to look any further.
   if (curLocC == beginLocC) return false;
   if (!(curLocC-1==beginLocC || isWhiteSpaceOrPunctuation(curLocC-1))) return false;
   if (!isWhiteSpaceOrPunctuation(curLocC-1)) return false;

   for (it = nonCaseSensitiveSubsC.begin(); it != nonCaseSensitiveSubsC.end(); ++it) {
      str = &GetTupleWord(*it);
      inputIterator = curLocC;
      for (int pos=0, len=str->length(); pos<len; ++pos) {
         found = true;
         if (inputIterator == endLocC || towlower(*inputIterator) != (*str)[pos]) {
            found = false;
            break;
            }
         ++inputIterator;
         }
      if (found) break;
      }
   if (found) {
      //We need to see if it is capped, and if so, cap the output:
      inputIterator = curLocC;
      bool isCapped = true;
      for (int pos=0, len=(*str).length(); pos<len; ++pos) {
         if (iswlower(*inputIterator)) {
            isCapped = false;
            break;
            }
         ++inputIterator;
         }
      if (isCapped) {
         resultC.append(GetTupleCapCase(*it));
         }
      else {
         resultC.append(GetTupleSmallCase(*it));
         }
      curLocC += GetTupleWord(*it).length();
      return true;
      }
   return false;
   }


bool Reformatter::replaceAsterisks() {
   bool found;
   vector<tuple<wstring, wstring, wstring>>::iterator it;
   wstring::const_iterator inputIterator;
   const wstring * str;

   for (it = asteriskSubsC.begin(); it != asteriskSubsC.end(); ++it) {
      str = &GetTupleWord(*it);
      inputIterator = curLocC;
      for (int pos=0, len=str->length(); pos<len; ++pos) {
         found = true;
         if (inputIterator == endLocC || towlower(*inputIterator) != (*str)[pos]) {
            found = false;
            break;
            }
         ++inputIterator;
         }
      if (found) break;
      }
   if (found) {
      //Go through the small cap tuple and if a character is an asterisk, replace
      //the corresponding character with an asterisk:
      str = &GetTupleSmallCase(*it);
      inputIterator = curLocC;
      for (int pos=0, len=(*str).length(); pos<len; ++pos) {
         if ((*str)[pos] == L'*') {
            resultC.append(L"*");
            }
         else {
            resultC.append(1, *curLocC);
            }
         ++curLocC;
         ++inputIterator;
         }
      return true;
      }
   return false;
   }


bool Reformatter::replaceRepeatingStrings() {
   bool found;
   vector<tuple<wstring, wstring, wstring>>::iterator it;
   wstring::const_iterator inputIterator;
   const wstring * str;

   for (it = repeatingStringsC.begin(); it != repeatingStringsC.end(); ++it) {
      str = &GetTupleWord(*it);
      inputIterator = curLocC;
      for (int pos=0, len=str->length(); pos<len; ++pos) {
         found = true;
         if (inputIterator == endLocC || towlower(*inputIterator) != (*str)[pos]) {
            found = false;
            break;
            }
         ++inputIterator;
         }
      if (found) {
         //The inputIterator is pointing to the letter right after the initial string
         //(ie, the final 'p' in 'poop')
         str = &GetTupleCapCase(*it);
         if (*inputIterator != (*str)[0]) found = false;
         }
      if (found) break;
      }
   if (found) {
      //Go through the small cap tuple and if a character is an asterisk, replace
      //the corresponding character with an asterisk:
      str = &GetTupleSmallCase(*it);
      inputIterator = curLocC;
      for (int pos=0, len=(*str).length(); pos<len; ++pos) {
         if ((*str)[pos] == L'*') {
            resultC.append(L"*");
            }
         else {
            resultC.append(1, *curLocC);
            }
         ++curLocC;
         ++inputIterator;
         }
      return true;
      }
   return false;
   }


void Reformatter::reformat(const wstring & str) {
   resultC.reserve(str.length()+1);
   resultC.clear();
   beginLocC = str.begin();
   curLocC = beginLocC;
   strLenC = str.length();
   endLocC = str.end();
   while (curLocC != endLocC) {
      //Skip processing on whitespace:
      while (curLocC != endLocC && isWhiteSpaceOrPunctuation(curLocC)) {
         resultC.append(1, *curLocC);
         ++curLocC;
         }
      if (curLocC == endLocC) return;
      //Now do the non-whitespace processing:
      if (!(replaceCaseSensitive() || replaceNonCaseSensitiveBegin() ||
                  replaceRepeatingStrings() || replaceAsterisks())) {
         resultC.append(1, *curLocC);
         ++curLocC;
         }
      }
   }


int main() {
   static Reformatter reformatter;

   //First, some tests to show that it is working:
   wstring str = L"I GOTTEN PHB lettuce!";
   reformatter.reformat(str);
   wcout << L"INPUT:  " << str << endl;
   reformatter.outputResult();

   str = L"I GOTTEN phb lettuce!";
   reformatter.reformat(str);
   wcout << L"INPUT:  " << str << endl;
   reformatter.outputResult();

   str = L"Not Poopoop, and PoopPoop nInComPoop!!";
   reformatter.reformat(str);
   wcout << L"INPUT:  " << str << endl;
   reformatter.outputResult();

   str = L"My PHB is such a poophead. It's gotten worse since his promotion";
   reformatter.reformat(str);
   wcout << L"INPUT:  " << str << endl;
   reformatter.outputResult();

   //Changed the following to 'phb.com', so that program checks for more general case.
   //phb.com redirects to another site in reality, but there is no reason not to check
   //against being a URL in the implied design requirements.
   cout << "NOTE THAT 'boss' BECOMES CAPPED OR NON-CAPPED DEPENDING ON THE SURROUNDING WORDS:"
               << endl << "ALSO, 'PHB' BECOMES 'boss' DEPENDING UPON WHETHER 'PHB' IS BY ITSELF."
               << endl;
   str = L"MY PHB HAS started his new BLOG PHB_BLOG.com. And PHBBlog.com! "
               "He's GOTTEn TO BE SUCH A MISBEGOTTEN POOPHEAD!";
   reformatter.reformat(str);
   wcout << L"INPUT:  " << str << endl;
   reformatter.outputResult();

   //Now do an timing run:
   TimerLowRes timerLowRes;
   TimerHighRes timerHighRes;
   LARGE_INTEGER timeRighRes;
   vector<float> timeList;
   int numIterations = 100;
   int totalTests = 10;
   for (int numTimes=0; numTimes<totalTests; ++numTimes) {
      timerLowRes.start();
      timerHighRes.start();
      for (int x=0; x<numIterations; ++x) {
         //reformatter.reformat(L"MY PHB HAS started his new BLOG PHB_BLOG.com. And PHBBlog.com! He's gotten TO BE "
         //            " SUCH A MISBEGOTTEN POOPHEAD!");
         reformatter.reformat(L"My PHB is such a poophead. It's gotten worse since his promotion");
         //reformatter.reformat(L"My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!");
         //reformatter.reformat(L"I GOTTEN PHB lettuce!");
         //reformatter.reformat(L"Not Poopoop, but PoopPoop!!");  //Note that this will handle subsequent 'poops' correctly if 2 'p's together at end.
         }
      timeRighRes = timerHighRes.stop();
      clock_t timeLowRes(timerLowRes.stop());
      cout << "low res time: " << timeLowRes << " milliseconds" << endl;
      cout << "high res time: " << timeRighRes.QuadPart/(float)1000 << " milliseconds" << endl;
      timeList.push_back(timeRighRes.QuadPart/(float)1000);
      }
   float avg = 0.0;
   for (int i=0; i<totalTests; ++i) avg += timeList[i];
   avg = avg/totalTests;
   cout << endl << "Average = " << avg << " milliseconds";
   cout << endl << "Enter any key, then an 'Enter' to exit: ";
   cin >> timeRighRes.QuadPart; //Junk way to exit console, but good enough!
   return 0;
   }

Didn't use any goto's, for those who are counting, but wouldn't have given a crap if I did.

It might be possible to streamline the logic in the replaceCaseSensitive and other replacement routines. All the 'found' checking and setting seemed excessive, but it worked and I moved on.

David

Posted 1-Dec-16 5:17am

David O'Neil

Updated 1-Dec-16 14:08pm

v2

Comments

Robert g Blair 1-Dec-16 15:59pm

David - my COBOL solution took just 15 minutes to write (and I haven't coded COBOL for many years).

And it will run blindingly fast. Everything is handled in a single intrinsic, compiler optimized, command.

I didn't add the "extra" bits you have, because they weren't in the spec, and I didn't want to spend any time on it.

To implement "pooppoop" -> "p**pp**p" is a simple REPLACING operation.

To implement "poopoopoop" -> "p**p**p**p" would involve a REPLACING "poo" BEFORE INITIAL "p**" - and recursing.

David O'Neil 1-Dec-16 20:22pm

I like the the simplicity of your COBOL solution! I can see a C++ STL equivalent in my head that would be a little longer, but keep your simplicity. Out of curiosity, can you determine how fast your routine compares to Graeme's output for Solution 10 on your machine? I'm just curious how efficient intrinsics are - I've never played with them.

Your "REPLACING "poo" BEFORE INITIAL "p**"" comment triggered my new approach to solving that part of the problem - without recursion! Thanks!

Robert g Blair 1-Dec-16 22:04pm

David - its very hard to compare.

If you remove the Display statement it will run in less than a millisecond on, say, a Fujitsu mainframe.
It runs in a couple milliseconds on a Ubuntu box with GnuCobol.

But what I coded was very compiler friendly - hard coded literals, single statement etc.

FYI: When I say "intrinsic" I mean in this sense: https://en.wikipedia.org/wiki/Intrinsic_function
Ie, the compiler is optimizing the string handling involved in the INSPECT statement execution. Whether it does it well or no is a function of the compiler's competence.

Solution 2

EDIT: Ah, the wonders sleep can do. Even though a solution was already chosen, I had an idea to improve my solution.

This was fun. I don't get to use regex too much in my day to day sadly.

C#

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public class BadWordFilter
{
    public static void Replace(ref string input,
        IDictionary<string, string> badWordMap)
    {
        if (string.IsNullOrWhiteSpace(input))
            throw new ArgumentException(nameof(input),
                "String cannot be null, empty, or whitespace.");

        foreach (KeyValuePair<string, string> badWord in badWordMap)
            ReplaceBadWord(ref input, badWord);
    }

    private static void ReplaceBadWord(ref string input,
        KeyValuePair<string, string> badWord)
    {
        if (string.IsNullOrWhiteSpace(badWord.Key))
            throw new ArgumentException(nameof(badWord.Key),
                "Key cannot be null, empty, or whitespace.");

        string pattern = GetReplacementPattern(badWord.Key);
        MatchEvaluator evaluator = match =>
        {
            string replacement = badWord.Value;
            if (match.Value == match.Value.ToUpper())
            {
                if (badWord.Key != badWord.Key.ToUpper())
                    replacement = badWord.Value.ToUpper();
            }
            else if (match.Value[0] == char.ToUpper(match.Value[0]))
                replacement = char.ToUpper(badWord.Value[0]) + badWord.Value.Substring(1);
            return replacement;
        };
        input = Regex.Replace(input, pattern, evaluator, RegexOptions.IgnoreCase);
    }

    private static string GetReplacementPattern(string badWordKey)
    {
        StringBuilder pattern = new StringBuilder(
            $@"(?:\b){badWordKey.Trim('*')}"
        );
        if (badWordKey.StartsWith("*"))
            pattern.Insert(6, @"\w*", 1);
        if (badWordKey.EndsWith("*"))
            pattern.Append(@"\w*");
        return pattern.ToString();
    }
}

You use it like this:

C#

static void Main(string[] args)
{
    Dictionary<string, string> badWords = new Dictionary<string, string>()
    {
        {"poop*", "p**phead"},
        {"*HB", "boss"},
        {"gotten", "become"},
        {"*crap*", "taco supreme"}
    };
    string input = "My PHB is such a poophead. It's gotten worse since his promotion.  In fact you might call him a supercraphead!";
    BadWordFilter.Replace(ref input, badWords);
    Console.WriteLine(input);
    Console.ReadKey();
}
//Output:
//My boss is such a p**phead.  It's become worse since his promotion.  In fact you might call him a taco supreme!

Handles *text, text*, and *text* wildcards. Will not capitalize a replacement for a word that is by default capitalized (ex. PHB -> boss). Maintains first-letter capitalization when appropriate. Also support any regex code you'd like to put in the key since I directly inject it. For example {"p[oO0]+p*", "p**phead"} will catch poop, poOp, p0OPhead, or even PoO0o0Ophead.

Posted 25-Nov-16 16:34pm

Jon McKee

Updated 26-Nov-16 10:12am

v7

Solution 3

After a little more refactoring and such:

C#

PIEBALD.Type.ReplaceOmatic replacer = 
  new PIEBALD.Type.ReplaceOmatic
  (
    new System.Tuple<string,string> ( "PHB!"        , "boss!"  ) 
  ,
    new System.Tuple<string,string> ( "gotten"      , "become" ) 
  ,
    new System.Tuple<string,string> ( "*p[o0]{2}p*" , "p**p"   ) 
  ) ;

string intext = "My PHB is such a poophead. It's gotten worse since his promotion. My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!" ;

string outtext = intext ;

int total = replacer.Replace ( ref outtext ) ;

System.Console.WriteLine ( "{0}\n{1}\n{2}" , intext , total , outtext ) ;

Produces:

My PHB is such a poophead. It's gotten worse since his promotion. My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!
5
My boss is such a p**phead. It's become worse since his promotion. My boss has started his new blog phblog.com. He's SUCH A MISBEGOTTEN P**PHEAD!

This just in (2016-11-27): ReplaceOmatic[^]

Late breaking news (2016-11-28):
I now have it tracking how many characters were replaced in each section, so it will stop applying replacements once the entire section has been replaced.
This is a reason to sort the more-frequently-hit replacements to the front of the List.

And I've tightened up the code for copying characters to a StringBuilder.

Part of the change:

C#

int r = 0 ;

/* Apply ReplacementSpecs to the section until we run out of them or we've replaced the entire section. */
for ( int i = 0 ; ( i < this.rep.Count ) && ( r < Candidate.Length ) ; i++ )
{
  /* Track how many characters have been replaced so far. */
  r += this.rep [ i ].Replace ( ref temp ) ;
}

News flash! (2016-11-30):
Added Sorting and updated the Article and uploaded the latest code (I hope).

Posted 25-Nov-16 21:55pm

PIEBALDconsult

Updated 30-Nov-16 12:49pm

v9

Solution 6

I saw that no one addressed JavaScript jet...
I did...
JavaScript

function Bad2Good(text) { var T = { 'poophead': 'p**phead', 'PHB!': 'boss', 'gotten': 'become', 'poop*': 'p**p', '*poop': 'p**p' }; for (var prop in T) { var flags = 'ig'; // starts with if (prop[prop.length-1] == '*') { prop = /(\s|[!?.,])poop(\w*)/; } // ends with if (prop[0] == '*') { prop = /(\w*)poop(\s|[!?.,])/; } // case sensitive if (prop[prop.length - 1] == '!') { flags = 'g'; prop = prop.substr(0, prop.length - 1); } text = text.replace(new RegExp(prop, flags), function (match, p1, p2, p3, p4) { if (Number.isInteger(p1)) { if (match.toUpperCase() == match) { // case sensitive if (T[match.toLowerCase()] == undefined) { return (T[match + '!']); } // shouting else { return (T[match.toLowerCase()].toUpperCase()); } } // normal else { return (T[match]); } } else { // starts with if(/(\s|[!?.,])/.test(p1)) { match = match.substr(p1.length, match.length - p1.length - p2.length); return (p1 + ((match.toUpperCase() == match) ? T[match.toLowerCase() + '*'].toUpperCase() : T[match.toLowerCase() + '*']) + p2); } // ends with else if (/(\s|[!?.,])/.test(p2)) { match = match.substr(p1.length, match.length - p1.length - p2.length); return (p1 + ((match.toUpperCase() == match) ? T['*' + match.toLowerCase()].toUpperCase() : T['*' + match.toLowerCase()]) + p2); } else { return (match); } } }); } return (text); }

Got home. Got time. Saw some of my bugs and smashed them...

JavaScript

function Bad2Good(text)
{
	var T = {
		'PHB!': 'boss',
		'gotten': 'become',
		'poop*': 'p**p',
		'*poop': 'p**p'
	};

	for (var prop in T)
	{
		var flags = 'ig';

        // case sensitive
		if (prop[prop.length - 1] == '!')
		{
			flags = 'g';
			prop = prop.substr(0, prop.length - 1);
		}

		// starts with
		if (prop[prop.length - 1] == '*')
		{
			prop = '(\\b)' + prop.substr(0, prop.length - 1) + '(\\w*)(\\b)';
		} 
        
        // ends with
		if (prop[0] == '*')
		{
			prop = '(\\b)(\\w*)' + prop.substr(1, prop.length) + '(\\b)';
		}

		text = text.replace(new RegExp(prop, flags), function (match, len, p1)
		{
            var lookup;
            var replace;
            var good;

            if(Number.isInteger(len))
            {
                replace = match;
                lookup = match;
            }
            else 
            {
                if(match.startsWith(p1)) 
                {
                    replace = match.replace(p1, '');
                    lookup = '*' + replace;
                }
                else
                {
                    replace = match.replace(p1, '');
                    lookup = replace + '*';
                }
            }

            good = T[lookup.toLowerCase()] || T[lookup + '!'];

            if(T[lookup + '!'] == undefined)
            {
                if((replace.toUpperCase() == replace))
                {
                    good = good.toUpperCase();
                }
                else if((replace.toLowerCase() == replace))
                {
                    good = good.toLowerCase();
                }
            }

            return(match.replace(replace, good));
		});
	}

	return (text);
}

Posted 27-Nov-16 1:15am

Kornfeld Eliyahu Peter

Updated 27-Nov-16 11:40am

v3

Comments

PIEBALDconsult 27-Nov-16 17:26pm

Are your replacements hard-coded?

Kornfeld Eliyahu Peter 27-Nov-16 17:40pm

As a bug... I just updated with the home-made version :-)

Solution 7

Quick and dirty Javascript

JavaScript

var bad = ["poop*", "PHB!", "gotten"];
var replacement = ["p**p", "boss", "become"];
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}
function replaceBad(sentence){
	return sentence.split(/\b/).map(w => {
		for(var i=0, m=bad.length; i<m; i++){
			var badToken = bad[i];
			if(badToken.slice(-1)==='*') { // ends with *
				return w.replace(new RegExp('^' + escapeRegExp(badToken.slice(0,-1)) + '(.*)$', 'i'), replacement[i]+'$1');
			}
			if(badToken.slice(0)==='*') { // starts with *
				return w.replace(new RegExp('^(.*)'+ escapeRegExp(badToken.slice(1)) + '$', 'i'), '$1' + replacement[i]);
			}
			if(badToken.slice(-1)==='!') { // ens with !
				return w.replace(new RegExp('^'+ escapeRegExp(badToken.slice(0,-1)) + '$', ''), replacement[i]);
			}
			return w.replace(new RegExp('^'+badToken+'$', 'i'), replacement[i]);
		}
	}).join('');
}

var test=['Hello poophead; glad your PHB has "gotten" less rotten',
'Hi poopheadpoop; glad your PhB has "undergotten" less NOPHB'
];

test.forEach(s => {
	console.log(s + ' => ' + replaceBad(s)); 
})

Posted 28-Nov-16 0:33am

Jacques Sineriz

Solution 10

Only spotted this 20 min ago, so hopefully not too late to the tea party!

C# code allows for individual word case sensitivity + shortcut override (in bonus requirement). All cases use the same bad word replacement extension methods. ReplaceWords string extension for a set of words or ReplaceText string extension for single word granularity.

C#

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace BadWordFilter
{
    class Program
    {
        static void Main(string[] args)
        {
            FirstTest();
            BonusTest();

            Console.WriteLine("-- Press any key to exit --");
            Console.ReadKey();
        }

        private static void FirstTest()
        {
            var badWords = new Dictionary<string, tuple<string, bool>>()
            {
                {"poophead", new Tuple<string, bool>("p**phead", false)},
                {"PHB", new Tuple<string, bool>("boss", false)},
                {"gotten",new Tuple<string, bool>("become", false) }
            };

            string badMessage = "My        PHB is such a poophead. It's gotten worse since his promotion";
            string goodMessage = badMessage.ReplaceWords(badWords, filtered: false);

            Console.WriteLine("First Test");
            Console.WriteLine("==========");
            Console.WriteLine($"Before: {badMessage}");
            Console.WriteLine($"After:  {goodMessage}");
            Console.WriteLine();

            badWords = new Dictionary<string, tuple<string, bool>>()
            {
                {"POOPHEAD", new Tuple<string, bool>("P**PHEAD", true)},
                {"poophead", new Tuple<string, bool>("p**phead", false)},
                {"PHB", new Tuple<string, bool>("boss", true)}
            };

            badMessage = "My PHB is such a POOPHEAD!";
            goodMessage = badMessage.ReplaceWords(badWords, filtered: false);

            Console.WriteLine($"Before: {badMessage}");
            Console.WriteLine($"After:  {goodMessage}");
            Console.WriteLine();

        }

        private static void BonusTest()
        {
            var badWords = new Dictionary<string, tuple<string, bool>>()
            {
                {"POOP*", new Tuple<string, bool>("P**P", true) },
                {"poop*", new Tuple<string, bool>("p**p", false) },
                {"PHB!", new Tuple<string, bool>("boss", true) },
                {"gotten",new Tuple<string, bool>("become", true) }
            };

            string badMessage = "My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!";
            string goodMessage = badMessage.ReplaceWords(badWords, filtered: true);

            Console.WriteLine("Bonus Test");
            Console.WriteLine("==========");
            Console.WriteLine($"Before: {badMessage}");
            Console.WriteLine($"After:  {goodMessage}");
            Console.WriteLine();
        }
    }

    public static class StringExtension
    {
        private static readonly string[] spaceSplitChar = { " " };
        private static readonly char[] wordSplitChars = @"?<=[\.!\?])".ToCharArray();

        public static string ReplaceWords(this string input, Dictionary<string, tuple<string, bool>> badWords, bool filtered = false, bool trimWhitespaces = true)
        {
            if (string.IsNullOrEmpty(input))
                return input;

            foreach (var word in badWords)
                input = input.ReplaceText(word.Value.Item1, word.Key, filtered: filtered, isCaseSensitive: word.Value.Item2, trimWhitespaces: trimWhitespaces);

            return input;
        }

        public static string ReplaceText(this string input, string replaceText, string findText = "", bool filtered = false, bool isCaseSensitive = false, bool trimWhitespaces = true)
        {
            if (string.IsNullOrEmpty(input))
                return input;

            var sb = new StringBuilder();
            bool isMatchStart = false, 
                 isMatchEnd = false;
            StringComparison compareMode = isCaseSensitive ? StringComparison.InvariantCulture : StringComparison.InvariantCultureIgnoreCase;

            if (filtered)
            {
                isMatchStart = findText.EndsWith("*");
                isMatchEnd = findText.StartsWith("*");
                compareMode = findText.EndsWith("!") ? StringComparison.InvariantCulture : compareMode;
                findText = findText.Trim(new[] { '*', '!' });
            }

            int lenOldValue = findText.Length,
                curPosition = 0,
                idxNext = input.IndexOf(findText, compareMode);

            while (idxNext >= 0)
            {
                if (isMatchStart || !isMatchEnd)
                {
                    if (input.Substring(idxNext - 1, 1).Equals(" "))
                        sb.Append(input, curPosition, idxNext - curPosition)
                          .Append(replaceText);
                }
                else if (isMatchEnd)
                {
                    sb.Append(input, curPosition, idxNext - curPosition)
                      .Append(curPosition < input.Length && input.Substring(curPosition, curPosition + 1)[0].IsEndChar() ? findText : replaceText);
                }

                curPosition = idxNext + lenOldValue;
                idxNext = input.IndexOf(findText, curPosition, compareMode);
            }
            sb.Append(input, curPosition, input.Length - curPosition);

            input = sb.ToString();
            if (trimWhitespaces)
                input= input.TrimCharacter(' ');

            return input;
        }

        public static bool IsEndChar(this char c)
            => wordSplitChars.Contains(c);

        public static string TrimCharacter(this string input, char c)
            => string.Join(c.ToString(), input.Split(c).Where(str => str != string.Empty).ToArray());

    }
}

Output from execution:

First Test
==========

Before: My        PHB is such a poophead. It's gotten worse since his promotion
After:  My boss is such a p**phead. It's become worse since his promotion

Before: My PHB is such a POOPHEAD!
After:  My boss is such a P**PHEAD!


Bonus Test
==========

Before: My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!
After:  My boss has started his new blog phblog.com. He's SUCH A MISBEGOTTEN P**PHEAD!

-- Press any key to exit --

This code is built for efficiency, so no RegEx used! ;)

[edit: hopefully I have fixed all the weird stuff when code pasted in...]

Posted 28-Nov-16 3:41am

Graeme_Grant

Updated 28-Nov-16 4:29am

v12

Solution 11

VISUAL PROLOG 7.5 Solution.

Note: the bad_nice/2 clauses can be moved to an external file for maintenance independent of the rest of the code.

This could be tidied up more, but I considered speed of submission to be the priority here.

Harrison Pratt

class predicates
    badWordFix : ( string TextToFilter ) -> string FixedText.
clauses
    badWordFix( Text ) = FixedText :-
        Splitters = " .,:!",
        FixedText = fixText ( Text, Splitters, "" ).

        class predicates
            fixText : ( string, string Separators, string Temp ) -> string.
        clauses
            fixText( "",_,S ) = S :-!.
            fixText( S, Separators, Temp ) = FixedText :-
                string::splitStringBySeparators(S,Separators,HeadStr,CurrSep,Rest ),
                !,
                FixedText = fixText( Rest, Separators, string::concat( Temp, filteredWord(HeadStr), string::charToString(CurrSep) )).
            fixText( S, Separators, Temp ) = FixedText :-
                FixedText = fixText("",Separators,string::concat(Temp,S )).

    class predicates
        filteredWord : ( string ) -> string .
    clauses
        filteredWord( S ) = NiceWord :-
            CleanS = stripPrefix( stripSuffix(S,"*"), "*" ),
            bad_nice( CleanS, NiceWord ), !.
        filteredWord( S ) = S.

    class predicates
        bad_nice : ( string, string [out] ) determ.
    clauses
        bad_nice( "poophead", "p***head" ).
        bad_nice( "PHB", "boss" ).
        bad_nice( "gotten", "become" ).

    class predicates
        stripPrefix : ( string, string ) -> string.
    clauses
        stripPrefix( S, Pre ) = StripStr :-
            string::hasPrefix(S,Pre,StripStr), !.
        stripPrefix( S, _ ) = S.
    
    class predicates
        stripSuffix : ( string, string ) -> string.
    clauses    
        stripSuffix( S, Suff ) = StripStr :-
            string::hasSuffix(S,Suff,StripStr), !.
        stripSuffix( S, _ ) = S.

Posted 28-Nov-16 3:50am

Harrison Pratt

Solution 12

I'm a fan of clean code.
Which for me is at least:
- It should be expressive-> meaningful names
- Smaller is better -> methods should be short
- Can be easily extended by any other developer -> if requiremets changes easy to adapt
- Reading the code should be pleasant -> because you can follow the logic without overstress your brain

Here it is (BadWordsMatch.cs):

C#

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text;
using System.Text.RegularExpressions;

namespace BadWords
{
    public enum eHow { easy, hard };

    public class BadWordsMatchAndReplace
    {
            public enum eMatchMode { startsWith, Endswith, Equals };
            public struct sWildCardAndCase
            {
                public eMatchMode matchMode;
                public bool respectCase;
                public string key;
            }

            public static Dictionary<sWildCardAndCase, string> dictBadWordsEasy = new Dictionary<sWildCardAndCase, string>(){
                {new sWildCardAndCase() { matchMode = eMatchMode.Equals, respectCase = true, key ="poophead"}, "p**phead"},
                {new sWildCardAndCase() { matchMode = eMatchMode.Equals, respectCase = true, key ="PHB"}, "boss"},
                {new sWildCardAndCase() { matchMode = eMatchMode.Equals, respectCase = true, key ="gotten"} , "become"}
            };

            public static Dictionary<sWildCardAndCase, string> dictBadWordsHard = new Dictionary<sWildCardAndCase, string>(){
                {new sWildCardAndCase() { matchMode = eMatchMode.startsWith, respectCase = false, key = "poop" }, "p**p" },
                {new sWildCardAndCase() { matchMode = eMatchMode.Equals, respectCase = true, key = "PHB" }, "boss" },
                {new sWildCardAndCase() { matchMode = eMatchMode.Equals, respectCase = false, key = "gotten" }, "become" }

           };

            // shouting is retrieved by allowing an exclamation mark in words and later on compared to last char therein
            public static Regex r = new Regex(@"([!\w])+", RegexOptions.None);
            public static bool shouting = false;

            /// <summary>
            /// checks an input sentence for bad words and replaces them by safer expressions
            /// </summary>
            /// <param name="input">sentence to check</param>
            /// <param name="how">easy or hard substitution rule</param>
            /// <returns>corrected string</returns>           
            /// 
            public static string MatchAndReplace(string input, eHow how)
            {

                string output = input;
                string s = string.Empty;
                shouting = false;

                Dictionary<sWildCardAndCase, string> baseDict = (how == eHow.easy ? dictBadWordsEasy : dictBadWordsHard);
                Match m = r.Match(input);

                while (m.Success)
                {
                    s = m.Value;
                    bool found = false;

                    foreach (KeyValuePair<sWildCardAndCase, string> kvp in baseDict)
                    {
                        bool respectCase = kvp.Key.respectCase;
                        if ( s.EndsWith("!"))
                        {
                            // case insensitive and make replacement uppercase
                            shouting = true;
                            respectCase = false;
                        }
                        // looks for matches under given constraints and updates output accordingly
                        found = ProcessWord(s, kvp.Key.key, respectCase, kvp.Key.matchMode, kvp.Value, ref output);
                        if (found) break;
                    }
                    m = m.NextMatch();
                }
                return output;
 

            }

            /// <summary>
            /// looks for a match of word under given constraints and updates output accordingly
            /// </summary>
            /// <param name="word">word from input string</param>
            /// <param name="key">bad word</param>
            /// <param name="respectCase">match only case exactly</param>
            /// <param name="matchMode">wildcard specifier</param>
            /// <param name="value">replacement</param>
            /// <param name="output">output -> output with replacement</param>
            /// <returns>no match -> false else true</returns>
            ///             
            private static bool ProcessWord(string word, string key, bool respectCase, eMatchMode matchMode, string value, ref string output)
            {
                if (respectCase && !word.Contains(key)) return false;
                if (!respectCase && !word.ToLower().Contains(key.ToLower())) return false;
                if (!ConstraintsMet(  word,  key,  respectCase,  matchMode)) {
                    return false;
                }
                if (respectCase)
                {
                    output = output.Replace(key, value);
                }
                else
                {
                    if (word.Any(char.IsUpper))
                        output = output.Replace(key.ToUpper(), value.ToUpper());
                    else if (shouting)
                        output = output.Replace(key, value.ToUpper());
                    else
                        output = output.Replace(key, value);
                }
                return true;

            }

            /// <summary>
            /// checks for constraints
            /// </summary>
            /// <param name="word">word from input string</param>
            /// <param name="key">bad word</param>
            /// <param name="respectCase">match only case exactly</param>
            /// <param name="matchMode">wildcard specifier</param>
            /// <returns>constraints met -> true else false</returns>
            ///              
            private static bool ConstraintsMet( string word, string key, bool respectCase, eMatchMode matchMode)
            {
                string wordToCompare = word;
                if (shouting) { wordToCompare = wordToCompare.TrimEnd('!'); };
                switch (matchMode)
                {
                    case eMatchMode.Equals:
                        if (respectCase && !wordToCompare.Equals(key)) { return false; }
                        else if (!respectCase && !wordToCompare.ToLower().Equals(key.ToLower())) { return false; }
                        break;
                    case eMatchMode.startsWith:
                        if (respectCase && !wordToCompare.StartsWith(key)) { return false; }
                        else if (!respectCase && !wordToCompare.ToLower().StartsWith(key.ToLower())) { return false; }
                         break;
                    case eMatchMode.Endswith:
                        if (respectCase && !wordToCompare.EndsWith(key)) { return false; }
                        else if (!respectCase && !wordToCompare.ToLower().EndsWith(key.ToLower())) { return false; }
                        break;
                }
                return true;

            }

    }
}

It should be self explaining.
Most important is: I concentrate the requirements about case sensitivity and wildcard rules in a data structure which relieves me from looking for *s before or
after the search key.
For the 'Easy part' I would not had a need for this but once I made it for the hard
part it's nice to have it for this part too, because the code can be one for both
(

C#

Dictionary<sWildCardAndCase, string> baseDict = (how == eHow.easy ? dictBadWordsEasy : dictBadWordsHard);

For Testing my Code I have provided a little Console Main, which is here (Program.cs):

C#

using System;
using System.Collections.Generic;
using System.Linq;


namespace BadWords
{
    class Program
    {
        static void Main(string[] args)
        {        
            string input = string.Empty;
            string s = string.Empty;
            string output = string.Empty;

            while (!(s.Equals("q")))
            {
                while (!(s = Console.ReadLine()).Equals("."))
                {
                    input = input + s;
                }
                Console.WriteLine("------------- Easy Match ---------------"); 
                Console.WriteLine(input);
                output = BadWordsMatchAndReplace.MatchAndReplace(input, eHow.easy);
                Console.WriteLine(output);
                Console.WriteLine("------------- Bonus Match ---------------");
                Console.WriteLine(input);
                output = BadWordsMatchAndReplace.MatchAndReplace(input, eHow.hard);
                Console.WriteLine(output);
                Console.WriteLine("-------------- Next Input ---------------");
                s = Console.ReadLine();
                input = s;

            }
        }
    }
}

Usage: Paste your input (multiline allowed), then type <return>.<return>
If you want to stop the program then type <return>q<return>

Posted 28-Nov-16 13:26pm

ernst zwecker

Comments

PIEBALDconsult 29-Nov-16 18:51pm

Oh, the humanity! Wasteful wasteful. You throw away perfectly good information that can be used later.
You don't even allow for multiple keys to match a word.

Solution 13

I decided to use F#…

Each filter specification is converted to a closure that takes a string, potentially sanitises it and returns the good string. To process all words in a string, use Regex.Replace with a MatchEvaluator that runs the filter list over the matched word.

F#

open System
open System.Text.RegularExpressions

// Define an active pattern to match a string starting with a prefix and return the remnant of the string
let (|Prefix|_|) (ignoreCase:bool) (prefix:string) (s:string) =
   let maybePrefix = s.Substring(0, (min prefix.Length s.Length))
   match String.Compare(maybePrefix, prefix, ignoreCase) with
   | 0 -> Some(maybePrefix, s.Substring(prefix.Length))
   | _ -> None

// Define an active pattern to match a string ending with a suffix and return the remnant of the string
let (|Suffix|_|) (ignoreCase:bool) (suffix:string) (s:string) =
   let maybeSuffix = s.Substring((max 0 (s.Length - suffix.Length)))
   match String.Compare(maybeSuffix, suffix, ignoreCase) with
   | 0 -> Some(maybeSuffix, s.Substring(0, s.Length - suffix.Length))
   | _ -> None

// Adjust case of a good word to reflect how the bad word's been used
let AdjustCase (suffix:string) (badBit:string) (goodWord:string) (ignoreCase:bool) =
   if not ignoreCase then
      goodWord
   else if badBit = suffix.ToUpper() then
      goodWord.ToUpper()
   else if badBit = suffix.ToLower() then
      goodWord.ToLower()
   else
      goodWord

// Create a filter (a closure of string -> string) from a filter spec (pair of strings, 
// first = bad word, second = replacement). Applying the closure to a string will sanitise it.
let createFilter (spec:string*string) =
   let rec createFilterHelper (badWord:string) (goodWord:string) (ignoreCase:bool) =
      match badWord with
      // Case sensitive?
      | Suffix true "!" (_, prefix) -> createFilterHelper prefix goodWord false
      // badWord is a prefix (<string>*)
      | Suffix true "*" (_, prefix) ->
         fun (word:string) ->
            match word with
            | Prefix ignoreCase prefix (badBit, restOfWord) -> (AdjustCase prefix badBit goodWord ignoreCase) + restOfWord
            | anyOtherWord -> anyOtherWord
      // badWord is a sufffix (*<string>)
      | Prefix true "*" (_, suffix) ->
         fun (word:string) ->
            match word with
            | Suffix ignoreCase suffix (badBit, restOfWord) -> restOfWord + (AdjustCase suffix badBit goodWord ignoreCase)
            | anyOtherWord -> anyOtherWord
      // badWord is fixed
      | anyOtherWord -> 
         fun (word:string) ->
            match String.Compare(word, badWord, ignoreCase) with
            | 0 -> AdjustCase badWord word goodWord ignoreCase
            | _ -> word
   // Invoke createFilterHelper
   createFilterHelper (fst spec) (snd spec) true

// Create a filter list from a spec list
let createFilters specs = specs |> List.map createFilter

// Apply a sequence of filters to a word
let filterAWord filters word = filters |> Seq.fold (fun word filter -> filter word) word

// Apply a sequence of filters to each word in a string
let filterWords filters text = Regex.Replace(text, "\w+", MatchEvaluator(fun regexMatch -> filterAWord filters regexMatch.Value))

// These are my test filter specs
let filterSpecs = [ ("poop*", "p**p"); ("PHB!", "boss") ; ("gotten", "become") ]

// And my test filters
let compiledFilters = createFilters filterSpecs

let tests = 
   printfn "Poophead -> p**phead = %b" ((filterAWord compiledFilters "Poophead") = "p**phead")
   printfn "PHB -> boss = %b" ((filterAWord compiledFilters "PHB") = "boss")
   printfn "Phb -> Phb = %b" ((filterAWord compiledFilters "Phb") = "Phb")
   printfn "gotten -> become = %b" ((filterAWord compiledFilters "gotten") = "become")
   printfn "<long string=""> - %b" ((filterWords compiledFilters "My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!") = "My boss has started his new blog phblog.com. He's SUCH A MISBEGOTTEN P**PHEAD!")
</long></string></string>

Posted 29-Nov-16 7:26am

Stuart Dootson

Comments

PIEBALDconsult 29-Nov-16 18:52pm

Does the filter indicate whether or not anything was replaced?

Stuart Dootson 30-Nov-16 2:09am

No, but they could easily do so (return tuple of bool & string?). To stop filtering once a substitutions made, I presume? If that's the case, the Seq.fold needs to become a 'Seq.map |> Seq.find' - that's lazy evaluated, so not all the filters would be processed.

Stuart Dootson 30-Nov-16 5:25am

In fact - the easiest way to accomplish this is to:

1. Change the filter to return a string option (Some string indicates that filtering was done)

2. Add a 'null' filter (always returns the input word) at the end of the filter list to ensure there's always a succeeding filter.

3. Change the filterAWord function to be:

let filterAWord (filters:(string -> string option) seq) (word:string) =
      filters
      |> Seq.map (fun filter -> filter word)
      |> Seq.find Option.isSome
      |> Option.get

Solution 15

NOTE: This is a performance summary comparison of the above C# solutions only

** Updated 2/12/16: Stuart Dootson supplied test project, so timings updated.

I was curious to see how the other C# solutions compared performance wise to my own, so I build a test bed that you can download[^] and try yourself. I balanced out all the solutions so they ran equally against the same test strings.

~~Disclaimer: I did not check each solution to see if they all gave the same output or not.~~ I have now run tests on each of the solutions and 3 of them fail the benchmark set. (I suggest that the authors of solutions 3, 5 & 12 download the link above and check for errors.)

Here are the results (best to worse ranked by average) on my dev machine compiled in Release Mode and run from the command line.

(Note: Results will differ for you based on your system configuration)

After the initial positioning of results and tweaks to all the tests (v2), I felt that I could improve my results a little, so I've made some minor changes (can be found in the download link above), and the revised results are now:

-----------------------------------------------------------------------------------
 
Tests run:        10
Iterations/test:  100
Total executions: 1,000
 
Basic Test
 
   Solution 10   : MIN:  0.72670 ms | MAX:  0.96770 ms | AVG:  0.75483 ms
   Solution 1    : MIN:  1.17980 ms | MAX:  2.08300 ms | AVG:  1.30762 ms
   Solution 3    : MIN:  2.92230 ms | MAX:  3.71300 ms | AVG:  3.03632 ms
   Solution 8    : MIN:  3.97410 ms | MAX:  4.34740 ms | AVG:  4.06542 ms
   Solution 12   : MIN:  4.97950 ms | MAX:  5.31250 ms | AVG:  5.04871 ms
   Solution 13*2 : MIN:  7.2594 ms  | MAX: 10.2251 ms  | AVG:  8.18209 ms
   Solution 2    : MIN:  7.33320 ms | MAX:  8.42080 ms | AVG:  7.63468 ms
   Solution 13*1 : MIN:  8.4053 ms  | MAX: 14.0585 ms  | AVG: 10.4869 ms
   Solution 5    : MIN: 14.79730 ms | MAX: 17.29360 ms | AVG: 15.54762 ms
 
Bonus Test
 
   Solution 10   : MIN:  1.08750 ms | MAX:  1.13540 ms | AVG:  1.10539 ms
   Solution 1    : MIN:  1.09130 ms | MAX:  1.35580 ms | AVG:  1.15636 ms
   Solution 3    : MIN:  2.95310 ms | MAX:  3.38000 ms | AVG:  3.04058 ms
   Solution 8    : MIN:  5.37370 ms | MAX:  5.74020 ms | AVG:  5.46218 ms
   Solution 12   : MIN:  7.44160 ms | MAX:  7.72250 ms | AVG:  7.50184 ms
   Solution 13*2 : MIN:  7.4998 ms  | MAX: 12.9918 ms  | AVG:  8.99166 ms
   Solution 13*1 : MIN:  8.2866 ms  | MAX: 11.5506 ms  | AVG:  9.42119 ms
   Solution 2    : MIN:  8.54510 ms | MAX:  8.86510 ms | AVG:  8.64252 ms
   Solution 5    : MIN: 16.79520 ms | MAX: 17.75920 ms | AVG: 17.19089 ms
 
*1 F#Solution
*2 PreCompiledF#Solution
-----------------------------------------------------------------------------------
 
Tests run:        10
Iterations/test:  1000
Total executions: 10,000
 
Basic Test
 
   Solution 10   : MIN:   7.24660 ms | MAX:   7.52710 ms | AVG:   7.35793 ms
   Solution 1    : MIN:  10.38490 ms | MAX:  10.72050 ms | AVG:  10.53257 ms
   Solution 3    : MIN:  29.32480 ms | MAX:  29.52170 ms | AVG:  29.41673 ms
   Solution 8    : MIN:  39.87840 ms | MAX:  40.39850 ms | AVG:  40.05387 ms
   Solution 12   : MIN:  49.83420 ms | MAX:  50.64540 ms | AVG:  50.08076 ms
   Solution 2    : MIN:  73.15800 ms | MAX:  74.28660 ms | AVG:  73.62445 ms
   Solution 13*2 : MIN:  78.6023 ms  | MAX:  93.5715 ms  | AVG:  83.0625 ms
   Solution 13*1 : MIN:  88.4249 ms  | MAX:  93.8609 ms  | AVG:  90.9543 ms
   Solution 5    : MIN: 140.52420 ms | MAX: 148.83820 ms | AVG: 143.44404 ms
 

Bonus Test
 
   Solution 1    : MIN:   9.54210 ms | MAX:  10.12520 ms | AVG:   9.72796 ms
   Solution 10   : MIN:  10.84060 ms | MAX:  11.22680 ms | AVG:  10.94738 ms
   Solution 3    : MIN:  29.66310 ms | MAX:  30.13180 ms | AVG:  29.79657 ms
   Solution 8    : MIN:  54.05920 ms | MAX:  59.03390 ms | AVG:  56.43869 ms
   Solution 12   : MIN:  74.48620 ms | MAX:  75.41370 ms | AVG:  74.66223 ms
   Solution 13*2 : MIN:  78.3943 ms  | MAX:  98.3158 ms  | AVG:  83.7617 ms
   Solution 2    : MIN:  87.62770 ms | MAX:  92.20400 ms | AVG:  91.17817 ms
   Solution 13*1 : MIN:  87.9642 ms  | MAX:  99.3946 ms  | AVG:  91.8193 ms
   Solution 5    : MIN: 168.75850 ms | MAX: 201.33640 ms | AVG: 175.94069 ms

*1 F#Solution
*2 PreCompiledF#Solution
-----------------------------------------------------------------------------------

NOTE: with a small adjustment to the test to better reflect real-world examples, many of the solutions received a performance gain.

Lastly, if I have not correctly used a solution, please let me know so that I can update results. If any authors make changes, also please let me know and I will make sure that the tests are re-run and results published.

* old test: download[^]
* old v2 test: download[^]

Posted 30-Nov-16 3:25am

Graeme_Grant

Updated 2-Dec-16 0:20am

v6

Comments

Stuart Dootson 30-Nov-16 11:21am

Out of inerest, I converted your benchmarking code to F# to measure my F# solution & got the following timings:

Statistics
----------
Tests run:       10
Iterations/test: 100
-----------------------------------------------------------------------------
F# Solution - Basic: MIN = 2.6989, MAX = 3.2336, AVG = 2.78732
F# Solution - Bonus: MIN = 2.8122, MAX = 3.1169, AVG = 2.8664

Statistics
----------
Tests run:       10
Iterations/test: 1000
-----------------------------------------------------------------------------
F# Solution - Basic: MIN = 25.4413, MAX = 27.2665, AVG = 25.8091
F# Solution - Bonus: MIN = 28.5614, MAX = 30.4572, AVG = 29.0724

Graeme_Grant 30-Nov-16 11:38am

Send me a link to the F# code and I'll run it here so I can update the results. :)

Stuart Dootson 30-Nov-16 19:23pm

~~https://gist.github.com/studoot/f883c041cf6aaeaa1a752114963c16b7~~ You (of course) meant the timing code - silly me posted the link to my solution code... I'll get the timing harness code posted up when I can get to it...

Stuart Dootson 1-Dec-16 6:01am

Here we go - it's a Github gist

Easiest way for you to build it is probably create a new F# Console Application in VS2015, paste the code in that gist into the source file that's created by the template and then build/run the Release variant.

Graeme_Grant 1-Dec-16 6:37am

Cool... Compiled and ran first time! Never done F# before.

I set my 3+yr old MBP (Macbook Pro running Win10 in bootcamp) the same as the other speed tests and the results were as follows:

-----------------------------------------------------------------------------

Tests run:       10
Iterations/test: 100

Basic Test

  FsSolution   - Basic : MIN: 7.8449 ms | MAX 8.8849 ms | AVG 8.52395 ms
  *PreCompiled - Basic : MIN: 7.2831 ms | MAX 8.2619 ms | AVG 7.61122 ms

Bonus Test

  FsSolution   - Bonus : MIN: 7.7423 ms | MAX 8.4193 ms | AVG 7.92343 ms
  *PreCompiled - Bonus : MIN: 7.2196 ms | MAX 10.7562 ms | AVG 7.76979 ms

-----------------------------------------------------------------------------

Tests run:       10
Iterations/test: 1000

Basic Test

  FsSolution   - Basic : MIN: 77.7693 ms | MAX 80.7415 ms | AVG 78.8205 ms
  *PreCompiled - Basic : MIN: 72.5419 ms | MAX 77.3086 ms | AVG 73.9815 ms

Bonus Test

  FsSolution   - Bonus : MIN: 78.1498 ms | MAX 84.8175 ms | AVG 80.1699 ms
  *PreCompiled - Bonus : MIN: 73.0482 ms | MAX 76.4278 ms | AVG 74.1332 ms

-----------------------------------------------------------------------------

Once PIEBALDconsult finishes his bit I'll update the details with this table and add a link to the F# solution.

Stuart Dootson 1-Dec-16 7:52am

It gets wackier - that F# code was partly developed on a Windows 10 workstation (6 core Xeon, around 3GHz - that's where my previous times are from) and partly on my (8 year old) MBP, with a 2.4GHz Core2 Duo. In OS X, with Mono and Visual Studio Code (with the Ionide extension). The power of cross-platform .NET… Anyway - my Mac gave these timings:

Tests run:       10
Iterations/test: 100
-----------------------------------------------------------------------------

Basic Test

  FsSolution - Basic : MIN: 8.4053 ms | MAX 14.0585 ms | AVG 10.4869 ms
  PreCompiledFsSolution - Basic : MIN: 7.2594 ms | MAX 10.2251 ms | AVG 8.18209 ms


Bonus Test

  FsSolution - Bonus : MIN: 8.2866 ms | MAX 11.5506 ms | AVG 9.42119 ms
  PreCompiledFsSolution - Bonus : MIN: 7.4998 ms | MAX 12.9918 ms | AVG 8.99166 ms

Tests run:       10
Iterations/test: 1000
-----------------------------------------------------------------------------

Basic Test

  FsSolution - Basic : MIN: 88.4249 ms | MAX 93.8609 ms | AVG 90.9543 ms
  PreCompiledFsSolution - Basic : MIN: 78.6023 ms | MAX 93.5715 ms | AVG 83.0625 ms


Bonus Test

  FsSolution - Bonus : MIN: 87.9642 ms | MAX 99.3946 ms | AVG 91.8193 ms
  PreCompiledFsSolution - Bonus : MIN: 78.3943 ms | MAX 98.3158 ms | AVG 83.7617 ms

Graeme_Grant 1-Dec-16 8:02am

Looking at the averages, the MBP timings look inline with what I am seeing.

PIEBALDconsult 30-Nov-16 17:47pm

Cool. If you used the code I have attached to mine (3?), then it's probably not the latest because I forgot to update the attached code file.
And, if multiple runs are made I hope you didn't throw out the instance between runs.
I have updated my code and Article.

Graeme_Grant 30-Nov-16 19:58pm

There is a download link above so you can see how I have used it. ;)

PIEBALDconsult 30-Nov-16 22:31pm

Yes, and VS 2010 can't open it. :D Then I edited the Solution and Project files so VS 2010 can open it and it doesn't understand the nonsense code.

I also had trouble posting earlier. I have updated my code and Article.

Graeme_Grant 1-Dec-16 0:56am

Ah ... you're not using C#6 in VS2015 ... What version of C# are you using then in VS2010? You could use VS2015express - it is free... ;)

PIEBALDconsult 1-Dec-16 1:52am

I don't remember.
It's _all_ free with MSDN from my employer. :cool:
I have downloaded and installed VS 2015 on a junker system and now I'm trying to validate your tests.
I note that solution 10 doesn't meet the requirements.

Graeme_Grant 1-Dec-16 2:08am

LMAO...

I rolled a Dot Net 4.0 version for you using VS2013

Solution 10? It sure does pass the test. :)

PIEBALDconsult 1-Dec-16 2:10am

Not if it doesn't replace "gotten" with "become".

Graeme_Grant 1-Dec-16 2:22am

Ah, the Badword filter was set on the bonus and not the basic - thanks. Mistake when setting up the testing. I'll update the test solution.

Add it to the basic and it will pass with flying colours...

static List<badword10> basic10BadWords = new List<badword10>()
{
new BadWord10("POOPHEAD","P**PHEAD",true),
new BadWord10("poophead","p**phead",false),
new BadWord10("PHB","boss",true),
new BadWord10("gotten","become",true)
};

static List<badword10> bonus10BadWords = new List<badword10>()
{
new BadWord10("POOP*","P**P",true),
new BadWord10("poop*","p**p",false),
new BadWord10("PHB!","boss",true)
};

Graeme_Grant 1-Dec-16 2:27am

Send me your corrections and I'll added them to the next update.

PIEBALDconsult 1-Dec-16 8:41am

Was up until 00:30 trying things. Will get back on it tonight.

PIEBALDconsult 1-Dec-16 18:30pm

Some unbalanced tests as well. Such as having only the "bonus" test for mine (3) compared to the "basic" tests for the others.
I'm about to make a basicReplacer and see how it goes.

static PIEBALD.Type.ReplaceOmatic replacerBasic = new PIEBALD.Type.ReplaceOmatic
(
new Tuple<string, string="">("PHB", "boss"),
new Tuple<string, string="">("gotten", "become"),
new Tuple<string, string="">("poophead", "p**phead")
);

Graeme_Grant 1-Dec-16 18:43pm

How do you see them as unbalanced? All tests are running the same code on both the Basic and Bonus tests.

PIEBALDconsult 1-Dec-16 18:49pm

But with different configurations.

Graeme_Grant 1-Dec-16 18:57pm

Configuration, yes maybe. Code, no.

PIEBALDconsult 1-Dec-16 19:01pm

Maybe call the Garbage Collector between tests as well so bloat in one solution doesn't adversely affect the others.

Graeme_Grant 1-Dec-16 19:21pm

Does calling the GC inbetween make a difference? If it does, post the change and I'll add that too. However, from my testing, it does not appear to make a difference.

Stuart Dootson 2-Dec-16 12:52pm

So, I did a Haskell version of this challenge (see below) and used the Criterion benchmarking package. This produces an HTML report, showing the average times per iteration. Here's the one for my Haskell solution.:

It's a neat library/tool...

Graeme_Grant 2-Dec-16 13:20pm

Cool... Have you seen the next challenge?

Solution 17

A Haskell solution… Essentially a rough translation of my F# one!

{-# LANGUAGE OverloadedStrings, TemplateHaskell, QuasiQuotes #-}

module Main where

import Criterion.Main
import Data.CaseInsensitive (mk, CI)
import Data.Char as C
import Data.List (find)
import Data.Maybe (fromJust, isJust)
import Data.Stringable
import Data.Text (Text)
import qualified Data.Text as T
import Text.Regex.PCRE.Heavy

data FilterSpec = FilterSpec { badPattern::Text, goodWord::Text }
type FilterSpecs = [FilterSpec]
type Filter = Text -> Maybe Text
type Filters = [Filter]

double :: Text -> Text
double match = (match `T.append` match)

wordRegex = [re|\w+|]

stringsAreEqual ignoreCase left right
  | ignoreCase && (mk left) == (mk right) = True
  | (not ignoreCase) && left == right = True
  | otherwise = False

adjustCase badBit actualBit goodWord ignoreCase
  | not ignoreCase = goodWord
  | actualBit == (T.toUpper badBit) = T.toUpper goodWord
  | actualBit == (T.toLower badBit) = T.toLower goodWord
  | (T.head actualBit) == (C.toUpper $ T.head badBit) = makeFirstCharUpper
  | otherwise = goodWord
  where
    upperFirstChar = C.toUpper $ T.head goodWord
    makeFirstCharUpper = T.cons upperFirstChar (T.tail goodWord)

prefixFilter :: Text -> Text -> Bool -> Text -> Maybe Text
prefixFilter badWord goodWord ignoreCase word
  | stringsAreEqual ignoreCase prefix badWord = Just $ T.append caseAdjusted rest
  | otherwise = Nothing
  where
    (prefix, rest) = T.splitAt (T.length badWord) word
    caseAdjusted = adjustCase badWord prefix goodWord ignoreCase

suffixFilter :: Text -> Text -> Bool -> Text -> Maybe Text
suffixFilter badWord goodWord ignoreCase word
  | stringsAreEqual ignoreCase suffix badWord = Just $ T.append rest caseAdjusted
  | otherwise = Nothing
  where
    (rest, suffix) = T.splitAt (T.length word - T.length badWord) word
    caseAdjusted = adjustCase badWord suffix goodWord ignoreCase
    
wordFilter :: Text -> Text -> Bool -> Text -> Maybe Text
wordFilter badWord goodWord ignoreCase word
  | stringsAreEqual ignoreCase word badWord = Just goodWord
  | otherwise = Nothing
  where
    caseAdjusted = adjustCase badWord word goodWord ignoreCase

nullFilter :: Text -> Maybe Text
nullFilter = Just

createFilter :: FilterSpec -> Filter
createFilter spec = createFilterHelper (badPattern spec) (goodWord spec) True
  where
    createFilterHelper spec goodWord ignoreCase
      | (T.last spec) == '!' = createFilterHelper (T.init spec) goodWord False
      | (T.last spec) == '*' = prefixFilter (T.init spec) goodWord ignoreCase
      | (T.head spec) == '*' = suffixFilter (T.tail spec) goodWord ignoreCase
      | otherwise = wordFilter spec goodWord ignoreCase

createFilters specs = (map createFilter specs) ++ [nullFilter]

applyFilters :: Filters -> Text -> Text
applyFilters filters word = fromJust activeFilter
  where
    filteredWords = map ($ word) filters
    activeFilter = fromJust $ find isJust filteredWords 

filterWords :: Filters -> Text -> Text
filterWords filters text = gsub wordRegex sanitiseWord text
  where
    sanitiseWord word = applyFilters filters word

Posted 2-Dec-16 6:55am

Stuart Dootson

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Peter Leow · Accepted Answer · 2016-11-26T05:14:00

I was reading up on Python online, then chanced upon this ~~pooping~~ oops I mean coding challenge, thought why not try this out on Python. Here it is fresh from the ~~loo~~ oops again I mean oven.

Python

"""
poop.py

by Peter Leow the pooper

"""

import re

def functionStartWithPoop(m):
    wordFound = m.group(0)

    if wordFound[:5].lower()=='poop*':
        wordRepl = wordFound[0] + '**' + wordFound[3] + wordFound[5:]
    else: #wordFound[:4].lower()=='poop':
        wordRepl = wordFound[0] + '**' + wordFound[3] + wordFound[4:] 

    return wordRepl

def functionEndWithPoop(m):
    wordFound = m.group(0)

    if wordFound[-5:].lower()=='*poop':
        wordRepl = wordFound[:-5] + wordFound[-4] + '**' + wordFound[-1]
    else: #wordFound[-4:].lower()=='poop':
        wordRepl = wordFound[:-4] + wordFound[-4] + '**' + wordFound[-1]

    return wordRepl

def main():
    originalSentence = '''
    poop*ing is in front of make*poop.
    Whether poop* or *poop, there are just pOoP!
    A POOPHEAD cannot change but an exclaimed POOPHEAD! can.'''

    print('Before:')
    print(originalSentence)
    print()
    print('After:')
    
    # Without ! ending
    patternStartWithPoop=r'(?<!\S)poop\*?[\S]*'
    patternEndWithPoop=r'[\S]*\*?poop(?=[?!,.;]?$|[?!,.;]?\s+)'

    # with ! ending
    patternStartWithPoopEndWithExclamation = r'(?<!\S)poop\*?[\S]*!(?=\s|$)'
    patternEndWithPoopAndExclamation=r'[\S]*\*?poop!(?=[?!,.;]?$|[?!,.;]?\s+)'

    # Case sensitive
    filteredSentence = re.sub(patternStartWithPoop, functionStartWithPoop, originalSentence, flags=0)
    #print(filteredSentence)
    filteredSentence = re.sub(patternEndWithPoop, functionEndWithPoop, filteredSentence, flags=0)
    #print(filteredSentence)

    # Case ignorance
    filteredSentence = re.sub(patternStartWithPoopEndWithExclamation, functionStartWithPoop, filteredSentence, flags=re.IGNORECASE)
    #print(filteredSentence)
    filteredSentence = re.sub(patternEndWithPoopAndExclamation, functionEndWithPoop, filteredSentence, flags=re.IGNORECASE)
    print(filteredSentence)
  
main()

Try this out at Coding challenge bad word filter | Python Fiddle[^] and you should see the following OUTPOOP:

Before:

    poop*ing is in front of make*poop.
    Whether poop* or *poop, there are just pOoP!
    A POOPHEAD cannot change but an exclaimed POOPHEAD! can.

After:

    p**ping is in front of makep**p.
    Whether p**p or p**p, there are just p**P!
    A POOPHEAD cannot change but an exclaimed P**PHEAD! can.

I have ignored 'PHB' and 'gotten' as they are simply too trivial.

Jon McKee · Accepted Answer · 2016-11-26T12:55:00

Not sure on the rules, just wanted to add this for fun. I love code challenges.

This solution, in contrast to my other one, reduces the number of strings nuked during processing. It also uses only a single Regex call to process all bad words. Even though I construct an additional dictionary, my diagnostic tool shows me coming in at less memory consumed than my previous solution. This solution does everything the previous did, plus adds optional casing.

C#

public class BadWordFilter
{
    public static string Replace(string input, 
        IDictionary<string, string> badWordMap)
    {
        if (string.IsNullOrWhiteSpace(input))
            throw new ArgumentException(nameof(input),
                "String cannot be null, empty, or whitespace.");

        Dictionary<string, string> idMap = new Dictionary<string, string>();
        StringBuilder pattern = new StringBuilder();
        int idCounter = 0;
        //For each bad word pair, create an ID mapped to the value and construct the match pattern using the ID and key
        foreach (KeyValuePair<string, string> badWord in badWordMap)
        {
            string id = "ID" + idCounter++;
            idMap.Add(id, badWord.Value);
            ConstructMatchPattern(badWord.Key, id, pattern);
        }
        //Remove the first | from the pattern
        pattern.Remove(0, 1);

        Regex filter = new Regex(pattern.ToString(), RegexOptions.IgnoreCase);
        string[] groupNames = filter.GetGroupNames();
        MatchEvaluator evaluator = match =>
        {
            string replacement = "";
            //Find which group was matched and retrieve the replacement value
            for (int i = 1; i < groupNames.Length; i++)
                if (match.Groups[groupNames[i]].Success)
                {
                    replacement = idMap[groupNames[i]];
                    break;
                }

            //Handle casing
            if (replacement.StartsWith("!"))
            {
                replacement = replacement.Remove(0, 1);
                //All caps check
                if (match.Value == match.Value.ToUpper())
                    replacement = replacement.ToUpper();
                //First letter caps check
                else if (match.Value[0] == char.ToUpper(match.Value[0]))
                    replacement = char.ToUpper(replacement[0]) + replacement.Substring(1);
            }
            return replacement;
        };

        return filter.Replace(input, evaluator);
    }

    private static void ConstructMatchPattern(string badWord, string id, 
        StringBuilder pattern)
    {
        if (string.IsNullOrWhiteSpace(badWord))
            return;
        int patternLength = pattern.Length;
        pattern.Append($@"|(?<{id}>(?:\b){badWord.Trim('*')}");
        if (badWord.StartsWith("*"))
            pattern.Insert(patternLength + id.Length + 11, @"\w*", 1);
        if (badWord.EndsWith("*"))
            pattern.Append(@"\w*");
        pattern.Append(')');
    }
}

Used the same way as the previous example, with the inclusion of the "!" option.

C#

static void Main(string[] args)
{
    Dictionary<string, string> badWords = new Dictionary<string, string>()
    {
        {"poop*", "p**phead"},
        {"*HB", "boss"},
        {"gotten", "become"},
        {"*crap*", "taco supreme"}
    };
    string input = "My PHB is such a poophead. It's gotten worse since his promotion. In fact you might call him a supercraphead.";
    string filteredInput = BadWordFilter.Replace(input, badWords);
    Console.WriteLine(filteredInput);
    Console.ReadKey();
}

Replacing {"*HB", "boss"} with {"*HB", "!boss"} will yield BOSS instead of boss since it cases to its key's match.

Coding challenge: bad word filter

17 solutions

Solution 1

Solution 4

Solution 5

Solution 8

Solution 9

Solution 14

Solution 16

Solution 2

Solution 3

Solution 6

Solution 7

Solution 10

Solution 11

Solution 12

Solution 13

Solution 15

Solution 17

Add your solution here

Preview 0

Existing Members

...or Join us