Introduction
This article was inspired by the Weekly Code Project Challenge[^] to:
- given a random (or not so random) string, generate unique anagrams
- hook into a spell checking service of your choice and only return anagrams that are actual words
- need to be a fast solution
The purpose of this article is to take you on the journey undertaken and the discoveries made along the way to the final solution for the Weekly Challenge.
The solution will be a console app, however the code could be applied to any other type of app - WinForm, WPF, UWP, Xamarin Native/Forms, ASP.NET, MVC, and ASP.Core.
Disclaimer: I don't claim this to be the best or fastest solution but I feel that it is pretty close. The challenge has not concluded at the time of posting of this article, so someone may have found a faster solution.
TL;DR
For those who want to see the results, you can jump directly to them[^].
What is an Anagram?
Firstly, we need to look at the definition of an Anagram. According to Oxford Dictionaries[^]: "A word, phrase, or name formed by rearranging the letters of another, such as "spar", formed from "rasp"."
1. Build List of Anagrams
To generate anagrams, you need to find all the combinations of the letters. There are many different ways of doing this. Here is one:
static class Program
{
private static object elapsed;
private static void Main(string[] args)
{
var st = new Stopwatch();
var elapsed = default(TimeSpan);
var prime = "a".Anagrams();
while (true)
{
st.Reset();
Console.Write("\r\nSearch Word: ");
int count = 0;
var lookupWord = Console.ReadLine().Trim().ToLower();
if (lookupWord.Length == 0) break;
st.Start();
var anagrams = lookupWord.Anagrams();
elapsed = st.Elapsed;
st.Stop();
foreach (var anagram in anagrams)
Console.WriteLine($"{++count,2}: {anagram}");
Console.WriteLine($"Time: {elapsed.TotalMilliseconds} ms");
}
}
static IEnumerable<string> Anagrams(this string word)
=> Combinations(word.ToCharArray().ToList())
.Select(x => new string(x.ToArray())).Distinct();
static IEnumerable<list<char>> Combinations(List<char> items)
{
if (items.Count == 0) yield return new List<char>();
var copy = new List<char>(items);
foreach (var i in items)
{
copy.Remove(i);
foreach (var rest in Combinations(copy))
{
rest.Insert(0, i);
yield return rest;
}
copy.Insert(0, i);
}
}
}
How Did We Do?
Passed 2 out of 3 requirements. My target was 100%, so it fails:
- Generate anagrams - passed
- Valid (spell checked) words - fail
- Speed: Fast - passed barely
2. Using the WPF Spell Checker
I know that WPF has support for a spell checker. The trick with the control is that it needs to run in a STA Thread[^].
The following uses the WPF spell checker:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Windows.Controls;
using System.Windows.Documents;
namespace FastAnagrams
{
static class Program
{
private static void Main()
{
var st = new Stopwatch();
var elapsed = default(TimeSpan);
var sh = new SpellCheckHelper();
IEnumerable<string> anagrams = null;
while (true)
{
st.Reset();
Console.Write("\r\nSearch Word: ");
int count = 0;
var lookupWord = Console.ReadLine().Trim().ToLower();
if (lookupWord.Length == 0) break;
st.Start();
sh.BeginBulkFind(Anagrams(lookupWord));
anagrams = sh.EndBulkFind();
elapsed = st.Elapsed;
st.Stop();
Console.WriteLine($"Time: {elapsed.TotalMilliseconds} ms");
Console.WriteLine($"{anagrams.Count():N0} valid anagrams\r\n");
foreach (var anagram in anagrams)
Console.WriteLine($"{++count,2}: {anagram}");
}
}
public class SpellCheckHelper
{
private readonly object sync = new object();
private List<string> validWords = new List<string>();
private IEnumerable<string> words;
private bool inProgress;
private bool isComplete;
public bool IsComplete => isComplete;
public void BeginBulkFind(IEnumerable<string> words)
{
this.words = words;
validWords.Clear();
var backgroundThread
= new Thread(new ThreadStart(DoBulkCheck));
backgroundThread.SetApartmentState(ApartmentState.STA);
backgroundThread.Start();
isComplete = false;
inProgress = true;
}
public List<string> EndBulkFind()
{
lock (sync) if (!isComplete) Monitor.Wait(sync);
isComplete = false;
inProgress = false;
return validWords;
}
private void DoBulkCheck()
{
try
{
var tb = new TextBox();
tb.SpellCheck.IsEnabled = true;
foreach (var word in words)
{
int index = 0;
tb.Text = word;
var valid = true;
while ((index =
tb.GetNextSpellingErrorCharacterIndex
(index, LogicalDirection.Forward)) != -1)
if (!string.IsNullOrEmpty
(tb.Text.Substring(index,
tb.GetSpellingErrorLength(index))))
{
valid = false;
break;
}
if (valid) validWords.Add(word);
}
}
finally
{
lock (sync)
{
isComplete = true;
Monitor.PulseAll(sync);
}
}
}
}
}
How Did We Do?
Passed 2 out of 3 requirements. Performance was critical, so it fails at 38+ seconds:
- Generate anagrams - passed
- Valid (spell checked) words - passed
- Fast - failed
3. Check for Valid Words Using a Custom Spell Checker
Checking each word using the WPF spell checker was too slow. So let's load a List of words and run a check against our list.
A quick search finds the list of compiled words: GitHub - 350k+ English words[^].
Next, we require a fast method of checking if the word are in a List
. The Generic HashSet<T> Class[^] has "is based on the model of mathematical sets and provides high-performance set operations similar to accessing the keys of the Dictionary<TKey, TValue>
or HashTable
collections". Here is the revised version:
static class Program
{
private static void Main(string[] args)
{
var st = new Stopwatch();
var elapsed = default(TimeSpan);
var anagrams = new List<string>();
st.Start();
LoadWords("words.txt");
st.Stop();
Console.WriteLine($"Loading Time: {st.ElapsedMilliseconds} ms\r\n");
var prime = "a".Anagrams();
while (true)
{
st.Reset();
Console.Write("\r\nSearch Word: ");
int count = 0;
anagrams.Clear();
var lookupWord = Console.ReadLine().Trim().ToLower();
if (lookupWord.Length == 0) break;
st.Start();
var words = Anagrams(lookupWord);
foreach (var item in words)
{
if (wordList.Contains(item))
anagrams.Add(item);
}
elapsed = st.Elapsed;
st.Stop();
Console.WriteLine($"Time: {elapsed.TotalMilliseconds} ms");
Console.WriteLine($"{anagrams.Count:N0} valid anagrams out of {words.Count():N0}\r\n");
foreach (var anagram in anagrams)
Console.WriteLine($"{++count,2}: {anagram}");
}
}
static HashSet<string> wordList = new HashSet<string>();
static void LoadWords(string filename)
{
string word = string.Empty;
using (StreamReader sr = File.OpenText(filename))
while ((word = sr.ReadLine()) != null)
if (word.Length > 0 && !word.Contains("'") && !word.Contains("\""))
wordList.Add(word);
}
}
How Did We Do?
Passed 3 out of 3 requirements with reasonable performance at 2.75 ms:
- Generate anagrams - passed
- Valid (spell checked) words - passed
- Speed: fast - passed
This is a viable solution. But can we make it even faster?
4. Changing Approach to the Problem
In the first 3 attempts, we are generating a list of possible anagrams that require validation against a dictionary or list of words. This takes time to generate and time to validate each word.
A far simpler approach would be to pre-generate sets of anagrams and look for each set. This way, we remove the time to generate and validate. We can store the anagram sets in a Generic Dictionary<TKey, TValue> Class[^] and calculate a key that is common to each anagram in the set.
Calculating a common key went through a few iterations and the winning solution was based on "all anagrams of a word are equal length and contained the same characters".
So, for example, "teaser", "eaters", "easter", "asteer", "aretes", "saeter", "seater", "staree", "reseat" all have the same key "aeerst".
static string GetKey(string w)
{
var chars = w.ToCharArray();
Array.Sort(chars);
return new string(chars);
}
We now don't need to generate a list of anagrams. The word list that will be loaded contains all the words that we are going to need. Pre-building our anagram lists and storing in a Dictionary<TKey, TValue>
with our generated key means we can take the word entered from our user, generate a key, and get a list of valid anagrams back in bulk if it is a valid word.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
namespace FastAnagrams
{
static class Program
{
private static void Main(string[] args)
{
var st = new Stopwatch();
var elapsed = default(TimeSpan);
IEnumerable<string> anagrams = null;
st.Start();
LoadWords("words.txt");
st.Stop();
Console.WriteLine($"Loading Time: {st.ElapsedMilliseconds} ms\r\n");
var prime = ContainsWord("a");
while (true)
{
st.Reset();
Console.Write("\r\nSearch Word: ");
int count = 0;
var lookupWord = Console.ReadLine().Trim().ToLower();
if (lookupWord.Length == 0) break;
st.Start();
var key = lookupWord.GetKey();
if (ContainsWord(key)) anagrams = GetResults(key);
elapsed = st.Elapsed;
st.Stop();
foreach (var anagram in anagrams)
Console.WriteLine($"{++count,2}: {anagram}");
Console.WriteLine($"\r\nTime: {elapsed.TotalMilliseconds} ms");
Console.WriteLine($"{count:N0} anagrams for {lookupWord}");
}
}
static bool ContainsWord(string key) => LookupTable.ContainsKey(key);
static IEnumerable<string> GetResults(string key) => LookupTable[key];
static Dictionary<string, list<string>> LookupTable
= new Dictionary<string, list<string>>();
static void LoadWords(string filename)
{
var wordKeys = new Dictionary<string, int>();
string word, key;
List<string> wordList = null;
using (StreamReader sr = File.OpenText(filename))
{
while ((word = sr.ReadLine()) != null)
{
if (word.Length > 0 && !word.Contains("'") && !word.Contains("\""))
{
key = word.GetKey();
if (!LookupTable.ContainsKey(key))
{
wordList = new List<string>();
LookupTable.Add(key, wordList);
}
else
{
wordList = LookupTable[key];
}
wordList.Add(word);
}
}
}
}
}
}
How Did We Do?
Passed 3 out of 3 requirements with a major improvement at 0.0148 ms:
- Generate anagrams - passed
- Valid (spell checked) words - passed
- Speed: faster - passed
This is a much better solution. But can we make it even faster?
5. Performance Tweak
Right now, you are probably thinking that we can't tweak the performance any further. But with a bit of experimentation, it is possible. Currently, the code makes 3 function calls:
- Generate the key
- Check if the key exists
- Return the list of anagrams
By combining all 3 functions into one, we gain another major jump in performance. The final version is now:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
namespace FastAnagrams
{
static class Program
{
private static void Main(string[] args)
{
var st = new Stopwatch();
var elapsed = default(TimeSpan);
IEnumerable<string> anagrams = null;
st.Start();
LoadWords("words.txt");
st.Stop();
Console.WriteLine($"Loading Time: {st.ElapsedMilliseconds} ms\r\n");
var prime = Find("a");
while (true)
{
st.Reset();
Console.Write("\r\nSearch Word: ");
int count = 0;
var lookupWord = Console.ReadLine().Trim().ToLower();
if (lookupWord.Length == 0) break;
st.Restart();
anagrams = Find(lookupWord);
elapsed = st.Elapsed;
foreach (var anagram in anagrams)
Console.WriteLine($"{++count,2}: {anagram}");
Console.WriteLine($"\r\nTime: {elapsed.TotalMilliseconds} ms");
Console.WriteLine($"{count:N0} anagrams for {lookupWord}");
}
}
public static string GetKey(this string word)
{
var chars = word.ToCharArray();
Array.Sort(chars);
return new string(chars);
}
static IEnumerable<string> Find(string lookupWord)
{
var chars = lookupWord.ToCharArray();
Array.Sort(chars);
var key = new string(chars);
if (LookupTable.ContainsKey(key))
foreach (var item in LookupTable[key])
yield return item;
else
yield break;
}
static Dictionary<string, list<string>> LookupTable
= new Dictionary<string, list<string>>();
static void LoadWords(string filename)
{
var wordKeys = new Dictionary<string, int>();
string word, key;
List<string> wordList = null;
using (StreamReader sr = File.OpenText(filename))
{
while ((word = sr.ReadLine()) != null)
{
if (word.Length > 0 && !word.Contains("'") && !word.Contains("\""))
{
key = word.GetKey();
if (!LookupTable.ContainsKey(key))
{
wordList = new List<string>();
LookupTable.Add(key, wordList);
}
else
{
wordList = LookupTable[key];
}
wordList.Add(word);
}
}
}
}
}
}
How Did We Do?
Passed 3 out of 3 requirements!
- Generate anagrams - passed
- Valid (spell checked) words - passed
- Speed: on fire! - passed
We have a winner! From a starting point of 2.75 ms (validated words) to 0.0003 ms.
6. Special Mention - SQLite
I expected SQLite[^] to be a bit slower than an in-memory lookup table. With my Mackbook Pro and its SSD (Solid State Drive), the results were quite surprising - they are just as fast as the Optimised Dictionary method (5.)!
using System;
using System.Collections.Generic;
using System.Data.SQLite;
using System.Diagnostics;
using System.IO;
using System.Text;
namespace FastAnagrams
{
static class Program
{
private static void Main(string[] args)
{
var st = new Stopwatch();
var elapsed = default(TimeSpan);
IEnumerable<string> anagrams = null;
st.Start();
InitDB();
st.Stop();
while (true)
{
st.Reset();
Console.Write("\r\nSearch Word: ");
int count = 0;
var lookupWord = Console.ReadLine().Trim().ToLower();
if (lookupWord.Length == 0) break;
st.Restart();
anagrams = Find(lookupWord);
elapsed = st.Elapsed;
foreach (var anagram in anagrams)
Console.WriteLine($"{++count,2}: {anagram}");
Console.WriteLine($"\r\nTime: {elapsed.TotalMilliseconds} ms");
Console.WriteLine($"{count:N0} anagrams for {lookupWord}");
}
}
public static string GetKey(this string word)
{
var chars = word.ToCharArray();
Array.Sort(chars);
return new string(chars);
}
static IEnumerable<string> Find(string lookupWord)
{
var chars = lookupWord.ToCharArray();
Array.Sort(chars);
using (var cmd =
new SQLiteCommand($"SELECT word FROM {tableName} WHERE '" +
new string(chars) + "' = wordkey", dbConn))
using (SQLiteDataReader coreReader = cmd.ExecuteReader())
while (coreReader.Read())
yield return coreReader[0].ToString();
}
static SQLiteConnection dbConn;
static SQLiteTransaction dbTrans;
static string wordFile = "words.txt", dbFile = "words.db",
conn, tableName = "WordLookup",
createCmd = $"CREATE TABLE {tableName} (wordkey TEXT, word TEXT)",
insertCmd = $"INSERT INTO {tableName} (wordkey, word) values (@wordkey, @word)";
static void InitDB()
{
CheckAndOpenDb(dbFile);
CheckAndCreateTable();
}
private static void CheckAndOpenDb(string file)
{
conn = $"Data Source={file};Version=3;DateTimeKind=Utc";
if (!File.Exists(file))
{
Console.WriteLine("Creating Database...");
SQLiteConnection.CreateFile(file);
}
else
{
Console.WriteLine("Starting DB...");
}
dbConn = new SQLiteConnection(conn);
dbConn.Open();
}
private static void CheckAndCreateTable()
{
if (!TableExists(tableName))
{
Console.WriteLine("Creating Table...");
ExecuteCommand(createCmd);
LoadWords();
}
}
static void LoadWords()
{
string word, key;
Console.WriteLine("Copying Word.Txt to Table...");
BeginTrans();
using (StreamReader sr = File.OpenText(wordFile))
{
while ((word = sr.ReadLine()) != null)
{
if (word.Length > 0 &&
!word.Contains("'") &&
!word.Contains("\""))
{
key = word.GetKey();
ExecuteCommand(insertCmd,
new Dictionary<string, object>
{
{ "@wordkey", key },
{ "@word", word }
});
}
}
}
CommitTrans();
}
static bool ExecuteCommand
(string query, Dictionary<string, object> fields = null)
{
bool success = false;
using (SQLiteCommand cmd = Command(query))
{
if (fields != null && fields.Count != 0)
foreach (var key in fields.Keys)
cmd.Parameters.AddWithValue(key, fields[key]);
cmd.ExecuteNonQuery();
success = true;
}
return success;
}
static SQLiteCommand Command(string sql)
{
return new SQLiteCommand(sql, dbConn);
}
static bool TableExists(string name)
=> Command($"SELECT name FROM sqlite_master
WHERE type='table' AND name='{name}'")
.ExecuteReader()
.HasRows;
static void BeginTrans()
{
dbTrans = dbConn.BeginTransaction();
}
static void CommitTrans()
{
dbTrans.Commit();
}
}
}
How Did We Do?
Passed 3 out of 3 requirements.
- Generate anagrams - passed
- Valid (spell checked) words - passed
- Speed: on fire! - passed
Result Summary
Loading words...
3. Anagram using HashSet
4. Keyed Dictionary
5. Keyed Dictionary Optimised
6. SqLite Lookup
Running Tests...
Name Count Timing (ms) % Var
----------------------------------------------------------------------
1. Anagram Generator 360 3.83040 0.00000
2. Anagram WPF Spell Check 3 38,484.62290 -99.99005
3. Anagram using HashSet 9 2.75240 39.16582
4. Keyed Dictionary 9 0.01480 25,781.08108
5. Keyed Dictionary Optimised 9 0.00030 1,276,700.00000
6. SqLite Lookup 9 0.00030 1,276,700.00000
----------------------------------------------------------------------
* Timings are dependant on the config of the PC running them.
-- press any key to exit --
The WPF word dictionary appears to be smaller than the word list that is being used for tests 3 to 6. Only 3 anagrams found for the word "teaser", 9 for tests 3 to 9. The SqLite test threw me at first but was independently checked and verified. All testing was done in Release Mode and run from the command prompt.
Closing Comments
The journey was an interesting one. The lesson learned from this journey is don't discredit possible solutions without validation. I would never have thought that the SqLite solution would be as fast as it was given the configuration of my computer. In a real world scenario, the results will be different dependent on the system that it is running on.
Included Downloadable Code
The code included with this article is shared and can be run individually or together as a single test solution.
To perform optimal testing, as was done for this article, compile the code in Release Mode, open a command prompt and run.