Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / Python

Palindromes - Fun with Words: Part 1

5.00/5 (5 votes)
18 Jun 2018CPOL7 min read 14.3K  
Games/Fun with Programming

Introduction

In my continuing effort to learn Python, I thought it would be a productive exercise to convert a Python program to C#. Peter Norvig's Python Palindrome Generator provided the ideal starting point. In order for readers to move easily between this C# code and Mr. Norvig's Python code, I've retained the spelling of the methods in his Python code. As a result, analysis tools like FxCop will warn about method names like read_dict which would normally be ReadDictionary in C#.

I've been fascinated by palindromes - phrases that spell the same thing forward and backward - since my youth when someone showed me the famous "Madam, I'm Adam" and "A man, a plan, a canal, Panama!" palindromes. Crafting them by hand is laborious, so when I found out I could let a computer do the work, I was intrigued. I first saw a computer-generated palindrome in Chapter 4 of Peter van der Linden's 1994 "Expert C Programming" where he shows a nearly 300 word palindrome. Mr. van der Linden's book did not provide the code to generate the palindrome, but rather suggested it as an exercise. So my early attempts at writing a palindrome generator in C were awkward and inefficient. Fortunately, Mr. Norvig's algorithm is elegant and straightforward to implement in C#. Let's see how it works.

The Dictionary

Before we get into the algorithm details, I want to describe the read_dict() method. It reads the dictionary file containing one word or phrase per line into two List<String>. One list (_fw) has the words in their original order and the second list (_bw) has the words reversed, so "cat' is stored as "tac". The reverse() method is used to reverse a String. I've made a slight modification to Mr. Norvig's read_dict() method allowing you to read from given starting and ending line numbers. Incidentally, this demonstrates an often overlooked feature of C# called optional arguments. I've provided a much smaller version of the npDict.txt dictionary that Mr. Norvig supplies called npdictPeterVanDerLinden.txt. The words in npdictPeterVanDerLinden.txt are from Mr. van der Linden's book and are optimized to generate several palindromes quickly. (Granted, the palindromes generally make no sense, but the point of this program is the algorithm.)

Set the name of the dictionary file you wish to use in App.config, for example:

XML
<setting name="Dictionary" serializeAs="String">
	<value>C:\MyDocuments\Palindrome\Dictionary\npdict.txt<value>
</setting>

The Algorithm

The PalPython class has two lists of String, one for building the left-hand side and one for building the right-hand side of the palindrome.

XML
public List<String> left { get; set; }
public List<String> right { get; set; }

In the init() method, we first create the "seed" words consisting of "A man, a plan, a canal, Panama". (As the program runs, we will ignore the punctuation and spaces. See the canonical() method to see how this is done. We also actually store the right-hand side in reverse, and with the letters in the words reversed as well.) We put the "amanaplan" in left, and "acanalpanama" in right.

Next, we find the substring that is not matched by text on the other side. In the table below, that substring (itself a palindrome in this case), aca, is on the right and is shown in bold in the following table:

left right
amanaplan acanalpanama

Next, we search the dictionary for words that begin with the substring aca and add it to the left using the extend() method. Let's assume we chose the phrase a catnip, which as mentioned previously is stored without the blank as acatnip. We add that to the left side and we again find the substring that is not matched by text on the other side. In the table below, that substring, tnip, is on the left and is shown in bold in the following table:

left right
amanaplanacatnip acanalpanama

Since tnip is not a palindrome, we need to find a words in the dictionary that end with pint (the reverse of tnip). Let's assume we fetched apint from the dictionary. Add it to the right using the extend() method, observe that the substring, simply the letter a, is a palindrome, and voilĂ , we have a palindrome (albeit a silly one) : A man, a plan, a catnip, a pint, a canal, Panama!

left right
amanaplanacatnip apintacanalpanama

But what if we don't get a palindrome after we fetch the second word? This is where Mr. Norvig's excellent backtrack() method comes into play. Assume our dictionary has acatalpa, acaret and abater (A Catalpa is a type of tree; a Caret is the symbol above the number 6 on your keyboard, and a Bater is a tannery worker.) Now when we try to match the substring aca the two words fetched that match aca are acatalpa and acaret and are placed on the stack (described below). In the table below, we have added acatalpa to the left (see the extend() method for details).

left right
amanaplanacatalpa acanalpanama

But we have no words in the dictionary that end with aplat (the reverse of the substring, talpa). So we backtrack, removing acatalpa from left and try the next one, acaret, by popping it off the stack in search() and adding it to the left with extend().

left right
amanaplanacaret acanalpanama

Now we look for words in the dictionary that end with ter (the reverse of the substring ret). We fetch abater and add it to the right.

left right
amanaplanacaret abateracanalpanama

The substring now is aba (itself a palindrome), and lo and behold, we have found another palindrome: A man, a plan, a caret, a bater, a canal, Panama! As the palindromes are found, they are written to a file called pallog<process id>.txt, e.g., pallog8080.txt.

Using the Code

Download and extract the project, and open Palindrome.sln with Visual Studio. It consists of two projects:

  1. Palindrome - The main project
  2. UnitTestProjectPal - a unit test project

Build the solution by pressing F6. Remember to set the name of the dictionary file you wish to use as described above.

Code Details

The class PalPython is the main class and has a number of properties including two Dictionary items. public Dictionary<String, String>_truename, which holds the true name of a canonical word, and public Dictionary<String, int>seen, which keeps track of whether or not a word has been used to build the current palindrome.

The key class used in PalPython is MyStackItem. It consists of the current substring, and words that match it in a Stack. A stack of MyStackItem are kept in their own stack, public Stack<MyStackItem>wordStack. As the program runs, the substring and words that match it are placed in a MyStackItem object, and that object in turn is pushed onto the wordStack. The class itself is very simple consisting of just two properties; in fact the PrintStack(), PrintWords(), and ToString() methods used for debugging are actually the most complex part of the class! We'll look at PrintStack() and PrintWords() shortly.

C#
public class MyStackItem
{
	public String Substring { get; set; }
	public Stack<String> Words { get; set; }
	...
}

When using the debugger in Python (I'm using IDLE Version 2.7), I added the following to the search() method after the call to self.extend() to print the current list of words that match the substring, and the current stack:

XML
print (words)
print (self.stack)

For example, for the substring aca (using the npdictPeterVanDerLinden.txt dictionary file), here is the output in IDLE's Python shell:

XML
['acalamus', 'acanal', 'acaret', 'acat', 'acat', 'acatalpa', 'acatnip', 'acaw']
[('aca', ['acalamus', 'acanal', 'acaret', 'acat', 'acat', 'acatalpa', 'acatnip', 'acaw'])]

The PrintWords() and PrintStack() methods in the MyStackItem class generate the same output. Using our previous example, after backtracking the word catalpa, the stack looks like this prior to the Pop():

XML
[('aca', ['acanal', 'acaret'])], [('talpa', [])]

And after the Pop() in the search() method, the talpa has been removed as it has no words that match it, indicated by the empty square brackets ([]):

XML
[('aca', ['acanal', 'acaret'])]

Running Time

When I ran the Release version of Palindrome.exe using the npdict.txt dictionary file, here are some of the results:

Run time Number of Words in Palindrome
1 minute 1129
2 minutes 4801
3 minutes 5406
15 minutes 6677
24 minutes 7003
27 minutes 7197
35 minutes 7456
4 hours 7649

Conclusion

Python is a powerful language with a very readable syntax. Though converting it to C# was straightforward, the C# version is considerably more verbose being over 3 times the line count. Using my rather dated quad core machine with a modest 4 GB of memory, I was not able to generate anywhere near Mr. Norvig's impressive 21,000 word palindrome. It would be interesting if a CodeProject reader could post an article on how to harness the power of cloud computing to leverage more powerful hardware to run this program.

In the future, I plan on demonstrating a program to generate a cousin of the palindrome, the anagram. An anagram is a word or phrase formed by rearranging the letters of another, for example, "Old West Action" is an anagram of "Clint Eastwood".

Version 1.0.0.0

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)