Avoiding the Dreaded Mystery Character
One of the most common utterances programmers make when informed their code is not working right is, "It works on my machine!"
An even more revolting development (than when your code works on your machine, but not on someone else's) is when it works on your machine, but also does not work on your machine.
Let me explain.
I created a "ginormous" jsfiddle of an entire Robert Louis Stevenson book in English ("Treasure Island") and Spanish ("La Isla del Tesoro"). As you can see here, the Spanish displays as desired in jsfiddle - the special characters characteristic of written Spanish, such as "ñ", "¿", and "¡", display just fine.
However, when I -- in preparation for making this bilingual work available as a paperback/Kindle pair -- copied the CSS and HTML to a text file, and changed the extension from .txt to .html, the file displayed more-or-less as desired in my browser, except that, on encountering the accented characters, the browser threw up its virtual hands and replaced those accented characters with the "I-don't-know-what-the-heck-this-is-so-I'm-going-to-replace-it-with-a-fallback-symbol" character, namely "�".
This wouldn't do - the Real Academia Española would likely issue a warrant for my arrest and deportation, and then force me to eat Spanish food (bland-bah!) instead of Mexican (spicy-awesome!) fare, which I can enjoy practically "at will" in my native California.
Being scared out of my wits at that prospect, I wrote a utility that replaces accented characters with their HTML character entity equivalents (for example, "ñ" becomes "&ntilde;"). Once this is accomplished, the modified text displays as desired in my (and your) browser. Here is the crux of it (both the source and the .exe are included as downloads):
// Requires: using System; using System.Collections.Generic; using System.IO;
//           using System.Text; using System.Windows.Forms;
private void buttonReplaceCharsWithCodes_Click(object sender, EventArgs e)
{
    // Bail out if the user cancels the dialog; there is nothing to convert.
    if (openFileDialog1.ShowDialog() != DialogResult.OK)
    {
        return;
    }
    String fallName = openFileDialog1.FileName;
    List<string> linesModified = new List<string>();
    try
    {
        // Encoding.Default (the ANSI code page on .NET Framework) is used unless
        // the file starts with a byte order mark. "using" closes the reader even
        // if reading throws.
        using (StreamReader file = new StreamReader(fallName, Encoding.Default, true))
        {
            String line;
            while ((line = file.ReadLine()) != null)
            {
                linesModified.Add(line);
            }
        }
        progressBar1.Maximum = linesModified.Count;
        progressBar1.Value = 0;
        labelProgFeedback.Text = "Replacing accented chars with HTML codes";
        for (int i = 0; i < linesModified.Count; i++)
        {
            // Spanish
            linesModified[i] = linesModified[i].Replace("á", "&aacute;");
            linesModified[i] = linesModified[i].Replace("Á", "&Aacute;");
            linesModified[i] = linesModified[i].Replace("é", "&eacute;");
            linesModified[i] = linesModified[i].Replace("É", "&Eacute;");
            linesModified[i] = linesModified[i].Replace("í", "&iacute;");
            linesModified[i] = linesModified[i].Replace("Í", "&Iacute;");
            linesModified[i] = linesModified[i].Replace("ñ", "&ntilde;");
            linesModified[i] = linesModified[i].Replace("Ñ", "&Ntilde;");
            linesModified[i] = linesModified[i].Replace("ó", "&oacute;");
            linesModified[i] = linesModified[i].Replace("Ó", "&Oacute;");
            linesModified[i] = linesModified[i].Replace("ú", "&uacute;");
            linesModified[i] = linesModified[i].Replace("Ú", "&Uacute;");
            linesModified[i] = linesModified[i].Replace("ü", "&uuml;");
            linesModified[i] = linesModified[i].Replace("Ü", "&Uuml;");
            linesModified[i] = linesModified[i].Replace("¿", "&iquest;");
            linesModified[i] = linesModified[i].Replace("¡", "&iexcl;");
            // German
            linesModified[i] = linesModified[i].Replace("Ä", "&Auml;");
            linesModified[i] = linesModified[i].Replace("ä", "&auml;");
            linesModified[i] = linesModified[i].Replace("Ö", "&Ouml;");
            linesModified[i] = linesModified[i].Replace("ö", "&ouml;");
            linesModified[i] = linesModified[i].Replace("ß", "&szlig;");
            // Circumflexed vowels
            linesModified[i] = linesModified[i].Replace("â", "&acirc;");
            linesModified[i] = linesModified[i].Replace("ê", "&ecirc;");
            linesModified[i] = linesModified[i].Replace("ô", "&ocirc;");
            progressBar1.PerformStep();
        }
        progressBar1.Value = 0;

        // Show and save the results only after the whole file has been converted.
        textBoxMassagedResults.Text = string.Join(Environment.NewLine, linesModified);
        String massagedFileName = String.Format("{0}_Massaged.txt", fallName);
        File.WriteAllLines(massagedFileName, linesModified, Encoding.UTF8);
        buttonCopyTextToClipboard.Enabled = true;
        labelProgFeedback.Text = String.Format(
            "Finished! Massaged text below and saved as {0}", massagedFileName);
    }
    catch (Exception ex)
    {
        MessageBox.Show(String.Format("Exception {0}", ex.Message));
    }
}
You may note that the method above also handles German characters. Other special characters can be easily added to the code, as needed, if you want to support other languages/special characters. As for me, since English, Spanish, and German are the only (human) languages I know, they are the only ones whose special characters I need to support in this sort of endeavor.
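Since supporting another character means adding another Replace line, one alternative is to keep the pairs in a lookup table. What follows is only a sketch, not the code from the download: the EntityEncoder class, its NamedEntities table, and the Encode method are names I made up for illustration, and it assumes the same set of characters the method above covers.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class EntityEncoder
{
    // Hypothetical lookup table: each special character and its named HTML entity.
    static readonly Dictionary<char, string> NamedEntities = new Dictionary<char, string>
    {
        { 'á', "&aacute;" }, { 'Á', "&Aacute;" }, { 'é', "&eacute;" }, { 'É', "&Eacute;" },
        { 'í', "&iacute;" }, { 'Í', "&Iacute;" }, { 'ñ', "&ntilde;" }, { 'Ñ', "&Ntilde;" },
        { 'ó', "&oacute;" }, { 'Ó', "&Oacute;" }, { 'ú', "&uacute;" }, { 'Ú', "&Uacute;" },
        { 'ü', "&uuml;" },   { 'Ü', "&Uuml;" },   { '¿', "&iquest;" }, { '¡', "&iexcl;" },
        { 'Ä', "&Auml;" },   { 'ä', "&auml;" },   { 'Ö', "&Ouml;" },   { 'ö', "&ouml;" },
        { 'ß', "&szlig;" },  { 'â', "&acirc;" },  { 'ê', "&ecirc;" },  { 'ô', "&ocirc;" }
    };

    // Walks the text once, swapping each character found in the table for its
    // entity and copying every other character through unchanged.
    public static string Encode(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            string entity;
            if (NamedEntities.TryGetValue(c, out entity))
            {
                sb.Append(entity);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // prints: &iquest;D&oacute;nde est&aacute; el tesoro, se&ntilde;or?
        Console.WriteLine(Encode("¿Dónde está el tesoro, señor?"));
    }
}
```

With this arrangement, supporting another character is a matter of adding one entry to the table, and the text is scanned once instead of once per Replace call.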
Thar's Gold in Them Thar Caves!
And so, voilà! (I don't really know French; I just pretend I do on the Interwebs sometimes.) I was able to generate the document with the characters represented as they should be, and the English/Spanish version of "Treasure Island / La Isla del Tesoro" is now available in both paperback and Kindle formats.
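For the curious, here is a small sketch of why the HTML codes survive where the raw characters did not. It rests on an assumption, since I never diagnosed the exact cause: that the file's bytes were written in one encoding (a single-byte one such as ISO-8859-1) and read back as another (UTF-8). The EncodingMismatchDemo class and its ReadAsUtf8 helper are names made up for this illustration.

```csharp
using System;
using System.Text;

static class EncodingMismatchDemo
{
    // Encodes the text as ISO-8859-1 bytes, then decodes those bytes as UTF-8,
    // imitating a file saved in one encoding and read back in another.
    public static string ReadAsUtf8(string text)
    {
        byte[] latin1Bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(text);
        return Encoding.UTF8.GetString(latin1Bytes);
    }

    static void Main()
    {
        // In ISO-8859-1, "ñ" is the single byte 0xF1, which is not a valid UTF-8
        // sequence on its own, so the decoder substitutes U+FFFD (the mystery character).
        string misread = ReadAsUtf8("ñ");
        Console.WriteLine(misread.IndexOf('\uFFFD') >= 0); // True

        // The entity is plain ASCII, and ASCII bytes mean the same thing in both
        // encodings, so it comes through untouched.
        Console.WriteLine(ReadAsUtf8("&ntilde;") == "&ntilde;"); // True
    }
}
```

Under that assumption, the replacement codes work because every character in them is plain ASCII, so they read the same whichever of the two encodings is used.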
If You See Matched Pairs in the Code...
I have updated the tip several times (changing the character to replace to the single character, and the replacement to the HTML code), but it keeps changing back, so that both arguments to each Replace call show the same character. If that is what you see, just download the source code, and you will see what it really needs to be.