How to Insert Spaces Between Words Even When They Begin or End with Strange Characters Using C# and the DocX Library

7 Jan 2014
Inserting spaces between words using C# and the DocX Library

"Weird" Characters

In a previous tip, I showed how to wedge a space between words that were run together, such as "DennisRodman" or "theWorm", making them "Dennis Rodman" and "the Worm" respectively (so to speak).

BTW: It's not funny, anymore, Dennis; you're not Marilyn Monroe, and that homicidal maniac is not JFK. 

That tip, though, only dealt with the "normal" English alphabet (a..z and A..Z). Since I'm currently working with foreign language documents (Spanish and German, with French and perhaps Italian and Dutch coming later), I realized that I need to consider other possible characters, too, both as the ending lowercase letter or other ending character (such as é, í, ñ, ?, !, ", », and ß) and as the beginning uppercase letter or other character, such as ¿, ¡, ", and «.

So, if you had a sentence such as this:

quéSera, Sera. Was zumTeuful ist hier los!¿se habla aleman?¡No!He said«Hola, muchacha»Das ist gewißMerkwürdig!

...running it through this helper method would "aerate" it like so:

qué Sera, Sera. Was zum Teuful ist hier los! ¿se habla aleman? ¡No! He said «Hola, muchacha» Das ist gewiß Merkwürdig!

Rather than clutter up and complicate the previous code, I wrote another helper function to handle those situations.

Preliminary Setting Up of Figurative Chairs

Follow these steps to prepare for the code to follow:

  1. Download the DocX DLL library from here
  2. In your Visual Studio project, right-click References, select "Add Reference..." and add docx.dll to the project from wherever you saved it.
  3. Add this to your using section:

C#
using Novacode;

Add this code to the top of your class, too:

C#
// 65..90 are A..Z; 97..122 are a..z
const int FIRST_CAP_POS = 65;
const int LAST_CAP_POS = 90;
const int FIRST_LOWER_POS = 97;
const int LAST_LOWER_POS = 122;

List<string> specialWordEndings;
List<string> specialWordBeginnings;

string soughtCombo = string.Empty;
string desiredCombo = string.Empty;

As usually happens, this ends up being a little more complicated than I first reckoned, because I have to deal with four different situations:

  1. An "odd" character at the end of a word followed by a "normal" capital (A..Z)
  2. A "normal" lowercase letter (a..z) at the end of a word followed by an "odd" character
  3. An "odd" ending character followed by an "odd" beginning character
  4. A "normal" lowercase letter (a..z) followed by a "normal" capital (A..Z)
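The four situations above amount to one rule: insert a space wherever a character that can end a word is directly followed by a character that can begin one. Here is a minimal sketch of that rule on a plain string, with no DocX involved, so you can try it without a .docx file. The names AerateSketch, IsWordEnd, IsWordBeginning, and Aerate are hypothetical, and the single left-to-right pass is a swapped-in technique, not the repeated ReplaceText calls used in this tip. It should give the same result for text like the sample sentence, but it may differ in edge cases such as several quotation marks in a row.

```csharp
using System;
using System.Text;

public static class AerateSketch
{
    // Same characters as the tip's specialWordEndings and specialWordBeginnings lists.
    const string SpecialEndings = "éíñ?!,.:;\"»ß";
    const string SpecialBeginnings = "¿¡\"É«";

    // A word can end with a plain lowercase letter or a special ending character.
    static bool IsWordEnd(char c)
    {
        return (c >= 'a' && c <= 'z') || SpecialEndings.IndexOf(c) >= 0;
    }

    // A word can begin with a plain capital or a special beginning character.
    static bool IsWordBeginning(char c)
    {
        return (c >= 'A' && c <= 'Z') || SpecialBeginnings.IndexOf(c) >= 0;
    }

    // Copies the text, adding a space after each "ending" character
    // that is immediately followed by a "beginning" character.
    public static string Aerate(string text)
    {
        var sb = new StringBuilder(text.Length + 16);
        for (int i = 0; i < text.Length; i++)
        {
            sb.Append(text[i]);
            if (i + 1 < text.Length && IsWordEnd(text[i]) && IsWordBeginning(text[i + 1]))
            {
                sb.Append(' ');
            }
        }
        return sb.ToString();
    }
}
```

Calling AerateSketch.Aerate on the run-together sample sentence shown earlier is a quick way to check whether a character you add to either list behaves the way you expect before you touch a real document.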

And now, without further ado, adieux, or adios, straight from Carmel Valley, California, comes the illustrious and much-ballyhooed and anticipated code, entering from stage left, welcome:

The Nitty Gritty Prettifier/Aerator

C#
        // Fills the two lists of non-A..Z characters that can end or begin a word.
        private void Popul8UnusualCharLists()
        {   
            specialWordEndings = new List<string>() { "é", "í", "ñ", "?", "!", ",", ".", ":", ";", "\"", "»", "ß" };

            specialWordBeginnings = new List<string>() { "¿", "¡", "\"", "É", "«" };
        }

        // Case 3: a special ending character followed by a special beginning character.
        private void AerateUnusualCombo(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                foreach (string endChar in specialWordEndings)
                {
                    foreach (string beginChar in specialWordBeginnings)
                    {
                        soughtCombo = string.Format("{0}{1}", endChar, beginChar);
                        desiredCombo = string.Format("{0} {1}", endChar, beginChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }

        // Case 1: a special ending character followed by a plain capital (A..Z).
        private void AerateUnusualEndNormalBegin(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                foreach (string endChar in specialWordEndings)
                {
                    for (int i = FIRST_CAP_POS; i <= LAST_CAP_POS; i++)
                    {
                        char upperChar = (char)i;
                        soughtCombo = string.Format("{0}{1}", endChar, upperChar);
                        desiredCombo = string.Format("{0} {1}", endChar, upperChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }

        // Case 2: a plain lowercase letter (a..z) followed by a special beginning character.
        private void AerateNormalEndUnusualBegin(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                for (int i = FIRST_LOWER_POS; i <= LAST_LOWER_POS; i++)
                {
                    char lowerChar = (char)i;
                    foreach (string beginChar in specialWordBeginnings)
                    {
                        soughtCombo = string.Format("{0}{1}", lowerChar, beginChar);
                        desiredCombo = string.Format("{0} {1}", lowerChar, beginChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }

        // Case 4: a plain lowercase letter (a..z) followed by a plain capital (A..Z).
        private void AerateNormalEndNormalBegin(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                for (int i = FIRST_LOWER_POS; i <= LAST_LOWER_POS; i++)
                {
                    char lowerChar = (char)i;
                    for (int j = FIRST_CAP_POS; j <= LAST_CAP_POS; j++)
                    {
                        char upperChar = (char)j;
                        soughtCombo = string.Format("{0}{1}", lowerChar, upperChar);
                        desiredCombo = string.Format("{0} {1}", lowerChar, upperChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }

Call it like so:

C#
Cursor.Current = Cursors.WaitCursor;
try
{
    Popul8UnusualCharLists();
    string filename = string.Empty;
    DialogResult result = openFileDialog1.ShowDialog();
    if (result == DialogResult.OK)
    {
        filename = openFileDialog1.FileName;
    }
    else
    {
        MessageBox.Show("No file selected - exiting");
        return;
    }
    AerateUnusualCombo(filename);
    AerateUnusualEndNormalBegin(filename);
    AerateNormalEndUnusualBegin(filename);
    AerateNormalEndNormalBegin(filename);
}
finally
{
    Cursor.Current = Cursors.Default;
}
MessageBox.Show("Scrunched together words have been normalized!");
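The calls above load and save the document four times, once per method. If that turns out to be slow on large files, one possible variation is to build every sought/desired pair up front and apply them all inside a single Load/Save. The following is only a sketch under that assumption: ComboSketch, BuildCombos, and ApplyAll are hypothetical names, and the replace delegate stands in for whatever performs one substitution (document.ReplaceText in this tip, string.Replace when experimenting on plain text).

```csharp
using System;
using System.Collections.Generic;

public static class ComboSketch
{
    // Yields every (sought, desired) pair the four Aerate* methods would try:
    // each possible ending (special or a..z) joined to each possible beginning
    // (special or A..Z), first without and then with a space between them.
    public static IEnumerable<KeyValuePair<string, string>> BuildCombos(
        IEnumerable<string> specialEndings, IEnumerable<string> specialBeginnings)
    {
        var endings = new List<string>(specialEndings);
        for (char c = 'a'; c <= 'z'; c++) endings.Add(c.ToString());

        var beginnings = new List<string>(specialBeginnings);
        for (char c = 'A'; c <= 'Z'; c++) beginnings.Add(c.ToString());

        foreach (string end in endings)
        {
            foreach (string begin in beginnings)
            {
                yield return new KeyValuePair<string, string>(end + begin, end + " " + begin);
            }
        }
    }

    // Hands each pair to the supplied replace action, in order.
    public static void ApplyAll(
        IEnumerable<KeyValuePair<string, string>> combos, Action<string, string> replace)
    {
        foreach (KeyValuePair<string, string> combo in combos)
        {
            replace(combo.Key, combo.Value);
        }
    }
}

// Possible use with DocX (untested sketch; needs the Novacode reference from earlier):
//
// using (DocX document = DocX.Load(filename))
// {
//     ComboSketch.ApplyAll(
//         ComboSketch.BuildCombos(specialWordEndings, specialWordBeginnings),
//         (sought, desired) => document.ReplaceText(sought, desired));
//     document.Save();
// }
```

The pairs come out in a different order than the four separate methods produce them, which should not matter for ordinary text because inserting a space never creates a new run-together pair.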

A Parting Plaintive Plea

If you find this tip useful, pay it forward and do something nice for somebody today, even if it surprises them.

Note: I have added two source code files: the smaller one is just for this tip; the larger one contains all the DocX code for various articles I wrote on CodeProject December 2013 and January 2014.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)