
Don't count spaces when counting words.

25 Oct 2011, CPOL

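I also use a regular expression to count words, and it returns the same number of words as MS Word. The pattern matches each run of characters that is not whitespace, !, ?, ¡, ¿, a hyphen or an en dash, so spaces and tabs are never counted. I wrap the regular expression in a string extension method to make it easy to use.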


C#
using System.Text.RegularExpressions;

public static class StringExtensions
{
  /// <summary>
  /// Pattern for one word: a run of characters that are not whitespace,
  /// !, ?, ¡, ¿, hyphen or en dash
  /// </summary>
  private const string WordCountRegex = @"[^\s!?¡¿\-\–]+";

  /// <summary>
  /// Static WordCounts Regular Expression Object
  /// </summary>
  private static readonly Regex regexWordCounts = new Regex(WordCountRegex, 
             RegexOptions.Compiled);
  
  /// <summary>
  /// Returns the number of words in a given <paramref name="sentence" />
  /// </summary>
  /// <param name="sentence">Text in which to count words</param>
  /// <returns>Number of words, or zero if the text is null or empty</returns>
  public static int WordCounts(this string sentence)
  {
    // Regex.Matches throws ArgumentNullException on null input,
    // so handle null and empty text up front instead of catching everything
    if (string.IsNullOrEmpty(sentence))
    {
      return 0;
    }

    return regexWordCounts.Matches(sentence).Count;
  }
}
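
To see which tokens the pattern treats as words, it can help to list the matches rather than only count them. The following is a minimal sketch that assumes the same pattern as above; the WordTokens class and its Tokens helper are hypothetical names introduced only for illustration and are not part of the extension class.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordTokens
{
    // Same character class as WordCountRegex: a word is any run of characters
    // that are not whitespace, !, ?, ¡, ¿, hyphen or en dash.
    private static readonly Regex WordRegex = new Regex(@"[^\s!?¡¿\-\–]+");

    // Hypothetical helper: returns the matched words instead of their count.
    public static string[] Tokens(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return new string[0];
        }

        return WordRegex.Matches(text)
                        .Cast<Match>()
                        .Select(m => m.Value)
                        .ToArray();
    }
}
```

For example, WordTokens.Tokens("Mr O'Brien-Smith arrived at 8.30") yields Mr, O'Brien, Smith, arrived, at and 8.30: the hyphen acts as a separator, while the apostrophe and the full stop stay inside their tokens.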

Taking the two sample sentences below, this gives the following:


C#
string input = 
  "The total number of words       \t        this sentence is 10.";
int wordCounts = input.WordCounts(); // Returns 9

// The hyphen is a separator, so "O'Brien-Smith" counts as two words
input = "Mr O'Brien-Smith arrived at 8.30 and spent \t $1,000.99";
wordCounts = input.WordCounts(); // Returns 9
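
A few edge cases follow directly from the character class and may be worth checking before relying on the count. The sketch below is an illustration under the assumption that the pattern is unchanged; the WordCountEdgeCases class and its Count method are hypothetical stand-ins for the extension method, repeated here so the example is self-contained.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCountEdgeCases
{
    // Copy of the pattern used by the extension method above.
    private static readonly Regex WordRegex = new Regex(@"[^\s!?¡¿\-\–]+");

    // Hypothetical stand-in for WordCounts: 0 for null or empty text.
    public static int Count(string text)
    {
        return string.IsNullOrEmpty(text) ? 0 : WordRegex.Matches(text).Count;
    }

    public static void Demo()
    {
        Console.WriteLine(Count("   \t  "));      // 0: whitespace only
        Console.WriteLine(Count("¿Qué tal?"));    // 2: ¿ and ? are separators
        Console.WriteLine(Count("well - known")); // 2: a lone hyphen is not a word
        Console.WriteLine(Count("a , b"));        // 3: a lone comma is counted
    }
}
```

The last case shows a limitation of this pattern: punctuation that is not in the excluded set, such as a comma standing on its own, is counted as a word.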

Hope this helps.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)