
String Extension Collection for C#

27 Nov 2008, CPOL, 3 min read
Several useful extensions for System.String.

Introduction

Extension methods (a new feature of C# 3.0) are useful because they let you "add" methods to a class without modifying its source code. When you write code, and in IntelliSense, such methods look like member methods, although they are really static methods defined outside the class. This is very useful for built-in .NET classes and third-party libraries. Hundreds of articles have been written about this; the aim of this article is not to introduce extension methods, but to present a collection of the most useful extension methods for the System.String class.
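
For readers who want a quick reminder of the syntax, here is a minimal sketch. The class StringExtensionsSketch and the method IsBlank are hypothetical names invented for this illustration; they are not part of the library presented below.

```csharp
public static class StringExtensionsSketch
{
    // The "this" modifier on the first parameter turns a static method
    // into an extension method for System.String.
    // IsBlank is a hypothetical example, not part of the article's library.
    public static bool IsBlank(this string s)
    {
        return s == null || s.Trim().Length == 0;
    }
}
```

With this class in scope, "   ".IsBlank() can be written as if IsBlank were a member of string, while the compiler emits an ordinary static call to StringExtensionsSketch.IsBlank("   ").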

This article comes with a small library (a code file and unit tests for that code). Some of the extension methods were collected from various websites, and some were written by me. The unit tests are presented for demonstration purposes.

Background

For those who don't know about extension methods, I suggest reading this nice article on Wikipedia.

Using the Code

Let me introduce the source code without much delay. The first method was written by David Hayden and checks whether an email address is in a valid format.

C#
/// <summary>
/// true, if the string is a valid email address
/// from http://www.davidhayden.com/blog/dave/
/// archive/2006/11/30/ExtensionMethodsCSharp.aspx
/// </summary>
/// <param name="s">email address to test</param>
/// <returns>true, if the string is a valid email address</returns>
public static bool IsValidEmailAddress(this string s)
{
    return new Regex(@"^[\w-\.]+@([\w-]+\.)+[\w-]{2,6}$").IsMatch(s);
}

The counterpart test method is the following:

C#
[TestMethod()]
public void IsValidEmailAddressTest()
{
    Assert.IsTrue("yellowdog@someemail.uk".IsValidEmailAddress());
    Assert.IsTrue("yellow.444@email4u.co.uk".IsValidEmailAddress());
    Assert.IsFalse("adfasdf".IsValidEmailAddress());
    Assert.IsFalse("asd@asdf".IsValidEmailAddress());
}

I have found a lot of URL validation functions, but not all of them seemed to work correctly. This method is inspired by a regular expression written by bb, which seems to work fine.

C#
/// <summary>
/// Checks if url is valid. 
/// from http://www.osix.net/modules/article/?id=586
/// and changed to match http://localhost
/// 
/// complete (not only http) url regex can be found 
/// at http://internet.ls-la.net/folklore/url-regexpr.html
/// </summary>
/// <param name="url">URL to check</param>
/// <returns>true, if the URL matches the expression</returns>
public static bool IsValidUrl(this string url)
{
    string strRegex = "^(https?://)"
+ "?(([0-9a-z_!~*'().&=+$%-]+: )?[0-9a-z_!~*'().&=+$%-]+@)?" //user@
+ @"(([0-9]{1,3}\.){3}[0-9]{1,3}" // IP- 199.194.52.184
+ "|" // allows either IP or domain
+ @"([0-9a-z_!~*'()-]+\.)*" // tertiary domain(s)- www.
+ @"([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]" // second level domain
+ @"(\.[a-z]{2,6})?)" // first level domain- .com or .museum is optional
+ "(:[0-9]{1,5})?" // port number- :80
+ "((/?)|" // a slash isn't required if there is no file name
+ "(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$";
    return new Regex(strRegex).IsMatch(url);
}

The counterpart test method is the following:

C#
/// <summary>
///A test for IsValidUrl
///</summary>
[TestMethod()]
public void IsValidUrlTest()
{
    Assert.IsTrue("http://www.codeproject.com".IsValidUrl());
    Assert.IsTrue("https://www.codeproject.com/#some_anchor".IsValidUrl());
    Assert.IsTrue("https://localhost".IsValidUrl());
    Assert.IsTrue("http://www.abcde.nf.net/signs-banners.jpg".IsValidUrl());
    // parentheses are required: without them IsValidUrl() binds only to the second literal
    Assert.IsTrue(("http://aa-bbbb.cc.bla.com:80800/test/" + 
                   "test/test.aspx?dd=dd&id=dki").IsValidUrl());
    Assert.IsFalse("http:wwwcodeprojectcom".IsValidUrl());
    Assert.IsFalse("http://www.code project.com".IsValidUrl());
}

I have written a third method that checks whether a URL supplied by the user (a homepage, for example) actually exists and responds:

C#
/// <summary>
/// Check if url (http) is available.
/// </summary>
/// <param name="httpUrl">url to check</param>
/// <example>
/// string url = "www.codeproject.com";
/// if( !url.UrlAvailable())
///     ...codeproject is not available
/// </example>
/// <returns>true if available</returns>
public static bool UrlAvailable(this string httpUrl)
{
    // prepend a scheme only when the caller did not supply one
    if (!httpUrl.StartsWith("http://") && !httpUrl.StartsWith("https://"))
        httpUrl = "http://" + httpUrl;
    try
    {
        HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(httpUrl);
        myRequest.Method = "GET";
        // dispose the response so the underlying connection is released
        using (HttpWebResponse myHttpWebResponse = 
           (HttpWebResponse)myRequest.GetResponse())
        {
            return true;
        }
    }
    catch
    {
        // any failure (unknown host, timeout, HTTP error status) counts as unavailable
        return false;
    }
}

The counterpart test method is the following:

C#
[TestMethod()]
public void UrlAvailableTest()
{
    Assert.IsTrue("www.codeproject.com".UrlAvailable());
    Assert.IsFalse("www.asjdfalskdfjalskdf.com".UrlAvailable());
}

A string-reversing example can be found on Wikipedia. This version, which avoids an explicit loop, looks cleaner.

C#
/// <summary>
/// Reverse the string
/// from http://en.wikipedia.org/wiki/Extension_method
/// </summary>
/// <param name="input"></param>
/// <returns></returns>
public static string Reverse(this string input)
{
    char[] chars = input.ToCharArray();
    Array.Reverse(chars);
    return new String(chars);
}

The counterpart test method is as follows:

C#
[TestMethod()]
public void ReverseTest()
{
    string input = "yellow dog";
    string expected = "god wolley";
    string actual = input.Reverse();
    Assert.AreEqual(expected, actual);
}

Sometimes, you need to provide a preview of a long text. This can be done using this Reduce extension method:

C#
/// <summary>
/// Reduce string to shorter preview which is optionally ended by some string (...).
/// </summary>
/// <param name="s">string to reduce</param>
/// <param name="count">Length of returned string including endings.</param>
/// <param name="endings">optional endings of reduced text (may be null)</param>
/// <example>
/// string description = "This is very long description of something";
/// string preview = description.Reduce(20,"...");
/// produce -> "This is very long..."
/// </example>
/// <returns>the original string if it fits into count characters, otherwise the preview</returns>
public static string Reduce(this string s, int count, string endings)
{
    // endings is optional, so guard against null before reading its length
    int endingsLength = (endings == null) ? 0 : endings.Length;
    if (count < endingsLength)
        throw new ArgumentException("Failed to reduce to less than endings length.", "count");
    if (count >= s.Length)
        return s; // it already fits, nothing to reduce
    // leave room for the endings so the result is exactly count characters long
    return s.Substring(0, count - endingsLength) + endings;
}

The counterpart test method is the following:

C#
[TestMethod()]
public void ReduceTest()
{
    string input = "The quick brown fox jumps over the lazy dog";
    int count = 10; 
    string endings = "...";
    string expected = "The qui...";
    string actual = input.Reduce(count, endings);
    Assert.AreEqual(expected, actual);
}

Sometimes you need to parse a phone number or a price, and the user may have typed spaces inside the number. To avoid pestering the user about formatting, and to avoid duplicating the same checks, you can use the RemoveSpaces extension method when parsing numbers.

C#
/// <summary>
/// Remove spaces, but not line ends.
/// Useful when parsing user input such as a phone number or
/// a price: int.Parse("1 000 000".RemoveSpaces(),.....
/// </summary>
/// <param name="s">string to remove spaces from</param>
/// <returns>string without spaces</returns>
public static string RemoveSpaces(this string s)
{
    return s.Replace(" ", "");
}

The counterpart test method is the following:

C#
[TestMethod()]
public void RemoveSpacesTest()
{
    string input = "yellow dog" + Environment.NewLine  + "black cat";
    string expected = "yellowdog" + Environment.NewLine + "blackcat";
    string actual = input.RemoveSpaces();
    Assert.AreEqual(expected, actual);
}
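
As a sketch of the parsing scenario described above, the following shows RemoveSpaces feeding int.TryParse. The class ParsingSketch and the helper TryParseAmount are hypothetical names used only for this illustration, and RemoveSpaces is repeated so the block compiles on its own. Using the invariant culture here is also an assumption; a real application might prefer the user's culture.

```csharp
using System.Globalization;

public static class ParsingSketch
{
    // Same body as the article's RemoveSpaces, repeated to keep the sketch self-contained.
    public static string RemoveSpaces(this string s)
    {
        return s.Replace(" ", "");
    }

    // Hypothetical helper: accepts input such as "1 000 000" and parses it as an int.
    public static bool TryParseAmount(string userInput, out int amount)
    {
        return int.TryParse(userInput.RemoveSpaces(), NumberStyles.Integer,
            CultureInfo.InvariantCulture, out amount);
    }
}
```

The caller gets a single true/false answer and does not need to tell the user to retype the number without spaces.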

If you need to make sure that the user input is a number, and you want to be tolerant of the number format, use the IsNumber extension.

C#
/// <summary>
/// true, if the string can be parsed as Double or Int32, respectively.
/// Spaces are not considered.
/// </summary>
/// <param name="s">input string</param>
/// <param name="floatpoint">true, if Double is considered,
/// otherwise Int32 is considered.</param>
/// <returns>true, if the string can be parsed as a number</returns>
public static bool IsNumber(this string s, bool floatpoint)
{
    int i;
    double d;
    string withoutWhiteSpace = s.RemoveSpaces();
    if (floatpoint)
        return double.TryParse(withoutWhiteSpace, NumberStyles.Any,
            Thread.CurrentThread.CurrentUICulture , out d);
    else
        return int.TryParse(withoutWhiteSpace, out i);
}

The counterpart test method is the following:

C#
[TestMethod()]
public void IsNumberTest()
{
    Thread.CurrentThread.CurrentUICulture = CultureInfo.InvariantCulture;

    Assert.IsTrue("12345".IsNumber(false));
    Assert.IsTrue("   12345".IsNumber(false));
    Assert.IsTrue("12.345".IsNumber(true));
    Assert.IsTrue("   12,345 ".IsNumber(true));
    Assert.IsTrue("12 345".IsNumber(false));
    Assert.IsFalse("tractor".IsNumber(true));
}

A more restrictive version of the IsNumber method is IsNumberOnly, which ensures that all characters are digits, optionally allowing a decimal point or comma. This could also be done using LINQ via s.ToCharArray().Where(...).Count() == 0.

C#
/// <summary>
/// true, if the string contains only digits or a decimal separator.
/// Leading and trailing white space is ignored.
/// </summary>
/// <param name="s">input string</param>
/// <param name="floatpoint">true, if '.' and ',' are also allowed</param>
/// <returns>true, if the string contains only digits or a decimal separator</returns>
public static bool IsNumberOnly(this string s, bool floatpoint)
{
    s = s.Trim();
    if (s.Length == 0)
        return false;
    foreach (char c in s)
    {
        if (!char.IsDigit(c))
        {
            if (floatpoint && (c == '.' || c == ','))
                continue;
            return false;
        }
    }
    return true;
}

The counterpart test method is the following:

C#
[TestMethod()]
public void IsNumberOnlyTest()
{
    Assert.IsTrue("12345".IsNumberOnly(false));
    Assert.IsTrue("   12345".IsNumberOnly(false));
    Assert.IsTrue("12.345".IsNumberOnly(true));
    Assert.IsTrue("   12,345 ".IsNumberOnly(true));
    Assert.IsFalse("12 345".IsNumberOnly(false));
    Assert.IsFalse("tractor".IsNumberOnly(true));
}
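
The LINQ alternative mentioned for IsNumberOnly could be sketched as follows. The names LinqSketch and IsNumberOnlyLinq are hypothetical, and the sketch uses All(...) rather than Where(...).Count() == 0; both express the same condition, namely that no character fails the check.

```csharp
using System.Linq;

public static class LinqSketch
{
    // Hypothetical LINQ counterpart of IsNumberOnly.
    public static bool IsNumberOnlyLinq(this string s, bool floatpoint)
    {
        s = s.Trim();
        if (s.Length == 0)
            return false;
        // Every character must be a digit or, when floatpoint is set, '.' or ','.
        return s.All(c => char.IsDigit(c) || (floatpoint && (c == '.' || c == ',')));
    }
}
```

All stops at the first character that fails the check, which matches the early return in the foreach version.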

Michael Kaplan describes a very useful method for removing diacritics (accents) from strings. It is useful when implementing URL rewriting, and you need to generate valid and readable URLs.

C#
/// <summary>
/// Remove accent from strings 
/// </summary>
/// <example>
///  input:  "Příliš žluťoučký kůň úpěl ďábelské ódy."
///  result: "Prilis zlutoucky kun upel dabelske ody."
/// </example>
/// <param name="s"></param>
/// <remarks>found at http://stackoverflow.com/questions/249087/
/// how-do-i-remove-diacritics-accents-from-a-string-in-net</remarks>
/// <returns>string without accents</returns>
public static string RemoveDiacritics(this string s)
{
    string stFormD = s.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();

    for (int ich = 0; ich < stFormD.Length; ich++)
    {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
        if (uc != UnicodeCategory.NonSpacingMark)
        {
            sb.Append(stFormD[ich]);
        }
    }
    return (sb.ToString().Normalize(NormalizationForm.FormC));
}

The counterpart test method is the following:

C#
/// <summary>
///A test for RemoveDiacritics
///</summary>
[TestMethod()]
public void RemoveDiacriticsTest()
{
    // contains all Czech accents
    string input = "Příliš žluťoučký kůň úpěl ďábelské ódy.";
    string expected = "Prilis zlutoucky kun upel dabelske ody.";
    string actual = input.RemoveDiacritics();
    Assert.AreEqual(expected, actual);
}

When I was programming in PHP, nl2br was a very useful function. This C# version was posted by DigiMortal.

C#
/// <summary>
/// Replace \r\n or \n by <br />
/// from http://weblogs.asp.net/gunnarpeipman/archive/2007/11/18/c-extension-methods.aspx
/// </summary>
/// <param name="s"></param>
/// <returns></returns>
public static string Nl2Br(this string s)
{
    return s.Replace("\r\n", "<br />").Replace("\n", "<br />");
}

The counterpart test method is the following:

C#
[TestMethod()]
public void Nl2BrTest()
{
    string input = "yellow dog" + Environment.NewLine + "black cat";
    string expected = "yellow dog<br />black cat";
    string actual = input.Nl2Br();
    Assert.AreEqual(expected, actual);
}

An MD5 hash function comes in handy in many applications.

C#
static MD5CryptoServiceProvider s_md5 = null;

/// <summary>
/// Compute the MD5 hash of the string as lowercase hex digits.
/// from http://weblogs.asp.net/gunnarpeipman/archive/2007/11/18/c-extension-methods.aspx
/// </summary>
/// <param name="s"></param>
/// <returns></returns>
public static string MD5(this string s)
{
    if( s_md5 == null) //creating only when needed
        s_md5 = new MD5CryptoServiceProvider();
    Byte[] newdata = Encoding.Default.GetBytes(s);
    Byte[] encrypted = s_md5.ComputeHash(newdata);
    return BitConverter.ToString(encrypted).Replace("-", "").ToLower();
}

The counterpart test method is the following:

C#
[TestMethod()]
public void MD5Test()
{
    string input = "The quick brown fox jumps over the lazy dog";
    string expected = "9e107d9d372bb6826bd81d3542a419d6";
    string actual = input.MD5();
    Assert.AreEqual(expected, actual);
}

Points of Interest

While writing this article, I found an extensive library here. Unfortunately, some of its links don't work.

History

  • 18 Nov 2008
    • Definition of extension methods was changed not to propagate a misstatement about them being members.
    • Removed CreateDirIfNotExistsTest, it was really useless.
    • URL Regex changed to match http://localhost
  • 25 Nov 2008
    • Changed Invariant Culture to CurrentUI Culture (thanks to x2develop.com).
    • Changed email regex to match .museum domain.
    • Changed MD5 method to be more efficient (thanks to Juan).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Other my own project
Czech Republic
My name is Tomas Kubes, feel free to contact me at tomas_kubes(at)seznam.cz

Comments and Discussions

 
Re: E-mail validation is majorly flawed
jtm_moon, 24-Nov-08 16:16
I have to chime in with my picky criticism of an example (this post strays from the original topic a bit).
I know there are many more permutations of an email account than the simple Regular Expression shown above allows.

I know you can add '+' between the email username and the '@' symbol with gmail.
I'm not sure about other email servers. I'm sure a GNU or open-source email server would allow other name-munging options, too (they tend to follow standards rigorously).

For example:
bo.b@gmail.com
is the same email account as
bo.b+codeproject@gmail.com
which is also the same as
bob+codeproject@gmail.com
which is also the same as
b..o..b+codeproject@gmail.com
These are the same valid email address (with gmail, at least).
The point being, there is a much wider variety of syntax possible for email user names than most email parsers realize. I know because I have a few email addresses with unusual mailbox names, and they are often rejected as invalid by web forms.

Basically, a web form shouldn't care too much about the mailbox name preceding the '@' symbol. It is the responsibility of the email server host to parse the user name from this and deliver it to the correct mailbox.

Here's the most relevant Standards snippet I could find in 10 minutes of searching.
From RFC 2821
http://tools.ietf.org/html/rfc2821#section-2.3.10
2.3.10 Mailbox and Address
As used in this specification, an "address" is a character string
that identifies a user to whom mail will be sent or a location into
which mail will be deposited. ...
The standard mailbox naming convention is defined to be "local-
part@domain": contemporary usage permits a much broader set of
applications than simple "user names". Consequently, and due to a
long history of problems when intermediate hosts have attempted to
optimize transport by modifying them, the local-part MUST be
interpreted and assigned semantics only by the host specified in the
domain part of the address.


I'm sure there's another RFC with even more specifics than that.
Okay, that's the end of my rant.
Thanks for taking the time to write this article on C# string extensions!

-J_Tom_Moon_79
