
Parsing Latitude and Longitude Information

21 Feb 2012
Parses user input and extracts latitude and longitude information, taking into account the user's language and regional settings

Introduction

In my map control article, I tried to parse user input to see if it was a latitude and longitude and display the result on the map. At the time, I didn't want to write a full parser for the user input so I (very lazily) just split the user input using a comma and tried to parse two decimal values.

This has a number of problems. First, coordinates couldn't be in degrees, minutes and seconds format, which is a popular way of writing coordinates. Second (as was also pointed out in the comments), some countries use a comma as the decimal separator (Spain, for example).

Background

Jaime Olivares wrote an excellent article here that parses and serializes latitude and longitude coordinates according to the ISO 6709 standard (a nice guide on the standard is available on this page).

The article is good at explaining what the standard is and provides nice, concise code to get the job done, but it's a bit unreasonable to expect users of an application to type a coordinate in this format! For that reason, I'm going to use some simple regular expressions to try to parse input as flexibly as possible, taking into account different users' language settings.

Screenshot

ISO 6709 Parsing

The nice thing about the ISO 6709 format (from a developer's point of view) is that we know exactly what to expect in the string. For example, the '/' character is used to separate multiple coordinates. Also, the data will not vary depending on the cultural settings of the user; the decimal separator will always be '.'. However, there's still a little guesswork, as we don't know whether it represents decimal degrees (from now on referred to as D), degrees and decimal minutes (DM) or degrees, minutes and decimal seconds (DMS). We also don't know whether there will be an altitude component. Let's list what we do know, though:

  • The only valid digits are 0 - 9.
  • The only valid decimal separator is '.'
  • The decimal part of a number must start with the decimal separator, followed by at least one digit.
  • The latitude component will be first and starts with a '+' or '-'.
  • Latitude will be three characters minimum, plus an optional decimal part [±DD(.D)].
  • Latitude will be seven characters maximum, plus an optional decimal part [±DDMMSS(.S)].
  • The longitude component will be next, which starts with a '+' or '-'.
  • Longitude will be four characters minimum, plus an optional decimal part [±DDD(.D)].
  • Longitude will be eight characters maximum, plus an optional decimal part [±DDDMMSS(.S)].
  • Altitude, if specified, will be next and will start with '+' or '-'.
  • Altitude will be two characters minimum, plus an optional decimal part [±A(.A)].
  • The string will be terminated by a '/' character.

Now that we know what a valid format is, we can easily translate it into a regular expression (and use the <a href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx">Regex</a> class). This is the regular expression we'll use (if you want to try it, remember to use the RegexOptions.IgnorePatternWhitespace flag).

^\s*                                        # Match the start of the string, ignoring any whitespace
(?<latitude> [+-][0-9]{2,6}(?: \. [0-9]+)?) # The decimal part is optional.
(?<longitude>[+-][0-9]{3,7}(?: \. [0-9]+)?)
(?<altitude> [+-][0-9]+(?: \. [0-9]+)?)?    # The altitude component is optional
/                                           # The string must be terminated by '/'

This regular expression tells us whether the input string might be in the ISO 6709 format and, if it matched, lets us get the various components from the string using the named groups. I say "might" because the expression shown also allows '+123+1234/' as a valid value (i.e. ±DDM±DDDM/) and doesn't perform any range checking on the values (e.g. minutes and seconds must be less than 60). Therefore, we need to pass the output of a successful match on to another function that converts the string to a number we can use in calculations.

For the altitude part, this is extremely easy; check the altitude <a href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.group.success.aspx">Group.Success</a> property and, if the altitude was found, convert the string value using double.Parse (making sure to pass in <a href="http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.invariantculture.aspx">CultureInfo.InvariantCulture</a> to avoid any localization issues). Note there is no need to use double.TryParse as we've already checked the input is valid using the regular expression.
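The two steps so far can be sketched as follows: match the ISO 6709 pattern, then read the optional altitude through Group.Success and double.Parse with CultureInfo.InvariantCulture. This is a minimal sketch written as C# top-level statements (C# 9 or later), not the article's source. ParseIsoAltitude and the sample coordinate are hypothetical, and the altitude is returned in whatever unit the string uses.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// The ISO 6709 pattern from the article. The '#' comments only work because
// RegexOptions.IgnorePatternWhitespace is passed to the constructor.
var isoRegex = new Regex(@"
    ^\s*                                        # Ignore any leading whitespace
    (?<latitude> [+-][0-9]{2,6}(?: \. [0-9]+)?) # The decimal part is optional
    (?<longitude>[+-][0-9]{3,7}(?: \. [0-9]+)?)
    (?<altitude> [+-][0-9]+(?: \. [0-9]+)?)?    # The altitude is optional
    /                                           # Terminated by '/'",
    RegexOptions.IgnorePatternWhitespace);

// Hypothetical helper: returns the altitude, or null when the input has no
// altitude component or does not have the ISO 6709 shape at all.
double? ParseIsoAltitude(string input)
{
    Match match = isoRegex.Match(input);
    Group altitude = match.Groups["altitude"];
    if (!match.Success || !altitude.Success)
    {
        return null;
    }

    // The regex guarantees only a sign, digits and '.', so Parse cannot fail;
    // InvariantCulture keeps '.' as the decimal separator on every machine.
    return double.Parse(altitude.Value, CultureInfo.InvariantCulture);
}

Match sample = isoRegex.Match("+40.20361-075.00417+350.517/");
Console.WriteLine(sample.Groups["latitude"].Value);  // +40.20361
Console.WriteLine(sample.Groups["longitude"].Value); // -075.00417
Console.WriteLine(ParseIsoAltitude("+40.20361-075.00417+350.517/"));
```

The latitude and longitude strings still need the conversion step described next before they can be used as numbers.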

For longitude and latitude, it's a little trickier. The basic idea is to split the string into two parts: the integral part and the optional fractional part. Depending on the length of the integral part, we know whether the string is in D, DM or DMS format, so we can split the string and parse each component separately, making sure to add the fractional part (if any) to the last component.
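The integral/fractional split, together with the range checks that the regular expression leaves out, might look like the sketch below (C# top-level statements, C# 9 or later). IsoComponentToDegrees and its degreeDigits parameter are hypothetical names. The sketch returns signed decimal degrees. Besides the minute and second checks, it also rejects latitudes above 90° and longitudes above 180°, which is an addition of this sketch.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: converts one ISO 6709 component that already matched
// the regex (e.g. "+4012.5" or "-0750025") into signed decimal degrees.
// degreeDigits is 2 for latitude and 3 for longitude. Returns null when the
// digit count is not D, DM or DMS, or when a value is out of range.
double? IsoComponentToDegrees(string value, int degreeDigits)
{
    int sign = value[0] == '-' ? -1 : 1;
    string body = value.Substring(1);

    // Split into the integral part and the optional fractional part.
    int dot = body.IndexOf('.');
    string integral = dot < 0 ? body : body.Substring(0, dot);
    double fraction = dot < 0 ? 0 : double.Parse("0" + body.Substring(dot), CultureInfo.InvariantCulture);

    // The length of the integral part identifies the format.
    int extra = integral.Length - degreeDigits; // 0 => D, 2 => DM, 4 => DMS
    if (extra != 0 && extra != 2 && extra != 4)
    {
        return null; // e.g. "+123": three latitude digits is neither D nor DM
    }

    double degrees = int.Parse(integral.Substring(0, degreeDigits), CultureInfo.InvariantCulture);
    double minutes = extra >= 2 ? int.Parse(integral.Substring(degreeDigits, 2), CultureInfo.InvariantCulture) : 0;
    double seconds = extra == 4 ? int.Parse(integral.Substring(degreeDigits + 2, 2), CultureInfo.InvariantCulture) : 0;

    // The fractional part (if any) belongs to the last component.
    if (extra == 0) degrees += fraction;
    else if (extra == 2) minutes += fraction;
    else seconds += fraction;

    // Range checks that the regular expression cannot express.
    double maxDegrees = degreeDigits == 2 ? 90 : 180;
    double total = degrees + minutes / 60 + seconds / 3600;
    if (minutes >= 60 || seconds >= 60 || total > maxDegrees)
    {
        return null;
    }

    return sign * total;
}

// IsoComponentToDegrees("+4012.5", 2)  -> about 40.2083 (DM)
// IsoComponentToDegrees("-0750025", 3) -> about -75.0069 (DMS)
// IsoComponentToDegrees("+123", 2)     -> null (neither D nor DM)
```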

User Input

As mentioned in the introduction, the motivation for this article is to extract a coordinate from user-supplied strings whilst being friendly to different cultural settings. The approach I've taken is to split the string into groups and then use the double.TryParse method (passing in the current cultural settings) to do the actual number processing, as I figured the .NET Framework can do a better job at localization than I can!
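Leaving the number processing to double.TryParse only works if the user's number format is passed in. The sketch below shows why (C# top-level statements, C# 9 or later). TryParseGroup is a hypothetical helper. The comma-decimal NumberFormatInfo is built by hand as a stand-in for settings such as Spain's, so the example does not depend on the culture data installed on the machine. The sketch passes NumberStyles.Float explicitly because the two-argument double.TryParse overload also accepts group separators, which could let a misplaced separator slip through.

```csharp
using System;
using System.Globalization;

// Hand-built stand-in for regional settings that use ',' as the decimal separator.
var commaDecimal = new NumberFormatInfo
{
    NumberDecimalSeparator = ",",
    NumberGroupSeparator = "."
};

// Hypothetical helper: parses one group extracted from the user's input.
// NumberStyles.Float accepts a sign, surrounding whitespace, a decimal
// separator and an exponent, but no group (thousands) separators.
bool TryParseGroup(string group, IFormatProvider provider, out double value) =>
    double.TryParse(group, NumberStyles.Float, provider, out value);

// In the application, the provider would be CultureInfo.CurrentCulture.
bool invariantOk = TryParseGroup("-12.5", CultureInfo.InvariantCulture, out double fromPoint);
bool commaOk = TryParseGroup("-12,5", commaDecimal, out double fromComma);

// ',' is not a decimal separator in the invariant culture, so this fails.
bool mixedOk = TryParseGroup("-12,5", CultureInfo.InvariantCulture, out _);
```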

This raises the question of how to split the string into groups. I've assumed that the latitude and longitude will be separated by whitespace. I've also assumed that the latitude and longitude will be in the same format (i.e. if latitude is DM then longitude is DM too). Let's look at some examples of how we might write the latitude:

  • 12° 34' 56″ S – uses the ISO 31-1 symbols.
  • -12° 34' 56″ – uses a negative sign instead of 'S'.
  • -12°34'56″S – uses a negative sign and an 'S' and omits the whitespace. We'll assume the coordinate is in the southern hemisphere.
  • -12 34' 56" – omits the degree symbol and uses plain quotation marks. This is probably the easiest to type, as it doesn't use any symbols that aren't found on a normal keyboard.
  • -12 34’ 56” – same as above but uses smart quotes (think copying from Microsoft Word).
  • +12 34 56 S – we can assume this is DMS format, as there are three groups of numbers.
  • S 12d34m56s – some programs allow D for degree, M for minute and S for second, with the North/South suffix at the beginning.
  • S 12* 34' 56" – this is often seen in legal descriptions.
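The "+12 34 56 S" entry relies on counting groups of numbers. The idea can be sketched by grouping together everything that isn't a symbol or whitespace (C# top-level statements, C# 9 or later). The separator set and the helper names NumericGroups and FormatOf are assumptions for illustration, not the article's implementation.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed separator set: whitespace, the degree sign, '*', plain and smart
// quotes, prime and double prime, the letters d/m and the hemisphere letters.
// Digits, signs and both decimal separators ('.' and ',') are left untouched.
var separators = new Regex(@"[\s\u00B0*'""\u2032\u2033\u2019\u201DdDmMnNsSeEwW]+");

// Groups together everything that isn't a symbol or whitespace.
string[] NumericGroups(string component) =>
    Array.FindAll(separators.Split(component), part => part.Length > 0);

// One group of numbers means D, two means DM and three means DMS.
string FormatOf(string component) => NumericGroups(component).Length switch
{
    1 => "D",
    2 => "DM",
    3 => "DMS",
    _ => "unknown",
};
```

For example, FormatOf("+12 34 56 S") gives "DMS", and FormatOf("-12 34,5'") gives "DM" because the comma is kept inside the number.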

Of course, there are many more combinations (specifying only one of the symbols, mixing smart quotes and plain quotes, etc.). Also, this covers only the DMS format and doesn't even look at decimal seconds (for example, is -12 34' 56.78" valid? Maybe in some countries, but in Spain it's not). There is also a possible ambiguity in what 'S' should mean. If 'D' signifies degrees and 'M' signifies minutes, then 'S' would naturally be interpreted as seconds, but in most of the examples 'S' signifies that the latitude is in the Southern hemisphere. We'll therefore exclude 'S' as a symbol for seconds, so 12d 34m 56s will be interpreted as 12° 34' 56″ S.

Since we're not going to try to validate the numbers, we just need a way of splitting the string into groups. As with the ISO format, we can use a regular expression and group together anything that isn't a symbol or whitespace. Here is the simplest case, for degrees only:

^\s*                         # Ignore any whitespace at the start of the string
(?<latitudeSuffix>[NS])?     # The suffix could be at the start
(?<latitude>.+?)             # Match anything and we'll try to parse it later
[D\*\u00B0]?\s*              # Degree symbols (optional) followed by optional whitespace
(?<latitudeSuffix>[NS])?\s+  # Optional suffix with at least some whitespace to separate

Wow, what a mess! After skipping any whitespace at the start of the string, the regular expression looks for an optional North/South specifier and, if it's found, stores it in a group named latitudeSuffix. It then matches any character ('.') one or more times, but as few times as possible ('+?'). Because this quantifier is lazy, the engine extends the latitude group one character at a time and stops as soon as the rest of the pattern can match: an optional degree symbol ('*', 'D' or '°', the last written as a Unicode escape; '*' is escaped here, although inside a character class it would be treated literally anyway), optional whitespace, an optional latitude suffix and, finally, at least one whitespace character (remember we said that the latitude and longitude must be separated by whitespace). Assuming the full regular expression matches the whole string successfully, we move on to phase two, where we try to parse the extracted groups using the current cultural settings. This involves passing the latitude group to double.TryParse and altering the sign (if necessary) based on the latitudeSuffix group.
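Putting the degrees-only fragment to work, phase two might look like this sketch (C# top-level statements, C# 9 or later). It adds RegexOptions.IgnoreCase, an assumption made so that inputs such as "12d" match the 'D' in the degree class. It relies on .NET allowing two groups with the same name. Following the article's assumption for inputs like "-12°34'56″S", it treats an 'S' suffix as southern even when the number already has a minus sign. ParseLatitudeDegrees is hypothetical. The longitude half of the expression is left out, so the input must contain something after the latitude.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// The degrees-only fragment from the article, plus IgnoreCase (an assumption).
var degreesOnly = new Regex(@"
    ^\s*                         # Ignore any whitespace at the start of the string
    (?<latitudeSuffix>[NS])?     # The suffix could be at the start
    (?<latitude>.+?)             # Match anything and we'll try to parse it later
    [D\*\u00B0]?\s*              # Degree symbols (optional) followed by optional whitespace
    (?<latitudeSuffix>[NS])?\s+  # Optional suffix with at least some whitespace to separate",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

// Hypothetical phase two: parse the latitude group with the supplied number
// format, then apply the hemisphere. Returns null if nothing usable was found.
double? ParseLatitudeDegrees(string input, IFormatProvider provider)
{
    Match match = degreesOnly.Match(input);
    if (!match.Success ||
        !double.TryParse(match.Groups["latitude"].Value, NumberStyles.Float, provider, out double value))
    {
        return null;
    }

    // Both optional groups share the name latitudeSuffix; .NET allows this.
    Group suffix = match.Groups["latitudeSuffix"];
    bool south = suffix.Success && suffix.Value.Equals("S", StringComparison.OrdinalIgnoreCase);

    // A southern suffix always wins, even if the number already has a minus sign.
    return south ? -Math.Abs(value) : value;
}
```

In the application the provider would be CultureInfo.CurrentCulture. Passing it through is what lets "-12,5" parse for a user with Spanish settings.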

Using the Code

The Angle class serves as a base class for Latitude and Longitude and allows conversion between radians and degrees. It implements the <a href="http://msdn.microsoft.com/en-us/library/4d7sx9hd.aspx">IComparable<T></a>, <a href="http://msdn.microsoft.com/en-us/library/ms131187.aspx">IEquatable<T></a> and <a href="http://msdn.microsoft.com/en-us/library/system.iformattable.aspx">IFormattable</a> interfaces, which means you can compare Angles with each other (or a Latitude or Longitude, but you cannot compare a Latitude to a Longitude; that wouldn't make sense). It also means that you can choose how to display them:

C#
var latitude = Latitude.FromDegrees(-5, -10, -15.1234);
Console.WriteLine("{0:DMS1}", latitude); // 5° 10' 15.1″ S
Console.WriteLine("{0:DM3}", latitude);  // 5° 10.252' S
Console.WriteLine("{0:D}", latitude);    // 5.17° S
Console.WriteLine("{0:ISO}", latitude);  // -051015.1234
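The format strings above ("DMS1", "DM3", "D") appear to combine letters that pick the components with a digit that sets the number of decimals on the last component. The sketch below shows one way such a format could be interpreted for a latitude held in signed decimal degrees (C# top-level statements, C# 9 or later). It is not the library's implementation. FormatLatitude is a hypothetical name, the default of two decimals is inferred from the {0:D} output, and the ISO format is omitted. The total is rounded before it is split, so a value just under a whole minute cannot be displayed as "60".

```csharp
using System;
using System.Globalization;

// Hypothetical formatter for a latitude held in signed decimal degrees.
string FormatLatitude(double totalDegrees, string format, IFormatProvider provider)
{
    string style = format.TrimEnd("0123456789".ToCharArray());
    int decimals = style.Length < format.Length
        ? int.Parse(format.Substring(style.Length), CultureInfo.InvariantCulture)
        : 2; // assumed default when no digit is given
    string number = "F" + decimals.ToString(CultureInfo.InvariantCulture);
    string hemisphere = totalDegrees < 0 ? "S" : "N";
    double value = Math.Abs(totalDegrees);

    switch (style)
    {
        case "D":
            return $"{value.ToString(number, provider)}\u00B0 {hemisphere}";

        case "DM":
        {
            // Round the total first: 12° 59.9996' with DM3 becomes 13° 0.000', not 12° 60.000'.
            double totalMinutes = Math.Round(value * 60, decimals);
            int degrees = (int)(totalMinutes / 60);
            double minutes = totalMinutes - degrees * 60;
            return $"{degrees}\u00B0 {minutes.ToString(number, provider)}' {hemisphere}";
        }

        case "DMS":
        {
            double totalSeconds = Math.Round(value * 3600, decimals);
            int degrees = (int)(totalSeconds / 3600);
            int minutes = (int)((totalSeconds - degrees * 3600) / 60);
            double seconds = totalSeconds - degrees * 3600 - minutes * 60;
            return $"{degrees}\u00B0 {minutes}' {seconds.ToString(number, provider)}\u2033 {hemisphere}";
        }

        default:
            throw new FormatException($"Unknown format '{format}'.");
    }
}
```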

The class does not have any publicly visible constructors, so you'll need to use the static factory methods (FromDegrees and FromRadians). Here is the full list of methods and properties for the class:

C#
public class Angle : IComparable<Angle>, IEquatable<Angle>, IFormattable
{
    // Gets the whole number of degrees from the angle.
    public int Degrees { get; }

    // Gets the whole number of minutes from the angle.
    public int Minutes { get; }

    // Gets the number of seconds from the angle.
    public double Seconds { get; }

    // Gets the value of the angle in radians.
    public double Radians { get; }

    // Gets the value of the angle in degrees.
    public double TotalDegrees { get; }

    // Gets the value of the angle in minutes.
    public double TotalMinutes { get; }

    // Gets the value of the angle in seconds.
    public double TotalSeconds { get; }

    // Creates a new angle from an amount in degrees.
    public static Angle FromDegrees(double degrees);
    public static Angle FromDegrees(double degrees, double minutes);
    public static Angle FromDegrees(double degrees, double minutes, double seconds);

    // Creates a new angle from an amount in radians.
    public static Angle FromRadians(double radians);

    // Returns the result of multiplying the specified value by negative one.
    public static Angle Negate(Angle angle);

    public static bool operator !=(Angle angleA, Angle angleB);
    public static bool operator <(Angle angleA, Angle angleB);
    public static bool operator <=(Angle angleA, Angle angleB);
    public static bool operator ==(Angle angleA, Angle angleB);
    public static bool operator >(Angle angleA, Angle angleB);
    public static bool operator >=(Angle angleA, Angle angleB);

    // Compares this instance with a specified Angle object and indicates
    // whether the value of this instance is less than, equal to, or greater
    // than the value of the specified Angle object.
    public int CompareTo(Angle other);

    // Determines whether this instance and a specified object have
    // the same value.
    public override bool Equals(object obj);
    public bool Equals(Angle other);

    // Returns the hash code for this instance.
    public override int GetHashCode();

    // Returns a string that represents the current Angle in degrees,
    // minutes and seconds form.
    public override string ToString();

    // Formats the value of the current instance using the specified format.
    public virtual string ToString(string format, IFormatProvider formatProvider);
}
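The arithmetic behind FromDegrees(degrees, minutes, seconds), the Radians property and the Degrees, Minutes and Seconds properties can be sketched with plain doubles (C# top-level statements, C# 9 or later). The helper names are hypothetical, and this is not the library's code. Following the Latitude.FromDegrees(-5, -10, -15.1234) example, a negative angle has every component negative. The split truncates toward zero, so each part keeps the sign.

```csharp
using System;

// Combines degrees, minutes and seconds into total degrees.
double ToTotalDegrees(double degrees, double minutes, double seconds) =>
    degrees + minutes / 60 + seconds / 3600;

double ToRadians(double totalDegrees) => totalDegrees * Math.PI / 180;
double FromRadiansToDegrees(double radians) => radians * 180 / Math.PI;

// Splits total degrees back into whole degrees, whole minutes and seconds.
// Casting to int truncates toward zero, so every component keeps the sign.
// Floating-point error can leave values such as 59.9999999 seconds, so a real
// implementation also needs a rounding policy.
(int Degrees, int Minutes, double Seconds) SplitDegrees(double totalDegrees)
{
    double totalSeconds = totalDegrees * 3600;
    int degrees = (int)totalDegrees;
    int minutes = (int)((totalSeconds - degrees * 3600.0) / 60);
    double seconds = totalSeconds - degrees * 3600.0 - minutes * 60.0;
    return (degrees, minutes, seconds);
}
```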

The Location class contains a Latitude, a Longitude and an optional altitude. It implements the IEquatable<T>, IFormattable and <a href="http://msdn.microsoft.com/en-us/library/system.xml.serialization.ixmlserializable.aspx">IXmlSerializable</a> interfaces, using the ISO format to serialize/deserialize itself. It also accepts the same format strings as Latitude/Longitude. There are some static parsing methods that accept various options for allowing different formats to be recognised, and the class also has a few helper functions derived from Aviation Formulary V1.45 by Ed Williams.

C#
public sealed class Location : IEquatable<Location>, IFormattable, IXmlSerializable
{
    // Initializes a new instance of the Location class.
    public Location(Latitude latitude, Longitude longitude);
    public Location(Latitude latitude, Longitude longitude, double altitude);

    // Gets the altitude of the coordinate, or null if the coordinate doesn't
    // contain altitude information.
    public double? Altitude { get; }

    // Gets the latitude of the coordinate.
    public Latitude Latitude { get; }

    // Gets the longitude of the coordinate.
    public Longitude Longitude { get; }

    // Converts the string into a Location.
    public static Location Parse(string value);
    public static Location Parse(string value, IFormatProvider provider);
    public static Location Parse(string value, LocationStyles style,
                                 IFormatProvider provider);

    // Converts the string into a Location (without throwing an exception).
    public static bool TryParse(string value, out Location location);
    public static bool TryParse(string value, IFormatProvider provider,
                                out Location location);
    public static bool TryParse(string value, LocationStyles style,
                                IFormatProvider provider, out Location location);

    public static bool operator !=(Location locationA, Location locationB);
    public static bool operator ==(Location locationA, Location locationB);

    // Determines whether this instance and a specified object have the
    // same value.
    public override bool Equals(object obj);
    public bool Equals(Location other);

    // Returns the hash code for this instance.
    public override int GetHashCode();

    // Returns a string that represents the current Location in degrees,
    // minutes and seconds form.
    public override string ToString();

    // Formats the value of the current instance using the specified format.
    public string ToString(string format, IFormatProvider formatProvider);

    // Calculates the initial course (or azimuth; the angle measured clockwise
    // from true north) from this instance to the specified value.
    public Angle Course(Location point);

    // Calculates the great circle distance, in meters, between this instance
    // and the specified value.
    public double Distance(Location point);

    // Calculates a point at the specified distance along the specified
    // radial from this instance.
    public Location GetPoint(double distance, Angle radial);
}
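The three helpers at the end of the listing (Course, Distance and GetPoint) are described as derived from Ed Williams' Aviation Formulary. The standard spherical formulas behind them can be sketched over plain degree values instead of the Location, Latitude and Angle types (C# top-level statements, C# 9 or later). Several parts are assumptions. The mean Earth radius of 6,371 km is assumed because the article doesn't say which radius Location.Distance uses. Longitudes are east-positive, whereas the formulary itself uses a west-positive convention, so some signs differ from its formulas. The function names only mirror the listing.

```csharp
using System;

// Assumed mean Earth radius in meters; a different radius scales every distance.
const double EarthRadius = 6371000.0;

double ToRad(double degrees) => degrees * Math.PI / 180;
double ToDeg(double radians) => radians * 180 / Math.PI;

// Great circle distance in meters (haversine form, well behaved for short distances).
double Distance(double lat1, double lon1, double lat2, double lon2)
{
    double dLat = ToRad(lat2 - lat1);
    double dLon = ToRad(lon2 - lon1);
    double h = Math.Pow(Math.Sin(dLat / 2), 2) +
               Math.Cos(ToRad(lat1)) * Math.Cos(ToRad(lat2)) * Math.Pow(Math.Sin(dLon / 2), 2);
    return 2 * EarthRadius * Math.Asin(Math.Min(1.0, Math.Sqrt(h)));
}

// Initial course in degrees, clockwise from true north, in the range [0, 360).
double Course(double lat1, double lon1, double lat2, double lon2)
{
    double phi1 = ToRad(lat1), phi2 = ToRad(lat2), dLon = ToRad(lon2 - lon1);
    double y = Math.Sin(dLon) * Math.Cos(phi2);
    double x = Math.Cos(phi1) * Math.Sin(phi2) - Math.Sin(phi1) * Math.Cos(phi2) * Math.Cos(dLon);
    return (ToDeg(Math.Atan2(y, x)) + 360) % 360;
}

// Point reached after travelling 'distance' meters from (lat, lon) on the given
// initial course; the longitude is normalised to [-180, 180).
(double Lat, double Lon) GetPoint(double lat, double lon, double distance, double course)
{
    double phi1 = ToRad(lat), theta = ToRad(course), delta = distance / EarthRadius;
    double phi2 = Math.Asin(Math.Sin(phi1) * Math.Cos(delta) +
                            Math.Cos(phi1) * Math.Sin(delta) * Math.Cos(theta));
    double lambda2 = ToRad(lon) + Math.Atan2(Math.Sin(theta) * Math.Sin(delta) * Math.Cos(phi1),
                                             Math.Cos(delta) - Math.Sin(phi1) * Math.Sin(phi2));
    return (ToDeg(phi2), (ToDeg(lambda2) + 540) % 360 - 180);
}
```

As a quick check, one degree of longitude along the equator comes out at EarthRadius × π / 180 meters (about 111 km with the assumed radius), on a course of 90°.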

For the sake of completeness, there is also a serializable LocationCollection class that, like Location, uses the ISO format to serialize/deserialize itself.

Points of Interest

None of the Angle, Latitude or Longitude classes include a conversion (either implicit or explicit) from built-in types (such as double). This is deliberate. In the <a href="http://msdn.microsoft.com/en-us/library/system.math.aspx">Math</a> class of the .NET Framework, the methods that work with angles (such as Math.Cos) expect the angles to be in radians. However, when dealing with latitude/longitude, degrees are far more common. It would therefore be inconsistent for a cast from double to Angle to assume the number is in radians while a cast from double to Latitude assumes it is in degrees. For this reason, it's best if the developer is explicit about what the number represents by using the FromDegrees or FromRadians static methods.

Also, for efficiency reasons, it would be nice if Latitude and Longitude were structs. However, a struct cannot derive from another type, and I think the code reuse between Angle and Latitude/Longitude justifies the use of classes. I would welcome any feedback with your opinions, though.

History

  • 21/02/12 - Fixed ISO formatting error with values less than one degree and improved general rounding of minutes and seconds
  • 12/11/11 - Fixed a parsing bug where the hemisphere information is lost when the degrees element is zero
  • 31/01/11 – Added formats pointed out by Erik Anderson
  • 30/01/11 – First version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)