Introduction
In my map control article, I tried to parse user input to see if it was a latitude and longitude and display the result on the map. At the time, I didn't want to write a full parser for the user input so I (very lazily) just split the user input using a comma and tried to parse two decimal values.
This has a number of problems. First, coordinates couldn't be entered in degree-minute-second format, which is quite popular for coordinates. Second (as was also pointed out in the comments), some countries use a comma as the decimal separator (Spain, for example).
Background
Jaime Olivares wrote an excellent article here that parses and serializes latitude and longitude coordinates according to the ISO 6709 standard (a nice guide on the standard is available on this page).
The article is good at explaining what the standard is and provides nice and concise code to get the job done, but it's a bit unreasonable to expect users of an application to type a coordinate according to this format! For that reason, I'm going to use some simple regular expressions to try and parse as flexibly as possible, taking into account different users' language settings.
ISO 6709 Parsing
The nice thing about the ISO 6709 format (from a developer's point of view) is that we know exactly what to expect in the string. For example, the '/' character is used to separate multiple coordinates. Also, the data will not vary depending on the cultural settings of the user; the decimal separator will always be '.'.
However, there's still a little guesswork, as we don't know if it represents decimal degrees (from now on referred to as D), degrees and decimal minutes (DM) or degrees, minutes and decimal seconds (DMS). Also, we do not know if there will be an altitude component or not. Let's list what we do know though:
- The only valid digits are 0 - 9.
- The only valid decimal separator is '.'.
- The decimal part of a number must contain the decimal separator and be followed by at least one digit.
- The latitude component will be first and starts with a '+' or '-'.
- Latitude will be three characters minimum, plus an optional decimal part [±DD(.D)].
- Latitude will be seven characters maximum, plus an optional decimal part [±DDMMSS(.S)].
- The longitude component will be next, which starts with a '+' or '-'.
- Longitude will be four characters minimum, plus an optional decimal part [±DDD(.D)].
- Longitude will be eight characters maximum, plus an optional decimal part [±DDDMMSS(.S)].
- Altitude, if specified, will be next and will start with '+' or '-'.
- Altitude will be two characters minimum, plus an optional decimal part [±A(.A)].
- The string will be terminated by a '/' character.
Now that we know what a valid format is, we can easily translate it into a regular expression (and use the <a href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx">Regex</a> class). This is the regular expression we'll use (if you want to try it, remember to use the RegexOptions.IgnorePatternWhitespace flag).
^\s* # Match the start of the string, ignoring any whitespace
(?<latitude> [+-][0-9]{2,6}(?: \. [0-9]+)?) # The decimal part is optional.
(?<longitude>[+-][0-9]{3,7}(?: \. [0-9]+)?)
(?<altitude> [+-][0-9]+(?: \. [0-9]+)?)? # The altitude component is optional
/ # The string must be terminated by '/'
This regular expression will tell us if the input string might be in the ISO 6709 format and, if it all matched, will allow us to get the various components from the string using the named groups. I said the string might be in the correct format, because the expression shown also allows '+123+1234/' as a valid value (i.e. ±DDM±DDDM/) and doesn't perform any range checking on the values (e.g. minutes and seconds cannot be greater than or equal to 60). Therefore, we need to pass the output of a successful match on to another function to convert the string to a number that we can use in calculations.
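As a rough illustration of the matching step, the sketch below wraps the expression in a Regex and hands back the raw text of each named group. The Iso6709Sketch class and its TrySplit method are hypothetical names made up for this example (they are not part of the download), and the sketch deliberately stops before any range checking or D/DM/DMS detection.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not the article's actual code.
public static class Iso6709Sketch
{
    // The '#' comments inside the pattern only work with IgnorePatternWhitespace.
    private static readonly Regex IsoPattern = new Regex(@"
        ^\s*                                          # Start of the string, ignoring whitespace
        (?<latitude> [+-][0-9]{2,6}(?: \. [0-9]+)?)   # The decimal part is optional
        (?<longitude>[+-][0-9]{3,7}(?: \. [0-9]+)?)
        (?<altitude> [+-][0-9]+(?: \. [0-9]+)?)?      # The altitude component is optional
        /                                             # The string must be terminated by a slash",
        RegexOptions.IgnorePatternWhitespace);

    // Returns true if the input looks like ISO 6709 and outputs the raw text
    // of each component. altitude is null when that component is absent.
    public static bool TrySplit(string input, out string latitude,
                                out string longitude, out string altitude)
    {
        Match match = IsoPattern.Match(input);
        Group altitudeGroup = match.Groups["altitude"];

        latitude = match.Success ? match.Groups["latitude"].Value : null;
        longitude = match.Success ? match.Groups["longitude"].Value : null;
        altitude = (match.Success && altitudeGroup.Success) ? altitudeGroup.Value : null;
        return match.Success;
    }
}
```

A successful result only means the string has the right shape; a value such as '+123+1234/' still gets through and has to be rejected by the conversion step.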
For the altitude part, this is extremely easy; check the altitude <a href="http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.group.success.aspx">Group.Success</a> property and, if the altitude was found, convert the string value using double.Parse (making sure to pass in <a href="http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.invariantculture.aspx">CultureInfo.InvariantCulture</a> to avoid any localization issues). Note there is no need to use double.TryParse as we've already checked the input is valid using the regular expression.
For longitude and latitude, it's a little trickier. The basic idea is to split the string into two parts: the integral part and the optional fractional part. Depending on the length of the integral part, we know whether the string is in D, DM or DMS format and can split the string and parse each component separately, making sure to add the fractional part (if any) to the last component.
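The splitting idea can be sketched roughly as follows. This is a minimal illustration rather than the code from the download: the IsoAngleSketch class, the ToDegrees method and its degreeDigits parameter (2 for latitude, 3 for longitude) are names assumed for this example, and the input is expected to be a component that has already matched the regular expression.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not the article's actual code.
public static class IsoAngleSketch
{
    // Converts one signed ISO 6709 component (e.g. "-0751530.5") to decimal degrees.
    // degreeDigits is 2 for latitude and 3 for longitude.
    public static double ToDegrees(string component, int degreeDigits)
    {
        double sign = component[0] == '-' ? -1.0 : 1.0;
        string unsigned = component.Substring(1);

        // Split into the integral part and the optional fractional part.
        int dot = unsigned.IndexOf('.');
        string integral = dot < 0 ? unsigned : unsigned.Substring(0, dot);
        string fraction = dot < 0 ? "" : unsigned.Substring(dot); // keeps the '.'

        // Digits beyond the degrees tell us the format: 0 = D, 2 = DM, 4 = DMS.
        int extra = integral.Length - degreeDigits;
        if (extra != 0 && extra != 2 && extra != 4)
            throw new FormatException("Unexpected number of digits: " + component);

        string degreesText = integral.Substring(0, degreeDigits);
        string minutesText = extra >= 2 ? integral.Substring(degreeDigits, 2) : "0";
        string secondsText = extra == 4 ? integral.Substring(degreeDigits + 2, 2) : "0";

        // The fractional part always belongs to the last component present.
        if (extra == 0) degreesText += fraction;
        else if (extra == 2) minutesText += fraction;
        else secondsText += fraction;

        double degrees = Parse(degreesText);
        double minutes = Parse(minutesText);
        double seconds = Parse(secondsText);
        if (minutes >= 60.0 || seconds >= 60.0)
            throw new FormatException("Minutes and seconds must be less than 60: " + component);

        return sign * (degrees + minutes / 60.0 + seconds / 3600.0);
    }

    // ISO 6709 always uses '.' as the decimal separator, so parse invariantly.
    private static double Parse(string text)
    {
        return double.Parse(text, CultureInfo.InvariantCulture);
    }
}
```

The sketch rejects values such as '+123' (three digits for a latitude) but does not check that the result lies within ±90 or ±180 degrees; a real implementation would want that check too.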
User Input
As mentioned in the introduction, the motivation of this article is to extract a coordinate from user-supplied strings, whilst being friendly to different cultural settings. The approach I've taken is to split the string up into groups and then use the double.TryParse method (passing in the current cultural setting) to actually do the number processing, as I figured the .NET Framework can do a better job at localization than I can!
This raises the question of how to split the string into groups. What I've assumed is that the latitude and longitude will be separated by whitespace. I've also assumed that the latitude and longitude will be in the same format (i.e. if latitude is a DM then longitude is a DM too). Let's look at some examples of how we might write the latitude:
12° 34' 56″ S | This uses the ISO 31-1 symbols. |
-12° 34' 56″ | This uses a negative sign instead of 'S'. |
-12°34'56″S | This uses a negative sign and an 'S' and omits the whitespace. We'll assume the coordinate is in the southern hemisphere. |
-12 34' 56" | This omits the degree symbol and uses quotation marks. This is probably the easiest to type as it doesn't use any special symbols not found on a normal keyboard. |
-12 34’ 56” | Same as above but uses smart quotes (think copying from Microsoft Word). |
+12 34 56 S | We can assume this is DMS format as there are three groups of numbers. |
S 12d34m56s | Some programs allow D for degree, M for minute and S for second, with the North/South suffix at the beginning. |
S 12* 34' 56" | This is often seen in legal descriptions. |
Of course, there are many more combinations (only specifying one of the symbols, mixing smart quotes and plain quotes, etc). Also, this is just for DMS format and doesn't even look at decimal seconds (for example, is -12 34' 56.78" valid? Maybe in some countries, but in Spain it's not). There is also a possible source of ambiguity regarding what 'S' should mean. If we allow 'D' to signify Degrees and 'M' to signify Minutes, then naturally 'S' should be interpreted as Seconds. But in most of the examples, 'S' signifies that the latitude is in the Southern hemisphere. We’ll therefore exclude 'S' as a symbol for seconds, so 12d 34m 56s will be interpreted as 12° 34' 56″ S.
Since we're not going to try and validate the numbers, we just need to find a way of splitting the string into groups. As with the ISO format, we can use a regular expression and group together anything which isn't a symbol or whitespace. Here is the simplest case for degrees only:
^\s* # Ignore any whitespace at the start of the string
(?<latitudeSuffix>[NS])? # The suffix could be at the start
(?<latitude>.+?) # Match anything and we'll try to parse it later
[D\*\u00B0]?\s* # Degree symbols (optional) followed by optional whitespace
(?<latitudeSuffix>[NS])?\s+ # Optional suffix with at least some whitespace to separate
Wow, what a mess! After skipping the whitespace at the start of the string, Regex will look for a North/South specifier and, if it's found, will store it in a group named latitudeSuffix. It will then match any character ('.') one or more times, but as few times as necessary ('+?'). What that means is that if it finds an optional degree symbol (such as '*' (a reserved character, so it needs to be escaped), 'D' or '°' (written as a Unicode escape)) then the matching will stop. Failing that, it will look for any whitespace. If still no matches are found, it will look for the latitude suffix. Finally, if it still hasn't found any of these, then it must find at least one whitespace character (remember we said that the latitude and longitude must be separated by whitespace). Assuming the regular expression matches the whole string successfully, we move on to phase two, where we try to parse the extracted groups using the current cultural settings. This involves passing the latitude group to double.TryParse and altering the sign (if necessary) based on the latitudeSuffix group.
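To make the two phases concrete, here is a minimal sketch of the degrees-only case for the latitude. The UserLatitudeSketch class and its TryParseLatitude method are hypothetical names for this example only; the format provider is passed in explicitly so the caller can supply CultureInfo.CurrentCulture, and an 'S' suffix is assumed to always mean the southern hemisphere.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not the article's actual code.
public static class UserLatitudeSketch
{
    // .NET allows the same group name (latitudeSuffix) to appear twice;
    // the group holds whichever occurrence matched last.
    private static readonly Regex DegreesOnly = new Regex(@"
        ^\s*                          # Ignore any whitespace at the start of the string
        (?<latitudeSuffix>[NS])?      # The suffix could be at the start
        (?<latitude>.+?)              # Match anything and we'll try to parse it later
        [D\*\u00B0]?\s*               # Degree symbols (optional) followed by optional whitespace
        (?<latitudeSuffix>[NS])?\s+   # Optional suffix with at least some whitespace to separate",
        RegexOptions.IgnorePatternWhitespace);

    // Phase one: split with the regular expression.
    // Phase two: let double.TryParse handle the culture-specific number format.
    public static bool TryParseLatitude(string input, IFormatProvider provider,
                                        out double degrees)
    {
        degrees = 0.0;
        Match match = DegreesOnly.Match(input);
        if (!match.Success)
            return false;

        string number = match.Groups["latitude"].Value;
        if (!double.TryParse(number, NumberStyles.Float, provider, out degrees))
            return false;

        // An 'S' forces a negative value, even if a minus sign was also typed.
        if (match.Groups["latitudeSuffix"].Value == "S")
            degrees = -Math.Abs(degrees);
        return true;
    }
}
```

With a provider whose decimal separator is a comma, an input such as "12,5° S 3,25° W" yields a latitude of -12.5, whereas the comma-splitting approach from the introduction would have cut the number in half. Note that the pattern as written is case sensitive.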
Using the Code
The Angle class serves as a base class for Latitude and Longitude and allows conversion between radians and degrees. It implements the <a href="http://msdn.microsoft.com/en-us/library/4d7sx9hd.aspx">IComparable<T></a>, <a href="http://msdn.microsoft.com/en-us/library/ms131187.aspx">IEquatable<T></a> and <a href="http://msdn.microsoft.com/en-us/library/system.iformattable.aspx">IFormattable</a> interfaces, which means you can compare Angles with each other (or with a Latitude or Longitude, but you cannot compare a Latitude to a Longitude, as that doesn't make sense). It also means that you can choose how to display them:
var latitude = Latitude.FromDegrees(-5, -10, -15.1234);
Console.WriteLine("{0:DMS1}", latitude);
Console.WriteLine("{0:DM3}", latitude);
Console.WriteLine("{0:D}", latitude);
Console.WriteLine("{0:ISO}", latitude);
The class does not have any publicly visible constructors, so you’ll need to use the static factory methods (such as FromDegrees). Here is the full list of methods and properties for the class:
public class Angle : IComparable<Angle>, IEquatable<Angle>, IFormattable
{
    public int Degrees { get; }
    public int Minutes { get; }
    public double Seconds { get; }
    public double Radians { get; }
    public double TotalDegrees { get; }
    public double TotalMinutes { get; }
    public double TotalSeconds { get; }

    public static Angle FromDegrees(double degrees);
    public static Angle FromDegrees(double degrees, double minutes);
    public static Angle FromDegrees(double degrees, double minutes, double seconds);
    public static Angle FromRadians(double radians);
    public static Angle Negate(Angle angle);

    public static bool operator !=(Angle angleA, Angle angleB);
    public static bool operator <(Angle angleA, Angle angleB);
    public static bool operator <=(Angle angleA, Angle angleB);
    public static bool operator ==(Angle angleA, Angle angleB);
    public static bool operator >(Angle angleA, Angle angleB);
    public static bool operator >=(Angle angleA, Angle angleB);

    public int CompareTo(Angle other);
    public override bool Equals(object obj);
    public bool Equals(Angle other);
    public override int GetHashCode();
    public override string ToString();
    public virtual string ToString(string format, IFormatProvider formatProvider);
}
The Location class contains a Latitude, Longitude and optional altitude. It implements the IEquatable<T>, IFormattable and <a href="http://msdn.microsoft.com/en-us/library/system.xml.serialization.ixmlserializable.aspx">IXmlSerializable</a> interfaces, using the ISO format to serialize/deserialize itself. It also accepts the same format strings as Latitude/Longitude. There are some static parsing methods that accept various options for allowing different formats to be recognised, and the class also has a few helper functions, derived from Aviation Formulary V1.45 by Ed Williams.
public sealed class Location : IEquatable<Location>, IFormattable, IXmlSerializable
{
    public Location(Latitude latitude, Longitude longitude);
    public Location(Latitude latitude, Longitude longitude, double altitude);

    public double? Altitude { get; }
    public Latitude Latitude { get; }
    public Longitude Longitude { get; }

    public static Location Parse(string value);
    public static Location Parse(string value, IFormatProvider provider);
    public static Location Parse(string value, LocationStyles style,
                                 IFormatProvider provider);
    public static bool TryParse(string value, out Location location);
    public static bool TryParse(string value, IFormatProvider provider,
                                out Location location);
    public static bool TryParse(string value, LocationStyles style,
                                IFormatProvider provider, out Location location);

    public static bool operator !=(Location locationA, Location locationB);
    public static bool operator ==(Location locationA, Location locationB);

    public override bool Equals(object obj);
    public bool Equals(Location other);
    public override int GetHashCode();
    public override string ToString();
    public string ToString(string format, IFormatProvider formatProvider);

    public Angle Course(Location point);
    public double Distance(Location point);
    public Location GetPoint(double distance, Angle radial);
}
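To give a flavour of the kind of calculation behind a helper such as Distance, here is a small sketch of the great-circle distance formula in the form commonly quoted from the Aviation Formulary. It is an assumption-laden illustration, not the code in the download: the GreatCircleSketch class is a made-up name, the inputs are plain doubles in degrees rather than Location objects, the Earth is treated as a sphere with an assumed mean radius, and the units returned by the real Distance method may differ.

```csharp
using System;

// Hypothetical helper for illustration only; not the article's actual code.
public static class GreatCircleSketch
{
    // Assumed mean Earth radius in metres (spherical approximation).
    public const double EarthRadiusMetres = 6371000.0;

    public static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }

    // Great-circle distance in metres between two points given in decimal degrees.
    public static double Distance(double lat1Degrees, double lon1Degrees,
                                  double lat2Degrees, double lon2Degrees)
    {
        double lat1 = ToRadians(lat1Degrees);
        double lon1 = ToRadians(lon1Degrees);
        double lat2 = ToRadians(lat2Degrees);
        double lon2 = ToRadians(lon2Degrees);

        double sinLat = Math.Sin((lat1 - lat2) / 2.0);
        double sinLon = Math.Sin((lon1 - lon2) / 2.0);
        double a = sinLat * sinLat + Math.Cos(lat1) * Math.Cos(lat2) * sinLon * sinLon;

        // Clamp to 1 so rounding errors cannot push Asin outside its domain.
        double centralAngle = 2.0 * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
        return centralAngle * EarthRadiusMetres;
    }
}
```

Two points on the equator that are 90 degrees of longitude apart come out at a quarter of the sphere's circumference, which is a handy sanity check for any implementation.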
For the sake of completeness, there is also a serializable LocationCollection class that, like Location, uses the ISO format to serialize/deserialize itself.
Points of Interest
None of the Angle, Latitude or Longitude classes include a conversion (either implicit or explicit) from built-in types (such as double). This is deliberate. In the <a href="http://msdn.microsoft.com/en-us/library/system.math.aspx">Math</a> class of the .NET Framework, the methods which work with angles (such as Math.Cos) expect the angles to be in radians. However, when dealing with latitude/longitude, degrees are far more common. It therefore seems inconsistent to be able to cast a double to an Angle and assume that the number is in radians but to have a cast from a double to Latitude assume that the number is in degrees. For this reason, it's best if the developer is explicit about what the number represents by using the FromDegrees or FromRadians static methods.
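The unit mix-up this design guards against is easy to reproduce with nothing but the Math class. The AngleUnitsSketch class below is a made-up helper for this example and is unrelated to the Angle class in the download.

```csharp
using System;

// Hypothetical helper for illustration only; not the article's actual code.
public static class AngleUnitsSketch
{
    public static double DegreesToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }

    // Cosine of an angle given in degrees, converting explicitly first.
    public static double CosOfDegrees(double degrees)
    {
        return Math.Cos(DegreesToRadians(degrees));
    }
}
```

Math.Cos(60) treats the argument as 60 radians and returns roughly -0.952, whereas AngleUnitsSketch.CosOfDegrees(60) gives the expected 0.5 (to within floating-point rounding). An implicit conversion from double would hide exactly this kind of mistake.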
Also, for efficiency reasons, it would be nice if Latitude and Longitude were structs. However, you cannot use inheritance with structs, and I think the code re-use between Angle and Latitude/Longitude justifies the use of classes, but I would welcome any feedback.
History
- 21/02/12 - Fixed ISO formatting error with values less than one degree and improved general rounding of minutes and seconds
- 12/11/11 - Fixed a parsing bug where the hemisphere information is lost when the degrees element is zero
- 31/01/11 - Added formats pointed out by Erik Anderson
- 30/01/11 - First version