Introduction
UnitParser provides a reliable way to easily interact with a wide variety of units of measurement. Its adaptable format allows users to intuitively account for unit-related information in virtually any situation.
It also includes further relevant features, like configurable exception triggering and graceful management of numeric values of any size.
UnitParser is the first part of FlexibleParser, a multi-purpose group of independent .NET parsing libraries (the second part, NumberParser, also has an article on codeproject.com).
This article refers to UnitParser v. 1.0.9.0 (stable).
Additionally, note that there are also a web API and a Java version of this library.
Background
Units of measurement represent a complex reality which, even nowadays, hasn't been fully systematised.
Traditionally, most software approaches have focused on the simplest aspects of the problem: individual units and conversion factors. Most packages ignore issues like different systems of units (e.g., SI or Imperial), compound units (e.g., kg*m/s^2 being equal to N) or simplifications (e.g., kg*m/kg being equal to m). It is also quite difficult to find software converting string-based inputs into safe programming structures that make this complex reality easy to manage.
UnitParser aims to overcome these common limitations by applying the following ideas:
- Comprehensive classifications accounting for all the possible scenarios. Each unit is defined according to: system (SI, Imperial, USCS or CGS), type (all the types), specific name (all the names), symbol/abbreviation (all the symbols) if applicable, prefix (all the prefixes) if applicable, and constituent parts (e.g., N formed by kg, m and s^-2).
- Programmer-friendly structures allowing any situation to be handled as intuitively as possible. There is a main class (UnitP) dealing with all the possible scenarios and managing potential errors/incompatibilities internally. For example, new UnitP("m/s") is fine (m/s is a valid SI velocity unit), but new UnitP("m*s") is not.
- Minimal user input and rules applied systematically and consistently. For example, in case of incompatibility, the conditions defined by the first element starting from the top left will always prevail.
- Primarily focused on newer/commonly-used units.
- Usage of formally-correct alternatives when possible.
- Creation of custom formats only when strictly required. They must always be simple and consistent.
- All the classifications apply the rule "in case of doubt, default/none".
Code Analysis
The UnitParser code is quite big and complex. It doesn’t have a well-defined structure which might be easily summarised, and it doesn’t include specific parts worth highlighting because of their difficulty or the peculiarities of their implementation. I have relevant experience in developing reasonably complex pieces of software from scratch, in the .NET Framework/C# and in units of measurement; the current code is simply a consequence of this reality: an experienced developer coming up with a comprehensive solution for a complex problem with which he feels very comfortable. That’s why I think that the best way to get a good idea about this code is to analyse it properly; for example, by debugging its descriptive test sample code (in GitHub).
I submitted an article about a preliminary version of this library some months ago. I didn’t get very good feedback back then, mainly because I had not explained the code in enough detail. Although I did not particularly like that feedback and eventually deleted that submission, I did apply those ideas and wrote descriptive texts for the most relevant parts of the code. The current analysis is different, but also includes references to the most relevant parts of these other resources.
Hopefully, what follows will help people get interested in UnitParser and carry out the in-depth analysis of the code recommended above, which is, in my opinion, the ideal approach.
One Class to Rule Them All: UnitP
One of the defining features of UnitParser, as opposed to other software solutions, is that it relies on just one class for all the units. From the very beginning, my intention was to develop a very comprehensive approach, where the huge number of scenarios advises against the, a priori more logical, multi-class alternative.
UnitP has a large number of constructors supporting many different scenarios and is defined by the public members listed in the code excerpt below:
public partial class UnitP
{
public readonly Units Unit;
public readonly UnitTypes UnitType;
public readonly UnitSystems UnitSystem;
public readonly Prefix UnitPrefix = new Prefix();
public ReadOnlyCollection<UnitPart> UnitParts;
public readonly string OriginalUnitString;
public readonly string UnitString;
public readonly string ValueAndUnitString;
public readonly int BaseTenExponent;
public readonly ErrorInfo Error;
public decimal Value { get; set; }
}
UnitP is mostly defined by readonly fields, a natural consequence of its doing-everything-internally nature. Alternatives like automatically-updated getters/setters would have noticeably increased complexity, because even slight variations of the input conditions trigger a large number of internal checks. Thanks to this readonly setup, the whole process can be simplified into three main parts:
- Well-defined inputs provided via one of the public constructors
- Specific analysis on account of the type of inputs under consideration
- Population of the readonly public fields
In summary, every instantiated UnitP variable can be assumed to be consistent: it includes either supported unit information or an error (by default, managed internally without throwing any exception).
UnitP newton = new UnitP(Units.Newton);
UnitP newton2 = new UnitP("kg*m/s^2");
UnitP newton3 = new UnitP("kg*m*s-2");
UnitP newton4 = new UnitP("1000 g*m/s2");
UnitP wrong1 = new UnitP("asfasf");
UnitP wrong2 = new UnitP("kG*m/s2");
UnitP wrong3 = new UnitP("kg*m*s-3");
UnitP wrong4 = new UnitP("10_0 g*m/s2");
The first four variables are valid instances and all of them are identical to each other (1 N, SI, force). The last four variables are all wrong, an issue which is indicated via the Error field without throwing an exception (exceptions can only be triggered by relying on certain constructors).
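To make the "populate readonly fields once, report errors instead of throwing" idea more concrete, here is a minimal sketch. It is not UnitParser's code: SketchQuantity, SketchErrorType and the four-unit whitelist are hypothetical, and the parsing is deliberately naive (only "<number> <unit>" inputs).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical error categories for this sketch (not UnitParser's ErrorTypes enum).
public enum SketchErrorType { None, InvalidValue, InvalidUnit }

// Sketch of the "populate readonly fields once, never throw" design.
public sealed class SketchQuantity
{
    // Tiny hypothetical whitelist standing in for the library's hardcoded unit data.
    private static readonly HashSet<string> KnownUnits =
        new HashSet<string>(StringComparer.Ordinal) { "N", "m", "s", "kg" };

    public readonly decimal Value;
    public readonly string Unit = "";
    public readonly SketchErrorType Error = SketchErrorType.None;

    public SketchQuantity(string input)
    {
        // Expected shape: "<number> <unit>", e.g. "2 N".
        string[] parts = (input ?? "").Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2)
        {
            Error = SketchErrorType.InvalidUnit;
            return;
        }
        if (!decimal.TryParse(parts[0], NumberStyles.Float, CultureInfo.InvariantCulture, out decimal value))
        {
            Error = SketchErrorType.InvalidValue;
            return;
        }
        if (!KnownUnits.Contains(parts[1]))
        {
            Error = SketchErrorType.InvalidUnit;
            return;
        }
        // Only reached when everything is valid; the instance never changes afterwards.
        Value = value;
        Unit = parts[1];
    }
}
```

The inputs mirror the wrong ones above: "asfasf", "1 kG" and "10_0 N" all produce instances whose Error is set, and no exception ever leaves the constructor.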
Operations Between UnitP Instances
Another important aspect of relying on one class to deal with such a big number of different scenarios is making sure that all the operations between instances of that class follow the expected rules. When dealing with units of measurement and numerical values, this refers at least to arithmetic and comparison operations.
All the UnitP public operator overloads and implicit conversions are stored in the Operations/Operations_Public.cs file; here is a small sample of that code:
public partial class UnitP : IComparable<UnitP>
{
public int CompareTo(UnitP other)
{
return
(
this.BaseTenExponent == other.BaseTenExponent ?
(this.Value * this.UnitPrefix.Factor).CompareTo
(other.Value * other.UnitPrefix.Factor) :
this.BaseTenExponent.CompareTo(other.BaseTenExponent)
);
}
public static implicit operator UnitP(string input)
{
return new UnitP(input);
}
public static implicit operator UnitP(decimal input)
{
return new UnitP(input);
}
public static implicit operator UnitP(Units input)
{
return new UnitP(input);
}
public static UnitP operator +(UnitP first, UnitP second)
{
return PerformUnitOperation
(
first, second, Operations.Addition,
GetOperationString(first, second, Operations.Addition)
);
}
}
Before performing any operation, both UnitP instances have to be analysed and might be modified. During these pre-checks, the following fields are taken into account:
- UnitType. Operations between different types can only happen under certain conditions, as explained below.
- UnitParts. This collection includes the most accurate definition of the given unit. Operations between UnitP instances with different UnitParts are possible, but automatic conversions are likely to happen (read below).
- Error. One or both instances being wrong might affect the output of the operation.
- Numeric fields (i.e., Value, BaseTenExponent and UnitPrefix). After confirming that both instances are compatible and performing all the required actions (e.g., conversions), the corresponding operation is carried out on the numeric fields.
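The pre-check order described above can be illustrated with a simplified sketch of an addition. SketchUnitValue and SketchType are hypothetical stand-ins; the sketch ignores UnitParts and conversions and, unlike the library's managed operations, does not protect against decimal overflow.

```csharp
using System;

// Hypothetical unit types for this sketch.
public enum SketchType { None, Length, Force }

// Simplified stand-in for the numeric side of a UnitP-like instance.
public sealed class SketchUnitValue
{
    public readonly SketchType Type;
    public readonly decimal Value;        // number as written, e.g. 1 in "1 km"
    public readonly decimal PrefixFactor; // e.g. 1000 for the "k" prefix
    public readonly int BaseTenExponent;  // additional power of ten
    public readonly bool HasError;

    public SketchUnitValue(SketchType type, decimal value, decimal prefixFactor = 1m,
        int baseTenExponent = 0, bool hasError = false)
    {
        Type = type;
        Value = value;
        PrefixFactor = prefixFactor;
        BaseTenExponent = baseTenExponent;
        HasError = hasError;
    }

    public static SketchUnitValue Add(SketchUnitValue a, SketchUnitValue b)
    {
        // Pre-check 1: any wrong operand makes the result wrong.
        if (a.HasError || b.HasError) return Failed(a.Type);
        // Pre-check 2: only quantities of the same type can be added.
        if (a.Type != b.Type) return Failed(a.Type);
        // Numeric step: fold the prefixes into the values and align both to the smaller exponent.
        int exponent = Math.Min(a.BaseTenExponent, b.BaseTenExponent);
        decimal first = a.Value * a.PrefixFactor * Pow10(a.BaseTenExponent - exponent);
        decimal second = b.Value * b.PrefixFactor * Pow10(b.BaseTenExponent - exponent);
        // Overflow is not handled here; that is the job of the managed operations.
        return new SketchUnitValue(a.Type, first + second, 1m, exponent);
    }

    private static SketchUnitValue Failed(SketchType type) =>
        new SketchUnitValue(type, 0m, 1m, 0, true);

    private static decimal Pow10(int n)
    {
        decimal result = 1m;
        for (int i = 0; i < n; i++) result *= 10m;
        return result;
    }
}
```

For example, adding 1 km (value 1, prefix 1000) and 500 m gives value 1500 with prefix 1 and exponent 0, while adding a length and a force returns an instance flagged as wrong.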
Continuing with one of the code samples above, consider the following operations:
UnitP allNewtons = newton + newton2 + newton3;
UnitP wrongOperation = new UnitP("m") + newton;
allNewtons is a valid instance (3 N, SI, force), but wrongOperation is not because metres and newtons cannot be added.
When dealing with units of measurement, addition/subtraction (only the values are affected) is treated differently from multiplication/division (a new unit is created as a result of the operation). UnitParser respects these peculiarities at every level; for example, new UnitP("m") / new UnitP("s") outputs the same as new UnitP("m/s"): metre per second (SI, velocity).
In any case, it is advisable to rely on the string-based approach for relatively complex operations, because each overloaded operator is evaluated individually and this might cause some situations to be misassessed. For example, new UnitP("m*s/s") is fine (1 metre), but new UnitP("m") * new UnitP("s") / new UnitP("s") is wrong (an error is triggered when analysing new UnitP("m") * new UnitP("s")).
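A hypothetical way to picture this difference is to model units as maps from symbol to integer exponent: multiplication adds exponents and division subtracts them. Combining the whole of m*s/s first gives plain m, whereas an evaluation that requires a recognised type after every single operator stops at m*s. The type table below is an assumption used only for illustration, not UnitParser's data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: a unit as a map from base-unit symbol to integer exponent (e.g. m/s = {m: 1, s: -1}).
public static class SketchDimensions
{
    // Hypothetical table of recognised exponent combinations.
    private static readonly Dictionary<string, string> KnownTypes = new Dictionary<string, string>
    {
        ["m"] = "Length",
        ["s"] = "Time",
        ["m*s^-1"] = "Velocity",
    };

    // Multiplication adds exponents (sign = 1); division subtracts them (sign = -1).
    public static Dictionary<string, int> Combine(Dictionary<string, int> a, Dictionary<string, int> b, int sign)
    {
        var result = new Dictionary<string, int>(a);
        foreach (var pair in b)
        {
            result.TryGetValue(pair.Key, out int current);
            int updated = current + sign * pair.Value;
            // Exponents cancelling out to zero are removed (simplification).
            if (updated == 0) result.Remove(pair.Key);
            else result[pair.Key] = updated;
        }
        return result;
    }

    // Canonical string: symbols in ordinal order, exponent written only when it is not 1.
    public static string Format(Dictionary<string, int> unit) =>
        string.Join("*", unit.OrderBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => p.Value == 1 ? p.Key : p.Key + "^" + p.Value));

    // Returns null when the combination is not a recognised type.
    public static string TypeOf(Dictionary<string, int> unit) =>
        KnownTypes.TryGetValue(Format(unit), out string type) ? type : null;
}
```

In this model, m/s maps to velocity, m*s maps to nothing (null), and (m*s)/s simplifies back to length.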
Unit Parsing Peculiarities
One of the main goals of UnitParser is to be as intuitive as possible. Supporting string inputs is one of the ways to accomplish that goal, but it also opens up a large number of possible scenarios: a wide variety of valid, invalid and even weird but technically correct inputs.
The code taking care of all the unit-string parsing is fairly complex and can be found in various files inside the Parse folder. Below is the method where all the compound (i.e., units formed by more than one constituent element) parsing actions start.
private static ParseInfo StartCompoundAnalysis(ParseInfo parseInfo)
{
if (parseInfo.UnitInfo.Error.Type != ErrorTypes.None)
{
return parseInfo;
}
if (parseInfo.ValidCompound == null)
{
parseInfo.ValidCompound = new StringBuilder();
}
parseInfo.UnitInfo = RemoveAllUnitInformation(parseInfo.UnitInfo);
parseInfo.UnitInfo = UpdateInitialPositions(parseInfo.UnitInfo);
parseInfo.UnitInfo.System = GetSystemFromUnitInfo(parseInfo.UnitInfo);
parseInfo.UnitInfo = CorrectDifferentSystemIssues(parseInfo.UnitInfo);
parseInfo.UnitInfo = ImproveUnitParts(parseInfo.UnitInfo);
if (parseInfo.UnitInfo.Type == UnitTypes.None)
{
parseInfo.UnitInfo = GetUnitFromParts(parseInfo.UnitInfo);
}
parseInfo.UnitInfo = UpdateMainUnitVariables(parseInfo.UnitInfo);
if (parseInfo.UnitInfo.Unit == Units.None)
{
parseInfo.UnitInfo.Error = new ErrorInfo(ErrorTypes.InvalidUnit);
}
else parseInfo = AnalyseValidCompoundInfo(parseInfo);
return parseInfo;
}
UnitParser can deal with the following string input scenarios:
- Valid symbols, common abbreviations and names: new UnitP("s"), new UnitP("sec") and new UnitP("seConD") are valid ways to refer to 1 second.
- Constituent parts of a compound unit: new UnitP("kg*m/s2") is understood as 1 N.
- Convertible-to-each-other units forming a compound: new UnitP("kg*ft/s2") is understood as 0.3048 N (the first unit, kg, indicates that SI should be considered; ft doesn’t belong to SI, but it is directly convertible, since 1 ft = 0.3048 m).
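The examples in this article suggest that names and abbreviations are matched regardless of case ("seConD"), whereas symbols must match exactly ("kG" is rejected); case matters for symbols because, for instance, the prefix symbols "m" (milli) and "M" (mega) differ only in case. A minimal lookup following that assumption might look like this; the tables and SketchUnitLookup are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SketchUnitLookup
{
    // Symbols are matched exactly (ordinal comparison): "kg" and "kG" are different strings.
    private static readonly Dictionary<string, string> Symbols =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            ["s"] = "Second", ["m"] = "Metre", ["kg"] = "Kilogram", ["N"] = "Newton",
        };

    // Names and common abbreviations are matched ignoring case.
    private static readonly Dictionary<string, string> Names =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["second"] = "Second", ["sec"] = "Second", ["metre"] = "Metre", ["newton"] = "Newton",
        };

    // Symbols are tried first; null means "no supported unit found".
    public static string Find(string input)
    {
        if (Symbols.TryGetValue(input, out string unit)) return unit;
        return Names.TryGetValue(input, out unit) ? unit : null;
    }
}
```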
There are some rules which always have to be observed when parsing multi-part strings:
- Starting from the top left, the first unit with a supported system (note that a significant number of units are assumed not to belong to any system) defines the system for the whole compound. This helps determine the target of any conversion of constituent elements (i.e., the default unit for the given type and system). The 0.3048 N example above illustrates this scenario.
- Only one division sign is expected and it separates numerator and denominator.
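The division rule, together with the exponent formats used throughout this article (s^2, s2 and s-2), can be sketched as a small tokenizer. SketchCompoundParser is hypothetical and much stricter than UnitParser: it only accepts ASCII letters as unit names, returns null instead of recording an error, and does not implement the system-selection rule.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SketchCompoundParser
{
    // One unit token: letters, optionally followed by "^" and a signed integer exponent.
    private static readonly Regex Part = new Regex(@"^([A-Za-z]+)(?:\^?(-?\d{1,4}))?$");

    // Returns symbol -> net exponent, or null when the input breaks the rules.
    public static Dictionary<string, int> Parse(string input)
    {
        string[] halves = input.Split('/');
        if (halves.Length > 2) return null; // only one division sign is allowed
        var result = new Dictionary<string, int>();
        for (int h = 0; h < halves.Length; h++)
        {
            int sign = h == 0 ? 1 : -1; // everything after "/" belongs to the denominator
            foreach (string token in halves[h].Split('*'))
            {
                Match match = Part.Match(token.Trim());
                if (!match.Success) return null; // e.g. parentheses or empty parts
                int exponent = match.Groups[2].Success
                    ? int.Parse(match.Groups[2].Value, CultureInfo.InvariantCulture)
                    : 1;
                string unit = match.Groups[1].Value;
                result.TryGetValue(unit, out int current);
                int updated = current + sign * exponent;
                if (updated == 0) result.Remove(unit); // cancelled out
                else result[unit] = updated;
            }
        }
        return result;
    }
}
```

For instance, "J*J/s*J2*J-1*s*s-1" reduces to {J: 1, s: -1}, that is, J/s, because everything after the single "/" goes to the denominator; a second "/" or a parenthesis makes the sketch return null.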
High Quality Information
Any tool dealing with units of measurement and everything related to them (representations, classifications, conversions, etc.) has to rely on a substantial amount of hardcoded information. While developing UnitParser, I made a significant effort to collect high-quality information of different types.
The most relevant hardcoded information of UnitParser is stored in the files under the Keywords folder. This includes not just simpler formats (e.g., symbols or conversion factors), but also more complex ones like the definition of compounds, as shown in the code excerpt below:
private static Dictionary<UnitTypes, Compound[]> AllCompounds = new Dictionary<UnitTypes, Compound[]>()
{
{
UnitTypes.Area, new Compound[]
{
new Compound
(
new List<CompoundPart>() { new CompoundPart(UnitTypes.Length, 2) }
),
new Compound
(
new List<CompoundPart>() { new CompoundPart(UnitTypes.Area) }
)
}
},
{
UnitTypes.Volume, new Compound[]
{
new Compound
(
new List<CompoundPart>() { new CompoundPart(UnitTypes.Length, 3) }
),
new Compound
(
new List<CompoundPart>() { new CompoundPart(UnitTypes.Volume) }
)
}
},
{
UnitTypes.Velocity, new Compound[]
{
new Compound
(
new List<CompoundPart>()
{
new CompoundPart(UnitTypes.Length),
new CompoundPart(UnitTypes.Time, -1)
}
)
}
},
{
UnitTypes.Acceleration, new Compound[]
{
new Compound
(
new List<CompoundPart>()
{
new CompoundPart(UnitTypes.Length),
new CompoundPart(UnitTypes.Time, -2)
}
)
}
},
{
UnitTypes.Force, new Compound[]
{
new Compound
(
new List<CompoundPart>()
{
new CompoundPart(UnitTypes.Mass),
new CompoundPart(UnitTypes.Length),
new CompoundPart(UnitTypes.Time, -2)
}
),
new Compound
(
new List<CompoundPart>() { new CompoundPart(UnitTypes.Force) }
)
}
},
{
UnitTypes.Energy, new Compound[]
{
new Compound
(
new List<CompoundPart>()
{
new CompoundPart(UnitTypes.Mass),
new CompoundPart(UnitTypes.Length, 2),
new CompoundPart(UnitTypes.Time, -2)
}
),
new Compound
(
new List<CompoundPart>() { new CompoundPart(UnitTypes.Energy) }
)
}
}
};
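Definitions like these can be used to identify the type of a compound from the exponents of its constituent types. The following is a simplified, hypothetical sketch of that idea (one definition per type, exact exponent match), not the matching logic actually used by UnitParser.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical subset of unit types for this sketch.
public enum SketchUnitType { None, Length, Time, Mass, Area, Volume, Velocity, Acceleration, Force, Energy }

public static class SketchCompounds
{
    // Helper building a basic-type -> exponent map.
    public static Dictionary<SketchUnitType, int> Parts(params (SketchUnitType Type, int Exponent)[] parts) =>
        parts.ToDictionary(p => p.Type, p => p.Exponent);

    // Simplified stand-in for AllCompounds: each compound type and its basic-type exponents.
    private static readonly Dictionary<SketchUnitType, Dictionary<SketchUnitType, int>> Definitions =
        new Dictionary<SketchUnitType, Dictionary<SketchUnitType, int>>
        {
            [SketchUnitType.Area] = Parts((SketchUnitType.Length, 2)),
            [SketchUnitType.Volume] = Parts((SketchUnitType.Length, 3)),
            [SketchUnitType.Velocity] = Parts((SketchUnitType.Length, 1), (SketchUnitType.Time, -1)),
            [SketchUnitType.Acceleration] = Parts((SketchUnitType.Length, 1), (SketchUnitType.Time, -2)),
            [SketchUnitType.Force] = Parts((SketchUnitType.Mass, 1), (SketchUnitType.Length, 1), (SketchUnitType.Time, -2)),
            [SketchUnitType.Energy] = Parts((SketchUnitType.Mass, 1), (SketchUnitType.Length, 2), (SketchUnitType.Time, -2)),
        };

    // Returns the type whose definition has exactly the same parts and exponents, or None.
    public static SketchUnitType Identify(Dictionary<SketchUnitType, int> parts)
    {
        foreach (var definition in Definitions)
        {
            bool sameParts = definition.Value.Count == parts.Count &&
                definition.Value.All(p => parts.TryGetValue(p.Key, out int exponent) && exponent == p.Value);
            if (sameParts) return definition.Key;
        }
        return SketchUnitType.None;
    }
}
```

With these definitions, N*m (mass, length^2, time^-2) maps to energy, while N*m*m (mass, length^3, time^-2) matches nothing, which is the kind of situation in which no valid type can be assigned.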
Managed Operations
One of my concerns when I first started thinking about UnitParser was how to deal with the intrinsic difficulties of the associated numeric operations. On the one hand, there is a complex numeric reality formed by values, prefixes (e.g., 1 kg being equal to 1000 g) and conversions sometimes involving different exponents. On the other hand, there is the idea of having a single class dealing with all the wrong/right situations and managing the exceptions internally. All this seemed too much for the built-in numeric types or, at least, seemed to require a considerable amount of additional effort to reach a not-fully-controlled state. That’s why implementing these managed operations had been on my to-do list from the very beginning.
By managed operations, I mean all the code dealing with the operations involving UnitP and numeric variables. Numerically speaking, a UnitP instance is formed by a decimal value, an int base-ten exponent and, optionally, a prefix. For example, new UnitP("1 kg") can be understood as value 1, prefix 1000 and base-ten exponent 0; or value 1000, prefix 1 and base-ten exponent 0; or value 1, prefix 1 and base-ten exponent 3 (all of them equal to 1000 g). This setup requires special custom calculations and further measures like dealing with all the errors internally (i.e., managing the errors, which is precisely where the name "managed operations" comes from). Incidentally, I have adapted this concept to NumberParser, the second part of FlexibleParser, which can also deal with numbers of any size and manage the errors internally.
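The core idea of a decimal value paired with an int base-ten exponent can be sketched as follows. SketchScaled is hypothetical and much simpler than UnitParser's implementation (no prefixes, no error handling, no checks on the int exponent), but it shows why the pairing helps: values are kept in a small range, so decimal multiplication cannot overflow, and the magnitude lives in the exponent.

```csharp
using System;

// Sketch: a number stored as Value * 10^Exponent, with Value a decimal and Exponent an int.
public readonly struct SketchScaled
{
    public readonly decimal Value;
    public readonly int Exponent;

    public SketchScaled(decimal value, int exponent)
    {
        Value = value;
        Exponent = exponent;
    }

    // Moves powers of ten between Value and Exponent so that 1 <= |Value| < 10.
    public SketchScaled Normalize()
    {
        if (Value == 0m) return new SketchScaled(0m, 0);
        decimal value = Value;
        int exponent = Exponent;
        while (Math.Abs(value) >= 10m) { value /= 10m; exponent++; }
        while (Math.Abs(value) < 1m) { value *= 10m; exponent--; }
        return new SketchScaled(value, exponent);
    }

    // Both normalised values are below 10, so their product is below 100 and fits in a decimal;
    // the exponents are simply added.
    public static SketchScaled operator *(SketchScaled a, SketchScaled b)
    {
        SketchScaled x = a.Normalize();
        SketchScaled y = b.Normalize();
        return new SketchScaled(x.Value * y.Value, x.Exponent + y.Exponent).Normalize();
    }
}
```

For example, multiplying decimal.MaxValue by itself, which throws an OverflowException with plain decimal arithmetic, yields about 6.28E+57 here (value close to 6.277, exponent 57).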
The main code dealing with the managed operations is stored in the Operations_Private_Managed.cs file; the method below is quite a descriptive sample.
private static UnitInfo ConvertBaseTenToValue(UnitInfo unitInfo)
{
if (unitInfo.BaseTenExponent == 0) return unitInfo;
UnitInfo outInfo = new UnitInfo(unitInfo);
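//Positive exponent: multiply the value by 10 and decrease the exponent at each step.
//Negative exponent: divide the value by 10 and increase the exponent at each step.
bool decrease = unitInfo.BaseTenExponent > 0;
int sign = Math.Sign(outInfo.Value);
decimal absValue = Math.Abs(outInfo.Value);
//Stop as soon as the next step would push the value outside the safe decimal range.
while (outInfo.BaseTenExponent != 0)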
{
if (decrease)
{
if (absValue >= MaxValueDec / 10m) break;
absValue *= 10m;
outInfo.BaseTenExponent -= 1;
}
else
{
if (absValue <= MinValueDec * 10m) break;
absValue /= 10m;
outInfo.BaseTenExponent += 1;
}
}
outInfo.Value = sign * absValue;
return outInfo;
}
Using the Code
The first step is to add a reference to UnitParser.dll in your code (namespace FlexibleParser). Note that UnitParser is also available as a NuGet package.
The main class is called UnitP and can be instantiated in many different ways.
UnitP unitP = new UnitP("1 N");
unitP = new UnitP(1m, UnitSymbols.Newton);
unitP = new UnitP(1m, "nEwTon");
unitP = new UnitP(1m, Units.Newton);
UnitP can be seen as an abstract concept including many specific types. Same-type variables can be added/subtracted. Different-type variables can be multiplied/divided, but only if the output is a valid type.
unitP = new UnitP("1 N") + new UnitP(1m, Units.Newton);
unitP = new UnitP("1 N") * new UnitP("1 m");
unitP = new UnitP("1 N") * new UnitP("1 m") * new UnitP("1 m");
Main Variable Information
UnitP variables are defined according to various readonly fields populated at instantiation:
- Unit: corresponding Units member.
- UnitType: corresponding UnitTypes member.
- UnitSystem: corresponding UnitSystems member.
- UnitParts: defining parts of the given unit.
- UnitPrefix: supported prefix affecting all the unit parts.
- BaseTenExponent: base-ten exponent used when dealing with too small/big values.
- Error: variable storing all the error- and exception-related information.
General Rules
All the functionalities are based upon the following ideas:
- In case of incompatibility, the first element is always preferred.
- By default, the formally-correct alternative is preferred. Some required modifications might be performed.
- By default, all the errors are managed internally.
unitP = new UnitP("1 m") + new UnitP("1 ft");
unitP = new UnitP("1 Km");
unitP = 999999999999999999999999999999999999.9 * new UnitP("9999999999999 St");
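The first rule can be pictured with a tiny, hypothetical sketch in which the second operand is converted to the unit of the first one before adding, so that 1 m + 1 ft is expressed in metres. The conversion table is an assumption for illustration (the foot, inch and yard factors are the exact international definitions); the sketch does not claim to reproduce how UnitParser performs conversions internally.

```csharp
using System.Collections.Generic;

public static class SketchFirstPrevails
{
    // Hypothetical table: metres per unit (international foot, inch and yard definitions).
    private static readonly Dictionary<string, decimal> MetresPerUnit = new Dictionary<string, decimal>
    {
        ["m"] = 1m,
        ["ft"] = 0.3048m,
        ["in"] = 0.0254m,
        ["yd"] = 0.9144m,
    };

    // Adds two lengths and expresses the result in the unit of the FIRST operand.
    // Unknown units yield a null unit (an error marker) instead of an exception.
    public static (decimal Value, string Unit) Add(decimal firstValue, string firstUnit,
        decimal secondValue, string secondUnit)
    {
        if (!MetresPerUnit.TryGetValue(firstUnit, out decimal firstFactor) ||
            !MetresPerUnit.TryGetValue(secondUnit, out decimal secondFactor))
        {
            return (0m, null);
        }
        // Convert the second operand to metres, then into the first operand's unit.
        decimal secondInFirstUnit = secondValue * secondFactor / firstFactor;
        return (firstValue + secondInFirstUnit, firstUnit);
    }
}
```

Swapping the operands changes the unit of the result: 1 ft + 1 m comes out in feet (about 4.28 ft).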
Unit String Parsing Format
Unit-string parsing is quite flexible, but there are some basic rules.
- Multi-part unit strings are expected to be formed exclusively by units, multiplication/division signs and integer exponents.
- Only one division sign is expected. The parser understands that all that lies before/after it is the numerator/denominator.
unitP = new UnitP("1m");
unitP = new UnitP("1 J*J/s*J2*J-1*s*s-1");
unitP = new UnitP("1 J*J/(s*J2*s)*J*s");
Numeric Support
Formally, two numeric types are supported: decimal, almost everywhere; and double, only in multiplication/division with UnitP variables. In practice, UnitP variables implement a mixed system delivering decimal precision and beyond-double-range support.
unitP = new UnitP("1 ft") * 7.891011m;
unitP = new UnitP("1 s") * 1213141516.0;
unitP = 0.0000000000000000000000000000000000000000000000001 *
new UnitP(0.000000000000000000001m, "ym2") /
new UnitP("999999999999999999999 Ym") / double.MaxValue / double.MaxValue;
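Here is a minimal sketch of how a double operand could be folded into a decimal-plus-exponent representation like the one described in the Managed Operations section. SketchDoubleSupport and TryFromDouble are hypothetical; the sketch keeps only about 15 significant digits (roughly what double offers anyway) and reports non-finite inputs through its return value rather than throwing.

```csharp
using System;

public static class SketchDoubleSupport
{
    // Splits a finite double into a decimal mantissa (1 <= |value| < 10) and a base-ten exponent,
    // so that magnitudes far beyond the decimal range (up to double.MaxValue) remain representable.
    public static bool TryFromDouble(double input, out decimal value, out int exponent)
    {
        value = 0m;
        exponent = 0;
        if (double.IsNaN(input) || double.IsInfinity(input)) return false;
        if (input == 0.0) return true;

        exponent = (int)Math.Floor(Math.Log10(Math.Abs(input)));
        // Divide in two steps so that neither power of ten overflows or underflows on its own.
        int half = exponent / 2;
        double mantissa = input / Math.Pow(10, half) / Math.Pow(10, exponent - half);

        // Log10 rounding can leave the mantissa just outside [1, 10); nudge it back.
        while (Math.Abs(mantissa) >= 10.0) { mantissa /= 10.0; exponent++; }
        while (Math.Abs(mantissa) < 1.0) { mantissa *= 10.0; exponent--; }

        // The double -> decimal conversion rounds to about 15 significant digits,
        // which can turn 9.9999... into 10; renormalise once if that happens.
        value = (decimal)mantissa;
        if (Math.Abs(value) >= 10m) { value /= 10m; exponent++; }
        return true;
    }
}
```

With this representation, an operand such as double.MaxValue becomes a value of about 1.8 with exponent 308, which can then be combined with the decimal-plus-exponent fields instead of overflowing decimal.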
Points of Interest
A large amount of high-quality hardcoded information related to units of measurement. Part of it can be checked directly via the APIs (e.g., enums or public comments) and another part can be enjoyed by using different functionalities (e.g., fraction simplification and compound management).
Quite powerful parsing capabilities which allow this library, either directly or after minor modifications, to deal with a large number of raw-data scenarios involving units of measurement.
It can deal with numbers as big as required and manages all the errors internally.
Authorship
I, Alvaro Carballo Garcia, am the sole author of this article and of all the UnitParser/FlexibleParser resources referred to, such as code and documentation.