UnitParser

varocarbas

1 Jun 2018

Comprehensive unit parsing library

Introduction

UnitParser provides a reliable way to easily interact with a wide variety of units of measurement. Its adaptable format makes it possible to intuitively account for unit-related information in virtually any situation.

It also includes further relevant features, like configurable exception triggering and graceful management of numeric values of any size.

UnitParser is the first part of FlexibleParser, a multi-purpose group of independent .NET parsing libraries (the second part, NumberParser, is also described on codeproject.com).

This article refers to UnitParser v. 1.0.9.0 (stable).

Additionally, note that there is a web API and a Java version of this library.

Background

Units of measurement represent a complex reality which, even nowadays, hasn't been fully systematised.

Traditionally, most software approaches have mainly focused on the simplest sides of the problem: individual units/conversion factors. Most packages ignore issues like different systems of units (e.g., SI or Imperial), compound units (e.g., kg*m/s^2 being equal to N) or simplifications (e.g., kg*m/kg being equal to m). It is also quite difficult to find a piece of software converting string-based inputs into safe programming structures which make this complex reality easy to manage.

UnitParser aims to overcome the aforementioned usual limitations by applying the following ideas:

Comprehensive classifications accounting for all the possible scenarios. Each unit is defined according to: system (SI, Imperial, USCS or CGS), type (all the types), specific name (all the names), symbol/abbreviation (all the symbols) if applicable, prefix (all the prefixes) if applicable and constituent parts (e.g., N formed by kg, m and s^-2).
Programmer-friendly structures which make it possible to deal with any situation as intuitively as possible. There is a main class (UnitP) dealing with all the possible scenarios and managing potential errors/incompatibilities internally. For example: new UnitP("m/s") is fine (m/s is a valid SI velocity unit), but new UnitP("m*s") is not.
Minimal user input and rules applied systematically and consistently. For example, in case of incompatibility, the conditions defined by the first element starting from the top left will always prevail.
Eminently focused on newer/commonly-used units.
Usage of formally-correct alternatives when possible.
Creation of custom formats only when strictly required. They always have to be simple and consistent.
All the classifications apply the rule "in case of doubt, default/none".
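
The simplification example mentioned in the Background (kg*m/kg being equal to m) can be pictured as plain bookkeeping on exponents. What follows is only a minimal sketch of that idea, not code taken from UnitParser: the Simplify helper and the tuple-based representation of the unit parts are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimplificationSketch
{
    // Hypothetical helper: merges repeated symbols by adding their exponents
    // and drops the ones which cancel out (final exponent equal to 0).
    public static Dictionary<string, int> Simplify(
        IEnumerable<(string Symbol, int Exponent)> parts)
    {
        var totals = new Dictionary<string, int>();
        foreach (var part in parts)
        {
            // current stays at 0 when the symbol hasn't been seen yet.
            totals.TryGetValue(part.Symbol, out int current);
            totals[part.Symbol] = current + part.Exponent;
        }

        return totals
            .Where(x => x.Value != 0)
            .ToDictionary(x => x.Key, x => x.Value);
    }

    public static void Main()
    {
        // kg*m/kg: kg appears with exponents 1 and -1, so only m remains.
        var result = Simplify(new[] { ("kg", 1), ("m", 1), ("kg", -1) });
        Console.WriteLine(string.Join("*", result.Select(x => x.Key + "^" + x.Value)));
    }
}
```

The real library also has to account for prefixes, systems and conversions, which this sketch ignores on purpose.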

Code Analysis

The UnitParser code is quite big and complex. It doesn’t have a well-defined structure which could be easily summarised, and it doesn’t include specific parts worth highlighting because of their difficulty or the peculiarities of their implementation. I have relevant experience in developing reasonably complex pieces of software from scratch, in the .NET Framework/C# and in units of measurement; the current code is simply a consequence of this reality: an experienced developer coming up with a comprehensive solution for a complex problem with which he feels very comfortable. That’s why I think that the best way to get a good idea about this code is to analyse it properly; for example, by debugging its descriptive test sample code (on GitHub).

I submitted an article about a preliminary version of this library some months ago. I didn’t get very good feedback back then, mainly because I hadn’t explained the code in enough detail. Despite not liking that feedback too much and finally deleting that submission, I did apply those ideas and wrote descriptive texts for the most relevant parts of the code. The current analysis is different, but also includes references to the most relevant parts of these other resources.

Hopefully, what I am including below will help people get interested in UnitParser and go ahead with the proper analysis of the code which, in my opinion, is the ideal approach.

One Class to Rule Them All: UnitP

One of the defining features of UnitParser, as opposed to other software solutions, is that it relies on just one class for all the units. From the very first moment, my intention was to develop a very comprehensive approach, where the huge number of scenarios advises against the a priori more logical multi-class alternative.

UnitP has a large number of constructors supporting many different scenarios and is defined by the public fields and properties listed in the code excerpt below:

///<summary><para>Basic UnitParser class containing all the information
///about units and values.</para></summary>
public partial class UnitP
{
    ///<summary><para>Member of the Units enum which best suits the 
    ///current conditions.</para></summary>
    public readonly Units Unit;

    ///<summary><para>Member of the UnitTypes enum which best suits the 
    ///current conditions.</para></summary>
    public readonly UnitTypes UnitType;

    ///<summary><para>Member of the UnitSystems enum which best suits the 
    ///current conditions.</para></summary>
    public readonly UnitSystems UnitSystem;

    ///<summary><para>Prefix information affecting all the unit parts.</para></summary>
    public readonly Prefix UnitPrefix = new Prefix();

    ///<summary><para>List containing the basic unit parts which define the
    ///current unit.</para></summary>
    public ReadOnlyCollection<UnitPart> UnitParts;

    ///<summary><para>String variable including the unit information which was 
    ///input at variable instantiation.</para></summary>
    public readonly string OriginalUnitString;

    ///<summary><para>String variable containing the symbol(s) best describing 
    ///the current unit.</para></summary>
    public readonly string UnitString;

    ///<summary><para>String variable including both numeric and unit 
    ///information associated with the current conditions.</para></summary>
    public readonly string ValueAndUnitString;

    ///<summary><para>Base-ten exponent used when dealing with too small/big 
    ///numeric values.</para></summary>
    public readonly int BaseTenExponent;

    ///<summary><para>ErrorInfo variable containing all the error- and 
    ///exception-related information.</para></summary>
    public readonly ErrorInfo Error;

    ///<summary><para>Decimal variable storing the primary numeric information 
    ///under the current conditions.</para></summary>
    public decimal Value { get; set; }

    //etc.
}

UnitP is mostly defined by readonly fields, a normal consequence of its doing-everything-internally nature. Note that other approaches, like automatically-updated getters/setters, would have considerably increased the complexity because of the large number of internal checks associated with even slight variations of the input conditions. Thanks to this readonly setup, the whole process can be simplified into three main parts:

Well-defined inputs provided via one of the public constructors
Specific analysis on account of the type of inputs under consideration
Population of the readonly public variables

In summary, all the UnitP instantiated variables can be assumed to be valid and either include supported unit information or an error (by default, managed internally without throwing any exception).

UnitP newton = new UnitP(Units.Newton);
UnitP newton2 = new UnitP("kg*m/s^2");
UnitP newton3 = new UnitP("kg*m*s-2");
UnitP newton4 = new UnitP("1000 g*m/s2");

UnitP wrong1 = new UnitP("asfasf");
UnitP wrong2 = new UnitP("kG*m/s2");
UnitP wrong3 = new UnitP("kg*m*s-3");
UnitP wrong4 = new UnitP("10_0 g*m/s2");

The first four variables are valid instances and all of them are identical to each other (1 N, SI, force). The last four variables are all wrong, an issue which is indicated via the Error field without throwing an exception (exceptions are only thrown when relying on certain constructors).
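
The "readonly fields plus Error" pattern described above can be illustrated with a much smaller class. This is a hedged sketch under stated assumptions, not the UnitP implementation: MeasurementSketch, SketchError, the tiny list of known units and the throwOnError flag are all hypothetical and only mimic the idea of exceptions being opt-in.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public enum SketchError { None, InvalidUnit, InvalidValue }

// Hypothetical, heavily simplified stand-in: everything is decided in the
// constructor and exposed through readonly fields.
public sealed class MeasurementSketch
{
    private static readonly HashSet<string> KnownUnits =
        new HashSet<string> { "m", "s", "kg", "N" };

    public readonly decimal Value;
    public readonly string Unit;
    public readonly SketchError Error;

    public MeasurementSketch(string input, bool throwOnError = false)
    {
        // Expected formats: "unit" or "value unit" (one separating blank space).
        string[] parts = (input ?? "").Trim().Split(' ');
        string unit = parts[parts.Length - 1];
        decimal value = 1m;

        if (parts.Length == 2 && !decimal.TryParse(
            parts[0], NumberStyles.Float, CultureInfo.InvariantCulture, out value))
        {
            Error = SketchError.InvalidValue;
        }
        else if (parts.Length > 2 || !KnownUnits.Contains(unit))
        {
            Error = SketchError.InvalidUnit;
        }
        else
        {
            Value = value;
            Unit = unit;
        }

        // By default the error is only stored; throwing has to be requested.
        if (throwOnError && Error != SketchError.None)
        {
            throw new ArgumentException("Invalid input: " + Error);
        }
    }

    public static void Main()
    {
        foreach (string input in new[] { "1 N", "asfasf", "10_0 N" })
        {
            Console.WriteLine(input + " -> " + new MeasurementSketch(input).Error);
        }
    }
}
```

Callers of such a class check the Error field instead of wrapping every instantiation in a try/catch block, which is the behaviour the wrong1 to wrong4 examples rely on.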

Operations Between UnitP Instances

Another important aspect of relying on one class to deal with such a big number of different scenarios is making sure that all the operations between instances of that class follow the expected rules. When dealing with units of measurement and numerical values, this refers at least to arithmetic and comparison operations.

All the UnitP public overloads/implicit operations are stored in the Operations/Operations_Public.cs file; a small sample of that code:

public partial class UnitP : IComparable<UnitP>
{
    ///<summary><para>Compares the current instance against another UnitP one.</para></summary>
    ///<param name="other">The other UnitP instance.</param>
    public int CompareTo(UnitP other)
    {
        return
        ( 
            this.BaseTenExponent == other.BaseTenExponent ?
            (this.Value * this.UnitPrefix.Factor).CompareTo
            (other.Value * other.UnitPrefix.Factor) :
            this.BaseTenExponent.CompareTo(other.BaseTenExponent)
        );
    }

    ///<summary><para>Creates a new UnitP instance by relying on the most 
    ///adequate constructor.</para></summary>
    ///<param name="input">String input.</param>
    public static implicit operator UnitP(string input)
    {
        return new UnitP(input);
    }

    ///<summary><para>Creates a new UnitP instance by relying on the most 
    ///adequate constructor.</para></summary>
    ///<param name="input">Decimal input.</param>
    public static implicit operator UnitP(decimal input)
    {
        return new UnitP(input);
    }

    ///<summary><para>Creates a new UnitP instance by relying on the most 
    ///adequate constructor.</para></summary>
    ///<param name="input">Units input.</param>
    public static implicit operator UnitP(Units input)
    {
        return new UnitP(input);
    }

    ///<summary>
    ///<para>Adds two UnitP variables by giving preference to the configuration 
    ///of the first operand.</para>
    ///<para>Different unit types will trigger an error.</para>
    ///</summary>
    ///<param name="first">Augend. In case of incompatibilities, its configuration 
    ///would prevail.</param>
    ///<param name="second">Addend.</param>
    public static UnitP operator +(UnitP first, UnitP second)
    {
        return PerformUnitOperation
        (
            first, second, Operations.Addition,
            GetOperationString(first, second, Operations.Addition)
        );
    }

    //etc.
}

Before performing any operation, both UnitP instances have to be analysed and might be modified. During these pre-checks, the following fields are taken into account:

UnitType. Operations between different types can only happen under certain conditions, as explained below.
UnitParts. This collection includes the most accurate definition of the given unit. Operations between instances of UnitP with different UnitParts are possible, but automatic conversions are likely to happen (read below).
Error. One or both instances being wrong might affect the output of the operation.
Numeric fields (i.e., Value, BaseTenExponent and UnitPrefix). After having confirmed that both instances are compatible and having performed all the required actions (e.g., conversions), the corresponding operation is performed by bringing the numeric fields into the picture.

Continuing with one of the code samples above, consider the following operations:

UnitP allNewtons = newton + newton2 + newton3;
UnitP wrongOperation = new UnitP("m") + newton;

allNewtons is a valid instance (3 N, SI, force), but wrongOperation is not because metres and newtons cannot be added.

When dealing with units of measurement, addition/subtraction (only the values are affected) are treated differently from multiplication/division (a new unit is created as a result of the operation). These peculiarities are respected by UnitParser at every level; for example, new UnitP("m")/new UnitP("s") outputs the same as new UnitP("m/s"), metre per second (SI, velocity).

In any case, it is advisable to rely on the string-based approach for relatively complex operations because each operator overload is analysed individually and some situations might be misassessed. For example, new UnitP("m*s/s") is fine (1 metre); but new UnitP("m") * new UnitP("s") / new UnitP("s") is wrong (an error is triggered when analysing new UnitP("m") * new UnitP("s")).
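
The difference between validating each operator individually and validating the whole expression can be reproduced with a few lines. The following is a minimal sketch under stated assumptions: OperatorSketch, its whitelist of supported units and the "symbol^exponent" keys are hypothetical and have nothing to do with how UnitParser stores its information.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperatorSketch
{
    // Hypothetical whitelist of supported units, as sorted "symbol^exponent" keys.
    private static readonly HashSet<string> Supported =
        new HashSet<string> { "m^1", "s^1", "m^1*s^-1" };

    public static string Key(Dictionary<string, int> parts)
    {
        return string.Join("*", parts
            .Where(p => p.Value != 0)
            .OrderBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => p.Key + "^" + p.Value));
    }

    public static bool IsSupported(Dictionary<string, int> parts)
    {
        return Supported.Contains(Key(parts));
    }

    // sign = 1 multiplies, sign = -1 divides: exponents are added or subtracted.
    public static Dictionary<string, int> Combine(
        Dictionary<string, int> first, Dictionary<string, int> second, int sign)
    {
        var result = new Dictionary<string, int>(first);
        foreach (var part in second)
        {
            result.TryGetValue(part.Key, out int current);
            result[part.Key] = current + sign * part.Value;
        }
        return result;
    }

    public static void Main()
    {
        var m = new Dictionary<string, int> { { "m", 1 } };
        var s = new Dictionary<string, int> { { "s", 1 } };

        var intermediate = Combine(m, s, 1);       // m*s
        var final = Combine(intermediate, s, -1);  // m*s/s

        // Checking after each operator stops at the unsupported m*s.
        Console.WriteLine(IsSupported(intermediate)); // False
        // Checking only the complete expression finds a plain metre.
        Console.WriteLine(IsSupported(final));        // True
    }
}
```

A string input is analysed as a whole, which corresponds to the second check; chained operators correspond to the first one.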

Unit Parsing Peculiarities

One of the main goals of UnitParser is to be as intuitive as possible. Supporting input strings is one of the ways to accomplish that goal, but it also opens up a large number of possible scenarios: a wide variety of valid, invalid and even weird but technically correct inputs.

The code taking care of all the unit-string-parsing parts is fairly complex and can be found in various files inside the Parse folder. Below, I am including the method where all the compound (i.e., units formed by more than one constituent element) parsing actions are started.

private static ParseInfo StartCompoundAnalysis(ParseInfo parseInfo)
{

    if (parseInfo.UnitInfo.Error.Type != ErrorTypes.None)
    { 
        return parseInfo;
    }

    if (parseInfo.ValidCompound == null)
    {
        parseInfo.ValidCompound = new StringBuilder();
    }

    parseInfo.UnitInfo = RemoveAllUnitInformation(parseInfo.UnitInfo);

    //Knowing the initial positions of all the unit parts is important because of the defining
    //"first element rules" idea which underlies this whole approach. Such a determination 
    //isn't always straightforward due to the numerous unit part modifications.
    parseInfo.UnitInfo = UpdateInitialPositions(parseInfo.UnitInfo);

    //This is the best place to determine the system before finding the unit, because the
    //subsequent unit part corrections might provoke some misunderstandings on this front
    //(e.g., CGS named compound divided into SI basic units).
    parseInfo.UnitInfo.System = GetSystemFromUnitInfo(parseInfo.UnitInfo);

    //This is also an excellent place to correct eventual system mismatches. For example:
    //N/pint where pint has to be converted into m3, the SI (first operand system) basic
    //unit for volume.
    parseInfo.UnitInfo = CorrectDifferentSystemIssues(parseInfo.UnitInfo);
    parseInfo.UnitInfo = ImproveUnitParts(parseInfo.UnitInfo);

    if (parseInfo.UnitInfo.Type == UnitTypes.None)
    {
        parseInfo.UnitInfo = GetUnitFromParts(parseInfo.UnitInfo);
    }

    parseInfo.UnitInfo = UpdateMainUnitVariables(parseInfo.UnitInfo);
    if (parseInfo.UnitInfo.Unit == Units.None)
    {
        parseInfo.UnitInfo.Error = new ErrorInfo(ErrorTypes.InvalidUnit);
    }
    else parseInfo = AnalyseValidCompoundInfo(parseInfo);

    return parseInfo;
}

UnitParser can deal with the following string input scenarios:

Valid symbols, common abbreviations and names: new UnitP("s"), new UnitP("sec") and new UnitP("seConD") are valid ways to refer to 1 second.
Constituent parts of a compound unit: new UnitP("kg*m/s2") understood as 1 N.
Convertible-to-each-other units forming a compound: new UnitP("kg*ft/s2") understood as 0.3048 N (first unit, kg, indicates that SI should be considered; ft doesn’t belong to SI, but it is directly convertible via 0.3048 m).
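
The 0.3048 N figure in the last scenario is just the product of the conversion factors of all the parts, each of them raised to its exponent. A minimal sketch of that arithmetic, assuming that SI is the target system and using a hypothetical, reduced table of exact factors (SystemConversionSketch and FactorToSi are illustrative names, not part of UnitParser):

```csharp
using System;
using System.Collections.Generic;

public static class SystemConversionSketch
{
    // Exact factors to the SI basic unit of the same type (reduced table).
    private static readonly Dictionary<string, decimal> ToSi =
        new Dictionary<string, decimal>
        {
            { "kg", 1m }, { "m", 1m }, { "s", 1m },
            { "ft", 0.3048m }, { "in", 0.0254m }, { "lb", 0.45359237m }
        };

    // Multiplies the factors of all the parts, each raised to its exponent.
    public static decimal FactorToSi(IEnumerable<KeyValuePair<string, int>> parts)
    {
        decimal factor = 1m;
        foreach (var part in parts)
        {
            decimal single = ToSi[part.Key];
            for (int i = 0; i < Math.Abs(part.Value); i++)
            {
                factor = part.Value > 0 ? factor * single : factor / single;
            }
        }
        return factor;
    }

    public static void Main()
    {
        // kg*ft/s2: kg and s are already SI, only ft contributes a factor.
        var parts = new[]
        {
            new KeyValuePair<string, int>("kg", 1),
            new KeyValuePair<string, int>("ft", 1),
            new KeyValuePair<string, int>("s", -2)
        };
        Console.WriteLine(FactorToSi(parts) == 0.3048m);
    }
}
```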

There are some rules which always have to be observed when parsing strings consisting of multiple parts:

Starting from the top left, the first unit with a supported system (note that a considerable number of units are assumed not to belong to any system) defines the system for the whole compound. This is helpful to determine the target for potential conversions of constituent elements (i.e., default unit for the given type and system). The 0.3048 N example above gives quite a descriptive idea of this specific scenario.
Only one division sign is expected and it separates numerator and denominator.
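
The "only one division sign" rule can be sketched as a split into numerator and denominator followed by a sign change of the denominator exponents. The code below is a simplified illustration, not the UnitParser parser: CompoundParseSketch is a hypothetical name, symbols are assumed to be formed by letters only, and parentheses, prefixes and numeric values are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CompoundParseSketch
{
    // Symbol (letters only) plus an optional integer exponent, with or without '^'.
    private static readonly Regex PartPattern =
        new Regex(@"^([A-Za-z]+)(?:\^?(-?\d+))?$");

    // Returns null for inputs which don't match these simplified rules.
    public static List<KeyValuePair<string, int>> Parse(string input)
    {
        string[] sides = input.Split('/');
        if (sides.Length > 2) return null; // Only one division sign is expected.

        var parts = new List<KeyValuePair<string, int>>();
        for (int i = 0; i < sides.Length; i++)
        {
            foreach (string raw in sides[i].Split('*'))
            {
                Match match = PartPattern.Match(raw.Trim());
                if (!match.Success) return null;

                int exponent = match.Groups[2].Success ?
                    int.Parse(match.Groups[2].Value) : 1;

                // Everything after the division sign belongs to the denominator.
                parts.Add(new KeyValuePair<string, int>(
                    match.Groups[1].Value, i == 0 ? exponent : -exponent));
            }
        }
        return parts;
    }

    public static void Main()
    {
        foreach (var part in Parse("kg*m/s^2"))
        {
            Console.WriteLine(part.Key + " " + part.Value);
        }
    }
}
```

With these rules, "kg*m/s^2", "kg*m/s2" and "kg*m*s-2" all produce the same three parts (kg 1, m 1, s -2), whereas "m/s/s" is rejected.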

High Quality Information

Any tool dealing with units of measurement and everything related to them (representations, classifications, conversions, etc.) has to rely on a considerable amount of hardcoded information. While developing UnitParser, I made quite an important effort to collect high quality information of different types.

The most relevant hardcoded information of UnitParser is stored in the files under the Keywords folder. This includes not just simpler formats (e.g., symbols or conversion factors), but also more complex ones like the definition of compounds, as shown in the code excerpt below:

//Contains the definitions of all the supported compounds, understood as units formed by
//other units and/or variations (e.g., exponents different than 1) of them.
//In order to be as efficient as possible, AllCompounds ignores the difference between 
//dividable and non-dividable units. For example: N is formed by kg*m/s2, exactly what 
//this collection expects; on the other hand, lbf isn't formed by the expected lb*ft/s2. 
//In any case, note that this "faulty" format is only used internally, never shown to 
//the user.
//NOTE: the order of the compounds within each type does matter. The first position is 
//reserved for the main fully-expanded version (e.g., mass*length/time2 for force). In 
//the second position, the compound basic units (e.g., force) are expected to have their 
//1-part version (e.g., 1 force part for force).
private static Dictionary<UnitTypes, Compound[]> AllCompounds = new Dictionary<UnitTypes, Compound[]>()
{
    {
        UnitTypes.Area, new Compound[]
        {
            new Compound
            (
                new List<CompoundPart>() { new CompoundPart(UnitTypes.Length, 2) }
            ),
            new Compound
            (
                new List<CompoundPart>() { new CompoundPart(UnitTypes.Area) }
            )
        }
    },
    {
        UnitTypes.Volume, new Compound[]
        {
            new Compound
            (
                new List<CompoundPart>() { new CompoundPart(UnitTypes.Length, 3) }
            ),
            new Compound
            (
                new List<CompoundPart>() { new CompoundPart(UnitTypes.Volume) }
            )
        }
    },
    {
        UnitTypes.Velocity, new Compound[]
        {
            new Compound
            (
                new List<CompoundPart>()
                {
                    new CompoundPart(UnitTypes.Length),
                    new CompoundPart(UnitTypes.Time, -1)
                }
            )
        }
    },
    {
        UnitTypes.Acceleration, new Compound[]
        {
            new Compound
            (
                new List<CompoundPart>()
                {
                    new CompoundPart(UnitTypes.Length),
                    new CompoundPart(UnitTypes.Time, -2)
                }
            )
        }
    },
    {
        UnitTypes.Force, new Compound[]
        {
            new Compound
            (
                new List<CompoundPart>()
                {
                    new CompoundPart(UnitTypes.Mass),
                    new CompoundPart(UnitTypes.Length),
                    new CompoundPart(UnitTypes.Time, -2)
                }
            ),
            new Compound
            (
                new List<CompoundPart>() { new CompoundPart(UnitTypes.Force) }
            )
        }
    },
    {
        UnitTypes.Energy, new Compound[]
        {
            new Compound
            (
                new List<CompoundPart>()
                {
                    new CompoundPart(UnitTypes.Mass),
                    new CompoundPart(UnitTypes.Length, 2),
                    new CompoundPart(UnitTypes.Time, -2)
                }
            ),
            new Compound
            (
                new List<CompoundPart>() { new CompoundPart(UnitTypes.Energy) }
            )
        }
    }

    //etc.
};
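
One plausible use of such a collection is finding the type whose definition matches a set of already-analysed parts. The sketch below only illustrates that lookup under stated assumptions: TypeSketch, CompoundLookupSketch and FindType are hypothetical names, the table is a reduced counterpart of AllCompounds, and nothing here is meant to describe how UnitParser actually performs the search.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum TypeSketch { None, Length, Mass, Time, Velocity, Acceleration, Force }

public static class CompoundLookupSketch
{
    // Hypothetical table: compound type -> (constituent type, exponent) pairs.
    private static readonly Dictionary<TypeSketch, Dictionary<TypeSketch, int>> Definitions =
        new Dictionary<TypeSketch, Dictionary<TypeSketch, int>>
        {
            {
                TypeSketch.Velocity, new Dictionary<TypeSketch, int>
                { { TypeSketch.Length, 1 }, { TypeSketch.Time, -1 } }
            },
            {
                TypeSketch.Acceleration, new Dictionary<TypeSketch, int>
                { { TypeSketch.Length, 1 }, { TypeSketch.Time, -2 } }
            },
            {
                TypeSketch.Force, new Dictionary<TypeSketch, int>
                { { TypeSketch.Mass, 1 }, { TypeSketch.Length, 1 }, { TypeSketch.Time, -2 } }
            }
        };

    // Returns the first type whose parts and exponents are identical to the input ones.
    public static TypeSketch FindType(Dictionary<TypeSketch, int> parts)
    {
        foreach (var definition in Definitions)
        {
            bool sameParts = definition.Value.Count == parts.Count &&
                definition.Value.All(p =>
                    parts.TryGetValue(p.Key, out int exponent) && exponent == p.Value);

            if (sameParts) return definition.Key;
        }
        return TypeSketch.None;
    }

    public static void Main()
    {
        // mass*length/time2, the fully-expanded version of force.
        var parts = new Dictionary<TypeSketch, int>
        {
            { TypeSketch.Mass, 1 }, { TypeSketch.Length, 1 }, { TypeSketch.Time, -2 }
        };
        Console.WriteLine(FindType(parts)); // Force
    }
}
```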

Managed Operations

One of my concerns when first thinking about UnitParser was how to deal with the intrinsic difficulties of the associated numeric operations. On the one hand, there is a complex numeric reality formed by values, prefixes (e.g., 1 kg being equal to 1000 g) and conversions sometimes involving different exponents. On the other hand, there is the idea of having a single class dealing with all the wrong/right situations and internally managing the exceptions. All this seemed too much for the built-in numeric types or, at least, to imply a considerable amount of additional effort to reach a not-fully-controlled stage. That’s why implementing these managed operations was on my to-do list from the very first moment.

With managed operations, I refer to all the code dealing with the operations involving UnitP and numeric variables. Numerically speaking, a UnitP instance is formed by a decimal value, an int base-ten exponent and, possibly, a prefix (e.g., new UnitP("1 kg") can be understood as value 1, prefix 1000 and base-ten exponent 0; or value 1000, prefix 1 and base-ten exponent 0; or value 1, prefix 1 and base-ten exponent 3). This setup requires special custom calculations and raises further issues like dealing with all the errors internally (or managing the errors; this is precisely where "managed operations" came from). Incidentally, I have adapted this concept to NumberParser, the second part of FlexibleParser, which can also deal with numbers of any size and manage the errors internally.

The main code dealing with the managed operations is stored in the Operations_Private_Managed.cs file and, below these lines, you can find quite a descriptive sample.

private static UnitInfo ConvertBaseTenToValue(UnitInfo unitInfo)
{
    if (unitInfo.BaseTenExponent == 0) return unitInfo;

    UnitInfo outInfo = new UnitInfo(unitInfo);
    bool decrease = unitInfo.BaseTenExponent > 0;
    int sign = Math.Sign(outInfo.Value);
    decimal absValue = Math.Abs(outInfo.Value);

    while (outInfo.BaseTenExponent != 0m)
    {
        if (decrease)
        {
            if (absValue >= MaxValueDec / 10m) break;
            absValue *= 10m;
            outInfo.BaseTenExponent -= 1;
        }
        else
        {
            if (absValue <= MinValueDec * 10m) break;
            absValue /= 10m;
            outInfo.BaseTenExponent += 1;
        }
    }

    outInfo.Value = sign * absValue;

    return outInfo;
}
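
The value plus base-ten exponent idea can also be shown from the opposite direction: a multiplication whose result doesn't fit into decimal and whose excess magnitude is moved to the exponent. This is only a sketch under stated assumptions (ManagedMultiplicationSketch and Multiply are hypothetical names). Note that it reacts to the OverflowException which decimal arithmetic throws, whereas the method above avoids the overflow beforehand by checking against MaxValueDec/MinValueDec.

```csharp
using System;

public static class ManagedMultiplicationSketch
{
    // Multiplies value1*10^exponent1 by value2*10^exponent2 without letting the
    // decimal product overflow: the excess magnitude is moved to the exponent.
    public static (decimal Value, int BaseTenExponent) Multiply(
        decimal value1, int exponent1, decimal value2, int exponent2)
    {
        int exponent = exponent1 + exponent2;
        while (true)
        {
            try
            {
                return (value1 * value2, exponent);
            }
            catch (OverflowException)
            {
                // Shrinks the bigger operand by one order of magnitude and
                // compensates in the exponent (trailing digits might be lost).
                if (Math.Abs(value1) >= Math.Abs(value2)) value1 /= 10m;
                else value2 /= 10m;
                exponent += 1;
            }
        }
    }

    public static void Main()
    {
        // 10^20 * 10^20 = 10^40 is beyond the decimal range (around 7.9*10^28).
        var result = Multiply(1e20m, 0, 1e20m, 0);
        Console.WriteLine(result.Value + " * 10^" + result.BaseTenExponent);
    }
}
```

In this example, the operands are reduced twelve times before the product fits, so the output is formed by a value of 10^28 and a base-ten exponent of 12.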

Using the Code

The first step is to add a reference to UnitParser.dll in your code (namespace FlexibleParser). Note that UnitParser is also available as a NuGet package.

The main class is called UnitP and can be instantiated in many different ways.

//1 N.
UnitP unitP = new UnitP("1 N"); 

//1 N. 
unitP = new UnitP(1m, UnitSymbols.Newton); 

//1 N. 
unitP = new UnitP(1m, "nEwTon"); 

//1 N. 
unitP = new UnitP(1m, Units.Newton);

UnitP can be seen as an abstract concept including many specific types. Same-type variables can be added/subtracted. Different-type variables can be multiplied/divided, but only in case of generating a valid-type output.

//2 N.
unitP = new UnitP("1 N") + new UnitP(1m, Units.Newton);

//1 J.
unitP = new UnitP("1 N") * new UnitP("1 m");

//Error not triggering an exception. 
//The output unit N*m^2 doesn't match any supported type.
unitP = new UnitP("1 N") * new UnitP("1 m") * new UnitP("1 m");

Main Variable Information

UnitP variables are defined according to various readonly fields populated at instantiation.

Unit - Corresponding Units member
UnitType - Corresponding UnitTypes member
UnitSystem - Corresponding UnitSystems member
UnitParts - Defining parts of the given unit
UnitPrefix - Supported prefix affecting all the unit parts
BaseTenExponent - Base-ten exponent used when dealing with too small/big values
Error - Variable storing all the error- and exception-related information

General Rules

All the functionalities are based upon the following ideas:

In case of incompatibility, the first element is always preferred.
By default, the formally-correct alternative is preferred. Some required modifications might be performed.
By default, all the errors are managed internally.

//1.3048 m.
unitP = new UnitP("1 m") + new UnitP("1 ft"); 

//Error not triggering an exception. 
//The parser expects "km" or a full-name-based version like "KiLom".
unitP = new UnitP("1 Km"); 

//999999.999999900000 * 10^19 YSt.
unitP = 999999999999999999999999999999999999.9 * new UnitP("9999999999999 St");

Unit String Parsing Format

The unit string parsing part is quite flexible, but there are some basic rules.

String multi-part units are expected to be exclusively formed by units, multiplication/division signs and integer exponents.
Only one division sign is expected. The parser understands that all that lies before/after it is the numerator/denominator.

//Error not triggering an exception. 
//The parser expects "1 m" or any other version including a separating blank space.
unitP = new UnitP("1m"); 

//1 W.
unitP = new UnitP("1 J*J/s*J2*J-1*s*s-1");

//Error not triggering an exception. 
//The parser understands "J*J/(s*J2*s*J*s)", which doesn't represent a supported type.
unitP = new UnitP("1 J*J/(s*J2*s)*J*s");

Numeric Support

Formally, two numeric types are supported: decimal, almost everywhere; and double, only in multiplication/division with UnitP variables. Practically, UnitP variables implement a mixed system delivering decimal precision and beyond-double-range support.

//7.891011 ft.
unitP = new UnitP("1 ft") * 7.891011m;

//1213141516 s.
unitP = new UnitP("1 s") * 1213141516.0;

//0.0003094346047382564187537561*10^-752 ym.
unitP = 0.0000000000000000000000000000000000000000000000001 * 
new UnitP(0.000000000000000000001m, "ym2") / 
new UnitP("999999999999999999999 Ym") / double.MaxValue / double.MaxValue;

Points of Interest

A considerable amount of high quality hardcoded information related to units of measurement. Part of it can be checked directly via the API (e.g., enums or public comments) and another part enjoyed by using different functionalities (e.g., fraction simplification and compound management).

Quite powerful parsing capabilities which allow this library, either directly or after minor modifications, to deal with a large number of raw-data scenarios involving units of measurement.

It can deal with numbers as big as required and manages all the errors internally.

Authorship

I, Alvaro Carballo Garcia, am the sole author of this article and all the referred UnitParser/FlexibleParser resources like code or documentation.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
