Introduction
CmdTailCmd is a .NET library to parse a program's command tail and fill in public fields on an object. It is intended to initialize objects implementing some variant of the Command pattern from the Gang of Four book Design Patterns, although that isn't necessary. I expect most programs will want to incorporate the files directly rather than use it as a standalone assembly.
Programmers have only been handling command line parameters for 40 years or so. As with anything that is new, there are no well-known techniques or libraries that work for most applications. Or rather, as with anything that old, there are no well-known techniques or libraries that work for most applications: the venerable libraries don't always match modern tools and needs, and the new libraries haven't shaken out to a standard.
CmdTailCmd is a small and (very) simple-to-use library.
- Define a command object that describes an operation your program does.
- This object should contain a public field for each command line option you want to detect for this operation.
- You may also add methods for validation and execution, making this object fairly standalone.
- Initialize a library object with your command tail (the args parameter to your Main function).
- Ask the command tail object to fill out your operation object:
- The library will examine command line parameters, coerce types, call custom converters if you have any, read enumerations, handle case (in)sensitivity, and support (or prohibit) using substrings and alternate names.
- The library will return your object along with any errors, any unidentified parameters, and a copy of the command line for your examination.
- Look at your filled-in object. If it's valid, execute. If not, try a different command or show help.
As for help, the library object will also recognize common idioms for requesting help and can recognize which category of help was requested from your application-specific list.
Background
I write command line tools a lot, often as quick tests when building components. Command line handling is a nightmare of inconsistent traditions, odd special cases, and competing custom libraries with their own opinions and restrictions. Scripting environments set up much clearer rules than an arbitrary shell app enjoys, which can make scripting more appealing for some tasks, but sometimes you need to write an app.
I made do with the quite reasonable NConsoler library for a regex tool I needed last year. When I went to modify my tool, I realized that the assumptions in that library don't match my thinking; I had adapted how I wrote my code and how I used the app to match NConsoler's design. The design is reasonable, but I'd rather restrict my code as little as I can get away with.
When choosing a library, designing the command line parameters an app will accept, or writing code to parse the command tail, there are a lot more questions than are immediately obvious. For example:
- Whether parameter order is significant.
- What data types are allowed for a parameter.
Common types include string, integer, boolean, and date. You may also see string arrays or selections from a hard-coded list of strings (enumerations).
- What switch characters to allow.
Fortunately, '/' and '-' are all you usually need to worry about on Windows platforms, and only '-' on Unix-derived systems.
- What parameters can be included without any switch character or name ("dir just_a_name.txt").
Often the first or last undecorated token has some special meaning, such as input filename.
- How parameter values are demarked.
Common choices include "/file filename", "/file=filename", and "/file:filename".
- How boolean values are demarked.
Common choices include "/b" (implicit true), "/b+", and "/b true" for true, or "/b-" and "/-b" for false.
- How (and if) boolean values may be combined.
For example, "app /sb" for "app /s /b" and how that relates to "false" values ("app /sb-", "app /s-b", etc.).
- What special cases to support.
"-" is often used to represent stdin. This needs special handling since it can run afoul of switchchar detection.
- What special idioms to support.
Commonly, no parameters, "app /?", "app -h", "app --help", or "app help" are used for help. Also, "app help some_specific_help" is often used for detailed help. Commonly, "--" as a special switchchar marks a parameter that applies to the context or environment of the operation rather than to the operation itself.
- Whether to allow unambiguous substrings as parameters ("/file" for "/file-to-encode").
- Whether to allow aliases for parameters ("/rtl" for "/right-to-left").
- And, specifically for libraries:
- When and how to show what errors (especially, should the library do that for you, should it format the error text, ...).
- Should/can the library generate help text.
- How you tell the library what parameters to look for, their types, and their semantics.
Many libraries ask for detailed structures or for attributed objects. Others answer requests for specific values from the tail rather than returning everything at once.
Every answer to each of these questions has merits in some given situation. As a community, our needs and expectations have changed through the decades. It isn't surprising that 40 years hasn't been enough to standardize this in the most general case; instead, it's been long enough to outdate standards.
The answers to these questions drive some interesting design issues in the parsing. For example, if implicit "true" is allowed for booleans using the form "/b" and if detached values are allowed for strings using the form "/detached value"--both very common decisions--it is impossible to determine if "/param text" is a boolean "param" being set to an implicit "true" or a string "param" being set to "text", without knowing the expected types.
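To make the ambiguity concrete, here is a minimal sketch of how the expected type changes the reading of the same two tokens. The names AmbiguityDemo and Interpret are hypothetical, and the policy (a boolean only consumes the next token when it reads as true or false) is a deliberately simple one chosen for illustration; it is not the library's actual rule.

```csharp
using System;

public static class AmbiguityDemo
{
    // Hypothetical helper: interprets "/name next" given the expected type of "name".
    // Returns the value to assign and reports whether the next token was consumed.
    public static string Interpret(Type expected, string next, out bool consumedNext)
    {
        bool nextIsSwitch = next != null && (next.StartsWith("/") || next.StartsWith("-"));
        if (expected == typeof(bool))
        {
            bool parsed;
            // Assumed policy: a boolean only takes the next token when it parses as a bool.
            if (!nextIsSwitch && next != null && bool.TryParse(next, out parsed))
            {
                consumedNext = true;
                return parsed.ToString();
            }
            consumedNext = false;
            return bool.TrueString; // implicit true
        }
        // Any other type needs a detached value.
        if (next == null || nextIsSwitch)
            throw new ArgumentException("Missing value");
        consumedNext = true;
        return next;
    }

    public static void Main()
    {
        bool consumed;
        // The same tokens, "/param text", read as a bool and then as a string.
        Console.WriteLine(Interpret(typeof(bool), "text", out consumed) + " " + consumed);
        Console.WriteLine(Interpret(typeof(string), "text", out consumed) + " " + consumed);
    }
}
```

With a bool target, "text" is left on the line for someone else to claim; with a string target, it becomes the value. The parser cannot choose between the two without the type.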
Using the code
No details here, just examples for common cases to consume the library. All are for some made-up logging tool; please don't pick at the example app, it leaves a nasty scab.
Case 1: Nothing fancy
public enum LogOutputFormat {Raw, CSV, TabDelimited, XML, compressed, csv2}
public class MyActionParams
{
public string FileName;
public bool Append;
public LogOutputFormat OutputFormat;
public DateTime Touch;
public int Delay;
}
public static void Main(string[] args)
{
CommandTail tail = new CommandTail(args);
CmdSettings<MyActionParams> Action = tail.Parse(new MyActionParams());
}
We can feed the program these command lines:
- app /FileName=somefile.log /Append /OutputFormat:TabDelimited -Touch 2011.04.01
Delay is still default(int) (zero).
- app /Append /Append- /Append true /Append:false /Append+--+-+
Each syntax works for assigning to Append. The last one on the command line (true from the final + in the last parameter) wins.
- app /Out:Tab -File somefile.log
By default, unambiguous substrings can be used for property names and for enumeration values, so "/Out" can be used instead of "/OutputFormat".
- app /filename somefile.log /outputformat raw
By default, names of settings ("filename" for "FileName") and enumeration values ("raw" for "Raw") are case insensitive--unless it introduces ambiguity.
- app /OutputFormat=c
Generates an error. "c" is ambiguous. It could mean "CSV", "compressed", or "csv2".
- app /out CSV
This isn't ambiguous since it matches one of the enumeration values exactly.
- app unknown_string /Append+ unknown_string_2
"unknown_string" isn't part of a recognized tag, so it appears in the "UndecoratedTokens" collection on the CmdSettings object's Context property. "unknown_string_2" also appears in UndecoratedTokens, since the token before it is complete.
- app /Append unknown_string_3
The undecorated tokens collection is empty. Since /Append is ambiguous (it could be an implicit true or it could be expecting a value in the next token), the next token is examined for a switchchar. Since it doesn't have one, the literal "unknown_string_3" is coerced to a boolean and Append is set to false.
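The matching rules in these examples (case-insensitive names, unambiguous initial substrings, exact matches beating substrings) can be sketched in a few lines. This is only an illustration of the rules as described above; NameMatcher and Resolve are hypothetical names, not part of the library, and the library's own lookup may differ in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameMatcher
{
    // Returns the single candidate matched by 'typed', or throws when none or several match.
    // An exact (case-insensitive) match takes precedence over an initial substring.
    public static string Resolve(string typed, IEnumerable<string> candidates)
    {
        List<string> all = candidates.ToList();
        string exact = all.FirstOrDefault(
            c => string.Equals(c, typed, StringComparison.OrdinalIgnoreCase));
        if (exact != null) return exact;

        List<string> prefixed = all
            .Where(c => c.StartsWith(typed, StringComparison.OrdinalIgnoreCase))
            .ToList();
        if (prefixed.Count == 1) return prefixed[0];
        if (prefixed.Count == 0) throw new ArgumentException("Unknown name: " + typed);
        throw new ArgumentException("Ambiguous name: " + typed);
    }

    public static void Main()
    {
        string[] formats = { "Raw", "CSV", "TabDelimited", "XML", "compressed", "csv2" };
        Console.WriteLine(Resolve("Tab", formats)); // unique initial substring
        Console.WriteLine(Resolve("csv", formats)); // exact match, so "csv2" does not interfere
        try { Resolve("c", formats); }              // several candidates start with "c"
        catch (ArgumentException e) { Console.WriteLine(e.Message); }
    }
}
```

The sketch ignores the case where two candidates differ only by case; a real implementation has to decide what to do there.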
Case 2: Some finer control
Let's take some more control using attributes. Individual settings support these attributed options:
- These apply to any field on your object:
- NameCaseSensitive - The name of the parameter is case sensitive (default)
- NameAllowSubstring - The name of the parameter can be shortened to an unambiguous initial substring (default)
- AlternateNames - An array of alternate names the user may supply for a parameter (such as "/rtl" for "/right-to-left")
- These apply to enumerations:
- ValueCaseSensitive - The value of the parameter is case sensitive
- ValueAllowSubstring - The value of the parameter can be shortened to an unambiguous initial substring
Example:
public class MyParams
{
[CmdTailSetting(AlternateNames = new string[]{"Name", "Logfile"})]
public string FileName;
[CmdTailSetting(NameAllowSubstring = false)]
public bool Append;
[CmdTailSetting(ValueCaseSensitive = true)]
public LogOutputFormat OutputFormat;
public DateTime Touch;
public int Delay;
}
Calling it:
- app /name logfile.log
You can use "/Name" or "/Logfile" (or any unambiguous initial substring of those) instead of "/FileName".
- app -Appe false
Error. Append does not allow substrings. "Appe" is an unknown parameter, so an exception is put in the ParsingExceptions collection and optionally thrown.
Since the type of "Appe" is unknown, "false" is considered an undecorated token.
- app /out xml
Error. Case is significant even when unambiguous for the value of OutputFormat (but not for the name OutputFormat itself).
Case 3: Let's get smarter
Now that we know how to parse the parameters into a structure, let's see how to handle several different commands:
Let's say we want to allow these command lines:
- app /Dump /FileName filename [/append+-]
- app /Roll /FileName filename /ArchiveDir archive_dir [/OlderThan cutoff_date]
public class LogAppCommands : CmdTailCommand
{
public bool Dump = false;
public bool Roll = false;
}
public class DumpCommand : LogAppCommands, ICmdTailCommand
{
[CmdTailSetting(AlternateNames = new string[]{"Log", "Output"})]
public string FileName;
public bool Append = true;
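public bool IsValid(CmdTailContext ctx)
{
// Valid only for "/Dump" with a file name and without "/Roll".
return (!string.IsNullOrEmpty(FileName) && Dump && !Roll);
}
public bool Execute(object o)
{
// Dump the log here (body omitted in this example).
return true;
}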
}
public class RollCommand : LogAppCommands, ICmdTailCommand
{
[CmdTailSetting(AlternateNames = new string[]{"Log"})]
public string FileName;
public DateTime OlderThan = DateTime.Now;
public DirectoryInfo ArchiveDirectory;
// IsValid and Execute follow the same pattern as in DumpCommand (omitted here).
}
CommandTail tail = new CommandTail(args);
tail.AddCoercer(typeof(DirectoryInfo), (s) => new DirectoryInfo(s));
CmdSettings<DumpCommand> DumpParams = tail.Parse<DumpCommand>();
CmdSettings<RollCommand> RollParams = tail.Parse<RollCommand>();
if (DumpParams.Settings.IsValid(DumpParams.Context)) {DumpParams.Settings.Execute(null);}
else if (RollParams.Settings.IsValid(RollParams.Context)) {RollParams.Settings.Execute(null);}
Case 4: Help handling
Getting help from a command line app is important and should be universal. The library has a couple of functions to encourage writing help:
- CmdTailSettings.IsHelpRequest() tries to determine if any common idiom for help was passed.
- CmdTailSettings.HelpRequestCategory<E> tries to match the help request to any named value (from an enumeration).
For example:
public enum LogToolHelpCategories {General, Version, DumpingLogs, RollingLogs, Formats}
CommandTail tail = new CommandTail(args);
if (tail.IsHelpRequest())
{
LogToolHelpCategories cat = tail.HelpRequestCategory<LogToolHelpCategories>();
Console.WriteLine(LogToolHelpText[cat]);
return 0;
}
Now the user can request help:
c:>app
c:>app /?
c:>app help
c:>app --help
c:>app help rolling
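If you are curious what this kind of detection involves, here is a minimal sketch under my own assumptions. HelpSketch, IsHelpRequest, and Category are hypothetical stand-ins written for this example; they are not the library's code, and the list of recognized tokens is just the idioms shown above.

```csharp
using System;
using System.Linq;

public enum HelpTopic { General, Version, DumpingLogs, RollingLogs, Formats }

public static class HelpSketch
{
    // The help idioms listed in this article; a real tool may want more.
    static readonly string[] HelpTokens = { "/?", "-h", "--help", "help" };

    // True for an empty command tail or a leading help token.
    public static bool IsHelpRequest(string[] args)
    {
        return args.Length == 0
            || HelpTokens.Contains(args[0], StringComparer.OrdinalIgnoreCase);
    }

    // Matches the token after the help request against the enumeration names,
    // allowing a unique case-insensitive initial substring. Falls back to default(E).
    public static E Category<E>(string[] args) where E : struct
    {
        if (args.Length < 2) return default(E);
        string[] hits = Enum.GetNames(typeof(E))
            .Where(n => n.StartsWith(args[1], StringComparison.OrdinalIgnoreCase))
            .ToArray();
        return hits.Length == 1 ? (E)Enum.Parse(typeof(E), hits[0]) : default(E);
    }

    public static void Main()
    {
        string[] args = { "help", "rolling" };
        if (IsHelpRequest(args))
            Console.WriteLine(Category<HelpTopic>(args));
    }
}
```

Putting the general category first in the enumeration means default(E) is a sensible fallback when the user asks for help without naming a topic.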
Some idioms when planning your command line handling
When using the library, I find some idioms clean up my code and my design.
Idiom 1: Put shared parameters in a base class
Programs often have multiple commands but have some parameters common to all (or most) commands. These may be metaparameters not specific to any command (e.g., a computer to connect to or whether to use UTF8 in the output), or they may be parameters common to any operation the program could support (e.g., input filename).
Create a base class which contains fields for these parameters, and inherit your command objects from it. This keeps the name, attributes, and type the same across every command object, helping keep the user from being confused.
public class CommonParameters
{
public string InputFile;
public bool UTF8;
}
public class ValidateFile : CommonParameters
{
public bool ExitOnError = false;
}
public class ImportParameters : CommonParameters
{
public string DatabaseName;
}
Idiom 1a: Treat disallowed parameters like shared parameters
Sometimes you want to check that a parameter was not passed, usually because it would indicate that the user is confused. Treat these like shared parameters so you can test that they have not been set. Be certain not to initialize the field in the base class, so you can test the value against null.
This only works with nullable types, most commonly strings, and with enumerations with a default value that is not valid for the user to set.
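A minimal sketch of the idea, reusing the class names from Idiom 1. Moving DatabaseName into the base class and the parameterless IsValid are assumptions made to keep the example free of library types; in a real command object the check would sit in your IsValid(CmdTailContext) implementation.

```csharp
using System;

public class CommonParameters
{
    public string InputFile;
    // Deliberately not initialized, so "never set" can be detected as null.
    public string DatabaseName;
}

public class ValidateFile : CommonParameters
{
    public bool ExitOnError = false;

    // Validation needs an input file and must NOT have been given a database name.
    public bool IsValid()
    {
        return !string.IsNullOrEmpty(InputFile) && DatabaseName == null;
    }
}

public static class DisallowedDemo
{
    public static void Main()
    {
        // Simulates what a parser would have filled in for two different command lines.
        ValidateFile ok = new ValidateFile { InputFile = "data.csv" };
        ValidateFile confused = new ValidateFile { InputFile = "data.csv", DatabaseName = "prod" };
        Console.WriteLine(ok.IsValid());
        Console.WriteLine(confused.IsValid());
    }
}
```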
Idiom 2: Identify command mode with bools
Many programs have several modes. Imagine a media app that can validate, analyze, stream, and play a file. If you create boolean parameters for each mode and place them in a base class, you can easily identify which mode was requested, test for confused users mixing modes, and give the user a simple command syntax.
public class Modes
{
public bool Validate = false;
public bool Analyze = false;
public bool Stream = false;
public bool Play = false;
}
public class Validate : Modes, ICmdTailCommand
{
public bool IsValid(CmdTailContext ctx)
{
if (!Validate) return false;
if (Analyze || Stream || Play) return false;
return true;
}
}
Now the user can use the implicit true syntax to select a mode:
app /validate ...
app /stream ...
app /p ...
Small note: default(bool) is false, so the explicit assignments in the Modes field initializers are not necessary (and FxCop will yell at you for double assignment, since the compiler will stupidly construct, assign the default, and then re-assign it). It's definitely worth the explicit assignment, though. The double assignment should optimize away, but even if it doesn't, you don't know who will be maintaining this code next year; if they aren't thinking about the default (or don't know the default), they should see the value instead of risking making an incorrect assumption. Don't let bad implementation in the compiler lead you to bugs in your application.
Idiom 3: Use Partial Classes to group parameters
C# borrows Java's "put each class in one file and don't group or structure the class layout" philosophy. In general, programmers like it and it's seen as a good thing; it's a reaction against C++'s separation of class structure from class implementation, and relies on clever IDEs to create the structured class metadata when programmers need it since IDEs didn't create combined views of C++ header and implementation files.
This can make identifying the set of parameters a command uses difficult, and it can make comparing parameters between commands very difficult. Since it is important to have consistency in command elements (e.g., casing, tense, name choice), we want some clear way to visualize the parameters.
C#'s partial classes give us a good way to do that. If you make one file for your parameter layouts and declare your command objects partial, you can put all of your parameter information together and still separate your command implementation into a file with all your properties and methods.
CommandLineParameters.cs
public partial class Mode1Command
{
public string SomeParameter;
public bool SomeSwitch;
}
public partial class Mode2Command
{
public int Count = 1;
}
Mode1.cs
public partial class Mode1Command
{
public bool IsValid(CmdTailContext ctx)
{
// Validation logic for Mode1Command goes here.
return true;
}
}
Idiom 4: Disable substring matching to enable substring matching
If you have two parameters with a shared initial substring, you may want one to be more easily abbreviated than the other. Disable initial substring matching on one and the other can be abbreviated. Similarly, if one parameter is a complete initial substring of another, disabling substring matching on the shorter will allow the longer to be abbreviated but still allow setting the shorter by its complete name, since exact matches take precedence over substrings.
If two parameters have a shared initial substring, you can still use AlternateNames to allow unambiguous substrings while defaulting ambiguous substrings to one of them.
public class Parameters
{
[CmdTailSetting(NameAllowSubstring = false,
AlternateNames = new string[] {"CountD",
"CountDi",
"CountDis",
"CountDisp",
"CountDispl",
"CountDispla"})]
public bool CountDisplay = true;
[CmdTailSetting(NameAllowSubstring = false)] public int Count = 1;
public string CounterName;
}
Now you can call the app like:
- app /C cname: Substrings default to CounterName
- app /Count n: Exact matches are highest precedence
- app /CountDisp-: AlternateNames allows this
Design
Major design goals:
- No required command line structure (although tokens may need requirements). For example, NConsoler is a great library, but the need to have an undecorated first parameter that indicates mode wasn't working for me.
- Access to "undecorated" tokens on the line. I really wanted one incredibly common token to be passed without a name, like the filename argument to dir.
- Easy to change the parameters allowed. I do a lot of trial and error while coding quick tools, and I do many iterations when coding production projects. I'm likely to change the command line pretty often.
- Easy help. Even on quick tools, I like good command line help, mostly because I never remember what to do for something I may run several times in one day but only once a month.
- Enumerations. I had hacked this into the NConsoler library for my regex tool and I can't really live without it now.
Things I didn't care about for this:
- Speed. You parse the command line once at the start of a run. If it takes a few extra milliseconds, so be it. If you have a tool that is called extremely often, or if it is running in a restricted environment, it may not be the best choice. For general tools on a basic Windows box, optimization for speed would be ridiculous.
- Unused generalization. It's probably got more than I need in it, but I tried hard to cut back on what I wasn't using.
Some current limitations/NYI/ideas for extension:
- It can't support "-" as a value, which is annoying at times.
- You can't specify negative numbers as detached tokens ("/n -5").
- Parameter names that start with "--" can be supported with the AlternateNames attribute, but that is far from ideal.
- Doesn't support /-W for false.
- You can't isolate parameters and reuse them across different command objects, which some programs do ("app /command1 /file f1 /command2 /file f2", etc.).
- Some type coercions may succeed when you'd rather they failed.
- You're on your own for mapping where undecorated tokens fall in the token sequence, if that's important ("/files f1 f2 f3").
- Supplying a value to the same field several times is quietly hidden, with each assignment happening as the command line is parsed. You can't detect it if it's an error for you.
- No arbitrary numbered parameters ("/p1 f1 /p2 f2 ..." where p1 and p2 are not fields, but represent some array; this is sometimes used to allow the user to specify an arbitrary number of values).
- It doesn't generate usage text.
Implementation
There are two obvious ways to structure the parsing:
- Examine the output and search the command line for matching tokens, or
- Walk the tokens building name/value pairs and try to match those to the output.
When examining the tokens, there are two general issues:
- Recognizing whether a token is a new parameter or a continuation of the last token, and
- Getting a token to the right data type.
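For the second issue, here is a minimal sketch of one way to get a string token to a target type: try enumeration names, then a public static Parse(string) method found by Reflection, then let .NET convert it. Coercion and Coerce are hypothetical names for this example, and the sketch leaves out custom coercers; it is not the library's code.

```csharp
using System;
using System.Globalization;
using System.Reflection;

public static class Coercion
{
    // Converts 'text' to 'target' using a simple fallback chain.
    public static object Coerce(string text, Type target)
    {
        if (target == typeof(string)) return text;

        // Enumerations: match the value against the enumeration's names, ignoring case.
        if (target.IsEnum) return Enum.Parse(target, text, true);

        // Many framework types (int, DateTime, Guid, ...) expose a static Parse(string).
        MethodInfo parse = target.GetMethod(
            "Parse",
            BindingFlags.Public | BindingFlags.Static,
            null,
            new[] { typeof(string) },
            null);
        if (parse != null) return parse.Invoke(null, new object[] { text });

        // Last resort: let the framework try the conversion.
        return Convert.ChangeType(text, target, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(Coerce("42", typeof(int)));
        Console.WriteLine(Coerce("friday", typeof(DayOfWeek)));
        Console.WriteLine(Coerce("true", typeof(bool)));
    }
}
```

Note that a failure inside the reflected Parse call surfaces as a TargetInvocationException wrapping the real error, so a caller that wants clean error messages has to unwrap it.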
The library walks the tokens and makes one pass through them, matching to public fields found by Reflection. It's a simple two-deep state tree unrolled into a function rather than state objects. The tokens are put into a queue.
- If the queue is empty, we're done.
- Pop the first token off the queue.
- If it isn't a switchchar, add the token to the Undecorated Tokens collection and go back to the start.
- Split the token into a name, any plus/minus symbols, and any attached value. Do some basic validation.
- Find the field matching the name. If there isn't one, or if there is more than one, generate the appropriate exception.
- Switch on the type of the field:
- Bool
- If it's plus/minus, get the last character and set the field.
- If it's an attached string, coerce to bool and set the field.
- Examine the head of the queue:
- Queue empty, implicit true.
- Starts with a switchchar, implicit true.
- Otherwise, pop the head of the queue and coerce to bool.
- Assign.
- Other
- If we don't have an attached value, examine the head of the queue.
- Queue empty, generate missing value exception.
- Starts with switchchar, generate missing value exception.
- Pop the head of the queue and use it as the value.
- Examine the coercers collection for the field type. If we have one, call it.
- If it's an enumeration, match the value string against the names in the enumeration.
- Look for a static method Parse(string) on the type and use that.
- Pass the string to the assignment and see if .NET can do it for us.
- Loop on back.
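The steps above can be sketched as a single loop over a queue. This is a stripped-down illustration, not the library's code: TokenWalk, Walk, and WalkResult are hypothetical names, field lookup is replaced by two sets of names, values are kept as strings (no coercion), and substring matching is left out.

```csharp
using System;
using System.Collections.Generic;

public class WalkResult
{
    public Dictionary<string, string> Values =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    public List<string> Undecorated = new List<string>();
    public List<string> Errors = new List<string>();
}

public static class TokenWalk
{
    static bool IsSwitch(string t)
    {
        return t.Length > 0 && (t[0] == '/' || t[0] == '-');
    }

    // boolFields: names of bool fields; otherFields: names of all other fields.
    public static WalkResult Walk(string[] args, ISet<string> boolFields, ISet<string> otherFields)
    {
        WalkResult result = new WalkResult();
        Queue<string> queue = new Queue<string>(args);
        while (queue.Count > 0)
        {
            string token = queue.Dequeue();
            if (!IsSwitch(token)) { result.Undecorated.Add(token); continue; }

            // Split into a name, any trailing plus/minus symbols, and any attached value.
            string body = token.Substring(1);
            string attached = null;
            int sep = body.IndexOfAny(new[] { '=', ':' });
            if (sep >= 0) { attached = body.Substring(sep + 1); body = body.Substring(0, sep); }
            string name = body.TrimEnd('+', '-');
            string signs = body.Substring(name.Length);

            bool nextIsValue = queue.Count > 0 && !IsSwitch(queue.Peek());
            if (boolFields.Contains(name))
            {
                if (signs.Length > 0)            // the last plus/minus wins
                    result.Values[name] = signs[signs.Length - 1] == '+' ? "true" : "false";
                else if (attached != null)
                    result.Values[name] = attached;
                else                             // detached value, or implicit true
                    result.Values[name] = nextIsValue ? queue.Dequeue() : "true";
            }
            else if (otherFields.Contains(name))
            {
                if (attached != null) result.Values[name] = attached;
                else if (nextIsValue) result.Values[name] = queue.Dequeue();
                else result.Errors.Add("Missing value for " + name);
            }
            else result.Errors.Add("Unknown parameter " + name);
        }
        return result;
    }

    public static void Main()
    {
        var bools = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Append" };
        var others = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "FileName" };
        WalkResult r = Walk(new[] { "input.txt", "/FileName=a.log", "/Append+--+-+" }, bools, others);
        Console.WriteLine(r.Values["FileName"] + " " + r.Values["Append"] + " " + r.Undecorated[0]);
    }
}
```

Because later assignments simply overwrite earlier ones in the dictionary, the sketch shares the limitation noted earlier: repeated values for the same field are silently hidden.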
Idioms in the implementation
Nothing really special in the code. Here are a few random syntactical points I find interesting or entertaining:
- The code as-is uses some LINQ syntax and lambdas, but nothing that couldn't be back-ported to older .NET versions if you need to. It would get harder to read, though.
- In some places, you'll see potentially-interesting nullable constructs like:
void Func(SomeType st) { SomeType s = st ?? new SomeType(); }
or the much more common:
bool? nb = null;
bool b = nb ?? false;
Another well-known--but interesting if you haven't seen it--construct is chained ternary operators:
BoolFormat bf = (Bools != string.Empty) ? BoolFormat.Explicit
: (Value != string.Empty) ? BoolFormat.AttachedString
: BoolFormat.Other;
This tests the condition on the left of the question mark and returns the value on the right, proceeding line by line in order. Much easier to read than a string of if/else.
History
- 2011/04/18: Version 1.0.
- 2011/04/22: Article text updated.