I'm making a program to generate a class with properties based on a dataset (nothing fancy, just a tool to simplify life).
Right now I'm testing with CSV data of stars, as it has a lot of columns and it's somewhat messy data.
What I'm doing right now is taking the dataset, extracting the headers, then generating a class (removed about 30 lines for brevity):
class
{
public int Id { get; set; }
public string Hip { get; set; }
public string Hd { get; set; }
public string Hr { get; set; }
public decimal Ci { get; set; }
public decimal X { get; set; }
public decimal Y { get; set; }
public decimal Z { get; set; }
}
This is done by looping through the first row of data, evaluating the cells with this:
public static string Evaluate(string input)
{
    // Later checks override earlier ones, so the effective priority is:
    // empty -> string, then int, then DateTime, then decimal, else string.
    string output = "string";
    if (decimal.TryParse(input, out _)) { output = "decimal"; }
    if (DateTime.TryParse(input, out _)) { output = "DateTime"; }
    if (int.TryParse(input, out _)) { output = "int"; }
    if (input == "") { output = "string"; }
    return output;
}
It's a bit clunky, but this isn't a beauty contest.
What I need to do is evaluate the first 5 rows of data, then compare which datatype is the most likely. It turns out that just sampling the first line is a bad idea, so I'm looking for the type that is correct for at least three out of five lines (if no type reaches that, it's a string).
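The three-out-of-five rule can be sketched as a small vote over the type names that Evaluate returns for one column. This is only a minimal sketch, not a definitive implementation: `TypeVoter`, `MostLikelyType`, and the `threshold` parameter are hypothetical names, and it assumes the Evaluate results for the sampled rows have already been collected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TypeVoter
{
    // Picks the type name that at least `threshold` of the sampled rows agree on.
    // `typeNames` holds one Evaluate() result per sampled row of a single column.
    // Falls back to "string" when no type reaches the threshold or the sample is empty.
    // (Hypothetical helper, not part of the original project.)
    public static string MostLikelyType(IEnumerable<string> typeNames, int threshold = 3)
    {
        var winner = typeNames
            .GroupBy(name => name)
            .OrderByDescending(group => group.Count())
            .FirstOrDefault();

        return winner != null && winner.Count() >= threshold ? winner.Key : "string";
    }
}
```

With the five sampled cells of one column in `cells`, the call would be `TypeVoter.MostLikelyType(cells.Select(Evaluate))`. A column where one row evaluates to int and the other four to decimal comes out as "decimal"; a column split two, two, and one comes out as "string".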
The data:
id,hip,hd,ra,dec,dist,pmra,pmdec
0,,,0,0,0,0,0
1,1,224700,0.00006,1.089009,219.7802,-5.2,-1.88
2,2,224690,0.000283,-19.49884,47.9616,181.21,-0.93
3,3,224699,0.000335,38.859279,442.4779,5.24,-2.91
4,4,224707,0.000569,-51.893546,134.2282,62.85,0.16
Note: The first data line is Sol, which gets 0 as the value in columns otherwise filled with decimals. Checking only the first line, or any single line, would be folly, as it can't handle a whole number that isn't written as 0.0.
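A possible refinement for the whole-number case, which the three-out-of-five rule does not strictly require: since a value like "0" is also a valid decimal, the int results in a column that also contains decimal results could be counted as decimal votes. The sketch below assumes that rule; `WideningVoter` and its `MostLikelyType` method are hypothetical names, and the input is again one Evaluate result per sampled row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WideningVoter
{
    // Hypothetical variation of the vote: when a column holds both "int" and
    // "decimal" results, the int rows are counted as decimal votes, because
    // every whole number can also be stored in a decimal property.
    public static string MostLikelyType(IEnumerable<string> typeNames, int threshold = 3)
    {
        var names = typeNames.ToList();
        if (names.Contains("decimal"))
        {
            names = names.Select(name => name == "int" ? "decimal" : name).ToList();
        }

        var winner = names
            .GroupBy(name => name)
            .OrderByDescending(group => group.Count())
            .FirstOrDefault();

        // Same fallback as the plain vote: no type reaching the threshold means "string".
        return winner != null && winner.Count() >= threshold ? winner.Key : "string";
    }
}
```

Under this assumption, a sample of three int results and two decimal results yields "decimal" rather than "int", so a later row such as 0.5 would still fit the generated property.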
The GitHub Project:
https://github.com/frankhaugen/class-from-dataset
What I have tried:
Yes, I've googled, but either I lack the words or there isn't any information on this.
I've also looked into using ML.NET for the task, but it seems like overkill (though it might be fun).