|
Hi Roger,
Roger Wright wrote: maybe it would be better to just import what I have to SQL Server and let a query sort out the rows with missing fields
I don't think so. Partly because my knowledge of SQL is limited, mainly because I don't want to feed data I don't understand well to another big system. Separation of concerns: first clean up the data, then process it.
Some comments:
1.
a small data sample could have been helpful.
2.
How big is a typical file? Maybe header and tail removal could be handled best by Regex.
3.
I would be worried if relevant data ends up in array element 1 when you expect it in element 0. No, Split is not 1-based at all. You probably have an unexpected character somewhere (maybe your line separators are CR+LF, and one of them becomes the first char of your line?); I would analyze this and deal with it properly. Don't rely on code that seems to fix something you don't understand.
4.
I guess you want reliability and flexibility; i.e. the data format might slightly change over time (or vary from file to file), and you probably want that to get handled correctly, and when a problem occurs to get noticed right away. It would suck if there wasn't some kind of checksum. You did mention a total you want to skip; I would not skip it, I would try and recognize it (the last useful data line of the file?), compare it to the sum of all the previous data I recognized, and when equal, just skip it; when unequal, raise an alarm.
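To illustrate point 3, here is a minimal sketch using made-up strings (not your data). It shows that Split fills the array from index 0, and two ways an unexpected empty or odd-looking element could appear: a line that starts with the delimiter, and CR+LF text split on only one of the two characters. `SplitCheck` and `Describe` are hypothetical names.

```csharp
using System;

static class SplitCheck
{
    // Hypothetical helper: lists the character codes so stray characters become visible.
    public static string Describe(string s)
    {
        var codes = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
            codes[i] = ((int)s[i]).ToString();
        return string.Join(" ", codes);
    }

    public static void Main()
    {
        // The first field lands in element 0.
        string[] plain = "a,b,c".Split(',');
        Console.WriteLine(plain[0]);                 // a

        // A line that starts with the delimiter yields an empty element 0.
        string[] quoted = "\"x\",\"y\"".Split('"');
        Console.WriteLine(quoted[0].Length);         // 0

        // Splitting CR+LF text on '\r' alone leaves '\n' as the first char of the next line.
        string[] lines = "one\r\ntwo".Split('\r');
        Console.WriteLine(Describe(lines[1]));       // 10 116 119 111
    }
}
```

Dumping the character codes of a suspicious line like this is usually quicker than guessing what the invisible character is.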
|
|
|
|
|
Good points, all of them. As for a sample:
Date Range = Yesterday (09/01/2010)
Time Zone = MST
Sign Convention = GEN + LOAD + OTHER +
Include Zero = Yes
Execution Time: 09/02/2010 07:00 CST
"Zone = RMS / AMPS"
"Date","Meter","HE1","HE2","HE3","HE4",...,"HE24","On Peak","Off Peak","Total"
"09/01/2010","BCI891 ","0.000","0.000","0.000","0.000",...,"0.000","45.000","0.000","45.000"
"","DAD348 Out","0.000","0.000","0.000","0.000",...,"4.634","4.382","75.264","9.016","84.280"
"","NON921A Out","-6.200","-5.800","-5.600","-5.300",...,"-7.300","-150.700","-49.200","-199.900"
"","TOP928 In","0.000","0.000","0.000","0.000",...,"0.000","0.000","0.000","0.000"
"","TOP928 Out","7.490","6.940","6.740","6.440",...,"8.750","179.920","58.980","238.900"
"09/01/2010","","1.290","1.140","1.140","1.140",...,"5.832","149.484","18.796","168.280"
Total: 5 records
The above is typical, exported from Excel as a .csv file and emailed to me by an automated system. The headings HEx mean "Hour Ending", so I edited out most of them, as they aren't relevant to seeing the pattern. I've taken your suggestion and read up on Regex - what a pain! I don't know when it changed, but it's no longer possible to print "this topic and all sub-topics" from the VS2008 Help system.
In looking at it tonight, I think I see an approach that is fairly simple. As I read each line, I can look at the first 4 or 5 characters and tell what kind of line it is, dumping any that aren't data lines. The String.Length property will make it easy to drop blank lines, too. A line that begins with a date is then either the beginning of data, or the summary line with column totals, and a little string footwork can patch the date into the lines that are missing that item so that they all have the same format. A Regex will be handy to extract the number of lines from the last line, just to make sure I've got all of it in my result set.
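The line-by-line approach described above might look roughly like this. It is only a sketch: `LineFilter` and `NormalizeDataLines` are hypothetical names, and it assumes the layout of the sample (quoted fields, MM/dd/yyyy dates, an empty first field on lines that lack the date).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class LineFilter
{
    // Keeps quoted lines whose first field is a date or empty, and fills an
    // empty first field with the most recent date seen. Everything else
    // (report header, column header, "Total: n records", blanks) is dropped.
    public static List<string> NormalizeDataLines(IEnumerable<string> rawLines)
    {
        var result = new List<string>();
        string currentDate = null;

        foreach (string raw in rawLines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;                       // blank line

            if (line.StartsWith("\"\",", StringComparison.Ordinal))
            {
                // Data line with a missing date: patch in the last date seen.
                if (currentDate != null)
                    result.Add("\"" + currentDate + "\"" + line.Substring(2));
                continue;
            }

            // A date line looks like "MM/dd/yyyy", followed by the other fields.
            DateTime parsed;
            if (line.Length >= 12 && line[0] == '"' && line[11] == '"' &&
                DateTime.TryParseExact(line.Substring(1, 10), "MM/dd/yyyy",
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed))
            {
                currentDate = line.Substring(1, 10);
                result.Add(line);
            }
        }
        return result;
    }
}
```

Note that the summary line also begins with a date, so it survives this filter; it can be recognized afterwards by its empty meter field.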
I'm not sure what to do with all the data values that should be numerics - I'd like to strip all the quotes out before importing to a dataset, excepting the text containing the meter name, to save having to do conversions from string to float after I import them, but that seems like more than it's worth. I hate to hardcode formats into the program, just in case the format changes one day, but I don't see any way around it.
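One way to avoid hardcoding which columns are numeric is to try parsing every field and keep the text when parsing fails. The following is a sketch under that assumption; `FieldParser`, `Unquote` and `TryParseValue` are hypothetical names, and the invariant culture is used because the sample shows a decimal point regardless of the machine's regional settings.

```csharp
using System;
using System.Globalization;

static class FieldParser
{
    // Removes one pair of surrounding double quotes, if present.
    public static string Unquote(string field)
    {
        string f = field.Trim();
        if (f.Length >= 2 && f[0] == '"' && f[f.Length - 1] == '"')
            return f.Substring(1, f.Length - 2);
        return f;
    }

    // Returns true and the parsed value when the field holds a number such as "-6.200";
    // returns false for text such as a meter name.
    public static bool TryParseValue(string field, out double value)
    {
        return double.TryParse(Unquote(field), NumberStyles.Float,
                               CultureInfo.InvariantCulture, out value);
    }
}
```

A field like `"-6.200"` parses to a double, while `"DAD348 Out"` fails the test and can be stored as a string.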
Thanks for the suggestions!
Will Rogers never met me.
|
|
|
|
|
Hi Roger,
Yes, Regex looks a bit terrifying; I do not suggest to do everything with regexes (others might, I don't); I only suggest to use Regex for rather simple operations that would take quite some code doing it otherwise.
The example suggests the following approach (there may be alternatives, this is my first impression anyway):
1. a Regex could locate the line containing "Total: # records" and extract the record count "RC".
2. another Regex could return a collection of all lines starting with a double quote ("AllLines"); these are the only ones we're interested in.
3. It seems AllLines.Count should equal RC+2, (a header, RC data lines, and a total); if it isn't, raise alarm.
4. Each line in AllLines is supposed to contain the same number of fields. Fields are separated by commas; however, one should not simply split on comma, as there could be commas inside the quoted stuff (doesn't happen in your example). PIEBALD has an article on how to handle that.
5. Split each line into a string array holding the individual fields
6. For each field in each line (except first and last line), drop the double quotes, and stuff it into the DataSet (assuming that is what you want).
7. Final check: the last of AllLines should fit the data in the DataSet. If not, raise alarm.
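A rough sketch of the steps above, assuming the file has been read into one string. All names (`ReportChecker` and its methods) are hypothetical. One deviation from step 2: in the sample, the `"Zone = RMS / AMPS"` line also starts with a double quote, so this sketch additionally requires a comma after the first quoted field; otherwise the count would come out as RC+3. The quote-aware split is deliberately simple and does not handle doubled quotes inside a field.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

static class ReportChecker
{
    // Step 1: extract RC from a line such as "Total: 5 records"; returns -1 if absent.
    public static int ReadRecordCount(string fileText)
    {
        Match m = Regex.Match(fileText, @"^Total:\s*(\d+)\s+records?\s*$", RegexOptions.Multiline);
        return m.Success ? int.Parse(m.Groups[1].Value) : -1;
    }

    // Step 2: lines that start with a quoted field followed by a comma.
    public static List<string> QuotedLines(string fileText)
    {
        var lines = new List<string>();
        foreach (Match m in Regex.Matches(fileText, @"^""[^""\r\n]*"",[^\r\n]*", RegexOptions.Multiline))
            lines.Add(m.Value);
        return lines;
    }

    // Step 3: header + RC data lines + totals line.
    public static bool CountMatches(string fileText)
    {
        int rc = ReadRecordCount(fileText);
        return rc >= 0 && QuotedLines(fileText).Count == rc + 2;
    }

    // Steps 4 to 6: split on commas outside double quotes; the quotes are dropped.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        foreach (char c in line)
        {
            if (c == '"') inQuotes = !inQuotes;
            else if (c == ',' && !inQuotes) { fields.Add(current.ToString()); current.Length = 0; }
            else current.Append(c);
        }
        fields.Add(current.ToString());
        return fields;
    }

    // Step 7 (for the last column only): totals line versus the sum of the data lines.
    public static bool TotalsAgree(List<string> quotedLines)
    {
        if (quotedLines.Count < 2) return false;
        double sum = 0;
        for (int i = 1; i < quotedLines.Count - 1; i++)   // skip header and totals line
            sum += LastValue(quotedLines[i]);
        return Math.Abs(sum - LastValue(quotedLines[quotedLines.Count - 1])) < 0.0005;
    }

    static double LastValue(string line)
    {
        List<string> f = SplitFields(line);
        return double.Parse(f[f.Count - 1], CultureInfo.InvariantCulture);
    }
}
```

When `CountMatches` or `TotalsAgree` returns false, that is the point to raise the alarm rather than load the data.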
Hope this helps.
|
|
|
|
|
0) Right, what Luc said.
1) A little sample would go a long way.
2) Have you considered using bcp?
|
|
|
|
|
Hello.
I have the following foreach loop:
foreach(Process proc in GetProcessFromSystem())
{
}
GetProcessFromSystem() is a method that retrieves all running processes, which can easily change. Does foreach handle such a thing, or is it dangerous to use it in such a way?
|
|
|
|
|
A foreach loop can run on anything that implements IEnumerable - so, IMO, this should be fine, if that was your question.
I don't know what has been implemented in GetProcessFromSystem(), so I cannot comment on that.
The funniest thing about this particular signature is that by the time you realise it doesn't say anything it's too late to stop reading it.
|
|
|
|
|
Is GetProcessFromSystem() called on each loop iteration? If so, that could cause a problem.
|
|
|
|
|
nothing inside the parentheses of foreach is called more than once.
it would be different if you had a for, e.g.:
for(int i=0; i<GetProcessFromSystem().Count; i++) { ... }
would evaluate the termination test, and hence call the method, upon each iteration.
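A small sketch that makes the difference measurable. `GetItems` is a hypothetical stand-in for GetProcessFromSystem() that counts how often it is invoked.

```csharp
using System;
using System.Collections.Generic;

static class CallCounter
{
    public static int Calls;

    // Hypothetical stand-in for GetProcessFromSystem(); builds a new list on every call.
    public static List<string> GetItems()
    {
        Calls++;
        return new List<string> { "a", "b", "c" };
    }

    public static int CallsMadeByForeach()
    {
        Calls = 0;
        foreach (string item in GetItems()) { }              // expression evaluated once
        return Calls;
    }

    public static int CallsMadeByFor()
    {
        Calls = 0;
        for (int i = 0; i < GetItems().Count; i++) { }       // test evaluated per iteration
        return Calls;
    }

    public static void Main()
    {
        Console.WriteLine(CallsMadeByForeach());   // 1
        Console.WriteLine(CallsMadeByFor());       // 4 (three passing tests plus the failing one)
    }
}
```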
|
|
|
|
|
Thank you, this was the answer I was looking for.
|
|
|
|
|
For this to work, GetProcessFromSystem() has to return an IEnumerable. If it does, the foreach will be happy. However, if the collection changes while foreach is iterating over it, an exception is thrown (an InvalidOperationException for List<T> and most framework collections).
Whether GetProcessFromSystem() implements and returns the information you hope to get cannot be determined without seeing it.
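To show the exception case concretely, here is a sketch using a plain List<int> (not the OP's process list; `ModifyDuringForeach` is a hypothetical name). Changing the list inside the loop makes the enumerator throw, while looping over a copy does not.

```csharp
using System;
using System.Collections.Generic;

static class ModifyDuringForeach
{
    // Returns true if removing items inside the loop made the enumerator throw.
    public static bool RemovingInsideLoopThrows()
    {
        var items = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int n in items)
                items.Remove(n);              // changes the list being enumerated
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Iterating over a copy leaves the original free to change.
    public static int RemoveViaSnapshot()
    {
        var items = new List<int> { 1, 2, 3 };
        foreach (int n in items.ToArray())    // ToArray() takes a snapshot
            items.Remove(n);
        return items.Count;
    }
}
```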
|
|
|
|
|
Luc Pattyn wrote: Whether GetProcessFromSystem() implements and returns the information you hope to get cannot be determined without seeing it.
If you want to know, it returns a List<Process> every time it is called. Suppose some app crashes and disappears. Every time this method is called, it constructs the list from scratch. My only worry is that the app wouldn't disappear from the list while the loop is running. It seems I am going to store the list in a local variable first before passing it to the foreach loop.
Also, that method is standard, as you would get the same list from Task Manager.
|
|
|
|
|
When the method is returning a List, and no one is altering the list, then the list remains as is. Some process crashing, being killed or getting started will not modify the list.
If you don't understand or don't trust how foreach works, then I can only suggest you study it, or stop using it.
|
|
|
|
|
Assuming, for the sake of this answer, that GetProcessFromSystem() returns a List<Process> it would be better to do:
List<Process> procs = GetProcessFromSystem();
foreach (Process proc in procs)
{
............
............
}
Doing it the way you first proposed would be more greedy for system resources as (I think) GPFS() would be called on each iteration of the foreach loop, consuming resources. In addition the list could easily change, items might be added or even removed. Anything implementing IEnumerable can give unpredictable results, especially if things are removed from the list.
Henry Minute
Do not read medical books! You could die of a misprint. - Mark Twain
Girl: (staring) "Why do you need an icy cucumber?"
“I want to report a fraud. The government is lying to us all.”
|
|
|
|
|
Can't agree with that.
In for (int x=0; x<bitmap.Width; x++) the Width property is fetched over and over, once per iteration, and returning the same value all the time; so it makes a lot of sense to use a local variable.
In foreach (type someVar in someExpressionYieldingAnEnumerator) the expression is evaluated once, and the resulting enumerator is worked with (i.e. its MoveNext method is called once per iteration). Using a local variable for holding the enumerator does not change a thing, except it adds to the typing and the risk of errors.
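For reference, a sketch of roughly what the compiler turns a foreach into (simplified; `ForeachExpanded` and its methods are hypothetical names). The source expression appears exactly once, in the GetEnumerator() call.

```csharp
using System;
using System.Collections.Generic;

static class ForeachExpanded
{
    public static int SumWithForeach(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int n in source)
            sum += n;
        return sum;
    }

    // Approximately equivalent hand-written form of the loop above.
    public static int SumWithEnumerator(IEnumerable<int> source)
    {
        int sum = 0;
        using (IEnumerator<int> e = source.GetEnumerator())   // source is evaluated once, here
        {
            while (e.MoveNext())                               // called once per iteration
                sum += e.Current;
        }
        return sum;
    }
}
```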
|
|
|
|
|
Thanks!
I stand corrected. (Well actually I'm sitting)
Henry Minute
|
|
|
|
|
still facing South?
|
|
|
|
|
Yup, I haven't rearranged the furniture.
Henry Minute
|
|
|
|
|
Henry Minute wrote: Doing it the way you first proposed would be more greedy for system resources as (I think) GPFS() would be called on each iteration of the foreach loop,
No, it wouldn't. Your code is functionally identical to the OP's code. The code in the foreach's parentheses is executed only once and the code in the curly braces is executed for each item in the collection returned by the code in the parentheses.
Now, if the collection were modified by a background thread or by the code in the curly braces, then you've got a problem...
|
|
|
|
|
Thank you.
I'll try to remember that and probably fail.
Henry Minute
|
|
|
|
|
Then what's with the elephant??
|
|
|
|
|
foreach(Elephant I in GetMyZoo()) I.CantRemember();
|
|
|
|
|
The Elephant is Henry Minute.
I have to do the typing for him because we cannot find a large enough keyboard.
Henry Minute
|
|
|
|
|
You should buy him an iPhone then. There is this little app that lets him trumpet in Morse code; each character gets translated automatically into a regular keystroke.
If he dislikes the iPhone (or when the neighbors object), there is an alternative based on a Wii Fit. That requires flapping the ears, this time using what amounts to the semaphore alphabet.
|
|
|
|
|
We tried your second suggestion. Never again.
He got a little excited once and it took 3 weeks and a whole case of baby-oil to get his ears untangled.
Henry Minute
|
|
|
|
|
is there a YouTube reference, or any other proof of bodily harm, discomfort, or damage; so others could benefit and maybe start a class action suit against Nintendo?
|
|
|
|
|