|
Hi Seb,
Thank you very much for this superb app. All good except one very small problem: when reading a CSV containing the £ currency symbol, it somehow gets interpreted as an odd diamond-shaped character like ♦!
Sample records-
0000100,12345,"£1,200.00",13/11/2008,AA,07/11/2008,30/11/2008,1,RT,Paid
0000100,1234500,"£1,200.00",13/11/2008,BB,07/11/2008,30/11/2008,2,AB,Paid
Any idea?
Many thanks.
Milan
|
|
|
|
|
Hmm, never mind. I had forgotten to specify the Encoding in the StreamReader constructor [System.Text.Encoding].
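For readers hitting the same £ → ♦ issue: the file was most likely saved in a legacy code page such as Windows-1252, and decoding it with the default UTF-8 decoder turns the £ byte into a replacement character. A minimal sketch of the fix (the file name and the Windows-1252 code page are assumptions; use whatever your source actually is):

```csharp
using System.IO;
using System.Text;
using LumenWorks.Framework.IO.Csv;

// "data.csv" and code page 1252 are assumptions -- pick the
// encoding the file was actually saved with.
using (var csv = new CsvReader(
    new StreamReader("data.csv", Encoding.GetEncoding(1252)), false))
{
    while (csv.ReadNextRecord())
    {
        string amount = csv[2]; // e.g. "£1,200.00"
    }
}
```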
Cheers,
Milan G
|
|
|
|
|
Sebastien, thanks for this wonderful CSV reader. Works like a charm. Looking forward to studying the NLight project.
|
|
|
|
|
|
application is 100% perfect for my needs - importing a pipe-delimited file
|
|
|
|
|
Works a treat. My only query is the use of a struct for the enumerator. I believe this will create memory inefficiencies.
|
|
|
|
|
On the contrary, a struct avoids putting one more tiny object on the heap, which helps the GC a little. In fact, the .NET BCL itself uses structs for this, as you can see with Reflector, e.g. List&lt;T&gt;.Enumerator. A value type is not necessarily allocated on the stack, but it happens that it is in this particular CLR.
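To illustrate the point with a generic sketch (not the reader's own code): foreach binds to the strongly typed GetEnumerator() when one is available, so the struct enumerator stays off the heap; it is only boxed when you go through the interface.

```csharp
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };

// foreach binds to List<int>.GetEnumerator(), which returns the
// struct List<int>.Enumerator -- no heap allocation here.
foreach (int i in list) { /* ... */ }

// Cast to the interface and the struct enumerator gets boxed:
// one extra object on the heap per enumeration.
IEnumerable<int> seq = list;
foreach (int i in seq) { /* ... */ }
```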
Sébastien
Intelligence shared is intelligence squared.
Homepage : http://www.sebastienlorion.com
|
|
|
|
|
Hi, thanks once more for sharing this valuable piece of code.
I have one question though: I want to give end users some kind of preview, allowing them to adjust settings like quote chars and the like, as well as mappings to data table columns, before running the import. My goal would be to parse, say, 20 rows and then show them in a grid.
What I am missing is some kind of MaxRows property for the cached reader. I tried calling ReadNextRecord in a loop 20 times before passing it as the DataSource to the grid. But this didn't help, because the bindingList's _count field is initialized to -1, causing it to run through all remaining rows when the Count property is called.
Would you consider adding something like a MaxRows functionality to the CachedCsvReader implementation?
TIA, Stefan
|
|
|
|
|
Hum, right now I have other priorities, but one easy way to do what you want is to concatenate the first 20 lines into an in-memory string and then use a StringReader as the input to the CachedCsvReader. Not elegant, but it works.
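The workaround above can be sketched like this (OpenPreview is a hypothetical helper name, and the header-line handling is an assumption about the file layout):

```csharp
using System.IO;
using System.Text;
using LumenWorks.Framework.IO.Csv;

static class CsvPreview
{
    // Hypothetical helper: build a CachedCsvReader over only the
    // first maxRows data lines, as suggested in the reply above.
    public static CachedCsvReader OpenPreview(string path, int maxRows)
    {
        var sb = new StringBuilder();
        using (var reader = new StreamReader(path))
        {
            string line;
            // <= so the header line is included on top of maxRows
            for (int i = 0; i <= maxRows && (line = reader.ReadLine()) != null; i++)
                sb.AppendLine(line);
        }
        return new CachedCsvReader(new StringReader(sb.ToString()), true);
    }
}
```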
|
|
|
|
|
Hi, thanks anyway for your prompt response. Yes, that's the way I currently do it, and as you mentioned it is not very elegant, but it works.
Thanks, Stefan
|
|
|
|
|
I have seen worse in my life, though.
|
|
|
|
|
CSV Reader does not support Chinese
Thanks in advance!
|
|
|
|
|
|
|
[HOW TO] Modify CSV Reader to support display of Chinese (or any other language)
|
|
|
|
|
It does not need any modification; it does support Chinese. I've just tested it — the encoding of your CSV file must be UTF-8.
CsvFileDescription inputFileDescription = new CsvFileDescription
{
    SeparatorChar = ',',
    FirstLineHasColumnNames = true,
    TextEncoding = System.Text.Encoding.UTF8, // default is UTF-8
    FileCultureName = "zh-CN"                 // default is the current culture
};
Could you post your error messages and code?
|
|
|
|
|
I am not sure what I need to set for the parameters so it will not throw an error on this section of CSV data:
LumenWorks.Framework.IO.Csv.MalformedCsvException was unhandled by user code
Message=The CSV appears to be corrupt near record '105' field '2 at position '42'. Current raw data : '"
106, 224,"zzzAFFINITY - "PD" ","AFFINITY ", 144, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","F ", 0.00,"N "," ","10/07/1991 ","10/07/1991 "," ","01/01/1988 ","I ","M "," ", 0, 0.00, 0.00," "," "
107, 224,"zzzSPOCK ","KOHLINAHR ", 145, 12," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","M ", 0.00,"Y ","N ","04/05/2000 ","03/28/2000 "," ","01/01/1986 ","I ","G "," ", 0, 0.00, 0.00,"STAR, SNIP. LH & RH SOCKS, RF PASTERN "," "
108, 224,"zzzTIA **DEAD ","ACACIA ", 146, 12," ","E ","DAN WARMBLOOD ","EDAN WARMBLOOD ","CHES ","F ", 0.00,"N ","N ","05/12/1995 ","05/12/1995 "," ","01/01/1984 ","I ","M "," ", 0, 0.00, 0.00,"BLAZE, LH SOCK "," "
109, 224,"zzzMENDELSSOHN ** ","HORSE BELONGS TO F.E.D. ", 147, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","CHES ","M ", 0.00,"Y ","N ","11/15/1989 ","11/02/1989 "," ","01/01/1980 ","I ","G "," ", 0, 0.00, 0.00,"**SEE DIXON** "," "
110, 224,"zzzSCOTTY ","SCOTTISH CHARMER ", 148, 9," ","E ","THOROUGHBRED ","ETHOROUGHBRED ","BAY ","M ", 0.00,"Y "," "," "," "," ","01/01/1984 ","I ","G "," ", 0, 0.00, 0.00," "," "
111, 224,"zzzRIDDLES ","ERIDANUS ", 149, 9," ","E ","DAN WARMBLOOD ","EDAN WARMBLOOD ","DARK BROWN ","M ", 0.00,"Y ","N ","05/28/1996 ","05/28/1996 "," ","01/01/1984 ","I ","G "," ", 0, 0.00, 0.00," "," "
112, 223,"zzzFRUHWIND ","FRUHWIND - DECEASED ", 150, 9," ","E ","GERMAN TB ","EGERMAN TB ","BLACK ","M ", 0.00,"N "," ","07/09/1990 ","05/21/1990 "," ","01/01/1969 ","I ","S "," ", 0, 0.00, 0.00," "," "
113, 223,"zzzCASS ","CASTAGNA ", 151, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","F ", 0.00,"N "," "," "," "," ","01/01/1983 ","I ","M "," ", 0, 0.00, 0.00," "," "
114, 223,"WINDSOR ","WINDSOR '.
Source=LumenWorks.Framework.IO
CurrentFieldIndex=2
CurrentPosition=42
CurrentRecordIndex=105
RawData=(snip)
My guess is that the quotes around "PD" are causing the problem. Is there any way to fix this? Here is the constructor I am currently using:
using (var csv = new CsvReader(new StreamReader(Path.Combine(txtPath.Text, filename)), true, ',', '"', '\0', '#', ValueTrimmingOptions.All))
|
|
|
|
|
Is the last record only
114, 223,"WINDSOR ","WINDSOR '. ?
If so, that's your problem right there: it is missing some fields.
p.s. Saying hello, please, and thank you has a tendency to make people more willing to help you.
|
|
|
|
|
I am sorry. Here is my belated hello: Hello!
That record is complete; what I posted before is just what the exception printed. There are hundreds of records before it and thousands of records after it.
Here is the complete version of that block plus the previous two rows and the next two rows.
104, 224,"zzzLYNN ","CRYSTALLINE ", 142, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","F ", 0.00,"N "," ","10/07/1991 ","10/07/1991 "," ","01/01/1989 ","I ","M "," ", 0, 0.00, 0.00," "," "
105, 224,"zzzRIFLE ","FRUHREIF ", 143, 9," "," ","WARMBLOOD "," WARMBLOOD ","DARK BROWN ","M ", 0.00,"Y "," ","10/15/1990 ","10/12/1990 "," ","01/01/1986 ","I ","G "," ", 0, 0.00, 0.00," "," "
106, 224,"zzzAFFINITY - "PD" ","AFFINITY ", 144, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","F ", 0.00,"N "," ","10/07/1991 ","10/07/1991 "," ","01/01/1988 ","I ","M "," ", 0, 0.00, 0.00," "," "
107, 224,"zzzSPOCK ","KOHLINAHR ", 145, 12," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","M ", 0.00,"Y ","N ","04/05/2000 ","03/28/2000 "," ","01/01/1986 ","I ","G "," ", 0, 0.00, 0.00,"STAR, SNIP. LH & RH SOCKS, RF PASTERN "," "
108, 224,"zzzTIA **DEAD ","ACACIA ", 146, 12," ","E ","DAN WARMBLOOD ","EDAN WARMBLOOD ","CHES ","F ", 0.00,"N ","N ","05/12/1995 ","05/12/1995 "," ","01/01/1984 ","I ","M "," ", 0, 0.00, 0.00,"BLAZE, LH SOCK "," "
109, 224,"zzzMENDELSSOHN ** ","HORSE BELONGS TO F.E.D. ", 147, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","CHES ","M ", 0.00,"Y ","N ","11/15/1989 ","11/02/1989 "," ","01/01/1980 ","I ","G "," ", 0, 0.00, 0.00,"**SEE DIXON** "," "
110, 224,"zzzSCOTTY ","SCOTTISH CHARMER ", 148, 9," ","E ","THOROUGHBRED ","ETHOROUGHBRED ","BAY ","M ", 0.00,"Y "," "," "," "," ","01/01/1984 ","I ","G "," ", 0, 0.00, 0.00," "," "
111, 224,"zzzRIDDLES ","ERIDANUS ", 149, 9," ","E ","DAN WARMBLOOD ","EDAN WARMBLOOD ","DARK BROWN ","M ", 0.00,"Y ","N ","05/28/1996 ","05/28/1996 "," ","01/01/1984 ","I ","G "," ", 0, 0.00, 0.00," "," "
112, 223,"zzzFRUHWIND ","FRUHWIND - DECEASED ", 150, 9," ","E ","GERMAN TB ","EGERMAN TB ","BLACK ","M ", 0.00,"N "," ","07/09/1990 ","05/21/1990 "," ","01/01/1969 ","I ","S "," ", 0, 0.00, 0.00," "," "
113, 223,"zzzCASS ","CASTAGNA ", 151, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","F ", 0.00,"N "," "," "," "," ","01/01/1983 ","I ","M "," ", 0, 0.00, 0.00," "," "
114, 223,"WINDSOR ","WINDSOR ", 152, 9," ","E ","DAN WARMBLOOD ","EDAN WARMBLOOD ","CHES ","M ", 0.00,"Y "," ","03/12/1999 ","03/12/1999 "," ","01/01/1988 ","I ","G "," ", 0, 0.00, 0.00," "," "
115, 559,"MR. BROWN "," ", 153, 9," ","E ","CROSS ","ECROSS ","BAY ","M ", 0.00,"Y "," ","02/24/1998 ","02/24/1998 "," "," ","I ","G "," ", 0, 0.00, 0.00," "," "
116, 559,"zzzMR GOLD "," ", 154, 9," ","E ","UNKNOWN ","EUNKNOWN ","UNKNOWN "," ", 0.00," "," ","08/29/1989 "," "," "," ","I "," "," ", 0, 0.00, 0.00," "," "
Thank you for a great product and your quick response.
|
|
|
|
|
Thanks for your newfound amiability.
The error is caused by
"zzzAFFINITY - "PD" "
You need to escape quotes when they are inside quoted fields.
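Per RFC 4180, a quote inside a quoted field is escaped by doubling it, so the offending field would need to be written as:

```
"zzzAFFINITY - ""PD"" "
```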
|
|
|
|
|
Yes, I was pretty sure that was causing the problem; I mentioned it in my first post. I have no control over the format of the CSV I am reading. Fortunately I was able to work with the developer and he will now escape the quotes. However, had I not been able to work with him, are there any settings in your code that can be changed so it will not fail when it hits this type of unescaped quote? If not, there is no problem anyway, as the file will now be correctly escaped; I would just like to know for future reference.
Thank you for your patience, and for the great program,
Scott
|
|
|
|
|
The real question is how the parser is going to tell the difference between
"this is a quote",followed by comma inside one field","another field"
and
"this is a quoted field followed by a delimiter","another field"
We can guess that the first case is two fields, but in reality it should be three. Can you see how it gets hairy real quick? We enter the realm of heuristics... all that because people cannot follow a dead simple standard? It is scary stuff when you extrapolate that reality to our industry.
So the short answer is no, I am not going to support that: it is time-consuming to figure out the correct heuristic, and it kills performance. It also goes against my own principles. If you are really desperate, there is always Excel. It can handle pivot tables all right, but CSV files were apparently too hard to output correctly.
p.s. That answer is meant for the readers of this article at large, not you in particular. You can tell by the tone that it is a pet peeve of mine.
|
|
|
|
|
Hi, I have a follow-up question to this. I totally agree that there is no sense in trying to compensate for this malformed input line. But my question is: how is this error handled by your code?
Is an exception simply thrown, aborting the whole import job?
Or is there a way to continue the import and just skip the offending line?
My ideal solution would be to import all syntactically correct lines and, at the end, have a collection of the bad lines together with their line numbers. Then I could save the good lines in one table with as many columns as there are fields defined by the header row of the CSV file, while the bad lines would be stored in a separate table with just three columns: lineNo, line content, and error text.
Thanks for this promising tool and thanks in advance for a reply to my question.
Stefan
|
|
|
|
|
You can set the DefaultParseErrorAction property to ParseErrorAction.RaiseEvent and then listen to the ParseError event. There is an example in the "Custom error handling with events" scenario of my article.
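A minimal sketch of that approach ("bad.csv" is an assumption; the property and event names follow the reply above — check them against the article if in doubt):

```csharp
using System;
using System.IO;
using LumenWorks.Framework.IO.Csv;

// Skip malformed lines, collecting enough detail to review them later.
using (var csv = new CsvReader(new StreamReader("bad.csv"), true))
{
    csv.DefaultParseErrorAction = ParseErrorAction.RaiseEvent;
    csv.ParseError += (sender, e) =>
    {
        // Record the offending line, then resume with the next one.
        Console.WriteLine("Bad record {0}: {1}",
            e.Error.CurrentRecordIndex, e.Error.Message);
        e.Action = ParseErrorAction.AdvanceToNextLine;
    };

    while (csv.ReadNextRecord())
    {
        // process the good records here
    }
}
```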
|
|
|
|
|
Of course, in this particular case, a fixed-width field reader could be used instead of a comma-separated value reader. Unescaped embedded quotes are not a problem when the width of each field is fixed (and known).
|
|
|
|
|