|
I have seen worse in my life tho
|
|
|
|
|
CSV Reader does not support Chinese
Thanks in advance!
|
|
|
|
|
|
|
[HOW TO] Modify CSV Reader to support display of Chinese (or any other language)
|
|
|
|
|
It does not need any modification; it does support Chinese. I tested it just now. The encoding of your CSV file must be UTF-8.
CsvFileDescription inputFileDescription = new CsvFileDescription
{
    SeparatorChar = ',',
    FirstLineHasColumnNames = true,
    TextEncoding = System.Text.Encoding.UTF8, // default is UTF-8
    FileCultureName = "zh-CN" // default is the current culture
};
Could you post your error messages and code?
|
|
|
|
|
I am not sure which parameters I need to set so that it will not throw an error on this section of CSV data:
LumenWorks.Framework.IO.Csv.MalformedCsvException was unhandled by user code
Message=The CSV appears to be corrupt near record '105' field '2 at position '42'. Current raw data : '"
106, 224,"zzzAFFINITY - "PD" ","AFFINITY ", 144, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","F ", 0.00,"N "," ","10/07/1991 ","10/07/1991 "," ","01/01/1988 ","I ","M "," ", 0, 0.00, 0.00," "," "
107, 224,"zzzSPOCK ","KOHLINAHR ", 145, 12," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","M ", 0.00,"Y ","N ","04/05/2000 ","03/28/2000 "," ","01/01/1986 ","I ","G "," ", 0, 0.00, 0.00,"STAR, SNIP. LH & RH SOCKS, RF PASTERN "," "
108, 224,"zzzTIA **DEAD ","ACACIA ", 146, 12," ","E ","DAN WARMBLOOD ","EDAN WARMBLOOD ","CHES ","F ", 0.00,"N ","N ","05/12/1995 ","05/12/1995 "," ","01/01/1984 ","I ","M "," ", 0, 0.00, 0.00,"BLAZE, LH SOCK "," "
109, 224,"zzzMENDELSSOHN ** ","HORSE BELONGS TO F.E.D. ", 147, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","CHES ","M ", 0.00,"Y ","N ","11/15/1989 ","11/02/1989 "," ","01/01/1980 ","I ","G "," ", 0, 0.00, 0.00,"**SEE DIXON** "," "
110, 224,"zzzSCOTTY ","SCOTTISH CHARMER ", 148, 9," ","E ","THOROUGHBRED ","ETHOROUGHBRED ","BAY ","M ", 0.00,"Y "," "," "," "," ","01/01/1984 ","I ","G "," ", 0, 0.00, 0.00," "," "
111, 224,"zzzRIDDLES ","ERIDANUS ", 149, 9," ","E ","DAN WARMBLOOD ","EDAN WARMBLOOD ","DARK BROWN ","M ", 0.00,"Y ","N ","05/28/1996 ","05/28/1996 "," ","01/01/1984 ","I ","G "," ", 0, 0.00, 0.00," "," "
112, 223,"zzzFRUHWIND ","FRUHWIND - DECEASED ", 150, 9," ","E ","GERMAN TB ","EGERMAN TB ","BLACK ","M ", 0.00,"N "," ","07/09/1990 ","05/21/1990 "," ","01/01/1969 ","I ","S "," ", 0, 0.00, 0.00," "," "
113, 223,"zzzCASS ","CASTAGNA ", 151, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","F ", 0.00,"N "," "," "," "," ","01/01/1983 ","I ","M "," ", 0, 0.00, 0.00," "," "
114, 223,"WINDSOR ","WINDSOR '.
Source=LumenWorks.Framework.IO
CurrentFieldIndex=2
CurrentPosition=42
CurrentRecordIndex=105
RawData=(snip)
My guess is it is the quotes around "PD" that are causing the problem. Is there any way to fix this? Here is what I am currently using for my constructor.
using (var csv = new CsvReader(new StreamReader(Path.Combine(txtPath.Text, filename)), true, ',', '"', '\0', '#', ValueTrimmingOptions.All))
|
|
|
|
|
Is the last record only
114, 223,"WINDSOR ","WINDSOR '. ?
If so, that's your problem right there: it is missing some fields.
P.S. Saying hello, please, and thank you tends to make people more willing to help you.
|
|
|
|
|
I am sorry. Here is my belated hello. Hello!
That record is complete; what I posted before is just what the exception printed. There are hundreds of records before it and thousands of records after it.
Here is the complete version of that block, plus the previous two rows and the next two rows.
104, 224,"zzzLYNN ","CRYSTALLINE ", 142, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","F ", 0.00,"N "," ","10/07/1991 ","10/07/1991 "," ","01/01/1989 ","I ","M "," ", 0, 0.00, 0.00," "," "
105, 224,"zzzRIFLE ","FRUHREIF ", 143, 9," "," ","WARMBLOOD "," WARMBLOOD ","DARK BROWN ","M ", 0.00,"Y "," ","10/15/1990 ","10/12/1990 "," ","01/01/1986 ","I ","G "," ", 0, 0.00, 0.00," "," "
106, 224,"zzzAFFINITY - "PD" ","AFFINITY ", 144, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","F ", 0.00,"N "," ","10/07/1991 ","10/07/1991 "," ","01/01/1988 ","I ","M "," ", 0, 0.00, 0.00," "," "
107, 224,"zzzSPOCK ","KOHLINAHR ", 145, 12," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","M ", 0.00,"Y ","N ","04/05/2000 ","03/28/2000 "," ","01/01/1986 ","I ","G "," ", 0, 0.00, 0.00,"STAR, SNIP. LH & RH SOCKS, RF PASTERN "," "
108, 224,"zzzTIA **DEAD ","ACACIA ", 146, 12," ","E ","DAN WARMBLOOD ","EDAN WARMBLOOD ","CHES ","F ", 0.00,"N ","N ","05/12/1995 ","05/12/1995 "," ","01/01/1984 ","I ","M "," ", 0, 0.00, 0.00,"BLAZE, LH SOCK "," "
109, 224,"zzzMENDELSSOHN ** ","HORSE BELONGS TO F.E.D. ", 147, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","CHES ","M ", 0.00,"Y ","N ","11/15/1989 ","11/02/1989 "," ","01/01/1980 ","I ","G "," ", 0, 0.00, 0.00,"**SEE DIXON** "," "
110, 224,"zzzSCOTTY ","SCOTTISH CHARMER ", 148, 9," ","E ","THOROUGHBRED ","ETHOROUGHBRED ","BAY ","M ", 0.00,"Y "," "," "," "," ","01/01/1984 ","I ","G "," ", 0, 0.00, 0.00," "," "
111, 224,"zzzRIDDLES ","ERIDANUS ", 149, 9," ","E ","DAN WARMBLOOD ","EDAN WARMBLOOD ","DARK BROWN ","M ", 0.00,"Y ","N ","05/28/1996 ","05/28/1996 "," ","01/01/1984 ","I ","G "," ", 0, 0.00, 0.00," "," "
112, 223,"zzzFRUHWIND ","FRUHWIND - DECEASED ", 150, 9," ","E ","GERMAN TB ","EGERMAN TB ","BLACK ","M ", 0.00,"N "," ","07/09/1990 ","05/21/1990 "," ","01/01/1969 ","I ","S "," ", 0, 0.00, 0.00," "," "
113, 223,"zzzCASS ","CASTAGNA ", 151, 9," ","E ","SW WARMBLOOD ","ESW WARMBLOOD ","BAY ","F ", 0.00,"N "," "," "," "," ","01/01/1983 ","I ","M "," ", 0, 0.00, 0.00," "," "
114, 223,"WINDSOR ","WINDSOR ", 152, 9," ","E ","DAN WARMBLOOD ","EDAN WARMBLOOD ","CHES ","M ", 0.00,"Y "," ","03/12/1999 ","03/12/1999 "," ","01/01/1988 ","I ","G "," ", 0, 0.00, 0.00," "," "
115, 559,"MR. BROWN "," ", 153, 9," ","E ","CROSS ","ECROSS ","BAY ","M ", 0.00,"Y "," ","02/24/1998 ","02/24/1998 "," "," ","I ","G "," ", 0, 0.00, 0.00," "," "
116, 559,"zzzMR GOLD "," ", 154, 9," ","E ","UNKNOWN ","EUNKNOWN ","UNKNOWN "," ", 0.00," "," ","08/29/1989 "," "," "," ","I "," "," ", 0, 0.00, 0.00," "," "
Thank you for a great product and your quick response.
|
|
|
|
|
Thanks for your newfound amiability.
The error is caused by:
"zzzAFFINITY - "PD" "
You need to escape quotes when they are inside quoted fields.
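As an illustration of what "escaping" means here, the usual convention (the one described in RFC 4180) is to double every quote character that appears inside a quoted field, so the offending field would be written as "zzzAFFINITY - ""PD"" ". The helper below is only a sketch for whoever produces the file; the class and method names are hypothetical, and it assumes the consumer is configured to treat a doubled quote as an escaped quote.

```csharp
using System;

static class CsvEscaping
{
    // Wrap a value in quotes and double any embedded quote characters,
    // so that a conforming CSV parser reads the value back unchanged.
    public static string QuoteField(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

Calling CsvEscaping.QuoteField on the raw value zzzAFFINITY - "PD" (with its trailing space) produces the doubled-quote form shown above.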
|
|
|
|
|
Yes, I was pretty sure that was causing the problem; I mentioned that in my first post. I have no control over the format of the CSV I am reading from, but fortunately I was able to work with the developer, and he will now escape the quotes. However, had I not been able to work with him, are there any settings in your code that can be changed so it will not fail when it hits this type of unescaped quote? If not, there is no problem anyway, as the file will now be correctly escaped; I would just like to know for future reference.
Thank you for your patience, and for the great program,
Scott
|
|
|
|
|
The real question is how the parser is going to tell the difference between
"this is a quote",followed by comma inside one field","another field"
and
"this is a quoted field followed by a delimiter","another field"
We can guess that the first case is 2 fields, but in reality, it should be 3 fields. Can you see how it gets hairy real quick? We enter the realm of heuristics .. all that because people cannot follow a dead simple standard? This is scary stuff when you extrapolate that reality to our industry.
So the short answer is no, I am not going to support that: it is time-consuming to figure out the correct heuristic, and it kills performance. Also, it goes against my own principles. If you are really desperate, there's always Excel. It can handle pivot tables all right, but CSV files were apparently too hard to output correctly.
P.S. That answer is meant for the readers of this article at large, not you in particular. You can tell by the tone that it is a pet peeve of mine.
|
|
|
|
|
Hi, I have a follow-up question to this. I totally agree that there is no sense in trying to compensate for this malformed input line. But my question is: how is this error handled by your code?
Is there just an exception thrown and then the whole import job gets aborted?
Or is there a way to continue the import and just omit the offending line?
My ideal solution would be to import all syntactically correct lines and, at the end, have a collection of the bad lines together with their line numbers. Then I could save the good lines in one table with as many columns as there are fields defined by the header row of the CSV file, whereas the bad lines could be stored in a separate table with just 3 columns: lineNo, content of the line, error text.
Thanks for this promising tool and thanks in advance for a reply to my question.
Stefan
|
|
|
|
|
You can set the property DefaultParseErrorAction to ParseErrorAction.RaiseEvent and then listen to the ParseError event. There is an example in the "Custom error handling with events" scenario of my article.
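A rough sketch of how that could be wired up to collect the bad records while importing the good ones. DefaultParseErrorAction, ParseErrorAction.RaiseEvent, and ParseError are the names given above; the other members used here (ParseErrorEventArgs.Error, ParseErrorEventArgs.Action, ParseErrorAction.AdvanceToNextLine, ReadNextRecord, FieldCount) are as I recall them from the article, so please check them against your version of the library. The TolerantImport class and its parameters are hypothetical, and the code is untested.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using LumenWorks.Framework.IO.Csv;

static class TolerantImport
{
    // Hypothetical helper: passes each readable record to onRecord and
    // returns a description of every parse error instead of aborting.
    public static List<string> Read(string path, Action<string[]> onRecord)
    {
        var errors = new List<string>();
        using (var csv = new CsvReader(new StreamReader(path), true))
        {
            // Raise the ParseError event instead of throwing an exception.
            csv.DefaultParseErrorAction = ParseErrorAction.RaiseEvent;
            csv.ParseError += (sender, e) =>
            {
                // The exception reports a record index and a buffer position,
                // not the text of the offending line.
                errors.Add(string.Format("record {0}: {1}",
                    e.Error.CurrentRecordIndex, e.Error.Message));
                // Assumption: skip the rest of the bad line and carry on.
                e.Action = ParseErrorAction.AdvanceToNextLine;
            };

            int fieldCount = csv.FieldCount;
            while (csv.ReadNextRecord())
            {
                var fields = new string[fieldCount];
                for (int i = 0; i < fieldCount; i++)
                    fields[i] = csv[i];
                onRecord(fields);
            }
        }
        return errors;
    }
}
```

Since fields are read lazily, an error can surface partway through a record; it is worth checking how partially read records show up in your data before relying on this for a production import.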
|
|
|
|
|
Of course, in this particular case, a reader that reads fields of fixed width could be used instead of a comma-separated value reader. There is no problem with unescaped embedded quotes if the width of the field is fixed (and known).
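To make the idea concrete, here is a minimal sketch of fixed-width splitting in plain C#. It does not use any CSV library; the FixedWidth class and the column widths are hypothetical, and for the file in this thread the widths would also have to account for the separators and quote characters between the values.

```csharp
using System;

static class FixedWidth
{
    // Cut a line into fields using known column widths. Quote characters
    // are treated as ordinary data, so unescaped quotes cannot confuse it.
    public static string[] Split(string line, int[] widths)
    {
        var fields = new string[widths.Length];
        int start = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            // Clamp so a short line yields empty trailing fields
            // instead of throwing ArgumentOutOfRangeException.
            int begin = Math.Min(start, line.Length);
            int length = Math.Min(widths[i], line.Length - begin);
            fields[i] = line.Substring(begin, length).Trim();
            start += widths[i];
        }
        return fields;
    }
}
```

The trade-off is that the layout must be known in advance and must never change, which is why this only helps when the producer really does pad every field to a fixed width.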
|
|
|
|
|
Good observation. My new library NLight has such a reader (FixedWidthRecordReader). It can be found on CodePlex: http://nlight.codeplex.com.
|
|
|
|
|
|
Thanks for sharing this. It has saved me a bunch of development time!
|
|
|
|
|
Sebastien, this is an excellent program, but the error messages are difficult to understand.
For example, here is an error I got:
The CSV appears to be corrupt near record '20' field '109 at position '2418'
Then there are several records displayed, but I have no idea which one of these records is the record with the error.
Would you display just the record with the error?
Thanks again for an excellent program!
|
|
|
|
|
Due to the design of the class (fields are read one by one), it would impact performance greatly to do what you ask for in a systematic way, most notably when records overlap buffers. Reporting the position where the error occurred in the current buffer is the compromise I chose to make. My new reader included in NLight (http://nlight.codeplex.com) has a different design (records are read one by one), so reporting errors is much simpler for me and more convenient for you.
|
|
|
|
|
So are you saying your NLight project is a better version of this?
|
|
|
|
|
Yes, definitely. This reader is a bit faster than the new one, but the price of that bit of speed is high. NLight's design is much cleaner and more extensible.
|
|
|
|
|
Right on, I'll check it out.
|
|
|
|
|
Hi,
I'm trying to make it work in Visual C#, but the project needs to be converted. After I ran the conversion, it reported some errors.
Should I put the binaries in some folder first? Is there any other step to do beforehand?
I'm a novice in C#, could you please give me a hand on this?
Thanks in advance!
Nacho
|
|
|
|
|
I am sorry, but that is outside the scope of the support I am willing to provide for this article. On the other hand, since you are using VS2010, you may want to check out my new library at http://nlight.codeplex.com. Have a look at the NLight.IO.Text.DelimitedRecordReader class.
|
|
|
|
|