|
Maybe I am missing something, but i just can't seem to figure out the problem with the last missing field. I've looked at the other messages related to this, but they don't seem to solve my issue.
Here's the sample of data (comma seperator, quote text qualifier, double quote escape, and MissingFieldAction.ParseError). 5 fields in each row, and record two has missing last field.
This sample does not throw error, but yet it is missing last seperator so last field is missing. Shouldn't this throw an error?
"Part Number","Part Description","Price","Weight","Information"
"0050010790","3/4"" LUG BOLT","3.73","",""
"0010095200","CYLINDER FRAME","588.75",""
"0010095224","CYLINDER GATE","726.06","",""
This sample thows an error, but at field 4, not field 5 even though it is only missing the last seperator. Why is there an error at field 4 if there is a seperator for it?
"Part Number","Part Description","Price","Weight","Information"
"0050010790","3/4"" LUG BOLT","3.73","",""
"0010095200","CYLINDER FRAME","588.75",
"0010095224","CYLINDER GATE","726.06","",""
Please help. Thanks.
|
|
|
|
|
While parsing the 4th field, the expected token is [delimiter], which is absent. The parser does not know if the field value is complete or not. In the example:
foo,bar,baz,Sébasti
"en" is missing at the end of my name in the 4th field. The reader cannot determine if the value is complete or not and so, it is pessimistic about it.
If you want to get the partial value and still know which fields are missing, use the MissingFieldAction.ReturnNullValue option instead. Missing fields will have a null value instead of String.Empty.
|
|
|
|
|
Yes, i see what you mean in your example, but if there are text qualifiers enclosing "Sébasti",
foo,bar,baz,"Sébasti"
it does not set the missing field to null. The missing field is has a value of string.empty.
|
|
|
|
|
I tried using a different text qualifier and escape character, but it exhibits the same behavior. As long as the last field is enclosed in the text qualifier, the missing field will have a String.Empty value.
Thanks for your help. Great proj by the way.
|
|
|
|
|
Well, the conditions are narrower than that. The behavior you see happens only for the last record before EOF and only if the last present field is quoted. I will look into that to correct that edge case. Thanks for reporting the bug
|
|
|
|
|
FYI ... I'm using VB.Net in my testing, it actually occured at any point. I discovered it while adding different test cases to the beginning and middle of my test file. I didn't realize it was a bug, I was hoping it was incorrect construct or properties I was using.
Thanks once again for your proj and help.
|
|
|
|
|
If possible, could you post your sample data, because I was not getting this behaviour in the quick tests I made after I read your previous post. I will be checking this issue more thoroughly later.
|
|
|
|
|
Here's some of the data I was testing with.
"Part Number","Part Description","Price","Weight","Information"
"0010095200-kd","CYLINDER FRAME..","588.75","weight"
"0010095224-kd","CYLINDER GATE CAMECO","726.a06","",""
"0020033326-kd","VALVE RELIEF","70.93","",""
"0020033327-kd"
"0020048048-kd","VALE STEM (20E)","117.07","",""
"0050001440-kd",6.80","",""
"0050009641-kd","BOLT (20e)","4.77",
"0050010035-kd","BOLT","",""
"0050010138-kd","LOCK NUT","2.64","",""
"0050010197-kd","NUT (20E)","1.01","",""
"0050010199-kd","LOCK NUT (20E)","4.05","",""
"","0.46","",""
"0050010204-kd","NUT","0.80","",""
"0050010790-kd","3/4"" LUG BOLT","3.73","",""
"005001218-kd","BUSHING CAMECO","12.06","",""
"0050022596-kd","LOCK NUT","47.43","",""
"0050025505-kd","","0.56","",""
"0050028027-kd","HUB AXLE NUT","7.85","",""
"0050028067-kd","HUB AXLE WASHER","4.77","",""
"CYLINDER BOLT","38.01","",""
"0050033315-kd","AXLE NUT","5.82","",""
"0050298275-kd","YOKE BOLT","65.06","",""
"0051366510-kd","THRUST WASHER","10.19","",""
"0051372113-kd","WASHER","3.17","",""
"B187","HITCH PIN (1.5"" X 17"")","","",""
"B439-9","WEAR PLATE","","",""
"1020837","GREASE FITTING","","",""
"R200-3","PIN","","",""
"037000535","Grease Zerk","0.50","",""
|
|
|
|
|
Hi Sebastien,
Thanks for you CSV Reader, it's been a great help. I've got the latest version.
Unfortunately, the data being read now includes double quotes within the columns, highlighted below.
"",10173,"Development Manager - Social Economy Sector","Trust Bank",10153,,"Lolalll Pudd","Meet the requirements of Structured Finance & Relationship Teams for "transaction support" at all times, ensuring appropriate analysis and clarity in respect of required/request","",09-Sep-2008,42000,38000,"+ LTA + bonus",,,,41000,45000,,,"","",,"","","",,,"",,"A","AUK,AUL,AUN,CBE,RBD"
This is now throwing an error. I've looked at many code examples but can't work out the best way to deal with this.
I'd like to change the Quote value, but this is a read only property.
Do you have some examples of how to change this value, or the escape value.
Thanks in advance.
Trevor
|
|
|
|
|
You can specify quote, escape, delimiter and comment characters in the constructor (look at the overloads). Unfortunately, there is no built-in way to deal with CSV file malformed the way you describe.
|
|
|
|
|
Thanks so much for speedy reply.
I did notice after searching the forum (something I should have done before), you had answered this question many times.
So my apologises.
I will look at the overloads.
Many Thanks
Trevor
|
|
|
|
|
Sorry to have to come back to you on an issue you have discussed many times.
However, for the past couple of days I have tried a number of combinations to escape the double quotes and I have not been able to resolve my issue.
Please note I'm using VB instead of C# (I hope this is not the problem)
Basically the csv file has 36 columns per row. One of the fields contains multilines and sometimes double quotes, which of course I know is my issue.
Using csv As New CsvReader(New StreamReader(App_Path() & "db\weblistold.csv"), False, ",", Chr(34), "\'", "#", False)
csv.SupportsMultiline = True
csv.SkipEmptyLines = True
Dim fieldCount As Integer = csv.FieldCount
End Using
This combination results in the error mentioned in the subject.
Using csv As New CsvReader(New StreamReader(App_Path() & "db\weblistold.csv"), False, ",", "~", "\'", "#", False)
Changing the quote value to a ~ for example. Results in only 14 columns per line, which causes a problem with column mapping. Because I need to use the double quotes for the text qualifier.
Any help or working example code / projects will be a great help.
|
|
|
|
|
I am willing to spend time to correct bugs in my reader, but not to add new features, sorry. So your options are to either modify the code to add the behavior you want or preprocess your file to escape the quotes inside the quoted fields.
I would strongly suggest you go with the later. The real problem you would have to solve if you go with the first option is trying to detect when a quote is really a quote instead of part of the field value. One way would be to check if the quote is followed by the delimiter character BUT that is *exactly* the point of quoting a field, ie allow delimiters inside a field. So you could very well end up having false positives because it happens that the delimiter following the quote is also part of the field value. See where this is going? It's not so simple...
By the way, are your files generated by MS SQL Server ? I try to see which products are so lazy as to not follow a 5-line spec. Excel and SQL Server are first on the line up to now.
|
|
|
|
|
I hope my request for support doesn't imply I’m requesting a new feature.
I completely understand your point re escaping the quotes before passing it to the fast reader, and I’ve looked into this. However, I’ve come across the same issue you mentioned above. How do you know if that double quote is valid?
I appreciate escaping quotes is not easy to do and even the OLEDB provider is unable to deal with this. It simply creates another column. This is a reason I looked towards your reader to help. I hoped the options available in the constructor, would deal with this scenario.
Are there any code examples available to help me along? Or I’m I correct in thinking that using a double quote for the text delimited and a single quote to escape with not work in the constructor.
Please note the extract is from an application called MircoDec and I’m not sure which database it uses.
Once again many thanks for your speedy response and assistance.
|
|
|
|
|
In the constructor, you can set any character, as long as delimiter and quote are different. Quote and escape can be the same. Comment is taken into account only when present at the beginning of a line.
Well, if you have control over the quote character, simply use some silly character which you are sure will not be in your fields. That way, you can keep the unescaped " in your fields.
|
|
|
|
|
I'm having this issue also, and I have no control over the source file. It could come from any number of sources, and will probably never be consistent in formatting.
As far as how to check if the quote is a field qualifier or if it's a quote within the field, why not just check the previous and next characters, and if neither is a delimiter assume it's a quote within a field? This seems like a relatively simple solution to me.. Then again, looking at the code I can't tell exactly where I'd need to make that change...
|
|
|
|
|
The solution you propose seems simple, but it breaks a lot of other cases in the current code on CodeProject. I have yet to try with the new library[^] that I published on CodePlex.
I think it will be easier for you to see where to change the code with that new library. For the moment, I have no time to implement a quirks mode (and I am still thinking about if I should do it at all).
|
|
|
|
|
Hi Sebastien,
I am having a similar problem. My delimiter is Tab.
The CSV appears to be corrupt near record error occur when any of the value within the field has a " symbol.
Is this normal?
I am currently manually remove the " symbol before parsing. Is there any alternative to ignore the " symbol?
|
|
|
|
|
I have the same problem. It is just a csv file created by Excel (ie save as csv)
I can't find any offending charactors, not " at all in the file.
It goes up to about 60 lines and then it reads the next set of 10 lines or so as a single line.
Somebody said not to reinvent the wheel. But if all else fails I have to do that.
Amara
|
|
|
|
|
Hi, I have used your library before and it works great, so thanks for all the effort you have put into it.
I have a problem in that it doesn't appear to accept certain characters, it maybe something I am doing wrong, but I have pulled the code out into a seperate application to test and the issue still appears.
Basically I have a very simple CSV file like so:
a,b,c
£,a,a
1,2,3
Yet when the library attempts to read the second line, then it displays a "?" symbol rather than the "£" symbol. There maybe other characters that cause the same behaviour.
Help suggestions would be much appreciated!
Thanks
|
|
|
|
|
This is an encoding issue. Your need to save your file as Unicode (UTF-8 or whatever) or otherwise provide the codepage to the StreamReader (probably Windows-1252).
using (CsvReader csv = new CsvReader(new StreamReader("data.csv", Encoding.GetEncoding(1252)), false, ';'))
{
}
|
|
|
|
|
I suspected it was down to me!
Thanks that worked, and thanks for the speedy reply!
|
|
|
|
|
First, I love your CSV reader. It is both fast and elegant.
I have a csv file that has two lines at the bottom, each containing just a single space. The CSVReader reads those two lines as records, with all fields empty. Is there a way to configure the reader to treat such lines as empty lines? This type of lines occur often, especially at the bottom, where the creator of the file accidentally has added a single space.
My code is:
using (StreamReader stream = new StreamReader(filePath))
{
using (CsvReader csv = new CsvReader(stream, false))
{
csv.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;
csv.SkipEmptyLines = true;
csv.SupportsMultiline = false;
csv.DefaultParseErrorAction = ParseErrorAction.AdvanceToNextLine;
IDataReader reader = csv;
while (reader.Read())
{
TdAccount account = GetAccountFromReader(reader);
accounts.Add(account);
}
reader.Close();
}
}
Thanks!
Roel
modified on Thursday, October 23, 2008 9:12 AM
|
|
|
|
|
That feature is not supported. One thing you could do is test that all fields are empty, or modify the source code to add this behavior. The method you want to modify is CsvReader.DoSkipEmptyAndCommentedLines(ref int pos).
|
|
|
|
|
Hi,
I got a CSV file with the following content:
Column1,Column2
2,4
3
4,55,66
5
6
you can see the line 4: "4,55,66" is malformed,
when I use the exception setting as
oCsvReader.DefaultParseErrorAction = ParseErrorAction.ThrowException;
oCsvReader.MissingFieldAction = MissingFieldAction.ReturnPartiallyParsedValue;
I got the follwong result:
Column1 Column2
2 4
3
4 55
66
5
6
Which is not what I want,
is it possible to throw a Exception at this situation, while line3: "3" won't throw a Exception (as MissingFieldAction.ReturnPartiallyParsedValue works)
Many thanks for this!!
|
|
|
|
|