Hello,
When I try to instantiate the CsvReader like this:
using (CsvReader csv = new CsvReader(new StreamReader(TBFileName.Text), true, ',', '"', '\\'))
I get the following error:
'LumenWorks.Framework.IO.Csv.CsvReader' does not contain a constructor that takes 5 arguments
Do you have any insight as to what I might be doing wrong?
Thanks,
Mitch
|
Yes, actually I do. There is, believe it or not, not a single constructor that takes 5 arguments in the CsvReader class.
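For reference, a sketch of a call that should line up with an actual overload (hedged: the exact signature depends on the library version; in the versions I have seen, the full constructor also takes a comment character and a trimming option, so there is no 5-argument form in between):

```csharp
using (CsvReader csv = new CsvReader(
    new StreamReader(TBFileName.Text),
    true,   // hasHeaders
    ',',    // delimiter
    '"',    // quote
    '\\',   // escape
    '#',    // comment character -- the parameter the original call left out
    ValueTrimmingOptions.UnquotedOnly))
{
    // ...
}
```

(In older releases the last parameter is a bool trimSpaces rather than a ValueTrimmingOptions value; check the overloads in your copy of the assembly.)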
|
I do not know if it is just my problem, however I have been struggling.
When I have a parsing error, for example:
xx, "xx"", x, 1, 1
dd, d, 1, 1,1
it does successfully throw an error and advance to the next line. However, the parsed result comes out like the following:
xx, xx", dd, d, 1
which somehow concatenates the two lines. How can I simply discard the current record and start with the next one?
Thanks so much
|
You need to specify the default parse error action as follows:
using (var csv = new CsvReader(...))
{
    csv.DefaultParseErrorAction = ParseErrorAction.AdvanceToNextLine;
    while (csv.ReadNextRecord())
    {
    }
}
|
Thanks for the quick reply.
I did exactly the same thing:
csv.DefaultParseErrorAction = ParseErrorAction.RaiseEvent;
csv.ParseError += csv_ParseError;
while (csv.ReadNextRecord())
{
    string[] drs = new string[150];
    if (!csv.ParseErrorFlag && csv.FieldCount == 150)
    {
        csv.CopyCurrentRecordTo(drs);
        if (drs.Length == 150)
        {
            for (int i = 0; i < drs.Length; i++)
            {
                HttpContext.Current.Response.Write(drs[i] + " ");
            }
        }
    }
}
|
Just raising this question again for the latest version: "Is it possible to configure the reader so consecutive delimiters are read as one delimiter?"
For example, the data set looks like:
male 23 student
male 36 teacher
The delimiter is a space, but there is one additional space between columns #2 and #3.
|
My question is: how can the reader distinguish between real empty fields and mere unwanted delimiters?
I'm sorry to tell you that, but the only option I could realistically offer is a quirks mode for Excel. Seriously, how hard is it for the Office team to follow a five-line standard which is probably older than I am?
Yet, I am willing to do this for my new project at http://nlight.codeplex.com, but not for the library offered here: it's a lost cause. I wrote that code assuming well-formed data (how naive of me!) and it's a real PITA when I try to work around Excel/SQL Server idiocies. For that reason, the NLight project will become the 4.0 version of this article rather sooner than later.
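For a whitespace-delimited file like the one above, one workaround is to collapse the delimiter runs outside the reader, at the cost of swallowing genuinely empty fields. A minimal sketch in plain .NET (no CsvReader involved):

```csharp
using System;

class CollapseDelimitersSketch
{
    public static string[] SplitCollapsing(string line, char delimiter)
    {
        // RemoveEmptyEntries drops the zero-length fields produced by
        // consecutive delimiters, effectively treating a run of
        // delimiters as a single one.
        return line.Split(new[] { delimiter },
                          StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        string[] fields = SplitCollapsing("male 23  student", ' ');
        // fields is now { "male", "23", "student" } -- but note that a
        // genuinely empty field would be swallowed too, which is
        // exactly the ambiguity described above.
        Console.WriteLine(string.Join("|", fields)); // prints male|23|student
    }
}
```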
|
Not sure whether this is a perfect solution, but at least it works in my case.
In CsvReader.cs, in
private string ReadField(int field, bool initializing, bool discardValue)
just before the line
_nextFieldIndex = Math.Max(index + 1, _nextFieldIndex);
add:
if (_skipEmptyField && string.IsNullOrEmpty(value))
    continue;
|
Have you run the unit tests?
|
Ah, I see, you added your own flag. Well, that's basically part of the quirks mode I was talking about, but there is much more to it. Your request is one of many... It happens that this particular one is simple, but on the other hand, as far as I remember, you are only the second person to ask for it. The most requested ones are non-escaped quote characters in fields and extra commas at the end of records.
So I'm glad you went ahead and added code to handle your scenario. It will surely become part of the larger Excel "fix"...
|
I have blank fields in my header row, courtesy of converting from Excel to CSV.
I get something like this: field1,field2,field3,,,
I only want to include the 3 fields. However, if any of the records contain a null or empty value I want to include them, up to the 3 fields only.
i.e. 3,,3,,, would just include 3,,3
Does the MissingFieldAction apply to headers and records, or just headers?
|
I am not sure what exactly your CSV looks like. Can you post a short sample which includes the headers?
|
Here is my CSV file. It gets converted from Excel. According to your reader there are 25 fields. However, I only want the headings that have a value (should be 19); DocumentDate is the last field in the Excel file. The conversion from Excel to CSV seems to add empty fields.
Also, I don't want the malformed or empty rows at the end.
This should be the last record:
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,64.3,8.04,USD,0.6713,10160-1,6/3/2010,6/3/2010
===============
csv file below
===============
SailDate,SourceCompanyName,Voyage,Vessel,Port,CountryDestination,SupplierName,SalesPoint,SalesContrNumber,DiscPort,LotNumber,CostType,NetCost,GST,Currency,ExchangeRate,SupplierInvoiceNumber,DueDate,DocumentDate,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6370,Susaki,4,MARKETING FEE,245.21,30.65,USD,0.6713,10160-4,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6370,Susaki,4,MARKETING FEE,519.86,64.98,USD,0.6713,10160-4,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6371,Susaki,3,MARKETING FEE,395.63,49.45,USD,0.6713,10160-3,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6371,Susaki,3,MARKETING FEE,291.74,36.47,USD,0.6713,10160-3,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6371,Susaki,3,MARKETING FEE,144.21,18.03,USD,0.6713,10160-3,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6372,Susaki,2,MARKETING FEE,684.26,85.53,USD,0.6713,10160-2,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6372,Susaki,2,MARKETING FEE,815.42,101.93,USD,0.6713,10160-2,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6372,Susaki,2,MARKETING FEE,21.12,2.64,USD,0.6713,10160-2,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,855.85,106.98,USD,0.6713,10160-1,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,300.48,37.56,USD,0.6713,10160-1,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,64.3,8.04,USD,0.6713,10160-1,6/3/2010,6/3/2010,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
|
I think you should rather concentrate your efforts on making Excel output the file without the extra empty fields/records. I loaded your file in Excel, saved it as .xlsx and saved it back to CSV, and the extra commas were gone.
|
Well, the problem is that the initial export from Excel to CSV is done automatically, and even when I do it manually I still get the extra commas.
Yeah, if I open it in Excel again and then save as CSV again it works; however, converting a document twice seems redundant.
Is there a place in your code where I can add logic to ignore/stop counting/adding values to the field collection when it encounters an empty/null field?
Any suggestions would be helpful. Long live Excel...
|
You can write some custom code in ReadNextRecord, just before the line 1521 _fieldCount++;. You can check the _fields variable and update _fieldCount to be the last field index that has a non-empty value (note the break, so that only the trailing run of empty fields is dropped):
for (int i = _fieldCount; i >= 0; i--)
{
    if (string.IsNullOrEmpty(_fields[i]))
        _fieldCount--;
    else
        break;
}
_fieldCount++;
The reader will still read the empty records at the end of the file, but I guess you can detect those easily in your code.
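Outside the reader, the same trailing-trim idea can be sketched as a standalone helper (hypothetical name, not part of CsvReader):

```csharp
using System;

static class TrailingFieldTrimmer
{
    // Returns the number of fields that remain after dropping the
    // trailing run of empty fields; empty fields in the middle of the
    // record are preserved.
    public static int EffectiveFieldCount(string[] fields)
    {
        int count = fields.Length;
        for (int i = fields.Length - 1; i >= 0; i--)
        {
            if (string.IsNullOrEmpty(fields[i]))
                count--;
            else
                break; // stop at the last non-empty field
        }
        return count;
    }

    static void Main()
    {
        // "3,,3,,," should keep the first three fields only.
        string[] record = "3,,3,,,".Split(',');
        Console.WriteLine(EffectiveFieldCount(record)); // prints 3
    }
}
```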
|
Thanks, I will try that. Will doing this also prevent the "key name already exists" error caused by adding duplicate header names to the index?
|
That error was caused by trying to add many empty headers, so yes.
|
Great! One more thing: will this code be executed just for the header row? My data might have null/empty values for records in a given valid field, and I wouldn't want to lose out on getting that data.
|
The code I posted will in effect make it appear as if your CSV has 19 fields instead of 25, so the last 6 fields will always be skipped. The other 19 are read like any others, so empty values will always come out as string.Empty.
|
Great! My last question...
I am getting empty rows in my DataTable even though I set csv.SkipEmptyLines = true. Do I need to change MissingFieldAction to ReplaceByEmpty instead of ReplaceByNull? I added my code below.
Btw, you should sell this tool. You may want to break some of the code into smaller methods.
private void LoadFilesToDatabase(List<BaseHelper.UploadData> files)
{
    foreach (BaseHelper.UploadData file in files)
    {
        DataTable dt = new DataTable();
        DataRow dr;
        using (CsvReader csv = new CsvReader(new StreamReader(file.FilePath + ".csv"), true))
        {
            csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
            csv.SkipEmptyLines = true;
            int fieldCount = csv.FieldCount;
            string[] headers = csv.GetFieldHeaders();
            foreach (string header in headers)
            {
                dt.Columns.Add(new DataColumn(header));
            }
            while (csv.ReadNextRecord())
            {
                dr = dt.NewRow();
                for (int i = 0; i < fieldCount; i++)
                {
                    dr[i] = csv[i];
                }
                dt.Rows.Add(dr);
            }
        }
        string destinationTable = "DataServices" + file.FileName.ToLower().Replace(".csv", "") + "CSV";
        BulkAddDatatableToSQLServer(dt, destinationTable);
    }
}
|
I think I should put more emphasis on the fact that the CsvReader implements the IDataReader interface. Your code can be rewritten as below (warning: notepad IDE). I added the code to skip the empty records at the end. There is no way the reader can skip them by itself, since they are valid records. As for refactoring, have a look at http://nlight.codeplex.com.
private void LoadFilesToDatabase(List<BaseHelper.UploadData> files)
{
    foreach (BaseHelper.UploadData file in files)
    {
        DataTable dt = new DataTable();
        using (CsvReader csv = new CsvReader(new StreamReader(file.FilePath + ".csv"), true))
        {
            csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
            csv.SkipEmptyLines = true;
            dt.Load(csv);
        }
        // Iterate backwards so removing a row does not shift the
        // indexes of the rows still to be checked. The "as string"
        // cast also handles DBNull values produced by ReplaceByNull.
        for (int i = dt.Rows.Count - 1; i >= 0; i--)
        {
            if (string.IsNullOrEmpty(dt.Rows[i][0] as string))
                dt.Rows.RemoveAt(i);
        }
        string destinationTable = "DataServices" + file.FileName.ToLower().Replace(".csv", "") + "CSV";
        BulkAddDatatableToSQLServer(dt, destinationTable);
    }
}
|
Hey, just want to say thanks for everything.
With my conversion code, your tool and SqlBulkCopy, the following took 1.2 seconds:
1) convert 3 Excel files (10,000 records each) to CSV
2) add the records to a DataTable (btw, adding them row by row after checking for null is faster than DataTable.Load(csv) and deleting the empty rows afterwards)
3) add the 30,000 records to the database (3 separate tables, 10,000 records each)
|
You're welcome!
Yeah, deleting the empty records from the DB is another way. In that case, you could have used the CsvReader directly with SqlBulkCopy, instead of loading the records into a DataTable first. Not that for 30,000 records it matters much anyway...
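Since CsvReader implements IDataReader, it can be handed straight to SqlBulkCopy.WriteToServer. A rough sketch (the connection string and table name are placeholders, and this skips the empty-record filtering discussed above):

```csharp
using System.Data.SqlClient;
using System.IO;
using LumenWorks.Framework.IO.Csv;

class BulkLoadSketch
{
    static void BulkLoad(string csvPath, string connectionString, string destinationTable)
    {
        using (var csv = new CsvReader(new StreamReader(csvPath), true))
        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = destinationTable;
            // WriteToServer accepts any IDataReader, so the records
            // stream from the file to the server without an
            // intermediate DataTable.
            bulkCopy.WriteToServer(csv);
        }
    }
}
```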