|
My question is: how can the reader distinguish between real empty fields and mere unwanted delimiters?
I'm sorry to tell you this, but the only option I could realistically offer is a quirk mode for Excel. Seriously, how hard is it for the Office team to follow a five-line standard that is probably older than I am?
Yet, I am willing to do this for my new project at http://nlight.codeplex.com, but not for the library offered here: it's a lost cause. I wrote that code assuming well-formed data (how naive of me!), and it's a real PITA when I try to work around Excel/SQL Server idiocies. For that reason, the NLight project will become the 4.0 version of this article sooner rather than later.
|
|
|
|
|
Not sure whether this is a perfect solution, but at least it works in my case.
In CsvReader.cs, in private string ReadField(int field, bool initializing, bool discardValue), just before the line _nextFieldIndex = Math.Max(index + 1, _nextFieldIndex); add:
if (_skipEmptyField && string.IsNullOrEmpty(value))
    continue;
(_skipEmptyField is a flag I added to the reader myself.)
|
|
|
|
|
Have you run the unit tests?
|
|
|
|
|
|
Ah I see, you added your own flag. Well, that's basically part of the quirk mode I was talking about, but there is much more to it. Your request is one of many. It happens that this particular one is simple, but on the other hand, as far as I remember, you are only the second person to ask for it. The most requested ones are non-escaped quote characters in fields and extra commas at the end of records.
So I'm glad you went ahead and added code to handle your scenario. It will surely become part of the larger Excel "fix".
|
|
|
|
|
I have blank fields in my header row, courtesy of converting from Excel to CSV.
I get something like this: field1,field2,field3,,,
I only want to include the 3 fields. However, if any of the records contain a null or empty value, I want to include them, up to the 3 fields only.
For example, 3,,3,,, would just include 3,,3.
Does the MissingFieldAction apply to headers and records or just headers?
|
|
|
|
|
I am not sure exactly what your CSV looks like. Can you post a short sample that includes the headers?
|
|
|
|
|
Here is my CSV file; it gets converted from Excel. According to your reader there are 25 fields; however, I only want the headings that have a value (there should be 19). DocumentDate is the last field in the Excel file, and the conversion from Excel to CSV seems to add the empty fields.
Also, I don't want the malformed or empty rows at the end.
This should be the last record:
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,64.3,8.04,USD,0.6713,10160-1,6/3/2010,6/3/2010
===============
csv file below
===============
SailDate,SourceCompanyName,Voyage,Vessel,Port,CountryDestination,SupplierName,SalesPoint,SalesContrNumber,DiscPort,LotNumber,CostType,NetCost,GST,Currency,ExchangeRate,SupplierInvoiceNumber,DueDate,DocumentDate,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6370,Susaki,4,MARKETING FEE,245.21,30.65,USD,0.6713,10160-4,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6370,Susaki,4,MARKETING FEE,519.86,64.98,USD,0.6713,10160-4,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6371,Susaki,3,MARKETING FEE,395.63,49.45,USD,0.6713,10160-3,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6371,Susaki,3,MARKETING FEE,291.74,36.47,USD,0.6713,10160-3,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6371,Susaki,3,MARKETING FEE,144.21,18.03,USD,0.6713,10160-3,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6372,Susaki,2,MARKETING FEE,684.26,85.53,USD,0.6713,10160-2,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6372,Susaki,2,MARKETING FEE,815.42,101.93,USD,0.6713,10160-2,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6372,Susaki,2,MARKETING FEE,21.12,2.64,USD,0.6713,10160-2,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,855.85,106.98,USD,0.6713,10160-1,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,300.48,37.56,USD,0.6713,10160-1,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,64.3,8.04,USD,0.6713,10160-1,6/3/2010,6/3/2010,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
|
|
|
|
|
I think you should rather concentrate your efforts on making Excel output the file without the extra empty fields/records. I loaded your file in Excel, saved it as .xlsx and saved it back to CSV, and the extra commas were gone.
|
|
|
|
|
Well, the problem is that the initial export from Excel to CSV is done automatically, and even when I do it manually I still get the extra commas.
Yeah, if I open it in Excel again and then save it as CSV again it works; however, converting the document twice seems redundant.
Is there a place in your code where I can add logic to ignore (or stop counting/adding values to) the field collection when it encounters an empty/null field?
Any suggestions would be helpful. Long live Excel...
|
|
|
|
|
You can write some custom code in ReadNextRecord just before the line 1521 _fieldCount++;. You can check the _fields variable and update _fieldCount to be the last field index that has a non-empty value:
for (int i = _fieldCount; i >= 0; i--)
{
    if (string.IsNullOrEmpty(_fields[i]))
        _fieldCount--;
    else
        break; // stop at the last non-empty field so interior empty fields are kept
}
_fieldCount++;
The reader will still read the empty records at the end of the file, but I guess you can detect those easily in your code.
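If you would rather not modify the library at all, you could also trim the trailing empty fields yourself after reading each record into a string array. A minimal sketch (TrimTrailingEmpty is my own helper, not part of the library):

```csharp
using System;

static class RecordTrimmer
{
    // Returns a copy of the record with trailing empty fields removed.
    // Interior empty fields are preserved.
    public static string[] TrimTrailingEmpty(string[] fields)
    {
        int count = fields.Length;

        // Walk backwards and stop at the first non-empty field.
        while (count > 0 && string.IsNullOrEmpty(fields[count - 1]))
            count--;

        string[] result = new string[count];
        Array.Copy(fields, result, count);
        return result;
    }
}
```

For your sample, TrimTrailingEmpty(new[] { "3", "", "3", "", "", "" }) returns { "3", "", "3" }: the interior empty field is kept, the trailing ones are dropped.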
|
|
|
|
|
Thanks, I will try that. Will doing this also prevent the "key name already exists" error caused by adding duplicate header names to the index?
|
|
|
|
|
That error was caused by trying to add many empty headers, so yes.
|
|
|
|
|
Great! One more thing: will this code only be executed for the header row? My data might have null/empty values for records in a given valid field, and I wouldn't want to lose that data.
|
|
|
|
|
The code I posted will in effect make it appear as if your CSV has 19 fields instead of 25, so the last 6 fields will always be skipped. The other 19 are read like any others, so empty values will still come out as string.Empty.
|
|
|
|
|
Great! My last question:
I am getting empty rows in my DataTable even though I set csv.SkipEmptyLines = true. Do I need to change MissingFieldAction to ReplaceByEmpty instead of ReplaceByNull? I added my code below.
Btw, you should sell this tool. You may want to break some of the code into smaller methods.
private void LoadFilesToDatabase(List<BaseHelper.UploadData> files)
{
    foreach (BaseHelper.UploadData file in files)
    {
        DataTable dt = new DataTable();
        DataRow dr;

        using (CsvReader csv = new CsvReader(new StreamReader(file.FilePath + ".csv"), true))
        {
            csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
            csv.SkipEmptyLines = true;

            int fieldCount = csv.FieldCount;
            string[] headers = csv.GetFieldHeaders();

            foreach (string header in headers)
            {
                dt.Columns.Add(new DataColumn(header));
            }

            while (csv.ReadNextRecord())
            {
                dr = dt.NewRow();
                for (int i = 0; i < fieldCount; i++)
                {
                    dr[i] = csv[i];
                }
                dt.Rows.Add(dr);
            }
        }

        string destinationTable = "DataServices" + file.FileName.ToLower().Replace(".csv", "") + "CSV";
        BulkAddDatatableToSQLServer(dt, destinationTable);
    }
}
|
|
|
|
|
I think I should put more emphasis on the fact that the CsvReader implements the IDataReader interface. Your code can be rewritten as below (warning: notepad IDE). I added the code to skip the empty records at the end; there is no way the reader can skip them by itself, since they are valid records. As for refactoring, have a look at http://nlight.codeplex.com.
private void LoadFilesToDatabase(List<BaseHelper.UploadData> files)
{
    foreach (BaseHelper.UploadData file in files)
    {
        DataTable dt = new DataTable();

        using (CsvReader csv = new CsvReader(new StreamReader(file.FilePath + ".csv"), true))
        {
            csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
            csv.SkipEmptyLines = true;
            dt.Load(csv);
        }

        // Remove the empty records; iterate backwards so the indexes
        // stay valid while rows are being removed.
        for (int i = dt.Rows.Count - 1; i >= 0; i--)
        {
            if (string.IsNullOrEmpty(dt.Rows[i][0] as string))
                dt.Rows.RemoveAt(i);
        }

        string destinationTable = "DataServices" + file.FileName.ToLower().Replace(".csv", "") + "CSV";
        BulkAddDatatableToSQLServer(dt, destinationTable);
    }
}
|
|
|
|
|
Hey, just want to say thanks for everything.
With my conversion code, your tool and SqlBulkCopy, the following took 1.2 seconds:
1) Convert 3 Excel files (10,000 records each) to CSV.
2) Add the records to a DataTable (btw, adding them row by row after checking for null is faster than DataTable.Load(csv) and deleting the empty rows afterwards).
3) Add 30,000 records to the database (3 separate tables, 10,000 records each).
|
|
|
|
|
You're welcome!
Yeah, deleting the empty records from the DB is another way. In that case, you could have used the CsvReader directly with SqlBulkCopy instead of loading the records into a DataTable first. Not that it matters much for 30,000 records anyway.
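A rough sketch of that approach (untested here; the file name, connection string and destination table name are placeholders):

```csharp
using System.Data.SqlClient;
using System.IO;
using LumenWorks.Framework.IO.Csv;

class BulkLoadSketch
{
    static void Load()
    {
        // Because CsvReader implements IDataReader, SqlBulkCopy can consume
        // it directly, streaming records into SQL Server without building an
        // intermediate DataTable, so memory use stays flat for large files.
        using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
        using (SqlBulkCopy bulk = new SqlBulkCopy("<your connection string>"))
        {
            bulk.DestinationTableName = "DataServicesSomeTableCSV"; // placeholder
            bulk.WriteToServer(csv);
        }
    }
}
```

Note that with this approach there is no chance to drop the empty trailing records first; they would have to be deleted from the table afterwards, as discussed above.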
|
|
|
|
|
Hi,
Firstly, thank you for a great library.
I've noticed that some characters are not read in correctly from the CSV file. The British pound symbol (£) is the only one I really care about, but also ¬, and I think there was another one I can't remember.
Do you have any idea how I can add support for these characters? I've narrowed it down to the _reader.Read call on line 1079 of CsvReader.cs, but that is a System.IO.StreamReader. How can it not read these characters properly?
Thanks!
|
|
|
|
|
You need to specify the encoding to use when creating the StreamReader (see System.Text.Encoding).
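To illustrate, here is a standalone sketch (ISO-8859-1 is only a guess for a file containing £ and ¬; your file may well be UTF-8 or some other code page, so check what actually produced it):

```csharp
using System;
using System.IO;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // The pound sign is the single byte 0xA3 in ISO-8859-1/Windows-1252,
        // but two bytes in UTF-8; decoding with the wrong encoding mangles it.
        File.WriteAllBytes("demo.csv", new byte[] { 0xA3, (byte)'1', (byte)'0', (byte)'0' });

        // Declaring the encoding explicitly recovers the character.
        using (var reader = new StreamReader("demo.csv", Encoding.GetEncoding("iso-8859-1")))
        {
            Console.WriteLine(reader.ReadLine()); // £100
        }
    }
}
```

You would then pass a StreamReader constructed this way to the CsvReader constructor instead of one created with the default encoding.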
|
|
|
|
|
Ahhh, thank you. My file was in a strange encoding for some reason.
Thanks
|
|
|
|
|
Hi
First off, this is a great project and could likely really help me out.
I have tried to use it referenced by a CLR stored procedure WITH PERMISSION_SET = EXTERNAL_ACCESS. The CREATE ASSEMBLY statement in SQL Server fails:
CREATE ASSEMBLY failed because type "LumenWorks.Framework.IO.Csv.CsvReader" in external_access assembly "LumenWorks.Framework.IO" has a finalizer. Finalizers are not allowed in external_access assemblies.
I can register it as UNSAFE. Is it possible to remove the finalizer(s)? Would it be preferable to do so?
I am not a .NET programmer so I'd be grateful if you could keep things as simple as you can.
Many Thanks
Dan
|
|
|
|
|
As long as you make sure to call Dispose() once you are done reading the csv, you can safely remove the finalizer.
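In case it helps, the usual pattern that guarantees Dispose() is called is the using statement; a sketch (the file name is a placeholder):

```csharp
using System.IO;
using LumenWorks.Framework.IO.Csv;

class DisposeSketch
{
    static void Read()
    {
        // Dispose() runs when the using block exits, even if an exception is
        // thrown while reading, so the finalizer safety net is not needed.
        using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
        {
            while (csv.ReadNextRecord())
            {
                // process csv[0], csv[1], ...
            }
        }
    }
}
```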
|
|
|
|
|
In this sample line my text file is ;-separated, e.g.:
"13123";"1231231";"23424";"xxx "xx" xxx";"";"";""
Your CSV reader works great for all my files, but in this one file the reader stops at xxx " and does not read the full field "xxx "xx" xxx".
It gives a MalformedCsvException error.
|
|
|
|
|