|
Here is my CSV file. It gets converted from Excel. According to your reader there are 25 fields, but I only want the headings that have a value (there should be 19); DocumentDate is the last field in the Excel file. The conversion from Excel to CSV seems to add empty fields.
I also don't want the malformed or empty rows at the end.
This should be the last record...
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,64.3,8.04,USD,0.6713,10160-1,6/3/2010,6/3/2010
===============
csv file below
===============
SailDate,SourceCompanyName,Voyage,Vessel,Port,CountryDestination,SupplierName,SalesPoint,SalesContrNumber,DiscPort,LotNumber,CostType,NetCost,GST,Currency,ExchangeRate,SupplierInvoiceNumber,DueDate,DocumentDate,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6370,Susaki,4,MARKETING FEE,245.21,30.65,USD,0.6713,10160-4,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6370,Susaki,4,MARKETING FEE,519.86,64.98,USD,0.6713,10160-4,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6371,Susaki,3,MARKETING FEE,395.63,49.45,USD,0.6713,10160-3,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6371,Susaki,3,MARKETING FEE,291.74,36.47,USD,0.6713,10160-3,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6371,Susaki,3,MARKETING FEE,144.21,18.03,USD,0.6713,10160-3,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6372,Susaki,2,MARKETING FEE,684.26,85.53,USD,0.6713,10160-2,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6372,Susaki,2,MARKETING FEE,815.42,101.93,USD,0.6713,10160-2,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6372,Susaki,2,MARKETING FEE,21.12,2.64,USD,0.6713,10160-2,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,855.85,106.98,USD,0.6713,10160-1,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,300.48,37.56,USD,0.6713,10160-1,6/3/2010,6/3/2010,,,,,,
6/3/2010,TAUM,10160,SUSAKI WING V95,Tauranga,JAPAN,TPT Forests Limited - USD,FOB,6373,Susaki,1,MARKETING FEE,64.3,8.04,USD,0.6713,10160-1,6/3/2010,6/3/2010,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
|
|
|
|
|
I think you should rather concentrate your efforts on making Excel output the file without the extra empty fields/records. I loaded your file in Excel, saved it as .xlsx, and saved it back to CSV, and the extra commas were gone.
|
|
|
|
|
Well, the problem is that the initial export from Excel to CSV is done automatically, and even when I do it manually I still get the extra commas.
Yes, if I open it in Excel again and then save it as CSV again it works; however, converting a document twice seems redundant.
Is there a place in your code where I can add logic to ignore/stop counting/adding values to the field collection when it encounters an empty/null field?
Any suggestions would be helpful. Long live Excel...
|
|
|
|
|
You can write some custom code in ReadNextRecord, just before line 1521 (_fieldCount++;). Check the _fields array and update _fieldCount so it points just past the last field that has a non-empty value:
for (int i = _fieldCount; i >= 0; i--)
{
    if (!string.IsNullOrEmpty(_fields[i]))
        break; // stop at the last non-empty field, so interior empties are kept

    _fieldCount--;
}
_fieldCount++;
The reader will still read the empty records at the end of the file, but I guess you can detect those easily in your code.
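For those trailing all-empty records, a small helper along these lines could filter them out in your own reading loop (a sketch; IsEmptyRecord is a hypothetical name, and the csv[i] indexer access is the reader's normal field access):

```csharp
// Returns true when every field of the current record is empty,
// e.g. the trailing ",,,,,,,," rows that Excel appends.
// Hypothetical helper - adapt it to your own reading loop.
static bool IsEmptyRecord(CsvReader csv, int fieldCount)
{
    for (int i = 0; i < fieldCount; i++)
    {
        if (!string.IsNullOrEmpty(csv[i]))
            return false; // found real data
    }
    return true;
}
```

Then in the loop: if (IsEmptyRecord(csv, csv.FieldCount)) continue;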
|
|
|
|
|
Thanks, I will try that. Will doing this also prevent the "Key name already exists" error caused by adding duplicate header names to the index?
|
|
|
|
|
That error was caused by trying to add many empty headers, so yes.
|
|
|
|
|
Great! One more thing: will this code be executed only for the header row? My data might have null/empty values in a given valid field for some records, and I wouldn't want to lose that data.
|
|
|
|
|
The code I posted will in effect make it appear as if your CSV has 19 fields instead of 25, so the last 6 fields will always be skipped. The other 19 are read like any others, so empty values will simply come out as string.Empty.
|
|
|
|
|
Great! My last question...
I am getting empty rows in my DataTable even though I set csv.SkipEmptyLines = true. Do I need to change MissingFieldAction to ReplaceByEmpty instead of ReplaceByNull? I added my code below.
By the way, you should sell this tool. You may also want to break some of the code into smaller methods.
private void LoadFilesToDatabase(List<BaseHelper.UploadData> files)
{
foreach (BaseHelper.UploadData file in files)
{
DataTable dt = new DataTable();
DataRow dr;
using (CsvReader csv = new CsvReader(new StreamReader(file.FilePath + ".csv"), true))
{
csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
csv.SkipEmptyLines = true;
int fieldCount = csv.FieldCount;
string[] headers = csv.GetFieldHeaders();
foreach (string header in headers)
{
dt.Columns.Add(new DataColumn(header));
}
while (csv.ReadNextRecord())
{
dr = dt.NewRow();
for (int i = 0; i < fieldCount; i++)
{
dr[i] = csv[i];
}
dt.Rows.Add(dr);
}
}
string destinationTable = "DataServices" + file.FileName.ToLower().Replace(".csv","") + "CSV";
BulkAddDatatableToSQLServer(dt, destinationTable);
}
}
|
|
|
|
|
I think I should put more emphasis on the fact that the CsvReader implements the IDataReader interface. Your code can be rewritten as below (warning: notepad IDE). I added the code to skip the empty records at the end; there is no way the reader can skip them by itself, since they are valid records. As for refactoring, have a look at http://nlight.codeplex.com.
private void LoadFilesToDatabase(List<BaseHelper.UploadData> files)
{
foreach (BaseHelper.UploadData file in files)
{
DataTable dt = new DataTable();
using (CsvReader csv = new CsvReader(new StreamReader(file.FilePath + ".csv"), true))
{
csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
csv.SkipEmptyLines = true;
dt.Load(csv);
}
for (int i = dt.Rows.Count - 1; i >= 0; i--)
{
    // "as string" also handles DBNull values left by ReplaceByNull
    if (string.IsNullOrEmpty(dt.Rows[i][0] as string))
        dt.Rows.RemoveAt(i);
}
string destinationTable = "DataServices" + file.FileName.ToLower().Replace(".csv","") + "CSV";
BulkAddDatatableToSQLServer(dt, destinationTable);
}
}
|
|
|
|
|
Hey, just want to say thanks for everything.
With my conversion code, your tool, and SqlBulkCopy, the following took 1.2 seconds:
1) convert 3 Excel files (10,000 records each) to CSV
2) add the records to a DataTable (by the way, adding them row by row after checking for null is faster than DataTable.Load(csv) and deleting the empty rows afterwards)
3) add the 30,000 records to the database (3 separate tables, 10,000 records each)
|
|
|
|
|
You're welcome!
Yeah, deleting the empty records from the DB is another way. In that case, you could have used the CsvReader directly with SqlBulkCopy, instead of loading the records into a DataTable first. Not that it matters much for 30,000 records anyway...
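For reference, since CsvReader implements IDataReader it can be handed straight to SqlBulkCopy, roughly like this (a sketch; the connection string and destination table name are placeholders):

```csharp
using System.Data.SqlClient;
using System.IO;
using LumenWorks.Framework.IO.Csv;

using (var csv = new CsvReader(new StreamReader("data.csv"), true))
using (var connection = new SqlConnection(connectionString)) // placeholder
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "DataServicesSampleCSV"; // placeholder
        bulkCopy.WriteToServer(csv); // WriteToServer accepts any IDataReader
    }
}
```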
|
|
|
|
|
Hi,
Firstly, Thank you for a great library.
I've noticed that there are some characters that are not read in correctly from the CSV file. The British pound symbol (£) is the only one I really care about, but also ¬, and I think there was another one I can't remember...
Do you have any idea how I can add support for these characters? I've narrowed it down to the _reader.Read call on line 1079 of CsvReader.cs, but that is a System.IO.StreamReader. How can it not read these characters properly?
Thanks!
|
|
|
|
|
You need to specify the encoding to use when creating the StreamReader (see System.Text.Encoding).
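For example, something like this (assuming the file is in the Windows-1252 code page; substitute whatever encoding your file actually uses, e.g. Encoding.UTF8):

```csharp
using System.IO;
using System.Text;
using LumenWorks.Framework.IO.Csv;

// £ and ¬ live above the ASCII range, so the StreamReader must be
// told the file's real encoding instead of using the default guess.
using (var reader = new StreamReader("data.csv", Encoding.GetEncoding(1252)))
using (var csv = new CsvReader(reader, true))
{
    while (csv.ReadNextRecord())
    {
        // fields now decode correctly
    }
}
```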
|
|
|
|
|
Ahhh, thank you. My file was in a strange encoding for some reason.
Thanks
|
|
|
|
|
Hi
First off this is a great project and could likely really help me out.
I have tried to use it referenced by a CLR stored procedure WITH PERMISSION_SET = EXTERNAL_ACCESS. The CREATE ASSEMBLY statement in SQL Server fails:
CREATE ASSEMBLY failed because type "LumenWorks.Framework.IO.Csv.CsvReader" in external_access assembly "LumenWorks.Framework.IO" has a finalizer. Finalizers are not allowed in external_access assemblies.
I can register it as UNSAFE. Is it possible to remove the finalizer(s)? Would it be preferable to do so?
I am not a .NET programmer so I'd be grateful if you could keep things as simple as you can.
Many Thanks
Dan
|
|
|
|
|
As long as you make sure to call Dispose() once you are done reading the csv, you can safely remove the finalizer.
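A using block is the easiest way to guarantee that:

```csharp
using System.IO;
using LumenWorks.Framework.IO.Csv;

// Dispose() runs when the block exits, even on an exception,
// so the finalizer is never needed as a safety net.
using (var csv = new CsvReader(new StreamReader("data.csv"), true))
{
    while (csv.ReadNextRecord())
    {
        // process csv[0], csv[1], ...
    }
}
```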
|
|
|
|
|
Here is a sample line from my text file, which is ;-separated:
"13123";"1231231";"23424";"xxx "xx" xxx";"";"";""
In the above line, the reader reads the field only up to
xxx " and does not read the full value, which is "xxx "xx" xxx".
Your CSV reader works great for all my files, but this one file has the above problem:
it gives a MalformedCsvException.
|
|
|
|
|
Also, the field count goes up to a very high figure; I have 130 fields in my text file, and I am reading a text file.
Previously, all my text files parsed perfectly.
modified on Thursday, July 1, 2010 10:46 AM
|
|
|
|
|
You do know that the source code is available too, right? Why did you use Reflector to reverse engineer my code? :p
|
|
|
|
|
I was supplied with these files by a friend of mine; I didn't know they came from Reflector.
Anyway, I am really sorry for that, and I apologize.
Please help me with the above scenario.
|
|
|
|
|
The code you sent is not helping you? Whether it is or not, the real problem is that your CSV data is invalid, because quotes inside quoted fields need to be escaped, for example:
"This is ""field1"".", "This is ""field2""."
or using the \ escape character (which you can specify when calling the CsvReader constructor):
"This is \"field1\".", "This is \"field2\"."
|
|
|
|
|
I call the reader like this:
CachedCsvReader csv = new CachedCsvReader(new StreamReader(Path), false,';');
try
{
csv.ReadToEnd();
}
catch
{
ErrorPosting(....);
return;
}
The text file is supplied to me; it's not under my control.
The text file I am currently working on is below as a sample. Please save it as a .txt file and try it; I hope you can find out what's happening.
Here goes:
"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"Heading";"SHeading";"SHeading";"SHeading";"SHeading";"OHeading";"OHeading#2";"OHeading#3/PO Number"
"111111111";"20100614";"22222222";"";"597.68";"574.14";"Shoper";"";"3333333333";"3.05";"10.23";"night";"";"2010";"2010";"01000";"AA";"X.AAAAA";"1.00";"X";"1.00";"X";"0000001";"0000000";"000000";"NNNNN CCCCC";"BUTTON";"6TH ";"";"MUSCULAR";"A";"000001";"ZS";"LOrance";"Practice Practice";"";"111 sand";"OBRANCH";"ZX";"4444";"XX";"watch "CLOCK" Niddle";"";""
I have no control over this text file; it is supplied to me and I have to read it and post it.
I have also tried importing this same file into Excel 2003 with the delimiter set to ';' and it displays perfectly.
modified on Friday, July 2, 2010 6:01 AM
|
|
|
|
|
Hi Sebastien,
Found your dirty little trick code!
while (ReadField(_nextFieldIndex, true, true) != null)
{
}
How easy would it be to change the code to just increase the field count on the fly to accommodate the 'extra' fields?
(Not asking you to change your code necessarily, just to get an idea of how easy/difficult it would be.)
In this scenario, we have a CSV table, but there are several lines of free text above it which may or may not contain the delimiter characters used by the CSV table. We need to keep all the data, so we just treat the free text as part of the CSV table and increase the maximum column count as we go; the final maximum column count is not known until the whole file has been read.
Cheers
Simon
|
|
|
|
|
I do not really understand how you handle the free text: if it is free text, then how can it be guaranteed to be valid CSV data? If I were in your shoes, I would comment out those lines by putting a # at the start of each of them (# is the reader's default comment character).
|
|
|
|
|