|
Thanks for pointing out this limitation! I will add the check to the existing ones.
And thanks for your appreciation.
|
|
|
|
|
First of all I want to thank you for providing this and improving on it every once in a while. It really is some great stuff.
I've got a 1.2 gig CSV file.
I'm using the CSV reader to convert this file into another text format using a streamwriter.
It started out taking 4 seconds to write every 100,000 records, but after a while the time to write 100,000 records gradually increases to 5 minutes or so.
The Page File and memory usage displayed in the task manager increases as well.
Is there anything you can think of that would cause the memory usage to keep increasing like this?
It finally gets to the point where the box runs out of memory and dies.
Any ideas?
Justin
|
|
|
|
|
I'm converting the CSV to a |-delimited text file.
Friend Sub ConvertCSV2DelimitedTxt(ByVal sCSVFile As String, ByVal sNewFile As String, Optional ByRef sHeaderColumns As String() = Nothing)
    colLog.Add(Now & " - Start Converting CSV to TXT.")
    colLog.Add(Now & " - " & sCSVFile & " to " & sNewFile)
    RaiseEvent Status("Converting CSV File to Delimited Text File")

    File.Delete(sNewFile)
    Dim sw As StreamWriter = File.CreateText(sNewFile)
    Dim chDelimiter As Char = CChar(GetSetting("Delimiter", "|"))

    Dim csv As New CsvReader(New StreamReader(sCSVFile), True)
    csv.SupportsMultiline = True
    csv.MissingFieldAction = 0

    'Write the header line
    sHeaderColumns = csv.GetFieldHeaders()
    sw.WriteLine(Join(sHeaderColumns, chDelimiter))

    'Write the rest of the file
    Dim FakePercentDone As Double = 0.6
    Dim LastPercentDone As Integer = 0

    While csv.ReadNextRecord()
        'Progress bar logic
        If CInt(FakePercentDone) <> LastPercentDone Then
            RaiseEvent Progress(CInt(FakePercentDone))
            LastPercentDone = CInt(FakePercentDone)
        End If
        FakePercentDone += 0.05
        If CInt(FakePercentDone) > 100 Then FakePercentDone = 0.6

        Try
            Dim sRecord As String = String.Empty
            For i As Integer = 0 To csv.FieldCount - 1
                sRecord += csv.Item(i) & chDelimiter
            Next
            sw.WriteLine(sRecord.Substring(0, sRecord.Length - 1))
        Catch ex As Exception
            Throw
        End Try
    End While

    sw.Close()
    csv.Dispose()
    colLog.Add(Now & " - Done Converting CSV to TXT.")
End Sub
|
|
|
|
|
My first question would be: have you tried only reading the csv and doing nothing more? If so, do you still have memory issues? I use the CsvReader with very large files and have never had this problem, so your comment is a bit surprising. Could you send me a sample of your csv file that I can use to generate a large file?
Thanks for your comment!
|
|
|
|
|
When I removed all the code except for the basic While and For loops, the problem did not present itself. I've also discovered that the problem seems to be .NET 1.1 related. I code and debug in VS 2005 against the 2.0 framework, but have to compile using the MSBee utility to target the 1.1 framework for my users. The process is quite speedy in the debugger in VS 2005 but it starts dragging when I run the 1.1 version.
I'm going to try to close and reopen the streamwriter for my output file every 100,000 records or so to see if that will clear it up. Any other ideas you might have would be appreciated.
Thanks again for the CSVReader.
Justin
Here's a sample of the data if you still wanted to tinker with it.
Category Name,Fiscal Year / Period,Store Number,Sku Number,Sales Type,DIY Gross Sales QTY,DIY Damaged Returns,DIY Undamaged Returns QTY,DIY Gross Sales Retail,DIY Damaged Returns Retail,DIY Undamaged Returns Retail,VDP Gross Sales QTY,VDP Damaged Returns QTY,VPD Undamaged Returns QTY,VPD Gross Sales Retail,VPD Damaged Returns Retail,VPd Undamaged Returns Retail
OILATF,2005071,2,7395,0,6,0,-7,9.54,0.00,-11.13,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,7396,0,70,0,-5,111.30,0.00,-7.95,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,20224,0,156,0,0,201.24,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,20233,0,132,0,0,170.28,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,20234,0,2,0,0,2.58,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,54771,0,4,0,0,14.57,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,54805,0,1,0,0,4.39,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,54806,0,11,0,0,57.09,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,69375,0,23,0,0,41.17,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,119580,0,6,0,0,15.54,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,119879,0,18,0,0,46.62,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,119880,0,5,0,0,9.95,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,119881,0,39,0,0,77.61,0.00,0.00,0,0,0,0.00,0.00,0.00
|
|
|
|
|
Hmm, if the problem is not related to the CsvReader, then I would need to investigate your program to help you out, and honestly, I do not have much time.
That said, avoiding the string concatenations and writing your data in chunks would certainly help.
Const ChunkSize As Integer = 4096 'Whatever suits you best
Dim chunk As Integer = 0

While csv.ReadNextRecord()
    'Write each field directly, with the delimiter only between fields
    For i As Integer = 0 To csv.FieldCount - 2
        sw.Write(csv.Item(i))
        sw.Write(chDelimiter)
    Next
    sw.WriteLine(csv.Item(csv.FieldCount - 1))

    'Flush the writer every ChunkSize records
    chunk += 1
    If chunk > ChunkSize Then
        chunk = 0
        sw.Flush()
    End If
End While
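The difference from the original loop is that each field is written straight to the StreamWriter, so no throwaway per-record string is built by concatenation, and the periodic Flush simply pushes what has been buffered out to disk every few thousand records.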
|
|
|
|
|
I want to read CSV files like the one below.
A,F01,F02,F03,F04,F05
U,F11,F12,F13
A,F21,F22,F23,F24,F25
D,F31
A,F41,F42,F43,F44,F45
But the last fields of the 2nd and 3rd lines (in other words, F13 and F31) seem to be treated as missing fields when I access them. So I added ',' after those fields as below, and then those fields appear.
A,F01,F02,F03,F04,F05
U,F11,F12,F13,
A,F21,F22,F23,F24,F25
D,F31,
A,F41,F42,F43,F44,F45
Why?
DataTable tbl = new DataTable();

StreamReader sr = new StreamReader(csvFileName);

CsvReader csv = new CsvReader(sr, false);
csv.MissingFieldAction = MissingFieldAction.ReturnNullValue;

tbl.Load(csv);

DataGridView testView = new DataGridView();
// ...
testView.DataSource = tbl;
Japanese C# primer I.K
|
|
|
|
|
You need to use the following line of code instead:
csv.MissingFieldAction = MissingFieldAction.ReturnPartiallyParsedValue;
The logic is as follows:
In "U,F11,F12,F13", the reader does not know whether F13 is indeed the complete field value or whether it is corrupted and a part of it is missing. After all, the csv is not well formed, so anything could have happened.
So to handle this situation, you have 3 choices: return an empty value instead of "F13", return a null value instead of "F13", or return the partial result "F13" (which in your case, you know is valid).
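In code, the two options that appear in this thread look like this (a minimal sketch against the snippet above):
// Keep whatever was parsed for the truncated last field ("F13" stays "F13"):
csv.MissingFieldAction = MissingFieldAction.ReturnPartiallyParsedValue;
// Or replace the truncated last field with null:
// csv.MissingFieldAction = MissingFieldAction.ReturnNullValue;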
|
|
|
|
|
Thanks for your reply, Sébastien. I changed my code according to your recommendation as below.
csv.MissingFieldAction = MissingFieldAction.ReturnNullValue;
|
V
csv.MissingFieldAction = MissingFieldAction.ReturnPartiallyParsedValue;
This time, the values of F13 and F31 are displayed in the DataGridView just as I wanted, without adding ',' after those fields (in other words, using the original data).
A,F01,F02,F03,F04,F05
U,F11,F12,F13 <--- 2 really missing fields exist
A,F21,F22,F23,F24,F25
D,F31 <--- 4 really missing fields exist
A,F41,F42,F43,F44,F45
Now, in turn, the values of the really missing fields are empty strings instead of null, which is a valid result, I think. And this result is good enough for now, but I think the ReturnPartiallyParsedValue case should also give a result similar to the ReturnNullValue case for really missing fields (i.e. null instead of empty string).
I like your library very much. Thank you!!
Japanese C# primer I.K
|
|
|
|
|
When I have this line:
Bike;£9,99;bell
the pound sign disappears.
Can you tell me why? Is this a bug?
And/or how can I fix this?
Thanks,
Bjorn
|
|
|
|
|
This is an encoding issue. You need to save your file as Unicode (UTF-8 or whatever) or otherwise provide the codepage to the StreamReader (probably Windows-1252).
using (CsvReader csv = new CsvReader(new StreamReader("data.csv", Encoding.GetEncoding(1252)), false, ';'))
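Alternatively, if you re-save the file as UTF-8, read it with the matching encoding; a minimal sketch along the same lines (Encoding comes from System.Text):
using (CsvReader csv = new CsvReader(new StreamReader("data.csv", Encoding.UTF8), false, ';'))
{
    // "£" and other non-ASCII characters now come through intact
}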
|
|
|
|
|
Thanks, it worked.
Many thanks.
Bye
Bjorn
|
|
|
|
|
Is there any way, though, of getting the current line in the same way as you can get the headers (i.e. as a string array)? I'm currently reading each field in the record and then putting them back together into an array, but I'd like to do something like
while (csv.ReadNextRecord())
{
    string[] x = csv.CurrentRecord;
    // do something with x
}
Is this possible?
Eifion
|
|
|
|
|
Yes, you can do it as follows:
string[] x = new string[csv.FieldCount];

while (csv.ReadNextRecord())
{
    // Fills the existing array with the current record's fields,
    // so the same array is reused for every record.
    csv.CopyCurrentRecordTo(x);
}
|
|
|
|
|
|
Found this by mistake - the app that generates our CSV file decided to create a file with just two blank lines in it. When I came to read the CSV file, I got a NullReferenceException in the GetFieldHeaders() routine. I don't know whether that's what you're expecting (I wasn't), but I saw that version 3.0 came with "extensive support for malformed CSV files", so I thought you'd like to know about this kind of malformation.
Thanks for a great library.
Steve
|
|
|
|
|
You are right, I did not test this particular scenario. What would you be expecting? An exception, or the method returning null (and would it also raise the ParseError event)?
|
|
|
|
|
I have to say, I'm not a great fan of raising exceptions for just about every error. My preference would be for a property (IsCsvValid) that I can test for a malformed CSV file or, as you say, returning null when I fetch the headers.
Thanks
Steve
|
|
|
|
|
I will return an empty array in cases where it is not possible to get the headers (just as when there are none).
If you do not want to wait until the article is updated, simply modify CsvReader.cs as follows:
protected virtual bool ReadNextRecord(bool onlyReadHeaders, bool skipToNextLine)
{
    if (!_initialized)
    {
        _buffer = new char[_bufferSize];
        _fieldHeaders = new string[0];
    }

    // ... the rest of the method is unchanged
|
|
|
|
|
|
Hi Sebastian,
I am not good at .NET. I am looking to create a csv parser service to parse large csv files for different databases. Basically, I will have a config file with the different database connection strings and the locations of the corresponding csv files, and I want to write the log/error files to the corresponding folders. Is it possible to modify your code to do that? If so, please give me some suggestions. In our csv file, a few fields use another delimiter, and I would need to split those fields with that delimiter to save the data into the corresponding tables.
I tried to run the demo application and it gave me a few compile errors. Since the errors were related to the unit test component, I downloaded and installed that as well. Can you please send some brief instructions on how I should make it work?
Your help is greatly appreciated.
Thanks,
Thomson.
|
|
|
|
|
For the compile errors, the simplest solution is to remove the unit test project from the solution; it should then compile fine. Only the LumenWorks.Framework.IO.Csv project is necessary to use the CsvReader.
This library can read CSV files, but cannot write them.
You can specify any field delimiter when creating an instance of the CsvReader class (using one of its constructors).
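For example, a minimal sketch passing '|' as the field delimiter (the delimiter and file path are just placeholders; the constructor overload is the same one used elsewhere in this thread):
using (CsvReader csv = new CsvReader(new StreamReader("c:\\data.csv"), true, '|'))
{
    while (csv.ReadNextRecord())
    {
        string firstField = csv[0]; // fields of the current record, by index
    }
}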
If you have any more questions, please feel free to ask them.
|
|
|
|
|
Thanks Sebastian.
I don't have to write to a csv file. I need to read the csv file (250MB) and save it into the proper fields in the database.
To be very frank, I could not find how I can use your code for this case.
This is what I think I should be doing:
I need to add the LumenWorks.Framework.IO.Csv project to my service project and call its methods, passing the location of the csv file. What output will I get?
Can you tell me where I can see the details of the different methods and parameters of this project? How do I set it up to read csv files from a specific location and save them into a database? And to save the log file to a specific location?
Also, our csv file contains data such as:
abc, sdf, gh:ty:09,tyu
Here, I need to split field 3 (gh:ty:09) by ":" to save the data into different tables. Is there any provision in the code to do this?
When the file is of larger size, do you split the file to process it faster?
Thanks,
Thomson.
|
|
|
|
|
The reader will not load the whole file into memory but will parse it in a streaming fashion using an internal buffer, so no splitting needs to be done. It can easily handle files of many gigabytes without breaking a sweat.
Here is a code example in .NET 2.0. If you need the equivalent in .NET 1.1, please tell me. I assumed you had headers in your csv. If not, then use this instead:
new CsvReader(new StreamReader("c:\\data.csv"), false, ',')
using System;
using System.Data;
using System.IO;
using LumenWorks.Framework.IO.Csv;

class Foo
{
    static void ReadCsv()
    {
        DataTable data = new DataTable();

        using (StreamWriter log = new StreamWriter("c:\\logfile.log"))
        {
            using (CsvReader csv = new CsvReader(new StreamReader("c:\\data.csv"), true, ','))
            {
                // Log any parse error reported by the reader
                csv.ParseError += delegate(object sender, ParseErrorEventArgs e)
                {
                    log.WriteLine(e.Error.ToString());
                };

                data.Load(csv);

                // update your DB with the content of data ...
            }
        }
    }
}
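Regarding the ':' inside field 3: the reader only splits on the main field delimiter, so you split that field yourself after reading it. A minimal sketch, if you iterate the records yourself instead of loading a DataTable (the field index 2 is an assumption based on the sample row):
while (csv.ReadNextRecord())
{
    string[] subFields = csv[2].Split(':'); // e.g. "gh:ty:09" -> "gh", "ty", "09"
    // save subFields into the other tables as needed
}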
|
|
|
|
|
Thanks, Sébastien. Can you send the code in .NET 1.1?
Also, please confirm the following for me:
1. At the place where you mentioned "// update your DB with the content of data ...", would I get one record of the csv file in the data table, or would I get all of the records in the data table with the column names as the first record? Will it cause trouble to have up to 250K records in the DataTable?
2. Does the log file contain the records/lines that failed?
3. Suppose an error happens in the 50th record; does the parser log it and go on to the next record?
4. Does the parser give me details such as "total records, number successfully parsed, number of failed records"? If so, how do I retrieve them?
5. Suppose I use this class in a Windows service and I have 100 csv files to parse daily (different files to different databases, but the same format); will the system allow me to log an error on any of the files and go on to the next file?
6. Is it possible to use this DLL from ASP (not ASP.NET)?
Thanks,
Thomson.
|
|
|
|
|