Hi Jonathan,
I'm very glad I found this article. I need one more piece of help: I need to import (upload) a CSV file, read it, write the data to another CSV file, and then export (download) that file to the local machine.
How can I do this? If you have code for this task, please mail it to me; it would be a big help. I'll be waiting for your mail.
Please note my mail id: srvishnukumarmit@gmail.com
Thanks.
Thanks & Regards,
Vishnukumar SR
---
I'm sorry, this type of question is outside the scope of the comments section of this site. I am available for consulting if you are looking for someone to write code for you. But my rates are pretty steep. Sorry about that.
If you want to discuss some consulting, just use the Contact form on the website www.blackbeltcoder.com
---
Hi Jonathan,
Thank you for your reply. I went through the www.blackbeltcoder.com site for my requirement, but the code didn't work on my laptop. I'm using version 4.0 of the framework; maybe your code requires something higher than 4.0. Could you please give me a version of the code that works on the older framework?
That would be a great help for my requirement. Thanks.
Thanks & regards,
Vishnukumar SR
---
The source code is plain text. Simply create your own project and add the C# files to your project. Most of the code appears in the article. It shouldn't be necessary for me to make custom versions of the source code.
---
Hello,
Thank you for this nice class. I'm a beginner with object-oriented programming and C#, but I managed to use your class in my program like this:
CsvFileWriter csvFile1 = new CsvFileWriter("Writet.csv");
CsvRow row1 = new CsvRow();
row1.Add("test");
row1.Add(String.Format("{0}", globalvars.Datensatz.checksumme));
csvFile1.WriteRow(row1);
csvFile1.Close();
Can you please explain how to add lines to an existing CSV file? With your code, and with my approach, the target file "Writet.csv" always gets overwritten...
best regards Raphael
EDIT: For now I've done this by opening the file, reading it into a string, and closing it; then opening it again, writing the string back, adding the new rows, and closing the file. Is this the way to go, or is there a better solution I missed?
string csv_buffer="";
CsvRow row = new CsvRow();
CsvFileReader csvFile_read = new CsvFileReader((Filename + ".csv"));
csv_buffer = csvFile_read.ReadToEnd();
csvFile_read.Close();
...
modified 13-Dec-13 6:24am.
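The read-everything-then-rewrite workflow described in the edit above can be sketched with plain StreamReader/StreamWriter, which the article's CsvFileReader/CsvFileWriter derive from, so the same calls work on those classes too. The file name comes from the example above; the row contents are made up for illustration. This is a sketch of the workaround, not the article's code:

```csharp
using System;
using System.IO;

class AppendByRewrite
{
    static void Main()
    {
        string path = "Writet.csv";               // name taken from the example above
        File.WriteAllText(path, "old,row\n");     // stand-in for the existing file

        // 1) Open, read everything into a buffer, close.
        string buffer;
        using (var reader = new StreamReader(path))
            buffer = reader.ReadToEnd();

        // 2) Reopen (this truncates the file), write the buffer back,
        //    then add the new rows, close.
        using (var writer = new StreamWriter(path))
        {
            writer.Write(buffer);
            writer.Write("new,row\n");
        }

        Console.WriteLine(File.ReadAllText(path));
    }
}
```

A simpler alternative, discussed further down in this thread, is to open the file in append mode (new StreamWriter(path, true)), which avoids buffering the old contents entirely.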
---
Yeah, sorry, the code doesn't currently include an append mode. You could look at the code that opens the file and modify it to append instead of create.
---
Overload the constructor (note the article's class is named CsvFileWriter, not CSVFileWriter):
public CsvFileWriter(string filename, bool append)
    : base(filename, append)
{
}
-
i came, i saw, i hide...
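For reference, a sketch of how the overload above might be used. CsvFileWriter here is a minimal stand-in (constructors only) so the example is self-contained; the article's real class adds WriteRow() and the quoting logic. The base call works because StreamWriter has a (string, bool) constructor whose second argument enables append mode:

```csharp
using System;
using System.IO;

// Minimal stand-in for the article's CsvFileWriter, just to show the
// overload in context.
class CsvFileWriter : StreamWriter
{
    public CsvFileWriter(string filename)
        : base(filename) { }                  // original: creates/overwrites

    public CsvFileWriter(string filename, bool append)
        : base(filename, append) { }          // overload: appends when true
}

class Demo
{
    static void Main()
    {
        using (var w = new CsvFileWriter("test.csv"))        w.WriteLine("one,1");
        using (var w = new CsvFileWriter("test.csv", true))  w.WriteLine("two,2");

        // Both rows survive because the second writer appended.
        Console.WriteLine(File.ReadAllLines("test.csv").Length);   // prints 2
    }
}
```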
---
This library can read CSV files, including handling columns that have line feeds in them. It is inexpensive too.
http://www.kellermansoftware.com/p-50-csv-reports.aspx
---
Perfect - just what I needed - Cheers
---
This is certainly a nice and simple library for reading CSV files.
A more powerful alternative that takes into account quotes and newlines within fields, etc., would be the
LINQ to CSV library.
---
I have a generic list of a point type with x and y coordinates. I need to write those values to a CSV file. I think I need to change the main window class. How can I do that? Can I do it with this code? Any ideas?
---
The code is not robust for the general CSV format, e.g. as generated by MS Excel:
- a field may also contain newlines (it must then be a quoted field)
- if the file comes from MS Excel (or from PowerShell), the current localization determines the field or list separator; it is not always a comma (e.g. in my de-CH localization, it is a semicolon)
In addition, I would have reservations about maintaining the parsing code with all the position tweaking... Why not use established parsing techniques, properly tokenizing the data and then parsing the tokens?
Cheers
Andi
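On the localized separator: rather than hard-coding ';' or ',', .NET exposes the regional list separator via TextInfo.ListSeparator, so a reader could pick it up from the current (or a named) culture. A small sketch; the culture names are just examples:

```csharp
using System;
using System.Globalization;

class ListSeparatorDemo
{
    static void Main()
    {
        // Excel follows the regional settings when exporting CSV;
        // the same setting is available to .NET code here:
        foreach (var name in new[] { "en-US", "de-CH" })
        {
            var culture = new CultureInfo(name);
            Console.WriteLine("{0}: '{1}'", name, culture.TextInfo.ListSeparator);
        }
    }
}
```

For the machine's own settings, CultureInfo.CurrentCulture.TextInfo.ListSeparator gives the separator the local Excel installation would use.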
---
First, I consider multi-line CSV files to be an extension to the format. Something that would be nice to implement but hardly needed to make the code useful. I've processed hundreds of CSV files generated by Excel without trouble.
Regarding your last point, I do, in effect, tokenize the data. You seem to be referring to a specific technique. If you could explain or provide a reference to what you mean specifically, I could consider what you are saying.
---
Hello Jonathan,
I consider Excel a possible source of CSV files, and what Excel can generate should be processable by a CSV parser (my claim).
In my work I am faced with localized files (I get files produced in en-UK, en-US, de-DE, de-CH, etc.), and I am faced with multi-line fields. I asked myself whether I could enhance your code to support these two features (localized separator and multi-line fields). I came to the conclusion that I would rather rewrite the parser.
E.g. I would first define the grammar:
Then I would define how I want to use it, e.g.
// usings needed by the code fragments below:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static void Main()
{
string filename = @"c:\temp\csv.csv";
var scanner = new RegexCsvScanner(";", File.ReadAllText(filename));
var recordHandler = new ConsoleCsvRecordHandler();
var parser = new CsvParser(scanner, recordHandler);
parser.ParseRecords();
}
From that I would derive the interfaces, e.g.,
public interface ICsvRecordHandler
{
void BeginCsv();
void BeginRecord(int recordNr);
void AddField(int fieldNr, string field);
void EndRecord(int recordNr);
void EndCsv(int recordsTotal);
}
public interface ICsvParser
{
void ParseRecords();
}
public interface ICsvScanner
{
string Curr { get; }
void Next();
bool HasData { get; }
bool IsSep { get; }
bool IsEol { get; }
}
Finally implement the parser...
public class CsvParser: ICsvParser
{
private ICsvScanner Scanner { get; set; }
private ICsvRecordHandler RecordHandler { get; set; }
public CsvParser(ICsvScanner scanner, ICsvRecordHandler recordHandler)
{
Scanner = scanner;
RecordHandler = recordHandler;
}
public void ParseRecords()
{
RecordHandler.BeginCsv();
Scanner.Next();
int recordNr = 0;
while (Scanner.HasData) ParseRecord(recordNr++);
RecordHandler.EndCsv(recordNr);
}
private void ParseRecord(int recordNr)
{
RecordHandler.BeginRecord(recordNr);
int fieldNr = 0;
ParseField(fieldNr++);
while (Scanner.IsSep)
{
Scanner.Next();
ParseField(fieldNr++);
}
if (Scanner.IsEol) Scanner.Next();
RecordHandler.EndRecord(recordNr);
}
private void ParseField(int fieldNr)
{
string field = Scanner.Curr;
if (Scanner.IsSep || Scanner.IsEol) field = string.Empty;
else Scanner.Next();
RecordHandler.AddField(fieldNr, field);
}
}
... a variant of a scanner ...
public class RegexCsvScanner : ICsvScanner
{
private IEnumerator<Match> Tokenizer { get; set; }
public string Curr { get { return HasData
? Tokenizer.Current.Groups.Cast<Group>().Reverse().First(g => g.Success)
.Value.Replace(@"""""", @"""")
: string.Empty; } }
public void Next() { HasData = HasData && Tokenizer.MoveNext(); }
public bool HasData { get; private set; }
public bool IsSep { get { return HasData && Tokenizer.Current.Groups[2].Success; } }
public bool IsEol { get { return HasData && Tokenizer.Current.Groups[3].Success; } }
public RegexCsvScanner(string sep, string data)
{
string tokens = string.Join("|", new [] {
@"""((?:""""|[^""])*)""",
@"(" + sep + @")",
@"(\n\r?|\r\n?)",
@"([^""\n\r" + sep + @"]+)",
});
Tokenizer = Regex.Matches(data, tokens, RegexOptions.Compiled|RegexOptions.Singleline)
.Cast<Match>().GetEnumerator();
HasData = true;
}
}
... and the record handler...
public class ConsoleCsvRecordHandler: ICsvRecordHandler
{
private List<string> Fields { get; set; }
public ConsoleCsvRecordHandler()
{ Fields = new List<string>(); }
public void BeginCsv()
{ Console.WriteLine("------ CSV Parsing ---------"); }
public void BeginRecord(int recordNr)
{ Fields.Clear(); }
public void AddField(int fieldNr, string field)
{ System.Diagnostics.Debug.Assert(fieldNr == Fields.Count); Fields.Add(field); }
public void EndRecord(int recordNr)
{ Console.WriteLine("Record {0}: '{1}'", recordNr, string.Join("', '", Fields)); }
public void EndCsv(int recordsTotal)
{ Console.WriteLine("------ Records: {0} ---------", recordsTotal); }
}
The crucial thing is the separation of the scanning from the parsing: the parsing implements the grammar stated above 1:1. The scanner provides the tokens (it can be exchanged for any other implementation as long as the interface is satisfied, which eases testing anyway). E.g. if one needs a scanner that works on streams instead of preloading all data at once, the scanner can be replaced by such an implementation.
Likewise, processing of the parsed record is easily replaced.
So, to summarize, my main concern was the missing separation of scanner and parser. I hope my example code above illustrates what I meant by this.
Cheers
Andi
modified 6-Jul-12 23:13pm.
---
Thanks for the detailed reply. The code you posted is certainly interesting.
I can see you are a big fan of RegEx, which I'm not so much. I find RegEx both cryptic and restrictive, and I don't consider RegEx to be an "established parsing technique". Certainly, I wouldn't expect products like language compilers to use RegEx. As a result, I prefer my "position tweaking" for processing text input.
Regarding your point about keeping the tokenizing separate from the parsing, I would tend to agree. I wrote a little language interpreter that worked that way. However, I didn't see much advantage to doing that when parsing something as simple as a CSV file. Some of the other things your code provides, such as the use of interfaces, could also provide advantages, although just not for any needs I've run into.
But, again, your code is interesting and I think it adds value to this article by offering an alternative approach.
Thanks.
---
In my experience, separating tokenizing from parsing pays off immediately in terms of maintainability and testing. I've seen a lot of code that started off "simple", like CSV parsing, and ended up ugly because some detail was overlooked and had to be hacked in somehow.
E.g. I've just noticed that I made a mistake in my grammar: the field definition is too restrictive regarding word. Fixing this in the code is simple: the tokenizer has to be adapted to match:

Wrong:
@"(\n)",
@"([^""\s" + sep + @"]+)",

Fixed:
@"(\n\r?|\r\n?)",
@"([^""\n\r" + sep + @"]+)",
What concerns the regular expression language for scanners: it is the lingua franca for token description in every scanner/parser generator I know. See Lex (go to the rules section), Flex (a nice comparison of handcrafted versus regex scanners), Coco/R (especially the Tutorial, section 4: Scanner Specification), ANTLR (especially the Expression evaluator example on the getting-started page), etc.
So why not take/learn that language if one is confronted with scanning/parsing? (The same holds for specifying the grammar for the parser: take any variant of BNF or EBNF.)
Regex is cryptic, but C was cryptic to me too in the beginning...
Using Regex in C# admittedly has some limitations (speed may be one), but in my eyes they are outweighed by maintainability (assuming you know Regex well enough). My view is: if you intend to process huge amounts of data where speed matters (e.g. write a C# compiler), C# is not the first choice anyway... (I would go for C++11.)
Finally, for more complex languages, you should employ one of the above-mentioned tools to generate a parser anyway...
Cheers
Andi
---
Here is another version of this parser that handles multi-line cells and double quotes:
public class CsvFileReader : StreamReader
{
private char delimiter;
public CsvFileReader(Stream stream, char delimiter)
: base(stream)
{
this.delimiter = delimiter;
}
public CsvFileReader(string filename, char delimiter)
: base(filename)
{
this.delimiter = delimiter;
}
public CsvRow ReadRow()
{
var line = ReadCompleteLine();
if (line == null)
return null;
var keys = new Dictionary<string, string>();
var guillemet = "_" + Guid.NewGuid() + "_";
keys.Add(guillemet, "\"");
var temp = line.Replace("\"\"", guillemet);
var regex = new Regex(
@"""[^""]*""",
RegexOptions.IgnoreCase
| RegexOptions.Multiline
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);
foreach (Match m in regex.Matches(temp))
{
var k = "_" + Guid.NewGuid().ToString() + "_";
keys.Add(k, m.Value);
temp = temp.Replace(m.Value, k);
}
CsvRow row = new CsvRow();
var split = temp.Split(delimiter);
foreach (var s in split)
{
row.Add(replaceKeys(s, keys));
}
return row;
}
private string replaceKeys(string s, Dictionary<string, string> keys)
{
s = keys.Keys.Reverse().Aggregate(s, (current, key) => current.Replace(key, keys[key]));
return s.StartsWith("\"") && s.EndsWith("\"") ? s.Substring(1, s.Length - 2) : s;
}
private string ReadCompleteLine()
{
string line;
var completeLine = String.Empty;
while ((line = ReadLine()) != null)
{
completeLine += "\n" + line;
if (completeLine.ToCharArray().Count(c => c == '"') % 2 == 0)
return completeLine.Trim('\n');
}
return null;
}
}
---
I think I'm going to rework my code to support multi-line fields and to handle some edge cases a little better.
RE: "Another version of this parser that manage cell lines and double quotes"
Could you please clarify this comment? What do you mean by "cell lines", and do you see any problem with the way my code handles double quotes?
Thanks.
---
By "cell lines" I meant multi-line cells. I don't see any problem with the way you're handling double quotes.
I just posted another version of your code which handles double quotes, multi-line fields, and a custom delimiter (in the French format, the delimiter is ';').
Thanks for your concern
---
Thanks everyone for all the comments.
Please note that I have completely reworked my code, making it more robust and adding new functionality such as support for multi-line fields, custom delimiter and quote characters and options for how empty lines in the input file are handled.
I've posted the new code in the article Reading and Writing CSV Files in C#.
I expect to update the code on Code Project as well before long.
modified 12-Oct-12 22:55pm.
---
Hi,
Thanks for your code. I have a few comments if you don't mind.
This regards your new code (linked above on your blackbelt coder site). There are a number of issues I find:
1) Debug.Assert... this occurs 3 times in the code. It throws an error: "The name 'Debug' does not exist in the current context".
Also, I'm not sure that's the kind of code I'd want to use; can you please offer some insight into its inclusion, and/or how to get it to work properly?
2) Looking at your "Project" sample code requires creating an ID and logging into the website. Why? I don't want to create yet another single-use login just to look at the project. Why would you go to all the trouble of creating such a lengthy, detailed page full of code, and then require a login to get the sample project... why why why...
3) If I do get Debug.Assert working... then I will see whether this code behaves the same as the code on this CodeProject page, which actually fails when it encounters a quoted element within the line. I can show an example if you like, but I thought you might have fixed it in your new version, so I want to try that first before raising an issue. Short version: the entire section between the double quotes gets dropped from the resulting text being read in.
That's it for now, I appreciate your time and code and do not intend this to sound negative in any way!
Thanks kindly,
trevor
---
Hi,
Two quick modifications to my comment above
I got Debug.Assert to compile by adding "using System.Diagnostics;" to my top declarations. It might be nice to include those on the page... but they are in your sample project, so... either way. I'm still not clear whether Debug.Assert should be used in working code; I don't fully understand its use here and would still appreciate your advice about it.
I did create yet another website login to get your sample code, but I still maintain that it should not be required... anyway... not a big deal, I guess.
Now that I have the new code actually compiling, I'll see if it reads the quoted elements properly, and post back here after testing!
Thanks again
---
Hi Jonathan,
The new code reads the quoted section with no problems... so I am not sure whether you are interested in looking at the older code on this CodeProject site. If you are, here is a sample data row on which it fails (but which works with your new code):
"0000007777","8 impressions created for job "FR_RM_4_LE10_REG_No_Ret_2.pdf"","f798dc0f-6829-4522-b5ec-e3ca194c22e8","Rubi PDF::PS - CSI","PDF::PostScript(PDFL)","Fri Oct 26 07:37:55 2012","Accounting"
You can see that each field is in quotes, and the second field contains an item embedded within quotes (ending in a double quote). The code here on the CodeProject site reads the line and returns everything but the embedded string from the second field; the missing data is: FR_RM_4_LE10_REG_No_Ret_2.pdf
I fully understand that since the new code fixes the issue, you may not care, but this page still comes up in search results and maybe it might help someone in the future to know that there could be issues...
Thanks once again!
trevor
---
Thanks for the comments.
1. Debug is in the System.Diagnostics namespace. In newer versions of Visual Studio, all you need to do is put the caret on the symbol and press Ctrl+., but you can add the using statement manually if needed. I recommend using Debug.Assert liberally throughout your code to assert any assumptions the code makes. It does not affect release builds.
2. The policy on Black Belt Coder is that a login is required to download any of the code samples. This is exactly the same policy as on Code Project; without it, anyone could link to the content as a download from their own site. I'll probably post the code on Code Project some day. As for why I posted most of the code in the article: I thought it would be helpful for someone trying to understand it.
3. I'm not sure I know what you're referring to there. I did make a change so it handles quoted text within a field more like Excel does, however, this is invalid CSV format as all such fields should themselves be enclosed in quotes.
-- modified 26-Oct-12 12:33pm.
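To illustrate point 1: Debug.Assert lives in System.Diagnostics, and because it is marked [Conditional("DEBUG")], calls to it are compiled only when the DEBUG symbol is defined (the default for debug builds), so they vanish from release builds. A minimal sketch with made-up method names:

```csharp
using System;
using System.Diagnostics;   // needed for Debug.Assert

class AssertDemo
{
    public static int Divide(int a, int b)
    {
        // Documents an assumption. In a debug build, a zero divisor
        // triggers the assertion dialog; in a release build, this line
        // is compiled out entirely and costs nothing.
        Debug.Assert(b != 0, "b must be non-zero");
        return a / b;
    }

    static void Main()
    {
        Console.WriteLine(Divide(10, 2));   // prints 5
    }
}
```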
---
Hi,
I replied to my own message without refreshing my browser and hadn't seen your reply. For convenience, I will repeat my response here... sorry about that:
Hi Jonathan,
The new code reads the quoted section with no problems... so I am not sure whether you are interested in looking at the older code on this CodeProject site. If you are, here is a sample data row on which it fails (but which works with your new code):
"0000007777","8 impressions created for job "FR_RM_4_LE10_REG_No_Ret_2.pdf"","f798dc0f-6829-4522-b5ec-e3ca194c22e8","Rubi PDF::PS - CSI","PDF::PostScript(PDFL)","Fri Oct 26 07:37:55 2012","Accounting"
You can see that each field is in quotes, and the second field contains an item embedded within quotes (ending in a double quote). The code here on the CodeProject site reads the line and returns everything but the embedded string from the second field; the missing data is: FR_RM_4_LE10_REG_No_Ret_2.pdf
I fully understand that since the new code fixes the issue, you may not care, but this page still comes up in search results and maybe it might help someone in the future to know that there could be issues...
Thanks once again!
trevor