C++: Minimalistic CSV Streams

10 Mar 2023, MIT license, 4 min read
Read and write CSV in a few lines of code!

Introduction

MiniCSV is a small, single-header library based on C++ file streams that is comparatively easy to use. Without further ado, let us see some code in action.

Writing

The following example writes tab-separated values to a file using the csv::ofstream class. Since version 1.7, you can specify the escape string when calling set_delimiter.

C++
#include "minicsv.h"
#include <string>

struct Product
{
    Product() : name(""), qty(0), price(0.0f) {}
    Product(std::string name_, int qty_, float price_) 
        : name(name_), qty(qty_), price(price_) {}
    std::string name;
    int qty;
    float price;
};

int main()
{
    csv::ofstream os("products.txt");
    os.set_delimiter('\t', "##");
    if(os.is_open())
    {
        Product product("Shampoo", 200, 15.0f);
        os << product.name << product.qty << product.price << NEWLINE;
        Product product2("Soap", 300, 6.0f);
        os << product2.name << product2.qty << product2.price << NEWLINE;
    }
    os.flush();
    return 0;
}

NEWLINE is defined as '\n'. We cannot use std::endl here because csv::ofstream is not derived from std::ofstream.
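To make the idea concrete, here is a minimal sketch of a delimiter-inserting writer. It is not MiniCSV's implementation: the class name TinyDelimitedWriter and its members are hypothetical, and it writes to an in-memory buffer rather than a file. It also hints at why std::endl does not fit such a class: std::endl is a function template written for std::basic_ostream, so a generic operator<< on an unrelated class cannot deduce a type for it, whereas a plain '\n' character works.

```cpp
#include <sstream>
#include <string>

// Hypothetical sketch, not MiniCSV's code: a writer that puts a delimiter
// between fields and treats the character '\n' as the end of a record.
class TinyDelimitedWriter
{
public:
    explicit TinyDelimitedWriter(char delimiter)
        : delimiter_(delimiter), at_line_start_(true) {}

    // Any streamable value becomes one field.
    template <typename T>
    TinyDelimitedWriter& operator<<(const T& value)
    {
        write_field(value);
        return *this;
    }

    // A lone '\n' ends the record; any other char is an ordinary field.
    TinyDelimitedWriter& operator<<(char ch)
    {
        if (ch == '\n')
        {
            buffer_ << '\n';
            at_line_start_ = true;
        }
        else
        {
            write_field(ch);
        }
        return *this;
    }

    // Everything written so far.
    std::string text() const { return buffer_.str(); }

private:
    template <typename T>
    void write_field(const T& value)
    {
        if (!at_line_start_)
            buffer_ << delimiter_;   // separate from the previous field
        buffer_ << value;
        at_line_start_ = false;
    }

    char delimiter_;
    bool at_line_start_;
    std::ostringstream buffer_;
};
```

With this sketch, `TinyDelimitedWriter w('\t'); w << "Soap" << 300 << 6.0f << '\n';` leaves one tab-separated record in `w.text()`.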

Reading

To read back the same file, csv::ifstream is used and std::cout is for displaying the read items on the console.

C++
#include "minicsv.h"
#include <iostream>

// Product is the struct defined in the Writing example.

int main()
{
    csv::ifstream is("products.txt");
    is.set_delimiter('\t', "##");
    if(is.is_open())
    {
        Product temp;
        while(is.read_line())
        {
            is >> temp.name >> temp.qty >> temp.price;
            // display the read items
            std::cout << temp.name << "," << temp.qty << "," << temp.price << std::endl;
        }
    }
    return 0;
}

The output in console is as follows:

C++
Shampoo,200,15
Soap,300,6
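The read_line / operator>> pattern used above can be pictured with the rough sketch below. It is an illustration only, not MiniCSV's code: the class TinyDelimitedReader and its members are hypothetical, it reads from a string instead of a file, and it performs no escaping, quoting, or error reporting.

```cpp
#include <sstream>
#include <string>

// Hypothetical sketch, not MiniCSV's code: read one record at a time,
// then hand out delimiter-separated tokens through operator>>.
class TinyDelimitedReader
{
public:
    TinyDelimitedReader(const std::string& text, char delimiter)
        : input_(text), delimiter_(delimiter), pos_(0) {}

    // Load the next record; returns false once the input is exhausted.
    bool read_line()
    {
        pos_ = 0;
        return static_cast<bool>(std::getline(input_, line_));
    }

    // Convert the next token to T with a temporary string stream.
    template <typename T>
    TinyDelimitedReader& operator>>(T& value)
    {
        std::istringstream field(next_token());
        field >> value;
        return *this;
    }

    // Strings take the whole token; a plain >> would stop at whitespace.
    TinyDelimitedReader& operator>>(std::string& value)
    {
        value = next_token();
        return *this;
    }

private:
    // Return the text up to the next delimiter (or the end of the line).
    std::string next_token()
    {
        if (pos_ > line_.size())
            return std::string();
        std::string::size_type end = line_.find(delimiter_, pos_);
        if (end == std::string::npos)
            end = line_.size();
        std::string token = line_.substr(pos_, end - pos_);
        pos_ = end + 1;
        return token;
    }

    std::istringstream input_;
    char delimiter_;
    std::string line_;
    std::string::size_type pos_;
};
```

The usage mirrors the example above: call `read_line()` in a loop and extract the fields of each record with `>>`.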

Overloaded Stream Operators

String streams were introduced in v1.6. The following example shows how to overload the string stream operators for the Product class. The concept is the same for file streams.

C++
#include "minicsv.h"
#include <iostream>

struct Product
{
    Product() : name(""), qty(0), price(0.0f) {}
    Product(std::string name_, int qty_, float price_) : name(name_), 
                               qty(qty_), price(price_) {}
    std::string name;
    int qty;
    float price;
};

template<>
inline csv::istringstream& operator >> (csv::istringstream& istm, Product& val)
{
    return istm >> val.name >> val.qty >> val.price;
}

template<>
inline csv::ostringstream& operator << (csv::ostringstream& ostm, const Product& val)
{
    return ostm << val.name << val.qty << val.price;
}

int main()
{
    // test string streams using overloaded stream operators for Product
    {
        csv::ostringstream os;
        os.set_delimiter(',', "$$");
        Product product("Shampoo", 200, 15.0f);
        os << product << NEWLINE;
        Product product2("Towel, Soap, Shower Foam", 300, 6.0f);
        os << product2 << NEWLINE;

        csv::istringstream is(os.get_text().c_str());
        is.set_delimiter(',', "$$");
        Product prod;
        while (is.read_line())
        {
            is >> prod;
            // display the read items
            std::cout << prod.name << "|" << prod.qty << "|" << prod.price << std::endl;
        }
    }
    return 0;
}

This is what is displayed on the console.

C++
Shampoo|200|15
Towel, Soap, Shower Foam|300|6
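The second product name contains commas yet survives the round trip, because the escape string passed to set_delimiter stands in for the delimiter inside text. A minimal sketch of that idea follows, assuming the escaping is a plain substitution; the helper names replace_all, escape_field and unescape_field are hypothetical and not part of MiniCSV.

```cpp
#include <string>

// Replace every occurrence of `from` in `text` with `to`.
std::string replace_all(std::string text, const std::string& from,
                        const std::string& to)
{
    if (from.empty())
        return text;
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos)
    {
        text.replace(pos, from.size(), to);
        pos += to.size();   // continue after the inserted text
    }
    return text;
}

// Assumed behavior when writing: hide delimiters inside a field.
std::string escape_field(const std::string& field, char delimiter,
                         const std::string& escape)
{
    return replace_all(field, std::string(1, delimiter), escape);
}

// Assumed behavior when reading: restore the delimiters in a token.
std::string unescape_field(const std::string& token, char delimiter,
                           const std::string& escape)
{
    return replace_all(token, escape, std::string(1, delimiter));
}
```

Under this assumption, the name would be stored as `Towel$$ Soap$$ Shower Foam`, so splitting the line on ',' still yields three tokens. The escape string should therefore be one that never occurs in the data.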

What if the type has private members? Create a member function that takes in the stream object.

C++
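class Product
{
public:
    Product() : name(""), qty(0), price(0.0f) {}

    // Public entry point that lets the stream operator fill private members.
    void read(csv::istringstream& istm)
    {
        istm >> this->name >> this->qty >> this->price;
    }

private:
    std::string name;
    int qty;
    float price;
};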

template<>
inline csv::istringstream& operator >> (csv::istringstream& istm, Product& prod)
{
    prod.read(istm);
    return istm;
}

Conclusion

MiniCSV is a small CSV library based on C++ file streams. Because the delimiter can be changed on the fly, I have used this library to write file parsers for the MTL and Wavefront OBJ formats in a relatively short time compared to hand-writing them without library help. MiniCSV is now hosted on GitHub. Thank you for reading!

History

  • 2014-03-09: Initial release
  • 2014-08-20: Remove the use of smart ptr
  • 2015-03-23: 75% performance increase on writing by removing the flush on every line. Fixed the LNK2005 linker error caused by multiple definitions. read_line replaces eof on ifstream.
  • 2015-09-22: v1.7: Escape/unescape and surround/trim quotes on text
  • 2015-09-24: Added overloaded stringstream operators example.
  • 2015-09-27: Stream operator overload for const char* in v1.7.2.
  • 2015-10-04: Fixed G++ and Clang++ compilation errors in v1.7.3.
  • 2015-10-20: Ignore delimiters within quotes during reading when enable_trim_quote_on_str is enabled in v1.7.6. Example: 10.0,"Bottle,Cup,Teaspoon",123.0 will be read as 3 tokens: <10.0><Bottle,Cup,Teaspoon><123.0>
  • 2016-05-05: Quotes inside a quoted string are now escaped. The default escape string is "&quot;", which can be changed through os.enable_surround_quote_on_str() and is.enable_trim_quote_on_str()
  • 2016-07-10: Version 1.7.9: Reading UTF-8 BOM
  • 2016-08-02: Version 1.7.10: Added a separator class for the stream, so there is no need to call set_delimiter repeatedly when the delimiter keeps changing. See the code example below:
    C++
    // demo sep class usage
    csv::istringstream is("vt:33,44,66");
    is.set_delimiter(',', "$$");
    csv::sep colon(':', "<colon>");
    csv::sep comma(',', "<comma>");
    while (is.read_line())
    {
        std::string type;
        int r = 0, b = 0, g = 0;
        is >> colon >> type >> comma >> r >> b >> g;
        // display the read items
        std::cout << type << "|" << r << "|" << b << "|" << g << std::endl;
    }
  • 2016-08-23: Version 1.7.11: Fixed num_of_delimiter function: do not count delimiter within quotes
  • 2016-08-26: Version 1.8.0: Added better error messages for data conversion during reading. Before this change, data conversion errors with std::istringstream went undetected.

    Before change:
    C++
    template<typename T>
    csv::ifstream& operator >> (csv::ifstream& istm, T& val)
    {
        std::string str = istm.get_delimited_str();
        
    #ifdef USE_BOOST_LEXICAL_CAST
        val = boost::lexical_cast<T>(str);
    #else
        std::istringstream is(str);
        is >> val;
    #endif
    
        return istm;
    }

    After change:

    C++
    template<typename T>
    csv::ifstream& operator >> (csv::ifstream& istm, T& val)
    {
        std::string str = istm.get_delimited_str();
    
    #ifdef USE_BOOST_LEXICAL_CAST
        try 
        {
            val = boost::lexical_cast<T>(str);
        }
        catch (boost::bad_lexical_cast& e)
        {
            throw std::runtime_error(istm.error_line(str).c_str());
        }
    #else
        std::istringstream is(str);
        is >> val;
        if (!(bool)is)
        {
            throw std::runtime_error(istm.error_line(str).c_str());
        }
    #endif
    
        return istm;
    }

    Breaking change: existing user code that catches boost::bad_lexical_cast must be changed to catch std::runtime_error. The same applies to csv::istringstream. Beware that std::istringstream is not as good as boost::lexical_cast at catching errors. For example, "4a" gets converted to the integer 4 without error.

    An example of the csv::ifstream error message follows:

    C++
    csv::ifstream conversion error at line no.:2, 
    filename:products.txt, token position:3, token:aa

    Similar for csv::istringstream except there is no filename.

    C++
    csv::istringstream conversion error at line no.:2, token position:3, token:aa
  • 2017-01-08: Version 1.8.2 with better input stream performance. Run the benchmark to see the difference (note: you need to update the drive/folder location first).

    Benchmark results against version 1.8.0:

    C++
         mini_180::csv::ofstream:  348ms
         mini_180::csv::ifstream:  339ms <<< v1.8.0
             mini::csv::ofstream:  347ms
             mini::csv::ifstream:  308ms <<< v1.8.2
    mini_180::csv::ostringstream:  324ms
    mini_180::csv::istringstream:  332ms <<< v1.8.0
        mini::csv::ostringstream:  325ms
        mini::csv::istringstream:  301ms <<< v1.8.2
    
  • 2017-01-23: Version 1.8.3 adds unit tests and allows two quotes to escape one quote, in line with the CSV specification.
  • 2017-02-07: Version 1.8.3b adds more unit tests and removes the CPOL license file.
  • 2017-03-12: Version 1.8.4 fixed some char output problems and added the NChar (char wrapper) class to read and write char variables as numeric values in the range [-128..127].
    C++
    bool test_nchar(bool enable_quote)
    {
        csv::ostringstream os;
        os.set_delimiter(',', "$$");
        os.enable_surround_quote_on_str(enable_quote, '\"');
    
        os << "Wallet" << 56 << NEWLINE;
    
        csv::istringstream is(os.get_text().c_str());
        is.set_delimiter(',', "$$");
        is.enable_trim_quote_on_str(enable_quote, '\"');
    
        while (is.read_line())
        {
            try
            {
                std::string dest_name = "";
                char dest_char = 0;
    
                is >> dest_name >> csv::NChar(dest_char);
    
                std::cout << dest_name << ", " 
                    << (int)dest_char << std::endl;
            }
            catch (std::runtime_error& e)
            {
                std::cerr << __FUNCTION__ << e.what() << std::endl;
            }
        }
        return true;
    }

    Display Output:

    C++
    Wallet, 56
  • 2017-09-18: Version 1.8.5:

    If the escape parameter in set_delimiter() is empty, text containing the delimiter is automatically enclosed in quotes (to be compliant with Microsoft Excel and general CSV practice).

    "Hello,World",600

    Microsoft Excel and MiniCSV read this as "Hello,World" and 600.

  • 2021-02-21: Version 1.8.5d: Fixed infinite loop in quote_unescape.
  • 2021-05-06: Version 1.8.6: MiniCSV detects the end of a line by the presence of a newline, so a newline inside a string field used to break parsing. This version takes care of such newlines by escaping them.
  • 2023-03-11: v1.8.7 added set_precision(), reset_precision() and get_precision() to ostream_base for setting float/double/long double precision in the output.
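The 1.8.0 entry above warns that std::istringstream converts "4a" to 4 without error. The sketch below reproduces that behavior with the standard library alone and shows one possible stricter check that rejects leftover characters. The function names parse_int_lenient and parse_int_strict are hypothetical and are not part of MiniCSV.

```cpp
#include <sstream>
#include <string>

// Lenient: the same kind of check as a plain `is >> val`.
// Characters left over after the number are silently ignored.
bool parse_int_lenient(const std::string& token, int& out)
{
    std::istringstream is(token);
    is >> out;
    return !is.fail();
}

// Strict: additionally require that only whitespace remains.
bool parse_int_strict(const std::string& token, int& out)
{
    std::istringstream is(token);
    is >> out;
    if (is.fail())
        return false;
    is >> std::ws;      // skip trailing whitespace, if any
    return is.eof();    // anything else left means a malformed token
}
```

Here `parse_int_lenient("4a", v)` succeeds and stores 4, while `parse_int_strict("4a", v)` reports failure. Both reject a token such as "aa".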

FAQ

Why does the reader stream encounter errors for CSV with text not enclosed within quotes?

Answer: To resolve it, please remember to call enable_trim_quote_on_str with false.

Points of Interest

Recently, I encountered an interesting benchmark result when reading a 5MB file, up against a string_view CSV parser by Vincent La. You can see the effects of the Small String Optimization (SSO).

Benchmark where every column is 12 chars long

The length is within the SSO limit (24 bytes), so heap allocation is avoided.

C++
csv_parser timing:113ms
MiniCSV timing:71ms
CSV Stream timing:187ms

Benchmark where every column is 30 chars long

The length is outside the SSO limit, so memory has to be allocated on the heap! Now the string_view csv_parser wins.

C++
csv_parser timing:147ms
MiniCSV timing:175ms
CSV Stream timing:434ms

Note: I am not sure why CSV Stream is so slow with the VC++ 15.9 update.

Note: The benchmark results could differ with other C++ compilers such as G++ and Clang++, to which I do not currently have access.
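The SSO effect behind these numbers can be observed directly. The sketch below checks whether a std::string keeps its characters inside the string object itself or in a separate heap buffer. The helper stored_inline is hypothetical, and the exact inline capacity is implementation-specific (commonly 15 chars in MSVC and libstdc++, 22 in libc++), so treat the result as an observation on one standard library rather than a guarantee.

```cpp
#include <cstdint>
#include <string>

// True when the character buffer lives inside the std::string object,
// i.e. the string did not need a separate heap allocation.
bool stored_inline(const std::string& s)
{
    const std::uintptr_t object_begin =
        reinterpret_cast<std::uintptr_t>(&s);
    const std::uintptr_t object_end = object_begin + sizeof(std::string);
    const std::uintptr_t data =
        reinterpret_cast<std::uintptr_t>(s.data());
    return data >= object_begin && data < object_end;
}
```

On the common implementations listed above, `stored_inline(std::string(12, 'x'))` is true and `stored_inline(std::string(30, 'x'))` is false, matching the two benchmark column widths.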

License

This article, along with any associated source code and files, is licensed under The MIT License