Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / productivity / Office / MS-Excel

Parsing CSV Files Directly from VBA

5.00/5 (2 votes)
8 Mar 2021CPOL3 min read 12.5K  
VBA solution for CSV parsing problem
This topic shows how to parse a CSV file from VBA without using Power Query. The parser is able to correctly handle any file that follows the RFC-4180 specifications and also those that do not follow it completely.

Introduction

Many solutions have been developed to solve the problem of working with CSV files, a problem that has been solved using different programming languages: Python, JavaScript, Java, PHP, C#, Ruby, Go, C++, and others.

Some enthusiasts have worked the problem in VBA, an action that is considered by many as unsuccessful and unfortunate since Excel has tools such as Power Query that allow importing data into worksheets in a simple way and with great processing power.

However, using Power Query implies establishing a physical connection to each data source (CSV file). If we want to process the data from VBA, using data arrays to store the imported information, the Microsoft tool will not be of much help, since before storing the data in the array, we must create the connection, write the data in the worksheet, pass the information to the array, delete the connection and delete the data stored in the worksheet.

Background

The CSV files are a special kind of tabulated plain text data container widely used in data exchange. There is no globally accepted standard format for those kind of files, however, out there are well formed specifications such as RFC-4180 proposed by The Internet Society. If you need CSV's raw data directly to VBA, you must develop a procedure to accomplish the data exchange task between your project and CSV files.

Starting from the previous point, I decided to get involved in the development of a set of class modules that would allow its users to work with CSV files, ensuring stability and flexibility in the data processing stage. At the same time, I wanted to make sure to fill some gaps present in many of the open source CSV parsers:

  • Allow comment lines
  • Omit empty or blank lines
  • Import fields by name or position in the CSV file's records
  • Allow to change the data type of fields on user's request
  • Method (Yaroslavskiy Dual-Pivot Quicksort) to sort imported data based on a field index

Question: Where can I learn more about the project?

Answer: VBA CSV interface is an open-source project available on GitHub.

Using the Code

Once you follow the installation instructions, you just need to insert a new "normal" module and type something like this:

VB.NET
Sub ImportSomeColumns()
    Dim path As String
    Dim conf As parserConfig, a As Variant
    
    Set CSVint = New CSVinterface
    Set conf = New parserConfig
    With conf
        conf.recordsDelimiter = vbCrLf
        path = "C:\100000.quoted.csv"
        conf.path = path
        conf.dynamicTyping = False
    End With
    CSVint.ImportFromCSV conf, "Region", 2
    a = CSVint(0, 0)
    Set CSVint = Nothing
End Sub

The above code snippet will import all "Region" fields and all the fields in the 2nd column from the CSV file. The data will be treated as Strings because the dynamic typing behavior was not enabled.

If you want to change the data type of the files and sort the retrieved data, you will need code like this:

VB.NET
Sub DynamicTypingAndSort()
    Dim path As String
    Dim conf As parserConfig
    Dim CSVstring As String
    
    Set CSVint = New CSVinterface
    Set conf = New parserConfig
    path = "C:\100000.quoted.csv"
    With conf
        .recordsDelimiter = vbCrLf
        .path = path
        .dynamicTyping = True
        .DefineTypingTemplate TypeConversion.ToDate, _
                                TypeConversion.ToLong, _
                                TypeConversion.ToDate, _
                                TypeConversion.ToLong, _
                                TypeConversion.ToDouble, _
                                TypeConversion.ToDouble, _
                                TypeConversion.ToDouble
        .DefineTypingTemplateLinks 6, _
                                7, _
                                8, _
                                9, _
                                10, _
                                11, _
                                12
    End With
    CSVint.ImportFromCSV(conf).Sort SortColumn:=1, Descending:=True
    Set CSVint = Nothing
End Sub

If you want to exploit all the capabilities of the VBA CSV interface, please refer to the article in this link. You can find there a large number of situations in which an user can be involved and the necessary code to solve the problem.

Points of Interest

  1. The CSV syntax slows down the performance. When the number of escaped fields are increased, the performance decreases.
  2. The Dynamic Typing causes more overheat than the Sort operation. This can be explained by the great performance of the Yaroslavskiy sorting algorithm used.

History

  • 8th March, 2021: Initial version
  • 13th July, 2021: A reference to an article explaining the CSV interface in more detail has been added.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)