This topic shows how to parse a CSV file from VBA without using Power Query. The parser is able to correctly handle any file that follows the RFC-4180 specifications and also those that do not follow it completely.
Introduction
Many solutions have been developed to solve the problem of working with CSV files, a problem that has been solved using different programming languages: Python, JavaScript, Java, PHP, C#, Ruby, Go, C++, and others.
Some enthusiasts have worked the problem in VBA, an action that is considered by many as unsuccessful and unfortunate since Excel has tools such as Power Query that allow importing data into worksheets in a simple way and with great processing power.
However, using Power Query implies establishing a physical connection to each data source (CSV file). If we want to process the data from VBA, using data arrays to store the imported information, the Microsoft tool will not be of much help, since before storing the data in the array, we must create the connection, write the data in the worksheet, pass the information to the array, delete the connection and delete the data stored in the worksheet.
Background
The CSV files are a special kind of tabulated plain text data container widely used in data exchange. There is no globally accepted standard format for those kind of files, however, out there are well formed specifications such as RFC-4180 proposed by The Internet Society. If you need CSV's raw data directly to VBA, you must develop a procedure to accomplish the data exchange task between your project and CSV files.
Starting from the previous point, I decided to get involved in the development of a set of class modules that would allow its users to work with CSV files, ensuring stability and flexibility in the data processing stage. At the same time, I wanted to make sure to fill some gaps present in many of the open source CSV parsers:
- Allow comment lines
- Omit empty or blank lines
- Import fields by name or position in the CSV file's records
- Allow to change the data type of fields on user's request
- Method (Yaroslavskiy Dual-Pivot Quicksort) to sort imported data based on a field index
Question: Where can I learn more about the project?
Answer: VBA CSV interface is an open-source project available on GitHub.
Using the Code
Once you follow the installation instructions, you just need to insert a new "normal" module and type something like this:
Sub ImportSomeColumns()
Dim path As String
Dim conf As parserConfig, a As Variant
Set CSVint = New CSVinterface
Set conf = New parserConfig
With conf
conf.recordsDelimiter = vbCrLf
path = "C:\100000.quoted.csv"
conf.path = path
conf.dynamicTyping = False
End With
CSVint.ImportFromCSV conf, "Region", 2
a = CSVint(0, 0)
Set CSVint = Nothing
End Sub
The above code snippet will import all "Region
" fields and all the fields in the 2nd column from the CSV file. The data will be treated as Strings
because the dynamic typing behavior was not enabled.
If you want to change the data type of the files and sort the retrieved data, you will need code like this:
Sub DynamicTypingAndSort()
Dim path As String
Dim conf As parserConfig
Dim CSVstring As String
Set CSVint = New CSVinterface
Set conf = New parserConfig
path = "C:\100000.quoted.csv"
With conf
.recordsDelimiter = vbCrLf
.path = path
.dynamicTyping = True
.DefineTypingTemplate TypeConversion.ToDate, _
TypeConversion.ToLong, _
TypeConversion.ToDate, _
TypeConversion.ToLong, _
TypeConversion.ToDouble, _
TypeConversion.ToDouble, _
TypeConversion.ToDouble
.DefineTypingTemplateLinks 6, _
7, _
8, _
9, _
10, _
11, _
12
End With
CSVint.ImportFromCSV(conf).Sort SortColumn:=1, Descending:=True
Set CSVint = Nothing
End Sub
If you want to exploit all the capabilities of the VBA CSV interface, please refer to the article in this link. You can find there a large number of situations in which an user can be involved and the necessary code to solve the problem.
Points of Interest
- The CSV syntax slows down the performance. When the number of escaped fields are increased, the performance decreases.
- The Dynamic Typing causes more overheat than the
Sort
operation. This can be explained by the great performance of the Yaroslavskiy sorting algorithm used.
History
- 8th March, 2021: Initial version
- 13th July, 2021: A reference to an article explaining the CSV interface in more detail has been added.