Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / parsing

Querying CSV in a Like SQL Way from VBA [Microsoft Excel and Access]

4.60/5 (3 votes)
17 Mar 2021CPOL3 min read 7.5K  
Fully VBA “Power Query” alternative for work with CSV files
Although Office has all the necessary tools to perform a number of tasks, amateur developers and experienced programmers are tempted to automate each of the processes that are executed on a daily basis. CSV file queries are no exception to this rule, and automating them is a problem that involves many solutions that focus on spreadsheets and a few others that focus on opening a path between VBA and CSV files without the need for any other intermediary.

Introductory Words

The capability and flexibility offered by the application ecosystem integrated in Microsoft Office is known by almost all its users, also by the absolute majority of amateur and full-time programmers. This is not surprising, the Office package includes VBA, which is a subset of functionality that was present in the popular Visual Basic programming language until its version 6 (VB6).

One of the most exciting features, which in turn can be considered as the most powerful tool in the Office package, is the ability to work with external data sources with exceptional fluidity. However, do you know the most efficient way to query CSV files and import the result directly into RAM?

Background

The number of tools that allow working with CSV files is growing every day, most of them adhere to the specifications provided in RFC-4180. Others add new features, making these applications more flexible.

In particular, most of the tools designed for the purpose specified above load the entire contents of the CSV into memory, from where it is parsed and loaded into an array or a collection.

The previous solution offers the advantage of being fast, but we must always keep in mind that memory resources are limited when working in VBA. This is why it is imperative to perform queries on the files and load the most relevant information contained in them into memory.

Until the time of writing this article, there was no tool capable of emulating, from VBA and without using objects inherent to a specific application (Excel sheets or Access tables), the complicated queries that Power Query allows to execute on CSV files without knowing the SQL language.

Study Case

We have a dataset, CSV, containing the sales records of the last 10 years (100k records) of a company and we want to perform a query on the file and select the records that meet the following requirements:

  1. Purchases made from Central America and the Caribbean
  2. Online purchases
  3. Items with profit margin above the median (789165.52)
  4. Query results sorted in descending order

The problem will be addressed using VBA CSV interface, a set of open source VBA class modules.

Solution

Achieving the task can be tricky when using the majority of the VBA CSV parsers, but with the selected one, the code look so simple. The first thing we need to do is convert the data fields to the proper VBA data type.

VB.NET
.DefineTypingTemplate TypeConversion.ToDate, _
                                TypeConversion.ToLong, _
                                TypeConversion.ToDate, _
                                TypeConversion.ToLong, _
                                TypeConversion.ToDouble, _
                                TypeConversion.ToDouble, _
                                TypeConversion.ToDouble TypeConversion.ToDouble
.DefineTypingTemplateLinks 6, _
                                7, _
                                8, _
                                9, _
                                10, _
                                11, _
                                12

The first method of the above code snippet creates a template for type conversion. The second command links the template with the specified fields indexes. As a result, the snippet will convert the 6th and 8th field to Date, the 7th and 9th field to Long and the 10th, 11th and 12th fields to Double.

For some reason, the Power Query editor throws an error when trying to change fields type from Text to Date as shown in this image.

Now, we can implement some logic to choose how our CSV data is retrieved.

VB.NET
Public Sub Excel_VBA_Query_Over_CSV()
    Dim path As String
    Dim conf As parserConfig
    Dim CSVrecord As ECPArrayList
    Dim CSVrecords As ECPArrayList
    
    Set CSVint = New CSVinterface
    Set conf = CSVint.parseConfig
    Set CSVrecords = New ECPArrayList
    path = "C:\Demo_100k_records.csv"
    With conf
        .recordsDelimiter = vbCr
        .path = path
        .dynamicTyping = True
        .headers = True
        .DefineTypingTemplate TypeConversion.ToDate, _
                                TypeConversion.ToLong, _
                                TypeConversion.ToDate, _
                                TypeConversion.ToLong, _
                                TypeConversion.ToDouble, _
                                TypeConversion.ToDouble, _
                                TypeConversion.ToDouble
        .DefineTypingTemplateLinks 6, _
                                7, _
                                8, _
                                9, _
                                10, _
                                11, _
                                12
    End With
    CSVint.OpenSeqReader conf 'Open a Sequential Reader
    Set CSVrecord = CSVint.GetRecord 'Get CSV record
    If conf.headers Then
        If Not CSVrecord Is Nothing Then
            CSVrecords.Add CSVrecord(0) 'Save the CSV header
        End If
    End If
    Do While Not CSVrecord Is Nothing 'Loop
        If CSVrecord(0)(0) = "Central America and the Caribbean" Then
            If CSVrecord(0)(3) = "Online" Then
                If CSVrecord(0)(11) > 789165.52 Then
                    CSVrecords.Add CSVrecord(0) 'Append data
                End If
            End If
        End If
        Set CSVrecord = CSVint.GetRecord 'Load next CSV record
    Loop
    CSVrecords.Sort 2, SortColumn:=6, Descending:=True
    CSVint.DumpToSheet DataSource:=CSVrecords
    Set CSVint = Nothing
    Set CSVrecords = Nothing
End Sub

If the host application is MS Access, the problem is solved as shown below:

VB.NET
Public Sub Access_VBA_Query_Over_CSV()
    Dim path As String
    Dim conf As parserConfig
    Dim dBase As DAO.Database
    Dim CSVrecord As ECPArrayList
    Dim CSVrecords As ECPArrayList
    
    Set CSVint = New CSVinterface
    Set conf = CSVint.parseConfig
    Set CSVrecords = New ECPArrayList
    With conf
        .recordsDelimiter = vbCr
        path = "C:\Demo_100k_records.csv"
        .path = path
        .dynamicTyping = True
        .headers = True
        .DefineTypingTemplate TypeConversion.ToDate, _
                                TypeConversion.ToLong, _
                                TypeConversion.ToDate, _
                                TypeConversion.ToLong, _
                                TypeConversion.ToDouble, _
                                TypeConversion.ToDouble, _
                                TypeConversion.ToDouble
        .DefineTypingTemplateLinks 6, _
                                7, _
                                8, _
                                9, _
                                10, _
                                11, _
                                12
    End With
    CSVint.OpenSeqReader conf 'Open a Sequential Reader
    Set CSVrecord = CSVint.GetRecord 'Get CSV record
    If conf.headers Then
        If Not CSVrecord Is Nothing Then
            CSVrecords.Add CSVrecord(0) 'Save the CSV header
        End If
    End If
    Do While Not CSVrecord Is Nothing 'Loop
        If CSVrecord(0)(0) = "Central America and the Caribbean" Then
            If CSVrecord(0)(3) = "Online" Then
                If CSVrecord(0)(11) > 789165.52 Then
                    CSVrecords.Add CSVrecord(0) 'Append data
                End If
            End If
        End If
        Set CSVrecord = CSVint.GetRecord 'Load next CSV record
    Loop
    Set CSVint.items = CSVrecords
    Set dBase = CurrentDb
    CSVint.Sort(SortColumn:=6, Descending:=True).DumpToAccessTable dBase, "Demo_100k"
    Set CSVint = Nothing
    Set dBase = Nothing
    Set CSVrecords = Nothing
End Sub

The Access_VBA_Query_Over_CSV procedure will create a table named Demo_100k in the current database. The created table has no CSV field indexes.

Points of Interest

Although with VBA CSV interface we can sequentially access the records stored in a CSV file, the cost paid for doing so is the reduced performance of our query. However, it is preferable to load only the required portion of information into memory rather than store all the records in RAM.

History

  • 17th March, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)