Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server

Query Excel or CSV files with T-SQL

4.70/5 (4 votes)
19 Jun 2024CPOL5 min read 15K   215  
Query Excel or CSV files with T-SQL without importing them in a table first
This article is a tutorial on how to query Excel or CSV files through T-SQL without first importing them into a table.

Introduction

Sometimes, it is useful to query Excel or CSV files using SQL statements without having to import them into a table through BulkInsert. This article shows in detail how and it uses Microsoft Access Database Engine which should be working on all editions of SQL Server since 2005.

SQL Server Setup (must be done once)

  • Enable Ad Hoc Distributed Queries in SSMS:
    SQL
    sp_configure 'show advanced options',1
    reconfigure
    GO
    sp_configure 'Ad Hoc Distributed Queries',1
    reconfigure
    GO
    • Install Microsoft Access Database Engine Redistributable on server, the latest version I was able to find on Microsoft website is 2016 (click here)
    • Enable InProcess and DynamicParameters for Microsoft Access Database Engine by entering in SSMS:
    SQL
    USE [master]
    GO
    EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.16.0', N'AllowInProcess', 1
    GO
    EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.16.0', N'DynamicParameters', 1 
    GO
  • Restart SQL Server, after reconnecting with SSMS, you should see Microsoft..ACE.OLEDB.16.0 under Server Objects / Linked Servers / Providers.

    Image 1

Using the Code

In order to query an Excel or CSV file, you must first copy it to the SQL Server in a folder which is accessible with READ permission to the user account under which the SQL Server instance is running (you can know which account is by running services.msc, right click on SQL Server (MSSQLSERVER) -> Properties -> Log on tab -> This account). If you are having issues, just for testing, you can copy it to a folder (e.g., c:\temp) and add in the security tab (right click on folder -> Properties --> Security tab) the EVERYONE user with READ access.

  • How to query a CSV file (comma separated): You can set the HDR parameter to YES if first row of CSV file contains the column names.
    SQL
    SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.16.0',_
       'Text;Database=<path to file.csv>;HDR=YES','SELECT * FROM <file.csv>')
  • How to query a CSV file (with a different separator than comma): In this case, you have to create a schema.ini inside the same folder containing the CSV file with this content in order to specify the separator (you can specify different separators for different files, just repeat the 4 lines structure):
    [<file.csv>]
    ColNameHeader=True
    CharacterSet=ANSI
    Format=Delimited(<separator char>)

    The statement to query the CSV is the same as above, e.g.:

    SQL
    SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.16.0',_
      'Text;Database=<only directories to file.csv>;HDR=YES','SELECT * FROM <file.csv>')
  • How to query an Excel file (.xlsx or .xls): You have to use a different statement with the sheet name you want to query (usually, it's Sheet1$ but pay attention that Excel is localized so it changes for different languages).
    SQL
    SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.16.0','Excel 12.0; _
             Database=<full path to excel.xlsx>', [<sheet name>$])
    Image 2

Troubleshooting

You could be experiencing some strange errors, usually most of them are about security issues. In order to troubleshoot them, we must understand that Microsoft.ACE.OLEDB.16.0 is a COM component. COM components are a core part of Windows operating system and they have been existing since its first editions (from Windows 95). Basically a COM component is a server object which exposes one or more functionalities through a set of interfaces accessible to every programming language (so to programs written in a different language than the one used to develop it, usually C/ C++. You may think that this feature nowadays is quite common but in the 90s, there was little interoperability). COM components are always registered in the Windows registry under Computer\HKEY_CLASSES_ROOT\CLSID when they are installed for the first time and have a CLSID (a guid) and a PROGID (a meaningful name, e.g., the Microsoft.ACE.OLEDB.16.0 above) for identifying and instancing them. A COM server can be "hosted" in three ways:

  • In-Process: The COM server is usually a DLL which gets dynamically loaded and runs in the security context of the calling program. So when we configure the above In-Process option in SQL Server, the Microsoft.ACE.OLEDB.16.0 DLL gets loaded from SQL Server and runs
    • Under the user account of SQL Server, if you have connected to the server with SQL Server authentication
    • Under the user currently executing the above queries, if you have connected with Windows authentication.
  • Out-Of-Process: The COM server is usually an external executable which is run automatically whenever you need to access the interfaces provided by the server. There is also the option to allow an In-Process DLL to run as an Out-Of-Process by using a standard system-provided surrogate EXE which hosts it (have you ever seen dllhost.exe in Task Manager :-) ?). The Out-Of-Process hosting can be done either on local computer or on a remote computer, in this case, we are talking about DCOM - Distributed COM.

DCOM security permissions is quite complex and granular, in fact, we have:

  • Launch Permissions: The users which are allowed to start FOR THE FIRST TIME the COM component.
  • Access Permissions: The users which are allowed to access an already launched COM component (e.g., to use the method provided in its interfaces). So you may be allowed to access a COM component, but not to launch it (in this case, someone else has to launch it for you before you can access it).
  • Configuration Permissions: The users which are allowed to change the DCOM configuration by using the command line tool dcomcnfg.exe.
  • Identity (on the tab with the same name): which specifies the user security context under which the COM component is run, it can be either the currently logged on (the interactive user), the user account of the client process that launched the server, a specified user, or a service.

COM components security is configured by the running the commandline dcomcnfg.exe, if you go to Component Services \ Computers\ My Computer \ DCOM Config you should see all registered components, the one responsible for instantiating all the OLEDB Linked Servers Providers is MSDAINITIALIZE.

Image 3

If you run the Microsoft.ACE.OLEDB.16.0 with In-Process hosting, you should be able to solve most of the security issues by launching SSMS as an administrator. If you instead prefer (or need) to host it with Out-Of-Process hosting, you can check this Microsoft Tech Community article which should explain how to solve most of the issues by tweaking DCOM configuration. Also, this post on StackOverflow has some good information on how to solve the most common issues.

Points of Interest

I attach the sample Excel / CSV / schema.ini files I have used to test the above code. You can find the source code on my GitHub repository TSqlResources.

History

  • V1.0 (22nd October, 2023)
    • Initial version
  • V1.1 (3rd November, 2023)
    • Added troubleshooting section with a brief explanation of COM

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)