Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

.NET Programming using HP Vertica

4.94/5 (7 votes)
30 Nov 2018CPOL5 min read 38.6K   307  
An article on .NET data layer programming using BigData HP Vertica platform

Introduction

In continuation to my CodeProject article that I wrote last year on Cassandra/.NET programming, this year's article is related to .NET programming with HP Vertica, which is another BigData platform.

This article covers three aspects in details. First on HP Big Data platform Vertica and its dimension. Secondly, .NET application layer leverages HP Vertica data storage. Finally, .NET code samples are used for CRUD (Create Read Update Delete) data operations.

Data Warehouse

A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. The term was coined by W. H. Inmon.

Applications of data warehouses include data mining, Web Mining, and decision support systems (DSS).

Traditional Data Warehouse

Traditionally, data warehouse is housed on an enterprise mainframe server. Data from various online transaction processing (OLTP) applications and other sources is selectively extracted and organized on the data warehouse database for use by analytical applications and user queries. Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access.

Traditional Data Warehouse

Latest Data Warehouse

In the latest industry trend, Data Warehouse is built not only with internal system but also from the related external systems for better clarity and granularity.

The world of business intelligence (BI) and data warehousing keeps evolving. Technologies such as analytical database servers, big data, NoSQL, mashups, SaaS BI, data virtualization servers, advanced analytics, and SOA, have become available and offer new possibilities to organisations.

Latest Data Warehouse

Comparison

As the result of detailed analysis, traditional warehouse and latest big data warehouse are compared as below:

Comparision between Traditional Data Warehouse and Big Data

HP Vertica

What Is It?

Vertica Systems is an analytic database management software company, founded in 2005, acquired by HP (Hewlett Packard) on Mar 2011.

The cluster-based, column-oriented, Vertica Analytics Platform is designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses and other query-intensive applications.

Concepts

HP Vertica is built on a foundation of four distinguishing characteristics (the four Cs):

  • Column storage: store data to optimize for access
  • Compression: store more data in less physical storage
  • Clustering: scale your database easily, whether you need three nodes or 300
  • Continuous performance: optimize the database on an ongoing basis, automatically

Why Vertica?

Pretty straight forward question as the traditional data storage programmer.

In modern business, actionable data is becoming more important than ever. The ability to manage data quickly and effectively is a huge competitive advantage - an advantage easily attained with the Vertica Analytics Platform. Vertica is a powerful, analytic database that allows users to extract value from massive amounts of data at lightning fast speeds. Big Data needs are achievable using the above 4C concepts.

Who Use It?

Some of the biggest names in data, including Facebook. Facebook selected the HP Vertica Analytics Platform as one component of its big data infrastructure. Reference Link.

Vertica External Tables

With the latest version of Vertica, it is now possible to use the full expressive power of Vertica Analytics Engine and its analytics without having to load the data into Vertica. It is depicted in the below diagram.

HP Vertica external tables

Microsoft Connectivity

HP Vertica allows you to integrate your HP Vertica server with an environment that includes Microsoft components previously installed on your system. The Connectivity Pack includes the ADO.NET client driver and additional tools for integration with Microsoft Visual Studio and Microsoft SQL Server.

HP Vertica's Microsoft Connectivity Pack allows you to integrate your HP Vertica server with an environment that includes Microsoft components previously installed on your system. The Connectivity Pack includes the ADO.NET client driver and additional tools for integration with Microsoft Visual Studio and Microsoft SQL Server.

Specifically, the Connectivity Pack is an enhanced .NET data provider package and provides integration with the following Microsoft components:

  • Business Intelligence Development Studio (BIDS) for Visual Studio 2008 for use with SQL Server 2012. BIDS is a client-based application used to develop business intelligence solutions and based on the Microsoft Visual Studio development environment; BIDS includes additional project types specific to SQL Server Business Intelligence. Developers use BIDS to develop business solutions.
  • SQL Server Data Tool - Business Intelligence (SSDT-BI) for Visual Studio 2010/2012 for use with SQL Server 2012. SSDT-BI replaces BIDS for Visual Studio 2010 and 2012. It serves the same purpose as BIDS, providing a development environment for developing business intelligence solutions.
  • SQL Server Analysis Services (SSAS) for SQL Server 2008 and 2012. Use SSAS for OLAP and data mining, using HP Vertica as source for cube creation.
  • SQL Server Integration Services (SSIS) for SQL Server 2008 and 2012, which provides SQL Server Type Mappings to map data types between HP Vertica and SQL Server. Use SSIS for data migration, data integration and workflow, and ETL.
  • SQL Server Reporting Services (SSRS). Use SSRS to generate reports.

Connectivity between Microsoft and HP Vertica

Vertica .NET Integration

Infra Architecture

In Vertica .NET integration effort, Infra architecture is designed as below:

App Infra Model

App Server layer is added with Vertica ODBC connectivity component to establish the communication between .NET App and HP Vertica data layer.

Data layer contains cluster-based HP Vertica DB. Presentation layer stays as it is.

Logical Architecture

As per the industry standard n-tier architecture, HP Vertica .NET integrated Application logical architecture goes as below:

App Logical Model

Connectivity Code

Programmers should use connection properties to configure the connection between ADO.NET client application and HP Vertica database.

You can set a connection property in two ways:

  1. Include the property name and value as part of the connection string you pass to a VerticaConnection.
  2. Set the properties in a VerticaConnectionStringBuilder object, and then pass the object as a string to a VerticaConnection.
C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Vertica.Data.VerticaClient;

namespace VerticaApp
{
    class Connectivity
    {
        static void Main(string[] args)
        {
            /*
             * HP Vertica ADO.NET drivers connection string builder
             */
            VerticaConnectionStringBuilder builder = new VerticaConnectionStringBuilder();

            /*
             * In the connection string, you need to specify the Host, 
             * Database, and User at the minimum
             */
            builder.Host = "192.10.10.10";
            builder.Database = "TestDB";
            builder.User = "AppUser";

            /*
             * Build an HP Vertica connection object by the connection strings
             * Then, initiate the connection by Open method
             */
            VerticaConnection conn = new VerticaConnection(builder.ToString());
            conn.Open();

            /*
             * Perform some data operations here..
             * Finally, connection object is closed
             */
            conn.Close();
        }
    }
}

CRUD Operation

CRUD operation is the best use case to explain the functionalities between Application and Data Storage layer. In the given sample, the working code (except delete) has been attached in details.

Create Code

You can load data in batches using a prepared statement with parameters. You can also use transactions to rollback the batch load if any errors are encountered.

If you are loading large batches of data (more than 100MB), then consider using a direct batch insert.

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Vertica.Data.VerticaClient;

namespace VerticaApp
{
    class Create
    {
        static void Main(string[] args)
        {
            /*
             * HP Vertica ADO.NET drivers connection string builder
             */
            VerticaConnectionStringBuilder builder = new VerticaConnectionStringBuilder();

            /*
             * In the connection string, you need to specify the Host, 
             * Database, and User at the minimum
             */
            builder.Host = "192.10.10.10";
            builder.Database = "TestDB";
            builder.User = "AppUser";

            /*
             * Build an HP Vertica connection object by the connection strings
             * Then, initiate the connection by Open method
             */
            VerticaConnection conn = new VerticaConnection(builder.ToString());
            conn.Open();

            /*
             * Vertica INSERT operation into the existing Employee table with ID, 
             * First, Last, Mail, Department fields
             */
            VerticaCommand command = conn.CreateCommand();
            command.CommandText =
                  "INSERT into Employee values(1001, 'FirstName', 'LastName', 'eMail', 'Dept7')";
            Int32 rowsAdd = command.ExecuteNonQuery();
            Console.WriteLine(rowsAdd + " rows added!");

            /*
             * Finally, connection object is closed
             */
            conn.Close();
        }
    }
}

Read Code

The attached code snap is to retrieve the content from the existing vertica table into .NET application layer.

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Vertica.Data.VerticaClient;

namespace VerticaApp
{
    class Read
    {
        static void Main(string[] args)
        {
            /*
             * HP Vertica ADO.NET drivers connection string builder
             */
            VerticaConnectionStringBuilder builder = new VerticaConnectionStringBuilder();

            /*
             * In the connection string, you need to specify the Host, 
             * Database, and User at the minimum
             */
            builder.Host = "192.10.10.10";
            builder.Database = "TestDB";
            builder.User = "AppUser";

            /*
             * Build an HP Vertica connection object by the connection strings
             * Then, initiate the connection by Open method
             */
            VerticaConnection conn = new VerticaConnection(builder.ToString());
            conn.Open();

            /*
             * Vertica SELECT operation from the existing 2 relational tables
             * Employee and Department with DeptID field as reference
             */
            VerticaCommand command = conn.CreateCommand();
            command.CommandText =
                  "SELECT Mast.EmpID, Mast.FirstName, Mast.LastName, Mast.Mail, Ref.DeptName " +
                  "   FROM Employee Mast, Department Ref " +
                  "   WHERE Mast.DeptID == Ref.DeptID " +
                  "   ORDER BY Mast.EmpID " +
                  "   LIMIT 100 ";
            /*
             * To read data from the database use VerticaDataReader, 
             * an implementation of DbDataReader
             */
            VerticaDataReader reader = command.ExecuteReader();
            Console.WriteLine(" EmployeeName\t  Department");
            Console.WriteLine("------------\t  -------------------");
            
            /*
             * Data Read query is parsed to navigate the final result
             */
            int rows = 0;
            if (VerticaDataReader.HasRows())
            {
                while (reader.Read())
                {
                    Console.WriteLine("     " + reader[1] + reader[2] + "    \t  " + reader[4]);
                    ++rows;
                }
            }
            Console.WriteLine("Total Read Rows: (" + rows + ")");
            reader.Close();

            /*
             * Finally, connection object is closed
             */
            conn.Close();
        }
    }
}

Update Code

Update scenario is to alter the content of the existing Vertica table elements for the given condition. It is derived by Vertica Client library as given.

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Vertica.Data.VerticaClient;

namespace VerticaApp
{
    class Update
    {
        static void Main(string[] args)
        {
            /*
             * HP Vertica ADO.NET drivers connection string builder
             */
            VerticaConnectionStringBuilder builder = new VerticaConnectionStringBuilder();

            /*
             * In the connection string, you need to specify the Host, 
             * Database, and User at the minimum
             */
            builder.Host = "192.10.10.10";
            builder.Database = "TestDB";
            builder.User = "AppUser";

            /*
             * Build an HP Vertica connection object by the connection strings
             * Then, initiate the connection by Open method
             */
            VerticaConnection conn = new VerticaConnection(builder.ToString());
            conn.Open();

            /*
             * Vertica UPDATE operation in the existing table Employee
             * Mail info is updated for the given FirstName and LastName
             * Updation driving fields are parameterized
             */
            VerticaCommand command = new VerticaCommand();
            command.Text =
                  "UPDATE Employee " +
                  "   SET Mail = " + "test@test.com" +
                  "   WHERE FirstName = @First AND " +
                  "   LastName = @Last ";
            command.Parameters.Add(new VerticaParameter( "First", VerticaType.VarChar));
            command.Parameters.Add(new VerticaParameter( "Last", VerticaType.VarChar));
            command.Parameters["First"] = 'Ganesan';
            command.Parameters["Last"] = 'Senthilvel';

            Int32 rowsUpdate = command.ExecuteNonQuery();


            /*
             * Finally, connection object is closed
             */
            conn.Close();
        }
    }
}

Points of Interest

It is interesting to know the HP Vertica big data and .NET programming concepts along with CRUD code level implementation.

History

  • Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)