Introduction
After struggling for two days, I finally figured out how to connect .NET code to Hadoop using a Hadoop keytab file. I could not find an article or solution on Google that helped me accomplish this. Since this code is entirely my own work, please share any suggestions for improving it.
A keytab is a file containing pairs of Kerberos principals and an encrypted copy of that principal's key. A keytab file for a Hadoop daemon is unique to each host since the principal names include the hostname. This file is used to authenticate a principal on a host to Kerberos without human interaction or storing a password in a plain text file (source).
Hadoop Configuration File
The krb5.conf file contains Kerberos configuration information, including the locations of KDCs and admin servers for the Kerberos realms of interest, defaults for the current realm and for Kerberos applications, and mappings of hostnames onto Kerberos realms. (source)
How It Works
The scenario explained here is connecting a .NET C# application to a Kerberos-authenticated Hadoop server. This article covers only the connection, so I will not explain Hadoop concepts. Making a successful connection involves the following steps:
- Set up the Hadoop server configuration information
- Generate a Kerberos authentication ticket based on the Hadoop keytab file
- Create an ODBC connection to the Hadoop server
Pre-Requisites
The following prerequisites must be installed before a connection can be established between .NET and Hadoop:
- Install MIT Kerberos for Windows from this link
- Install Microsoft Hadoop ODBC driver from this link
You need to have the following information:
- Hadoop Configuration file (krb5.ini)
- Keytab file (HDDev.keytab)
- Hadoop Server Host name
- Hadoop Server Port address (default is 10000)
- Hadoop hostFQDN
- Hadoop Service Name
- Hadoop Principal account (explained below)
Detail
After installing MIT Kerberos, copy the Hadoop configuration file (krb5.ini) to C:\ProgramData\MIT\Kerberos5 (adjust the path to match your installation location).
Copy the keytab file to any convenient location. In my demo, I copied it to the project's Bin\Debug and Bin\Release folders.
Once MIT Kerberos is installed, you can generate a Kerberos ticket using the kinit command. The syntax is:
kinit -k -t HDDev.keytab hadoopDevPrincipal@HDP.DEV
In this syntax, HDDev.keytab is the keytab file. You can also specify the full path of the file in the command. Example: kinit -k -t "d:\test\HDDev.keytab" hadoopDevPrincipal@HDP.DEV
Connecting to Hadoop requires a Kerberos principal. The principal's authentication information is read from the keytab file, which needs appropriate read permissions. In my demo, I used hadoopDevPrincipal@HDP.DEV, which is a fake value for demonstration purposes but shows the format of a Kerberos principal account.
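To make the principal format concrete, here is a minimal sketch that splits a principal of the form name@REALM into its two parts. The class and method names (KerberosPrincipal, TrySplit) are hypothetical and not part of the article's download. It assumes the realm is everything after the last @ and does not handle escaped characters.

```csharp
using System;

public static class KerberosPrincipal
{
    // Splits "name@REALM" into name and realm.
    // Returns false when the input has no '@', or when either side is empty.
    public static bool TrySplit(string principal, out string name, out string realm)
    {
        name = null;
        realm = null;
        if (string.IsNullOrWhiteSpace(principal))
        {
            return false;
        }

        // Assumption: the realm starts after the last '@' in the string.
        int at = principal.LastIndexOf('@');
        if (at <= 0 || at == principal.Length - 1)
        {
            return false;
        }

        name = principal.Substring(0, at);
        realm = principal.Substring(at + 1);
        return true;
    }
}
```

A check like this can catch a mistyped principal in configuration before kinit is ever started.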
You can add further switches to the command to configure the Kerberos ticket expiry, etc. For more documentation on the kinit command, please refer to this link.
Now, we are ready to jump into the code and make a connection.
Step 1
Execute the kinit command, providing the keytab file and principal account, to generate the Kerberos ticket.
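using System;
using System.Configuration;
using System.Diagnostics;

// Build the kinit arguments: -k -t "<keytab path>" <principal>
string path = string.Format("-k -t \"{0}\\{1}\" {2}",
    Environment.CurrentDirectory,
    ConfigurationManager.AppSettings["keyTabFileName"],
    ConfigurationManager.AppSettings["principal"]);
ProcessStartInfo psi = new ProcessStartInfo("kinit")
{
    UseShellExecute = true,
    RedirectStandardOutput = false,
    RedirectStandardInput = false,
    RedirectStandardError = false,
    CreateNoWindow = true,
    WindowStyle = ProcessWindowStyle.Hidden,
    Arguments = path
};
Process process = Process.Start(psi);
// Wait for kinit to finish so the ticket exists before the connection is opened.
process.WaitForExit();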
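The step above can also be wrapped in a small helper that waits for kinit and reports its exit code. The following is only a sketch under stated assumptions: KinitRunner, BuildArguments and Run are hypothetical names, kinit is assumed to be on the PATH, and a non-zero exit code is assumed to mean that no ticket was obtained.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class KinitRunner
{
    // Builds the argument string for: kinit -k -t "<keytab path>" <principal>
    public static string BuildArguments(string keytabDirectory, string keytabFileName, string principal)
    {
        string keytabPath = Path.Combine(keytabDirectory, keytabFileName);
        return string.Format("-k -t \"{0}\" {1}", keytabPath, principal);
    }

    // Starts kinit, waits up to the given timeout, and returns its exit code.
    // Assumption: kinit can be found on the PATH.
    public static int Run(string arguments, int timeoutMilliseconds)
    {
        var psi = new ProcessStartInfo("kinit", arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(psi))
        {
            if (!process.WaitForExit(timeoutMilliseconds))
            {
                process.Kill();
                throw new TimeoutException("kinit did not finish in time.");
            }
            return process.ExitCode;
        }
    }
}
```

A caller could run KinitRunner.Run(KinitRunner.BuildArguments(Environment.CurrentDirectory, "HDDev.keytab", "hadoopDevPrincipal@HDP.DEV"), 30000) and open the ODBC connection only when the result is 0.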
Step 2
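Create an ODBC connection to Hadoop using the Hadoop server information:
// Requires: using System.Configuration; using System.Data.Odbc;
OdbcConnection conn = new OdbcConnection(
    string.Format(
        "DRIVER={{Microsoft Hive ODBC Driver}};Host={0};Port={1};Schema={2};" +
        "HiveServerType=2;AuthMech=1;KrbHostFQDN={3};KrbServiceName={4};",
        ConfigurationManager.AppSettings["host"],
        ConfigurationManager.AppSettings["port"],
        ConfigurationManager.AppSettings["schema"],
        ConfigurationManager.AppSettings["hostFQDN"],
        ConfigurationManager.AppSettings["serviceName"]));
conn.Open();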
AuthMech=1 selects the Kerberos authentication mode.
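To keep the connection string readable and easy to check, it can be assembled by a small pure function. This is a minimal sketch: HiveConnectionString and Build are hypothetical names, and the keys are only the ones already used in this article.

```csharp
using System;
using System.Globalization;

public static class HiveConnectionString
{
    // Assembles the ODBC connection string from the Hadoop server details.
    public static string Build(string host, int port, string schema, string hostFqdn, string serviceName)
    {
        string[] parts =
        {
            "DRIVER={Microsoft Hive ODBC Driver}",
            "Host=" + host,
            "Port=" + port.ToString(CultureInfo.InvariantCulture),
            "Schema=" + schema,
            "HiveServerType=2",
            "AuthMech=1", // Kerberos authentication, as described in the article
            "KrbHostFQDN=" + hostFqdn,
            "KrbServiceName=" + serviceName
        };

        // Each key=value pair is terminated by a semicolon.
        return string.Join(";", parts) + ";";
    }
}
```

The result can be passed straight to the OdbcConnection constructor, for example new OdbcConnection(HiveConnectionString.Build("hostname", 10000, "Schema_Name", "hostname.domain.com", "Service_Name")).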
After this, Hadoop queries can be run just as you would run them against SQL Server or Oracle:
OdbcCommand cmd = new OdbcCommand("select * from Schema_Name.Table_Name;", conn);
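Once the command exists, its results can be read like those of any other ADO.NET provider. The sketch below copies every row from a data reader into a list of dictionaries keyed by column name. HiveRows and ReadAll are hypothetical names, and the helper is written against the generic IDataReader interface so that it does not depend on the Hive driver.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class HiveRows
{
    // Reads all remaining rows from the reader into memory.
    // Each row maps column name -> value, with database NULL stored as null.
    public static List<Dictionary<string, object>> ReadAll(IDataReader reader)
    {
        var rows = new List<Dictionary<string, object>>();
        while (reader.Read())
        {
            var row = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);
            for (int i = 0; i < reader.FieldCount; i++)
            {
                row[reader.GetName(i)] = reader.IsDBNull(i) ? null : reader.GetValue(i);
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

With the command from the article, this could be called as HiveRows.ReadAll(cmd.ExecuteReader()), ideally with the reader, command, and connection each wrapped in a using block so they are disposed. Loading everything into memory is only reasonable for small result sets.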
Using the Code
Download Hadoop Connector.zip and replace the AppSettings in Web.Config with your Hadoop settings:
<appSettings>
<add key="host" value="hostname" />
<add key="port" value="10000" />
<add key="schema" value="Schema_Name" />
<add key="hostFQDN" value="hostname.domain.com" />
<add key="serviceName" value="Service_Name" />
<add key="principal" value="hadoopDevPrincipal@HDP.DEV" />
<add key="kerberosAquireTicketCommand"
value="kinit -k -t HDDev.keytab hadoopDevPrincipal@HDP.DEV" />
<add key="keyTabFileName" value="HDDev.keytab" />
</appSettings>
In the code at line number 19, change Environment.CurrentDirectory to the relevant file path.
In the code at line number 67, replace the query statement with your own Hadoop query.
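Because a missing setting only shows up later as a failed kinit call or a failed connection, it can help to validate the appSettings up front. The following is a minimal sketch: HadoopSettings and FindMissingKeys are hypothetical names, and the list of required keys is an assumption based on the keys that the snippets in this article read.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

public static class HadoopSettings
{
    // Assumption: these are the keys read by the code in this article.
    private static readonly string[] RequiredKeys =
    {
        "host", "port", "schema", "hostFQDN", "serviceName", "principal", "keyTabFileName"
    };

    // Returns the required keys that are absent or blank in the given settings.
    public static List<string> FindMissingKeys(NameValueCollection settings)
    {
        var missing = new List<string>();
        foreach (string key in RequiredKeys)
        {
            if (string.IsNullOrWhiteSpace(settings[key]))
            {
                missing.Add(key);
            }
        }
        return missing;
    }
}
```

In the application, this could be called as HadoopSettings.FindMissingKeys(ConfigurationManager.AppSettings), failing early with a clear message when the returned list is not empty.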
Summary
As stated earlier, this is entirely my own work, so please share any suggestions or optimizations that could be implemented.