If you use integrated security for connecting to a SQL server, you may run into Kerberos problems. In this article, I provide a background to help you understand how it all works and what you can do when troubleshooting.
Introduction
If I'm being honest, I was in two minds about writing this article. This site is predominantly aimed at developers. And this article is not really about coding. It's about Kerberos, Service Principal Names (SPNs), SQL Server and a bit of AD. So not really your typical CodeProject article.
However, pretty much any developer who has written any client that connects to a SQL Server, has come in contact with the security aspect of SQL Server. And in modern applications, this means Integrated Security. And if there is a misconfiguration somewhere, connection security fails. When that happens, it can be very hard to diagnose, especially if you don't have a good understanding of the underlying principles.
This article aims to explain the basics of how the connection security is set up between client and server because that is an important consideration for you as a developer. That way, if you develop and deploy a client application with a SQL backend and get the dreaded 'The SSPI context cannot be generated' you will be have enough knowledge to investigate the problem in a structural manner instead of frantically Googling the error and getting a gazillion different solutions that do not work for your problem.
To be clear, the process of implementing database security, granting table permissions, execute permissions and so on is NOT in scope for this article. That falls under system administration.
Background
In modern client - server systems such as SQL Server, there are two main ways to implement security. The first is by using application specific logins and letting the application perform authentication based on that. For SQL server, these are the old-style SQL logins that are created and maintained in the database instance itself. In modern installations, this is disabled by default because there are many problems with such accounts.
- They are not centrally managed. Even instance has their own collection of accounts and passwords. Furthermore, these are not automatically invalidated when a user leaves the site.
- They are not usable for code that may require external resources, meaning if a SQL query uses a stored procedure that e.g., interacts with a file or other resource, there is no good way to manage that because SQL logins are not OS logins.
- Because of their nature, they are often stored in connection strings, settings files, or other such things, meaning they are much more vulnerable than OS logins or Active Directory logins.
Other than for supporting legacy applications, it is best if you leave these logins disabled. If you do need them, that is called 'mixed mode security'.
Windows Integrated Security is much better because it follows the existing user management paradigms of users being in groups, and groups being given permissions to resources. And individual users will be subject to the standard corporate access control procedures.
There are two protocols used by Windows for the purpose of securing a channel between a client and a server: NTLM and Kerberos.
NTLM vs Kerberos
NTLM is the old protocol for security in domain environments. It's convenient, but less secure than Kerberos. Kerberos otoh is much more secure, but can be a pain to configure because it requires you to define with a great deal of precision who can do what on whose behalf.
To paint an analogy, NTLM is the security equivalent of the personal secretary of the CEO being able to call HR on the CEO's behalf to give themselves a hefty pay increase and HR being satisfied with 'The boss told me to tell you ...'.
With Kerberos, the secretary would have to prove she recently got the order from the CEO, and HR would be able to verify both the order and whether the secretary has officially been granted the permission to be delegated that specific type of order.
Which of these protocols is used depends on system configuration. NTLM can be enabled / disabled at various levels in the domain, either via GPO, Local Security Policy, or registry manipulation. If you look at the reference for Security Options GPO, you can see all possible NTLM related settings.
Now, assuming that both NTLM and Kerberos are allowed, SQL Server will still use Kerberos for integrated security connections when it can. If it can't, then it will fall back to NTLM. In order for SQL Server to use Kerberos, the correct Service Principal Names need to be configured in Active Directory.
Checking Which Authentication Mechanism is Used
Verifying which method is used is simple, because when you connect, that information can be queried. For testing, I use the SqlServer PowerShell module which is available on Github and is the standard way for connecting to a SQL Server.
The query for figuring out the connection type is this:
SELECT auth_scheme FROM sys.dm_exec_connections WHERE session_id=@@SPID
So in PowerShell, with the SqlServer
module, your script will look like this:
ipmo SqlServer
Invoke-Sqlcmd -Query "SELECT auth_scheme _
FROM sys.dm_exec_connections WHERE session_id=@@SPID" -ServerInstance <INSTANCE_NAME>
which will result in something like this:
Service Principal Names
Explaining the details of Kerberos is beyond the scope of this article. If you want to know more about Kerberos, I recommend this book.
All details aside, Kerberos is built around the concept that both parties in a communication have a way to securely verify each other's identity. This means that a client needs to be able to know the identity of the service it is connecting to. The service itself may be running as 'local system' (with the identity of the computer object) or a general user account or a Managed Service account.
A Service Principal Name (SPN) is used to associate that service with an Active Directory object so that the client knows whom it is talking to. Without SPN, a client would be unable to figure that out because it is essentially just connecting to a socket without knowing anything about the identity of who is listening on that socket. By querying Active Directory, the client can learn whom it is supposed to talk to, and set up security appropriately.
The configuration of these SPNs is done in the servicePrincipalName
attribute of the AD object whose identity it is using. In this case, you can see that the Microsoft SQL Server MSSQLSvc
on server SE-WINDEV021
is using the identity of the computer itself on the network.
You can also verify this with the SetSPN
command:
As a result, the communication channel security will be established between the identity of the client and the identity of the computer object. If the SQL Server was configured to run with the identity of a regular user account such as (hypothetically) SQLSVC01
, then the SPN MSSQLSv/SE-WINDEV01...
would be listed under the servicePrincipalName
attribute of the SQLSVC01
user account.
The exact format for the SPN is as follows (from Microsoft):
MSSQLSvc/fqdn | The provider-generated, default SPN for a default instance when a protocol other than TCP is used.
fqdn is a fully-qualified domain name. |
MSSQLSvc/fqdn:port | The provider-generated, default SPN when TCP is used.
port is a TCP port number. |
MSSQLSvc/fqdn:InstanceName | The provider-generated, default SPN for a named instance when a protocol other than TCP is used.
InstanceName is a SQL Server instance name. |
Now that you know where the SPNs should be located and what they should look like, the next question is: who creates them?
Automatic Creation of the SPN
In some cases, they can be created automatically by the service itself. For example, when SQL Server starts, as part of the startup sequence, it will try to register the SPNs for you. The documentation mentions that this is done if the service account is LocalSystem
or NetworkService
, or has the correct permission explicitly granted. This is not correct, and quite confusing.
Registering an SPN requires the permission to modify the servicePrincipalName
attribute. Since it concerns the attribute of the object whose identity is used by the service, the object security needs to have a special permission for the SELF
principal. This setting is configured by default in a standard Directory.
The thing to note is that the access granted is for a validated write to the servicePrincipalName
attribute. According to Microsoft, validated in this context means " to require that the system perform value checking, or validation, beyond that which is required by the schema, before writing a value to an attribute on a DS object. This ensures that the value entered for the attribute conforms to required semantics, is within a legal range of values, or undergoes some other special checking that would not be performed for a simple low-level write to the attribute."
This makes some sense because an object can add an SPN only if it has a valid syntax. However, the problem is that the syntax is defined as "an SPN attribute which is compliant with the DNS host name of the computer to be set.". Remember that unlike other services, SQL supports multiple instances, and this is expressed in the syntax of the SPN
.
The default instance of SQL has an SPN of the form MSSQLSv/SE-WINDEV01.contoso.com
. A named instance with instance name SQLSANDBOX
has an SPN of the form MSSQLSv/SE-WINDEV01.contoso.com:SQLSANDBOX
.
This format violates the DNS hostname requirement of the validated write. As a result, and contrary to the documentation, the SPN records will only be created automatically for the default instance because it has no instance name attached. As a result, named instance security falls back to NTLM even when using a service account that should register itself according to the documentation.
You can fix this by allowing SELF
the Write servicePrincipalName
permission.
If you do this, SQL server will start and be able to register the SPNs successfully.
And if you then do a connection test, this is what you get:
Manually Configuring the SPN
In scenarios where the security configuration for enabling SELF
registration is not allowed or otherwise not an option, anyone with administrative permissions on the AD object of the service identity can do this manually via the SetSPN
utility.
There is a warning against registering a duplicate SPN so I manually removed them in Active Directory and then added them using SetSPN
.
Considerations for Choosing Between Auto and Manual Configuration
Deciding which is the best approach depends on corporate policy, and the way the infrastructure is administered.
Allowing SELF
to update the servicePrincipalName
attribute is certainly the easiest approach in terms of making sure the registration is correct. However, it does require modification of AD object security and many admins are wary of such things for various reasons (the AD configuration itself may be subject to cumbersome change control, may require a lengthy testing process or documentation updates, the admins may not be allowed by corporate to make changes, or they may simply be paranoid).
Plus if we're being honest, adding a SELF
permission needs to be done either directly on the object itself, or on the container OU
it is in. Given that computers and users are not going to be in the same OU
, and you don't want ALL principals to be able to register SPNs, this can be tedious. Sadly, creating a group for setting this permission is not possible because if you do, then all objects inside the group could set each other's SPNs.
And allowing a principal the permission to set its own SPNs can be a dangerous thing, because that principal then has the ability to perform a Denial of Service (DoS) attack against the Active Directory by registering bunches of erroneous SPNs.
It is for these reasons that most environments choose for manual registration. However, if you do that, then it's recommended to fix the port number of the SQL Server service so that the port number of the registered SPN matches the port number.
By default, any named instance of a SQL Server uses dynamic ports. The documentation states that in order to use dynamic ports, you should set dynamic port to 0. However if you check, then you will see this:
What happens is that when the instance is created, SQL takes a random high port number as dynamic port. You'll notice that it will reuse this port across service restarts and server reboots. So why is it called dynamic port then, you may ask. The sneaky thing is that it will try to always reuse this port, but if it should happen to be in use, it will simply pick a new one and start using that. This is one of those things that happen so rarely that you're lulled into trusting it will always be the same.
In a scenario where the SQL server service principal does not have the permissions to automatically update the SPN
, the consequence of that happening is that the SPN record will no longer match the actual running service. When that happens, the SQL client will no longer be able to set up a Kerberos session, and either fall back to NTLM, or simply fail if NTLM was disabled.
So if you have the situation where the required configuration for automatic SPN registration is not possible, make sure that you do NOT use dynamic ports, but use fixed ports. This is as simple as taking the current dynamic port number, and saving it in the TCP Port field instead of the dynamic port field. Be sure to leave the dynamic port field blank to ensure that the service doesn't also use a dynamic port.
Note that if some other process happens to take that high port, the SQL service won't start. But that is much preferable because it is easy to observe and diagnose. One trick that is sometimes used is to use port numbers that were assigned in the tcp protocol to companies or software that no longer exists, or is known to not exist on your network or server.
The Pitfall of Orphaned SPNs
If automatic registration of the SPNs is not allowed, there is the potential for a serious problem. Incidentally, this was the issue that caused me to research SPNs and write this article.
Suppose you have a SQL server up and running, and for some reason or other (move to production, a security audit, ...) you have to change the user account of the SQL Service. For example from LocalSystem to a Managed Service Account, and automatic registration is disabled. If the original account had the SPNs registered to itself, then switching to a different service principal will not update the service principal names. The SPNs are orphaned.
When that happens, things stop working in a hurry. This is much worse than not having SPN at all. When there are no SPNs, mutual authentication falls back to NTLM. But if there is a wrong SPN, this does not happen. The client will try to set up a secure channel for communicating with the user principal it thinks is used to run the SQL Server, and fail.
Troubleshooting Tips and Tricks
If you are experiencing issues with mutual authentication and / or Kerberos, you can troubleshoot in the following way:
- Start with the SQL Server log and see if the SPNs were automatically generated or not. If not, check which two SPNs are supposed to exist for the instance, based on the instance name and port number.
- Use the powershell
Test-Connection
cmdlet to verify network connectivity. - Check if the port number is dynamic or fixed.
- Use
SetSPN
to query both SPNs and verify if they are registered to the correct Active Directory principal. - If you're unsure about the client itself, you can use the SSPIClient test application to test if a client can set up a secure connection or not.
- If you want to run diagnostics on the SQL Server itself, you can use Kerberos Configuration Manager for SQL Server.
Conclusion
With this article, I hope to have explained the basics enough to give you a basic understanding. In summary:
- SPNs are used to associate a user principal with a service running on a computer, for the purpose of making it possible to set up a secure channel between the client and the service, using Kerberos.
- SPNs can be registered automatically, or manually. If they cannot be created automatically, every change to the Service Principal will require a manual update of the SPN registration.
- When the SPNs do not exist, Kerberos cannot be used, and mutual authentication falls back to NTLM or fail completely, depending on security configuration via GPO, Local Security Policy or registry.
- Erroneous SPN records will cause authentication failure.
History
- 15th May, 2023: First version