This article provides an introduction to the Microsoft Neural Network algorithm in SQL Server. It covers the use of neural networks for predictive analytics (for example, whether a customer is likely to be a buyer) based on stored customer information.
Background about Data Mining and Neural Networks
There are several definitions of data mining, both business and scientific. Data mining is a method of automatically sorting through large amounts of data to discover behaviors, patterns, and trends. It enables companies to make proactive, data-driven decisions. Data mining helps to discover patterns in large data sets by using historical experience (stored in relational databases or cubes) to predict the future.
So why don't we just use a traditional data warehouse? Data warehouses are good for descriptive analysis (what happened) and diagnostic analysis (why it happened). However, companies also need to do deeper analysis, and that is where data mining can help with predictive analytics.
Artificial Intelligence, machine learning and deep learning are now commonly used terms. Things have certainly progressed a lot over the last few years, with AI used in various aspects of everyday life, including conversational AI such as chatbots for customer service.
Truly intelligent machines would be the ones that can learn, reason, and make decisions like humans. Neural Networks (NNs) are perhaps the closest we have come to providing that capability.
Human brains are made up of connected networks of neurons (nerve cells), and NNs try to mimic these networks. This would enable "machines" to learn and make decisions just like humans do.
Our brain is structured in layers. Different layers in our brain process information and pass it on to the next layer. Incoming information is processed by the neurons of a layer, which produce an insight and then pass that data and insight on to the next layer.
Let's take the example of a freshly baked muffin from your favorite café and your subsequent decision making. Our brain processes the smell of freshly baked muffins in multiple stages. Initially, you smell the muffin (data input), then you think "I like muffins" (thought – the next layer) and then "I want at least one muffin" (decision).
What is an Artificial Neural Network?
Artificial Neural Networks (ANNs) attempt to simulate this layered, multi-level approach to information processing and decision making. An ANN has three basic layers of neurons: the input layer (input data), the hidden layer (the core processing layer) and the output layer (the decision layer). We are taking a simple example of an ANN; complex ANNs can include multiple hidden layers. The basic principle, however, remains the same: information flows from one layer to another, just like in the human brain.
Of course, this is a very simplistic view of how our brain works, but I hope you get the picture of what an "Artificial" Neural Network tries to achieve.
The Algorithm
Input layer: All the input attribute values for the data mining model, and their probabilities
Hidden layer: Receives data from the input layer and supplies outputs to the next layer. The probabilities of the various inputs are assigned weights. A weight reflects the importance of an input; a negative weight means that the input isn't favorable.
Output layer: Predictable value for the model
The Microsoft Neural Network algorithm in SQL Server is generally more advanced than Decision Trees and Naïve Bayes.
A neural network model must contain a key, inputs (one or more), and predictable values (one or multiple). Data mining models are affected by the values that are specified for the parameters that are available to the algorithm.
If you are familiar with AdventureWorks: we'll use Age and Gender as inputs for the input layer.
The hidden layer is the middle layer, where a weight is applied to every input.
All predictable attributes are mapped to the output layer. In this example, that is whether the customer buys the bicycle.
A neuron can combine multiple inputs using different techniques, such as a weighted sum, maximum, average, logical AND, or logical OR. The Microsoft Neural Network algorithm uses the weighted sum.
Once the combined input is calculated, an activation function is applied: "tanh" is used as the hidden layer activation function and the "sigmoid" function is used for the output layer.
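As a rough sketch of what a single neuron computes (the symbols below are generic, not names taken from the model): a hidden neuron forms the weighted sum of its inputs and passes it through tanh, while an output neuron passes the weighted sum of the hidden outputs through the sigmoid function:

$h_j = \tanh\left(\sum_i w_{ij} x_i + b_j\right)$

$o = \sigma\left(\sum_j v_j h_j + c\right), \quad \sigma(z) = \frac{1}{1 + e^{-z}}$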
Backward Propagation of Errors or Backpropagation
Backpropagation is essentially how the neural network is trained. It fine-tunes the weights within a neural network based on the error rate from the previous iteration. Further fine-tuning reduces the error rate and increases the reliability of the model.
So how does this work with the Microsoft Neural Network algorithm in SQL Server?
Initially, random values are assigned as weights, and the algorithm calculates the outputs and errors for the training data set. The output error is calculated and the weights are then adjusted. The whole process repeats until a minimum error is achieved.
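In other words, each pass adjusts every weight slightly in the direction that reduces the output error. A common way to write this kind of update (using generic symbols, not SSAS parameter names) is:

$w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial E}{\partial w}$

where $E$ is the output error and $\eta$ is a small learning rate that controls how big each adjustment is.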
Implementing an Artificial Neural Network in SQL Server
The neural network algorithm is perhaps one of the most difficult to explain. SQL Server Analysis Services provides a relatively simple way of implementing the algorithm for data analytics.
SQL Server is a very popular database but is used primarily as a data warehouse or for transactional data storage. SQL Server also supports data mining features, which are excellent for predictive analysis.
SQL Server 2019 is used in this article, but SQL Server 2017 works as well.
This article uses the AdventureWorks sample database, a widely used sample data set, which you can download and install on your SQL Server instance.
Create a data mining project. Open Microsoft Visual Studio and create a Multidimensional project using the Analysis Services Multidimensional and Data Mining template. You'll see the project under Solution Explorer.
Next, we need to configure the data source for the project. Choose a new data connection as shown below.
The data source makes a connection to the sample database, AdventureWorksDW2019.
The next step is to provide the credentials for Analysis Services to connect to the database. On the Impersonation Information screen, choose "Use a specific Windows user name and password."
Analysis Services is the container used to store the data mining models. I have used Windows authentication for this example.
The data source will appear in Solution Explorer (see the screenshot below).
Next, create a data source view. You can choose the required objects (a subset of tables and views) for the project.
On the Select Tables and Views screen, choose the vTargetMail view from the Available Objects to be included. You can filter the objects. If you have selected tables with foreign key constraints, related tables can be included by selecting Add Related Tables.
You can create multiple data source views, but a given data source view can have only one data source.
The data source view will now appear in the Solution Explorer window.
Data Mining Model
Now that the basic setup is done, we can start on the data mining project. Create a mining structure based on the vTargetMail view in the data source view. This view has the historical data related to customers, and that is what we want to use for our mining model for predictive analytics.
In Solution Explorer, right-click the Mining Structures folder and create a new mining structure.
The following screen will pop up for the data mining model creation.
We are using the existing relational database or data warehouse option for this example; alternatively, you can choose an OLAP cube.
Select the Microsoft Neural Network as the data mining technique on the next screen.
Next, select the Data Source View created earlier and select vTargetMail.
Remember we are training the Neural Network. So, it’s time to specify what we’ll provide as input and what we want to predict.
On the Specify the Training Data page, pick CustomerKey as the Key column. The remainder of the columns shown in the screenshot will be used as input. BikeBuyer is what we want to know or predict, so check the Predictable box for the BikeBuyer column.
Choose the attributes that are relevant to predicting the bike buyer. Attributes such as "Email" are not relevant, so in our case, we are choosing Age, CommuteDistance, and so on. The predictable attribute is BikeBuyer.
Other input columns chosen may also have an impact on our predictions; for example, whether a person is married (the MaritalStatus column) may affect the decision to buy a bike.
The next step is to change the Content and DataType.
Content Types
Discrete
A column with the Discrete content type contains a finite number of values with no continuum between them. For instance, a gender column is a typical discrete attribute column; the data in it can only fall into a specific number of categories.
The values are not considered sorted or ordered. Area codes are an example of discrete data that is numeric. Discrete columns cannot have fractional values.
Continuous
A column is Continuous when it contains values that represent numeric data on a scale that allows interim values. This type of column represents scalable measurements. Temperature is an example of a continuous attribute.
The Microsoft Neural Network algorithm supports continuous content types. In this example, Age and Yearly Income are Continuous, and the other columns are Discrete. The content type for Bike Buyer, Number Cars Owned, Number Children at Home, Total Children, etc. has been changed to Discrete. Bike Buyer is either True or False, so its data type has been changed from Long to Boolean.
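For reference, roughly the same column choices could also be expressed directly in DMX instead of through the wizard. The sketch below is only illustrative; the model name and the exact data types are assumptions and may differ from what the wizard generates.

-- Hypothetical DMX equivalent of the wizard settings (model name is an assumption)
CREATE MINING MODEL [NNetwork_DMX]
(
    [Customer Key]            LONG    KEY,
    [Age]                     LONG    CONTINUOUS,
    [Commute Distance]        TEXT    DISCRETE,
    [English Education]       TEXT    DISCRETE,
    [English Occupation]      TEXT    DISCRETE,
    [Gender]                  TEXT    DISCRETE,
    [House Owner Flag]        TEXT    DISCRETE,
    [Marital Status]          TEXT    DISCRETE,
    [Number Cars Owned]       LONG    DISCRETE,
    [Number Children At Home] LONG    DISCRETE,
    [Total Children]          LONG    DISCRETE,
    [Yearly Income]           DOUBLE  CONTINUOUS,
    [Bike Buyer]              BOOLEAN DISCRETE PREDICT   -- the predictable column
)
USING Microsoft_Neural_Network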
Testing Set
Next create a Testing Set.
Now let's validate our model against a test data set. As the names suggest, the training data set is used to train the model, while the test data set is used to test the model.
As a general rule, we use a 70/30 split for the train/test data sets. The input data will be randomly split into two sets, a training set and a testing set, based on the percentage of data for testing and the maximum number of cases in the testing data set. The training set is used to create the mining model, while the testing set is used to check the model's accuracy.
The percentage of data for testing specifies the percentage of cases reserved for testing. The maximum number of cases in the testing data set limits the total number of cases used for testing. If both values are specified, both limits are enforced.
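If you create the mining structure in DMX rather than through the wizard, the same split can be requested with a HOLDOUT clause. The structure name, the column list and the 30 percent / 1000 case limits below are illustrative assumptions only, not the settings used elsewhere in this article.

-- Hypothetical DMX sketch: reserve up to 30 percent of the cases (capped at 1000) for testing
CREATE MINING STRUCTURE [TargetMail_DMX]
(
    [Customer Key]  LONG    KEY,
    [Age]           LONG    CONTINUOUS,
    [Yearly Income] DOUBLE  CONTINUOUS,
    [Bike Buyer]    BOOLEAN DISCRETE
)
WITH HOLDOUT (30 PERCENT OR 1000 CASES)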
You can add another model for Yearly Income (a mining structure can contain multiple mining models for comparison). Complete the step by naming the mining structure and the mining model.
The mining structure will now appear in Solution Explorer.
I created another model for Yearly Income just for comparison. If you choose to create this model, then after it is created, change Yearly Income to be the predictable column.
Process the Models
Click on the Mining Model Viewer tab. SSAS objects will be deployed to the server specified in the project properties. When prompted, "Would you like to build and deploy the project first?", choose "Yes".
When the Process Mining Model window appears, Click the "Run..." button.
Mining Model Viewer
The model has now been processed using the Microsoft Neural Network algorithm, and the results for the Bike Buyer model are available in the Mining Model Viewer.
The above screenshot shows that customers with an income bracket of $79,000 to $154,000 are more likely to buy a bike. We can filter the results further: in the above screenshot, I have filtered the results for age 35-44 and a commute distance of 1 to 2 miles.
Try analyzing another view, such as the results for single females who own two cars.
By analyzing these views, data scientists and analysts can understand the contributing factors behind a customer buying a bike.
Predictions
Now that we have our data mining model, let's "ask" the prediction engine whether a customer with specific characteristics will buy a bike. We can create queries for this.
Click on the Mining Model Prediction tab. The user interface for Mining Model Predictions looks like the screenshot below:
Our First Query and Prediction
The first query is whether a "36 years old customer with a Commute Distance of 1-2 miles with High School education, Professional occupation, Male, Single and House Owner and a Yearly income of $45,000" will buy a bike.
You can also use a DMX query in SQL Server Management Studio (SSMS), as shown in the below screenshot. I personally prefer using a query rather than a user interface.
SELECT
(Predict([NNetwork].[Bike Buyer])) as [Buyer],
(PredictProbability([NNetwork].[Bike Buyer])) as [Probability_to_buy]
From
[NNetwork]
NATURAL PREDICTION JOIN
(SELECT 36 AS [Age],
'True' AS [Bike Buyer],
'1-2 Miles' AS [Commute Distance],
'High School' AS [English Education],
'Professional' AS [English Occupation],
'M' AS [Gender],
'1' AS [House Owner Flag],
'S' AS [Marital Status],
0 AS [Number Cars Owned],
0 AS [Number Children At Home],
0 AS [Total Children],
45000 AS [Yearly Income]) AS t
The probability to buy for the above query is 73.8%.
We can try another query in SSMS. This time, we are asking for results for a "45 years old, commutes 1-2 miles, Professional, Female, house owner, Married, owns 2 cars and 2 children at home and a yearly income of $80,000".
SELECT
(Predict([NNetwork].[Bike Buyer])) as [Buyer],
(PredictProbability([NNetwork].[Bike Buyer])) as [Probability_to_buy]
From
[NNetwork]
NATURAL PREDICTION JOIN
(SELECT 45 AS [Age],
'True' AS [Bike Buyer],
'1-2 Miles' AS [Commute Distance],
'High School' AS [English Education],
'Professional' AS [English Occupation],
'F' AS [Gender],
'1' AS [House Owner Flag],
'M' AS [Marital Status],
2 AS [Number Cars Owned],
2 AS [Number Children At Home],
2 AS [Total Children],
80000 AS [Yearly Income]) AS t
I got a probability to buy of 60.4%. The results may differ based on the data that you have. Next, we try to predict whether a "36 year old, Bachelors educated, Professional, Male, owns a house and is Single with a Yearly income of $85,000" will buy a bike.
SELECT
(Predict([NNetwork].[Bike Buyer])) as [Buyer],
(PredictProbability([NNetwork].[Bike Buyer])) as [Probability_to_buy]
From
[NNetwork]
NATURAL PREDICTION JOIN
(SELECT 36 AS [Age],
'True' AS [Bike Buyer],
'1-2 Miles' AS [Commute Distance],
'Bachelors' AS [English Education],
'Professional' AS [English Occupation],
'M' AS [Gender],
'1' AS [House Owner Flag],
'S' AS [Marital Status],
0 AS [Number Cars Owned],
0 AS [Number Children At Home],
0 AS [Total Children],
85000 AS [Yearly Income]) AS t
The result is 88% probability.
Algorithm Parameters
There are model parameters for the Microsoft Neural Network algorithm that can improve results. We have used the default values for our example.
There are several steps to take in training a data mining model. These are influenced by the values specified as algorithm parameters. The algorithm first extracts training data from the data source for evaluation. A percentage of the training data is reserved for assessing the accuracy of the network. This is called the holdout data. Throughout the training process, the network is evaluated after each iteration through the training data. When the accuracy cannot increase any further, the training process is stopped.
The values of the SAMPLE_SIZE and HOLDOUT_PERCENTAGE parameters are used to determine the number of cases sampled from the training data and the number of cases to be put aside as holdout data. The value of the HOLDOUT_SEED parameter helps determine which random cases are put aside for the holdout data. The algorithm then uses the value of the HIDDEN_NODE_RATIO parameter when determining the initial number of neurons in the hidden layer.
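If you script the model in DMX instead of using the designer, these parameters can be supplied in the USING clause. The sketch below is illustrative only; the model name, the shortened column list and the parameter values are assumptions, not recommended settings.

-- Hypothetical DMX sketch: supplying algorithm parameters in the USING clause
CREATE MINING MODEL [NNetwork_Tuned]
(
    [Customer Key] LONG    KEY,
    [Age]          LONG    CONTINUOUS,
    [Bike Buyer]   BOOLEAN DISCRETE PREDICT
)
USING Microsoft_Neural_Network
(
    HIDDEN_NODE_RATIO  = 4,     -- ratio used to size the hidden layer
    HOLDOUT_PERCENTAGE = 30,    -- share of training cases reserved as holdout data
    HOLDOUT_SEED       = 1,     -- seed for selecting the holdout cases
    SAMPLE_SIZE        = 10000  -- number of cases sampled from the training data
)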
You can find more details in the Microsoft documentation for the algorithm.
Summary
Although there are a number of other algorithms available for data mining in SQL Server, the Microsoft Neural Network algorithm is one of the more sophisticated ones available. The algorithm simulates how our brain works with layered inputs and outputs, which is what makes it powerful for data scientists.
History
- 8th January, 2022: Initial version