Introduction
Relationships universally exist among variables. These relationships can be divided into two categories, namely, deterministic relations and nondeterministic relations. A deterministic relation can be expressed with a function. A nondeterministic relation is also called correlation, which can be studied with regression analysis.
Generally, the linear regression model is:

f(x) = w^T x

where w is the weight vector (the bias term can be absorbed into w by appending a constant feature to x).
The optimal w can be determined by minimizing the squared-error loss function:

J(w) = Σ_i (y_i - w^T x_i)^2 = (y - Xw)^T (y - Xw)
Regression Model
Linear regression models include standard linear regression, locally weighted linear regression, ridge regression, Lasso regression, and stepwise linear regression.
Linear Regression
The parameters for linear regression can be calculated by the gradient descent method or by the normal equation. Because the gradient descent method has been introduced in Step-by-Step Guide to Implement Machine Learning IV - Logistic Regression, we introduce the normal equation solution in this article (a small gradient descent sketch is also given after the code below for comparison).
First, calculate the derivative of the loss function:

∂J(w)/∂w = 2 X^T (Xw - y)

Then, setting the derivative equal to 0, we obtain:

X^T X w = X^T y

Finally, the optimal w is:

w = (X^T X)^{-1} X^T y
where X is the matrix of training samples and y is the vector of corresponding labels. The code of linear regression is shown below:
def standardLinearRegression(self, x, y):
    # normalize the input features
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    xTx = np.dot(x.T, x)
    if np.linalg.det(xTx) == 0:          # X^T X must be invertible
        print("Error: Singular Matrix !")
        return
    # normal equation: w = (X^T X)^{-1} X^T y
    w = np.dot(np.linalg.inv(xTx), np.dot(x.T, y))
    return w
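For comparison with the gradient descent approach mentioned above, a minimal sketch is given here. It is not part of the class shown in this article; the function name, learning rate, and iteration count are illustrative:

import numpy as np

def gradientDescentLR(x, y, learning_rate=0.01, iterations=1000):
    # x: (sample_num, feature_dim), y: (sample_num, 1)
    sample_num, feature_dim = np.shape(x)
    w = np.zeros([feature_dim, 1])
    for _ in range(iterations):
        # gradient of the squared-error loss, averaged over the samples: X^T (Xw - y)
        gradient = 2 * np.dot(x.T, np.dot(x, w) - y) / sample_num
        w = w - learning_rate * gradient
    return w

Both routes should converge to roughly the same weights on a well-conditioned problem; the normal equation is exact, while gradient descent trades accuracy for scalability when X^T X is large.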
Local Weighted Linear Regression
Linear regression tends to underfit because it uses the unbiased estimate with minimum mean squared error (MMSE). To reduce the bias, locally weighted linear regression assigns a weight to each training sample according to its distance from the point to be predicted, and then performs ordinary regression on the weighted samples. The loss function for locally weighted linear regression is:

J(w) = Σ_i θ_i (y_i - w^T x_i)^2 = (y - Xw)^T W (y - Xw)

where W is a diagonal matrix holding the sample weights θ_i.
Like linear regression, we calculate the derivative of the loss function and set it equal to 0. The optimal w is:

w = (X^T W X)^{-1} X^T W y
The weights in locally weighted linear regression act like the kernel function in SVM; the commonly used Gaussian kernel is:

W(i, i) = exp(-||x_i - x||^2 / (2 k^2))

where x is the point to be predicted and k controls how quickly the weight decays with distance.
The code of local weighted linear regression is shown below:
def LWLinearRegression(self, x, y, sample):
    # normalize the input features
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    sample_num = len(x)
    weights = np.eye(sample_num)
    for i in range(sample_num):
        # Gaussian kernel: weight decays with the distance to the query sample
        diff = sample - x[i, :]
        weights[i, i] = np.exp(np.dot(diff, diff.T) / (-2 * self.k ** 2))
    xTx = np.dot(x.T, np.dot(weights, x))
    if np.linalg.det(xTx) == 0:
        print("Error: Singular Matrix !")
        return
    # w = (X^T W X)^{-1} X^T W y, then predict for the query sample
    result = np.dot(np.linalg.inv(xTx), np.dot(x.T, np.dot(weights, y)))
    return np.dot(sample.T, result)
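Since locally weighted regression solves a separate weighted problem for every query point, prediction over a test set loops over the samples. A usage sketch (the object name lwlr and the dataset variables are illustrative; it assumes the bandwidth self.k and norm_type are already set):

import numpy as np

# x_train, y_train, x_test are assumed to be numpy arrays
predictions = []
for i in range(len(x_test)):
    # each test sample gets its own locally weighted fit
    predictions.append(lwlr.LWLinearRegression(x_train, y_train, x_test[i, :]))
predictions = np.array(predictions)

Note that this makes prediction much more expensive than standard linear regression, since a full linear system is solved per query point.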
Ridge Regression
If the feature dimension is larger than the number of samples, the matrix X^T X is not full rank and its inverse does not exist. To solve the problem, ridge regression adds λI to X^T X to make the matrix nonsingular. This is equivalent to adding L2 regularization to the loss function, namely:

J(w) = Σ_i (y_i - w^T x_i)^2 + λ ||w||_2^2
Like linear regression, we calculate the derivative of the loss function and set it equal to 0. The optimal w is:

w = (X^T X + λI)^{-1} X^T y
The code of ridge regression is shown below:
def ridgeRegression(self, x, y):
    # normalize the input features
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    feature_dim = len(x[0])
    xTx = np.dot(x.T, x)
    # add lambda * I to make the matrix nonsingular
    matrix = xTx + np.eye(feature_dim) * self.lamda
    if np.linalg.det(matrix) == 0:
        print("Error: Singular Matrix !")
        return
    # w = (X^T X + lambda * I)^{-1} X^T y
    w = np.dot(np.linalg.inv(matrix), np.dot(x.T, y))
    return w
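The regularization strength λ (self.lamda in the code) controls how strongly the weights are shrunk and is usually chosen by trying several values on held-out data. A sketch of such a search (the model object and the train/validation variables are illustrative, not part of the code above):

import numpy as np

best_lamda, best_error = None, np.inf
for lamda in [0.01, 0.1, 1.0, 10.0]:
    model.lamda = lamda                                   # set the regularization strength
    w = model.ridgeRegression(x_train, y_train)
    error = ((y_val - np.dot(x_val, w).flatten()) ** 2).sum()   # validation squared error
    if error < best_error:
        best_error, best_lamda = error, lamda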
Lasso Regression
Like ridge regression, Lasso regression adds regularization to the loss function, but it uses the L1 norm, namely:

J(w) = Σ_i (y_i - w^T x_i)^2 + λ ||w||_1
Because the L1 regularization contains an absolute value, the loss function is not differentiable everywhere. Thus, we apply the coordinate descent (CD) method. At each iteration, CD minimizes the loss along a single coordinate while keeping the other coordinates fixed, namely:

w_j^(k) = argmin_{w_j} J(w_1^(k), ..., w_{j-1}^(k), w_j, w_{j+1}^(k-1), ..., w_n^(k-1))
For the Lasso loss, each coordinate update has a closed-form solution, the soft-thresholding operator:

w_j = (ρ_j + λ/2) / z_j    if ρ_j < -λ/2
w_j = 0                    if -λ/2 ≤ ρ_j ≤ λ/2
w_j = (ρ_j - λ/2) / z_j    if ρ_j > λ/2

where:

ρ_j = Σ_i x_ij (y_i - Σ_{k≠j} x_ik w_k),    z_j = Σ_i x_ij^2
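The code below relies on a softThreshold helper that is not shown in this article. A minimal sketch of what it could look like, assuming the division by z_j is folded into the choice of λ because the features are normalized:

def softThreshold(self, rho):
    # soft-thresholding operator used by the coordinate descent update
    if rho < -self.lamda / 2:
        return rho + self.lamda / 2
    elif rho > self.lamda / 2:
        return rho - self.lamda / 2
    else:
        return 0.0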
The code of Lasso regression is shown below:
def lassoRegression(self, x, y):
    # normalize the input features
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    y = np.expand_dims(y, axis=1)
    sample_num, feature_dim = np.shape(x)
    w = np.ones([feature_dim, 1])
    for i in range(self.iterations):
        for j in range(feature_dim):
            # prediction without the contribution of feature j
            h = np.dot(x[:, 0:j], w[0:j]) + np.dot(x[:, j+1:], w[j+1:])
            # rho_j = x_j^T (y - h)
            w[j] = np.dot(x[:, j], (y - h))
            if j == 0:
                w[j] = 0                       # the first coordinate is left at 0
            else:
                w[j] = self.softThreshold(w[j])
    return w
Stepwise Linear Regression
Stepwise linear regression is similar to Lasso, but instead of coordinate descent it applies a greedy algorithm at each iteration: every weight is increased or decreased by a small step, and the change that most reduces the squared error is kept. The code of stepwise linear regression is shown below:
def forwardstepRegression(self, x, y):
    # normalize the input features
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    y = np.expand_dims(y, axis=1)                  # column vector so y - y_hat broadcasts correctly
    sample_num, feature_dim = np.shape(x)
    w = np.zeros([self.iterations, feature_dim])   # weight history, one row per iteration
    best_w = np.zeros([feature_dim, 1])
    for i in range(self.iterations):
        min_error = np.inf
        best_w_iter = best_w.copy()
        for j in range(feature_dim):
            for sign in [-1, 1]:
                temp_w = best_w.copy()             # copy, otherwise best_w would be modified in place
                temp_w[j] += sign * self.learning_rate
                y_hat = np.dot(x, temp_w)
                error = ((y - y_hat) ** 2).sum()
                if error < min_error:              # keep the step with the smallest squared error
                    min_error = error
                    best_w_iter = temp_w
        best_w = best_w_iter
        w[i, :] = best_w.T                         # record the weights of this iteration
    return best_w
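A usage sketch (the model object and dataset variables are illustrative): the step size and the number of iterations together control how far the weights can move.

model.learning_rate = 0.01
model.iterations = 500
w = model.forwardstepRegression(x_train, y_train)
y_hat = np.dot(x_test, w)    # predictions with the final weights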
Conclusion and Analysis
There are many ways to obtain the optimal parameters for linear regression; in this article, we only introduce some basic algorithms. Finally, let's compare our linear regression with the linear regression in Sklearn; the prediction performance is displayed below:
Sklearn linear regression performance:
Our linear regression performance:
The performances look similar.
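The comparison itself could be set up roughly as follows. This is only a sketch: our_model stands for an instance of the regression class shown above, the dataset variables are placeholders, and the preprocessing applied inside our methods would need to match what is applied to the test data:

import numpy as np
from sklearn.linear_model import LinearRegression

# sklearn baseline
sk_model = LinearRegression()
sk_model.fit(x_train, y_train)
sk_error = ((y_test - sk_model.predict(x_test)) ** 2).mean()

# our normal-equation solution
w = our_model.standardLinearRegression(x_train, y_train)
our_error = ((y_test - np.dot(x_test, w).flatten()) ** 2).mean()

print("sklearn MSE:", sk_error, " ours:", our_error)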
The related code and dataset in this article can be found in MachineLearning.
History
- 28th May, 2019: Initial version