This article discusses linear regression, which is a supervised machine learning method used for predicting or diagnosing results based on input data.
This post is another short machine learning lesson (or, more precisely, a set of materials). It covers linear regression, a supervised learning method, meaning that the model is fitted to a training set of inputs with known outputs.
First of all, please watch these videos explaining the theoretical material: video 1 and video 2
Basically, linear regression is a way to "predict" (or diagnose) a result from an input, after training the model on the training set.
The mathematics behind this is explained here; the fitted line is also known as the best-fitting curve (or line of best fit).
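For readers who want the core of that derivation in one place, here is a brief sketch. It assumes the usual least-squares setup, which may differ in detail from the linked material: a straight-line model $f(x)=w_{1}\cdot x+w_{0}$, a training set of $M$ pairs $(x_{i}, y_{i})$, and an error measure $E$ (a name introduced here for illustration) equal to the sum of squared differences between predictions and targets:
$E(w_{0},w_{1})=\sum_{i=1}^{M}\left(w_{1}\cdot x_{i}+w_{0}-y_{i}\right)^2$
Setting both partial derivatives to zero gives two linear equations in $w_{0}$ and $w_{1}$:
$\frac{\partial E}{\partial w_{0}}=2\cdot\sum\left(w_{1}\cdot x_{i}+w_{0}-y_{i}\right)=0 \;\Rightarrow\; M\cdot w_{0}+w_{1}\cdot\sum x_{i}=\sum y_{i}$
$\frac{\partial E}{\partial w_{1}}=2\cdot\sum x_{i}\cdot\left(w_{1}\cdot x_{i}+w_{0}-y_{i}\right)=0 \;\Rightarrow\; w_{0}\cdot\sum x_{i}+w_{1}\cdot\sum x^2_{i}=\sum x_{i}\cdot y_{i}$
Solving this pair of equations yields closed-form expressions for the two weights.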
In order to consolidate this material (at least for myself :)), here is an easy exercise. Given the training set:
x: | 0 | 1 | 2 | 3 | 4 |
y: | 3 | 6 | 7 | 8 | 11 |
Find the hypothesis function:
$f(x)=w_{1}\cdot x + w_{0}$
Using the formulas from the video material:
$w_{0}=\frac{1}{M}\cdot\sum y_{i} - \frac{w_{1}}{M} \cdot\sum x_{i}$
$w_{1}=\frac{M\cdot \sum x_{i}\cdot y_{i} - \left(\sum x_{i}\right)\cdot\left(\sum y_{i}\right)}{M\cdot \sum x^2_{i} - \left(\sum x_{i}\right)^2}$
and the fact that M=5 (the number of training examples), we have:
$w_{1}=\\ \frac{5\cdot(0\cdot3+1\cdot6+2\cdot7+3\cdot8+4\cdot11)-(0+1+2+3+4)\cdot(3+6+7+8+11)}{5\cdot(0+1+4+9+16)-(0+1+2+3+4)^2}\\ =\frac{5\cdot 88-350}{5\cdot 30-100}=\frac{9}{5}$
$w_{0}=\frac{1}{5} \cdot (3+6+7+8+11) - \frac{9}{5} \cdot \frac{1}{5} \cdot (0+1+2+3+4) =\\ 7 - \frac{18}{5} = \frac{17}{5}$
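The arithmetic above is easy to get wrong by hand, so here is a small sketch that re-checks it with exact fractions. It uses only Python's standard `fractions` module; the variable names (`sum_x`, `sum_xy`, and so on) are chosen here for illustration and do not come from the video material.

```python
from fractions import Fraction

# Training set from the exercise.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 7, 8, 11]
M = len(xs)

# The four sums that appear in the formulas for w1 and w0.
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)

# Exact slope and intercept, kept as fractions to avoid rounding.
w1 = Fraction(M * sum_xy - sum_x * sum_y, M * sum_xx - sum_x ** 2)
w0 = Fraction(sum_y, M) - w1 * Fraction(sum_x, M)

print(sum_x, sum_y, sum_xy, sum_xx)  # 10 35 88 30
print(w1, w0)                        # 9/5 17/5
```

The printed sums match the intermediate values 88, 350 (= 10 · 35), 30 and 100 (= 10²) that appear in the calculation.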
So the hypothesis function is:
$f(x)=1.8\cdot x + 3.4$
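To see the result in action, the following sketch wraps the two formulas in a reusable function and applies the fitted line to the training set. The helper names `fit_line` and `predict` are hypothetical, introduced only for this example, and the input x = 5 is an arbitrary value outside the training set chosen to illustrate prediction.

```python
def fit_line(xs, ys):
    """Return (w0, w1) for the least-squares line f(x) = w1 * x + w0."""
    m = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = m * sum_xx - sum_x ** 2
    if denom == 0:
        # Happens when there are fewer than two distinct x values.
        raise ValueError("need at least two distinct x values")
    w1 = (m * sum_xy - sum_x * sum_y) / denom
    w0 = sum_y / m - w1 * sum_x / m
    return w0, w1


def predict(x, w0, w1):
    """Evaluate the hypothesis function f(x) = w1 * x + w0."""
    return w1 * x + w0


# Training set from the exercise.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 7, 8, 11]

w0, w1 = fit_line(xs, ys)

# Residuals: how far each training target is from the fitted line.
residuals = [y - predict(x, w0, w1) for x, y in zip(xs, ys)]
sse = sum(r * r for r in residuals)

print(f"w1 = {w1:.2f}, w0 = {w0:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
print(f"sum of squared errors = {sse:.2f}")
print(f"prediction for x = 5: {predict(5, w0, w1):.2f}")
```

The fitted weights should come out as 1.80 and 3.40, matching the hand calculation, and the prediction for x = 5 as 12.40. The residuals sum to (approximately) zero, which is a general property of a least-squares line with an intercept term.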