Linear Regression
Prediction
The assumption of linearity means that the expected value of the target can be expressed as a weighted sum of its features.
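Concretely, for a feature vector \(x = (x_1, \ldots, x_d)\) with weights \(w = (w_1, \ldots, w_d)\) and bias \(b\), the prediction is
\( \hat{y} = w_1 x_1 + \cdots + w_d x_d + b = w^\top x + b \)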
Reality
Even if the best model for predicting \(y\) given \(x\) is linear, we would not expect any real-world dataset where \(y^{(i)}\) exactly equals \(w^\top x^{(i)} + b\) for all \(1 \leq i \leq n\): because of measurement errors and the like, there will generally be some error.
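One common way to account for this is to add an explicit noise term to the linear model,
\( y = w^\top x + b + \epsilon \)
where \(\epsilon\) captures measurement error and everything the features do not explain.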
Loss Function
For linear regression, the mean squared error loss function is generally used.
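Written out, with the conventional factor of \(\tfrac{1}{2}\) that also appears in the code below, the loss over \(n\) examples is
\( L(w, b) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} \left( w^\top x^{(i)} + b - y^{(i)} \right)^2 \)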
Analytic solution
Linear regression has an analytic solution. We can subsume the bias into the parameter \(w\) by appending a column consisting of all ones to the design matrix; the problem then becomes to minimize \(\| y - Xw \|^2\). Taking the derivative with respect to \(w\) and setting it to zero yields the solution \(w^* = (X^\top X)^{-1} X^\top y\).
Note that the solution will only be unique when \(X^\top X\) is invertible.
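A minimal sketch of the analytic solution in PyTorch, assuming some made-up synthetic data (torch.linalg.solve solves the normal equations \(X^\top X w = X^\top y\) without forming the inverse explicitly):

import torch

# hypothetical synthetic data: 100 examples, 3 features
X = torch.randn(100, 3)
true_w = torch.tensor([2.0, -3.4, 1.7])
y = X @ true_w + 4.2 + 0.01 * torch.randn(100)

# subsume the bias by appending a column of ones to the design matrix
X1 = torch.cat([X, torch.ones(X.shape[0], 1)], dim=1)

# solve the normal equations X^T X w = X^T y
w_star = torch.linalg.solve(X1.T @ X1, X1.T @ y)
print(w_star)  # should be close to [2.0, -3.4, 1.7, 4.2]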
Gradient Descent
If we have an analytic solution, why would we need to use gradient descent?
Computing the analytic solution requires forming \(X^\top X\) (which costs \(O(n d^2)\) for \(n\) examples and \(d\) features) and inverting it (which costs \(O(d^3)\)), so while it gives an optimal solution, it is infeasible to compute on very large problems. For such problems it is better to use iterative optimization methods like gradient descent.
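For the squared-error loss, each gradient descent step moves the parameters a small amount in the direction of the negative gradient, scaled by a learning rate \(\eta\):
\( w \leftarrow w - \eta \, \partial_w L(w, b), \qquad b \leftarrow b - \eta \, \partial_b L(w, b) \)
This is exactly what the manual parameter update in the example below implements (with separate learning rates for \(w\) and \(b\)).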
Example
import torch
import matplotlib.pyplot as plt
f = lambda x: -5 * x + 25
xs = torch.arange(-5, 5, .25).view(-1, 1)
ys = f(xs)
# add randomness to data
ys += torch.randn(ys.shape)
# init
weights = torch.randn((1,), requires_grad=True)
bias = torch.tensor(0., requires_grad=True)
data = []
# train
for i in range(100):
    # feed forward
    yhat = xs * weights + bias
    loss = ((yhat - ys) ** 2 / 2).mean()
    print(f'w={weights.item():e}, l={loss.item():e}, b={bias.item():e}')
    # backprop
    loss.backward()
    # optimize
    weights.data -= weights.grad.data * .01
    bias.data -= bias.grad.data * .1
    weights.grad.data.zero_()
    bias.grad.data.zero_()
w=1.132170e+00, l=4.839108e+02, b=0.000000e+00
w=5.935752e-01, l=3.936823e+02, b=2.567060e+00
w=1.031284e-01, l=3.203598e+02, b=4.870683e+00
w=-3.435172e-01, l=2.607646e+02, b=6.937812e+00
w=-7.503120e-01, l=2.123172e+02, b=8.792645e+00
...
w=-4.948588e+00, l=3.953586e-01, b=2.491010e+01
w=-4.948682e+00, l=3.953578e-01, b=2.491014e+01
w=-4.948768e+00, l=3.953570e-01, b=2.491017e+01
w=-4.948846e+00, l=3.953564e-01, b=2.491020e+01
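The matplotlib import (and the otherwise unused data list) suggest a visualization step was intended; a minimal sketch, continuing from the training loop above, that compares the noisy data against the fitted line:

x_np = xs.squeeze().numpy()
plt.scatter(x_np, ys.squeeze().numpy(), s=10, label='noisy data')
with torch.no_grad():
    plt.plot(x_np, (xs * weights + bias).squeeze().numpy(), color='red', label='fitted line')
plt.legend()
plt.show()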