When finding the "line of best fit" between one or more variables and a target, we often turn to our favourite linear regression package. If we're feeling adventurous, we might even write our own gradient descent algorithm. But we can also find a solution using linear algebra. Sounds fun, right?

Right?

Consider a situation where there are two features, $x_1$ and $x_2$, and a target variable, $y$. These could be the hours spent on Instagram per day, the hours on TikTok, and the level of brain rot experienced.

Instagram Hours ($x_1$) | TikTok Hours ($x_2$) | Brain Rot ($y$) |
---|---|---|
2 | 4 | 29 |
5 | 1 | 15 |
3 | 3 | 24 |
4 | 3 | 27 |
2 | 5 | 34 |

We want to find the relationship between $x_1$, $x_2$ and $y$. For a linear relationship, we want to find values for $\theta$ in the equation $$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$ such that the error between $y$ and $\hat{y}$ is minimised.
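To be concrete, "the error is minimised" here means minimising the sum of squared residuals over all five observations, i.e. the least squares objective,

$$\min_{\theta} \sum_{i=1}^{5} \left( y_i - (\theta_0 + \theta_1 x_{1,i} + \theta_2 x_{2,i}) \right)^2$$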

We first need to set up a system of linear equations, which will take the form,

$$Ax = b$$

Writing out the matrices for our example, they would be

$$\begin{bmatrix}
1 & 2 & 4 \\
1 & 5 & 1 \\
1 & 3 & 3 \\
1 & 4 & 3 \\
1 & 2 & 5 \\
\end{bmatrix}
\begin{bmatrix}
\theta_0 \\
\theta_1 \\
\theta_2 \\
\end{bmatrix}
=
\begin{bmatrix}
29 \\
15 \\
24 \\
27 \\
34 \\
\end{bmatrix}$$

Note that a column of 1's has been inserted at the start of matrix $A$ so we can get a value for the intercept, $\theta_0$.

If matrix $A$ were square and invertible (non-singular), we could solve the system by multiplying both sides by the inverse of $A$, ie

$$\begin{aligned} Ax &= b \\ A^{-1}Ax &= A^{-1}b \\ x &= A^{-1}b \end{aligned}$$

Unfortunately, not every matrix has an inverse. For starters, the matrix would need to be square, and in our example the dimensions are 5×3. So we'll turn to what's known as the pseudo-inverse.

The Moore-Penrose inverse, often referred to as the *pseudo-inverse* or *generalised inverse*, generalises the matrix inverse to matrices that aren't square or aren't invertible.

If we again take the system $Ax=b$ and now multiply both sides by $A^T$, we get

$$\begin{aligned} Ax &= b \\ A^T Ax &= A^{T} b \\ x &= (A^T A)^{-1} A^{T} b \\ \end{aligned}$$

The term $(A^T A)^{-1} A^T$ is the pseudo-inverse, denoted $A^{\dagger}$. Note that this formula requires $A^T A$ to be invertible, which holds whenever the columns of $A$ are linearly independent. So we can write

$$x = A^{\dagger}b$$

We'll run through the example from above using NumPy to calculate this. First, we need to set up the matrices as NumPy arrays.

```
import numpy as np

A = np.array([
    [2, 4],
    [5, 1],
    [3, 3],
    [4, 3],
    [2, 5],
])

# Add the column of ones to A for the intercept term
A_ = np.append(np.ones((A.shape[0], 1)), A, axis=1)

b = np.array([
    [29],
    [15],
    [24],
    [27],
    [34],
])
```

If we want to calculate the pseudo-inverse, $A^{\dagger}$, we need to compute $(A^T A)^{-1} A^T$. The `@` operator performs matrix multiplication on NumPy arrays.

```
A_pinv = np.linalg.inv((A_.T) @ A_) @ A_.T
print('Pseudo-inverse of A:')
print(np.round(A_pinv,2))
```
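As a quick sanity check (an aside, not part of the original walkthrough): when $A$ has full column rank, the pseudo-inverse is a left inverse, so $A^{\dagger}A = I$. A minimal sketch, rebuilding the matrices so it runs on its own:

```python
import numpy as np

# Rebuild the design matrix from the example above
A = np.array([[2, 4], [5, 1], [3, 3], [4, 3], [2, 5]], dtype=float)
A_ = np.append(np.ones((A.shape[0], 1)), A, axis=1)

# Left pseudo-inverse: (A^T A)^{-1} A^T
A_pinv = np.linalg.inv(A_.T @ A_) @ A_.T

# For a full-column-rank A_, this prints the 3x3 identity
print(np.round(A_pinv @ A_, 2))
```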

We can now try to find a solution for $x$, using

$$x = A^{\dagger} b$$

```
x = A_pinv @ b
print(x)
```

We have found,

$$\begin{aligned} \theta_0 &= -1.4 \\ \theta_1 &= 2.1 \\ \theta_2 &= 6.4 \\ \end{aligned}$$

So our least squares solution is,

$$\hat{y} = -1.4 + 2.1x_1 + 6.4x_2$$

NumPy provides us with a function for computing the pseudo-inverse, `numpy.linalg.pinv()`. It uses singular value decomposition (SVD), which is more numerically robust and also makes the code a lot cleaner.

```
x = np.linalg.pinv(A_) @ b
print(x)
```
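As another alternative worth knowing, NumPy's `np.linalg.lstsq` solves the same least squares problem directly, without forming the pseudo-inverse explicitly. A minimal sketch on the same data:

```python
import numpy as np

# Design matrix with the column of ones already included
A_ = np.array([
    [1, 2, 4],
    [1, 5, 1],
    [1, 3, 3],
    [1, 4, 3],
    [1, 2, 5],
], dtype=float)
b = np.array([[29], [15], [24], [27], [34]], dtype=float)

# lstsq returns (solution, residuals, rank, singular_values)
x, residuals, rank, sv = np.linalg.lstsq(A_, b, rcond=None)
print(np.round(x, 1))  # same solution: theta = (-1.4, 2.1, 6.4)
```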

Again, we find the same values as before. We can also check them against scikit-learn's `LinearRegression`, which solves the same least squares problem (note it fits the intercept for us, so we pass the original $A$ without the column of ones).

```
from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(A, b)
print(f'Intercept: {reg.intercept_}')
print(f'Coefficients: {reg.coef_}')
```

Perfect, all our values match. We can put it in a function for later use.

```
def lss(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """
    Least squares solution for the linear system Ax = b.
    Returns an array of the coefficients; the first value
    is the intercept/bias.
    """
    if not all(isinstance(obj, np.ndarray) for obj in [A, b]):
        raise TypeError('Arguments must be of type np.ndarray')
    if A.ndim == 1:
        A = A.reshape(-1, 1)
    # Prepend the column of ones for the intercept term
    A_ = np.append(np.ones((A.shape[0], 1)), A, axis=1)
    return np.linalg.pinv(A_) @ b
```