Linear regression. See also: slides for curve fitting, regression, colab link, bias and intercept.
python: ols_and_kls.py
curve fitting
How do we fit a curve to a distribution of data?
Given a set of $n$ data points $S = \{(x^i, y^i)\}_{i=1}^{n}$, where
$x \in \mathbb{R}^{d}$
$y \in \mathbb{R}$ (or $\mathbb{R}^{k}$)
Ordinary Least Squares (OLS)
Let $\hat{y}^i$ be the model's prediction for $x^i$, and let $d^i = \| y^i - \hat{y}^i \|$ be the error; minimize $\sum_{i=1}^{n} (y^i - \hat{y}^i)^2$.
In the 1-D case, ordinary least squares amounts to finding $a, b \in \mathbb{R}$ that solve $\min\limits_{a,b} \sum_{i=1}^{n} (a x^i + b - y^i)^2$.
Optimal solution:
$$
\begin{aligned}
a &= \frac{\overline{xy} - \overline{x} \cdot \overline{y}}{\overline{x^2} - (\overline{x})^2} = \frac{\text{COV}(x,y)}{\text{Var}(x)} \\
b &= \overline{y} - a \overline{x}
\end{aligned}
$$

where $\overline{x} = \frac{1}{n} \sum x^i$, $\overline{y} = \frac{1}{n} \sum y^i$, $\overline{xy} = \frac{1}{n} \sum x^i y^i$, $\overline{x^2} = \frac{1}{n} \sum (x^i)^2$.
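A minimal sketch of this closed form in Python (assuming NumPy; the data and names below are illustrative and not taken from ols_and_kls.py):

```python
import numpy as np

def ols_1d(x, y):
    """Closed-form 1-D OLS: returns slope a and intercept b."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    a = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean()**2)
    b = y.mean() - a * x.mean()
    return a, b

# Example: noisy samples of y = 2x + 1
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.shape)
a, b = ols_1d(x, y)
print(a, b)  # should be close to 2 and 1
```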
hyperplane
$$
\hat{y} = w_{0} + \sum_{j=1}^{d} w_j x_j, \qquad w_0: \text{the y-intercept (bias)}
$$
Homogeneous hyperplane:
$$
\begin{aligned}
w_{0} &= 0 \\
\hat{y} &= \sum_{j=1}^{d} w_j x_j = \langle w, x \rangle = w^T x
\end{aligned}
$$
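As a quick numerical illustration (a sketch with made-up weights, assuming NumPy), the homogeneous prediction is just a dot product:

```python
import numpy as np

w = np.array([0.5, -1.2, 3.0])   # weights, d = 3
x = np.array([2.0, 1.0, -0.5])   # one input point

y_hat = w @ x                    # homogeneous prediction <w, x> = w^T x
w0 = 0.7                         # adding a bias term gives the general hyperplane
print(y_hat, w0 + y_hat)
```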
Matrix form OLS:
$$
X_{n\times d} = \begin{pmatrix}
x_1^1 & \cdots & x_d^1 \\
\vdots & \ddots & \vdots \\
x_1^n & \cdots & x_d^n
\end{pmatrix}, \quad
Y_{n\times 1} = \begin{pmatrix}
y^1 \\
\vdots \\
y^n
\end{pmatrix}, \quad
W_{d\times 1} = \begin{pmatrix}
w_1 \\
\vdots \\
w_d
\end{pmatrix}
$$
$$
\begin{aligned}
\text{Obj} &: \sum_{i=1}^n (\hat{y}^i - y^i)^2 = \sum_{i=1}^n (\langle w, x^i \rangle - y^i)^2 \\
\text{Def} &: \Delta = \begin{pmatrix}
\Delta_1 \\
\vdots \\
\Delta_n
\end{pmatrix} = \begin{pmatrix}
x_1^1 & \cdots & x_d^1 \\
\vdots & \ddots & \vdots \\
x_1^n & \cdots & x_d^n
\end{pmatrix} \begin{pmatrix}
w_1 \\
\vdots \\
w_d
\end{pmatrix} - \begin{pmatrix}
y^1 \\
\vdots \\
y^n
\end{pmatrix} = \begin{pmatrix}
\hat{y}^1 - y^1 \\
\vdots \\
\hat{y}^n - y^n
\end{pmatrix}
\end{aligned}
$$
$$
\min_{W \in \mathbb{R}^{d \times 1}} \|XW - Y\|_2^2
$$
$$
W^{\text{LS}} = (X^T X)^{-1} X^T Y
$$

(assuming $X^T X$ is invertible)
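A hedged NumPy sketch of the matrix-form solution (synthetic data; in practice np.linalg.lstsq is preferred over explicitly inverting $X^T X$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 3
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.3])
Y = X @ true_w + rng.normal(scale=0.1, size=n)

# Normal equations: W_LS = (X^T X)^{-1} X^T Y
W_normal = np.linalg.inv(X.T @ X) @ X.T @ Y

# Numerically preferable: least-squares solver (avoids explicit inversion)
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(W_normal, W_lstsq)  # both should be close to true_w
```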
Example:
$$
\hat{y} = w_{0} + w_{1} \cdot x_{1} + w_{2} \cdot x_{2}
$$
With
$$
X_{n \times 2} = \begin{pmatrix}
x^{1}_{1} & x^{1}_{2} \\
x^{2}_{1} & x^{2}_{2} \\
x^{3}_{1} & x^{3}_{2}
\end{pmatrix}
$$

(here $n = 3$)
and
$$
X'_{n \times 3} = \begin{pmatrix}
x^{1}_{1} & x^{1}_{2} & 1 \\
x^{2}_{1} & x^{2}_{2} & 1 \\
x^{3}_{1} & x^{3}_{2} & 1
\end{pmatrix}
$$
With
$$
W = \begin{pmatrix}
w_1 \\
w_2
\end{pmatrix}
$$
and
$$
W' = \begin{pmatrix}
w_1 \\
w_2 \\
w_0
\end{pmatrix}
$$
thus
$$
X' W' = \begin{pmatrix}
w_0 + \sum_{j} w_j x_j^{1} \\
\vdots \\
w_0 + \sum_{j} w_j x_j^{n}
\end{pmatrix}
$$
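A sketch of this bias-column trick in NumPy (hypothetical data; the appended column of ones makes the last weight act as the intercept $w_0$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = rng.normal(size=(n, 2))                 # features x_1, x_2
Y = 0.5 * X[:, 0] - 1.0 * X[:, 1] + 2.0     # true w = (0.5, -1.0), w0 = 2.0
Y += rng.normal(scale=0.05, size=n)

# Append a column of ones so the last weight plays the role of the intercept w0
X_prime = np.column_stack([X, np.ones(n)])  # shape (n, 3)
W_prime, *_ = np.linalg.lstsq(X_prime, Y, rcond=None)

w1, w2, w0 = W_prime
print(w1, w2, w0)  # approximately 0.5, -1.0, 2.0
```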