Support Vector Machine idea: maximises the margin and is more robust to "perturbations"
Euclidean distance between a point $x$ and the hyperplane parametrised by $W$ is:

$$\frac{|W^T x + b|}{\|W\|_2}$$

Assuming $\|W\|_2 = 1$, the distance is simply $|W^T x + b|$
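A quick numerical check of this formula, as a minimal sketch (the vectors below are made-up example values, not from the notes):

```python
# distance from a point x to the hyperplane {z : W^T z + b = 0}
# using |W^T x + b| / ||W||_2  (made-up example values)
import numpy as np

W = np.array([3.0, 4.0])   # ||W||_2 = 5
b = -1.0
x = np.array([2.0, 1.0])

distance = np.abs(W @ x + b) / np.linalg.norm(W)
print(distance)  # |3*2 + 4*1 - 1| / 5 = 1.8
```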
SVMs are good for high-dimensional data
We can solve this with a convex (QP) solver or with (sub)gradient descent
maximum margin hyperplane
$W$ has $\gamma$ margin if

$$\begin{aligned}
W^T x + b \ge \gamma \quad &\forall \text{ blue } x \\
W^T x + b \le -\gamma \quad &\forall \text{ red } x
\end{aligned}$$
Margin:
$Z = \{(x^{i}, y^{i})\}_{i=1}^{n}$, $y \in \{-1, 1\}$, $\|W\|_2 = 1$
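Written out explicitly (standard formulation, spelled out here for completeness), the margin of $(W, b)$ on $Z$ and the max-margin problem are

$$\operatorname{margin}(W, b) = \min_{i \in [n]} y^{i}\,(W^T x^{i} + b), \qquad \max_{\|W\|_2 = 1,\ b}\ \min_{i \in [n]} y^{i}\,(W^T x^{i} + b)$$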
hard-margin SVM
this is the version with bias
"\\begin{algorithm}\n\\caption{Hard-SVM}\n\\begin{algorithmic}\n\\REQUIRE Training set $(\\mathbf{x}_1, y_1),\\ldots,(\\mathbf{x}_m, y_m)$\n\\STATE \\textbf{solve:} $(w_{0},b_{0}) = \\argmin\\limits_{(w,b)} \\|w\\|^2 \\text{ s.t } \\forall i, y_{i}(\\langle{w,x_i} \\rangle + b) \\ge 1$\n\\STATE \\textbf{output:} $\\hat{w} = \\frac{w_0}{\\|w_0\\|}, \\hat{b} = \\frac{b_0}{\\|w_0\\|}$\n\\end{algorithmic}\n\\end{algorithm}"
Algorithm 2 Hard-SVM

Require: Training set $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$

1. solve: $(w_{0},b_{0}) = \argmin_{(w,b)} \|w\|^2 \text{ s.t. } \forall i,\ y_{i}(\langle w, x_i \rangle + b) \ge 1$
2. output: $\hat{w} = \frac{w_0}{\|w_0\|}, \hat{b} = \frac{b_0}{\|w_0\|}$
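A minimal sketch of solving this QP with a generic convex solver (cvxpy here; the function name `hard_svm` and the inputs `X`, `y` are illustrative assumptions, and the data must be linearly separable for the problem to be feasible):

```python
# Hard-SVM as a quadratic program: min ||w||^2 s.t. y_i(<w, x_i> + b) >= 1
import numpy as np
import cvxpy as cp

def hard_svm(X: np.ndarray, y: np.ndarray):
    m, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    w0, b0 = w.value, b.value
    # output the normalised solution, as in the algorithm above
    return w0 / np.linalg.norm(w0), b0 / np.linalg.norm(w0)
```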
Note that this version is sensitive to outliers
it assumes that the training set is linearly separable
soft-margin SVM
can be applied even if the training set is not linearly separable
"\\begin{algorithm}\n\\caption{Soft-SVM}\n\\begin{algorithmic}\n\\REQUIRE Input $(\\mathbf{x}_1, y_1),\\ldots,(\\mathbf{x}_m, y_m)$\n\\STATE \\textbf{parameter:} $\\lambda > 0$\n\\STATE \\textbf{solve:} $\\min_{\\mathbf{w}, b, \\boldsymbol{\\xi}} \\left( \\lambda \\|\\mathbf{w}\\|^2 + \\frac{1}{m} \\sum_{i=1}^m \\xi_i \\right)$\n\\STATE \\textbf{s.t: } $\\forall i, \\quad y_i (\\langle \\mathbf{w}, \\mathbf{x}_i \\rangle + b) \\geq 1 - \\xi_i \\quad \\text{and} \\quad \\xi_i \\geq 0$\n\\STATE \\textbf{output:} $\\mathbf{w}, b$\n\\end{algorithmic}\n\\end{algorithm}"
Algorithm 3 Soft-SVM

Require: Input $(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_m, y_m)$

1. parameter: $\lambda > 0$
2. solve: $\min_{\mathbf{w}, b, \boldsymbol{\xi}} \left( \lambda \|\mathbf{w}\|^2 + \frac{1}{m} \sum_{i=1}^m \xi_i \right)$
3. s.t.: $\forall i,\ y_i (\langle \mathbf{w}, \mathbf{x}_i \rangle + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$
4. output: $\mathbf{w}, b$
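The same sketch extended with the slack variables $\xi_i$ (again via cvxpy; `lam` plays the role of $\lambda$ and its default value is an arbitrary choice):

```python
# Soft-SVM: min lambda*||w||^2 + (1/m) sum xi_i
# s.t. y_i(<w, x_i> + b) >= 1 - xi_i and xi_i >= 0
import numpy as np
import cvxpy as cp

def soft_svm(X: np.ndarray, y: np.ndarray, lam: float = 0.1):
    m, d = X.shape
    w, b, xi = cp.Variable(d), cp.Variable(), cp.Variable(m)
    objective = cp.Minimize(lam * cp.sum_squares(w) + cp.sum(xi) / m)
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value
```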
Equivalent form of soft-margin SVM:
$$\begin{aligned}
\min_{w} \ &\left(\lambda \|w\|^2 + L_S^{\text{hinge}}(w)\right) \\
L_{S}^{\text{hinge}}(w) &= \frac{1}{m} \sum_{i=1}^{m} \max\{0, 1 - y_i \langle w, x_i \rangle\}
\end{aligned}$$
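Since this form is an unconstrained objective, it can also be minimised by subgradient descent, along the lines of the sketch below (step size, epoch count, and $\lambda$ are arbitrary placeholder values; the bias is dropped, as in the formula above):

```python
# regularised hinge loss: lambda*||w||^2 + (1/m) sum max{0, 1 - y_i <w, x_i>}
import numpy as np

def hinge_svm_subgradient(X, y, lam=0.01, lr=0.1, epochs=200):
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)          # y_i <w, x_i>
        active = margins < 1           # points with non-zero hinge loss
        # subgradient: 2*lam*w - (1/m) * sum_{active} y_i x_i
        grad = 2 * lam * w - (y[active][:, None] * X[active]).sum(axis=0) / m
        w -= lr * grad
    return w
```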
SVM with basis functions
$$\min_{w} \frac{1}{n} \sum_{i=1}^{n} \max \{0, 1 - y^i \langle w, \phi(x^i) \rangle\} + \lambda \|w\|^2_2$$

$\phi(x^i)$ can be high-dimensional
representer theorem
$$W^{*} = \argmin_{w} \frac{1}{n} \sum_{i=1}^{n} \max \{0, 1 - y^i \langle w, \phi(x^i) \rangle\} + \lambda \|w\|^2_2$$

There are real values $a_{1},\ldots,a_{n}$ such that[^1]

$$W^{*} = \sum_{i=1}^{n} a_i \phi(x^i)$$
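Plugging $W^{*} = \sum_j a_j \phi(x^j)$ back into the objective (a standard step, spelled out here for completeness) shows that only inner products of features are ever needed:

$$\langle W^{*}, \phi(x^i) \rangle = \sum_{j} a_j \langle \phi(x^j), \phi(x^i) \rangle, \qquad \|W^{*}\|_2^2 = \sum_{i,j} a_i a_j \langle \phi(x^i), \phi(x^j) \rangle$$

so with $\mathbf{K}_{ij} = \langle \phi(x^i), \phi(x^j) \rangle$ the problem becomes a minimisation over $a \in \mathbb{R}^n$:

$$\min_{a} \frac{1}{n} \sum_{i=1}^{n} \max\{0, 1 - y^i (\mathbf{K} a)_i\} + \lambda\, a^T \mathbf{K} a$$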
kernelized SVM
from the representer theorem, we have the kernel:
$$K(x,z) = \langle \phi(x), \phi(z) \rangle$$
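A small sketch of working with the kernel directly, never forming $\phi(x)$ (the RBF kernel and the helper names are illustrative choices, not from the notes):

```python
# kernel evaluations and the Gram matrix K_{n x n}
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # K(x, z) = <phi(x), phi(z)> computed implicitly
    return np.exp(-gamma * np.sum((x - z) ** 2))

def gram_matrix(X, kernel=rbf_kernel):
    n = X.shape[0]
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def predict(a, X_train, x, kernel=rbf_kernel):
    # f(x) = <W*, phi(x)> = sum_j a_j K(x^j, x)
    return sum(a_j * kernel(x_j, x) for a_j, x_j in zip(a, X_train))
```

This also makes the drawbacks below concrete: prediction touches every training point, and the $n \times n$ Gram matrix has to be built and stored.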
drawbacks
prediction-time complexity
need to store all training data
Dealing with $\mathbf{K}_{n \times n}$
choice of kernel, which is tricky and often fairly heuristic
[^1]: note that we can also write $a^T \phi$ where $\phi = [\phi(x^1),\ldots,\phi(x^n)]^T$