Logistic regression 😄 fun fact: despite the name, it is used for classification, not regression
Assume there is a hyperplane in $\mathbb{R}^d$ parameterized by $W$:
$$
\begin{aligned}
P(Y = 1 \mid x, W) &= \phi (W^T x) \\
P(Y = 0 \mid x, W) &= 1 - \phi (W^T x) \\[12pt]
&\because \phi (a) = \frac{1}{1+e^{-a}}
\end{aligned}
$$
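A minimal numpy sketch of this model (the function names are mine, and the bias term is assumed to be folded into $W$ via a constant feature):

```python
import numpy as np

def sigmoid(a):
    """Logistic function phi(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def p_y1_given_x(x, W):
    """P(Y = 1 | x, W) = phi(W^T x); P(Y = 0 | x, W) is its complement."""
    return sigmoid(W @ x)
```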
maximum likelihood
Useful identity: $1 - \phi (a) = \phi (-a)$, so with labels $y \in \{-1, +1\}$ both cases collapse to $P(y \mid x, W) = \phi(y\, W^T x)$.
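A quick numerical check of the identity (pure numpy, `phi` defined inline):

```python
import numpy as np

phi = lambda a: 1.0 / (1.0 + np.exp(-a))
a = np.linspace(-5.0, 5.0, 11)
# 1 - phi(a) == phi(-a), which is why P(y | x, W) = phi(y * W^T x) for y in {-1, +1}
assert np.allclose(1.0 - phi(a), phi(-a))
```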
$$
\begin{aligned}
W^{\text{ML}} &= \argmax_{W} \prod_i P(x^i, y^i \mid W) \\
&= \argmax_{W} \prod_i \frac{P(x^i, y^i, W)}{P(W)} \\
&= \argmax_{W} \prod_i P(y^i \mid x^i, W)\, P(x^i) \\
&= \argmax_{W} \Big[ \prod_i P(x^i) \Big] \Big[ \prod_i P(y^i \mid x^i, W) \Big] \\
&= \argmax_{W} \sum_{i=1}^{n} \log \phi (y^i W^T x^i)
\end{aligned}
$$

The third line assumes $x^i$ is independent of $W$, so $P(x^i \mid W) = P(x^i)$; the last line drops $\prod_i P(x^i)$ (it does not depend on $W$), takes the log, and applies the identity above with $y^i \in \{-1, +1\}$.
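A sketch of the resulting objective and its gradient, assuming labels $y^i \in \{-1, +1\}$ and examples stacked as rows of `X` (all names here are mine):

```python
import numpy as np

def log_likelihood(W, X, y):
    """sum_i log phi(y_i * W^T x_i) with y_i in {-1, +1}.

    X: (n, d) matrix, one example per row; y: (n,); W: (d,).
    """
    margins = y * (X @ W)
    # log phi(a) = -log(1 + exp(-a)); logaddexp keeps this numerically stable
    return -np.logaddexp(0.0, -margins).sum()

def grad_log_likelihood(W, X, y):
    """Gradient: sum_i (1 - phi(y_i W^T x_i)) * y_i * x_i."""
    margins = y * (X @ W)
    coeff = y * (1.0 - 1.0 / (1.0 + np.exp(-margins)))
    return X.T @ coeff
```

There is no closed-form maximizer, so $W^{\text{ML}}$ is found by gradient ascent on this sum (equivalently, gradient descent on its negation).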
Equivalently, with labels $y^i \in \{0, 1\}$ and $p^i = \phi(W^T x^i)$, maximize the following:

$$
\sum_{i=1}^{n} \left( y^i \log p^i + (1-y^i) \log (1-p^i) \right)
$$
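A small check that this $\{0,1\}$-encoded form agrees with the $\pm 1$ form from the derivation above (random data, mapping labels via $y^{01} = (y+1)/2$):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
W = rng.normal(size=3)
y = rng.choice([-1.0, 1.0], size=50)   # labels in {-1, +1}
y01 = (y + 1.0) / 2.0                  # same labels re-encoded in {0, 1}

p = 1.0 / (1.0 + np.exp(-(X @ W)))     # p_i = phi(W^T x_i)
cross_entropy = np.sum(y01 * np.log(p) + (1.0 - y01) * np.log(1.0 - p))
pm_form = -np.logaddexp(0.0, -y * (X @ W)).sum()   # sum_i log phi(y_i W^T x_i)
assert np.isclose(cross_entropy, pm_form)
```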
softmax
$$
\text{softmax}(y)_i = \frac{e^{y_i}}{\sum_{j=1}^{k} e^{y_j}}
$$

where $y \in \mathbb{R}^k$
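A sketch of a numerically stable implementation; subtracting the max before exponentiating changes nothing because softmax is invariant to adding a constant to every coordinate:

```python
import numpy as np

def softmax(y):
    """softmax(y)_i = exp(y_i) / sum_j exp(y_j) for y in R^k."""
    z = y - np.max(y)        # shift-invariance: avoids overflow in exp
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))
assert np.isclose(probs.sum(), 1.0)    # outputs form a probability distribution
```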