probabilistic modeling example: assume each class is a Gaussian
discriminant analysis
$$P(x \mid y = 1, \mu_0, \mu_1, \beta) = \frac{1}{a_0} e^{-\|x - \mu_1\|^2_2}$$
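A minimal numpy sketch of this class-conditional density, assuming an isotropic Gaussian; the function name and the constant `a0` are illustrative, mirroring the normalizer in the formula above:

```python
import numpy as np

def class_conditional(x, mu_k, a0):
    """Isotropic Gaussian P(x | y = k, ...) = exp(-||x - mu_k||^2) / a0.

    x, mu_k: arrays of shape (d,); a0 is the normalizing constant above."""
    return np.exp(-np.sum((x - mu_k) ** 2)) / a0
```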
maximum likelihood estimate
see also prior and posterior distributions
given $\Theta = \{\mu_1, \mu_2, \beta\}$:
$$\argmax_{\Theta} P(Z \mid \Theta) = \argmax_{\Theta} \prod_{i=1}^{n} P(x^i, y^i \mid \Theta)$$
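A sketch of computing these estimates, assuming the standard result that for Gaussian class-conditionals the maximum likelihood class means are the per-class sample means (the function name is hypothetical):

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """ML estimates under the model above: per-class sample means,
    plus class priors from label frequencies.

    X: (n, d) data matrix; y: (n,) integer class labels."""
    classes = np.unique(y)
    mus = {k: X[y == k].mean(axis=0) for k in classes}
    priors = {k: float(np.mean(y == k)) for k in classes}
    return mus, priors
```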
How can we predict the label of a new test point?
Or in other words, how can we run inference?
Check whether
$$\frac{P(y=0 \mid X, \Theta)}{P(y=1 \mid X, \Theta)} \ge 1$$
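A sketch of this inference rule under the isotropic-Gaussian model above: by Bayes' rule the posterior ratio reduces to comparing class log-scores, since the shared evidence term $P(x)$ cancels (the helper names are illustrative):

```python
import numpy as np

def predict_label(x, mus, priors):
    """Return 0 if P(y=0 | x, Theta) / P(y=1 | x, Theta) >= 1, else 1.

    Compares log P(x | y=k) + log P(y=k); constants cancel in the ratio."""
    def log_score(k):
        return -np.sum((x - mus[k]) ** 2) + np.log(priors[k])
    return 0 if log_score(0) >= log_score(1) else 1
```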
Generalization for correlated features
Gaussian for correlated features:
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2 \pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
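A direct numpy translation of this density (a sketch; it assumes $\Sigma$ is positive definite and does not guard against singular covariances):

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """N(x | mu, Sigma) for a d-dimensional Gaussian with correlated features."""
    d = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)            # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-0.5 * quad) / norm
```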
Naive Bayes Classifier
Given the label, the coordinates are statistically independent
$$P(x \mid y = k, \Theta) = \prod_j P(x_j \mid y = k, \Theta)$$
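A sketch of this factorized likelihood, assuming each coordinate $x_j$ given the class is a 1-D Gaussian (one common instantiation of naive Bayes; the per-class means and variances are assumed already estimated):

```python
import numpy as np

def naive_bayes_likelihood(x, mu_k, var_k):
    """P(x | y = k, Theta) = prod_j P(x_j | y = k, Theta),
    with each factor a univariate Gaussian N(x_j | mu_kj, var_kj)."""
    per_coord = np.exp(-0.5 * (x - mu_k) ** 2 / var_k) / np.sqrt(2 * np.pi * var_k)
    return np.prod(per_coord)
```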
idea: comparison between discriminative and generative models
😄 fun fact: actually better suited to classification than to regression problems
Assume there is a plane in $\mathbb{R}^d$ parameterized by $W$
$$\begin{aligned}
P(Y = 1 \mid x, W) &= \phi(W^T x) \\
P(Y = 0 \mid x, W) &= 1 - \phi(W^T x) \\[12pt]
&\because \phi(a) = \frac{1}{1 + e^{-a}}
\end{aligned}$$
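A minimal sketch of these two conditionals (the function names are illustrative):

```python
import numpy as np

def phi(a):
    """Logistic sigmoid phi(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def predict_proba(x, W):
    """Return [P(Y=0 | x, W), P(Y=1 | x, W)] = [1 - phi(W^T x), phi(W^T x)]."""
    p1 = phi(W @ x)
    return np.array([1.0 - p1, p1])
```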
maximum likelihood
$$1 - \phi(a) = \phi(-a)$$
$$\begin{aligned}
W^{\text{ML}} &= \argmax_{W} \prod P(x^i, y^i \mid W) \\
&= \argmax_{W} \prod \frac{P(x^i, y^i, W)}{P(W)} \\
&= \argmax_{W} \prod P(y^i \mid x^i, W) P(x^i) \\
&= \argmax_{W} \left[\prod P(x^i)\right] \left[\prod P(y^i \mid x^i, W)\right] \\
&= \argmax_{W} \sum_{i=1}^{n} \log\left(\phi(y^i W^T x^i)\right)
\end{aligned}$$

The $\prod P(x^i)$ factor does not depend on $W$, so it drops out; the last line takes labels $y^i \in \{-1, +1\}$, so that $P(y^i \mid x^i, W) = \phi(y^i W^T x^i)$ by the identity $1 - \phi(a) = \phi(-a)$.
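Since there is no closed form for $W^{\text{ML}}$, a common approach is gradient ascent; a sketch assuming labels $y^i \in \{-1, +1\}$ (the step size and iteration count are arbitrary choices, not from the notes):

```python
import numpy as np

def fit_logistic_mle(X, y, lr=0.1, steps=1000):
    """Gradient ascent on sum_i log phi(y^i W^T x^i) for labels y^i in {-1, +1}.

    Uses d/dW log phi(z_i) = (1 - phi(z_i)) * y^i * x^i with z_i = y^i W^T x^i."""
    phi = lambda a: 1.0 / (1.0 + np.exp(-a))
    n, d = X.shape
    W = np.zeros(d)
    for _ in range(steps):
        z = y * (X @ W)                      # z_i = y^i W^T x^i
        grad = X.T @ (y * (1.0 - phi(z)))    # gradient of the log-likelihood
        W += lr * grad / n
    return W
```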
maximize the following (where $y^i \in \{0, 1\}$ and $p^i = \phi(W^T x^i)$):
$$\sum_{i=1}^{n} \left(y^i \log p^i + (1 - y^i) \log(1 - p^i)\right)$$
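The same objective in code, a sketch assuming $y^i \in \{0, 1\}$ and predicted probabilities $p^i$ (the clipping epsilon is a numerical guard, not part of the formula):

```python
import numpy as np

def log_likelihood_01(y, p, eps=1e-12):
    """sum_i (y^i log p^i + (1 - y^i) log(1 - p^i)), clipped for stability."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```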
softmax
$$\text{softmax}(y)_i = \frac{e^{y_i}}{\sum_{j} e^{y_j}}$$
where $y \in \mathbb{R}^k$
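A sketch of the softmax, with the usual max-subtraction trick for numerical stability (which does not change the output):

```python
import numpy as np

def softmax(y):
    """softmax(y)_i = exp(y_i) / sum_j exp(y_j) for y in R^k."""
    z = np.exp(y - np.max(y))   # shift by max(y) for numerical stability
    return z / np.sum(z)
```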