MultiMarginLoss

Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (a 1D tensor of target class indices, $0 \le y \le \text{x.size}(1) - 1$):

For each mini-batch sample, the loss in terms of the 1D input $x$ and output $y$ is:

$$\text{loss}(x, y) = \frac{\sum_{i} \max(0, \text{margin} - x[y] + x[i])^p}{\text{x.size}(0)}$$

where $i \in \{0, \ldots, \text{x.size}(0) - 1\}$ and $i \neq y$.
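The per-sample formula can be checked by hand with a small pure-Python sketch. This is an illustration of the equation only, not the library's implementation; the function name `multi_margin_loss` and the default values `p=1` and `margin=1.0` are assumptions chosen for the example.

```python
def multi_margin_loss(x, y, p=1, margin=1.0):
    """Per-sample multi-class hinge loss for 1D scores x and target index y.

    Hypothetical helper that mirrors the formula term by term; the defaults
    for p and margin are assumptions made for this example.
    """
    if not 0 <= y < len(x):
        raise ValueError("target index y must satisfy 0 <= y <= len(x) - 1")
    total = 0.0
    for i, score in enumerate(x):
        if i == y:
            continue  # the sum skips the target class itself
        total += max(0.0, margin - x[y] + score) ** p
    return total / len(x)  # divide by the number of classes, x.size(0)


# Four class scores, target class 3.
# Hinge terms: (1 - 0.8 + 0.1) + (1 - 0.8 + 0.2) + (1 - 0.8 + 0.4) = 1.3
loss = multi_margin_loss([0.1, 0.2, 0.4, 0.8], 3)
print(round(loss, 4))  # 1.3 / 4
```

A class contributes nothing to the sum once the target score exceeds its score by at least `margin`, so a sample whose target score beats every other score by the margin has zero loss.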

SGD

Nesterov momentum is based on the paper "On the importance of initialization and momentum in deep learning".

Algorithm 1 SGD in PyTorch

1: input: $\gamma$ (lr), $\theta_0$ (params), $f(\theta)$ (objective), $\lambda$ (weight decay),

2: $\mu$ (momentum), $\tau$ (dampening), nesterov, maximize

3: for $t = 1$ to $\ldots$ do

4: $g_t \gets \nabla_\theta f_t(\theta_{t-1})$

5: if $\lambda \neq 0$ then

6: $g_t \gets g_t + \lambda\theta_{t-1}$

7: end if

8: if $\mu \neq 0$ then

9: if $t > 1$ then

10: $b_t \gets \mu b_{t-1} + (1-\tau)g_t$

11: else

12: $b_t \gets g_t$

13: end if

14: if nesterov then

15: $g_t \gets g_t + \mu b_t$

16: else

17: $g_t \gets b_t$

18: end if

19: end if

20: if maximize then

21: $\theta_t \gets \theta_{t-1} + \gamma g_t$

22: else

23: $\theta_t \gets \theta_{t-1} - \gamma g_t$

24: end if

25: end for

26: return $\theta_t$
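To make the control flow of Algorithm 1 concrete, here is a minimal pure-Python sketch of one loop iteration applied to a list of scalar parameters. It follows the listing step by step and is not the library's optimizer code; the function name `sgd_step`, the `bufs` list, and the use of `None` to mark the $t = 1$ case are assumptions made for the example.

```python
def sgd_step(params, grads, bufs, lr, weight_decay=0.0, momentum=0.0,
             dampening=0.0, nesterov=False, maximize=False):
    """Apply one iteration of the Algorithm 1 loop body to a list of floats.

    bufs holds one momentum buffer b per parameter and is updated in place;
    None means no step has been taken yet (the t = 1 branch).
    Hypothetical helper for illustration only.
    """
    updated = []
    for k, (theta, g) in enumerate(zip(params, grads)):
        if weight_decay != 0:
            g = g + weight_decay * theta            # g_t <- g_t + lambda * theta_{t-1}
        if momentum != 0:
            if bufs[k] is None:
                bufs[k] = g                         # first step: b_t <- g_t
            else:
                bufs[k] = momentum * bufs[k] + (1 - dampening) * g
            if nesterov:
                g = g + momentum * bufs[k]          # g_t <- g_t + mu * b_t
            else:
                g = bufs[k]                         # g_t <- b_t
        if maximize:
            updated.append(theta + lr * g)          # gradient ascent
        else:
            updated.append(theta - lr * g)          # gradient descent
    return updated


# Minimize f(theta) = theta**2, whose gradient is 2 * theta.
theta = [1.0]
bufs = [None]
for _ in range(2):
    grads = [2 * t for t in theta]
    theta = sgd_step(theta, grads, bufs, lr=0.1, momentum=0.9)
print(theta)  # roughly [0.46]: 1.0 -> 0.8 -> 0.8 - 0.1 * (0.9 * 2.0 + 1.6)
```

Keeping the momentum buffers in a caller-owned list makes the $t > 1$ test explicit: the first call fills each buffer with the raw gradient, and later calls blend the previous buffer with the new gradient.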