Think of using autoencoders to extract representations.
sparsity allows us to interpret hidden layers and internal representations of Transformers model.
graph TD A[Input X] --> B[Layer 1] B --> C[Layer 2] C --> D[Latent Features Z] D --> E[Layer 3] E --> F[Layer 4] F --> G[Output X'] subgraph Encoder A --> B --> C end subgraph Decoder E --> F end style D fill:#c9a2d8,stroke:#000,stroke-width:2px,color:#fff style A fill:#98FB98,stroke:#000,stroke-width:2px style G fill:#F4A460,stroke:#000,stroke-width:2px
see also latent space
definition
loss function:
contrastive representation learning
The goal of contrastive representation learning is to learn such an embedding space in which similar sample pairs stay close to each other while dissimilar ones are far apart. Contrastive learning can be applied to both supervised and unsupervised settings. article
intuition: to give a positive and negative pairs for optimizing loss function.
Lien vers l'original
training objective
we want smaller reconstruction error, or
we want to get the latent space distribution to look something similar to isotopic Gaussian!
Kullback-Leibler divergence
denoted as
definition
The statistical distance between a model probability distribution difference from a true probability distribution :
alternative form:
Lien vers l'original
variational autoencoders
idea: to add a gaussian sampler after calculating latent space.
objective function: