
resources: frame-context collapse and OpenAI's writing on alignment research (before the safety team disbanded)

The act of aligning oneself with a particular group or ideology. This can be done for a variety of reasons, including:

  • To gain social acceptance
  • To gain power
  • To gain resources

In machine learning, alignment is often framed as a solution to "hallucination" in large language models' token generation. 1

To align a model is, simply put, to teach it to generate tokens that stay within the bounds of the Overton window.
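Alignment in practice is trained (e.g. via RLHF) rather than hard-coded, but as a toy intuition for "bounding" token generation, here is a minimal sketch of logit masking during decoding. Everything here is hypothetical: the vocabulary, the disallowed token id, and the `mask_disallowed` helper.

```python
import torch

def mask_disallowed(logits: torch.Tensor, disallowed_ids: list[int]) -> torch.Tensor:
    # Send disallowed logits to -inf so softmax assigns them probability 0.
    masked = logits.clone()
    masked[disallowed_ids] = float("-inf")
    return masked

# Toy vocabulary of 5 tokens; pretend token 3 is "out of bounds".
logits = torch.tensor([1.2, 0.3, -0.5, 2.0, 0.1])
probs = torch.softmax(mask_disallowed(logits, [3]), dim=-1)
print(probs)  # token 3 now has probability 0.0
```

Actual alignment training reshapes the model's distribution rather than masking it, but the effect on which tokens get sampled is analogous.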

The goal is to build an aligned system that helps us solve other alignment problems.

Should we build ethically aligned systems, or morally aligned systems?

One of mechanistic interpretability's goals is to ablate harmful features.
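As a concrete (and heavily simplified) sketch: given a "harmful" feature direction already identified by some interpretability method (e.g. a sparse autoencoder decoder row), one form of ablation projects that direction out of a layer's activations. The direction, dimensions, and layer below are all hypothetical.

```python
import torch
import torch.nn as nn

# Assume an interpretability method handed us a unit-norm feature direction.
d_model = 16
harmful_direction = torch.randn(d_model)
harmful_direction /= harmful_direction.norm()

def ablate_direction(module, inputs, output):
    # Remove the component of the activation along the harmful direction.
    coeff = output @ harmful_direction              # projection coefficients
    return output - coeff.unsqueeze(-1) * harmful_direction

layer = nn.Linear(d_model, d_model)
handle = layer.register_forward_hook(ablate_direction)

x = torch.randn(2, d_model)
y = layer(x)
print(y @ harmful_direction)  # ~0: the harmful component is gone
handle.remove()
```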

RSP

The Responsible Scaling Policy, published by Anthropic.

The idea is to create a standard risk-mitigation strategy as AI systems advance: essentially, a scale for judging how much harm a system is capable of causing.
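As a loose sketch of what such a scale looks like in code, here is a hypothetical mapping inspired by the RSP's AI Safety Levels (ASL). The level descriptions and mitigation lists are paraphrased illustrations, not the policy's actual text.

```python
from enum import IntEnum

class ASL(IntEnum):
    """Hypothetical sketch loosely modeled on the RSP's AI Safety Levels."""
    ASL_1 = 1  # systems posing no meaningful catastrophic risk
    ASL_2 = 2  # early signs of dangerous capabilities
    ASL_3 = 3  # substantial uplift for catastrophic misuse

# Illustrative policy: each level gates deployment on verified mitigations.
REQUIRED_MITIGATIONS = {
    ASL.ASL_1: ["acceptable-use policy"],
    ASL.ASL_2: ["security controls", "pre-release red-teaming"],
    ASL.ASL_3: ["strict access controls", "pause until mitigations verified"],
}

def may_deploy(level: ASL, verified: set[str]) -> bool:
    # Deploy only if every mitigation required at this level is in place.
    return all(m in verified for m in REQUIRED_MITIGATIONS[level])
```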