
To be used with vLLM or any other inference engine.

Built on top of IGW (Inference Gateway)

roadmap.

llm-d/llm-d#26

WG

or well-lit path

  1. P/D (prefill/decode) disaggregated serving
    • working implementation
    • Think of a large MoE model such as R1, served at a target QPS
  2. NS (north-south) vs. EW (east-west) KV cache management
    • NS Caching:
      • System resources
      • Inference Scheduler handles each node separately (HPA)
    • EW Caching:
      • Global KV Manager to share across nodes
      • Scheduler-aware KV (related to the scheduler WG)
      • Autoscaling (KEDA)
  3. Autoscaling
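
The "scheduler-aware KV" item above can be illustrated with a toy routing rule. This is a minimal sketch under stated assumptions, not the llm-d scheduler: `Node`, `cached_prefix_len`, and `pick_node` are hypothetical names, prompts are plain token tuples, and each node's cache is modeled as a set of exact prefixes. The rule prefers the node holding the longest cached prefix of the incoming prompt and breaks ties by lowest load.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical serving node with a local set of cached prompt prefixes."""
    name: str
    active_requests: int = 0
    cached_prefixes: set[tuple[int, ...]] = field(default_factory=set)


def cached_prefix_len(node: Node, tokens: tuple[int, ...]) -> int:
    """Length of the longest prefix of `tokens` found in the node's cache."""
    for end in range(len(tokens), 0, -1):
        if tokens[:end] in node.cached_prefixes:
            return end
    return 0


def pick_node(nodes: list[Node], tokens: tuple[int, ...]) -> Node:
    """Prefer the longest cached prefix; break ties by lowest load."""
    return max(
        nodes,
        key=lambda n: (cached_prefix_len(n, tokens), -n.active_requests),
    )


# Illustrative data only: node-a is busier but already holds the prefix (1, 2, 3).
nodes = [
    Node("node-a", active_requests=3, cached_prefixes={(1, 2, 3)}),
    Node("node-b", active_requests=1),
]
print(pick_node(nodes, (1, 2, 3, 4)).name)  # cache hit outweighs load
print(pick_node(nodes, (9, 9)).name)        # no hit anywhere: least-loaded node
```

With NS caching as noted above, each node would only know about its own `cached_prefixes`. An EW setup would instead answer the same lookup from a shared index across nodes, which this sketch does not model.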

components.

inference gateway