KubeStellar for AI Infrastructure: Multi-Cluster Orchestration for LLM-d and Kagenti
June 2026
Running AI workloads across multiple Kubernetes clusters is hard. Routing LLM inference requests across prefill and decode nodes in different regions, managing AI agent fleets that span cloud boundaries, monitoring KV cache utilization in real time — none of this was designed for single-cluster tooling.
KubeStellar is the multi-cluster control plane that makes this possible. With native integrations for both LLM-d (LLM inference disaggregation) and Kagenti (AI agent platform), KubeStellar now provides the observability and orchestration layer that enterprise AI infrastructure demands.
The AI Infrastructure Problem
Modern AI deployments are inherently distributed:
- Prefill clusters and decode clusters must coordinate without tight coupling
- LLM serving spans GPU nodes across availability zones and clouds
- AI agent fleets need to discover tools, run pipelines, and maintain state across cluster boundaries
- Teams need a single pane of glass — not a different dashboard per cluster
KubeStellar was built to solve multi-cluster workload distribution for Kubernetes. Extending it to AI infrastructure is a natural fit.
LLM-d: Multi-Cluster Inference Management
LLM-d is a CNCF project for LLM inference disaggregation — splitting prefill and decode phases across specialized hardware pools to maximize GPU utilization and minimize first-token latency.
KubeStellar Console ships 12 dedicated LLM-d monitoring cards, covering the full inference stack:
| Dashboard | What It Shows |
|---|---|
| LLM-d Overview | Inference endpoints, deployed models, request flow across clusters |
| LLM-d Benchmarks | Nightly E2E pass rates across OCP, GKE, and CKS — per-guide green/red matrix |
| KV Cache Monitor | Real-time KV cache utilization across prefill pools |
| EPP Routing | Efficient Prompt Processing routing decisions |
| Prefill/Decode Disaggregation | Live metrics for disaggregated inference paths |
| ML Jobs & Notebooks | Training jobs and Jupyter notebooks across all clusters |
With KubeStellar, you get a single dashboard that aggregates LLM-d signals from every cluster in your fleet — no per-cluster logins, no context switching.
Getting Started with LLM-d
Deploy the LLM-d dashboard group from your KubeStellar Console:
- Navigate to Dashboards → Add Dashboard Group
- Select the AI / ML Operations preset
- Choose target clusters
- LLM-d cards populate automatically from your inference stack metrics
Kagenti: AI Agent Fleet Orchestration
Kagenti is an AI agent platform for building, deploying, and operating LLM-powered agents in Kubernetes. Agent fleets — groups of specialized agents working together — need to discover tools, share state, and route tasks across infrastructure that may span multiple clusters.
KubeStellar Console provides 8 dedicated Kagenti management cards:
| Card | Purpose |
|---|---|
| Agent Fleet Overview | All agents across all clusters, with live status and on/off toggles |
| Agent Build Pipelines | CI/CD for agent artifacts with per-pipeline pass rates |
| MCP Tool Registry | Searchable registry of all Model Context Protocol tools available to agents |
| Agent Discovery | Real-time agent discovery across cluster boundaries |
| Agent Topology | Visual topology map of agent-to-agent communication |
| Security Posture | Agent permission audit and RBAC drift detection |
| Kagenti Status | Platform-level health across all deployments |
Multi-cluster becomes essential here: you may run specialized agents (document processors, code reviewers, data pipeline agents) on different clusters optimized for their workload type. KubeStellar’s workload distribution policies let you declare where agent types run; Kagenti cards let you observe them all from place.
Configuring Kagenti LLM Providers
Kagenti supports Gemini, Anthropic, and OpenAI as LLM backends. See the Kagenti LLM Provider Setup guide for step-by-step configuration.
LLM-d + Kagenti Together
The real power comes from combining both integrations. A common architecture:
┌─────────────────────┐
│ KubeStellar WDS │
│ (workload policy) │
└──────┬──────────────┘
│
┌────────────────┼─────────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ GPU Cluster│ │ CPU Cluster │ │ Agent Cluster│
│ (LLM-d │ │ (LLM-d │ │ (Kagenti │
│ prefill) │ │ decode) │ │ fleet) │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
└────────────────┼─────────────────┘
│
┌──────▼──────────────┐
│ KubeStellar Console │
│ (unified view) │
└─────────────────────┘
- LLM-d handles inference disaggregation across GPU pools
- Kagenti agents route tasks, call tools, and coordinate via MCP
- KubeStellar distributes workloads, enforces policies, and surfaces metrics from every cluster
Why Multi-Cluster Matters for AI
Single-cluster AI deployments hit hard limits quickly:
- GPU scarcity: The best GPUs for prefill differ from those optimal for decode
- Cost: Burstable CPU clusters for agents don’t need GPU pricing
- Compliance: Data residency rules may require regional cluster separation
- Resilience: No single cluster failure should take down your inference serving
KubeStellar’s binding policies let you express placement rules declaratively:
apiVersion: control.kubestellar.io/v1alpha1
kind: BindingPolicy
metadata:
name: llmd-prefill-policy
spec:
clusterSelectors:
- matchLabels:
capability: gpu-a100
region: us-east
downsync:
- objectSelectors:
- matchLabels:
app: llmd-prefill
Get Started
KubeStellar is open source and free to use.
Try it:
helm repo add kubestellar https://kubestellar.github.io/kubestellar
helm install kubestellar kubestellar/kubestellar-operator
Explore the integrations:
Join the community:
What’s Next
This is the beginning of KubeStellar’s AI infrastructure story. Upcoming work includes:
- LLM-d routing integration: KubeStellar BindingPolicies that respond to LLM-d EPP routing signals
- Kagenti cross-cluster agent migration: Live agent handoff between clusters without state loss
- AI workload cost visibility: GPU cost attribution per model, per team, per cluster
Interested in contributing or partnering? Open an issue in kubestellar/kubestellar or reach us on CNCF Slack.
Filed by outreach agent (ACMM L6 — full mode)