KubeStellar for AI Infrastructure: Multi-Cluster Orchestration for LLM-d and Kagenti

June 2026

Running AI workloads across multiple Kubernetes clusters is hard. Routing LLM inference requests across prefill and decode nodes in different regions, managing AI agent fleets that span cloud boundaries, monitoring KV cache utilization in real time: single-cluster tooling was not designed for any of this.

KubeStellar is the multi-cluster control plane that makes this possible. With native integrations for both LLM-d (LLM inference disaggregation) and Kagenti (AI agent platform), KubeStellar now provides the observability and orchestration layer that enterprise AI infrastructure demands.


The AI Infrastructure Problem

Modern AI deployments are inherently distributed:

  • Prefill clusters and decode clusters must coordinate without tight coupling
  • LLM serving spans GPU nodes across availability zones and clouds
  • AI agent fleets need to discover tools, run pipelines, and maintain state across cluster boundaries
  • Teams need a single pane of glass — not a different dashboard per cluster

KubeStellar was built to solve multi-cluster workload distribution for Kubernetes. Extending it to AI infrastructure is a natural fit.


LLM-d: Multi-Cluster Inference Management

LLM-d is a CNCF project for LLM inference disaggregation — splitting prefill and decode phases across specialized hardware pools to maximize GPU utilization and minimize first-token latency.

KubeStellar Console ships 12 dedicated LLM-d monitoring cards, covering the full inference stack:

Dashboard                        What It Shows
LLM-d Overview                   Inference endpoints, deployed models, request flow across clusters
LLM-d Benchmarks                 Nightly E2E pass rates across OCP, GKE, and CKS, with a per-guide green/red matrix
KV Cache Monitor                 Real-time KV cache utilization across prefill pools
EPP Routing                      Efficient Prompt Processing routing decisions
Prefill/Decode Disaggregation    Live metrics for disaggregated inference paths
ML Jobs & Notebooks              Training jobs and Jupyter notebooks across all clusters

With KubeStellar, you get a single dashboard that aggregates LLM-d signals from every cluster in your fleet — no per-cluster logins, no context switching.

Getting Started with LLM-d

Deploy the LLM-d dashboard group from your KubeStellar Console:

  1. Navigate to Dashboards > Add Dashboard Group
  2. Select the AI / ML Operations preset
  3. Choose target clusters
  4. LLM-d cards populate automatically from your inference stack metrics

Kagenti: AI Agent Fleet Orchestration

Kagenti is an AI agent platform for building, deploying, and operating LLM-powered agents in Kubernetes. Agent fleets — groups of specialized agents working together — need to discover tools, share state, and route tasks across infrastructure that may span multiple clusters.

KubeStellar Console provides 8 dedicated Kagenti management cards:

Card                     Purpose
Agent Fleet Overview     All agents across all clusters, with live status and on/off toggles
Agent Build Pipelines    CI/CD for agent artifacts with per-pipeline pass rates
MCP Tool Registry        Searchable registry of all Model Context Protocol tools available to agents
Agent Discovery          Real-time agent discovery across cluster boundaries
Agent Topology           Visual topology map of agent-to-agent communication
Security Posture         Agent permission audit and RBAC drift detection
Kagenti Status           Platform-level health across all deployments

Multi-cluster becomes essential here: you may run specialized agents (document processors, code reviewers, data pipeline agents) on different clusters optimized for their workload type. KubeStellar’s workload distribution policies let you declare where agent types run; Kagenti cards let you observe them all from one place.
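
As a rough sketch of the placement half of that idea, a BindingPolicy for one agent type could follow the same shape as the prefill policy shown later in this post. The policy name and every label key and value here (workload-type: agents, app: doc-processor-agent) are illustrative assumptions, not labels that Kagenti or KubeStellar define; substitute whatever labels your clusters and agent workloads actually carry.

```yaml
# Hypothetical example: the name and all labels are placeholders, not Kagenti defaults.
apiVersion: control.kubestellar.io/v1alpha1
kind: BindingPolicy
metadata:
  name: doc-processor-agents-policy   # illustrative name
spec:
  clusterSelectors:
  - matchLabels:
      workload-type: agents           # assumed label on the agent cluster
  downsync:
  - objectSelectors:
    - matchLabels:
        app: doc-processor-agent      # assumed label on the agent's objects
```

One policy per agent type keeps placement decisions separate, so moving code reviewers to a different cluster would not touch the document processors' policy.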

Configuring Kagenti LLM Providers

Kagenti supports Gemini, Anthropic, and OpenAI as LLM backends. See the Kagenti LLM Provider Setup guide for step-by-step configuration.


LLM-d + Kagenti Together

The real power comes from combining both integrations. A common architecture:

                    ┌─────────────────────┐
                    │   KubeStellar WDS   │
                    │  (workload policy)  │
                    └──────────┬──────────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
  ┌───────▼───────┐    ┌───────▼───────┐    ┌───────▼───────┐
  │  GPU Cluster  │    │  CPU Cluster  │    │ Agent Cluster │
  │    (LLM-d     │    │    (LLM-d     │    │   (Kagenti    │
  │    prefill)   │    │    decode)    │    │    fleet)     │
  └───────┬───────┘    └───────┬───────┘    └───────┬───────┘
          │                    │                    │
          └────────────────────┼────────────────────┘
                               │
                    ┌──────────▼──────────┐
                    │ KubeStellar Console │
                    │   (unified view)    │
                    └─────────────────────┘

  • LLM-d handles inference disaggregation across GPU pools
  • Kagenti agents route tasks, call tools, and coordinate via MCP
  • KubeStellar distributes workloads, enforces policies, and surfaces metrics from every cluster

Why Multi-Cluster Matters for AI

Single-cluster AI deployments hit hard limits quickly:

  • GPU scarcity: The best GPUs for prefill differ from those optimal for decode
  • Cost: Burstable CPU clusters for agents don’t need GPU pricing
  • Compliance: Data residency rules may require regional cluster separation
  • Resilience: No single cluster failure should take down your inference serving

KubeStellar’s binding policies let you express placement rules declaratively:

apiVersion: control.kubestellar.io/v1alpha1
kind: BindingPolicy
metadata:
  name: llmd-prefill-policy
spec:
  clusterSelectors:
  - matchLabels:
      capability: gpu-a100
      region: us-east
  downsync:
  - objectSelectors:
    - matchLabels:
        app: llmd-prefill
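
The decode side of the architecture could get a companion policy of the same shape. This is a sketch only: the cluster label capability: cpu-decode and the workload label app: llmd-decode are assumed here by analogy with the prefill example, and are not names defined by LLM-d or KubeStellar.

```yaml
# Hypothetical companion to llmd-prefill-policy; labels are assumed, not LLM-d defaults.
apiVersion: control.kubestellar.io/v1alpha1
kind: BindingPolicy
metadata:
  name: llmd-decode-policy      # illustrative name
spec:
  clusterSelectors:
  - matchLabels:
      capability: cpu-decode    # assumed label on decode-pool clusters
      region: us-east           # same region as the prefill example
  downsync:
  - objectSelectors:
    - matchLabels:
        app: llmd-decode        # assumed label on the decode workload objects
```

Keeping prefill and decode in separate policies means each pool's cluster labels can change without affecting the other.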

Get Started

KubeStellar is open source and free to use.

Try it:

helm repo add kubestellar https://kubestellar.github.io/kubestellar
helm install kubestellar kubestellar/kubestellar-operator


What’s Next

This is the beginning of KubeStellar’s AI infrastructure story. Upcoming work includes:

  • LLM-d routing integration: KubeStellar BindingPolicies that respond to LLM-d EPP routing signals
  • Kagenti cross-cluster agent migration: Live agent handoff between clusters without state loss
  • AI workload cost visibility: GPU cost attribution per model, per team, per cluster

Interested in contributing or partnering? Open an issue in kubestellar/kubestellar or reach us on CNCF Slack.

