# ai-ops-chatbot > Deploy an AI-powered Kubernetes operations chatbot stack on AKS. This skill sets up LiteLLM (AI gateway), AKS-MCP (kubectl/helm/az tools), MCP Proxy (tool injection), and a simple chat UI. Use this skill when deploying AI Ops infrastructure for cluster triage, debugging, and self-service diagnostics. The stack enables SREs to ask natural language questions about their cluster and get answers based on real kubectl output. - Author: David Gardiner - Repository: davidmarkgardiner/prd-template - Version: 20260125143945 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-08 - Source: https://github.com/davidmarkgardiner/prd-template - Web: https://mule.run/skillshub/@@davidmarkgardiner/prd-template~ai-ops-chatbot:20260125143945 --- --- name: ai-ops-chatbot description: Deploy an AI-powered Kubernetes operations chatbot stack on AKS. This skill sets up LiteLLM (AI gateway), AKS-MCP (kubectl/helm/az tools), MCP Proxy (tool injection), and a simple chat UI. Use this skill when deploying AI Ops infrastructure for cluster triage, debugging, and self-service diagnostics. The stack enables SREs to ask natural language questions about their cluster and get answers based on real kubectl output. --- # AI Ops Chatbot Stack Deploy a complete AI operations stack that enables SREs to chat with their Kubernetes cluster using natural language. The LLM automatically executes kubectl, helm, and az commands to investigate and diagnose cluster issues. ## Architecture ``` ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ Chatbot UI │────▶│ MCP Proxy │────▶│ LiteLLM │────▶│ Azure OpenAI │ │ (browser) │ │ (injects tools)│ │ (gateway) │ │ (gpt-4o-mini) │ └─────────────────┘ └─────────────────┘ └────────┬────────┘ └─────────────────┘ │ ▼ ┌─────────────────┐ │ AKS-MCP │ │ (kubectl/helm/az)│ └─────────────────┘ ``` **Flow:** 1. User types question in Chatbot UI 2. MCP Proxy injects MCP tools + system prompt into request 3. LiteLLM forwards to Azure OpenAI 4. Azure OpenAI decides which kubectl/helm/az commands to run 5. LiteLLM calls AKS-MCP to execute commands on cluster 6. Azure OpenAI analyzes output and responds with insights ## Prerequisites - AKS Cluster with Istio service mesh enabled - Azure OpenAI deployment with gpt-4o-mini model - Workload Identity configured for Azure OpenAI authentication - cert-manager installed with ClusterIssuer `letsencrypt-prod` - DNS provider access to create A records ## Deployment ### Step 1: Get Gateway IP ```bash GATEWAY_IP=$(kubectl get svc -n aks-istio-ingress -l istio=aks-istio-ingressgateway-external -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}') echo $GATEWAY_IP ``` ### Step 2: Create DNS Records Create A records pointing to `$GATEWAY_IP`: - `litellm.` - `aks-mcp.` - `mcp-proxy.` - `chatbot-ui.` ### Step 3: Run Deployment Script ```bash ./scripts/deploy-stack.sh \ --domain example.com \ --azure-openai-endpoint https://your-aoai.openai.azure.com \ --litellm-key sk-your-api-key \ --gateway-ip $GATEWAY_IP ``` ### Step 4: Verify ```bash # Check pods kubectl get pods -n aks-mcp -n litellm -n mcp-proxy -n aks-chat # Check certificate kubectl get certificate -n aks-istio-ingress ai-stack-cert # Test chat curl https://chatbot-ui./ ``` ## Component Details ### AKS-MCP Server (`aks-mcp` namespace) MCP server providing kubectl, helm, and az CLI access. Uses Workload Identity for Azure authentication. ### LiteLLM (`litellm` namespace) AI gateway routing to Azure OpenAI. Handles MCP tool execution via `mcp_servers` config. ### MCP Proxy (`mcp-proxy` namespace) Lightweight Python proxy that injects MCP tools and system prompt into every chat request. This is what makes the LLM "aware" of cluster tools. ### Chatbot UI (`aks-chat` namespace) Simple HTML/JS chat interface with conversation history. Served by nginx with API proxy to MCP Proxy. ## Customization ### System Prompt Edit the MCP Proxy ConfigMap to customize how the LLM uses cluster tools: ```bash kubectl edit configmap mcp-proxy-server -n mcp-proxy ``` ### Adding MCP Servers Add more MCP servers to LiteLLM config: ```yaml mcp_servers: aks: url: "http://aks-mcp.aks-mcp.svc.cluster.local:8000/mcp" another: url: "http://another-mcp.namespace.svc.cluster.local:8000/mcp" ``` ## Troubleshooting ### Certificate Not Ready Ensure VirtualServices route `/.well-known/acme-challenge/` to ACME solver pods. ### MCP Tools Not Working Test connectivity: `kubectl exec -n litellm deploy/litellm -- curl http://aks-mcp.aks-mcp.svc.cluster.local:8000/health` ### Chat Errors Check logs: `kubectl logs -n mcp-proxy -l app=mcp-proxy` ## Resources - `scripts/deploy-stack.sh` - Main deployment script - `assets/manifests/` - All Kubernetes manifests - `references/architecture.md` - Detailed architecture documentation