# aks-mcp-expert > Deploy and configure the AKS Model Context Protocol (MCP) server to Azure Kubernetes Service clusters with Workload Identity, connect MCP to VS Code GitHub Copilot and Claude Code for AI-assisted Kubernetes operations, and implement event-driven self-healing automation. Use this skill when deploying AKS-MCP servers, configuring MCP clients, troubleshooting cluster issues with AI assistance, or building automated remediation workflows. - Author: David Gardiner - Repository: davidmarkgardiner/prd-template - Version: 20260125143945 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-08 - Source: https://github.com/davidmarkgardiner/prd-template - Web: https://mule.run/skillshub/@@davidmarkgardiner/prd-template~aks-mcp-expert:20260125143945 --- --- name: aks-mcp-expert description: Deploy and configure the AKS Model Context Protocol (MCP) server to Azure Kubernetes Service clusters with Workload Identity, connect MCP to VS Code GitHub Copilot and Claude Code for AI-assisted Kubernetes operations, and implement event-driven self-healing automation. Use this skill when deploying AKS-MCP servers, configuring MCP clients, troubleshooting cluster issues with AI assistance, or building automated remediation workflows. --- # AKS-MCP Expert ## Overview Deploy the AKS Model Context Protocol (MCP) server to enable AI-assisted Kubernetes operations. The AKS-MCP server acts as a bridge between AI assistants (Claude, GitHub Copilot, Cursor) and Azure Kubernetes Service clusters, translating natural language requests into AKS operations. ## When to Use This Skill Use this skill when: - Deploying AKS-MCP server to an AKS cluster with Workload Identity - Connecting MCP to VS Code GitHub Copilot or Claude Code - Configuring MCP clients for local terminal access - Troubleshooting Kubernetes clusters using AI-assisted diagnostics - Building event-driven self-healing automation with MCP tools - Setting up multi-cluster management with AI assistance ## Architecture ``` ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ AI Assistant │────▶│ AKS-MCP │────▶│ Azure/AKS │ │ (Claude, etc.) │◀────│ Server │◀────│ Resources │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ ▼ ┌─────────────────┐ │ Azure CLI │ │ kubectl │ │ helm/cilium │ └─────────────────┘ ``` **Key Point:** AKS-MCP server does NOT use any LLM itself. It is purely a tool provider - the intelligence comes from the MCP client (Claude, Copilot, etc.). ## Lessons Learned **Important deployment notes from production experience:** ### Version & Configuration 1. **Use v0.0.12+**: Earlier versions (v0.0.9) fail on workload identity auth. v0.0.12 handles auth errors gracefully. 2. **Grant Contributor role**: The managed identity needs **Contributor** role at resource group level, not just Reader/Cluster User roles. 3. **Tool API changed**: In v0.0.12, use `call_kubectl` with `args` parameter (not `command`). ### Health Probes 4. **Use TCP socket probes, not HTTP**: The MCP `/mcp` endpoint requires POST requests. HTTP GET probes fail. Use `tcpSocket` probes on port 8000 instead. ### Volume Mounts 5. **v0.0.12 runs as 'mcp' user**: Volume mounts must use `/home/mcp/.azure` and `/home/mcp/.kube`, NOT `/home/nonroot/`. This is critical for Azure CLI to work. ### Session Management 6. **Session-based API**: The MCP API requires session management. Initialize with `method: "initialize"` and use the `Mcp-Session-Id` header for subsequent calls. ### Istio & TLS 7. **Istio TLS requires SNI**: When exposing via Istio HTTPS, clients must send SNI (Server Name Indication). Direct IP access fails; use DNS or `--resolve` flag in curl. 8. **Gateway namespace for TLS**: Istio Gateway must be in the same namespace as the TLS secret (e.g., `aks-istio-ingress`) to access the certificate. 9. **Use external gateway for public HTTPS**: For Let's Encrypt certificates, use the **external** Istio gateway with a public IP. Point your domain's A record directly to this IP via your registrar. 10. **Avoid Azure DNS zone complexity**: Creating an Azure DNS zone doesn't make it authoritative - you must update NS records at your domain registrar. For simplicity, just create an A record at your registrar pointing to the gateway's external IP. ### cert-manager with Istio 11. **HTTP-01 challenge with Istio requires VirtualService**: cert-manager creates an Ingress for challenges, but Istio uses Gateway/VirtualService. Create a VirtualService to route `/.well-known/acme-challenge/` to the solver service. 12. **Don't use HTTPS redirect during cert issuance**: Gateway must allow HTTP traffic on port 80 for ACME HTTP-01 challenge to work. Add redirect later. 13. **CoreDNS custom zone for internal resolution**: cert-manager does a self-check from inside the cluster. If your DNS record is external, add a CoreDNS custom zone to resolve the hostname: ```yaml apiVersion: v1 kind: ConfigMap metadata: name: coredns-custom namespace: kube-system data: custom.server: | your-domain.com:53 { hosts { aks-mcp.your-domain.com fallthrough } } ``` ### Node Scheduling 14. **System node tolerations**: If using Karpenter or system node pools with taints, add tolerations for `CriticalAddonsOnly`. ## Core Workflows ### 1. Deploy AKS-MCP to AKS Cluster #### Prerequisites Verify the AKS cluster meets requirements: ```bash # Verify Workload Identity is enabled az aks show -g -n \ --query "oidcIssuerProfile.enabled" -o tsv # Get the OIDC issuer URL az aks show -g -n \ --query "oidcIssuerProfile.issuerUrl" -o tsv ``` #### Step 1: Create Managed Identity ```bash # Create User-Assigned Managed Identity for AKS-MCP az identity create \ --name uami-aks-mcp \ --resource-group \ --location # Get the Client ID export CLIENT_ID=$(az identity show \ --name uami-aks-mcp \ --resource-group \ --query clientId -o tsv) # Get the Principal ID for role assignments export PRINCIPAL_ID=$(az identity show \ --name uami-aks-mcp \ --resource-group \ --query principalId -o tsv) ``` #### Step 2: Assign Azure Permissions ```bash # Grant Contributor role at resource group level (required for full functionality) az role assignment create \ --assignee-object-id $PRINCIPAL_ID \ --assignee-principal-type ServicePrincipal \ --role "Contributor" \ --scope /subscriptions//resourceGroups/ # For readonly deployments, you can use more restrictive roles: # az role assignment create \ # --assignee-object-id $PRINCIPAL_ID \ # --role "Reader" \ # --scope /subscriptions/ ``` #### Step 3: Create Federated Credential ```bash # Get OIDC issuer export OIDC_ISSUER=$(az aks show -g -n \ --query "oidcIssuerProfile.issuerUrl" -o tsv) # Create federated credential az identity federated-credential create \ --name fc-aks-mcp \ --identity-name uami-aks-mcp \ --resource-group \ --issuer $OIDC_ISSUER \ --subject system:serviceaccount:aks-mcp:aks-mcp \ --audiences api://AzureADTokenExchange ``` #### Step 4: Deploy AKS-MCP Server Use the deployment script: ```bash ./scripts/deploy_aks_mcp.sh \ --client-id $CLIENT_ID \ --access-level readonly \ --domain aks-mcp.internal.example.com ``` Or apply the manifest directly after updating the CLIENT_ID: ```bash # Update the manifest with your Client ID sed -i "s/\${MANAGED_IDENTITY_CLIENT_ID}/$CLIENT_ID/g" assets/aks-mcp-deployment.yaml # Apply to cluster kubectl apply -f assets/aks-mcp-deployment.yaml ``` #### Step 5: Verify Deployment ```bash # Check pods are running kubectl get pods -n aks-mcp # Check service kubectl get svc -n aks-mcp # Test MCP endpoint (via port-forward) kubectl port-forward svc/aks-mcp 8000:8000 -n aks-mcp & curl -X POST http://localhost:8000/mcp \ -H "Content-Type: application/json" \ -d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}' ``` ### 2. Connect MCP to VS Code GitHub Copilot #### Option A: Port Forward (Development) ```bash # Start port forward kubectl port-forward svc/aks-mcp 8000:8000 -n aks-mcp ``` Create `.vscode/mcp.json` in your project: ```json { "servers": { "aks-mcp": { "type": "http", "url": "http://localhost:8000/mcp" } } } ``` #### Option B: Via Ingress (Production) If exposed via Istio or another ingress: ```json { "servers": { "aks-mcp": { "type": "http", "url": "https://aks-mcp.internal.yourcompany.com/mcp" } } } ``` #### Option C: Docker MCP Toolkit (Local Development) 1. Open Docker Desktop 2. Click MCP Toolkit in sidebar 3. Go to Catalog -> Search "aks" 4. Click "+" to enable Configure in Docker Desktop: | Setting | Value | |---------|-------| | `azure_dir` | `/Users//.azure` | | `kubeconfig` | `/Users//.kube/config` | | `access_level` | `readonly` | | `additional_tools` | `helm,cilium` | ### 3. Connect MCP to Claude Code Add to your `.mcp.json` or Claude Code configuration: ```json { "mcpServers": { "aks-mcp": { "type": "http", "url": "http://localhost:8000/mcp" } } } ``` Or for direct binary usage: ```json { "mcpServers": { "aks-mcp": { "type": "stdio", "command": "/path/to/aks-mcp", "args": ["--transport", "stdio", "--access-level", "readonly"] } } } ``` ### 4. Expose via Istio HTTPS (AKS Add-on) For production deployments, expose AKS-MCP via Istio with TLS termination. #### Prerequisites ```bash # Verify AKS Istio add-on is enabled kubectl get pods -n aks-istio-system # Check internal ingress gateway is deployed kubectl get svc -n aks-istio-ingress -l istio=aks-istio-ingressgateway-internal ``` #### Step 1: Enable Istio Sidecar Injection ```bash kubectl label namespace aks-mcp istio-injection=enabled --overwrite # Restart pods to inject sidecar kubectl rollout restart deployment/aks-mcp -n aks-mcp ``` #### Step 2: Create TLS Certificate ```bash # Generate self-signed certificate (for testing) openssl req -x509 -nodes -days 365 -newkey rsa:2048 \ -keyout /tmp/aks-mcp-tls.key \ -out /tmp/aks-mcp-tls.crt \ -subj "/CN=aks-mcp.internal/O=aks-mcp" \ -addext "subjectAltName=DNS:aks-mcp.internal,DNS:*.aks-mcp.internal,IP:10.0.0.29" # Create TLS secret in ingress namespace (required for Istio gateway) kubectl create secret tls aks-mcp-tls \ --cert=/tmp/aks-mcp-tls.crt \ --key=/tmp/aks-mcp-tls.key \ -n aks-istio-ingress ``` #### Step 3: Create Gateway and VirtualService ```yaml # Gateway must be in aks-istio-ingress namespace to access TLS secret apiVersion: networking.istio.io/v1 kind: Gateway metadata: name: aks-mcp-gateway namespace: aks-istio-ingress spec: selector: istio: aks-istio-ingressgateway-internal servers: - port: number: 443 name: https protocol: HTTPS tls: mode: SIMPLE credentialName: aks-mcp-tls hosts: - "aks-mcp.internal" - port: number: 80 name: http protocol: HTTP hosts: - "aks-mcp.internal" --- # VirtualService references gateway with namespace prefix apiVersion: networking.istio.io/v1 kind: VirtualService metadata: name: aks-mcp namespace: aks-mcp spec: hosts: - "aks-mcp.internal" gateways: - aks-istio-ingress/aks-mcp-gateway http: - match: - uri: prefix: /mcp route: - destination: host: aks-mcp.aks-mcp.svc.cluster.local port: number: 8000 timeout: 600s ``` #### Step 4: Test HTTPS Access **Important:** Istio requires SNI (Server Name Indication) for TLS routing. Use `--resolve` with curl: ```bash # Get internal gateway IP export GATEWAY_IP=$(kubectl get svc -n aks-istio-ingress \ aks-istio-ingressgateway-internal -o jsonpath='{.status.loadBalancer.ingress[0].ip}') # Test with SNI (required for Istio TLS) curl -sk --resolve "aks-mcp.internal:443:$GATEWAY_IP" \ -X POST https://aks-mcp.internal/mcp \ -H "Content-Type: application/json" \ -d '{"jsonrpc": "2.0", "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "test", "version": "1.0"}}, "id": 1}' # HTTP also works curl -s -X POST http://$GATEWAY_IP/mcp \ -H "Host: aks-mcp.internal" \ -H "Content-Type: application/json" \ -d '{"jsonrpc": "2.0", "method": "initialize", "params": {}, "id": 1}' ``` **Note:** Direct IP access without SNI will fail for HTTPS. SNI is set during TLS handshake, so the `--resolve` flag or DNS resolution is required. ### 5. Expose via External Gateway with Let's Encrypt (Recommended) This is the **simplest production HTTPS setup** - uses external Istio gateway with public IP, A record at your domain registrar (e.g., GoDaddy), and HTTP-01 challenge for Let's Encrypt. #### Prerequisites ```bash # AKS cluster with: # - Istio AKS add-on enabled (external gateway) # - Workload Identity enabled # - A domain you own (e.g., davidmarkgardiner.co.uk via GoDaddy) # Verify external gateway exists kubectl get svc -n aks-istio-ingress -l istio=aks-istio-ingressgateway-external ``` #### Step 1: Get External Gateway IP ```bash # Get the external gateway's public IP export GATEWAY_IP=$(kubectl get svc -n aks-istio-ingress \ aks-istio-ingressgateway-external -o jsonpath='{.status.loadBalancer.ingress[0].ip}') echo "External Gateway IP: $GATEWAY_IP" ``` #### Step 2: Create A Record at Domain Registrar Go to your domain registrar (GoDaddy, Netlify, Name.com, etc.) and create an A record: | Type | Name | Value | TTL | |------|------|-------|-----| | A | aks-mcp | `` | 600 | This creates `aks-mcp.yourdomain.com` pointing to your gateway. **Verify DNS propagation:** ```bash dig aks-mcp.yourdomain.com +short # Should return your gateway IP ``` #### Step 3: Install cert-manager with Workload Identity ```bash # Create managed identity for cert-manager az identity create --name uami-cert-manager --resource-group --location export CERT_MANAGER_CLIENT_ID=$(az identity show --name uami-cert-manager \ --resource-group --query clientId -o tsv) # Create federated credential az identity federated-credential create \ --name fc-cert-manager \ --identity-name uami-cert-manager \ --resource-group \ --issuer $OIDC_ISSUER \ --subject system:serviceaccount:cert-manager:cert-manager \ --audiences api://AzureADTokenExchange # Install cert-manager helm repo add jetstack https://charts.jetstack.io helm repo update cat < /tmp/cert-manager-values.yaml crds: enabled: true serviceAccount: annotations: azure.workload.identity/client-id: "$CERT_MANAGER_CLIENT_ID" podLabels: azure.workload.identity/use: "true" tolerations: - key: CriticalAddonsOnly operator: Exists effect: NoSchedule EOF helm upgrade --install cert-manager jetstack/cert-manager \ --namespace cert-manager --create-namespace \ -f /tmp/cert-manager-values.yaml ``` #### Step 4: Create ClusterIssuer with HTTP-01 Challenge HTTP-01 is simpler than DNS-01 - no Azure DNS zone setup required: ```yaml apiVersion: cert-manager.io/v1 kind: ClusterIssuer metadata: name: letsencrypt-prod spec: acme: server: https://acme-v02.api.letsencrypt.org/directory email: your-email@example.com privateKeySecretRef: name: letsencrypt-prod-account-key solvers: - http01: ingress: class: istio ``` #### Step 5: Create Gateway and Certificate ```yaml # Gateway with HTTPS (cert-manager will manage the certificate) apiVersion: networking.istio.io/v1 kind: Gateway metadata: name: aks-mcp-gateway namespace: aks-istio-ingress spec: selector: istio: aks-istio-ingressgateway-external servers: - port: number: 443 name: https protocol: HTTPS tls: mode: SIMPLE credentialName: aks-mcp-tls # cert-manager creates this hosts: - "aks-mcp.yourdomain.com" - port: number: 80 name: http protocol: HTTP hosts: - "aks-mcp.yourdomain.com" --- # Certificate request - cert-manager handles the HTTP-01 challenge apiVersion: cert-manager.io/v1 kind: Certificate metadata: name: aks-mcp-cert namespace: aks-istio-ingress spec: secretName: aks-mcp-tls issuerRef: name: letsencrypt-prod kind: ClusterIssuer dnsNames: - "aks-mcp.yourdomain.com" --- # VirtualService apiVersion: networking.istio.io/v1 kind: VirtualService metadata: name: aks-mcp namespace: aks-mcp spec: hosts: - "aks-mcp.yourdomain.com" gateways: - aks-istio-ingress/aks-mcp-gateway http: - match: - uri: prefix: /mcp route: - destination: host: aks-mcp.aks-mcp.svc.cluster.local port: number: 8000 timeout: 600s ``` #### Step 6: Verify Certificate and Test ```bash # Check certificate status kubectl get certificate -n aks-istio-ingress aks-mcp-cert # Wait for READY=True kubectl wait --for=condition=Ready certificate/aks-mcp-cert \ -n aks-istio-ingress --timeout=300s # Test HTTPS access (no --insecure flag needed with real cert) curl -X POST https://aks-mcp.yourdomain.com/mcp \ -H "Content-Type: application/json" \ -d '{"jsonrpc": "2.0", "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "test", "version": "1.0"}}, "id": 1}' ``` **Why This Works:** - External gateway has a public IP accessible from the internet - Let's Encrypt HTTP-01 challenge can reach the gateway directly - No Azure DNS zone delegation or NS record changes needed - Simple A record at your registrar is all that's required ### 6. Local Terminal MCP Access #### Install the Binary ```bash # Download binary (macOS ARM64) curl -sL https://github.com/Azure/aks-mcp/releases/latest/download/aks-mcp-darwin-arm64 -o ~/bin/aks-mcp chmod +x ~/bin/aks-mcp # Verify installation ~/bin/aks-mcp --help ``` #### Run in HTTP Mode ```bash # Start local MCP server ~/bin/aks-mcp --transport streamable-http --port 8000 --access-level readonly ``` Then configure your MCP client to connect to `http://localhost:8000/mcp`. ## Available MCP Tools ### Cluster Management (`az_aks_operations`) | Operation | Access Level | Description | |-----------|--------------|-------------| | `show` | readonly | Show cluster details | | `list` | readonly | List clusters in subscription | | `get-versions` | readonly | Get available K8s versions | | `nodepool-list` | readonly | List node pools | | `create` | readwrite | Create cluster | | `scale` | readwrite | Scale node count | | `upgrade` | readwrite | Upgrade K8s version | ### Kubernetes Tools (`kubectl_*`) | Tool | Operations | |------|------------| | `kubectl_resources` | get, describe, create, delete, apply, patch | | `kubectl_diagnostics` | logs, events, top, exec | | `kubectl_cluster` | cluster-info, api-resources, explain | ### Monitoring (`az_monitoring`) | Operation | Description | |-----------|-------------| | `metrics` | Query Azure Monitor metrics | | `resource_health` | Get resource health events | | `control_plane_logs` | Query control plane logs | | `diagnostics` | Check diagnostic settings | ### Real-time Observability (`inspektor_gadget_observability`) | Gadget | Description | |--------|-------------| | `observe_dns` | Monitor DNS requests | | `observe_tcp` | Monitor TCP connections | | `observe_file_open` | Monitor file operations | | `top_tcp` | Top TCP connections | | `tcpdump` | Packet capture | ## Access Levels | Level | Capabilities | |-------|--------------| | `readonly` | View cluster info, logs, events, metrics - no changes | | `readwrite` | Above + scale, restart, apply, cordon, drain | | `admin` | Above + get cluster credentials | ## Self-Healing Automation ### Event-Driven Architecture To build self-healing with MCP, combine Argo Events + AKS-MCP: ```yaml # EventSource: Watch for specific events apiVersion: argoproj.io/v1alpha1 kind: EventSource metadata: name: k8s-events spec: resource: k8s-events: namespace: default group: "" version: v1 resource: events filter: afterEventTime: "now" fields: - key: reason operation: "==" value: "CrashLoopBackOff" --- # Sensor: Trigger remediation workflow apiVersion: argoproj.io/v1alpha1 kind: Sensor metadata: name: crashloop-remediation spec: dependencies: - name: crashloop eventSourceName: k8s-events eventName: k8s-events triggers: - template: name: remediate argoWorkflow: operation: submit source: resource: apiVersion: argoproj.io/v1alpha1 kind: Workflow spec: entrypoint: diagnose-and-remediate templates: - name: diagnose-and-remediate script: image: curlimages/curl:latest command: [sh] source: | # Call MCP to diagnose curl -X POST http://aks-mcp.aks-mcp:8000/mcp \ -H "Content-Type: application/json" \ -d '{ "jsonrpc": "2.0", "method": "tools/call", "params": { "name": "kubectl_diagnostics", "arguments": { "operation": "logs", "resource": "pod/{{inputs.parameters.pod}}", "namespace": "{{inputs.parameters.namespace}}" } }, "id": 1 }' ``` ### Common Self-Healing Patterns 1. **CrashLoopBackOff** - Collect logs, check resource limits, restart if OOM 2. **ImagePullBackOff** - Verify image exists, check registry auth 3. **PodPending** - Check node resources, PVC binding, affinity rules 4. **NodeNotReady** - Check kubelet, network, disk pressure **Reference:** `references/self-healing-patterns.md` for detailed remediation workflows. ## Example Prompts ### Cluster Operations ``` List all my AKS clusters Show me the health of cluster What Kubernetes versions are available for upgrade? ``` ### Troubleshooting ``` Why is my node in NotReady state? Check the control plane logs for the last hour Show me events in namespace Get logs from pod ``` ### Network Analysis ``` Show me the VNet and NSG configuration Is DNS traffic being blocked? What are the route tables for this cluster? ``` ## Troubleshooting ### MCP Connection Failed ```bash # Verify pod is running kubectl get pods -n aks-mcp # Check logs kubectl logs -n aks-mcp -l app=aks-mcp # Verify Workload Identity kubectl describe pod -n aks-mcp -l app=aks-mcp | grep -A5 "azure.workload.identity" ``` ### Azure Auth Issues ```bash # Verify federated credential az identity federated-credential list \ --identity-name uami-aks-mcp \ --resource-group # Check role assignments az role assignment list \ --assignee \ --output table ``` ### Kubectl Operations Failing ```bash # Verify RBAC kubectl auth can-i get pods --as=system:serviceaccount:aks-mcp:aks-mcp # Check ClusterRoleBinding kubectl describe clusterrolebinding aks-mcp ``` ## Resources ### scripts/ - `deploy_aks_mcp.sh` - Deploy AKS-MCP server to cluster - `setup_workload_identity.sh` - Configure Workload Identity and federated credentials - `test_mcp_connection.sh` - Test MCP endpoint connectivity ### references/ - `self-healing-patterns.md` - Event-driven remediation patterns - `mcp-tools-reference.md` - Complete AKS-MCP tool documentation ### assets/ - `aks-mcp-deployment.yaml` - Production deployment manifest - `vscode-mcp-config.json` - VS Code MCP configuration template