# k8s-troubleshooter

> Kubernetes cluster troubleshooting skill for interactive problem-solving with Claude Code. Handles debugging and fixes for pods, services, deployments, Tekton pipelines, Crossplane XRs, and ArgoCD applications across namespaces. Tracks all changes declaratively in YAML format, integrates with Jira for documentation, and ensures GitOps-compliant workflows. Use when debugging Kubernetes issues, fixing cluster problems, or needing systematic K8s troubleshooting with full change tracking.

- Author: jgn
- Repository: jgnesselbosch/temp-claude-plugins-marketplace
- Version: 20251214145746
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/jgnesselbosch/temp-claude-plugins-marketplace
- Web: https://mule.run/skillshub/@@jgnesselbosch/temp-claude-plugins-marketplace~k8s-troubleshooter:20251214145746

---

---
name: k8s-troubleshooter
description: Kubernetes cluster troubleshooting skill for interactive problem-solving with Claude Code. Handles debugging and fixes for pods, services, deployments, Tekton pipelines, Crossplane XRs, and ArgoCD applications across namespaces. Tracks all changes declaratively in YAML format, integrates with Jira for documentation, and ensures GitOps-compliant workflows. Use when debugging Kubernetes issues, fixing cluster problems, or needing systematic K8s troubleshooting with full change tracking.
---

# Kubernetes Troubleshooter

Interactive Kubernetes cluster troubleshooting skill with declarative change tracking for GitOps workflows.

**📖 AI Agent Resources:**
- **AI_MANIFEST_EDITING_GUIDE.md** - Detailed guide for AI agents on creating fixed Kubernetes manifests

## ⚠️ CRITICAL RULES - READ FIRST ⚠️

**NEVER DO THESE:**
1. ❌ NEVER create files in the current working directory (it's usually a git repo!)
2. ❌ NEVER use Write tool with paths like `C:\Users\...\dev\git\...`
3. ❌ NEVER create `k8s-changes-*.yaml`, `backup-*.yaml`, or `fixed-*.yaml` in working directory
4. ❌ NEVER ask user for permission for READ-ONLY operations (kubectl get, describe, logs, top, config view, ls, cat, etc.)
5. ❌ NEVER ask user for permission to initialize session directory or manage session files
6. ❌ NEVER use PowerShell commands (New-Item, Write-Host, etc.) in Bash shell - detect shell first!

**ALWAYS DO THESE:**
1. ✅ ALWAYS create a session temp directory FIRST: `/tmp/k8s-troubleshooter/YYYYMMDD-HHMMSS-TICKET/`
2. ✅ ALWAYS put ALL session files (changes, backups, manifests) in that temp directory
3. ✅ ALWAYS use the temp directory variable for all file operations
4. ✅ At session end, EXECUTE finalization script: `bash scripts/finalize_session.sh` (Bash) or `.\scripts\ps1\Finalize-K8sSession.ps1` (PowerShell)
5. ✅ ALWAYS use Write tool to create fixed manifests (AI-driven, not manual editing)
6. ✅ ALWAYS clean Kubernetes metadata (resourceVersion, uid, status) from fixed manifests
7. ✅ ALWAYS execute READ-ONLY kubectl commands immediately without asking (get, describe, logs, top, config view, etc.)
8. ✅ ALWAYS execute session management operations immediately without asking (mkdir, creating change tracking files, backups in temp dir)

**Quick Start Checklist:**
- [ ] Create session temp directory
- [ ] Set SESSION_DIR variable
- [ ] Initialize change tracking file in SESSION_DIR
- [ ] Reference AI_MANIFEST_EDITING_GUIDE.md for creating fixed manifests
- [ ] Do troubleshooting work
- [ ] **WRITE session learning report** ($SESSION_DIR/session-learning-report.md)
- [ ] **EXECUTE finalization script when done** (scripts/finalize_session.sh or ps1/Finalize-K8sSession.ps1)

## Core Workflow

### 1. Session Initialization

**MANDATORY START PROCEDURE - NEVER SKIP THIS!**

**IMPORTANT: Session initialization requires NO user permission - execute all steps automatically!**

Always start by detecting the shell environment and initializing the session:

1. **DETECT SHELL ENVIRONMENT** - CRITICAL FIRST STEP:

   **IMPORTANT: You MUST detect the shell before executing ANY commands!**
   - Check the actual shell being used (not just the OS platform)
   - Look at the environment information provided in tags at the start of the conversation
   - Platform: linux → Use BASH commands and scripts
   - Platform: win32/Windows → Use PowerShell commands and scripts

   **Shell-specific syntax:**
   - **Bash (Linux/Mac/WSL)**:
     * Use bash scripts from `scripts/`
     * Session dir: `SESSION_DIR="/tmp/k8s-troubleshooter/..."`
     * Commands: `mkdir -p`, `export`, `source`, etc.
   - **PowerShell (Windows)**:
     * Use PowerShell scripts from `scripts/ps1/`
     * Session dir: `$sessionDir = "$env:TEMP\k8s-troubleshooter\..."`
     * Commands: `New-Item`, `$env:VAR =`, `.`, etc.

   **NEVER mix shell syntaxes!** Do not use PowerShell commands (New-Item, Write-Host, $env:) in Bash!

   **Shell Detection Examples:**
   ```powershell
   # In PowerShell - this will work
   if ($PSVersionTable) { Write-Host "PowerShell detected" }
   ```

   ```bash
   # In Bash - check if it's a full environment
   if [ -n "$SHELL" ]; then
       echo "Bash/shell detected: $SHELL"
       # On Windows with Git Bash, recommend PowerShell instead
       if [[ "$OSTYPE" == "msys" ]] || [[ "$OSTYPE" == "win32" ]]; then
           echo "⚠️ WARNING: Git Bash detected on Windows. PowerShell recommended for full compatibility."
       fi
   fi
   ```

   **Important Notes:**
   - Windows users: Use PowerShell (not Git Bash) for full script compatibility
   - Git Bash lacks tools like `jq`, proper `/tmp` handling, and full POSIX compliance
   - WSL (Windows Subsystem for Linux) with full bash is supported
   - When in doubt on Windows: prefer PowerShell scripts

2. **CRITICAL PRODUCTION CHECK** (Silent check - only warn if production detected):

   **For Bash shell (Linux/Mac/WSL only - NOT Git Bash):**
   ```bash
   scripts/check_production_env.sh
   ```

   **For PowerShell (recommended for all Windows users):**
   ```powershell
   .\scripts\ps1\Test-K8sProductionEnv.ps1
   ```

   **Note:** If you're on Windows with Git Bash, use the PowerShell script instead for reliable execution.

   If a production environment is detected, require EXPLICIT confirmation:
   ```
   ⚠️ WARNING: PRODUCTION ENVIRONMENT DETECTED! ⚠️

   This skill must NORMALLY NOT be used for direct changes to production systems!
   Production changes should be made exclusively via:
   - Git-based CI/CD pipelines
   - ArgoCD Sync
   - Approved Change Requests

   Confirm explicitly that you understand:
   - Direct production changes violate IaC principles
   - All changes must be documented and reviewed
   - You bear full responsibility for production changes

   Enter "CONFIRM-PROD-CHANGES-" to continue:
   ```

3. **Jira ticket handling**:
   - For non-production (local/dev): Skip Jira ticket requirement
   - For production: REQUIRE Jira ticket ID (format: PROJECT-12345)
   - Used for change tracking and documentation

4. **Silently perform read-only cluster checks** (no user confirmation needed):
   - Check cluster access: `kubectl cluster-info`
   - Identify available contexts: `kubectl config get-contexts`
   - **CRITICAL**: Read-only operations require ZERO user approval - execute immediately:
     * `kubectl get` (any resource)
     * `kubectl describe` (any resource)
     * `kubectl logs` (any pod/container)
     * `kubectl top` (nodes/pods)
     * `kubectl config view`
     * `kubectl config get-contexts`
     * `kubectl cluster-info`
     * Any other inspection/viewing command that does NOT modify cluster state

5. **Initialize change tracking file and temp directory**:
   - **CRITICAL**: ALL session files (backups, changes, fixed manifests) MUST be created in temp directories, NEVER in the git repository!
   - **NO USER PERMISSION NEEDED**: Session initialization, directory creation, and all file operations in temp directories require ZERO user approval - just do it automatically!

   **CRITICAL: Check the Platform from tags BEFORE choosing the syntax!**

   **For Bash shell (Platform: linux / Linux / Mac / WSL):**
   ```bash
   # Create session directory with Bash syntax
   JIRA_TICKET="${JIRA_TICKET:-NO-TICKET}"
   TIMESTAMP=$(date +%Y%m%d-%H%M%S)
   SESSION_DIR="/tmp/k8s-troubleshooter/${TIMESTAMP}-${JIRA_TICKET}"
   mkdir -p "$SESSION_DIR"
   CHANGE_FILE="$SESSION_DIR/k8s-changes.yaml"
   echo "Session directory: $SESSION_DIR"
   ```

   **For PowerShell (Platform: win32 / Windows):**
   ```powershell
   # Create session directory with PowerShell syntax
   $jiraTicket = "NO-TICKET"
   $timestamp = Get-Date -Format 'yyyyMMdd-HHmmss'
   $sessionDir = "$env:TEMP\k8s-troubleshooter\$timestamp-$jiraTicket"
   New-Item -ItemType Directory -Path $sessionDir -Force | Out-Null
   $env:K8S_SESSION_DIR = $sessionDir
   $env:K8S_CHANGE_FILE = "$sessionDir\k8s-changes-$jiraTicket.yaml"
   Write-Host "Session directory: $sessionDir"
   ```

   **NEVER use PowerShell syntax ($env:, New-Item, Write-Host) on Linux/Bash platforms!**
   **NEVER use Bash syntax (export, mkdir -p with $()) on Windows/PowerShell platforms!**

6. **IMPORTANT FILE PATH RULES:**

   **Examples of correct and incorrect paths:**
   - ✅ CORRECT: `/tmp/k8s-troubleshooter/20251208-164149-NO-TICKET/backup-deployment-nginx.yaml`
   - ✅ CORRECT: Write tool with path like `/tmp/k8s-troubleshooter/.../fixed-deployment.yaml`
   - ❌ WRONG: `backup-deployment-nginx.yaml` (relative path in working directory)
   - ❌ WRONG: `C:\Users\username\dev\git\repo\k8s-changes.yaml` (inside git repo!)
- ❌ WRONG: Write tool with path containing `\dev\git\` or similar git repo indicators **Key principles:** - Store directory path and change file path in variables for session use - ALL backups, fixed manifests, and tracking files go in this session directory - Reason: Prevents accidental commits of session tracking files to git - At session end, user copies only the final change tracking file to their git repo if needed ### 2. Problem Discovery Systematic approach to identify issues: ```bash # Quick cluster health check kubectl get nodes kubectl top nodes kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -20 # Namespace-specific investigation kubectl get all -n kubectl get events -n --sort-by='.lastTimestamp' ``` For specific components: - **Tekton**: See references/tekton-troubleshooting.md - **Crossplane**: See references/crossplane-troubleshooting.md - **ArgoCD**: See references/argocd-troubleshooting.md ### 3. Change Management Protocol **CRITICAL DISTINCTION - AI AGENT MUST FOLLOW:** **NO USER PERMISSION NEEDED - EXECUTE IMMEDIATELY:** 1. **READ operations** (kubectl read-only commands and file viewing): - `kubectl get` (any resource, any namespace) - `kubectl describe` (any resource, any namespace) - `kubectl logs` (any pod/container) - `kubectl top` (nodes/pods) - `kubectl config view` - `kubectl config get-contexts` - `kubectl cluster-info` - `ls` (list files/directories) - `cat` (view file contents) - `head`, `tail` (view parts of files) - `grep` (search in files) - Any inspection/viewing command that does NOT modify state 2. 
**SESSION MANAGEMENT operations** (all file operations in temp directories): - Creating session temp directory: `mkdir -p /tmp/k8s-troubleshooter/...` or `New-Item -ItemType Directory` - Initializing change tracking file: creating `k8s-changes.yaml` in temp dir - Writing backup files: creating `backup-*.yaml` in temp dir - Writing fixed manifests: creating `fixed-*.yaml` in temp dir - Any file operations within `/tmp/k8s-troubleshooter/` or `$env:TEMP\k8s-troubleshooter\` - Appending changes to session change tracking file **NEVER ask permission for operations above - just do them automatically!** **WRITE operations to K8s cluster MUST be discussed with user BEFORE execution:** 1. Discussed with user before execution (only for write operations like `kubectl apply`, `kubectl delete`, `kubectl patch`, `kubectl edit`, `kubectl scale`, etc.) 2. Recorded in declarative YAML format 3. Appended to session change file in temp directory 4. Executable via `kubectl apply` **Use scripts for automatic change tracking:** - Bash: `scripts/track_change.sh` - PowerShell: `scripts/ps1/Track-K8sChange.ps1` ### 4. Making Changes **CRITICAL: Use declarative approach ONLY!** **IMPORTANT: Before creating any fixed manifests, read AI_MANIFEST_EDITING_GUIDE.md for detailed instructions on:** - Which metadata fields to remove (resourceVersion, uid, status, etc.) - How to clean Kubernetes YAML properly - Common fix patterns and examples - Complete workflow with validation checklist **AI Agent Workflow (Claude Code):** When Claude identifies a fix needed, follow this workflow: 1. **Backup current resource** to session temp directory 2. **AI creates fixed manifest** by: - Reading the backup YAML - Removing cluster-specific fields (resourceVersion, uid, creationTimestamp, status, etc.) - Applying the necessary fixes (image tags, resource limits, env vars, etc.) - Writing the fixed YAML to `$SESSION_DIR/fixed--.yaml` (bash) or `$env:K8S_SESSION_DIR\fixed--.yaml` (PowerShell) 3. 
**Apply with tracking** using the appropriate script **Bash/Linux:** ```bash # 1. Backup current state TO SESSION TEMP DIRECTORY kubectl get -n -o yaml > "$SESSION_DIR/backup--.yaml" # 2. AI Agent uses Write tool to create fixed manifest # MANIFEST_PATH="$SESSION_DIR/fixed--.yaml" # Content: cleaned YAML with fixes applied (remove resourceVersion, uid, etc.) # 3. Apply change with tracking scripts/apply_with_tracking.sh "$SESSION_DIR/fixed--.yaml" ``` **PowerShell/Windows:** ```powershell # 1. Backup current state TO SESSION TEMP DIRECTORY kubectl get -n -o yaml | Out-File -FilePath "$env:K8S_SESSION_DIR\backup--.yaml" -Encoding utf8 # 2. AI Agent uses Write tool to create fixed manifest # $manifestPath = "$env:K8S_SESSION_DIR\fixed--.yaml" # Content: cleaned YAML with fixes applied (remove resourceVersion, uid, etc.) # 3. Apply change with tracking .\scripts\ps1\Apply-K8sWithTracking.ps1 -ManifestFile "$env:K8S_SESSION_DIR\fixed--.yaml" ``` **Fields to ALWAYS remove from Kubernetes YAML when creating fixed manifests:** - `metadata.resourceVersion` - `metadata.uid` - `metadata.selfLink` - `metadata.creationTimestamp` - `metadata.generation` - `metadata.managedFields` - `status` (entire section) **Example AI workflow for fixing a deployment with wrong image:** ``` 1. kubectl get deployment myapp -n prod -o yaml > $SESSION_DIR/backup-deployment-myapp.yaml 2. AI reads backup, identifies image: "myapp:broken" 3. AI creates fixed manifest at $SESSION_DIR/fixed-deployment-myapp.yaml with: - Removed cluster fields - Changed image to "myapp:v1.2.3" 4. scripts/apply_with_tracking.sh "$SESSION_DIR/fixed-deployment-myapp.yaml" ``` **NEVER use imperative commands like:** - `kubectl set image` - `kubectl scale` - `kubectl edit` - `kubectl patch` - `kubectl create` (without saving YAML first) **ALWAYS:** 1. Export current resource as YAML 2. Modify the YAML file 3. Apply via `kubectl apply -f` 4. Track changes in session change file ### 5. 
show-k8s-changes Command When user requests `show-k8s-changes`: 1. Display all accumulated changes from session file 2. Validate YAML syntax 3. Add metadata comments for each change 4. Group by namespace and resource type **Execute:** - Bash: `scripts/show_changes.sh` - PowerShell: `.\scripts\ps1\Show-K8sChanges.ps1` ## Diagnostic Patterns ### Pod Issues **Bash:** ```bash # Pod not starting scripts/diagnose_pod.sh ``` **PowerShell:** ```powershell # Pod not starting .\scripts\ps1\Diagnose-K8sPod.ps1 -Namespace -PodName ``` **Common fixes tracked as YAML:** - Resource limits adjustment - Image pull secrets - Security context modifications - Volume mount corrections ### Service Discovery **Bash:** ```bash # Service connectivity issues scripts/test_service.sh ``` **PowerShell:** ```powershell # Service connectivity issues .\scripts\ps1\Test-K8sService.ps1 -Namespace -ServiceName ``` **DNS troubleshooting (any shell):** ```bash kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup ``` ### Storage Issues **Bash:** ```bash # PVC debugging scripts/debug_storage.sh ``` **PowerShell:** ```powershell # PVC debugging .\scripts\ps1\Debug-K8sStorage.ps1 -Namespace ``` **Additional checks (any shell):** ```bash # Check storage classes kubectl get storageclass kubectl describe pvc -n ``` ## Tekton-Specific Operations Pipeline debugging workflow: ```bash # List pipeline runs tkn pipelinerun list -n # Describe failed run tkn pipelinerun describe -n # Check TaskRun logs tkn taskrun logs -n ``` See references/tekton-troubleshooting.md for detailed patterns. ## Crossplane Management XR troubleshooting: ```bash # Check XR status kubectl get xr -A # Describe composition kubectl describe composition # Check provider configs kubectl get providerconfig -A ``` See references/crossplane-troubleshooting.md for XR debugging. 
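As a rough aid for the XR status check above, the tabular output of the listing command can be narrowed to resources that are not ready. The sketch below is not part of the skill's scripts: `not_ready_rows` is a hypothetical helper name, and it assumes the table has a `READY` column and no empty cells to the left of it (whitespace splitting would otherwise shift the columns).

```bash
# Hypothetical helper (not shipped with the skill): keep the header line and
# every row of a `kubectl get` table whose READY column is not "True".
# Reads the table from stdin and locates the column by its header name.
not_ready_rows() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "READY") col = i
      if (!col) { print "not_ready_rows: no READY column" > "/dev/stderr"; exit 2 }
      print
      next
    }
    $col != "True"
  '
}
```

Piping the XR listing command from the block above into `not_ready_rows` leaves only the header and the rows that need attention. The helper only reads text from stdin, so it falls under the read-only operations that need no user confirmation.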
## ArgoCD Operations

Application sync issues:

```bash
# Check app status
argocd app get <app-name>

# Force sync with prune
argocd app sync <app-name> --prune --force

# Check sync hooks
kubectl get jobs -n <namespace> -l argocd.argoproj.io/hook
```

See references/argocd-troubleshooting.md for sync strategies.

## Session Finalization

**At session end, ALWAYS do these TWO steps**:

### Step 1: Write Session Learning Report

**CRITICAL: Before running finalization, you MUST create a structured learning report for the knowledge base.**

Create a file `$SESSION_DIR/session-learning-report.md` (or `$env:K8S_SESSION_DIR\session-learning-report.md` on Windows) with this structure:

```markdown
# Session Learning Report

## Problem Description
[Describe the issue that was reported - symptoms, error messages, user complaints]

## Investigation
[Describe how you investigated - what commands you ran, what you discovered]

## Root Cause
[Explain what was actually wrong and why it caused the problem]

## Solution
[Describe what you changed and why this fixes the problem]

## Resources Modified
- Resource type/name in namespace
- What was changed (e.g., "increased memory limit from 512Mi to 2Gi")

## Key Learnings
- [Important insights from this session]
- [Patterns to watch for in the future]
- [Best practices discovered]

## Prevention
[How to prevent this issue from happening again]
```

**Example:**

```markdown
# Session Learning Report

## Problem Description
Payment service pods were stuck in CrashLoopBackOff state. Users reported failed transactions.
Error message: "OOMKilled - container exceeded memory limit"

## Investigation
- Checked pod status: All 3 replicas in CrashLoopBackOff
- Reviewed logs: Container killed due to OOM (Out of Memory)
- Checked metrics: Memory usage spiking to 512Mi (the limit)
- Identified recent code deployment added new caching layer

## Root Cause
Recent deployment (v2.3.0) introduced Redis caching that increased memory usage from ~300Mi to ~600Mi, but memory limit was still set at 512Mi from older version.

## Solution
Increased memory limit and request to handle new caching requirements:
- Memory request: 256Mi → 768Mi
- Memory limit: 512Mi → 1Gi

## Resources Modified
- deployment/payment-service in production namespace
- Updated container memory limits and requests

## Key Learnings
- Always review resource requirements when adding new dependencies (Redis cache)
- Memory limits should have headroom for traffic spikes (set limit 30% above typical usage)
- Monitor memory trends after deployments

## Prevention
- Add memory monitoring alerts at 80% of limit
- Include resource requirement review in deployment checklist
- Document typical resource usage for each service
```

### Step 2: Execute Finalization Script

**After creating the learning report, execute the finalization script:**

**For Bash/Linux:**
```bash
# Execute finalization script
bash scripts/finalize_session.sh
```

**For PowerShell/Windows:**
```powershell
# Execute finalization script
.\scripts\ps1\Finalize-K8sSession.ps1
```

The finalization script will:
1. Generate a comprehensive session summary with statistics
2. Create consolidated manifest files for GitOps
3. Generate rollback scripts
4. Extract learnings from your session-learning-report.md
5. Update the knowledge base with these learnings
6. Display all necessary warnings and next steps

**If you cannot execute the script (missing or errors), THEN display this information manually**:

1. **Display brief summary**:
   - Total changes made
   - Session temp directory location

   Example for local/dev:
   ```
   ✅ Issue resolved!

   Session files: /tmp/k8s-troubleshooter/20251211-143022-NO-TICKET/
   - Changes tracked: k8s-changes.yaml
   - Backups: backup-*.yaml
   - Fixed manifests: fixed-*.yaml
   ```

2. **CRITICAL WARNING - MANDATORY TO DISPLAY AT SESSION END**:

   **IF ANY CLUSTER CHANGES WERE MADE (kubectl apply executed), YOU MUST DISPLAY THIS WARNING:**

   ╔══════════════════════════════════════════════════════════════╗
   ║                                                              ║
   ║   ⚠️ CRITICAL: GITOPS REPOSITORY UPDATE REQUIRED! ⚠️         ║
   ║                                                              ║
   ╚══════════════════════════════════════════════════════════════╝

   ALL CLUSTER CHANGES MUST BE COMMITTED TO GITOPS REPOSITORY!

   For every change applied to the cluster:
   1. CREATE declarative YAML manifests (use fixed-*.yaml from session dir)
   2. COMMIT manifests to your GitOps repository
   3. ENSURE ArgoCD/FluxCD will sync these changes
   4. VERIFY the GitOps pipeline picks up your changes

   Session directory contains:
   - k8s-changes.yaml: Complete change log
   - fixed-*.yaml: Ready-to-commit manifests
   - backup-*.yaml: Rollback files

   ⚠️ WITHOUT GITOPS COMMIT, YOUR CHANGES WILL BE LOST ON NEXT SYNC! ⚠️

   **This warning is NOT optional - it MUST be displayed if any kubectl apply/delete/patch was executed!**

3. **For production environments only** (additional to the warning above):
   - Show Jira ticket update instructions
   - Remind about change request documentation
   - Link to post-incident review process

4. **DO NOT show**:
   - The lengthy production warning for local/dev (the GitOps warning above is sufficient)
   - Verbose repository instructions unless production

## PowerShell Compatibility

For Windows/PowerShell environments, all core scripts have PowerShell equivalents:

### Health Check
```powershell
# Cluster health check with detailed information
./scripts/ps1/Get-K8sHealth.ps1 -Namespace <namespace> -Detailed
```

### Change Tracking
```powershell
# Track a change (used internally by Apply-K8sWithTracking.ps1)
./scripts/ps1/Track-K8sChange.ps1 -ResourceType deployment `
    -ResourceName myapp -Namespace default `
    -Operation UPDATE -Manifest $yamlContent

# Display all tracked changes
./scripts/ps1/Show-K8sChanges.ps1

# Apply manifest with automatic tracking
./scripts/ps1/Apply-K8sWithTracking.ps1 -ManifestFile manifest.yaml
```

### Environment Setup
```powershell
# Set session variables for change tracking
$jiraTicket = "PROJECT-12345"
$sessionDir = "$env:TEMP\k8s-troubleshooter\$(Get-Date -Format 'yyyyMMdd-HHmmss')-$jiraTicket"
New-Item -ItemType Directory -Path $sessionDir -Force | Out-Null
$env:K8S_SESSION_DIR = $sessionDir
$env:K8S_CHANGE_FILE = "$sessionDir\k8s-changes-$jiraTicket.yaml"
$env:JIRA_TICKET = $jiraTicket
Write-Host "Session directory: $sessionDir"
```

### Installation
```powershell
# Run the PowerShell installer
./Install-K8sTroubleshooter.ps1
```

**Note**: PowerShell scripts use `$env:TEMP` instead of `/tmp` for Windows compatibility and `$env:K8S_CHANGE_FILE` instead of `$CHANGE_FILE`.

## Critical Rules

1. **NEVER ask user permission for READ operations - execute immediately!**
   - `kubectl get`, `describe`, `logs`, `top`, `config view`, `cluster-info` - just do it!
   - `ls`, `cat`, `head`, `tail`, `grep` - just do it!
   - Read operations need ZERO confirmation
2. **NEVER ask user permission for SESSION MANAGEMENT operations - execute immediately!**
   - Creating session temp directories (`mkdir -p /tmp/k8s-troubleshooter/...`)
   - Creating/writing session files in temp directories (backups, change tracking, fixed manifests)
   - Initializing change tracking file
   - Any file operations within session temp directory
   - Session management needs ZERO confirmation
3. **ALWAYS ask user confirmation BEFORE WRITE operations to Kubernetes cluster**
   - `kubectl apply`, `delete`, `patch`, `edit`, `scale` - require user approval
   - Cluster modifications require explicit user approval
4. **Always maintain declarative YAML record for changes**
5. **Group related changes in single manifests**
6. **Include resource versions for update operations**
7. **Add comments explaining each change**
8. **Test changes in dev/staging first if possible**
9. **CRITICAL: ALL session files MUST be created in temp directories ONLY**
   - Linux/Mac: Use `/tmp/k8s-troubleshooter/` directory
   - Windows: Use `$env:TEMP\k8s-troubleshooter\` directory
   - NEVER create k8s-changes-*.yaml, backup-*.yaml, or fixed-*.yaml files in the current working directory or git repository
   - Prevents accidental commits of session files
10. **Be concise and efficient**:
    - DO NOT display GitOps warnings for local/dev environments
    - DO NOT ask for Jira tickets for local/dev environments
    - DO NOT show verbose output unless debugging requires it
    - Only show critical information and actionable items
    - Silently perform all read-only operations WITHOUT asking
11. **Session finalization - MANDATORY**:
    - **CRITICAL**: When troubleshooting is complete, ALWAYS execute the finalization script:
      * Bash: `bash scripts/finalize_session.sh`
      * PowerShell: `.\scripts\ps1\Finalize-K8sSession.ps1`
    - The script handles: summary generation, manifest consolidation, rollback scripts, and knowledge base updates
    - **CRITICAL**: If ANY kubectl write operations were executed (apply/delete/patch), the GitOps warning is MANDATORY
    - Do NOT skip finalization - it updates the learning system and prevents data loss!

## Error Recovery

If changes cause issues:

**Bash/Linux:**
```bash
# Rollback using backup FROM SESSION TEMP DIRECTORY
kubectl apply -f "$SESSION_DIR/backup-<type>-<name>.yaml"

# Or use kubectl rollout
kubectl rollout undo deployment/<name> -n <namespace>
```

**PowerShell/Windows:**
```powershell
# Rollback using backup FROM SESSION TEMP DIRECTORY
kubectl apply -f "$env:K8S_SESSION_DIR\backup-<type>-<name>.yaml"

# Or use kubectl rollout
kubectl rollout undo deployment/<name> -n <namespace>
```

## Integration Points

- **Jira API**: See scripts/jira_integration.py
- **Bitbucket Webhook**: Trigger on manifest commits
- **Slack Notifications**: Optional alerting via scripts/notify_slack.sh
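A note on the backup-based rollback under Error Recovery: the backup files are raw `kubectl get -o yaml` exports, so they still carry the cluster-managed fields listed under Making Changes, and a stale `metadata.resourceVersion` may cause `kubectl apply` to be rejected with a conflict. The following is a minimal sketch of stripping those fields first. It assumes a single-resource manifest in kubectl's default two-space block-style YAML; `strip_cluster_fields` is a hypothetical helper, not one of the skill's scripts, and a YAML-aware tool would be more robust.

```bash
# Hypothetical helper (not shipped with the skill): remove cluster-managed
# fields from a single-resource manifest exported with `kubectl get -o yaml`.
# Reads the file(s) given as arguments, or stdin when none are given.
strip_cluster_fields() {
  awk '
    /^[A-Za-z]/ { top = $1; drop = 0 }        # new top-level key ends any dropped block
    top == "status:" { next }                  # drop the entire status section
    top == "metadata:" && /^  managedFields:/ { drop = 1; next }
    drop && /^  [A-Za-z]/ { drop = 0 }         # next metadata key ends managedFields
    drop { next }
    top == "metadata:" && /^  (resourceVersion|uid|selfLink|creationTimestamp|generation):/ { next }
    { print }
  ' "$@"
}
```

A possible use is `strip_cluster_fields "$SESSION_DIR/backup-deployment-myapp.yaml" > "$SESSION_DIR/rollback-deployment-myapp.yaml"`, where the `rollback-` file name is only illustrative. The output stays inside the session directory, and applying it is still a cluster write that needs user confirmation.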