# k8s-pod-crash-handler > Invoke this skill when: - Pod status is `CrashLoopBackOff`, `Error`, or `ImagePullBackOff` - User mentions "pod crashing", "deployment failing", or "container restarting" - Kubernetes deployment not reaching Ready state - Author: UBAID RAZA - Repository: UBAIDRAZA1/Final-todo-app - Version: 20260209232927 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-09 - Source: https://github.com/UBAIDRAZA1/Final-todo-app - Web: https://mule.run/skillshub/@@UBAIDRAZA1/Final-todo-app~k8s-pod-crash-handler:20260209232927 --- # Skill: Kubernetes Pod Crash Handler **Purpose**: Diagnose and resolve Kubernetes pod crashes and deployment issues for Phase IV/V. ## Trigger Conditions Invoke this skill when: - Pod status is `CrashLoopBackOff`, `Error`, or `ImagePullBackOff` - User mentions "pod crashing", "deployment failing", or "container restarting" - Kubernetes deployment not reaching Ready state ## Capabilities ### 1. Diagnostic Commands ```bash # Check pod status kubectl get pods -n # Get detailed pod info kubectl describe pod -n # Check pod logs kubectl logs -n # Check previous container logs (if restarting) kubectl logs -n --previous # Check events kubectl get events -n --sort-by='.lastTimestamp' ``` ### 2. Common Crash Causes & Solutions #### CrashLoopBackOff | Cause | Diagnosis | Solution | |-------|-----------|----------| | App error on startup | Check logs for Python/Node errors | Fix application code | | Missing env vars | Logs show "KeyError" or "undefined" | Add ConfigMap/Secret | | DB connection fail | Logs show connection errors | Check DB service/credentials | | Wrong command | Container exits immediately | Fix Dockerfile CMD/ENTRYPOINT | **Fix ConfigMap for env vars**: ```yaml apiVersion: v1 kind: ConfigMap metadata: name: todo-backend-config data: CORS_ORIGINS: "https://your-frontend.vercel.app" DEBUG: "false" --- apiVersion: v1 kind: Secret metadata: name: todo-backend-secrets type: Opaque stringData: DATABASE_URL: "postgresql+asyncpg://user:pass@host/db?sslmode=require" BETTER_AUTH_SECRET: "your-secret-key-min-32-chars" ``` **Reference in Deployment**: ```yaml spec: containers: - name: backend envFrom: - configMapRef: name: todo-backend-config - secretRef: name: todo-backend-secrets ``` #### ImagePullBackOff | Cause | Diagnosis | Solution | |-------|-----------|----------| | Wrong image name | Check `describe pod` output | Fix image name in deployment | | Private registry | "unauthorized" in events | Add imagePullSecrets | | Image doesn't exist | "not found" in events | Build and push image | **Fix for private registry**: ```bash # Create secret for Docker Hub kubectl create secret docker-registry dockerhub-secret \ --docker-server=docker.io \ --docker-username= \ --docker-password= \ --docker-email= # Add to deployment spec: imagePullSecrets: - name: dockerhub-secret ``` #### OOMKilled (Out of Memory) **Solution - Increase memory limits**: ```yaml spec: containers: - name: backend resources: requests: memory: "256Mi" cpu: "100m" limits: memory: "512Mi" cpu: "500m" ``` ### 3. Health Check Configuration ```yaml spec: containers: - name: backend livenessProbe: httpGet: path: /health port: 8000 initialDelaySeconds: 30 periodSeconds: 10 failureThreshold: 3 readinessProbe: httpGet: path: /health port: 8000 initialDelaySeconds: 5 periodSeconds: 5 failureThreshold: 3 ``` ### 4. Quick Recovery Scripts **restart-deployment.sh**: ```bash #!/bin/bash NAMESPACE=${1:-default} DEPLOYMENT=${2:-todo-backend} echo "Restarting $DEPLOYMENT in $NAMESPACE..." kubectl rollout restart deployment/$DEPLOYMENT -n $NAMESPACE kubectl rollout status deployment/$DEPLOYMENT -n $NAMESPACE echo "Restart complete!" ``` **debug-pod.sh**: ```bash #!/bin/bash POD=$1 NAMESPACE=${2:-default} echo "=== Pod Status ===" kubectl get pod $POD -n $NAMESPACE echo -e "\n=== Pod Description ===" kubectl describe pod $POD -n $NAMESPACE | tail -50 echo -e "\n=== Pod Logs ===" kubectl logs $POD -n $NAMESPACE --tail=100 echo -e "\n=== Previous Logs (if any) ===" kubectl logs $POD -n $NAMESPACE --previous --tail=50 2>/dev/null || echo "No previous logs" ``` ### 5. Pod Crash Decision Tree ``` Pod Not Running ├── CrashLoopBackOff │ ├── Check logs → App error → Fix code │ ├── "KeyError"/"undefined" → Missing env → Add ConfigMap/Secret │ └── Connection error → Check service/DB │ ├── ImagePullBackOff │ ├── "unauthorized" → Add imagePullSecrets │ ├── "not found" → Build and push image │ └── Wrong name → Fix image reference │ ├── OOMKilled │ └── Increase memory limits │ ├── Pending │ ├── "Insufficient cpu" → Reduce requests or scale cluster │ └── "Unschedulable" → Check node resources │ └── Error └── Check events with kubectl get events ``` ## Checklist - [ ] Check pod status: `kubectl get pods` - [ ] Read pod logs: `kubectl logs ` - [ ] Check events: `kubectl get events` - [ ] Verify environment variables in ConfigMap/Secret - [ ] Check resource limits (memory/CPU) - [ ] Verify image exists and is pullable - [ ] Check health probe configuration - [ ] Test locally with same env vars before deploying