# observability-implementation

> Implements real observability infrastructure including Prometheus, Grafana, alerting rules, and dashboards. Use after observability-architect has designed the observability plan. Configures monitoring stacks, generates alert configurations, and creates operational dashboards.

- Author: InformatiK-AI
- Repository: InformatiK-AI/informatik-ai-framework
- Version: 20260125174017
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/InformatiK-AI/informatik-ai-framework
- Web: https://mule.run/skillshub/@@InformatiK-AI/informatik-ai-framework~observability-implementation:20260125174017

---

# Observability Implementation

Complete toolkit for implementing observability infrastructure with modern monitoring stacks.

## Quick Start

### Main Capabilities

This skill provides three core capabilities through automated scripts:

```bash
# Script 1: Monitoring Setup
python scripts/monitoring_setup.py --stack prometheus --output ./monitoring

# Script 2: Alerting Configurator
python scripts/alerting_configurator.py --input observability_plan.md --output ./alerts

# Script 3: Dashboard Generator
python scripts/dashboard_generator.py --type api --output ./dashboards
```

## Purpose

This skill bridges the gap between **observability design** (done by the `observability-architect` agent) and **practical implementation**. It transforms conceptual observability plans into working infrastructure:

- Docker Compose configurations for monitoring stacks
- Prometheus configuration files
- Prometheus alert rules and Alertmanager routing configuration
- Grafana dashboard JSON files

## When to Use

1. **After `observability-architect`** has created an observability plan
2. When setting up monitoring infrastructure for a new project
3. When migrating from one monitoring stack to another
4. When standardizing alerting across services

## Core Capabilities

### 1. Monitoring Setup

Automated tool for configuring monitoring infrastructure.

**Features:**

- Docker Compose generation for Prometheus + Grafana
- Prometheus configuration with scrape targets
- Exporter configuration (node, postgres, redis, etc.)
- Multi-stack support (Prometheus, Datadog, New Relic)

**Usage:**

```bash
python scripts/monitoring_setup.py --stack prometheus --output ./monitoring [--services web,api,db]
```

### 2. Alerting Configurator

Generates alerting rules from observability plans.

**Features:**

- Reads observability plans from `observability-architect`
- Generates Prometheus alert rules and Alertmanager configuration
- Configures notification channels (Slack, PagerDuty, email)
- Creates SLO/SLI-based alerts

**Usage:**

```bash
python scripts/alerting_configurator.py --input observability_plan.md --output ./alerts [--severity critical,warning]
```

### 3. Dashboard Generator

Creates Grafana dashboards for various service types.
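
To make the generated output more concrete, the sketch below assembles a minimal Grafana dashboard model with one panel each for request rate, error ratio, and latency. It is an illustration only, not the code of `dashboard_generator.py`: the function name `build_red_dashboard`, the metric names `http_requests_total` and `http_request_duration_seconds_bucket`, the `job` and `status` labels, and the string datasource name are all assumptions that would need to match your own instrumentation and Grafana setup.

```python
import json


def build_red_dashboard(service: str, datasource: str = "prometheus") -> dict:
    """Return a minimal Grafana dashboard model with rate, error, and latency panels.

    Hypothetical helper: metric and label names are assumed, not taken from the skill.
    """
    selector = f'job="{service}"'
    queries = {
        "Request rate": f"sum(rate(http_requests_total{{{selector}}}[5m]))",
        "Error ratio": (
            f'sum(rate(http_requests_total{{{selector},status=~"5.."}}[5m]))'
            f" / sum(rate(http_requests_total{{{selector}}}[5m]))"
        ),
        "p95 latency": (
            "histogram_quantile(0.95, "
            f"sum by (le) (rate(http_request_duration_seconds_bucket{{{selector}}}[5m])))"
        ),
    }

    panels = []
    for index, (title, expr) in enumerate(queries.items()):
        panels.append({
            "id": index + 1,
            "type": "timeseries",
            "title": title,
            "datasource": datasource,  # assumed to be referenced by name
            # Grafana's grid is 24 columns wide, so three panels of width 8 fill one row.
            "gridPos": {"h": 8, "w": 8, "x": 8 * index, "y": 0},
            "targets": [{"expr": expr, "refId": "A"}],
        })

    return {
        "title": f"{service} overview",
        "panels": panels,
        "time": {"from": "now-6h", "to": "now"},
        "refresh": "30s",
    }


if __name__ == "__main__":
    # Write the result to a file and place it in the provisioned dashboards directory.
    print(json.dumps(build_red_dashboard("api"), indent=2))
```

Keeping the queries in a single mapping makes it straightforward to swap in other panel sets for other service types without touching the layout logic.
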

**Features:**

- Pre-built templates for API, Database, Frontend, Infrastructure
- RED metrics (Rate, Errors, Duration) for APIs
- USE metrics (Utilization, Saturation, Errors) for infrastructure
- Business metrics dashboards

**Usage:**

```bash
python scripts/dashboard_generator.py --type api --output ./dashboards [--datasource prometheus]
```

## Reference Documentation

### Monitoring Stack Guide

Comprehensive guide available in `references/monitoring_stack_guide.md`:

- Stack comparison (Prometheus vs Datadog vs New Relic)
- Prometheus + Grafana architecture
- Exporter configuration
- Retention and storage strategies
- High Availability setup

### Alerting Best Practices

Complete alerting guide in `references/alerting_best_practices.md`:

- Alert pyramid (Critical > Warning > Info)
- SLOs and Error Budgets
- Runbook linking
- Alert fatigue prevention
- On-call best practices

### Dashboard Patterns

Technical reference in `references/dashboard_patterns.md`:

- RED methodology for APIs
- USE methodology for infrastructure
- Golden Signals (latency, traffic, errors, saturation)
- Dashboard hierarchy (Overview → Service → Debug)
- Business metrics visualization

## Integration with Agents

### Workflow

```
1. observability-architect (AGENT)
   └── Designs: observability_plan.md
       ├── Logging strategy
       ├── Metrics to collect
       ├── Tracing approach
       ├── Alerting rules (conceptual)
       └── Dashboard requirements

2. observability-implementation (SKILL)
   └── Implements:
       ├── monitoring_setup.py → docker-compose.yml, prometheus.yml
       ├── alerting_configurator.py → alert_rules.yml, alertmanager.yml
       └── dashboard_generator.py → *.json (Grafana dashboards)
```

### Related Agents

| Agent | Relationship |
|-------|-------------|
| `observability-architect` | Creates the observability plan that this skill implements |
| `devops-architect` | Integrates monitoring into CI/CD pipelines |
| `security-architect` | Reviews security monitoring and alerting |

## Supported Stacks

### Primary: Prometheus + Grafana

```yaml
# Generated docker-compose.yml structure
services:
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    volumes:
      - ./dashboards:/var/lib/grafana/dashboards
  alertmanager:
    image: prom/alertmanager:latest
    ports: ["9093:9093"]
```

### Alternative Stacks

| Stack | Script Support | Notes |
|-------|---------------|-------|
| Datadog | `--stack datadog` | Generates datadog.yaml config |
| New Relic | `--stack newrelic` | Generates newrelic.yml config |
| CloudWatch | `--stack cloudwatch` | Generates AWS CloudWatch configs |

## Generated File Structure

```
monitoring/
├── docker-compose.yml              # Full stack deployment
├── prometheus/
│   ├── prometheus.yml              # Main config
│   └── rules/
│       └── alert_rules.yml         # Alert rules
├── alertmanager/
│   └── alertmanager.yml            # Notification routing
├── grafana/
│   ├── provisioning/
│   │   └── dashboards/
│   │       └── dashboards.yml      # Dashboard provisioner
│   └── dashboards/
│       ├── api-overview.json
│       ├── database-metrics.json
│       └── infrastructure.json
└── exporters/
    └── docker-compose.exporters.yml
```

## Prerequisites

- Python 3.8+
- Docker and Docker Compose (for deployment)
- Access to the observability plan from `observability-architect`

## Best Practices Summary

### Metrics Collection

- Use appropriate scrape intervals (15s-60s)
- Label metrics consistently
- Avoid high cardinality labels
- Use recording rules for expensive queries

### Alerting

- Start with critical alerts only
- Link alerts to runbooks
- Use severity levels appropriately
- Test alerts in staging first

### Dashboards

- Follow hierarchy: Overview → Service → Debug
- Use consistent color schemes
- Include relevant timeframes
- Add contextual annotations

## Common Commands

```bash
# Setup monitoring stack
python scripts/monitoring_setup.py --stack prometheus --output ./monitoring

# Generate alerts from plan
python scripts/alerting_configurator.py --input observability_plan.md --output ./alerts

# Create service dashboard
python scripts/dashboard_generator.py --type api --output ./dashboards

# Deploy stack
docker-compose -f monitoring/docker-compose.yml up -d

# Verify Prometheus targets
curl http://localhost:9090/api/v1/targets
```

## Troubleshooting

### Common Issues

1. **Prometheus not scraping targets**
   - Check firewall rules
   - Verify service discovery configuration
   - Check target endpoint health

2. **Grafana dashboards not loading**
   - Verify datasource configuration
   - Check provisioning paths
   - Review Grafana logs

3. **Alerts not firing**
   - Test query in Prometheus UI
   - Check Alertmanager routing
   - Verify notification channel config

### Getting Help

- Review reference documentation
- Check script output messages
- Consult Prometheus/Grafana documentation
- Review generated configuration files

## Resources

- Pattern Reference: `references/monitoring_stack_guide.md`
- Alerting Guide: `references/alerting_best_practices.md`
- Dashboard Patterns: `references/dashboard_patterns.md`
- Tool Scripts: `scripts/` directory
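
When working through the "Prometheus not scraping targets" issue, the JSON returned by the `/api/v1/targets` endpoint (the same one queried with `curl` under Common Commands) reports a `health` value and a `lastError` message for each active target. The sketch below shows one possible way to reduce that response to the failing targets. The function names `unhealthy_targets` and `fetch_targets` are hypothetical, the base URL assumes the default port mapping from the generated Compose file, and `SAMPLE` is a hand-written payload for illustration, not captured output.

```python
import json
import urllib.request


def unhealthy_targets(payload: dict) -> list:
    """Return (job, scrape URL, last error) for each active target whose health is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            target.get("labels", {}).get("job", "?"),
            target.get("scrapeUrl", "?"),
            target.get("lastError", ""),
        )
        for target in targets
        if target.get("health") != "up"
    ]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the targets listing from a running Prometheus (assumed reachable at base_url)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hand-written sample shaped like the targets API response (illustrative values only).
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "prometheus", "instance": "localhost:9090"},
                "scrapeUrl": "http://localhost:9090/metrics",
                "health": "up",
                "lastError": "",
            },
            {
                "labels": {"job": "api", "instance": "api:8080"},
                "scrapeUrl": "http://api:8080/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
        ]
    },
}

for job, url, error in unhealthy_targets(SAMPLE):
    print(f"{job}: {url} is failing ({error})")
```

Against a live stack, `unhealthy_targets(fetch_targets())` would apply the same filter to the real response; an empty list suggests the scrape side is healthy and the problem lies elsewhere, for example in the queries or alert routing.
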