# observability

> Analyzes distributed systems using Prometheus (PromQL), Loki (LogQL), and Tempo (TraceQL). Constructs efficient queries for metrics, logs, and traces. Interprets results with token-efficient structured output. Use when debugging performance issues, investigating errors, analyzing latency, or correlating observability signals across metrics, logs, and traces.

- Author: blueswen
- Repository: blueswen/observability-with-llm
- Version: 20251119194319
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/blueswen/observability-with-llm
- Web: https://mule.run/skillshub/@@blueswen/observability-with-llm~observability:20251119194319

---

---
name: observability
description: Analyzes distributed systems using Prometheus (PromQL), Loki (LogQL), and Tempo (TraceQL). Constructs efficient queries for metrics, logs, and traces. Interprets results with token-efficient structured output. Use when debugging performance issues, investigating errors, analyzing latency, or correlating observability signals across metrics, logs, and traces.
---

# Observability Analysis

Query construction and analysis for Prometheus, Loki, and Tempo.

## Core Principles

Start with all available metrics, then drill down to logs and traces for context.
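The metrics-first starting point can be built up in small steps that are each easy to check. The following is a minimal Python sketch, not part of the skill itself: the helper names (`selector`, `rate`, `total`, `error_ratio`) and the `job` label are hypothetical, and it assumes an `http_requests_total` counter with a `status` label.

```python
def selector(metric, *matchers):
    """Step 1: plain selector; each matcher is a raw PromQL label matcher."""
    if not matchers:
        return metric
    return metric + "{" + ",".join(matchers) + "}"


def rate(expr, window="5m"):
    """Step 2: per-second rate over a range window."""
    return "rate(" + expr + "[" + window + "])"


def total(expr, by=()):
    """Step 3: aggregate, optionally keeping the labels listed in `by`."""
    if by:
        return "sum by (" + ", ".join(by) + ") (" + expr + ")"
    return "sum(" + expr + ")"


def error_ratio(metric, job, window="5m"):
    """Step 4: share of 5xx requests; filters are applied before aggregating."""
    scope = 'job="' + job + '"'  # hypothetical label used to narrow the query
    errors = total(rate(selector(metric, scope, 'status=~"5.."'), window))
    requests = total(rate(selector(metric, scope), window))
    return errors + " / " + requests


if __name__ == "__main__":
    # Print each intermediate step so it can be run against Prometheus on its own.
    base = selector("http_requests_total", 'job="service"')
    print(base)
    print(rate(base))
    print(total(rate(base), by=("endpoint",)))
    print(error_ratio("http_requests_total", "service"))
```

Printing the intermediate strings keeps each stage separately testable, so an empty or surprising result can be traced to the step that introduced it.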

**Progressive Query Construction**

- Start simple → Add filters → Add operations → Optimize
- Test incrementally to validate each step
- Adjust based on data characteristics

**Multi-Signal Correlation**

- **Metrics** → Identify anomaly (what/when/how much)
- **Traces** → Map request flow (where/which services)
- **Logs** → Extract details (why/error messages)
- Use `trace_id`, `service.name`, and timestamps for correlation

**Token-Efficient Results**

```
## Finding: [One-sentence summary]
**Evidence**: [Specific values/metrics]
**Impact**: [User/business effect]
**Cause**: [Root issue if identified]
**Action**: [Next step]
```

Target: <500 tokens for complete analysis

## Query Patterns

**Common starting points** (adapt based on context):

```promql
# Metrics: Error rate, latency percentiles, traffic patterns
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
histogram_quantile(0.95, sum by (le) (rate(http_duration_bucket[5m])))
sum(rate(http_requests_total[5m])) by (endpoint)
```

```logql
# Logs: Error details, slow operations
# (`threshold` is a placeholder; `unwrap` belongs only inside range aggregations)
{job="service"} |= "error" | json
{job="service"} | json | duration_ms > threshold
```

```traceql
# Traces: Error traces, slow requests, request flow
{ status = error && resource.service.name = "service" }
{ duration > threshold && resource.service.name = "service" }
{ kind = server && resource.service.name = "service" }
```

## Query Construction Guidelines

**Labels**: Use specific label matchers; avoid aggregating over high-cardinality labels

**Time ranges**: Match the analysis need (5m is a common `rate` window; adjust as needed)

**Aggregations**: Filter first, then aggregate for efficiency

## Result Interpretation

**Extract key information**:

- Magnitude: Absolute values and comparisons
- Trend: Direction and velocity of change
- Scope: Affected components/users
- Timing: When changes occurred

**Quantify impact**: Convert metrics to business/user impact

**Prioritize**: Focus on severity, scope, and trend

## Reference Documentation

Consult references for detailed
syntax, patterns, and workflows:

- **references/promql.md** - PromQL functions, RED/USE methods, optimization patterns
- **references/logql.md** - LogQL parsers, aggregations, pipeline optimization
- **references/traceql.md** - TraceQL span filtering, structural queries, performance analysis
- **references/semantic-conventions.md** - OpenTelemetry attribute standards and naming
- **references/analysis-patterns.md** - Token-efficient templates, output formats, examples
- **references/troubleshooting.md** - Investigation workflows, scenario-specific patterns

**When to use references**:

- Need specific syntax or advanced query patterns
- Unfamiliar with query language features
- Complex troubleshooting scenarios
- Semantic convention lookups

## Behavior

**DO**:

- Construct queries progressively and test incrementally
- Quantify findings with specific numbers and comparisons
- Present insights in structured, token-efficient format
- Focus on actionable, high-impact information
- Lead with conclusions

**DON'T**:

- Over-explain investigation process or basic concepts
- Include unnecessary query variations
- Generate instrumentation code or alert rules
- Overwhelm with excessive findings (prioritize top issues)

## Success Criteria

Effective analysis provides:

- Concise findings (<500 tokens for complete analysis)
- Specific evidence (numbers, comparisons, trends)
- Clear impact assessment
- Actionable next steps
- Structured presentation
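
The structured finding format and the <500-token target can be checked mechanically. The following is a rough Python sketch under stated assumptions: the `Finding` class and both helper functions are hypothetical, the example values are made up for illustration, and the four-characters-per-token estimate is only a crude rule of thumb, since real counts depend on the tokenizer in use.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One finding, with the five fields of the token-efficient template."""
    summary: str
    evidence: str
    impact: str
    cause: str
    action: str

    def render(self) -> str:
        # Same field order and Markdown markers as the template.
        return "\n".join([
            f"## Finding: {self.summary}",
            f"**Evidence**: {self.evidence}",
            f"**Impact**: {self.impact}",
            f"**Cause**: {self.cause}",
            f"**Action**: {self.action}",
        ])


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token (assumption, not a tokenizer)."""
    return (len(text) + 3) // 4


def within_budget(text: str, budget: int = 500) -> bool:
    """True when the estimated token count is strictly below the budget."""
    return estimate_tokens(text) < budget


if __name__ == "__main__":
    # Made-up values, for illustration only.
    finding = Finding(
        summary="Checkout error rate rose after the latest deploy",
        evidence="5xx ratio went from a low baseline to a sustained higher level",
        impact="Some checkout attempts fail for users",
        cause="Not yet identified",
        action="Inspect error traces for the checkout service",
    )
    report = finding.render()
    print(report)
    print("within budget:", within_budget(report))
```

Keeping the budget check separate from rendering makes it easy to swap in an exact tokenizer later without touching the template.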