# systematic-debug

> Use this skill for root cause analysis, debugging complex issues, and systematic troubleshooting. Triggers on: debug, root cause, investigate, troubleshoot, why is this failing, not working, unexpected behavior, diagnose, track down.

- Author: Yannick De Backer
- Repository: kobozo/crispy-doodle
- Version: 20260206073411
- Last Updated: 2026-02-06
- Source: https://github.com/kobozo/crispy-doodle
- Web: https://mule.run/skillshub/@@kobozo/crispy-doodle~systematic-debug:20260206073411

---

---
name: systematic-debug
description: >-
  Use this skill for root cause analysis, debugging complex issues, and systematic troubleshooting.
  Triggers on: debug, root cause, investigate, troubleshoot, why is this failing,
  not working, unexpected behavior, diagnose, track down.
context: fork
---

# Systematic Debugging Methodology

A 4-phase approach to finding and fixing bugs. Based on Superpowers' systematic debugging pattern.

## The 4 Phases

```
1. REPRODUCE -> 2. GATHER CONTEXT -> 3. HYPOTHESIZE -> 4. TEST & VERIFY
```

---

## Phase 1: Reproduce the Issue

**Goal**: Reliably reproduce the bug before attempting to fix it.

### Checklist
- [ ] Can you reproduce the issue consistently?
- [ ] What are the exact steps to trigger it?
- [ ] What is the expected vs actual behavior?
- [ ] Does it happen in all environments (local, staging, prod)?
- [ ] Is it intermittent or consistent?

### Questions to Ask
- When did this start happening?
- What changed recently? (deploys, config, data)
- Does it affect all users or specific ones?
- Are there specific inputs that trigger it?

### Document the Reproduction
```markdown
## Bug: [Brief description]

**Steps to reproduce:**
1.
2.
3.

**Expected:**
**Actual:**
**Environment:**
**Frequency:** Always / Sometimes / Rare
```
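For intermittent bugs, the **Frequency** field is easier to fill in with a measured number than with a guess. The sketch below is one possible way to do that: it runs a reproduction callable many times and reports how often it raised. `failure_rate` and `flaky_operation` are hypothetical names, and the simulated 30% failure is only a stand-in for the real reproduction steps.

```python
import random
from typing import Callable


def failure_rate(action: Callable[[], None], attempts: int = 100) -> float:
    """Run `action` repeatedly and return the fraction of runs that raised."""
    failures = 0
    for _ in range(attempts):
        try:
            action()
        except Exception:
            failures += 1
    return failures / attempts


def flaky_operation(rng: random.Random) -> None:
    # Hypothetical stand-in for the real reproduction steps.
    if rng.random() < 0.3:
        raise RuntimeError("simulated intermittent failure")


rng = random.Random(42)  # fixed seed so the demo run itself is repeatable
rate = failure_rate(lambda: flaky_operation(rng), attempts=200)
print(f"failed in {rate:.0%} of attempts")
```

A rate measured before the fix also gives a baseline to compare against afterwards.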

---

## Phase 2: Gather Context

**Goal**: Collect all relevant information before forming hypotheses.

### Sources to Check

| Source | What to Look For |
|--------|------------------|
| Logs | Error messages, stack traces, timestamps |
| Metrics | Spikes in errors, latency, CPU, memory |
| Database | Recent data changes, constraint violations |
| Git history | Recent commits in affected area |
| Dependencies | Version updates, deprecations |
| Config | Environment variables, feature flags |

### Commands

```bash
# Search for related errors in logs
grep -r "ErrorName" logs/ | head -50

# Find recent changes to affected files
git log --oneline -20 -- path/to/file.ts

# Check what changed between working and broken
git diff v1.2.3..v1.2.4 -- src/

# Find all occurrences of a function
grep -r "functionName" src/
```

### Create a Timeline
```markdown
## Timeline

- **T-2 days**: Last known working state
- **T-1 day**: Deploy #1234 (feature X)
- **T-4 hours**: First error reported
- **T-0**: Bug confirmed and reproduced
```
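The "first error" entry in a timeline can often be pulled from logs rather than from memory. The following is a minimal sketch that finds the earliest log line mentioning an error. The sample lines and the `<ISO timestamp> <LEVEL> <message>` layout are assumptions, so the parsing would need adjusting to the real log format.

```python
from datetime import datetime
from typing import Optional

# Hypothetical log lines; the "<ISO timestamp> <LEVEL> <message>" layout is an assumption.
LOG_LINES = [
    "2026-02-04T09:00:00 INFO deploy #1234 finished",
    "2026-02-05T08:12:41 ERROR TypeError in saveProfile",
    "2026-02-05T08:15:02 INFO health check ok",
    "2026-02-05T11:47:19 ERROR TypeError in saveProfile",
]


def parse_line(line: str) -> tuple[datetime, str, str]:
    """Split a line into (timestamp, level, message)."""
    stamp, level, message = line.split(" ", 2)
    return datetime.fromisoformat(stamp), level, message


def first_occurrence(lines: list[str], needle: str) -> Optional[datetime]:
    """Return the earliest timestamp of a line whose message contains `needle`."""
    stamps = [ts for ts, _, msg in map(parse_line, lines) if needle in msg]
    return min(stamps, default=None)


print(first_occurrence(LOG_LINES, "TypeError"))
```

Comparing that timestamp with deploy and config-change times narrows down which change to look at first.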

---

## Phase 3: Form Hypotheses

**Goal**: Generate multiple possible causes before jumping to solutions.

### Hypothesis Template
```markdown
## Hypothesis 1: [Brief description]

**Why it might be true:**
-
-

**How to test:**
-

**Likelihood:** High / Medium / Low
```

### Common Categories

1. **Code Logic Errors**
   - Off-by-one errors
   - Null/undefined handling
   - Type coercion issues
   - Race conditions

2. **Data Issues**
   - Invalid data in database
   - Schema mismatches
   - Missing migrations

3. **Configuration**
   - Wrong environment variable
   - Missing secrets
   - Feature flag state

4. **Integration Issues**
   - API contract changes
   - Timeout misconfiguration
   - Network issues

5. **Resource Issues**
   - Memory leaks
   - Connection pool exhaustion
   - Disk space

### Rank Hypotheses
Order by:
1. Likelihood (based on evidence)
2. Ease of testing
3. Impact if true
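
The ordering above can be sketched as a sort with a three-part key. The 1-3 scoring scale and the `Hypothesis` class are assumptions made for illustration; the methodology itself only asks for High / Medium / Low judgments.

```python
from dataclasses import dataclass


# Scores use a hypothetical 1-3 scale (3 = high); the scale is an assumption.
@dataclass
class Hypothesis:
    description: str
    likelihood: int
    ease_of_testing: int
    impact: int


def rank(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Sort by likelihood, then ease of testing, then impact, highest first."""
    return sorted(
        hypotheses,
        key=lambda h: (h.likelihood, h.ease_of_testing, h.impact),
        reverse=True,
    )


candidates = [
    Hypothesis("Frontend sends wrong payload", likelihood=1, ease_of_testing=3, impact=2),
    Hypothesis("Migration did not backfill old users", likelihood=3, ease_of_testing=3, impact=3),
    Hypothesis("API response format changed", likelihood=2, ease_of_testing=2, impact=2),
]
for h in rank(candidates):
    print(h.description)
```

Using a tuple key keeps likelihood dominant, so ease of testing and impact only break ties.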

---

## Phase 4: Test and Verify

**Goal**: Systematically test hypotheses until the root cause is found.

### Testing Approach

```markdown
## Testing Hypothesis 1

**Test:** [What you'll do]
**Expected if true:** [What you'd see]
**Actual result:** [What happened]
**Conclusion:** Confirmed / Ruled out / Need more data
```

### Verification Checklist

Before declaring the bug fixed:
- [ ] Original reproduction steps no longer trigger the bug
- [ ] Unit tests pass
- [ ] No new regressions introduced
- [ ] Edge cases considered
- [ ] Fix deployed and verified in staging
- [ ] Monitoring shows issue resolved

### Post-Mortem Questions

After fixing:
- Why wasn't this caught earlier?
- Can we add tests to prevent regression?
- Are there similar issues elsewhere?
- Should we add monitoring/alerting?

---

## Quick Reference

### When Stuck

1. **Rubber duck**: Explain the problem out loud
2. **Reduce scope**: Simplify to minimal reproduction
3. **Binary search**: Bisect commits or code
4. **Fresh eyes**: Take a break or ask for help
5. **Question assumptions**: What are you taking for granted?
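
The binary-search idea is not limited to commits. As a rough sketch, the function below finds the first "bad" item in any ordered sequence (builds, config versions, input sizes), assuming the first item is good, the last is bad, and the bug stays present once it appears. The build names and the `first_bad` helper are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(items: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first item for which `is_bad` is True.

    Assumes items[0] is good, items[-1] is bad, and that once an item
    is bad every later item is bad too.
    """
    good, bad = 0, len(items) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(items[mid]):
            bad = mid
        else:
            good = mid
    return items[bad]


# Hypothetical build numbers; in this example the bug appears in build-6.
builds = [f"build-{n}" for n in range(1, 11)]
culprit = first_bad(builds, lambda b: int(b.split("-")[1]) >= 6)
print(culprit)  # prints "build-6"
```

Each check halves the remaining range, so ten builds need about three checks instead of up to ten.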

### Git Bisect for Finding Breaking Commit

```bash
git bisect start
git bisect bad HEAD
git bisect good v1.2.0  # Known working version

# Test each commit, mark as good or bad
git bisect good  # or: git bisect bad

# When found:
git bisect reset
```
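
Manual marking can be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. The script below is a sketch of a wrapper that marks a commit bad only when the test output contains the error being hunted, and skips commits that fail for unrelated reasons. The script name `bisect_check.py` and the `signature` argument are assumptions, not part of git.

```python
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes that `git bisect run` interprets


def classify(returncode: int, output: str, signature: str) -> int:
    """Map one test run to a bisect verdict."""
    if returncode == 0:
        return GOOD
    if signature in output:
        return BAD  # failed with the error being hunted
    return SKIP  # failed for some other reason, so this commit proves nothing


def main(argv: list[str]) -> int:
    # argv[0] is the error text to look for; the rest is the test command.
    signature, command = argv[0], argv[1:]
    proc = subprocess.run(command, capture_output=True, text=True)
    return classify(proc.returncode, proc.stdout + proc.stderr, signature)


if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(main(sys.argv[1:]))
```

After marking the good and bad endpoints, it could be invoked as `git bisect run python3 /path/to/bisect_check.py "ErrorName" <test command>`, where the path and test command are placeholders. Keeping the script outside the repository avoids it changing or disappearing as bisect checks out older commits.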

### Debug Logging Pattern

```typescript
// Temporary debug logging
console.log('[DEBUG] function entry', { arg1, arg2 });
console.log('[DEBUG] before operation', { state });
// ... operation ...
console.log('[DEBUG] after operation', { result });
```

```python
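# Temporary debug logging
import logging
# The root logger defaults to WARNING, so debug messages are hidden unless the level is lowered.
logging.basicConfig(level=logging.DEBUG)
logging.debug(f"[DEBUG] function entry: {arg1=}, {arg2=}")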
```
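
When several functions need the same entry/exit logging, a decorator keeps the temporary instrumentation in one place and makes it easy to remove. This is a minimal sketch; the `trace` decorator, the `debug_trace` logger name, and `apply_discount` are hypothetical.

```python
import functools
import logging

logger = logging.getLogger("debug_trace")  # hypothetical logger name


def trace(func):
    """Log arguments on entry and the result (or exception) on exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback, then the error propagates unchanged.
            logger.exception("error in %s", func.__name__)
            raise
        logger.debug("exit %s result=%r", func.__name__, result)
        return result
    return wrapper


@trace
def apply_discount(price: float, percent: float) -> float:
    # Hypothetical function under investigation.
    return price * (1 - percent / 100)


logging.basicConfig(level=logging.DEBUG)
apply_discount(80.0, 25)
```

Removing the `@trace` lines once the bug is fixed restores the original behavior without touching the function bodies.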

---

## Example Walkthrough

```markdown
## Bug: Users getting 500 error on profile save

### Phase 1: Reproduce
- Steps: Login -> Settings -> Change name -> Save
- Happens 100% of the time for user ID "abc-123"
- Works for other users

### Phase 2: Context
- Error log: "TypeError: Cannot read property 'id' of null"
- Started after deploy #456
- Only affects users created before migration

### Phase 3: Hypotheses
1. **Migration didn't backfill old users** (High likelihood)
2. API response format changed (Medium)
3. Frontend sending wrong payload (Low)

### Phase 4: Test
- Query: SELECT * FROM users WHERE profile IS NULL
- Found 47 users with null profile
- Root cause: Migration added profile column but didn't backfill

### Fix
- Backfill migration for existing users
- Add null check in API as safety
- Add test for this scenario
```
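
The last two fix items in the walkthrough (a null check as a safety net and a test for the scenario) might look roughly like the sketch below. The walkthrough does not show the real code, so `save_display_name`, `ProfileMissingError`, and the dictionary shape of a user are all assumptions made for illustration.

```python
from typing import Optional


class ProfileMissingError(Exception):
    """Raised when a user record has no profile to update."""


def save_display_name(user: dict, new_name: str) -> dict:
    """Return an updated user, failing loudly if the profile was never created."""
    profile: Optional[dict] = user.get("profile")
    if profile is None:
        # Without this guard, code reading the profile's fields crashes on legacy users.
        raise ProfileMissingError(f"user {user['id']} has no profile")
    return {**user, "profile": {**profile, "name": new_name}}


def test_legacy_user_without_profile_is_rejected_cleanly():
    legacy_user = {"id": "abc-123", "profile": None}
    try:
        save_display_name(legacy_user, "New Name")
    except ProfileMissingError:
        return
    raise AssertionError("expected ProfileMissingError")


def test_user_with_profile_is_updated():
    user = {"id": "def-456", "profile": {"id": 1, "name": "Old Name"}}
    updated = save_display_name(user, "New Name")
    assert updated["profile"]["name"] == "New Name"
    assert user["profile"]["name"] == "Old Name"  # the input is not mutated


test_legacy_user_without_profile_is_rejected_cleanly()
test_user_with_profile_is_updated()
```

The first test encodes the original reproduction case, so it would fail again if the guard or the backfill were ever reverted.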