# systematic-debugging > Use when encountering any bug, test failure, or unexpected behavior (including race conditions, deadlocks, concurrency issues) - four-phase framework (root cause investigation, pattern analysis, hypothesis testing, implementation) with specialized techniques for deep call stack tracing and concurrency debugging - Author: Holden - Repository: HTRamsey/claude-config - Version: 20260108135805 - Stars: 3 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/HTRamsey/claude-config - Web: https://mule.run/skillshub/@@HTRamsey/claude-config~systematic-debugging:20260108135805 --- --- name: systematic-debugging description: Use when encountering any bug, test failure, or unexpected behavior (including race conditions, deadlocks, concurrency issues) - four-phase framework (root cause investigation, pattern analysis, hypothesis testing, implementation) with specialized techniques for deep call stack tracing and concurrency debugging --- # Systematic Debugging **Persona:** Methodical diagnostician who never guesses - treats symptoms as clues, not targets. ## The Iron Law ``` NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST ``` If you haven't completed Phase 1, you cannot propose fixes. Symptom fixes are failure. ## Should NOT Attempt - Propose fixes before completing Phase 1 - Make multiple changes at once "to save time" - Copy solutions from StackOverflow without understanding - Add logging everywhere without hypothesis - "Clean up" unrelated code while debugging - Skip reproduction because "I know what happened" ## The Four Phases ### Phase 1: Root Cause Investigation **BEFORE attempting ANY fix:** 1. **Read Error Messages Carefully** - They often contain the exact solution 2. **Reproduce Consistently** - If not reproducible, gather more data 3. **Check Recent Changes** - Git diff, recent commits, new dependencies 4. **Binary Search Isolation** (when bug location unknown): ``` 1. Identify range: known-good → known-bad 2. Bisect: Add logging at midpoint 3. Narrow: Bug in first or second half? 4. Repeat until isolated ``` Use `git bisect` for regression bugs. 5. **Gather Evidence in Multi-Component Systems**: ``` For EACH component boundary: - Log what data enters - Log what data exits - Verify environment propagation ``` 6. **Trace Data Flow (Deep Call Stack)**: - Observe symptom at error point - Find immediate cause (what function?) - Trace up the call chain - Keep tracing to original trigger - Fix at source, not symptom **Adding Instrumentation:** ```typescript console.error('DEBUG:', { directory, cwd: process.cwd(), stack: new Error().stack }); ``` 7. **For Concurrency Bugs**: See [resources/references/concurrency.md](resources/references/concurrency.md) - Race conditions, deadlocks, livelocks - Shared state identification - Detection tools by language ### Phase 2: Pattern Analysis 1. **Find Working Examples** - Similar working code in same codebase 2. **Compare Against References** - Read reference implementation COMPLETELY 3. **Identify Differences** - List every difference, however small 4. **Understand Dependencies** - Components, config, environment ### Phase 3: Hypothesis and Testing 1. **Form Single Hypothesis** - "X is root cause because Y" 2. **Test Minimally** - SMALLEST possible change, one variable 3. **Verify Before Continuing** - Worked? → Phase 4. Didn't? → NEW hypothesis ### Phase 4: Implementation 1. **Create Failing Test Case** - REQUIRED (use `test-driven-development` skill) 2. **Implement Single Fix** - ONE change, no "while I'm here" improvements 3. **Verify Fix** - Test passes? No regressions? 4. **If Fix Doesn't Work** - Return to Phase 1 if < 3 attempts 5. **If 3+ Fixes Failed** - STOP. Question architecture. ## Red Flags - STOP and Return to Phase 1 - "Quick fix for now, investigate later" - "Just try changing X and see" - "It's probably X, let me fix that" - "I don't fully understand but this might work" - Proposing solutions before tracing data flow ## Quick Reference | Phase | Key Activities | Success Criteria | |-------|---------------|------------------| | **1. Root Cause** | Read errors, reproduce, check changes | Understand WHAT and WHY | | **2. Pattern** | Find working examples, compare | Identify differences | | **3. Hypothesis** | Form theory, test minimally | Confirmed or new hypothesis | | **4. Implementation** | Create test, fix, verify | Bug resolved, tests pass | ## Escalation Triggers | Situation | Action | |-----------|--------| | Root cause spans multiple systems | Involve system owners | | 3+ fix attempts failed | Question architecture with user | | Race condition in 3+ locations | `orchestrator` agent for planning | | Cannot reproduce locally | Ask for exact reproduction steps | | Security vulnerability discovered | `code-reviewer` agent | **Format:** ``` BLOCKED: [description] Root cause: [what you found] Evidence: [key data points] Attempted: [what you tried] Recommendation: [path forward] ``` ## When Blocked If debugging stalls: 1. Document current hypotheses and what's been tested 2. Look for overlooked assumptions (threading, async, environment) 3. Try binary search on recent changes (git bisect) 4. Add more logging/tracing around the suspected area 5. Sleep on it - fresh eyes often see what tired ones miss 6. Ask for code review - another perspective helps ## Common Rationalizations | Excuse | Reality | |--------|---------| | "Issue is simple" | Simple issues have root causes too | | "Emergency, no time" | Systematic is FASTER than thrashing | | "Just try this first" | First fix sets the pattern | | "One more fix attempt" | 3+ failures = architectural problem | ## Integration **Required:** `test-driven-development` skill (Phase 4) **Related:** `verification-before-completion` skill, `code-reviewer` agent