# root-cause-tracing > Systematic root cause analysis for debugging errors. Use when an error appears, something breaks unexpectedly, or you need to trace a problem back to its source. Prevents panic-driven debugging. - Author: Marco - Repository: mmbianco78/signalroom - Version: 20251221203513 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-07 - Source: https://github.com/mmbianco78/signalroom - Web: https://mule.run/skillshub/@@mmbianco78/signalroom~root-cause-tracing:20251221203513 --- --- name: root-cause-tracing description: Systematic root cause analysis for debugging errors. Use when an error appears, something breaks unexpectedly, or you need to trace a problem back to its source. Prevents panic-driven debugging. --- # Root Cause Tracing ## The Core Principle **Trace backward through the call chain until you find the original trigger, then fix at the source.** Do NOT fix symptoms. Find the root cause. ## The Protocol ### 1. STOP When an error occurs: - Do not make changes yet - Do not guess at fixes - Do not deploy anything ### 2. READ Read the error carefully: - What is the actual error message? - Where does it occur (file, line, function)? - What is the stack trace showing? ### 3. TRACE BACKWARD Ask these questions in order: ``` 1. What function threw the error? 2. What called that function? 3. What provided the bad input? 4. What changed since this last worked? ``` **The answer is usually in step 4.** ### 4. CHECK THE DIFF ```bash # What changed recently? git diff # What changed in a specific file? git diff path/to/file.py # What changed since last known good state? git log --oneline -10 git diff ``` ### 5. ISOLATE Once you identify the likely cause: - Make ONE change to test your hypothesis - Test locally before any deployment - Verify the fix addresses the root cause, not just the symptom ## Decision Flow ``` Error Appears │ ▼ Can you trace backward through the stack? │ ├── YES → Trace to original trigger │ │ │ ▼ │ Fix at the source │ └── NO (dead end) → Fix at symptom point │ ▼ Add defense-in-depth ``` ## Common Root Cause Patterns ### Configuration Changes **Symptom**: "Connection refused" / "Authentication failed" **Trace**: Check git diff for config file changes **Example**: Removing `env_file=".env"` from pydantic settings breaks all credential loading ### Import Order / Dependencies **Symptom**: "Module not found" / "Attribute error" **Trace**: Check what imports changed, circular dependencies ### Environment Differences **Symptom**: "Works locally, fails in production" **Trace**: Compare env vars, check secrets, verify ports ### Data Shape Changes **Symptom**: "KeyError" / "TypeError" **Trace**: Check if API response format changed, verify source data ## Instrumentation (When Tracing Fails) If you can't trace manually, add logging: ```python import traceback try: result = suspicious_function() except Exception as e: print(f"Error: {e}") print(f"Traceback:\n{traceback.format_exc()}") raise ``` ## The Anti-Patterns (Don't Do These) ### Panic Debugging Making rapid changes without understanding the problem. **Result**: Multiple new bugs, production spam, lost time. ### Symptom Fixing Adding try/except around errors without understanding why they occur. **Result**: Hidden bugs that resurface worse later. ### Conflation Assuming two errors are related when they're not. **Result**: "Fixing" one thing breaks another. ### Sunk Cost Fallacy "I'm so close, one more fix will do it." **Result**: 45 minutes of failed deployments. ## SignalRoom-Specific Traces ### Database Connection Errors ``` "password authentication failed" │ ▼ Check SUPABASE_DB_USER format │ ▼ Must be: postgres.{project_ref} (not just "postgres") │ ▼ Check port: 6543 for pooler, 5432 for direct ``` ### Temporal Activity Failures ``` "Activity failed: " │ ▼ Check activity logs (fly logs or make logs-worker) │ ▼ Look at activity input in Temporal UI │ ▼ Run the underlying function locally to reproduce ``` ### Pipeline Load Failures ``` "Pipeline execution failed at step=sync" │ ▼ Check _dlt_loads table for partial loads │ ▼ Check source data format (API changed?) │ ▼ Run pipeline locally with --dry-run ``` ## Remember > "What was working before, and what did I change?" The answer is always in the git diff.