# testing-strategy

> Comprehensive guide for implementing AIDB tests following E2E-first philosophy, DebugInterface abstraction, and MCP response health standards

- Author: jefflester
- Repository: ai-debugger-inc/aidb
- Version: 20260113030541
- Last Updated: 2026-02-06
- Source: https://github.com/ai-debugger-inc/aidb
- Web: https://mule.run/skillshub/@@ai-debugger-inc/aidb~testing-strategy:20260113030541

---
name: testing-strategy
description: Comprehensive guide for implementing AIDB tests following E2E-first philosophy, DebugInterface abstraction, and MCP response health standards
version: 1.0.0
tags:
  - testing
  - e2e
  - integration
  - unit
  - mcp
  - framework
  - debugging
---

# AIDB Testing Strategy

**Priority:** E2E → Integration → Unit (Highest ROI First)

______________________________________________________________________

## CRITICAL: Test Execution Command

**All tests MUST be run via `./dev-cli test run`:**

```bash
./dev-cli test run -s {suite} [-k 'pattern'] [-l {lang}]
```

- Multiple `-k` and `-l` flags are supported
- **NEVER use `--local`**: suites know their natural execution environment, and forcing local execution causes unexpected behavior
- Direct `pytest` invocation is NOT supported

______________________________________________________________________

This skill guides you through creating and modifying tests for the AIDB project. The test infrastructure is complete; your job is to implement tests using proven patterns.

## Related Skills

When implementing tests, you may also need:

- **adapter-development** - Tests validate adapter behavior across languages
- **mcp-tools-development** - MCP tools are tested with the DebugInterface abstraction
- **dap-protocol-guide** - Tests exercise DAP protocol flows end-to-end
- **ci-cd-workflows** - For testing CI/CD workflows themselves (not application code)

## Core Philosophy

**For the complete test architecture**, see the test infrastructure in `src/tests/`.

### 1. E2E First, Unit Last

**Why E2E First?**

- Validates real user workflows
- Exercises the full stack (catches integration bugs early)
- Most bugs surface when real components are exercised together
- Higher ROI than unit tests initially

**Testing Priority:**

1. **E2E Tests** - Complete workflows, real programs, full integration
1. **Integration Tests** - Component interactions, adapter lifecycle, state management
1. **Unit Tests** - Edge cases, specific logic, error handling

### 2. Framework Tests: Keep It Simple

**Goal:** Verify we can launch and debug programs using various frameworks

**Pattern:**

1. Connect to the framework app (Django, Express, Spring Boot, etc.)
1. Quick smoke test: set breakpoint → inspect → step
1. That's it. Don't test framework internals.

**We're testing that AIDB works WITH frameworks, not testing the frameworks themselves.**

### 3. MCP Responses: Structure + Content + Efficiency

**Don't just validate structure**; validate content accuracy and payload efficiency too.

**Bad test:**

```python
assert "locals" in response["data"]  # Structure only
```

**Good test:**

```python
# Structure
assert "locals" in response["data"]

# Content accuracy
assert response["data"]["locals"]["x"]["value"] == 10
assert response["data"]["locals"]["x"]["line"] == 5

# Efficiency (no junk)
assert len(response["data"]) <= 3  # No bloated payloads
assert len(response["summary"]) <= 200  # Concise summaries
```

### 4. VS Launch: Critical for Agent Workflows

**Why critical:**

- Primary entry point for agents using framework debugging
- Enables use of existing workspace launch configs
- Complex variable substitution must work (`${workspaceFolder}`, etc.)
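
To make the variable-substitution requirement concrete, here is a minimal sketch of resolving VS Code-style `${...}` variables in a launch config. `resolve_variables` is a hypothetical helper, not AIDB's actual resolver; it assumes a simplified grammar covering only `${name}` and `${env:NAME}`, leaves unknown variables verbatim, and turns unset environment variables into empty strings. Those policies are assumptions chosen for illustration.

```python
from __future__ import annotations

import os
import re
from typing import Any, Mapping

# Matches ${name} and ${env:NAME}; this grammar is a simplifying assumption.
_VAR_PATTERN = re.compile(r"\$\{(env:)?([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_variables(
    value: Any,
    context: Mapping[str, str],
    env: Mapping[str, str] | None = None,
) -> Any:
    """Recursively substitute ${...} variables in strings, lists, and dicts.

    Hypothetical helper: unknown variables are left verbatim and unset
    environment variables become "" in this sketch.
    """
    env = os.environ if env is None else env

    if isinstance(value, str):
        def _replace(match: re.Match[str]) -> str:
            is_env, name = match.group(1), match.group(2)
            if is_env:
                return env.get(name, "")
            return context.get(name, match.group(0))

        return _VAR_PATTERN.sub(_replace, value)
    if isinstance(value, list):
        return [resolve_variables(item, context, env) for item in value]
    if isinstance(value, dict):
        return {key: resolve_variables(item, context, env) for key, item in value.items()}
    return value


# Example: a launch.json-style configuration entry.
config = {
    "program": "${workspaceFolder}/app.py",
    "args": ["--port", "${env:APP_PORT}"],
    "cwd": "${workspaceFolder}",
}
resolved = resolve_variables(config, {"workspaceFolder": "/repo"}, env={"APP_PORT": "8000"})
# resolved == {"program": "/repo/app.py", "args": ["--port", "8000"], "cwd": "/repo"}
```

Recursing through lists and dicts matters because variables can appear in nested fields such as `args` or `env`, not just in top-level strings.
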

**Test thoroughly:**

- Core launch.json parsing (language-independent)
- Per-language config translation (Python/JavaScript/Java)
- Variable substitution and resolution
- Framework-specific launch configs

**Critical Breakpoint Timing:** Set breakpoints when STARTING sessions (not after) to avoid race conditions with fast-executing programs:

```python
# ✅ CORRECT
await debug_interface.start_session(program=prog, breakpoints=[{"line": 10}])

# ❌ WRONG: Race condition
await debug_interface.start_session(program=prog)
await debug_interface.set_breakpoint(file=prog, line=10)  # May be too late: the program can run past line 10 first
```

**Exception:** long-running processes (e.g., servers) that you attach to; because they keep running, breakpoints set after attaching can still be hit.

Reference: `src/tests/aidb_shared/e2e/test_complex_workflows.py`

## DebugInterface Abstraction

The cornerstone of our test strategy is the **DebugInterface abstraction**: a unified API that works with both MCP tools and the direct API.

**For implementation details**, see:

- [DebugInterface](resources/debug-interface.md) - Skill resource file
- `src/tests/_helpers/debug_interface/` - Debug interface source and docstrings

**Why?** One test validates both entry points.
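
The idea behind the abstraction can be sketched in plain Python: a `typing.Protocol` declares the operations tests rely on, and each entry point supplies a backend that satisfies it. Everything below (`FakeApiInterface`, `FakeMcpInterface`, `check_start_with_breakpoint`, and the method signatures) is hypothetical and only stands in for the real classes in `src/tests/_helpers/debug_interface/`; the point is that a test body written once against the protocol runs unchanged against either backend.

```python
from __future__ import annotations

import asyncio
from typing import Any, Protocol


class DebugInterface(Protocol):
    """Hypothetical subset of the operations a shared test needs."""

    async def start_session(
        self, program: str, breakpoints: list[dict[str, Any]] | None = None
    ) -> dict[str, Any]: ...

    async def stop_session(self) -> None: ...


class FakeApiInterface:
    """Stand-in for a backend that calls the Python API directly."""

    async def start_session(
        self, program: str, breakpoints: list[dict[str, Any]] | None = None
    ) -> dict[str, Any]:
        return {"program": program, "breakpoints": breakpoints or [], "via": "api"}

    async def stop_session(self) -> None:
        return None


class FakeMcpInterface:
    """Stand-in for a backend that goes through MCP tool calls."""

    async def start_session(
        self, program: str, breakpoints: list[dict[str, Any]] | None = None
    ) -> dict[str, Any]:
        # A real MCP backend would parse a tool response; this mimics an envelope.
        response = {"data": {"program": program, "breakpoints": breakpoints or []}}
        return {**response["data"], "via": "mcp"}

    async def stop_session(self) -> None:
        return None


async def check_start_with_breakpoint(iface: DebugInterface) -> dict[str, Any]:
    """One test body, written once against the protocol."""
    # Breakpoints are passed at session start, following the timing guidance.
    session = await iface.start_session("app.py", breakpoints=[{"line": 10}])
    assert session["breakpoints"] == [{"line": 10}]
    await iface.stop_session()
    return session


# Run the same body against both backends, mirroring how @parametrize_interfaces
# runs each real test against both MCP and API.
results = [
    asyncio.run(check_start_with_breakpoint(iface))
    for iface in (FakeApiInterface(), FakeMcpInterface())
]
```

Because the test body depends only on the protocol, adding a third entry point would mean writing one more backend, not duplicating tests.
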

**Hypothetical Example** (illustrates the pattern; not a real test file):

```python
import pytest

from tests._helpers.parametrization import parametrize_interfaces

# BaseE2ETest comes from the shared test base classes (import path omitted here).


class TestBreakpoints(BaseE2ETest):
    @parametrize_interfaces  # Runs twice: MCP and API
    @pytest.mark.asyncio
    async def test_set_breakpoint(self, debug_interface, simple_program):
        """Test works with BOTH MCP and API."""
        await debug_interface.start_session(program=simple_program)

        bp = await debug_interface.set_breakpoint(file=simple_program, line=5)

        self.verify_bp.verify_breakpoint_verified(bp)

        await debug_interface.stop_session()
```

**Key Points:**

- `@parametrize_interfaces` runs the test with both MCP and API
- The same test logic validates both entry points
- No duplication, no drift
- See [DebugInterface](resources/debug-interface.md) for details

## The Shared Suite: Testing Debug Fundamentals

The **shared suite** is AIDB's language-agnostic test foundation. It validates core debugging capabilities across all supported languages using normalized, programmatically generated test programs.

**Key Innovation:** Semantic markers that map identical logic to language-specific line numbers.

**Location:** `src/tests/aidb_shared/` (`integration/` + `e2e/`)

**What it tests:**

- Debug primitives (breakpoints, stepping, variables)
- Control flow across all 3 languages (Python, JavaScript, Java)
- Zero duplication: one test → 6 execution paths (2 interfaces × 3 languages)

**For complete details**, see [DebugInterface](resources/debug-interface.md).
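
Semantic markers can be illustrated with a small resolver. This sketch assumes a hypothetical convention in which a generated program tags a line with a comment such as `#:bp.loop_body:` (Python) or `//:bp.loop_body:` (JavaScript/Java); the shared suite's real marker syntax and lookup helpers may differ. A test then asks for `"bp.loop_body"` instead of hard-coding a line number per language.

```python
import re

# Hypothetical marker syntax: "#:<name>:" or "//:<name>:" inside a line comment.
_MARKER = re.compile(r"(?:#|//):([\w.]+):")


def find_markers(source: str) -> dict[str, int]:
    """Map marker names to 1-based line numbers in a program's source text."""
    markers: dict[str, int] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _MARKER.finditer(line):
            name = match.group(1)
            if name in markers:
                raise ValueError(f"duplicate marker {name!r} on line {lineno}")
            markers[name] = lineno
    return markers


# The same logic, expressed in two languages, tagged with the same marker names.
PYTHON_SRC = """\
def main():
    total = 0
    for i in range(3):
        total += i  #:bp.loop_body:
    return total  #:bp.return:
"""

JAVA_SRC = """\
public class Main {
    static int compute() {
        int total = 0;
        for (int i = 0; i < 3; i++) {
            total += i; //:bp.loop_body:
        }
        return total; //:bp.return:
    }
}
"""

py_lines = find_markers(PYTHON_SRC)      # {"bp.loop_body": 4, "bp.return": 5}
java_lines = find_markers(JAVA_SRC)      # {"bp.loop_body": 5, "bp.return": 7}
```

Because both programs expose the same marker names, a single shared test can request `bp.loop_body` and receive line 4 for Python and line 5 for Java.
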

### When to Use the Shared Suite

**Use the shared suite when:**

- Testing core debug operations (breakpoint, step, inspect)
- Validating adapter behavior across languages
- Ensuring language parity (all adapters work identically)

**Use framework tests when:**

- Testing framework-specific debugging (Django ORM, Express middleware)
- Validating launch.json configurations
- Testing real-world application patterns

## Test Organization

```
src/tests/
├── aidb_shared/              # ⭐ Shared suite: language-agnostic debug fundamentals
│   ├── integration/          # Core debug operations (breakpoints, stepping, variables)
│   └── e2e/                  # Complex workflows, parallel sessions
├── aidb/                     # Core API tests - organized by component
│   ├── adapters/             # Adapter-specific tests
│   ├── audit/                # Audit logging tests
│   ├── common/               # Common utilities tests
│   ├── dap/                  # DAP client tests
│   ├── models/               # Model tests
│   ├── resources/            # Resource management tests
│   ├── service/              # Service layer tests
│   └── session/              # Session management tests
├── aidb_mcp/                 # MCP server tests - organized by component
├── frameworks/               # Framework integration tests
│   ├── python/               # Flask, FastAPI, pytest
│   ├── javascript/           # Express, Jest
│   └── java/                 # Spring Boot, JUnit
├── _helpers/                 # Test helpers and utilities
├── _fixtures/                # Shared fixtures
│   └── unit/                 # ⭐ Unit test infrastructure (see below)
└── _assets/                  # Test programs and data
    ├── framework_apps/       # Framework test applications
    └── test_programs/        # Generated programs for shared suite
```

### Unit Test Infrastructure

The centralized unit test infrastructure at `src/tests/_fixtures/unit/` provides reusable mocks, builders, and fixtures:

```
_fixtures/unit/
├── builders/          # DAPRequestBuilder, DAPResponseBuilder, DAPEventBuilder
├── dap/               # Transport, events, receiver mocks
├── session/           # Registry, lifecycle, state, child_manager mocks
├── adapter/           # Port, process, launch_orchestrator mocks
├── mcp/               # DebugService, MCPSessionContext mocks
├── conftest.py        # Master fixture re-exports
├── context.py         # mock_ctx, null_ctx, tmp_storage
└── assertions.py      # UnitAssertions class
```

**Usage Pattern:**

```python
# In domain conftest.py (e.g., src/tests/aidb/dap/unit/conftest.py)
from tests._fixtures.unit.conftest import *  # noqa: F401, F403
from tests._fixtures.unit.builders import DAPEventBuilder, DAPResponseBuilder


# In test file
def test_something(mock_ctx, mock_transport):
    event = DAPEventBuilder.stopped_event(reason="breakpoint")
    # ...
```

**Key Components:**

- **Builders** - Fluent API for DAP protocol objects (requests, responses, events)
- **mock_ctx** - Standard logging context mock with debug/info/warning/error methods
- **UnitAssertions** - DAP-specific assertion helpers

### Test Execution Modes

Test suites run in different environments based on their requirements:

**Local-Only Suites** (no Docker):

- `cli` - CLI command tests
- `mcp` - MCP server unit/integration tests
- `core` - Core AIDB API tests
- `common` - Common utilities tests
- `logging` - Logging framework tests
- `ci_cd` - CI/CD workflow tests

**Docker Suites** (require containers):

- `shared` - Multi-language shared tests (parallel language containers)
- `frameworks` - Framework integration tests (parallel language containers)
- `launch` - Launch config tests (parallel language containers)

**Why the split?**

- **Local suites** test Python-only logic (handlers, validation, utils)
- **Docker suites** test multi-language scenarios
- Multi-language MCP functionality is tested in `shared`/`frameworks`/`launch`

**Running tests:**

```bash
./dev-cli test run -s mcp     # Local execution
./dev-cli test run -s shared  # Docker execution
```

## Code Reuse: Don't Reinvent

**Always use existing infrastructure:**

- **Test Base Classes** - `BaseE2ETest`, `BaseIntegrationTest`, `FrameworkDebugTestBase`
- **Parametrization Decorators** - `@parametrize_interfaces`, `@parametrize_languages`
- **Helper Assertions** - `self.verify_bp`, `self.verify_exec`, `MCPAssertions`
- **Constants** - `StopReason`, `TestTimeouts`, `MCPTool`

**For complete details**, see [E2E Patterns](resources/e2e-patterns.md).

## Working Examples

**Study these real tests before writing new ones:**

### Framework Tests (E2E)

- **Python:** `test_flask_debugging.py`, `test_fastapi_debugging.py`, `test_pytest_debugging.py`
- **JavaScript:** `test_express_debugging.py`, `test_jest_debugging.py`
- **Java:** `test_springboot_debugging.py`, `test_junit_debugging.py`

### Core API Tests

- **Launch Variable Resolution:** `test_launch_variable_resolution.py`
- **Session Target Handling:** `test_session_target_handling.py`

**For complete file paths and patterns**, see [E2E Patterns](resources/e2e-patterns.md).

## Common Patterns

**For hypothetical examples illustrating common patterns**, see [E2E Patterns](resources/e2e-patterns.md).

**Key patterns covered:**

1. Basic E2E Test
1. Breakpoint Test with Markers
1. Dual-Launch Equivalence Test
1. MCP Response Validation

## When Creating New Tests

### Step 1: Choose Test Type

- **E2E?** Full workflow, real program, complete integration
- **Integration?** Component interactions, lifecycle management
- **Unit?** Specific function, edge case, error handling

### Step 2: Find a Similar Test

Look at [E2E Patterns](resources/e2e-patterns.md):

- Flask/Express for framework tests
- Existing tests in the same directory
- Similar test scenarios in other languages

### Step 3: Copy Pattern, Adapt

Don't start from scratch:

1. Copy a working test
1. Adapt it to your scenario
1. Use the same helpers and assertions
1. Follow the same structure

### Step 4: Use Existing Infrastructure

**Don't create:**

- New assertion helpers (use existing ones)
- New fixtures (check `conftest.py` files first)
- New constants (use `constants.py`)
- New base classes (inherit from existing ones)

**Do create:**

- Tests using existing patterns
- Scenario-specific test data
- Framework-specific fixtures (if needed)

## Performance Testing

**Current State:** No performance baselines exist yet.

**Phase 1:** Establish baselines

- Analyze existing metrics/instrumentation
- Determine "healthy" latencies
- Document target times

**Phase 2:** Regression testing

- Monitor key operations (breakpoint set, variable inspect, step)
- Alert on degradation

**For now:** Focus on functional correctness, not performance.

## Success Criteria

### Test Quality Checklist

- [ ] Test uses `@parametrize_interfaces` for MCP/API coverage
- [ ] Test inherits from the appropriate base class
- [ ] Test uses helper assertions, not custom assertions
- [ ] Test validates content accuracy, not just structure
- [ ] MCP tests check efficiency (no bloated payloads)
- [ ] Test has a clear docstring explaining what it validates
- [ ] Test follows working examples
- [ ] Test is in the correct directory (e2e/integration/unit)

### Framework Test Checklist

- [ ] Inherits from `FrameworkDebugTestBase`
- [ ] Implements `test_launch_via_api()`
- [ ] Implements `test_launch_via_vscode_config()`
- [ ] Implements `test_dual_launch_equivalence()`
- [ ] Sets the `framework_name` attribute
- [ ] Uses simple smoke tests (no deep framework testing)

## Investigating Test Failures

**CRITICAL:** When tests fail, check logs BEFORE attempting fixes.

**See [Debugging Failures](resources/debugging-failures.md)** for log locations, the investigation workflow, and common patterns.

**For CI test failures**, use the **ci-cd-workflows** skill's troubleshooting guide.
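
As a starting point for the check-logs-first rule, a small helper can collect the tail of the most recently modified log files. This is a hypothetical sketch, not part of the AIDB tooling: the directory and `*.log` pattern are placeholders, and the actual log locations are documented in [Debugging Failures](resources/debugging-failures.md).

```python
from __future__ import annotations

from collections import deque
from pathlib import Path


def tail_recent_logs(
    log_dir: Path,
    pattern: str = "*.log",
    files: int = 3,
    lines: int = 40,
) -> dict[str, list[str]]:
    """Return the last `lines` lines of the `files` most recently modified logs.

    Hypothetical helper; `log_dir` and `pattern` are placeholders.
    """
    candidates = sorted(
        (path for path in log_dir.rglob(pattern) if path.is_file()),
        key=lambda path: path.stat().st_mtime,
        reverse=True,
    )
    tails: dict[str, list[str]] = {}
    for path in candidates[:files]:
        with path.open(encoding="utf-8", errors="replace") as handle:
            # deque(maxlen=N) keeps only the final N lines as the file streams past.
            tails[str(path)] = [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
    return tails
```

For example, iterating over `tail_recent_logs(Path("<log dir>")).items()` and printing each tail gives a quick view of the newest failures before any code is changed.
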

## Resources

| Resource                                              | Content                                              |
| ----------------------------------------------------- | ---------------------------------------------------- |
| [E2E Patterns](resources/e2e-patterns.md)             | Test patterns, markers, code reuse, working examples |
| [Framework Tests](resources/framework-tests.md)       | Dual-launch pattern, Flask/Express examples          |
| [DebugInterface](resources/debug-interface.md)        | Unified API abstraction, shared suite architecture   |
| [Debugging Failures](resources/debugging-failures.md) | Log locations, investigation workflow, common issues |

**Test Infrastructure:** `src/tests/` (see `_fixtures/`, `_helpers/` for core components)

## Getting Started

1. **Read CONTEXT.md:** `wip/test-implementation-backlog/CONTEXT.md`
1. **Study working examples:** Flask (`test_flask_debugging.py`) and Express (`test_express_debugging.py`)
1. **Choose a test to implement:** Start with E2E (highest ROI)
1. **Copy a working test:** Don't start from scratch
1. **Adapt to your scenario:** Use the same patterns with different data
1. **Validate:** Run the test and ensure it passes with both MCP and API

## Questions?

**Internal Documentation:**

- `src/tests/` - Test infrastructure (see `_fixtures/`, `_helpers/`)
- `docs/developer-guide/overview.md` - System architecture

**Code References:**

- **DAP Protocol:** See `src/aidb/dap/protocol/` (fully typed: `types.py`, `requests.py`, `responses.py`, `events.py`)
- **Test Infrastructure:** See `src/tests/_helpers/` and `src/tests/_fixtures/`
- **Working Examples:** See the Flask/Express framework tests

______________________________________________________________________

**Remember:**

- E2E first; validate content accuracy
- Use the shared suite for debug fundamentals and framework tests for integration
- Keep framework tests simple (no framework internals)
- Always use the DebugInterface abstraction (zero duplication)