# testing-strategy

> Comprehensive guide for implementing AIDB tests following E2E-first philosophy, DebugInterface abstraction, and MCP response health standards

- Author: jefflester
- Repository: ai-debugger-inc/aidb
- Version: 20260113030541
- Stars: 11
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/ai-debugger-inc/aidb
- Web: https://mule.run/skillshub/@@ai-debugger-inc/aidb~testing-strategy:20260113030541

---

---
name: testing-strategy
description: Comprehensive guide for implementing AIDB tests following E2E-first philosophy,
  DebugInterface abstraction, and MCP response health standards
version: 1.0.0
tags:
  - testing
  - e2e
  - integration
  - unit
  - mcp
  - framework
  - debugging
---

# AIDB Testing Strategy

**Priority:** E2E → Integration → Unit (Highest ROI First)

______________________________________________________________________

## CRITICAL: Test Execution Command

**All tests MUST be run via `./dev-cli test run`:**

```bash
./dev-cli test run -s {suite} [-k 'pattern'] [-l {lang}]
```

- Multiple `-k` and `-l` flags supported
- **NEVER use `--local`** - suites know their natural execution environment; forcing local causes unexpected behavior
- Direct `pytest` invocation is NOT supported
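
When tests are launched from a script rather than typed by hand, the command line above can be assembled programmatically. The following is a minimal sketch that uses only the `-s`, `-k`, and `-l` flags shown above; `build_test_command` is a hypothetical helper for illustration and is not part of `dev-cli`.

```python
from typing import List, Sequence


def build_test_command(
    suite: str,
    patterns: Sequence[str] = (),
    langs: Sequence[str] = (),
) -> List[str]:
    """Build the argv for `./dev-cli test run` (hypothetical helper)."""
    if not suite:
        raise ValueError("a suite name is required")
    argv = ["./dev-cli", "test", "run", "-s", suite]
    # Each pattern and each language gets its own flag, because
    # multiple -k and -l flags are supported.
    for pattern in patterns:
        argv += ["-k", pattern]
    for lang in langs:
        argv += ["-l", lang]
    return argv


if __name__ == "__main__":
    print(build_test_command("shared", ["breakpoint"], ["python", "java"]))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the repository root. Note that the sketch never emits `--local`, in line with the rule above.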

______________________________________________________________________

This skill guides you through creating and modifying tests for the AIDB project. The test infrastructure is complete - your job is to implement tests using proven patterns.

## Related Skills

When implementing tests, you may also need:

- **adapter-development** - Tests validate adapter behavior across languages
- **mcp-tools-development** - MCP tools tested with DebugInterface abstraction
- **dap-protocol-guide** - Tests exercise DAP protocol flows end-to-end
- **ci-cd-workflows** - For testing CI/CD workflows themselves (not application code)

## Core Philosophy

**For the complete test architecture**, see the test infrastructure in `src/tests/`.

### 1. E2E First, Unit Last

**Why E2E First?**

- Validates real user workflows
- Exercises the full stack (catches integration bugs early)
- Most bugs surface only when real components run together
- Higher ROI than unit tests initially

**Testing Priority:**

1. **E2E Tests** - Complete workflows, real programs, full integration
1. **Integration Tests** - Component interactions, adapter lifecycle, state management
1. **Unit Tests** - Edge cases, specific logic, error handling

### 2. Framework Tests: Keep It Simple

**Goal:** Verify we can launch and debug programs using various frameworks

**Pattern:**

1. Connect to framework app (Django, Express, Spring Boot, etc.)
1. Quick smoke test: set breakpoint → inspect → step
1. That's it. Don't test framework internals.

**We're testing that AIDB works WITH frameworks, not testing the frameworks themselves.**

### 3. MCP Responses: Structure + Content + Efficiency

**Don't just validate structure** - validate content accuracy and payload efficiency.

**Bad test:**

```python
assert "locals" in response["data"]  # Structure only
```

**Good test:**

```python
# Structure
assert "locals" in response["data"]

# Content accuracy
assert response["data"]["locals"]["x"]["value"] == 10
assert response["data"]["locals"]["x"]["line"] == 5

# Efficiency (no junk)
assert len(response["data"]) <= 3  # No bloated payloads
assert len(response["summary"]) <= 200  # Concise summaries
```
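
The three checks can be seen working on hand-built responses. This is a sketch only: `response_health_problems` and both sample dictionaries are hypothetical, the limits of 3 keys and 200 characters are taken from the example above, and real tests should keep using the existing `MCPAssertions` helpers rather than a new function like this one.

```python
def response_health_problems(
    response, expected_locals, max_data_keys=3, max_summary_len=200
):
    """Return a list of problems found in an MCP-style response dict.

    Hypothetical illustration; the response shape mirrors the example above.
    """
    problems = []
    data = response.get("data", {})

    # Structure: the "locals" key must exist.
    local_vars = data.get("locals")
    if local_vars is None:
        problems.append("structure: data.locals is missing")
        local_vars = {}

    # Content accuracy: each expected variable must carry the expected value.
    for name, expected in expected_locals.items():
        actual = local_vars.get(name, {}).get("value")
        if actual != expected:
            problems.append(f"content: {name} is {actual!r}, expected {expected!r}")

    # Efficiency: no extra top-level keys, no long summaries.
    if len(data) > max_data_keys:
        problems.append(f"efficiency: data has {len(data)} keys")
    if len(response.get("summary", "")) > max_summary_len:
        problems.append("efficiency: summary is too long")
    return problems


HEALTHY = {
    "summary": "Stopped at line 5",
    "data": {"locals": {"x": {"value": 10, "line": 5}}},
}
BLOATED = {
    "summary": "s" * 500,
    "data": {"locals": {"x": {"value": 11}}, "a": 1, "b": 2, "c": 3},
}
```

A structure-only assertion would pass for both `HEALTHY` and `BLOATED`; only the content and efficiency checks tell them apart.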

### 4. VS Code Launch: Critical for Agent Workflows

**Why critical:**

- Primary entry point for agents using framework debugging
- Enables use of existing workspace launch configs
- Complex variable substitution must work (`${workspaceFolder}`, etc.)

**Test thoroughly:**

- Core launch.json parsing (language-independent)
- Per-language config translation (Python/JavaScript/Java)
- Variable substitution and resolution
- Framework-specific launch configs
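
To make the variable-substitution requirement concrete, here is a minimal sketch of resolving `${...}` placeholders in a launch configuration. It is not AIDB's resolver: `substitute`, the sample config, and the choice to raise on unknown variables are all assumptions made for illustration, and the sketch treats every placeholder as a plain name looked up in a dictionary.

```python
import re

# Matches ${name} placeholders such as ${workspaceFolder}.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def substitute(value, variables):
    """Recursively replace ${name} placeholders in strings, lists, and dicts."""
    if isinstance(value, str):
        def replace(match):
            name = match.group(1)
            if name not in variables:
                # Failing loudly keeps an unresolved variable out of the launch.
                raise KeyError(f"unresolved launch variable: {name}")
            return variables[name]

        return _VARIABLE.sub(replace, value)
    if isinstance(value, list):
        return [substitute(item, variables) for item in value]
    if isinstance(value, dict):
        return {key: substitute(item, variables) for key, item in value.items()}
    return value  # numbers, booleans, None pass through unchanged


# Hypothetical launch configuration entry.
CONFIG = {
    "name": "Run app",
    "request": "launch",
    "program": "${workspaceFolder}/app.py",
    "args": ["--config", "${workspaceFolder}/settings.toml"],
    "stopOnEntry": False,
}

RESOLVED = substitute(CONFIG, {"workspaceFolder": "/repo"})
```

A test for the real resolver would assert on the same things this sketch exposes: nested values are resolved, non-string values are untouched, and unknown variables do not slip through silently.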

**Critical Breakpoint Timing:**

Set breakpoints when STARTING sessions (not after) to avoid race conditions with fast-executing programs:

```python
# ✅ CORRECT
await debug_interface.start_session(program=prog, breakpoints=[{"line": 10}])

# ❌ WRONG: Race condition
await debug_interface.start_session(program=prog)
await debug_interface.set_breakpoint(file=prog, line=10)  # May be too late!
```

Exception: long-running processes (such as servers) that you attach to while they are still running. Reference: `src/tests/aidb_shared/e2e/test_complex_workflows.py`
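
The race can be reproduced with a toy model. The sketch below is a hypothetical stand-in, not AIDB's session API: `FakeSession` "runs" its program to completion inside `start()` unless a breakpoint is already registered, which is how a fast-executing program behaves from the test's point of view.

```python
import asyncio
from typing import Iterable, Optional, Set


class FakeSession:
    """Toy stand-in for a debug session (hypothetical, not AIDB's API)."""

    def __init__(self, total_lines: int) -> None:
        self.total_lines = total_lines
        self.breakpoints: Set[int] = set()
        self.stopped_at: Optional[int] = None
        self.finished = False

    async def start(self, breakpoints: Iterable[int] = ()) -> None:
        self.breakpoints.update(breakpoints)
        # The program runs immediately: it stops at the first registered
        # breakpoint, or exits if there is none.
        for line in range(1, self.total_lines + 1):
            if line in self.breakpoints:
                self.stopped_at = line
                return
        self.finished = True

    async def set_breakpoint(self, line: int) -> bool:
        self.breakpoints.add(line)
        # A breakpoint set after the program exited can never be hit.
        return not self.finished


async def demo():
    early = FakeSession(total_lines=20)
    await early.start(breakpoints=[10])  # breakpoint known at launch

    late = FakeSession(total_lines=20)
    await late.start()  # program has already finished
    armed = await late.set_breakpoint(10)
    return early.stopped_at, late.stopped_at, armed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the model, the session started with its breakpoint stops at line 10, while the session that sets the breakpoint afterwards never stops at all.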

## DebugInterface Abstraction

The cornerstone of our test strategy is the **DebugInterface abstraction** - a unified API that works with both MCP tools and the direct API.

**For implementation details**, see:

- [DebugInterface](resources/debug-interface.md) - Skill resource file
- `src/tests/_helpers/debug_interface/` - Debug interface source and docstrings

**Why?** One test validates both entry points.

**Hypothetical Example** (illustrates pattern, not a real test file):

```python
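import pytest

from tests._helpers.parametrization import parametrize_interfaces

# BaseE2ETest is one of the existing test base classes; its import is omitted here.


class TestBreakpoints(BaseE2ETest):
    @parametrize_interfaces  # Runs twice: MCP and API
    @pytest.mark.asyncio
    async def test_set_breakpoint(self, debug_interface, simple_program):
        """Test works with BOTH MCP and API."""
        await debug_interface.start_session(program=simple_program)

        # For fast-exiting programs, pass breakpoints to start_session instead
        # (see "Critical Breakpoint Timing" above).
        bp = await debug_interface.set_breakpoint(
            file=simple_program,
            line=5
        )

        # The adapter must report the breakpoint as verified
        self.verify_bp.verify_breakpoint_verified(bp)
        await debug_interface.stop_session()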
```

**Key Points:**

- `@parametrize_interfaces` runs the test against both MCP and API
- Same test logic validates both entry points
- No duplication, no drift
- See [DebugInterface](resources/debug-interface.md) for details
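
The idea behind the abstraction can be sketched without any AIDB code. Everything below is hypothetical: `SketchDebugInterface`, the two fake backends, and the response envelope are stand-ins chosen for illustration, and the real interface lives in `src/tests/_helpers/debug_interface/`. The point is that one check function runs unchanged against both entry points because each backend normalizes its result to the same shape.

```python
import asyncio
from abc import ABC, abstractmethod


class SketchDebugInterface(ABC):
    """Hypothetical unified interface; the real one is richer."""

    @abstractmethod
    async def start_session(self, program: str) -> dict:
        """Start a session and return normalized session info."""

    @abstractmethod
    async def stop_session(self) -> None:
        """Stop the session."""


class FakeAPIInterface(SketchDebugInterface):
    async def start_session(self, program: str) -> dict:
        # A direct API call returns the session info as-is.
        return {"program": program, "status": "running"}

    async def stop_session(self) -> None:
        return None


class FakeMCPInterface(SketchDebugInterface):
    async def start_session(self, program: str) -> dict:
        # An MCP tool call returns an envelope; the interface unwraps it.
        envelope = {
            "summary": f"Started {program}",
            "data": {"program": program, "status": "running"},
        }
        return envelope["data"]

    async def stop_session(self) -> None:
        return None


async def check_session_starts(interface: SketchDebugInterface, program: str) -> dict:
    """One test body, written once, valid for every interface."""
    session = await interface.start_session(program)
    assert session["status"] == "running"
    await interface.stop_session()
    return session


def run_against_all_interfaces(program: str) -> dict:
    interfaces = {"api": FakeAPIInterface(), "mcp": FakeMCPInterface()}
    return {
        name: asyncio.run(check_session_starts(interface, program))
        for name, interface in interfaces.items()
    }
```

In the real suite the loop in `run_against_all_interfaces` is what `@parametrize_interfaces` takes care of, so test authors only write the equivalent of `check_session_starts`.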

## The Shared Suite: Testing Debug Fundamentals

The **shared suite** is AIDB's language-agnostic test foundation that validates core debugging capabilities across all supported languages using normalized, programmatically generated test programs.

**Key Innovation:** Semantic markers that map identical logic to language-specific line numbers.

**Location:** `src/tests/aidb_shared/` (integration/ + e2e/)

**What it tests:**

- Debug primitives (breakpoints, stepping, variables)
- Control flow across all 3 languages (Python, JavaScript, Java)
- Zero duplication: One test → 6 execution paths (2 interfaces × 3 languages)

**For complete details**, see [DebugInterface](resources/debug-interface.md).
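
A small sketch shows how semantic markers can map the same logical step to different line numbers per language. The marker syntax `:bp.<name>:` and the function `marker_lines` are assumptions made for this illustration; the actual marker format used by the generated test programs is documented in the resource above.

```python
import re
from typing import Dict

# Hypothetical marker syntax: a comment containing :bp.<name>:
MARKER = re.compile(r":bp\.(\w+):")


def marker_lines(source: str) -> Dict[str, int]:
    """Map each marker name to the 1-based line number it appears on."""
    lines = {}
    for number, text in enumerate(source.splitlines(), start=1):
        match = MARKER.search(text)
        if match:
            lines[match.group(1)] = number
    return lines


PYTHON_SOURCE = """\
def main():
    x = 10  #:bp.assign_x:
    print(x)  #:bp.print_x:

main()
"""

JAVA_SOURCE = """\
public class Main {
    public static void main(String[] args) {
        int x = 10; //:bp.assign_x:
        System.out.println(x); //:bp.print_x:
    }
}
"""

if __name__ == "__main__":
    print(marker_lines(PYTHON_SOURCE))
    print(marker_lines(JAVA_SOURCE))
```

A test can then ask for the breakpoint at `assign_x` and let the lookup supply line 2 for Python and line 3 for Java, so the test body never hard-codes a language-specific line number.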

### When to Use the Shared Suite

**Use shared suite when:**

- Testing core debug operations (breakpoint, step, inspect)
- Validating adapter behavior across languages
- Ensuring language parity (all adapters work identically)

**Use framework tests when:**

- Testing framework-specific debugging (Django ORM, Express middleware)
- Validating launch.json configurations
- Testing real-world application patterns

## Test Organization

```
src/tests/
├── aidb_shared/               # ⭐ Shared suite: language-agnostic debug fundamentals
│   ├── integration/          # Core debug operations (breakpoints, stepping, variables)
│   └── e2e/                  # Complex workflows, parallel sessions
├── aidb/                      # Core API tests - organized by component
│   ├── adapters/             # Adapter-specific tests
│   ├── audit/                # Audit logging tests
│   ├── common/               # Common utilities tests
│   ├── dap/                  # DAP client tests
│   ├── models/               # Model tests
│   ├── resources/            # Resource management tests
│   ├── service/              # Service layer tests
│   └── session/              # Session management tests
├── aidb_mcp/                  # MCP server tests - organized by component
├── frameworks/                # Framework integration tests
│   ├── python/               # Flask, FastAPI, pytest
│   ├── javascript/           # Express, Jest
│   └── java/                 # Spring Boot, JUnit
├── _helpers/                  # Test helpers and utilities
├── _fixtures/                 # Shared fixtures
│   └── unit/                 # ⭐ Unit test infrastructure (see below)
└── _assets/                   # Test programs and data
    ├── framework_apps/       # Framework test applications
    └── test_programs/        # Generated programs for shared suite
```

### Unit Test Infrastructure

The centralized unit test infrastructure at `src/tests/_fixtures/unit/` provides reusable mocks, builders, and fixtures:

```
_fixtures/unit/
├── builders/           # DAPRequestBuilder, DAPResponseBuilder, DAPEventBuilder
├── dap/               # Transport, events, receiver mocks
├── session/           # Registry, lifecycle, state, child_manager mocks
├── adapter/           # Port, process, launch_orchestrator mocks
├── mcp/               # DebugService, MCPSessionContext mocks
├── conftest.py        # Master fixture re-exports
├── context.py         # mock_ctx, null_ctx, tmp_storage
└── assertions.py      # UnitAssertions class
```

**Usage Pattern:**

```python
# In domain conftest.py (e.g., src/tests/aidb/dap/unit/conftest.py)
from tests._fixtures.unit.conftest import *  # noqa: F401, F403
from tests._fixtures.unit.builders import DAPEventBuilder, DAPResponseBuilder

# In test file
def test_something(mock_ctx, mock_transport):
    event = DAPEventBuilder.stopped_event(reason="breakpoint")
    # ...
```

**Key Components:**

- **Builders** - Fluent API for DAP protocol objects (requests, responses, events)
- **mock_ctx** - Standard logging context mock with debug/info/warning/error methods
- **UnitAssertions** - DAP-specific assertion helpers
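
For readers unfamiliar with the builder idea, the sketch below shows what a fluent builder for a DAP `stopped` event can look like. `SketchStoppedEventBuilder` is hypothetical and uses a chained style; the project's `DAPEventBuilder` has its own API (for example `stopped_event(reason=...)` as shown above), so treat this only as an illustration of the pattern. The field names follow the Debug Adapter Protocol's event message shape.

```python
class SketchStoppedEventBuilder:
    """Hypothetical fluent builder for a DAP 'stopped' event dict."""

    def __init__(self) -> None:
        self._seq = 1
        self._body = {"reason": "breakpoint", "threadId": 1}

    def with_seq(self, seq: int) -> "SketchStoppedEventBuilder":
        self._seq = seq
        return self  # returning self is what makes the calls chainable

    def with_reason(self, reason: str) -> "SketchStoppedEventBuilder":
        self._body["reason"] = reason
        return self

    def with_thread(self, thread_id: int) -> "SketchStoppedEventBuilder":
        self._body["threadId"] = thread_id
        return self

    def build(self) -> dict:
        # Copy the body so later builder calls cannot mutate a built event.
        return {
            "seq": self._seq,
            "type": "event",
            "event": "stopped",
            "body": dict(self._body),
        }


STEP_EVENT = SketchStoppedEventBuilder().with_reason("step").with_thread(7).build()
```

Sensible defaults keep unit tests short: a test that only cares about the stop reason sets that one field and leaves the rest alone.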

### Test Execution Modes

Test suites run in different environments based on their requirements:

**Local-Only Suites** (no Docker):

- `cli` - CLI command tests
- `mcp` - MCP server unit/integration tests
- `core` - Core AIDB API tests
- `common` - Common utilities tests
- `logging` - Logging framework tests
- `ci_cd` - CI/CD workflow tests

**Docker Suites** (require containers):

- `shared` - Multi-language shared tests (parallel language containers)
- `frameworks` - Framework integration tests (parallel language containers)
- `launch` - Launch config tests (parallel language containers)

**Why the split?**

- **Local suites** test Python-only logic (handlers, validation, utils)
- **Docker suites** test multi-language scenarios
- Multi-language MCP functionality is tested in `shared`/`frameworks`/`launch`

**Running tests:**

```bash
./dev-cli test run -s mcp      # Local execution
./dev-cli test run -s shared   # Docker execution
```
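
The suite-to-environment split above can be written down as data, which is handy in tooling that needs to know whether Docker must be available before a run. This is a sketch only: `dev-cli` makes the real decision itself, and `execution_environment` is a hypothetical helper that restates the two lists above.

```python
# Suite names copied from the lists above.
LOCAL_SUITES = frozenset({"cli", "mcp", "core", "common", "logging", "ci_cd"})
DOCKER_SUITES = frozenset({"shared", "frameworks", "launch"})


def execution_environment(suite: str) -> str:
    """Return 'local' or 'docker' for a known suite name (hypothetical helper)."""
    if suite in LOCAL_SUITES:
        return "local"
    if suite in DOCKER_SUITES:
        return "docker"
    raise ValueError(f"unknown suite: {suite!r}")
```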

## Code Reuse: Don't Reinvent

**Always use existing infrastructure:**

- **Test Base Classes** - `BaseE2ETest`, `BaseIntegrationTest`, `FrameworkDebugTestBase`
- **Parametrization Decorators** - `@parametrize_interfaces`, `@parametrize_languages`
- **Helper Assertions** - `self.verify_bp`, `self.verify_exec`, `MCPAssertions`
- **Constants** - `StopReason`, `TestTimeouts`, `MCPTool`

**For complete details**, see [E2E Patterns](resources/e2e-patterns.md).

## Working Examples

**Study these real tests before writing new ones:**

### Framework Tests (E2E)

- **Python:** `test_flask_debugging.py`, `test_fastapi_debugging.py`, `test_pytest_debugging.py`
- **JavaScript:** `test_express_debugging.py`, `test_jest_debugging.py`
- **Java:** `test_springboot_debugging.py`, `test_junit_debugging.py`

### Core API Tests

- **Launch Variable Resolution:** `test_launch_variable_resolution.py`
- **Session Target Handling:** `test_session_target_handling.py`

**For complete file paths and patterns**, see [E2E Patterns](resources/e2e-patterns.md).

## Common Patterns

**For hypothetical examples illustrating common patterns**, see [E2E Patterns](resources/e2e-patterns.md).

**Key patterns covered:**

1. Basic E2E Test
1. Breakpoint Test with Markers
1. Dual-Launch Equivalence Test
1. MCP Response Validation

## When Creating New Tests

### Step 1: Choose Test Type

- **E2E?** Full workflow, real program, complete integration
- **Integration?** Component interactions, lifecycle management
- **Unit?** Specific function, edge case, error handling

### Step 2: Find Similar Test

Start with [E2E Patterns](resources/e2e-patterns.md), then look at:

- The Flask/Express framework tests
- Existing tests in the same directory
- Similar test scenarios in other languages

### Step 3: Copy Pattern, Adapt

Don't start from scratch:

1. Copy a working test
1. Adapt to your scenario
1. Use same helpers and assertions
1. Follow same structure

### Step 4: Use Existing Infrastructure

**Don't create:**

- New assertion helpers (use existing)
- New fixtures (check `conftest.py` files first)
- New constants (use `constants.py`)
- New base classes (inherit from existing)

**Do create:**

- Tests using existing patterns
- Scenario-specific test data
- Framework-specific fixtures (if needed)

## Performance Testing

**Current State:** No performance baselines exist yet

**Phase 1:** Establish baselines

- Analyze existing metrics/instrumentation
- Determine "healthy" latencies
- Document target times

**Phase 2:** Regression testing

- Monitor key operations (breakpoint set, variable inspect, step)
- Alert on degradation
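
Once baselines exist, a regression check could look roughly like the sketch below. Nothing here is implemented in AIDB: `measure_ms`, `is_regression`, the use of the median, and the 25% tolerance are all assumptions chosen for illustration.

```python
import statistics
import time
from typing import Callable


def measure_ms(operation: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of `operation` in milliseconds."""
    samples = []
    for _ in range(repeats):
        started = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - started) * 1000.0)
    # The median is less sensitive to a single slow outlier than the mean.
    return statistics.median(samples)


def is_regression(baseline_ms: float, measured_ms: float, tolerance: float = 0.25) -> bool:
    """Flag a measurement that exceeds the baseline by more than `tolerance`."""
    return measured_ms > baseline_ms * (1.0 + tolerance)
```

With a documented baseline of 100 ms, a measurement of 130 ms would be flagged and 120 ms would not.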

**For now:** Focus on functional correctness, not performance.

## Success Criteria

### Test Quality Checklist

- [ ] Test uses `@parametrize_interfaces` for MCP/API coverage
- [ ] Test inherits from appropriate base class
- [ ] Test uses helper assertions, not custom assertions
- [ ] Test validates content accuracy, not just structure
- [ ] MCP tests check efficiency (no bloated payloads)
- [ ] Test has clear docstring explaining what it validates
- [ ] Test follows working examples
- [ ] Test is in correct directory (e2e/integration/unit)

### Framework Test Checklist

- [ ] Inherits from `FrameworkDebugTestBase`
- [ ] Implements `test_launch_via_api()`
- [ ] Implements `test_launch_via_vscode_config()`
- [ ] Implements `test_dual_launch_equivalence()`
- [ ] Sets `framework_name` attribute
- [ ] Uses simple smoke tests (no deep framework testing)
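
Part of this checklist can be verified mechanically. The sketch below is a hypothetical review aid, not existing project tooling: `missing_checklist_items` only looks for the three method names and the `framework_name` attribute listed above, and it does not check inheritance from `FrameworkDebugTestBase` because that class is not importable in a standalone example.

```python
from typing import List

# Method names copied from the checklist above.
REQUIRED_METHODS = (
    "test_launch_via_api",
    "test_launch_via_vscode_config",
    "test_dual_launch_equivalence",
)


def missing_checklist_items(test_class: type) -> List[str]:
    """List the required methods and attributes a test class lacks."""
    missing = [
        name
        for name in REQUIRED_METHODS
        if not callable(getattr(test_class, name, None))
    ]
    if not getattr(test_class, "framework_name", None):
        missing.append("framework_name")
    return missing


class IncompleteFlaskTest:
    """Stand-in test class with only one of the three launch tests."""

    framework_name = "flask"

    def test_launch_via_api(self):
        pass
```

Here `missing_checklist_items(IncompleteFlaskTest)` reports the two launch tests that still need to be written.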

## Investigating Test Failures

**CRITICAL:** When tests fail, check logs BEFORE attempting fixes.

**See:** **[Debugging Failures](resources/debugging-failures.md)** for log locations, investigation workflow, and common patterns.

**For CI test failures**, use the **ci-cd-workflows** skill's troubleshooting guide.

## Resources

| Resource                                              | Content                                              |
| ----------------------------------------------------- | ---------------------------------------------------- |
| [E2E Patterns](resources/e2e-patterns.md)             | Test patterns, markers, code reuse, working examples |
| [Framework Tests](resources/framework-tests.md)       | Dual-launch pattern, Flask/Express examples          |
| [DebugInterface](resources/debug-interface.md)        | Unified API abstraction, shared suite architecture   |
| [Debugging Failures](resources/debugging-failures.md) | Log locations, investigation workflow, common issues |

**Test Infrastructure:** `src/tests/` (see \_fixtures/, \_helpers/ for core components)

## Getting Started

1. **Read CONTEXT.md:** `wip/test-implementation-backlog/CONTEXT.md`
1. **Study working examples:** Flask (`test_flask_debugging.py`) and Express (`test_express_debugging.py`)
1. **Choose a test to implement:** Start with E2E (highest ROI)
1. **Copy a working test:** Don't start from scratch
1. **Adapt to your scenario:** Use same patterns, different data
1. **Validate:** Run test, ensure it passes with both MCP and API

## Questions?

**Internal Documentation**:

- `src/tests/` - Test infrastructure (see \_fixtures/, \_helpers/)
- `docs/developer-guide/overview.md` - System architecture

**Code References**:

- **DAP Protocol:** See `src/aidb/dap/protocol/` (fully typed, types.py + requests.py + responses.py + events.py)
- **Test Infrastructure:** See `src/tests/_helpers/` and `src/tests/_fixtures/`
- **Working Examples:** See Flask/Express framework tests

______________________________________________________________________

**Remember:**

- E2E first, validate content accuracy
- Use shared suite for debug fundamentals, framework tests for integration
- Keep framework tests simple (no framework internals)
- Always use the DebugInterface abstraction (zero duplication)