# perform-sweep

> Design, configure, launch, and analyze ablation sweeps for GRPO training. Use for hypothesis testing, hyperparameter experiments, and systematic comparisons.

- Author: ben
- Repository: bglick13/diplomacy-v2
- Version: 20251228134550
- Stars: 1
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/bglick13/diplomacy-v2
- Web: https://mule.run/skillshub/@@bglick13/diplomacy-v2~perform-sweep:20251228134550

---

---
name: perform-sweep
description: Design, configure, launch, and analyze ablation sweeps for GRPO training. Use for hypothesis testing, hyperparameter experiments, and systematic comparisons.
---

# Perform Sweep

End-to-end workflow for running ablation experiments on the Diplomacy GRPO training pipeline.

## Quick Reference

| Phase | Action | Command |
|-------|--------|---------|
| **Configure** | Create sweep.yaml | See [YAML Reference](yaml-reference.md) |
| **Validate** | Dry run | `python scripts/launch_sweep.py --dry-run` |
| **Info** | Show config | `python scripts/launch_sweep.py --info` |
| **Launch** | Start sweep | `python scripts/launch_sweep.py ` |
| **Status** | Check progress | `python scripts/launch_sweep.py --status` |
| **List** | List all sweeps | `python scripts/launch_sweep.py --list` |
| **Analyze** | Compare results | Use `experiment-analysis` skill |

## Workflow

### 1. Hypothesis Design

- Review recent experiments in `experiments/experiment-tracker.md`
- Identify one variable to test (e.g., horizon length, scoring function)
- Predict expected outcome
- Document reasoning in sweep.yaml `hypothesis` field

### 2. YAML Configuration

Create `experiments/sweeps//sweep.yaml`:

```yaml
metadata:
  name: "my-ablation"
  description: "Testing hypothesis X"
  hypothesis: "Longer horizons should improve strategic play"
  experiment_tag_prefix: "my-ablation"

defaults:
  total_steps: 100

runs:
  A:
    name: "control"
    description: "Baseline configuration"
    config:
      experiment_tag: "${metadata.experiment_tag_prefix}-A"
  B:
    name: "treatment"
    description: "With longer horizon"
    config:
      rollout_horizon_years: 8
      experiment_tag: "${metadata.experiment_tag_prefix}-B"
```

See [YAML Reference](yaml-reference.md) for full schema.

### 3. Validate Configuration

```bash
# Show sweep info
python scripts/launch_sweep.py experiments/sweeps// --info

# Dry run (validates config, shows what would run)
python scripts/launch_sweep.py experiments/sweeps// --dry-run
```

### 4. Launch and Monitor

```bash
# Launch (fire-and-forget - runs in cloud)
python scripts/launch_sweep.py experiments/sweeps//

# Check status anytime
python scripts/launch_sweep.py experiments/sweeps// --status

# List all sweeps
python scripts/launch_sweep.py --list
```

### 5. Analysis

After sweep completes, use the `experiment-analysis` skill:

```bash
# Full analysis for each run
uv run python .claude/skills/experiment-analysis/analyze_elo.py 

# Compare in WandB
# Filter by experiment_tag_prefix (e.g., "my-ablation")
```

## Key Features

- **Fire-and-forget**: Launch and close laptop - sweep runs in Modal cloud
- **Auto-resume**: If Modal times out (24hr max), sweep automatically respawns
- **Sequential execution**: Runs one training at a time (infra constraint)
- **Progress tracking**: State saved after each run for recovery

## Example Sweeps

See existing sweeps in `experiments/sweeps/`:

- `longer-horizon-inverted-weight-ablation/` - 2x2 ablation on horizon and scoring

## Integration

- Use `experiment-analysis` skill for post-sweep metrics analysis
- Results logged to WandB with `experiment_tag` for filtering
- Document findings in sweep directory's `results.md`
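
To make the `sweep.yaml` example from step 2 concrete, here is a minimal Python sketch of how `${metadata.experiment_tag_prefix}` placeholders and `defaults` might be expanded into one config per run. This is not the actual logic of `scripts/launch_sweep.py`, which is not shown here: the helper names (`lookup`, `interpolate`, `resolve_runs`) are hypothetical, and the assumption that `defaults` are merged under each run's `config` (with the run's values winning) should be checked against the YAML Reference. The sweep is written as a plain dict so the sketch needs no YAML parser.

```python
import re
from copy import deepcopy

# Matches ${dotted.path} placeholders such as ${metadata.experiment_tag_prefix}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def lookup(root, dotted_path):
    """Walk a nested dict along a dotted path like 'metadata.name'."""
    node = root
    for key in dotted_path.split("."):
        node = node[key]
    return node


def interpolate(value, root):
    """Recursively replace ${...} placeholders in strings, dicts, and lists."""
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: str(lookup(root, m.group(1))), value)
    if isinstance(value, dict):
        return {k: interpolate(v, root) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, root) for v in value]
    return value


def resolve_runs(sweep):
    """Build one flat config per run (assumed semantics: defaults, then run overrides)."""
    resolved = {}
    for run_id, run in sweep["runs"].items():
        config = deepcopy(sweep.get("defaults", {}))
        config.update(run.get("config", {}))
        resolved[run_id] = interpolate(config, sweep)
    return resolved


# Same content as the sweep.yaml example, expressed as a dict.
sweep = {
    "metadata": {
        "name": "my-ablation",
        "experiment_tag_prefix": "my-ablation",
    },
    "defaults": {"total_steps": 100},
    "runs": {
        "A": {
            "name": "control",
            "config": {"experiment_tag": "${metadata.experiment_tag_prefix}-A"},
        },
        "B": {
            "name": "treatment",
            "config": {
                "rollout_horizon_years": 8,
                "experiment_tag": "${metadata.experiment_tag_prefix}-B",
            },
        },
    },
}

resolved = resolve_runs(sweep)
for run_id, config in resolved.items():
    print(run_id, config)
```

Under these assumptions, run B differs from run A only in `rollout_horizon_years` and its tag, which is a quick way to confirm that a sweep changes a single variable before launching it.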