# openevolve-experiment

> Set up an evolutionary optimization experiment using the openevolve framework. Generates `config.yaml`, `evaluator.py`, and `initial_program.py` based on a user's optimization problem.

- Author: Erick Eduardo Ramirez Torres
- Repository: eramireztorres/google_adk_chatbot
- Version: 20260131143700
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/eramireztorres/google_adk_chatbot
- Web: https://mule.run/skillshub/@@eramireztorres/google_adk_chatbot~openevolve-experiment:20260131143700

---

---
name: openevolve-experiment
description: Set up an evolutionary optimization experiment using the openevolve framework. Generates `config.yaml`, `evaluator.py`, and `initial_program.py` based on a user's optimization problem.
---

# Goal
Generate the required configuration and code files (`config.yaml`, `evaluator.py`, `initial_program.py`) to run an OpenEvolve evolutionary optimization experiment.

# Usage
Use this skill when the user asks to "set up an openevolve experiment", "optimize code using evolution", or "create an evolutionary optimizer" for a specific task.

# Instructions

## 1. Analyze the Problem
Determine the nature of the optimization task:
*   **Small Modifications** (e.g., tuning parameters, optimizing specific functions):
    *   Use `diff_based_evolution: true`.
    *   The LLM will emit diff blocks to edit parts of the code/text.
*   **Full Rewrites** (e.g., rewriting prompts, changing entire algorithms):
    *   Use `diff_based_evolution: false`.
    *   The LLM will rewrite the entire text/file.

## 2. Generate `initial_program.py` (or `initial_prompt.txt`)
Create a baseline implementation.
*   **For Code (Diff-based)**:
    *   Must be a valid Python file.
    *   **CRITICAL**: If `diff_based_evolution: false`, you MUST wrap the code to be evolved with `# EVOLVE-BLOCK-START` and `# EVOLVE-BLOCK-END` to define the rewrite target.
    *   Example (for `diff_based_evolution: false`):
        ```python
        # EVOLVE-BLOCK-START
        def slow_function(x):
            time.sleep(1)
            return x * 2
        # EVOLVE-BLOCK-END
        ```
*   **For Text/Prompts (Full rewrite)**:
    *   Just the raw text content to be optimized.

## 3. Generate `evaluator.py`
Create the scoring logic.
*   **Function Signature**: Must implement `evaluate(program_path) -> dict`.
    *   `program_path` is the absolute path to the candidate program file.
*   **Return Value**:
    *   A dictionary containing at least `"combined_score"`.
    *   **Score Direction**: Higher is BETTER. Normalize your metrics so that the goal is maximization.
    *   Can include other metrics like `"accuracy"`, `"speed"`, `"cost"`, etc., for the MAP-Elites grid.
*   **Robustness**:
    *   Import the candidate program dynamically using `importlib`.
    *   Wrap execution in `try/except` blocks.
    *   Implement timeouts (use `concurrent.futures` or similar) to prevent infinite loops in bad candidates.
    *   If the candidate fails, return `{"combined_score": 0.0, "error": "..."}`.
*   **Complex Logic (e.g., RAG/Evidently)**:
    *   The `evaluate` function should import and call your complex evaluation logic (e.g., `from my_rag_eval import run_evaluation`).
    *   It MUST verify the result is a number and wrap it in the required dictionary format.
    *   Do NOT put all logic inside `evaluate` if it requires massive imports; keep it modular.

## 4. Generate `config.yaml`
Define the evolution hyperparameters.
*   **Basic Config**:
    ```yaml
    max_iterations: 50
    diff_based_evolution: true  # or false
    max_code_length: 10000
    max_code_length: 10000
    log_level: INFO  # DEBUG, INFO, WARNING, ERROR, CRITICAL
    ```
*   **LLM Config (STRICT)**:
    *   MUST be nested under `llm`.
    *   MUST include `api_base` and a non-empty `models` list.
    *   Do NOT use top-level `llm_model` or `parameters`.
    *   WARNING: If `models` is empty or mis-specified, the LLM ensemble will be empty and generation will fail.
    ```yaml
    llm:
      api_base: "https://api.openai.com/v1"
      models:
        - name: "gpt-4.1-mini"
          weight: 1.0
      temperature: 0.7
      max_tokens: 16000
      timeout: 120
      retries: 3
    ```
*   **Prompt Config (REQUIRED)**:
    *   MUST be nested under `prompt`.
    *   MUST include `system_message`.
    *   `include_artifacts`: `true` (highly recommended for error feedback).
    ```yaml
    prompt:
      system_message: "You are an expert optimizer..."
      include_artifacts: true
    ```
*   **Database Config**:
    *   `feature_dimensions`: Choose 2-3 metric names returned by your evaluator (e.g., `["complexity", "speed"]`).
    *   `exploitation_ratio`: Set low (e.g., 0.2) to favor diversity.
    *   `population_size`: Typicall 50-100.
*   **Evaluator Config**:
    *   `timeout`: Max seconds per evaluation.
    *   `parallel_evaluations`: Number of parallel workers (e.g., 4).
    *   `max_tasks_per_child`: Restart workers occasionally to prevent memory leaks (e.g., 10).
    *   WARNING: If your evaluator uses GPU-backed metrics (e.g., sentence-transformers),
        forked multiprocessing can crash. Prefer LLM-only metrics or force CPU.

## 5. Generate `validate_setup.py`
Create a script to verify the setup before running evolution.
*   **Purpose**: Test that `evaluator.py` can successfully evaluate `initial_program.py` (or the initial text).
*   **Requirements**:
    *   Import `evaluate` from `evaluator`.
    *   Run `evaluate()` on the initial program path.
    *   Print the result.
    *   Assert that the result contains `combined_score` and no errors.

```python
import sys
import os
import importlib.util

def validate():
    print("Validating setup...")
    if not os.path.exists("evaluator.py"): sys.exit("evaluator.py missing")
    
    spec = importlib.util.spec_from_file_location("evaluator", "evaluator.py")
    eval = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(eval)
    
    try:
        res = eval.evaluate(os.path.abspath("initial_program.py"))
        print(f"Result: {res}")
        if isinstance(res, dict):
            score = res.get("combined_score")
        else: # EvaluationResult object
            score = res.metrics.get("combined_score")
            
        if score is None: raise ValueError("No combined_score in result")
        print("✅ Setup Valid")
    except Exception as e:
        print(f"❌ Validation Failed: {e}")
        sys.exit(1)

if __name__ == "__main__":
    validate()
```


# Example Output Structures

## Evaluator Template
```python
import importlib.util
import time

def evaluate(program_path):
    try:
        # Load candidate
        spec = importlib.util.spec_from_file_location("candidate", program_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        
        # Test candidate
        start = time.time()
        result = module.solve_problem(test_input) # Assuming 'solve_problem' is the entry point
        duration = time.time() - start
        
        # Prepare artifacts (optional but recommended)
        artifacts = {
            "execution_time": duration,
            "stdout": "Success"
        }

        # Return dict or EvaluationResult
        return {
            "combined_score": score, # CRITICAL: Must be present
            "duration": duration,
            "complexity": len(open(program_path).read()),
            "artifacts": artifacts
        }
    except Exception as e:
        import traceback
        return {
            "combined_score": 0.0, 
            "error": str(e),
            "artifacts": {"traceback": traceback.format_exc()}
        }
```

## Complex Evaluator Template (e.g. RAG)
```python
import importlib.util
# Import your specific evaluation logic helper 
# (assuming it exists or is generated alongside)
from my_rag_eval import run_evaluation_logic 

def evaluate(program_path):
    try:
        # 1. Load the candidate program (RAG pipeline)
        spec = importlib.util.spec_from_file_location("candidate", program_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        
        # 2. Run the complex evaluation
        # Pass the module or functions to your evaluator
        score, detailed_metrics = run_evaluation_logic(module)
        
        # 3. Return compatible result
        return {
            "combined_score": float(score),
            "metrics": detailed_metrics
        }
    except Exception as e:
        return {"combined_score": 0.0, "error": str(e)}
```

## Config Template
```yaml
max_iterations: 20
diff_based_evolution: true

llm:
  api_base: "https://api.openai.com/v1"
  models:
    - name: "gpt-4.1-mini"
      weight: 1.0
  temperature: 0.7
  max_tokens: 16000
  timeout: 120
  retries: 3

prompt:
  system_message: "You are an expert optimizer. Improve the code to run faster."
  include_artifacts: true

database:
  feature_dimensions: ["complexity", "duration"]

evaluator:
  timeout: 5
  parallel_evaluations: 2
  max_tasks_per_child: 10
```