# using-braintrust

> Enables AI agents to use Braintrust for LLM evaluation, logging, and observability. Includes scripts for querying logs with SQL, running evals, and logging data.

- Author: Ankur Goyal
- Repository: braintrustdata/braintrust-claude-plugin
- Version: 20260103150725
- Stars: 12
- Forks: 5
- Last Updated: 2026-02-06
- Source: https://github.com/braintrustdata/braintrust-claude-plugin
- Web: https://mule.run/skillshub/@@braintrustdata/braintrust-claude-plugin~using-braintrust:20260103150725

---

---
name: using-braintrust
description: |
  Enables AI agents to use Braintrust for LLM evaluation, logging, and observability.
  Includes scripts for querying logs with SQL, running evals, and logging data.
version: 1.0.0
---

# Using Braintrust

Braintrust is a platform for evaluating, logging, and monitoring LLM applications.

## Listing projects

Use `scripts/list_projects.py` to see all available projects:

```bash
uv run /path/to/scripts/list_projects.py
```

## Querying logs with SQL

Use the `query_logs.py` script to run SQL queries against Braintrust logs.

**Always share the SQL query you used** when reporting results, so the user understands what was executed.

**Script location:** `scripts/query_logs.py` (relative to this file)

**Run from the user's project directory** (where `.env` with `BRAINTRUST_API_KEY` exists):

```bash
uv run /path/to/scripts/query_logs.py --project "Project Name" --query "SQL_QUERY"
```

### Common queries

**Count logs from last 24 hours:**

```sql
SELECT count(*) as count FROM logs WHERE created > now() - interval 1 day
```

**Get recent logs:**

```sql
SELECT input, output, created FROM logs ORDER BY created DESC LIMIT 10
```

**Filter by metadata:**

```sql
SELECT input, output FROM logs WHERE metadata.user_id = 'user123' LIMIT 20
```

**Filter by time range:**

```sql
SELECT * FROM logs WHERE created > now() - interval 7 day LIMIT 50
```

**Aggregate by field:**

```sql
SELECT metadata.model, count(*) as count FROM logs GROUP BY metadata.model
```

**Group by hour:**

```sql
SELECT hour(created) as hr, count(*) as count FROM logs GROUP BY hour(created)
```

### SQL quirks in Braintrust

- **Time functions**: Use `hour()`, `day()`, `month()`, `year()` instead of `date_trunc()`
  - ✅ `hour(created)`
  - ❌ `date_trunc('hour', created)`
- **Intervals**: Use `interval 1 day`, `interval 7 day`, `interval 1 hour` (no quotes, singular unit)
- **Nested fields**: Use dot notation: `metadata.user_id`, `scores.Factuality`, `metrics.duration`
- **Table name**: Always use `FROM logs` (the script handles project scoping)

### SQL reference

**Operators:**

- `=`, `!=`, `>`, `<`, `>=`, `<=`
- `IS NULL`, `IS NOT NULL`
- `LIKE 'pattern%'`
- `AND`, `OR`, `NOT`

**Aggregations:**

- `count(*)`, `count(field)`
- `avg(field)`, `sum(field)`
- `min(field)`, `max(field)`

**Time filters:**

- `created > now() - interval 1 day`
- `created > now() - interval 7 day`
- `created > now() - interval 1 hour`

## Logging data

Use `scripts/log_data.py` to log data to a project:

```bash
uv run /path/to/scripts/log_data.py --project "Project Name" --input "query" --output "response"
```

With metadata:

```bash
--input "query" --output "response" --metadata '{"user_id": "123"}'
```

Batch from JSON:

```bash
--data '[{"input": "a", "output": "b"}, {"input": "c", "output": "d"}]'
```

## Running evaluations

Use `scripts/run_eval.py` to run evaluations:

```bash
uv run /path/to/scripts/run_eval.py --project "Project Name" --data '[{"input": "test", "expected": "test"}]'
```

From file:

```bash
--data-file test_cases.json --scorer factuality
```

## Setup

Create a `.env` file in your project directory:

```
BRAINTRUST_API_KEY=your-api-key-here
```

## Writing evaluation code (SDK)

For custom evaluation logic, use the SDK directly.

**IMPORTANT**: First argument to `Eval()` is the project name (positional).

```python
import braintrust
from autoevals import Factuality

braintrust.Eval(
    "My Project",  # Project name (required, positional)
    data=lambda: [{"input": "What is 2+2?", "expected": "4"}],
    task=lambda input: my_llm_call(input),
    scores=[Factuality],
)
```

**Common mistakes:**

- ❌ `Eval(project_name="My Project", ...)` - Wrong!
- ❌ `Eval(name="My Project", ...)` - Wrong!
- ✅ `Eval("My Project", data=..., task=..., scores=...)` - Correct!

## Writing logging code (SDK)

```python
import braintrust

logger = braintrust.init_logger(project="My Project")
logger.log(input="query", output="response", metadata={"user_id": "123"})
logger.flush()  # Always flush!
```

## Common issues

- **"Eval() got an unexpected keyword argument 'project_name'"**: Use positional argument
- **Logs not appearing**: Call `logger.flush()` after logging
- **Authentication errors**: Create `.env` file with `BRAINTRUST_API_KEY=your-key`
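
The authentication issue listed under Common issues can be caught before running any script. The following is a minimal sketch, assuming the `.env` file uses plain `KEY=VALUE` lines; `read_env_file` and `has_api_key` are hypothetical helpers written for illustration and are not part of the plugin or the Braintrust SDK.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_api_key(env_path=".env"):
    """Return True if BRAINTRUST_API_KEY is set in the environment or in env_path."""
    if os.environ.get("BRAINTRUST_API_KEY"):
        return True
    return bool(read_env_file(env_path).get("BRAINTRUST_API_KEY"))
```

Calling `has_api_key()` from the user's project directory before invoking a script gives a clear signal about a missing key. This only checks that a key is present, not that Braintrust accepts it.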