# evals

> Use when adding automated tests for LLM contracts, schema compliance, or regression checks.

- Author: Andreas Göldi
- Repository: angoeldi/conversationalGSGcourt
- Version: 20260201013324
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/angoeldi/conversationalGSGcourt
- Web: https://mule.run/skillshub/@@angoeldi/conversationalGSGcourt~evals:20260201013324

---

---
name: evals
description: Use when adding automated tests for LLM contracts, schema compliance, or regression checks.
---

# Evals skill

## Objective
Make LLM behavior measurable and regressions obvious.

## Trigger examples
- "Decision parser started producing invalid JSON"
- "Builder scenario output violates schema"

## Workflow
1. Test contracts at the boundary
   - Validate every JSON payload the LLM returns against the Zod schemas in `packages/shared`.

2. Keep golden test corpora
   - Store small fixtures of (input → expected JSON) in `apps/server/src/evals/fixtures`.

3. Add non-LLM property tests where possible
   - Engine invariants: populations never go negative, and treasury updates stay consistent.

4. Run tests
   - `pnpm -r test`
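
The engine invariants from step 3 can be checked without any LLM in the loop. The following is a minimal sketch, not the repository's code: `EngineState`, `TurnInput`, `applyTurn`, and `checkInvariants` are hypothetical names, and `applyTurn` is only a stand-in for the real engine step. A seeded generator keeps the random cases identical on every run.

```typescript
// Hypothetical engine types; the real ones live elsewhere in the repository.
interface EngineState {
  population: number;
  treasury: number;
}

interface TurnInput {
  populationDelta: number;
  income: number;
  expenses: number;
}

type Step = (state: EngineState, input: TurnInput) => EngineState;

// Stand-in for the real engine step, included only so the sketch runs.
const applyTurn: Step = (state, input) => ({
  population: Math.max(0, state.population + input.populationDelta),
  treasury: state.treasury + input.income - input.expenses,
});

// Small seeded PRNG (mulberry32) so the generated cases never change between runs.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function randomInt(rand: () => number, min: number, max: number): number {
  return min + Math.floor(rand() * (max - min + 1));
}

// Runs `cases` random turns and returns every invariant violation found.
// An empty array means all invariants held.
function checkInvariants(step: Step, seed: number, cases: number): string[] {
  const rand = mulberry32(seed);
  const violations: string[] = [];
  let state: EngineState = { population: 1000, treasury: 500 };

  for (let i = 0; i < cases; i++) {
    const input: TurnInput = {
      populationDelta: randomInt(rand, -300, 200),
      income: randomInt(rand, 0, 200),
      expenses: randomInt(rand, 0, 200),
    };
    const next = step(state, input);

    if (next.population < 0) {
      violations.push(`case ${i}: negative population ${next.population}`);
    }
    const expectedTreasury = state.treasury + input.income - input.expenses;
    if (next.treasury !== expectedTreasury) {
      violations.push(`case ${i}: treasury ${next.treasury}, expected ${expectedTreasury}`);
    }
    state = next;
  }
  return violations;
}

// Fail loudly if any invariant is broken.
const found = checkInvariants(applyTurn, 42, 500);
if (found.length > 0) {
  throw new Error(`engine invariants violated:\n${found.join("\n")}`);
}
```

Passing the step function in as a parameter lets the same check run against the real engine and against deliberately broken variants, which is a quick way to confirm the test can actually fail.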

## Output expectations
- Tests fail loudly on schema violations.
- Tests are deterministic (no live API calls in CI by default).
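
Both expectations can be met by replaying canned model output through a golden-fixture runner. The sketch below is illustrative and makes several assumptions: the `Decision` shape, the fixture contents, and every function name are hypothetical, and `validateDecision` is a hand-written stand-in so the example has no dependencies. In the repository, that check would presumably be a call to `safeParse` on the relevant Zod schema from `packages/shared`, and the fixtures would be loaded from `apps/server/src/evals/fixtures` rather than declared inline.

```typescript
// Hypothetical output shape; the real one is defined by the shared Zod schemas.
interface Decision {
  action: string;
  amount: number;
}

// Stand-in for a schema check: returns a list of problems, empty when valid.
function validateDecision(value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["expected a JSON object"];
  }
  const record = value as Record<string, unknown>;
  const problems: string[] = [];
  if (typeof record.action !== "string") {
    problems.push("action must be a string");
  }
  if (typeof record.amount !== "number" || !Number.isFinite(record.amount)) {
    problems.push("amount must be a finite number");
  }
  return problems;
}

// Kept synchronous for brevity; a real LLM client would be asynchronous.
type Model = (input: string) => string;

interface Fixture {
  name: string;
  input: string;
  expected: Decision;
}

// Inline fixtures for illustration only.
const fixtures: Fixture[] = [
  { name: "raise-taxes", input: "Raise taxes by 5", expected: { action: "set_tax", amount: 5 } },
  { name: "fund-army", input: "Spend 200 on the army", expected: { action: "fund_army", amount: 200 } },
];

// Stub model: replays recorded responses, so no network call is ever made.
function makeStubModel(responses: Record<string, string>): Model {
  return (input) => {
    const canned = responses[input];
    if (canned === undefined) {
      throw new Error(`no canned response for: ${input}`);
    }
    return canned;
  };
}

// Throws on invalid JSON, on a schema violation, or on a mismatch with the golden value.
function runFixture(model: Model, fixture: Fixture): void {
  const raw = model(fixture.input);
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error(`${fixture.name}: output is not valid JSON`);
  }
  const problems = validateDecision(parsed);
  if (problems.length > 0) {
    throw new Error(`${fixture.name}: schema violation: ${problems.join("; ")}`);
  }
  const actual = parsed as Decision;
  if (actual.action !== fixture.expected.action || actual.amount !== fixture.expected.amount) {
    throw new Error(`${fixture.name}: expected ${JSON.stringify(fixture.expected)}, got ${raw}`);
  }
}

const stubModel = makeStubModel({
  "Raise taxes by 5": '{"action":"set_tax","amount":5}',
  "Spend 200 on the army": '{"action":"fund_army","amount":200}',
});

for (const fixture of fixtures) {
  runFixture(stubModel, fixture);
}
```

Because the model is injected, CI can run with the stub while an opt-in job swaps in a live client against the same fixtures.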