# langfuse-dataset-setup > Use when the user wants to set up a new Langfuse dataset with evaluation dimensions, score configs, and judge prompts (LLM-as-judge or human review). - Author: MB9012 - Repository: mberto10/claude-marketplace - Version: 20260206230906 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-07 - Source: https://github.com/mberto10/claude-marketplace - Web: https://mule.run/skillshub/@@mberto10/claude-marketplace~langfuse-dataset-setup:20260206230906 --- --- name: langfuse-dataset-setup description: Use when the user wants to set up a new Langfuse dataset with evaluation dimensions, score configs, and judge prompts (LLM-as-judge or human review). --- # Langfuse Dataset Setup Guide the user through dataset creation and evaluation configuration. ## Step 1: Gather Requirements Ask: - Dataset purpose (regression, A/B, golden set, edge cases) - Evaluation dimensions (accuracy, helpfulness, relevance, safety, tone, completeness) - Evaluation method (LLM-as-judge, human review, or both) - Dataset name, description, and target size ## Step 2: Create Dataset ```bash python3 ~/.codex/skills/langfuse-dataset-management/scripts/dataset_manager.py \ create --name "" --description "" \ --metadata '{"purpose": "", "evaluation_method": "", "dimensions": [""]}' ``` ## Step 3: Configure Score Types Have the user create score configs in Langfuse settings. See `references/score-configs.md` for defaults. ## Step 4: Create Judge Prompts (if LLM-as-judge) Use `langfuse-prompt-management` and templates from `references/judge-prompts.md`. ## Step 5: Confirm Next Steps - If the dataset will be populated from traces, use `langfuse-data-retrieval` + `langfuse-dataset-management`. - If the user has test cases, convert and add them as dataset items.