Generate labeled training examples for the BERT canonicalization classifier. Use it to collect seed data from public APIs (OpenAPI specs) or LLM tool-use datasets (ToolBench, API-Bank), or to create synthetic variations. Outputs stratified JSONL in which each example carries action, resource_type, and sensitivity labels.
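
As a rough illustration of the output shape, the sketch below stratifies a few hand-written examples by their label combination and serializes them as JSONL. Only the three label names (action, resource_type, sensitivity) come from the description above; the `text` field, the label values, and the helper names `stratified_sample` and `to_jsonl` are hypothetical, and the real generator may stratify differently.

```python
import json
import random
from collections import defaultdict

# The three label fields named in the description; everything else is assumed.
LABEL_KEYS = ("action", "resource_type", "sensitivity")


def stratified_sample(examples, per_stratum, seed=0):
    """Keep at most `per_stratum` examples for each label combination."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for ex in examples:
        # A stratum is one (action, resource_type, sensitivity) combination.
        buckets[tuple(ex[k] for k in LABEL_KEYS)].append(ex)
    sampled = []
    for key in sorted(buckets):
        group = buckets[key]
        rng.shuffle(group)
        sampled.extend(group[:per_stratum])
    return sampled


def to_jsonl(examples):
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(ex, sort_keys=True) for ex in examples)


# Hypothetical seed examples; the `text` field and label values are illustrative.
examples = [
    {"text": "GET /users/{id}", "action": "read", "resource_type": "user", "sensitivity": "high"},
    {"text": "GET /users", "action": "read", "resource_type": "user", "sensitivity": "high"},
    {"text": "DELETE /files/{id}", "action": "delete", "resource_type": "file", "sensitivity": "medium"},
    {"text": "POST /messages", "action": "create", "resource_type": "message", "sensitivity": "low"},
]

sampled = stratified_sample(examples, per_stratum=1)
print(to_jsonl(sampled))
```

Capping each stratum at the same size keeps frequent label combinations from dominating the seed set, which is one common reading of "stratified" here.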