# extraction-skill > Extract structured data from text using JSON schemas or NER-style entities; use when the user asks to extract fields, entities, or structured JSON from unstructured input. - Author: runoob - Repository: yasv1231/career-path-agent - Version: 20260201145154 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/yasv1231/career-path-agent - Web: https://mule.run/skillshub/@@yasv1231/career-path-agent~extraction-skill:20260201145154 --- --- name: extraction-skill description: Extract structured data from text using JSON schemas or NER-style entities; use when the user asks to extract fields, entities, or structured JSON from unstructured input. --- # Extraction Skill ## Goal Convert `last_answer` into structured JSON for downstream use (storage, tool calls, evaluation). ## Node Context - Node: `extract_node` - Input: `last_answer` - Output: `extracted` ## Workflow 1. Read target schema: if provided, follow exactly; if not, use the default schema below. 2. Extract entities/fields: map raw text to schema fields. 3. Normalize values (dates, numbers, units, names). 4. Validate against schema (required fields, types, enums). 5. If invalid, **retry** extraction with corrections (up to 2 retries). 6. Return JSON only unless the user asked for explanations. ## JSON Extraction Pattern If a schema is given, output JSON that matches it exactly. Otherwise use: ``` { "items": [ { "field": "...", "value": "...", "normalized": "...", "evidence": "short quote" } ] } ``` ## Default Schema (for structured evaluation) Use this schema when none is provided. It is optimized for downstream storage, tool calls, and evaluation. ``` { "type": "object", "required": ["schema_version", "task", "intent", "entities", "constraints", "actions", "confidence"], "properties": { "schema_version": { "type": "string", "enum": ["v1"] }, "task": { "type": "string" }, "intent": { "type": "string", "enum": ["ask", "request", "inform", "confirm", "refine", "other"] }, "entities": { "type": "array", "items": { "type": "object", "required": ["type", "text"], "properties": { "type": { "type": "string" }, "text": { "type": "string" }, "normalized": { "type": "string" } } } }, "constraints": { "type": "array", "items": { "type": "string" } }, "actions": { "type": "array", "items": { "type": "object", "required": ["action", "target"], "properties": { "action": { "type": "string" }, "target": { "type": "string" }, "params": { "type": "object" } } } }, "confidence": { "type": "number", "minimum": 0, "maximum": 1 } } } ``` ## Validation + Retry - Run schema validation after extraction. - If invalid: fix missing fields, type mismatches, enum violations, or empty required fields. - Retry up to **2 times**; if still invalid, return the closest valid JSON and mark low confidence (<= 0.4). ## NER Pattern For entity extraction, use: ``` { "entities": [ { "type": "PERSON|ORG|LOC|DATE|MONEY|TITLE|SKILL|OTHER", "text": "...", "normalized": "...", "start": 0, "end": 0, "confidence": 0.0 } ] } ``` ## Normalization Rules - Dates: ISO 8601. - Money: numeric plus currency code when possible. - Titles/roles: canonical casing (e.g., "Software Engineer"). - Skills: singular form; keep original if ambiguous. ## Evidence Rules - Keep evidence quotes under 12 words. - If the text is long, select the shortest unique span. ## Uncertainty Handling - If unsure, include a lower confidence score and leave normalized blank. - Do not invent entities or fields not supported by evidence.