# warp-parse

> Generate WPL (Warp Parse Language) parsing rules and OML (Object Mapping Language) transformations from raw log samples. Use this skill when converting raw logs to structured JSON, creating WPL rules for log formats (Apache/Nginx/syslog/JSON/KV/custom), creating OML transformations, debugging or optimizing existing WPL/OML code, or understanding Warp Parse syntax.

- Author: zuowenping
- Repository: cloney0/wp-skill
- Version: 20260124152121
- Last Updated: 2026-02-06
- Source: https://github.com/cloney0/wp-skill
- Web: https://mule.run/skillshub/@@cloney0/wp-skill~warp-parse:20260124152121

---

# Warp Parse Code Generation

Generate WPL parsing rules and OML transformations from raw log samples to produce structured JSON output.

## Workflow Overview

Log-to-JSON transformation involves two phases:

1. **WPL (Parsing)**: Extract fields from raw logs using pattern matching
2. **OML (Transformation)**: Map/transform extracted fields to the final JSON structure

```
Raw Log → [WPL Rule] → Parsed Fields → [OML Transform] → JSON Output
```

## Step 1: Analyze the Raw Log

Examine the log sample to identify:

- **Field types**: IP addresses, timestamps, numbers, text, structured data (JSON/KV)
- **Separators/delimiters**: Spaces, commas, brackets, quotes
- **Patterns**: Repeated fields, optional sections, conditional formats
- **Edge cases**: Missing fields, variable formats, escape sequences

Ask clarifying questions if the log format is ambiguous.

## ⚠️ Core Principle: Field Order

WPL parsing **MUST strictly follow the original log field order**.
```
Log: 192.168.1.1 - - [15/Jan/2024:10:30:00 +0000] "GET /path HTTP/1.1" 200 1234

✅ Correct: (ip, 2*_, time/clf<[,]>, http/request", http/status, digit)
❌ Wrong:   (time/clf<[,]>, ip, 2*_, http/request", http/status, digit)   # Wrong order
```

**Why**: WPL parses sequentially, matching fields left to right. Incorrect order causes parsing failures or incorrect data extraction.

## Step 2: Generate WPL Rule

### ⚠️ CRITICAL: Use ONLY Declared Types

**NEVER invent or infer types that don't exist in WPL.**

- **ONLY use types explicitly defined** in [WPL Syntax](references/wpl-syntax.md)
- **DO NOT create** types like `http/referrer`, `http/header`, `http/cookie`, etc.
- **DO NOT assume** that because `http/request`, `http/status`, `http/agent` exist, other `http/*` types also exist

**If a type is not listed in the reference, use generic types:**

- For text fields: `chars`
- For numbers: `digit` or `float`
- For key-value: `kv`
- For structured data: `json`

### ⚠️ CRITICAL: Compatibility Mode (Strict `seq` Only)

Some WPL deployments treat `()` as `seq()` and **do not allow nesting group operators (e.g. `alt()`, `opt()`, `some_of()`) inside `seq()`**. To keep generated parsers portable:

- **Default to strict `seq` rules**: inside `()`, use only flat fields and field-level sub-parsing pipes.
- **Do not emit `alt(...)` / `opt(...)` inside `()` unless the user confirms their WPL supports it**.
- For fields that can be `-`, `""`, or otherwise polymorphic (digit-or-dash, ip:port-or-dash):
  1. **Parse as `chars` in WPL** to avoid parse failures.
  2. **Normalize in OML** (keep `*_raw` fields, or type them as `auto`).
  3. If you must strongly type them in WPL, **split into multiple rules** (one per variant) and attach the same OML model to all of those rules.
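Why steps 1 and 2 above avoid failures can be modeled in ordinary code. The Python below is not WPL or OML, and in Warp Parse the normalization itself belongs in the OML model. It only contrasts a strictly typed parse, which fails on `-`, with keeping the raw text (as a `chars` field would) and normalizing it afterwards. The function names are hypothetical, and the `ip:port` handling assumes IPv4-style `host:port` text.

```python
from typing import Optional


def strict_digit(raw: str) -> int:
    """Models a strictly typed digit field: non-numeric text is a parse failure."""
    return int(raw)  # raises ValueError for "-" or '""'


def normalize_digit_or_dash(raw: str) -> Optional[int]:
    """Digit-or-dash kept as raw text: "-" and empty (quoted or not) become None."""
    value = raw.strip().strip('"')
    return None if value in ("", "-") else int(value)


def normalize_ip_port_or_dash(raw: str) -> Optional[dict]:
    """ip:port-or-dash kept as raw text; assumes IPv4-style 'host:port'."""
    value = raw.strip().strip('"')
    if value in ("", "-"):
        return None
    host, sep, port = value.rpartition(":")
    if not sep:  # no port present
        return {"ip": value, "port": None}
    return {"ip": host, "port": int(port)}


# Raw values as a chars field would capture them.
for raw in ["1234", "-", '""']:
    try:
        strict = strict_digit(raw)
    except ValueError:
        strict = "parse failure"
    print(f"{raw!r}: strict={strict}, normalized={normalize_digit_or_dash(raw)}")
```

The strict variant fails on two of the three samples. The lenient path always yields a usable value, with `None` standing in for "absent".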
**Example (split rules instead of nested `alt`)**:

```wpl
package /mixed/ {
    rule flexible_code {
        (ip:src, digit:code, time_iso:ts)
    }
    rule flexible_error {
        (ip:src, chars:error, time_iso:ts)
    }
}
```

```oml
name : mixed_flexible
rule : /mixed/flexible_code, /mixed/flexible_error
---
* : auto = take() ;
```

### ⚠️ CRITICAL: Field Count & Position Validation

**WPL rule field count MUST exactly match log field count.**

```
Log: [20/Feb/2018:12:12:14 +0800] 112.195.209.90 - - "GET / HTTP/1.1" 200 190 "-" "Mozilla/5.0" "-"

Log fields by position:
1. [time]
2. ip
3. -
4. -
5. "request"
6. 200
7. 190
8. "-"
9. "UA"
10. "-"

✅ Correct WPL (10 fields):
(time/clf<[,]>, ip, 2*_, http/request", http/status, digit, chars", chars", chars")

❌ Wrong WPL (9 fields - missing last field):
(time/clf<[,]>, ip, 2*_, http/request", http/status, digit, chars", chars")

❌ Wrong WPL (11 fields - extra field):
(time/clf<[,]>, ip, 2*_, http/request", http/status, digit, chars", chars", chars", chars")
```

**Validation Checklist**:

1. **Count log fields** first: mark each field position with a number
2. **Count WPL fields**: ensure an exact 1:1 mapping
3. **Each log field gets exactly ONE WPL field**: no duplicates, no omissions
4. **Skip placeholders** with `N*_` (e.g. `2*_` for `- -`); `N*_` still counts as N positions
5. **Quote marks are separators**, not content: `"value"` counts as ONE field, not three

**Example validation process**:

```
Step 1: Number log fields
[time] ip - - "req" 200 100 "-" "ua" "-"
1      2  3 4 5     6   7   8   9    10

Step 2: Write WPL and verify count
(time/clf<[,]>, ip, 2*_, http/request", http/status, digit, chars", chars", chars")
 1              2   3-4  5              6            7      8       9       10  ✓ Match

Step 3: Name each field for OML reference
(time/clf:ts<[,]>, ip:client_ip, 2*_, http/request:req", http/status:st, digit:bytes, chars:ref", chars:ua", chars:extra")
```

### Basic WPL Structure

Angle-bracketed names in this template are placeholders:

```wpl
package <package_name> {
    rule <rule_name> {
        (<fields>)
    }
}
```

### Field Syntax

```
[repeat] DataType [(subfields)] [:name] [[length]] [format] [sep] [| pipe]
```

**Key elements:**

- `DataType`: `ip`, `time`, `digit`, `chars`, `http/request`, `kv`, `json`, etc. (see [WPL Syntax](references/wpl-syntax.md))
- `:name`: Assign a field name for OML reference
- `[length]`: Max length limit
- `<open,close>` (e.g. `<[,]>`) or `"quote"`: Delimited scope
- `\sep`: Backslash-escaped separator
- `| pipe`: Field-level functions

### Group Semantics

- `seq` (default): Match all fields in order
- `alt`: Try alternatives; the first success wins
- `opt`: Optional field; failure is OK
- `some_of`: Match as many as possible

### Common Patterns

**Nginx/Apache access log:**

```wpl
(ip:client_ip, 2*_, time/clf:timestamp<[,]>, http/request", http/status, digit:bytes, chars", http/agent")
```

**Key-Value pairs:**

```wpl
kv(time@ts, ip@src, digit@code)
```

**JSON payload:**

```wpl
json(chars@message, ip@src_ip, array/digit@ports)
```

See [WPL Syntax](references/wpl-syntax.md) for the complete data type reference and examples.
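The Field Count & Position Validation checklist is mechanical enough to script. Below is a hypothetical Python helper, not part of Warp Parse. It splits a log line into positions, treating `[...]` and `"..."` as one field each (checklist item 5). It counts the positions a flat WPL group consumes, expanding an `N*` prefix such as `2*_` to N. It then compares the two counts.

It assumes a single-line log and a flat `(...)` group. It does not understand `alt`/`opt`/`some_of` or `kv`/`json` internals, and it checks only counts, not types or order.

```python
import re

REPEAT = re.compile(r"^(\d+)\*")  # repeat prefix such as the "2*" in 2*_


def split_log_fields(line: str) -> list[str]:
    """Split on whitespace, keeping "..." and [...] together as one field."""
    fields, i, n = [], 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            j = i + 1
            while j < n and line[j] != '"':
                j += 2 if line[j] == "\\" else 1  # skip escaped characters
            fields.append(line[i:j + 1])
            i = j + 1
        elif ch == "[":
            j = line.find("]", i)
            j = n - 1 if j == -1 else j
            fields.append(line[i:j + 1])
            i = j + 1
        else:
            j = i
            while j < n and not line[j].isspace():
                j += 1
            fields.append(line[i:j])
            i = j
    return fields


def split_wpl_group(group: str) -> list[str]:
    """Split a flat WPL group on commas not nested in (), <> or [] and not escaped."""
    body = group.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    parts, buf, depth, i = [], "", 0, 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):  # escaped separator such as \,
            buf += body[i:i + 2]
            i += 2
            continue
        if ch in "(<[":
            depth += 1
        elif ch in ")>]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(buf.strip())
            buf = ""
        else:
            buf += ch
        i += 1
    if buf.strip():
        parts.append(buf.strip())
    return parts


def count_wpl_positions(group: str) -> int:
    """Each field consumes one log position; an N* prefix consumes N."""
    total = 0
    for field in split_wpl_group(group):
        m = REPEAT.match(field)
        total += int(m.group(1)) if m else 1
    return total


def check_field_count(log_line: str, wpl_group: str) -> tuple[bool, int, int]:
    n_log = len(split_log_fields(log_line))
    n_wpl = count_wpl_positions(wpl_group)
    return n_log == n_wpl, n_log, n_wpl


log = ('[20/Feb/2018:12:12:14 +0800] 112.195.209.90 - - "GET / HTTP/1.1" '
       '200 190 "-" "Mozilla/5.0" "-"')
good = '(time/clf<[,]>, ip, 2*_, http/request", http/status, digit, chars", chars", chars")'
short = '(time/clf<[,]>, ip, 2*_, http/request", http/status, digit, chars", chars")'
print(check_field_count(log, good))   # (True, 10, 10)
print(check_field_count(log, short))  # (False, 10, 9)
```

The bracket-depth tracking keeps the comma inside a delimiter spec like `<[,]>`, and inside a sub-parsing pipe `| (...)`, from being counted as a field separator.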
## Step 3: Generate OML Transformation

### ⚠️ CRITICAL: Use ONLY Declared Functions and Types

**NEVER invent or infer functions, types, or parameters that don't exist in OML.**

- **ONLY use functions explicitly defined** in [OML Syntax](references/oml-syntax.md)
- **DO NOT assume** function parameters based on similar functions or naming patterns
- **ALWAYS verify** the function signature before using it

**Common Mistakes to Avoid:**

❌ **Wrong**: `url(method)`, `url(version)`: these parameters do not exist, and `url(path)` only applies to URL strings, not to HTTP request lines
✅ **Correct**: Use WPL sub-parsing to extract HTTP request components:

```wpl
http/request:request" | (http/method:method, chars:path, chars:version)
```

❌ **Wrong**: Assuming `Time::parse(format)` exists because `Time::to_ts` exists
✅ **Correct**: Check [OML Syntax](references/oml-syntax.md) for the actual Time functions

❌ **Wrong**: Using `url_get(protocol)` when only `url(domain)`, `url(path)`, `url(params)` are defined
✅ **Correct**: Use only documented parameters

**Verification Process:**

1. Before using any function, check that it is listed in [OML Syntax](references/oml-syntax.md)
2. Verify the exact parameter names and values
3. If a function doesn't exist for your use case, consider:
   - Using WPL sub-parsing to extract the data differently
   - Using generic types and transformations
   - Asking the user for clarification

### Basic OML Structure

Angle-bracketed names in this template are placeholders:

```oml
name : <model_name>
rule : <wpl_rule_path>
---
<assignments>
```

### Read vs Take

- `read(field)`: Non-destructive; the same field can be read multiple times
- `take(field)`: Destructive; removes the field after reading

**Lookup priority**: Both `read` and `take` check the destination first, then the source.
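The read/take distinction and the lookup priority can be modeled with two dictionaries. This Python sketch is an analogy, not the OML runtime:

- `src` stands for the fields produced by the WPL rule.
- `dst` stands for the output record being assembled.
- The class and method names are hypothetical.
- Removing a taken field from whichever record it was found in is an assumption; the text above only says `take` removes it.

```python
from typing import Any, Optional


class FieldContext:
    """Hypothetical model of OML lookup: destination first, then source."""

    def __init__(self, src: dict):
        self.src: dict = dict(src)  # fields produced by the WPL rule
        self.dst: dict = {}         # output record being assembled

    def read(self, name: str) -> Optional[Any]:
        """Non-destructive: the field stays available for later reads."""
        if name in self.dst:
            return self.dst[name]
        return self.src.get(name)

    def take(self, name: str) -> Optional[Any]:
        """Destructive: the field is removed after reading.

        Removing it from whichever record it was found in is an assumption.
        """
        if name in self.dst:
            return self.dst.pop(name)
        return self.src.pop(name, None)


ctx = FieldContext({"client_ip": "10.0.0.1", "bytes": 512})
ctx.dst["src_ip"] = ctx.read("client_ip")  # src_ip = read(client_ip) ;
ctx.dst["size"] = ctx.take("bytes")        # size = take(bytes) ;
print(ctx.read("client_ip"))  # still available: 10.0.0.1
print(ctx.take("bytes"))      # already taken: None
```

In practice, prefer `take` for fields you map explicitly when a later step should no longer see them, and `read` when the same source field feeds several outputs.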
### Assignment Patterns

**Direct mapping:**

```oml
client_ip = read(client_ip) ;
```

**Default value:**

```oml
country = read(country) { _ : chars(CN) } ;
```

**Type conversion:**

```oml
# New syntax (recommended)
event_time : time = read(timestamp) | Time::to_ts_zone(0, ms) ;

# Legacy syntax (still works)
event_time : time = read(timestamp) | to_timestamp_zone(0,ms) ;
```

**URL extraction** (for URL strings, NOT HTTP request lines):

```oml
# New syntax (recommended)
request_path = read(url_field) | url(path) ;
domain = read(url_field) | url(domain) ;
query_params = read(url_field) | url(params) ;

# Legacy syntax (still works)
request_path = read(url_field) | url_get(path) ;
```

**Note**: For HTTP request lines like `"GET /path HTTP/1.1"`, use WPL sub-parsing:

```wpl
http/request:request" | (http/method:method, chars:path, chars:version)
```

**Base64 decode:**

```oml
payload = read(data) | base64_decode(Utf8) ;
```

**HTML escape:**

```oml
safe = read(text) | html_escape ;
```

**Object aggregation:**

```oml
network : obj = object {
    src_ip : ip = read(client_ip),
    src_port : digit = read(sport)
};
```

**Array collection:**

```oml
all_ports : array = collect read(keys:[sport,dport]) ;
```

**Wildcard pass-through:**

```oml
* : auto = take() ;
```

See [OML Syntax](references/oml-syntax.md) for complete transformation patterns.

## Step 4: Validate Output

Verify the generated code:

1. **Field count validation**: WPL field count MUST exactly match log field count
   - Number each log field position first
   - Count WPL fields and verify a 1:1 mapping
   - No duplicate positions, no missing fields
2. **WPL validation**: Check field order and separators, and check that data types match the log format
3. **OML validation**: Ensure field references match WPL output names
4. **Edge cases**: Handle missing/optional fields with OML default bodies. Use `opt()` only if the target WPL supports nested group operators; otherwise parse as `chars` or split into multiple rules, as described under Compatibility Mode

## Reference Documentation

- **[WPL Syntax](references/wpl-syntax.md)**: Complete data types, field syntax, group semantics, preprocessing pipes, field validation
- **[OML Syntax](references/oml-syntax.md)**: Read/take semantics, assignments, pipe functions, built-in functions
- **[Common Patterns](references/common-patterns.md)**: Nginx, Apache, JSON, KV, and syslog examples

## New Function Summary

### WPL New Functions

- `f_chars_not_has(name, val)` - Check that the named field does not equal a string
- `f_digit_in(name, [...])` - Check that the named field is in a list of numbers
- `chars_not_has(val)` - Check that the active field does not equal a string
- `digit_in([...])` - Check that the active field is in a list of numbers
- `base64_decode()` - Base64-decode the active field

### OML New Functions

- `Now::time()`, `Now::date()`, `Now::hour()` - Built-in time functions
- `Time::to_ts`, `Time::to_ts_ms`, `Time::to_ts_us` - Time conversion shorthands
- `html_unescape`, `str_escape` - String unescaping and escaping
- `path(name)`, `path(path)` - File path extraction
- `sxf_get(field)` - Special format field extraction

### Updated Function Syntax

- `base64_decode(encoding)` - Now supports Utf8/Gbk/Imap parameters
- `url(part)` - Replaces `url_get(part)`
- `get(key)` - Replaces `obj_get(key)`
- `skip_empty` - Replaces `skip_if_empty`
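The renames listed under Updated Function Syntax can be applied mechanically when updating older OML. The same goes for the legacy `to_timestamp_zone` spelling shown under Type conversion.

The Python sketch below is a hypothetical helper, not a Warp Parse tool. It does plain word-boundary text substitution rather than parsing OML, so it would also rewrite matches inside comments or string literals; review its output by hand.

```python
import re

# Legacy OML spelling -> newer spelling, as listed in this document.
RENAMES = [
    (r"\burl_get\(", "url("),
    (r"\bobj_get\(", "get("),
    (r"\bskip_if_empty\b", "skip_empty"),
    (r"(?<!::)\bto_timestamp_zone\(", "Time::to_ts_zone("),
]


def migrate_oml(source: str) -> str:
    """Rewrite legacy OML function names to their newer equivalents."""
    for pattern, replacement in RENAMES:
        source = re.sub(pattern, replacement, source)
    return source


legacy = """\
request_path = read(url_field) | url_get(path) ;
event_time : time = read(timestamp) | to_timestamp_zone(0,ms) ;
"""
print(migrate_oml(legacy))
```

The `\b` anchors keep similarly named functions such as `sxf_get` untouched. The lookbehind avoids double-prefixing code that already uses `Time::`.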