# pipeline-design

> Design ETL/ELT pipeline architectures with proper patterns for reliability and scalability.

- Author: Vitaly D.
- Repository: timequity/plugins
- Version: 20251210174826
- Stars: 3
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/timequity/plugins
- Web: https://mule.run/skillshub/@@timequity/plugins~pipeline-design:20251210174826

---

---
name: pipeline-design
description: Design ETL/ELT pipeline architectures with proper patterns for reliability and scalability.
---

# Pipeline Design

## ETL vs ELT

| Approach | When to Use |
|----------|-------------|
| **ETL** | Transform before load, limited warehouse compute |
| **ELT** | Modern warehouses (Snowflake, BigQuery, Redshift) |

## Pipeline Patterns

### Batch

```
Source → Extract → Stage → Transform → Load → Target
         │                    │
         └── Checkpoint ──────┘
```

- Scheduled intervals (hourly, daily)
- Full or incremental loads
- Idempotent operations

### Streaming

```
Source → Kafka/Kinesis → Process → Sink
              │
              └── State Store
```

- Real-time requirements
- Event-driven architecture
- Exactly-once semantics

## Design Principles

1. **Idempotent** - Safe to re-run
2. **Incremental** - Process only new/changed data
3. **Observable** - Metrics, logs, alerts
4. **Testable** - Unit tests for transformations
5. **Recoverable** - Checkpoints, retry logic

## Staging Pattern

```sql
-- 1. Land raw data
COPY INTO raw.source_data FROM @stage;

-- 2. Deduplicate
CREATE TABLE staging.deduped AS
SELECT * FROM raw.source_data
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY _loaded_at DESC) = 1;

-- 3. Transform to target
MERGE INTO target.dim_customer
USING staging.deduped
ON target.id = staging.id
WHEN MATCHED THEN UPDATE ...
WHEN NOT MATCHED THEN INSERT ...;
```

## Error Handling

- Dead letter queues for failed records
- Retry with exponential backoff
- Alert on threshold breaches
- Quarantine bad data for review