# fhir-data-transformation

> Transform FHIR (Fast Healthcare Interoperability Resources) data into relational tables for analytics. Use when parsing FHIR bundles, extracting resources (Patient, Observation, Condition, MedicationRequest, etc.), flattening nested JSON, or loading healthcare data into Snowflake. Triggers include FHIR, HL7, healthcare interoperability, Patient resource, Observation, Condition, Bundle, ndjson, healthcare JSON.

- Author: Bracken Eddy
- Repository: sfc-gh-beddy/coco-healthcare-skills
- Version: 20260206155909
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/sfc-gh-beddy/coco-healthcare-skills
- Web: https://mule.run/skillshub/@@sfc-gh-beddy/coco-healthcare-skills~fhir-data-transformation:20260206155909

---

---
name: fhir-data-transformation
description: Transform FHIR (Fast Healthcare Interoperability Resources) data into relational tables for analytics. Use when parsing FHIR bundles, extracting resources (Patient, Observation, Condition, MedicationRequest, etc.), flattening nested JSON, or loading healthcare data into Snowflake. Triggers include FHIR, HL7, healthcare interoperability, Patient resource, Observation, Condition, Bundle, ndjson, healthcare JSON.
---

# FHIR Data Transformation

Transform FHIR R4 resources into analytics-ready relational tables in Snowflake.

## When to Use This Skill

- Parsing FHIR Bundles (JSON or NDJSON)
- Extracting specific resource types (Patient, Observation, Condition, etc.)
- Flattening nested FHIR structures for SQL analytics
- Loading FHIR data into Snowflake tables
- Building healthcare data pipelines from EHR exports

## Quick Start

```python
# Parse a FHIR Bundle and load to Snowflake
python scripts/fhir_to_tables.py bundle.json --output-dir ./output

# Parse NDJSON (one resource per line)
python scripts/fhir_to_tables.py patients.ndjson --format ndjson

# Load directly to Snowflake
python scripts/fhir_to_tables.py bundle.json --snowflake --database HEALTHCARE --schema FHIR_RAW
```

## Supported Resource Types

| Resource | Table Name | Key Fields Extracted |
|----------|------------|---------------------|
| Patient | `patient` | id, name, birthDate, gender, address, telecom, identifier |
| Observation | `observation` | id, patient_id, code, value, effectiveDateTime, status |
| Condition | `condition` | id, patient_id, code, clinicalStatus, onsetDateTime |
| MedicationRequest | `medication_request` | id, patient_id, medication, authoredOn, status, dosage |
| Encounter | `encounter` | id, patient_id, type, period_start, period_end, status |
| Procedure | `procedure` | id, patient_id, code, performedDateTime, status |
| DiagnosticReport | `diagnostic_report` | id, patient_id, code, effectiveDateTime, conclusion |
| Immunization | `immunization` | id, patient_id, vaccineCode, occurrenceDateTime, status |
| AllergyIntolerance | `allergy_intolerance` | id, patient_id, code, clinicalStatus, type |
| CarePlan | `care_plan` | id, patient_id, status, intent, category, period |

## Workflow

### Step 1: Analyze FHIR Data

First, understand what's in your FHIR file:

```python
python scripts/analyze_fhir.py input.json
```

Output shows:
- Resource type counts
- Date ranges
- Patient count
- Code systems used (SNOMED, LOINC, ICD-10, etc.)

### Step 2: Transform to Tables

```python
python scripts/fhir_to_tables.py input.json \
    --resources Patient,Observation,Condition \
    --output-dir ./fhir_tables
```

Outputs CSV files per resource type, ready for Snowflake COPY INTO.

### Step 3: Load to Snowflake

```sql
-- Create stage
CREATE OR REPLACE STAGE fhir_stage;

-- Put files
PUT file://./fhir_tables/*.csv @fhir_stage;

-- Create and load Patient table
CREATE OR REPLACE TABLE patient (
    id VARCHAR,
    family_name VARCHAR,
    given_name VARCHAR,
    birth_date DATE,
    gender VARCHAR,
    address_city VARCHAR,
    address_state VARCHAR,
    address_postal_code VARCHAR,
    phone VARCHAR,
    email VARCHAR,
    mrn VARCHAR,
    ssn VARCHAR,
    _source_file VARCHAR,
    _loaded_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

COPY INTO patient FROM @fhir_stage/patient.csv
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```

## Handling Nested Data

FHIR resources contain deeply nested structures. This skill provides two approaches:

### Approach 1: Flattened Tables (Recommended for Analytics)

Each resource becomes a wide, denormalized table:

```
Observation → observation table with columns:
  - id, patient_id, status
  - code_system, code_code, code_display (from CodeableConcept)
  - value_quantity, value_unit (from valueQuantity)
  - effective_datetime (from effectiveDateTime)
```

### Approach 2: Raw JSON + Views (Recommended for Flexibility)

Load raw FHIR JSON into a VARIANT column, then create views:

```sql
-- Raw table
CREATE TABLE fhir_raw (
    resource_type VARCHAR,
    resource_id VARCHAR,
    data VARIANT,
    _loaded_at TIMESTAMP_NTZ
);

-- View for observations
CREATE VIEW v_observation AS
SELECT
    data:id::VARCHAR AS id,
    data:subject:reference::VARCHAR AS patient_ref,
    data:code:coding[0]:system::VARCHAR AS code_system,
    data:code:coding[0]:code::VARCHAR AS code_code,
    data:code:coding[0]:display::VARCHAR AS code_display,
    data:valueQuantity:value::FLOAT AS value,
    data:valueQuantity:unit::VARCHAR AS unit,
    data:effectiveDateTime::TIMESTAMP AS effective_datetime
FROM fhir_raw
WHERE resource_type = 'Observation';
```

## Code System Mappings

Common code systems in FHIR data:

| System URI | Name | Used For |
|------------|------|----------|
| `http://snomed.info/sct` | SNOMED CT | Conditions, procedures |
| `http://loinc.org` | LOINC | Lab observations |
| `http://hl7.org/fhir/sid/icd-10-cm` | ICD-10-CM | Diagnoses |
| `http://www.nlm.nih.gov/research/umls/rxnorm` | RxNorm | Medications |
| `http://hl7.org/fhir/sid/cvx` | CVX | Vaccines |
| `http://hl7.org/fhir/sid/ndc` | NDC | Drug products |

## Synthea Test Data

For testing, use [Synthea](https://github.com/synthetichealth/synthea) synthetic patient data:

```bash
# Download sample Synthea data
curl -L -o synthea_sample.zip https://synthetichealth.github.io/synthea-sample-data/downloads/synthea_sample_data_fhir_r4_sep2019.zip
unzip synthea_sample.zip
```

## Best Practices

1. **Preserve raw data**: Always keep original FHIR JSON in a raw table
2. **Track lineage**: Include `_source_file` and `_loaded_at` columns
3. **Handle nulls**: FHIR fields are often optional; use COALESCE or NVL
4. **Parse references**: Patient references like `Patient/123` need splitting
5. **Normalize codes**: Create lookup tables for code systems

## Reference Files

- `references/fhir_r4_resources.md` - Resource structure reference
- `references/common_code_systems.md` - Code system details
- `references/snowflake_schema.md` - Recommended Snowflake schema design

## Requirements

```
fhir.resources>=7.0.0
pandas>=2.0.0
snowflake-connector-python>=3.0.0
```