# transformation-design

> Build and validate transformation sets for invariance testing

- Author: Cem
- Repository: cemphlvn/research-engine
- Version: 20251226233250
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/cemphlvn/research-engine
- Web: https://mule.run/skillshub/@@cemphlvn/research-engine~transformation-design:20251226233250

---

---
name: transformation-design
description: "Build and validate transformation sets for invariance testing"
---

# transformation-design

## Purpose

Design transformation sets that are:
- Admissible (preserve both evaluation and falsifiability: labels stay meaningful, known-false cases stay detectable)
- Diverse (cover multiple perturbation types)
- Graded (from mild to aggressive)
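
The snippets in this document use `Transform` and `TransformSet` without defining them. One possible shape for these types is sketched below. The field names `severity` and `expected_invariants`, and the choice to iterate mildest-first, are assumptions made for illustration, not the repository's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, Iterator, List


@dataclass(frozen=True)
class Transform:
    """A named text perturbation (hypothetical shape, not the repo's type)."""
    name: str
    apply: Callable[[str], str]
    severity: float = 0.0  # 0.0 = mild, 1.0 = aggressive (assumed scale)
    expected_invariants: FrozenSet[str] = frozenset()

    def __call__(self, text: str) -> str:
        return self.apply(text)


@dataclass
class TransformSet:
    """A collection of transforms, iterated from mildest to most aggressive."""
    transforms: List[Transform] = field(default_factory=list)

    def __iter__(self) -> Iterator[Transform]:
        return iter(sorted(self.transforms, key=lambda t: t.severity))

    def __len__(self) -> int:
        return len(self.transforms)
```

Making `Transform` callable lets a set be used both as data (name, severity) and as a function over text.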

## Patterns

### Transform Families

| Family | Preserves | Destroys | Use When |
|--------|-----------|----------|----------|
| Paraphrase | Meaning | Surface form | Testing semantic invariance |
| Synonym | Denotation | Connotation (sometimes) | Testing lexical robustness |
| Context addition | Core claim | Focused attention | Testing distractor resistance |
| Reordering | Logical content | Sequential bias | Testing order independence |
| Negation | Form | Truth value | Contrast probe construction |
| Abstraction | Pattern | Specifics | Testing generalization |
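
To make two of these families concrete, here is a toy sketch of a synonym transform and a reordering transform. The `SYNONYMS` lexicon and both function names are hypothetical; a real synonym transform would need a proper lexical resource and would have to handle capitalization and word sense.

```python
import random
import re

# Hypothetical toy lexicon, for illustration only.
SYNONYMS = {"big": "large", "quick": "fast"}


def synonym_swap(text: str) -> str:
    """Replace known words with a synonym; unknown words pass through."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        # Lookup is case-insensitive; the replacement is always lowercase.
        return SYNONYMS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", repl, text)


def reorder_sentences(text: str, seed: int = 0) -> str:
    """Shuffle sentence order while keeping each sentence intact."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    random.Random(seed).shuffle(sentences)  # seeded for reproducibility
    return " ".join(sentences)


print(synonym_swap("The big dog is quick."))  # The large dog is fast.
print(reorder_sentences("A holds. B holds. C holds."))
```

The reordering transform keeps the same set of sentences, which is what "preserves logical content" amounts to when the sentences are independent of one another.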

### Admissibility Validation
```python
def validate_admissibility(transform_set: TransformSet) -> AdmissibilityResult:
    """
    Check each transform in the set against three criteria:
    1. Evaluation preserved: apply to labeled examples, verify the labels are still meaningful
    2. Falsifiability preserved: apply to known-false cases, verify they are still detectable
    3. Information non-collapse: MI(original, transformed) > threshold

    Note: labeled_examples, known_failures, and test_corpus are not defined in
    this snippet; they are assumed to be fixtures available in the enclosing scope.
    """
    results = []
    for t in transform_set:
        eval_preserved = check_evaluation(t, labeled_examples)
        falsif_preserved = check_falsifiability(t, known_failures)
        info_preserved = check_information(t, test_corpus)
        results.append(TransformCheck(t, eval_preserved, falsif_preserved, info_preserved))

    return AdmissibilityResult(
        admissible=all(r.passes() for r in results),
        failing_transforms=[r.transform for r in results if not r.passes()],
        suggestions=generate_suggestions(results)
    )
```
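
The information non-collapse check can be sketched under one simplifying assumption: the transform is deterministic. In that case the mutual information between input and output equals the entropy of the output, I(X; T(X)) = H(T(X)), so it can be estimated from the empirical distribution of transformed texts. The helper `information_retained` and the default threshold of 0.9 are assumptions for illustration, not the repository's implementation.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def entropy_bits(items: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the empirical distribution over items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_retained(transform: Callable[[str], str],
                         corpus: Sequence[str]) -> float:
    """Return H(T(X)) / H(X): 1.0 means no inputs were merged, 0.0 means
    every input collapsed to the same output."""
    h_in = entropy_bits(corpus)
    if h_in == 0:
        return 1.0  # a single distinct input cannot collapse any further
    return entropy_bits([transform(x) for x in corpus]) / h_in


def check_information(transform: Callable[[str], str],
                      corpus: Sequence[str],
                      threshold: float = 0.9) -> bool:
    """Pass when the retained fraction exceeds the (assumed) threshold."""
    return information_retained(transform, corpus) > threshold
```

This only detects collapse (distinct inputs mapped to the same output). It says nothing about whether the retained information is the relevant information, which is what the evaluation and falsifiability checks are for.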

### Graded Transform Chains
```python
# Mild → Aggressive ordering
GRADED_TRANSFORMS = [
    ("synonym_swap", 0.1),         # Very mild
    ("paraphrase", 0.3),           # Mild
    ("add_distractor", 0.5),       # Medium
    ("heavy_rewrite", 0.7),        # Strong
    ("adversarial_context", 0.9),  # Aggressive
]

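def find_breaking_point(hypothesis, transforms=GRADED_TRANSFORMS):
    """Linear scan, mildest first, for the first transform that breaks invariance."""
    # `survives` is assumed to be defined elsewhere: it applies the named
    # transform and reports whether the hypothesis still holds.
    for name, severity in sorted(transforms, key=lambda x: x[1]):
        if not survives(hypothesis, name):
            return name, severity
    return None, 1.0  # Survives every transform in the chain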
```
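
The loop in `find_breaking_point` tests transforms one at a time in order of severity. If survival can be assumed monotone in severity (a hypothesis that breaks under one transform also breaks under every more aggressive one), a true binary search needs only O(log n) calls. That monotonicity is an assumption and will not hold for every hypothesis. The function name and the `survives` callback signature below are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Graded = List[Tuple[str, float]]


def find_breaking_point_bisect(
    survives: Callable[[str], bool],
    graded: Graded,
) -> Tuple[Optional[str], float]:
    """Bisect for the mildest breaking transform.

    Assumes monotonicity: once a transform breaks the hypothesis,
    every more severe transform breaks it too.
    """
    ordered = sorted(graded, key=lambda item: item[1])
    lo, hi = 0, len(ordered)
    # Invariant: every transform before `lo` survives,
    # every transform from `hi` onward breaks.
    while lo < hi:
        mid = (lo + hi) // 2
        if survives(ordered[mid][0]):
            lo = mid + 1
        else:
            hi = mid
    if lo == len(ordered):
        return None, 1.0  # survives all transforms
    return ordered[lo]
```

When monotonicity is doubtful, the linear scan is the safer choice, since bisection can skip over a mild transform that happens to break the hypothesis.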

### Transform Composition
```python
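def compose_transforms(*transforms) -> Transform:
    """Chain transforms left to right: T1 is applied first, i.e. Tn ∘ ... ∘ T2 ∘ T1."""
    def composed(x):
        for t in transforms:
            x = t(x)
        return x
    return Transform(
        name="+".join(t.name for t in transforms),
        apply=composed,
        # Only invariants expected by every component survive the chain.
        # Requires at least one transform; assumes expected_invariants is a set.
        expected_invariants=set.intersection(*(set(t.expected_invariants) for t in transforms))
    )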
```
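
Because each transform in a chain consumes the output of the previous one, the order of a chain can change the result. The sketch below shows this with two plain functions, along with the set intersection used to combine expected invariants. The names `pipeline`, `add_distractor`, and `first_sentence` are illustrative, not part of the repository.

```python
from functools import reduce
from typing import Callable


def pipeline(*funcs: Callable[[str], str]) -> Callable[[str], str]:
    """Apply funcs left to right: pipeline(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def add_distractor(text: str) -> str:
    """Append an irrelevant sentence."""
    return text + " The sky is blue."


def first_sentence(text: str) -> str:
    """Keep only the text up to the first period."""
    return text.split(".")[0] + "."


claim = "Cats purr."
print(pipeline(add_distractor, first_sentence)(claim))  # Cats purr.
print(pipeline(first_sentence, add_distractor)(claim))  # Cats purr. The sky is blue.

# A chain can only promise what every component promises.
print(set.intersection({"meaning", "label"}, {"label", "length"}))  # {'label'}
```

In the first ordering the distractor is added and then cut away, so the chain is effectively a no-op; in the second it survives. Admissibility of a composed transform therefore has to be validated on the chain itself, not inferred from its parts.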

## Examples

```python
# Build transform set for testing "concept stability"
concept_transforms = TransformSet([
    paraphrase_transform,
    synonym_swap_transform,
    context_distractor_transform,
])

# Validate before use
result = validate_admissibility(concept_transforms)
if not result.admissible:
    print(f"Remove: {result.failing_transforms}")
    print(f"Try: {result.suggestions}")
```

## References

- Ribeiro et al., "Beyond Accuracy: Behavioral Testing of NLP Models with CheckList" (ACL 2020)
- Jia & Liang, "Adversarial Examples for Evaluating Reading Comprehension Systems" (EMNLP 2017)