# oec-experimentation-framework

> Define and apply an Overall Evaluation Criterion (OEC) to measure the long-term impact of product changes. Use this skill when designing A/B tests, setting quarterly OKRs, or when short-term wins (like revenue) are potentially harming long-term user retention.

- Author: Samarvir singh
- Repository: samarv/Shanon
- Version: 20260125165455
- Stars: 13
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/samarv/Shanon
- Web: https://mule.run/skillshub/@@samarv/Shanon~oec-experimentation-framework:20260125165455

---

---
name: oec-experimentation-framework
description: Define and apply an Overall Evaluation Criterion (OEC) to measure the long-term impact of product changes. Use this skill when designing A/B tests, setting quarterly OKRs, or when short-term wins (like revenue) are potentially harming long-term user retention.
---

The Overall Evaluation Criterion (OEC) is a single composite metric used to determine the success of an experiment by balancing short-term gains against long-term health. It prevents teams from "gaming" the system by optimizing for narrow metrics that inadvertently hurt the user experience.

## The OEC Construction Workflow

### 1. Identify Primary Success Metrics
Identify the immediate behavior you want to encourage (e.g., clicks, revenue, bookings). This is your "top-line" metric.

### 2. Identify Countervailing (Guardrail) Metrics
Determine what "bad" behaviors the top-line metric might encourage. These metrics act as a check on the primary goal.
- **Revenue goals?** Check for churn or user sentiment.
- **Click-through goals?** Check for bounce rate or "successful sessions" (e.g., clicks followed by more than 30 seconds on the destination page).
- **Email volume?** Check for unsubscribe rates.

### 3. Model Lifetime Value (LTV)
Assign a dollar value to negative long-term actions so they can be weighed against short-term gains in a single formula.
- **Example Calculation:** `OEC = (Short-term Revenue) - (Number of Unsubscribes * Estimated Value of a Subscriber)`.
- If an email campaign generates $1,000 but causes 500 unsubscribes, and each subscriber is worth $3/year, the campaign is net-negative over a one-year horizon: $1,000 - (500 * $3) = -$500.
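The arithmetic in this step can be sketched in a few lines of Python. The function name `email_oec` and its parameters are hypothetical, chosen only for illustration; the inputs are the figures from the example.

```python
def email_oec(short_term_revenue, unsubscribes, value_per_subscriber):
    """Short-term revenue minus the estimated long-term cost of lost subscribers.

    All names here are illustrative assumptions, not part of any library.
    """
    return short_term_revenue - unsubscribes * value_per_subscriber


# Figures from the example: $1,000 revenue, 500 unsubscribes, $3/year per subscriber.
oec = email_oec(short_term_revenue=1000.0, unsubscribes=500, value_per_subscriber=3.0)
print(oec)  # -500.0, so the campaign is net-negative
```

Note that the cost term uses a count of unsubscribes rather than a rate, so both terms are in dollars.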

### 4. Set Constraint Optimizations
If a composite formula is too complex, use the "Fixed Budget" approach.
- Allow the team to optimize a metric (e.g., Revenue) only if a guardrail metric (e.g., Page Load Latency or Ad Real Estate) remains within a fixed "budget."
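As a minimal sketch of the "Fixed Budget" rule, the check below accepts a variant only when the primary metric improves and the guardrail stays within its budget. The `VariantResult` type, the field names, and the 50 ms latency budget are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    # Hypothetical summary of one experiment variant versus control.
    revenue_lift_pct: float      # change in the metric being optimized
    latency_increase_ms: float   # change in the guardrail metric


def passes_fixed_budget(result, latency_budget_ms):
    """A win counts only if revenue improves and latency stays within budget."""
    return result.revenue_lift_pct > 0 and result.latency_increase_ms <= latency_budget_ms


LATENCY_BUDGET_MS = 50.0  # assumed budget, not a recommendation

within_budget = VariantResult(revenue_lift_pct=2.0, latency_increase_ms=30.0)
over_budget = VariantResult(revenue_lift_pct=4.0, latency_increase_ms=120.0)

print(passes_fixed_budget(within_budget, LATENCY_BUDGET_MS))  # True
print(passes_fixed_budget(over_budget, LATENCY_BUDGET_MS))    # False
```

The larger revenue lift is rejected here because it spends more of the guardrail than the budget allows, which is the point of the constraint.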

## Validity Testing with Twyman’s Law
"Any figure that looks interesting or different is usually wrong." Before celebrating a "home run" result:
- **Check for Sample Ratio Mismatch (SRM):** If you designed a 50/50 split but observe 50.2/49.8 with 1M users, the imbalance is far larger than chance would produce (chi-square of 16, p < 0.001). Treat the experiment as invalid until you find the cause, which is often bot interference or a data pipeline issue.
- **Hold the Celebration:** If a metric moves by 10% when you expected 1%, assume it is a bug (e.g., double-logging revenue) until proven otherwise.
- **Replicate:** Rerun the experiment. If the effect reappears with a significant p-value (e.g., <0.01), trust increases.
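The SRM check can be sketched as a chi-square goodness-of-fit test using only the standard library. The function name `srm_check` is hypothetical, and the 0.001 alert threshold is a commonly used convention assumed here rather than something the source prescribes. For one degree of freedom, the chi-square tail probability equals `erfc(sqrt(chi2 / 2))`.

```python
import math


def srm_check(control_users, treatment_users, expected_control_share=0.5):
    """Return (chi-square statistic, p-value) for the observed traffic split."""
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# The 50.2/49.8 split over 1M users from the text.
chi2, p_value = srm_check(502_000, 498_000)
print(chi2)             # 16.0
print(p_value < 0.001)  # True: flag the experiment for investigation
```

A split that looks close to 50/50 by eye can still fail this test at large sample sizes, which is why the check is worth automating.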

## The Experimentation Portfolio
Manage your product roadmap like a stock portfolio:
- **Incremental (70-80%):** Small, "inch-by-inch" improvements with high probability of success.
- **High Risk/High Reward (10-20%):** Radical redesigns or new features. Expect an 80-90% failure rate here, but aim for "home runs" that move the needle by 5-12%.

## Examples

**Example 1: Search Engine Monetization**
- **Context:** A team wants to increase revenue by adding more ads to the search results page.
- **Input:** Primary metric = Ad Revenue. Countervailing metric = Session Success Rate (did the user find what they wanted?).
- **Application:** Create an OEC where revenue gains are only "wins" if the Session Success Rate does not drop by more than 0.1%.
- **Output:** The team realizes that while 4 ads make more money today, they increase churn. They settle on 3 ads as the optimal OEC balance.

**Example 2: Retention Marketing**
- **Context:** An email team is measured by "Revenue from Email Clicks."
- **Input:** Team increases email frequency to 5x per week.
- **Application:** Apply the "Cost of Spam" model. Subtract $5 from the "success" total for every unsubscribe.
- **Output:** The data shows that 50% of the high-frequency campaigns are actually destroying long-term value. The team reduces frequency to 2x per week, increasing total LTV.

## Common Pitfalls
- **Shipping on "Flat" Results:** Never ship a feature if the results are not statistically significant. This introduces "code debt" and maintenance costs for zero user benefit.
- **Falling for the Sunk Cost Fallacy:** Do not ship a 6-month project just because the team invested time in it. If the A/B test is negative, the project failed.
- **Peeking at P-values:** Do not stop an experiment early just because the p-value dipped below 0.05. Checking repeatedly and stopping at the first significant reading inflates the Type I error (false positive) rate. Wait for the full designated duration.
- **Trusting One-off Successes:** If a result is "too good to be true," it is likely a tracking bug or a bot. Apply Twyman's Law immediately.
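To illustrate the peeking pitfall, the sketch below simulates A/A tests (no true difference between arms) and compares two policies: declaring a win if any interim look shows p < 0.05, versus testing once at the planned end. The batch size, number of looks, conversion rate, and seed are arbitrary assumptions, and the two-proportion z-test is one simple choice of test.

```python
import math
import random


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test p-value using a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_false_positive_rates(n_experiments=300, looks=10, batch=100,
                                  rate=0.1, alpha=0.05, seed=0):
    """Return (peeking rate, fixed-horizon rate) of false positives in A/A tests."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        p = 1.0
        for _ in range(looks):
            # Both arms share the same true conversion rate.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            p = two_sided_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        fixed_hits += p < alpha  # p from the final look only
    return peeking_hits / n_experiments, fixed_hits / n_experiments


peeking_rate, fixed_rate = simulate_false_positive_rates()
print(f"peeking: {peeking_rate:.2%}, fixed horizon: {fixed_rate:.2%}")
```

Because the final look is one of the interim looks, the peeking rate can never be lower than the fixed-horizon rate, and in practice it is usually several times higher.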