# eval-model

> Evaluate model performance on test data

- Author: maminul007
- Repository: maminul007/trading-platform
- Version: 20260208000611
- Stars: 1
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/maminul007/trading-platform
- Web: https://mule.run/skillshub/@@maminul007/trading-platform~eval-model:20260208000611

---

---
name: eval-model
description: Evaluate model performance on test data
argument-hint: "[model-path|--latest <type>|--compare|--backtest|--walk-forward]"
---

# Model Evaluator

Evaluate trained model performance and compare models.

## Usage

- `/eval-model models/tsmom_latest.pt` - Evaluate a specific model checkpoint
- `/eval-model --latest tsmom` - Evaluate the most recent model of the given type
- `/eval-model --compare model1.pt model2.pt` - Compare two models side by side
- `/eval-model --backtest` - Run a full backtest evaluation
- `/eval-model --walk-forward` - Run walk-forward (rolling window) validation

## Related Code

- `services/ml-server/src/validation/walk_forward_validator.py` - Walk-forward validator
- `services/ml-server/src/backtest/engine.py` - Backtest engine
- `services/ml-server/src/backtest/metrics.py` - Performance metrics

## Evaluation Metrics

| Category | Metrics |
|----------|---------|
| Returns | Total return, CAGR, annualized return |
| Risk | Volatility, max drawdown, VaR, CVaR |
| Risk-adjusted | Sharpe, Sortino, Calmar, Information ratio |
| Trading | Win rate, profit factor, avg win/loss |
| Statistical | t-stat, p-value, stability |
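As a rough illustration of how several of the metrics in the table relate to one another, here is a minimal sketch that derives them from a list of daily simple returns. It is not the code in `metrics.py`; the function names, the 252-day annualization factor, and the exact Sharpe and Sortino conventions are assumptions made for this example.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor for daily returns


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def summarize(returns, risk_free=0.0):
    """Headline metrics for daily simple returns; risk_free is an annual rate."""
    n = len(returns)
    total = math.prod(1.0 + r for r in returns) - 1.0
    annualized = (1.0 + total) ** (TRADING_DAYS / n) - 1.0
    vol = stdev(returns) * math.sqrt(TRADING_DAYS)
    excess = [r - risk_free / TRADING_DAYS for r in returns]
    ann_excess = mean(excess) * TRADING_DAYS
    # Downside deviation: root-mean-square of the negative excess returns only.
    downside = math.sqrt(mean([min(e, 0.0) ** 2 for e in excess]))
    downside *= math.sqrt(TRADING_DAYS)
    mdd = max_drawdown(returns)
    nan = float("nan")
    return {
        "total_return": total,
        "annualized_return": annualized,
        "volatility": vol,
        "max_drawdown": mdd,
        "sharpe": ann_excess / vol if vol else nan,
        "sortino": ann_excess / downside if downside else nan,
        "calmar": annualized / abs(mdd) if mdd else nan,
    }
```

Ratios whose denominator is zero (for example Calmar on a series with no drawdown) come back as NaN here rather than infinity; that is a design choice of the sketch, not something the source specifies.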

## Instructions

When this skill is invoked:

1. Parse arguments:
   - Model path: evaluate that specific model checkpoint
   - `--latest <type>`: find and evaluate the most recent model of that type
   - `--compare`: compare two models side by side
   - `--backtest`: run a full backtest evaluation
   - `--walk-forward`: run rolling window validation
   - `--period`: restrict the evaluation to a time period
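
   The argument handling above could be sketched with `argparse` as follows. This is only an illustration: the parser, the destination names, and the choice to keep `--period` as a raw string (its format is not specified here) are assumptions, and `--compare` taking exactly two paths is inferred from the usage example.

   ```python
   import argparse


   def build_parser():
       """Build a parser mirroring the documented arguments (hypothetical layout)."""
       parser = argparse.ArgumentParser(prog="eval-model")
       parser.add_argument("model_path", nargs="?",
                           help="path to a specific model checkpoint")
       parser.add_argument("--latest", metavar="TYPE",
                           help="evaluate the latest model of this type")
       parser.add_argument("--compare", nargs=2, metavar=("MODEL_A", "MODEL_B"),
                           help="compare two models side by side")
       parser.add_argument("--backtest", action="store_true",
                           help="run a full backtest evaluation")
       parser.add_argument("--walk-forward", action="store_true",
                           help="run rolling window validation")
       # The period format is not documented, so it is passed through untouched.
       parser.add_argument("--period", help="evaluation time period")
       return parser


   if __name__ == "__main__":
       print(build_parser().parse_args())
   ```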

2. Load model and test data:
   - Load model checkpoint
   - Prepare test dataset (out-of-sample)
   - Generate predictions

3. Run evaluation:
   ```bash
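   cd services/ml-server
   # Quote the paths so spaces survive; $FLAGS stays unquoted so that
   # multiple flags split into separate arguments.
   python -m src.validation.walk_forward_validator \
     --model "$MODEL_PATH" \
     --data "$TEST_DATA" \
     $FLAGS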
   ```

4. Display evaluation results:
   ```
   ═══════════════════════════════════════════════════════════
                MODEL EVALUATION: tsmom_v2
   ═══════════════════════════════════════════════════════════

   Test Period: 2023-01-01 to 2023-12-31

   PERFORMANCE
   ─────────────────────────────────────────────────────────
   Total Return:       +24.5%
   Annualized Return:  +24.5%
   Volatility:         12.3%
   Max Drawdown:       -8.7%

   RISK-ADJUSTED
   ─────────────────────────────────────────────────────────
   Sharpe Ratio:       1.99  ★★★★☆
   Sortino Ratio:      2.85
   Calmar Ratio:       2.82

   TRADING STATISTICS
   ─────────────────────────────────────────────────────────
   Total Trades:       1,247
   Win Rate:           57.2%
   Profit Factor:      1.68
   Avg Win / Avg Loss: 1.24

   STATISTICAL SIGNIFICANCE
   ─────────────────────────────────────────────────────────
   t-statistic:        3.45
   p-value:            0.0006 ✓ Significant
   ```
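
   The statistical significance block can be read as a one-sample test of whether the mean return differs from zero. The sketch below is one way to compute it, not necessarily what `metrics.py` does: the function names are hypothetical, and the p-value uses a normal approximation to the t distribution, which is reasonable only for large samples.

   ```python
   import math
   from statistics import NormalDist, mean, stdev


   def two_sided_p(t_stat):
       """Two-sided p-value under a normal approximation (assumes a large sample)."""
       return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


   def mean_return_significance(returns, alpha=0.05):
       """Test H0: mean return == 0. Returns (t_stat, p_value, significant)."""
       n = len(returns)
       standard_error = stdev(returns) / math.sqrt(n)
       t_stat = mean(returns) / standard_error
       p_value = two_sided_p(t_stat)
       return t_stat, p_value, p_value < alpha
   ```

   Under this approximation a t-statistic of 3.45 corresponds to a p-value of roughly 0.0006, which matches the sample output above.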

5. For `--compare`:
   ```
   Metric              Model A     Model B     Winner
   ─────────────────────────────────────────────────────
   Sharpe Ratio        1.99        1.75        A (+14%)
   Max Drawdown        -8.7%       -12.1%      A
   Win Rate            57.2%       54.8%       A
   Profit Factor       1.68        1.52        A
   ```
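
   The Winner column can be derived with a small helper like the one below. This is a sketch under assumptions: the metric keys and the direction table are hypothetical, and drawdowns are assumed to be stored as negative fractions so that a larger (less negative) value wins.

   ```python
   # Hypothetical metric keys; True means a larger value is better.
   HIGHER_IS_BETTER = {
       "sharpe": True,
       "max_drawdown": True,  # negative fractions: -0.087 beats -0.121
       "win_rate": True,
       "profit_factor": True,
       "volatility": False,
   }


   def compare(metrics_a, metrics_b):
       """Return (metric, a, b, winner) rows for metrics present in both dicts."""
       rows = []
       for name, higher in HIGHER_IS_BETTER.items():
           if name not in metrics_a or name not in metrics_b:
               continue
           a, b = metrics_a[name], metrics_b[name]
           if a == b:
               winner = "tie"
           else:
               winner = "A" if (a > b) == higher else "B"
           rows.append((name, a, b, winner))
       return rows


   def relative_edge(winner_value, loser_value):
       """Relative improvement of the winner over the loser, as a fraction."""
       return winner_value / loser_value - 1.0
   ```

   With the Sharpe ratios from the table, `relative_edge(1.99, 1.75)` is about 0.14, which is where the `+14%` annotation comes from.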

6. For `--walk-forward`:
   - Run rolling window validation
   - Show performance stability across windows
   - Identify regime-dependent behavior
   - Flag potential overfitting
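
   To make the rolling window idea concrete, the sketch below generates train/test index ranges and flags windows where the out-of-sample score falls well below the in-sample score. It does not describe `walk_forward_validator.py`: the function names, the fixed-size rolling scheme, and the `max_gap` threshold are all assumptions.

   ```python
   def walk_forward_windows(n_samples, train_size, test_size, step=None):
       """Return (train_range, test_range) pairs for rolling windows.

       Each test range starts right after its train range, so no test
       observation is used for training within the same window.
       """
       step = step or test_size
       windows = []
       start = 0
       while start + train_size + test_size <= n_samples:
           train = range(start, start + train_size)
           test = range(start + train_size, start + train_size + test_size)
           windows.append((train, test))
           start += step
       return windows


   def flag_overfitting(train_scores, test_scores, max_gap=0.5):
       """Mark windows whose test score trails the train score by more than max_gap."""
       return [tr - te > max_gap for tr, te in zip(train_scores, test_scores)]
   ```

   A large and persistent gap between in-sample and out-of-sample scores is only one possible overfitting signal; the spread of test scores across windows is another way to look at stability.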

7. Generate artifacts:
   - Evaluation report: `reports/eval_<model>_<date>.html`
   - Equity curve plot
   - Drawdown chart
   - Monthly returns heatmap
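
The data behind these artifacts can be prepared with a few small helpers. The following is a sketch only: the function names are hypothetical, and the ISO date format in the report filename is an assumption, since the source specifies only the `reports/eval_<model>_<date>.html` pattern.

```python
from collections import defaultdict
from datetime import date


def report_path(model_name, run_date):
    """Build reports/eval_<model>_<date>.html (ISO date format is assumed)."""
    return f"reports/eval_{model_name}_{run_date.isoformat()}.html"


def monthly_returns(dated_returns):
    """Compound (date, return) pairs into {(year, month): return} for a heatmap."""
    growth = defaultdict(lambda: 1.0)
    for day, r in dated_returns:
        growth[(day.year, day.month)] *= 1.0 + r
    return {key: g - 1.0 for key, g in sorted(growth.items())}


def drawdown_series(returns):
    """Running drawdown from the equity peak, one value per return (all <= 0)."""
    equity, peak, series = 1.0, 1.0, []
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        series.append(equity / peak - 1.0)
    return series
```

The monthly dictionary maps directly onto a year-by-month grid, and the drawdown series can be plotted as-is beneath the equity curve.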