# eval-model

> Evaluate model performance on test data

- Author: maminul007
- Repository: maminul007/trading-platform
- Version: 20260208000611
- Stars: 1
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/maminul007/trading-platform
- Web: https://mule.run/skillshub/@@maminul007/trading-platform~eval-model:20260208000611

---

---
name: eval-model
description: Evaluate model performance on test data
argument-hint: "[model-path|--compare|--backtest]"
---

# Model Evaluator

Evaluate trained model performance and compare models.

## Usage

- `/eval-model models/tsmom_latest.pt` - Evaluate a specific model
- `/eval-model --latest tsmom` - Evaluate the latest model of a given type
- `/eval-model --compare model1.pt model2.pt` - Compare models
- `/eval-model --backtest` - Run a backtest evaluation
- `/eval-model --walk-forward` - Run walk-forward validation

## Related Code

- `services/ml-server/src/validation/walk_forward_validator.py` - Walk-forward validation
- `services/ml-server/src/backtest/engine.py` - Backtest engine
- `services/ml-server/src/backtest/metrics.py` - Performance metrics

## Evaluation Metrics

| Category | Metrics |
|----------|---------|
| Returns | Total return, CAGR, annualized return |
| Risk | Volatility, max drawdown, VaR, CVaR |
| Risk-adjusted | Sharpe, Sortino, Calmar, Information ratio |
| Trading | Win rate, profit factor, avg win / avg loss |
| Statistical | t-stat, p-value, stability |

## Instructions

When this skill is invoked:

1. Parse arguments:
   - Model path: evaluate a specific model
   - `--latest` followed by a model type (e.g. `tsmom`): find the latest model of that type
   - `--compare`: side-by-side comparison
   - `--backtest`: full backtest evaluation
   - `--walk-forward`: rolling-window validation
   - `--period`: evaluation time period

2. Load the model and test data:
   - Load the model checkpoint
   - Prepare the out-of-sample test dataset
   - Generate predictions

3. Run the evaluation:

   ```bash
   cd services/ml-server
   # MODEL_PATH, TEST_DATA, and FLAGS are filled in from the arguments parsed in step 1.
   # FLAGS is deliberately unquoted so that several flags split into separate words.
   python -m src.validation.walk_forward_validator \
       --model "$MODEL_PATH" \
       --data "$TEST_DATA" \
       $FLAGS
   ```

4. Display the evaluation results:

   ```
   ═══════════════════════════════════════════════════════════
    MODEL EVALUATION: tsmom_v2
   ═══════════════════════════════════════════════════════════

   Test Period: 2023-01-01 to 2023-12-31

   PERFORMANCE
   ─────────────────────────────────────────────────────────
   Total Return:          +24.5%
   Annualized Return:     +24.5%
   Volatility:             12.3%
   Max Drawdown:           -8.7%

   RISK-ADJUSTED
   ─────────────────────────────────────────────────────────
   Sharpe Ratio:            1.99  ★★★★☆
   Sortino Ratio:           2.85
   Calmar Ratio:            2.82

   TRADING STATISTICS
   ─────────────────────────────────────────────────────────
   Total Trades:           1,247
   Win Rate:               57.2%
   Profit Factor:           1.68
   Avg Win / Avg Loss:      1.24

   STATISTICAL SIGNIFICANCE
   ─────────────────────────────────────────────────────────
   t-statistic:             3.45
   p-value:               0.0006  ✓ Significant
   ```

5. For `--compare`:

   ```
   Metric            Model A    Model B    Winner
   ─────────────────────────────────────────────────────
   Sharpe Ratio         1.99       1.75    A (+14%)
   Max Drawdown        -8.7%     -12.1%    A
   Win Rate            57.2%      54.8%    A
   Profit Factor        1.68       1.52    A
   ```

6. For `--walk-forward`:
   - Run rolling-window validation
   - Show performance stability across windows
   - Identify regime-dependent behavior
   - Flag potential overfitting

7. Generate artifacts:
   - Evaluation report: `reports/eval__.html`
   - Equity curve plot
   - Drawdown chart
   - Monthly returns heatmap
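The `--latest` lookup from step 1 might work roughly as follows. This is a minimal sketch that assumes three things the source does not state. First, checkpoints live in a `models/` directory. Second, they are named `<type>_*.pt`, which is inferred from the `models/tsmom_latest.pt` example. Third, "latest" means the most recently modified file. `find_latest_model` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Optional


def find_latest_model(model_type: str, models_dir: str = "models") -> Optional[Path]:
    """Return the most recently modified checkpoint matching <model_type>_*.pt, or None."""
    # Assumption: checkpoints are named like models/tsmom_latest.pt (hypothetical layout).
    candidates = [p for p in Path(models_dir).glob(f"{model_type}_*.pt") if p.is_file()]
    # "Latest" is taken to mean newest modification time; None when nothing matches.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

If the training pipeline writes a timestamp or version into the filename, sorting on that value would be more robust than relying on file modification times.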
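The Returns, Risk, and Risk-adjusted rows of the metrics table can all be derived from a series of periodic returns. The sketch below is illustrative only and does not reproduce `services/ml-server/src/backtest/metrics.py`, whose conventions may differ. It assumes:

- daily simple returns and 252 trading days per year
- historical (not parametric) 95% VaR/CVaR
- a downside deviation measured against a zero target

All function names are hypothetical. Calmar uses the same convention as the sample report: annualized return divided by the absolute max drawdown (24.5% / 8.7% ≈ 2.82). The Information ratio is omitted because it requires a benchmark return series.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: one simple return per trading day


def max_drawdown(returns: list[float]) -> float:
    """Worst peak-to-trough decline of the compounded equity curve (a value <= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def historical_var_cvar(returns: list[float], alpha: float = 0.95) -> tuple[float, float]:
    """Historical VaR and CVaR at confidence `alpha`, reported as positive loss fractions."""
    ordered = sorted(returns)
    n_tail = max(1, math.floor((1.0 - alpha) * len(ordered)))
    tail = ordered[:n_tail]  # the worst (1 - alpha) share of periods
    return -tail[-1], -statistics.fmean(tail)


def return_risk_metrics(returns: list[float], risk_free: float = 0.0) -> dict[str, float]:
    """Return, risk, and risk-adjusted metrics from periodic simple returns."""
    n = len(returns)
    excess = [r - risk_free / TRADING_DAYS for r in returns]
    mean_excess = statistics.fmean(excess)
    total = math.prod(1.0 + r for r in returns) - 1.0
    annualized = (1.0 + total) ** (TRADING_DAYS / n) - 1.0  # geometric, CAGR-style
    volatility = statistics.stdev(returns) * math.sqrt(TRADING_DAYS)
    # Downside deviation: root-mean-square of negative excess returns (target = 0).
    downside = math.sqrt(statistics.fmean([min(0.0, r) ** 2 for r in excess]))
    mdd = max_drawdown(returns)
    var95, cvar95 = historical_var_cvar(returns)
    return {
        "total_return": total,
        "annualized_return": annualized,
        "volatility": volatility,
        "max_drawdown": mdd,
        "var_95": var95,
        "cvar_95": cvar95,
        "sharpe": mean_excess / statistics.stdev(excess) * math.sqrt(TRADING_DAYS),
        "sortino": mean_excess / downside * math.sqrt(TRADING_DAYS) if downside > 0 else math.inf,
        "calmar": annualized / abs(mdd) if mdd < 0 else math.inf,
    }
```

The Sharpe ratio here is the annualized mean excess return divided by annualized volatility. Some reports instead divide the geometric annualized return by volatility, so the two versions will not match exactly.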
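The TRADING STATISTICS block is computed from per-trade P&L, not from period returns. The following is a minimal sketch under two assumptions: the input is a list of realized trade P&Ls, and a zero-P&L trade counts as neither a win nor a loss. The source does not specify how breakeven trades are handled. `trading_stats` is a hypothetical name.

```python
import math


def trading_stats(trade_pnls: list[float]) -> dict[str, float]:
    """Win rate, profit factor, and avg win / avg loss from realized per-trade P&L."""
    wins = [p for p in trade_pnls if p > 0]
    losses = [-p for p in trade_pnls if p < 0]  # stored as positive amounts
    gross_profit, gross_loss = sum(wins), sum(losses)
    avg_win = gross_profit / len(wins) if wins else 0.0
    avg_loss = gross_loss / len(losses) if losses else 0.0
    return {
        "total_trades": len(trade_pnls),
        "win_rate": len(wins) / len(trade_pnls),  # breakeven trades count as non-wins
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else math.inf,
        "avg_win_loss": avg_win / avg_loss if avg_loss > 0 else math.inf,
    }
```

When there are no breakeven trades, the three figures are linked by an identity: profit factor = win rate / (1 - win rate) × (avg win / avg loss). This makes a quick consistency check on a generated report.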
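The t-statistic and p-value in the STATISTICAL SIGNIFICANCE block can be read as a test of whether the mean return differs from zero. The source does not state this, so the sketch below assumes that interpretation. To stay dependency-free, it approximates the t distribution with a normal distribution, which is reasonable for long return series. For short samples, a Student's t test with n - 1 degrees of freedom (for example `scipy.stats.ttest_1samp`) is the better choice. `mean_return_t_test` is a hypothetical name.

```python
import math
import statistics


def mean_return_t_test(returns: list[float]) -> tuple[float, float]:
    """t-statistic for H0: mean return = 0, plus a two-sided p-value (normal approximation)."""
    n = len(returns)
    standard_error = statistics.stdev(returns) / math.sqrt(n)
    t_stat = statistics.fmean(returns) / standard_error
    # Large-n approximation: p = P(|Z| > |t|) = erfc(|t| / sqrt(2)) for Z ~ N(0, 1).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value
```

As a check against the sample report, a t-statistic of 3.45 maps to a two-sided normal p-value of about 0.00056. That rounds to the 0.0006 shown.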
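In the `--compare` output, the Winner column can be derived metric by metric once each metric's direction is known. This sketch makes two assumptions. First, max drawdown is stored as a negative number, so higher is better, as in -8.7% vs -12.1%. Second, the bracketed edge is |A - B| divided by the losing value, which reproduces the "+14%" for Sharpe 1.99 vs 1.75. `pick_winner` and `format_comparison` are hypothetical names.

```python
def pick_winner(a: float, b: float, higher_is_better: bool = True) -> str:
    """Label the better model and its edge relative to the losing value."""
    if a == b:
        return "tie"
    a_better = (a > b) == higher_is_better
    label, loser = ("A", b) if a_better else ("B", a)
    if loser == 0:
        return label  # relative edge is undefined against a zero value
    edge = abs(a - b) / abs(loser) * 100.0
    return f"{label} (+{edge:.0f}%)"


def format_comparison(a: dict[str, float], b: dict[str, float],
                      lower_is_better: frozenset[str] = frozenset()) -> str:
    """Side-by-side table for metrics present in both result dicts, in A's order."""
    lines = [f"{'Metric':<16}{'Model A':>10}{'Model B':>10}   Winner"]
    for name in a:
        if name not in b:
            continue
        winner = pick_winner(a[name], b[name], higher_is_better=name not in lower_is_better)
        lines.append(f"{name:<16}{a[name]:>10.2f}{b[name]:>10.2f}   {winner}")
    return "\n".join(lines)
```

For metrics where smaller is better, such as volatility, pass the name in `lower_is_better`.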
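For `--walk-forward`, the rolling windows can be generated as (train, test) index ranges. In each pair, the test window lies strictly after its training data, so no future information leaks into training. The minimal sketch below assumes:

- fixed window sizes counted in observations
- a step equal to the test size by default
- an optional expanding training window

`walk_forward_windows` and `stability_summary` are hypothetical helpers, and `walk_forward_validator.py` may split differently. A wide spread of out-of-sample scores, or a low share of positive windows, is one possible sign of regime dependence or overfitting.

```python
import statistics
from typing import Iterator, Optional


def walk_forward_windows(n_obs: int, train_size: int, test_size: int,
                         step: Optional[int] = None, expanding: bool = False
                         ) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges; each test window starts where its training window ends."""
    step = step or test_size  # default: consecutive, non-overlapping test windows
    start = 0
    while start + train_size + test_size <= n_obs:
        train_end = start + train_size
        train_start = 0 if expanding else start  # expanding mode keeps all earlier history
        yield range(train_start, train_end), range(train_end, train_end + test_size)
        start += step


def stability_summary(window_scores: list[float]) -> dict[str, float]:
    """Dispersion of one out-of-sample metric (e.g. Sharpe) across windows."""
    return {
        "mean": statistics.fmean(window_scores),
        "std": statistics.stdev(window_scores) if len(window_scores) > 1 else 0.0,
        "share_positive": sum(s > 0 for s in window_scores) / len(window_scores),
    }
```

With daily data, for example, `walk_forward_windows(n, 756, 126)` would train on about three years and test on the following six months, then roll forward six months at a time.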