# numerai-model-implementation

> Add a new Numerai model type to the agents training pipeline. Use when you need to register a model in `agents/code/modeling/utils/model_factory.py`, handle fit/predict quirks in `agents/code/modeling/utils/numerai_cv.py`, and update configs so the model can run via `python -m agents.code.modeling`.

- Author: Noah Harasz
- Repository: numerai/example-scripts
- Version: 20260202152956
- Stars: 1083
- Forks: 301
- Last Updated: 2026-02-06
- Source: https://github.com/numerai/example-scripts
- Web: https://mule.run/skillshub/@@numerai/example-scripts~numerai-model-implementation:20260202152956

---

# Numerai Model Implementation

## Overview

Add a new model type so it can be selected in configs and trained/evaluated by the base pipeline.

Note: run commands from `numerai/` (so `agents` is importable), or from the repo root with `PYTHONPATH=numerai`.

## Implement a New Model Type

1. Define the model API and output shape.
   - Implement `fit(X, y, sample_weight=...)` and `predict(X)`.
   - Put custom wrappers in `agents/code/modeling/models/` so model-specific code stays isolated.
   - Accept pandas DataFrames directly, or convert them to NumPy inside the model wrapper.

2. Register the model constructor in `agents/code/modeling/utils/model_factory.py`.
   - Use lazy imports so optional dependencies do not break other workflows.
   - Raise a clear `ImportError` when the dependency is missing.

   ```python
   # Excerpt from the factory's model-type dispatch.
   if model_type == "XGBRegressor":
       try:
           # Lazy import: xgboost is only needed when this model type is selected.
           from xgboost import XGBRegressor
       except ImportError as exc:
           raise ImportError(
               "xgboost is required for XGBRegressor. Install with `.venv/bin/pip install xgboost`."
           ) from exc
       return XGBRegressor(**model_params)
   ```

3. Add or update a config to use the new model type.

   ```python
   CONFIG = {
       "model": {"type": "XGBRegressor", "params": {"n_estimators": 500}},
       "training": {"cv": {"n_splits": 5}},
       "data": {
           "data_version": "v5.2",
           "feature_set": "small",
           "target_col": "target",
           "era_col": "era",
       },
       "output": {},
       "preprocessing": {},
   }
   ```

4. Add extra data columns if the model needs them.
   - Update `load_and_prepare_data` in `agents/code/modeling/utils/pipeline.py` to pass extra columns into `load_full_data`.
   - Add corresponding config entries so experiments stay reproducible.

## Validate

- Run a smoke test: `.venv/bin/python -m agents.code.modeling --config `.
- Run metrics on the smoke test and check that `corr_mean` is greater than 0.005 and less than 0.04. A lower value means something is probably fundamentally wrong. A higher value likely indicates leakage, and you need to find the problem.
- Double-check that any early stopping mechanisms or modifications to the fit/predict loop don't overestimate accuracy. Accurately estimating performance is of paramount importance on Numerai because it determines whether to stake or not.
- Run unit tests after refactors: `.venv/bin/python -m unittest`.

## Next Steps

After validating the model implementation:

1. Use the `numerai-experiment-design` skill to run **multiple rounds** of experiments (4–5 configs per round), then **scale winners** until you hit a plateau.
2. Use the `numerai-model-upload` skill to create a pkl file **only after** you have a stable, scaled “best model” you intend to deploy.
3. Deploy to Numerai using the MCP server (see the `numerai-model-upload` skill for the deployment workflow).
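
## Example: Minimal Custom Wrapper

To illustrate step 1 of "Implement a New Model Type", here is a minimal sketch of a custom wrapper that exposes `fit(X, y, sample_weight=...)` and `predict(X)` and converts pandas inputs to NumPy internally. The class name `RidgeWrapper`, its `alpha` parameter, and the closed-form weighted ridge solver are assumptions chosen for illustration; they are not code from the repository, and a real wrapper would live in `agents/code/modeling/models/` and wrap whatever library the model needs.

```python
import numpy as np


class RidgeWrapper:
    """Hypothetical wrapper exposing the fit/predict API described in step 1."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # L2 penalty strength (illustrative parameter)
        self.coef_ = None
        self.intercept_ = 0.0

    @staticmethod
    def _to_numpy(data):
        # pandas DataFrames and Series expose to_numpy(); other inputs go through np.asarray.
        if hasattr(data, "to_numpy"):
            data = data.to_numpy()
        return np.asarray(data, dtype=np.float64)

    def fit(self, X, y, sample_weight=None):
        X = self._to_numpy(X)
        y = self._to_numpy(y).ravel()
        if sample_weight is None:
            w = np.ones(len(y))
        else:
            w = self._to_numpy(sample_weight).ravel()
        # Weighted centering keeps the intercept out of the penalty.
        x_mean = np.average(X, axis=0, weights=w)
        y_mean = np.average(y, weights=w)
        Xc = X - x_mean
        yc = y - y_mean
        # Weighted ridge normal equations: (Xc' W Xc + alpha I) beta = Xc' W yc
        gram = Xc.T @ (Xc * w[:, None]) + self.alpha * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(gram, Xc.T @ (w * yc))
        self.intercept_ = y_mean - x_mean @ self.coef_
        return self

    def predict(self, X):
        if self.coef_ is None:
            raise RuntimeError("Call fit() before predict().")
        # Returns a 1-D array with one prediction per input row.
        return self._to_numpy(X) @ self.coef_ + self.intercept_


# Usage on synthetic data (not Numerai data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + 1.0
model = RidgeWrapper(alpha=1e-8).fit(X, y)
preds = model.predict(X)
```

Duck-typing on `to_numpy` lets the wrapper accept DataFrames without importing pandas itself, which is in the same spirit as the lazy imports recommended for the model factory.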