# preprocess

> Preprocess training images to extract DINOv2 features and depth maps

- Author: Caleb Gross
- Repository: CalebisGross/fresnel
- Version: 20260128234825
- Stars: 2
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/CalebisGross/fresnel
- Web: https://mule.run/skillshub/@@CalebisGross/fresnel~preprocess:20260128234825

---

---
name: preprocess
description: Preprocess training images to extract DINOv2 features and depth maps
argument-hint: [data_dir]
disable-model-invocation: true
allowed-tools: Bash(source .venv/*, HSA_OVERRIDE_GFX_VERSION=*, python scripts/preprocessing/preprocess_training_data.py *)
---

Extract DINOv2 features (37x37x384) and Depth Anything V2 depth maps from images.

## Output

Creates in `{data_dir}/features/`:

- `{name}_dinov2.bin` - DINOv2 feature vectors (37x37x384 per image)
- `{name}_depth.bin` - Depth maps (256x256 per image)

## Basic Usage

```bash
source .venv/bin/activate && \
HSA_OVERRIDE_GFX_VERSION=11.0.0 python scripts/preprocessing/preprocess_training_data.py \
    --data_dir images/training_diverse
```

## Options

### Background removal (recommended for objects):

```bash
python scripts/preprocessing/preprocess_training_data.py \
    --data_dir images/training_diverse \
    --remove_background
```

### With VLM semantic density maps:

```bash
python scripts/preprocessing/preprocess_training_data.py \
    --data_dir images/training_diverse \
    --use_vlm \
    --vlm_url http://localhost:1234/v1/chat/completions
```

### Different DINOv2 model size:

```bash
python scripts/preprocessing/preprocess_training_data.py \
    --data_dir images/training_diverse \
    --model_size base  # small (default), base, or large
```

## Requirements

- ONNX models in `models/`:
  - `depth_anything_v2_small.onnx`
  - `dinov2_small.onnx` (or base/large)
- For `--remove_background`: rembg package
- For `--use_vlm`: LM Studio running locally

## When to Run

- Run once per dataset, then features are cached
- Re-run if you change `--model_size` or add new images
- Cloud training can preprocess on-the-fly (adds ~30 min)

## Troubleshooting

- **OOM**: Reduce batch size or process fewer images at once
- **Missing ONNX model**: Download from project releases
- **rembg not available**: `pip install rembg`
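
### Checking for ONNX models before a run

The Requirements list above can be checked up front, so that a missing model is caught before preprocessing starts. The following is a minimal sketch, not part of the fresnel repository: `check_models` is a hypothetical helper, and the `dinov2_base.onnx` / `dinov2_large.onnx` file names are an assumption extrapolated from `dinov2_small.onnx` (or base/large).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not shipped with the project).
# Verifies that the ONNX files named under Requirements are present.
# Usage: check_models [model_size] [models_dir]
check_models() {
  local size="${1:-small}"   # small (default), base, or large
  local dir="${2:-models}"   # directory expected to hold the ONNX files
  local missing=0
  local f
  # Assumption: DINOv2 variants are named dinov2_<size>.onnx
  for f in "depth_anything_v2_small.onnx" "dinov2_${size}.onnx"; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "Missing ONNX model: ${dir}/${f}" >&2
      missing=1
    fi
  done
  # Returns 0 when both files exist, 1 otherwise
  return "$missing"
}
```

One way to use it is to chain it in front of the documented command, for example `check_models base && python scripts/preprocessing/preprocess_training_data.py --data_dir images/training_diverse --model_size base`, so the script only starts when both files are found.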