# deep-learning-vision

> Complete PyTorch/Lightning computer vision pipeline for image classification and object detection. Use when users need to (1) download and preprocess image datasets, (2) train deep learning models (ResNet, EfficientNet, ViT, Swin Transformer, ConvNeXt, etc.) with easy model experimentation, (3) set up training environments (local GPU with CUDA/Apple M1-M4, AWS, GCP, Colab), (4) track experiments with WandB, or (5) evaluate and optimize vision models. Includes both vanilla PyTorch and PyTorch Lightning implementations, 25+ model architectures, optimized Apple Silicon support for M3/M4, and document understanding models (DiT, LayoutLMv3). Supports full workflow from data collection to model deployment with clean, production-ready code.

- Author: WanYoung-Oh
- Repository: WanYoung-Oh/cc-system
- Version: 20260209141752
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-09
- Source: https://github.com/WanYoung-Oh/cc-system
- Web: https://mule.run/skillshub/@@WanYoung-Oh/cc-system~deep-learning-vision:20260209141752

---

---
name: deep-learning-vision
description: Complete PyTorch/Lightning computer vision pipeline for image classification and object detection. Use when users need to (1) download and preprocess image datasets, (2) train deep learning models (ResNet, EfficientNet, ViT, Swin Transformer, ConvNeXt, etc.) with easy model experimentation, (3) set up training environments (local GPU with CUDA/Apple M1-M4, AWS, GCP, Colab), (4) track experiments with WandB, or (5) evaluate and optimize vision models. Includes both vanilla PyTorch and PyTorch Lightning implementations, 25+ model architectures, optimized Apple Silicon support for M3/M4, and document understanding models (DiT, LayoutLMv3). Supports full workflow from data collection to model deployment with clean, production-ready code.
---

# Deep Learning Vision

End-to-end PyTorch/Lightning workflow for computer vision tasks with emphasis on easy model experimentation, clean code structure, and multi-environment support.

## Training Frameworks

This skill provides **two training frameworks**:

1. **Vanilla PyTorch** (`train.py`, `train_with_wandb.py`) - Full control, explicit training loops
2. **PyTorch Lightning** (`train_lightning.py`, `train_lightning_wandb.py`) - Clean code, less boilerplate, production-ready

See [references/pytorch_lightning.md](references/pytorch_lightning.md) for a detailed comparison and migration guide.

## Prerequisites

**Essential packages:**

```bash
pip install -r assets/project_template/requirements.txt
```

**For WandB tracking:**

```bash
wandb login
```

## Quick Start

### 1. Set Up Environment

Detect and configure your training environment (local, Colab, AWS, GCP):

```bash
python scripts/setup_environment.py --output config/environment.json
```

Output includes:

- Environment type and GPU availability
- Recommended data paths
- Distributed training configuration

### 2. Download Dataset

Download popular vision datasets:

```bash
# CIFAR-10 (60k images, 10 classes)
python scripts/download_dataset.py --dataset cifar10 --data-dir ./data

# CIFAR-100 (60k images, 100 classes)
python scripts/download_dataset.py --dataset cifar100 --data-dir ./data

# MNIST
python scripts/download_dataset.py --dataset mnist --data-dir ./data

# List all available datasets
python scripts/download_dataset.py --help
```

For COCO or custom datasets, see the script's output for instructions.

### 3. Configure Preprocessing

Generate a preprocessing configuration:

```bash
python scripts/preprocess_data.py \
    --preset default \
    --image-size 224 \
    --dataset-type imagenet \
    --output config/preprocessing.json
```

**Augmentation presets:**

- `none`: Resize and normalize only
- `default`: Standard augmentation (random crop, flip)
- `strong`: Color jittering + rotation
- `autoaugment`: AutoAugment policy

List all presets: `python scripts/preprocess_data.py --list-presets`

### 4. Train Model

**Choose your training framework:**

#### A. PyTorch Lightning (Recommended - Less code, more features)

```bash
# Basic Lightning training
python scripts/train_lightning.py \
    --model resnet50 \
    --num-classes 10 \
    --data-dir ./data \
    --dataset cifar10 \
    --epochs 100 \
    --batch-size 32

# Lightning with WandB (cleanest approach)
# --precision 16 enables mixed precision; --devices 4 trains on 4 GPUs
python scripts/train_lightning_wandb.py \
    --model convnext_small \
    --num-classes 10 \
    --data-dir ./data \
    --dataset cifar10 \
    --wandb-project vision-experiments \
    --wandb-name convnext-baseline \
    --precision 16 \
    --devices 4
```

**Lightning benefits:**

- Up to 80% less boilerplate code
- Automatic multi-GPU/TPU support
- Built-in mixed precision training
- Automatic checkpointing and logging
- Production-ready code structure

#### B. Vanilla PyTorch (Full control)

```bash
# Basic PyTorch training
python scripts/train.py \
    --model resnet50 \
    --num-classes 10 \
    --data-dir ./data \
    --dataset cifar10 \
    --epochs 100 \
    --batch-size 32 \
    --lr 0.001

# PyTorch with WandB
python scripts/train_with_wandb.py \
    --model resnet50 \
    --num-classes 10 \
    --data-dir ./data \
    --dataset cifar10 \
    --wandb-project vision-experiments \
    --wandb-name resnet50-cifar10
```

**Both frameworks log:**

- Training/validation metrics (loss, accuracy)
- Learning rate schedule
- Model architecture
- Confusion matrices (WandB)
- Hyperparameters

### 5. Evaluate Model

```bash
python scripts/evaluate.py \
    --checkpoint ./checkpoints/best_model.pth \
    --data-dir ./data \
    --dataset cifar10
```

## Model Experimentation

### Available Models

**Quick reference** (see [references/model_architectures.md](references/model_architectures.md) for the complete guide):

| Model | Best For | Speed | Accuracy |
|-------|----------|-------|----------|
| resnet18/34/50/101 | General purpose, baseline | Fast | Good |
| efficientnet_b0-b4 | Efficient deployment | Medium | Excellent |
| vit_b_16/b_32 | Large datasets, SOTA | Slow | Best |
| swin_t/s/b | Hierarchical, multi-scale | Medium | Excellent |
| convnext_tiny/small/base | Modern CNN, efficient | Fast | Excellent |
| mobilenet_v2/v3 | Mobile, real-time | Very Fast | Moderate |
| densenet121/161 | Feature reuse, medical | Medium | Good |
| dit_base, layoutlmv3 | Document understanding | Medium | N/A |

### Experiment Workflow

**Try different models easily:**

```bash
# With Lightning (recommended)
python scripts/train_lightning_wandb.py --model resnet18 --wandb-name exp-resnet18
python scripts/train_lightning_wandb.py --model convnext_small --wandb-name exp-convnext
python scripts/train_lightning_wandb.py --model swin_s --wandb-name exp-swin

# With vanilla PyTorch
python scripts/train_with_wandb.py --model efficientnet_b0 --wandb-name exp-efficientnet
python scripts/train_with_wandb.py --model vit_b_16 --wandb-name exp-vit
```

**Compare hyperparameters:**

```bash
# Experiment with learning rates
python scripts/train_with_wandb.py --lr 0.1 --wandb-name lr-0.1
python scripts/train_with_wandb.py --lr 0.01 --wandb-name lr-0.01
python scripts/train_with_wandb.py --lr 0.001 --wandb-name lr-0.001

# Experiment with optimizers
python scripts/train_with_wandb.py --optimizer sgd --wandb-name opt-sgd
python scripts/train_with_wandb.py --optimizer adam --wandb-name opt-adam
python scripts/train_with_wandb.py --optimizer adamw --wandb-name opt-adamw
```

View all experiments side by side in the WandB dashboard, which provides comparison charts.

## Multi-Environment Support

### Local GPU (CUDA)

Automatically detected. Use `--device cuda` or `--device auto`.

**Optimize for NVIDIA GPUs:**

- Increase the batch size until GPU memory is 80-90% full
- Multiple GPUs are auto-detected and used via `DataParallel`
- Mixed precision for the vanilla PyTorch scripts is a planned feature; the Lightning scripts already support it via `--precision 16`

### Apple Silicon (M1/M2/M3/M4)

Automatically detected. Use `--device mps` or `--device auto`.

**Setup:**

```bash
# Detect your chip and get recommendations
python scripts/setup_environment.py

# Outputs:
# Apple Chip: M3
# Recommended Batch Size: 48
```

**Optimization by chip:**

| Chip | Recommended Batch Size | Recommended Models |
|------|------------------------|-------------------|
| M1/M2 | 16-32 | resnet18, mobilenet_v3, efficientnet_b0 |
| M3/M3 Max | 32-48 | resnet50, convnext_small, swin_t |
| M3 Ultra/M4 | 48-64 | resnet101, convnext_base, swin_b |

**Training example:**

```bash
# M3 optimized training
python scripts/train_with_wandb.py \
    --model convnext_small \
    --batch-size 48 \
    --device mps \
    --wandb-project m3-experiments
```

**Performance notes:**

- M3/M4 GPUs are roughly 5-12x faster than CPU training, depending on the workload
- Unified memory lets the GPU address most of system RAM, so models that exceed the VRAM of a typical discrete GPU may still fit
- Check utilization in Activity Monitor → Window → GPU History
- See [references/environment_setup.md](references/environment_setup.md) for detailed optimization

### Google Colab

**Setup:**

```python
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Clone your code or install skill
!git clone https://github.com/your-repo.git
%cd your-repo

# Run setup
!python scripts/setup_environment.py
```

**Training:**

```python
# Run in a Colab notebook cell
!python scripts/train_with_wandb.py \
    --data-dir /content/drive/MyDrive/data \
    --checkpoint-dir /content/drive/MyDrive/checkpoints \
    --wandb-project colab-experiments
```

Save checkpoints to Drive so they survive session restarts.

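After a Colab session restarts, the first step is usually to find the newest checkpoint on Drive. The following is a minimal sketch, assuming checkpoints are saved as `.pth` files (the extension used by `best_model.pth` in the evaluation example). `latest_checkpoint` is a hypothetical helper, not part of the bundled scripts, and this document does not describe a resume flag, so how the path is passed back to training depends on the script you use.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(checkpoint_dir: str, pattern: str = "*.pth") -> Optional[Path]:
    """Return the most recently modified checkpoint in a directory, or None."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Newest modification time wins; ties are broken by file name for determinism.
    return max(candidates, key=lambda p: (p.stat().st_mtime, p.name))


if __name__ == "__main__":
    # This Drive path matches the --checkpoint-dir used in the Colab example.
    ckpt = latest_checkpoint("/content/drive/MyDrive/checkpoints")
    print(ckpt if ckpt else "No checkpoint found; starting from scratch.")
```

Lightning typically writes `.ckpt` files, so pass `pattern="*.ckpt"` if you train with the Lightning scripts.
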
### AWS SageMaker

**Use managed training:**

```python
import sagemaker
from sagemaker.pytorch import PyTorch

# IAM role for the training job. get_execution_role() works inside SageMaker
# notebooks; elsewhere, pass a role ARN explicitly.
role = sagemaker.get_execution_role()

estimator = PyTorch(
    entry_point='scripts/train.py',
    source_dir='.',
    role=role,
    instance_type='ml.p3.2xlarge',  # 1x V100
    instance_count=1,
    framework_version='2.0.0',
    py_version='py310',
    hyperparameters={
        'model': 'resnet50',
        'num-classes': 10,
        'epochs': 100,
        'batch-size': 64,
    }
)

estimator.fit({'training': 's3://bucket/data'})
```

See [references/environment_setup.md](references/environment_setup.md) for instance types and pricing.

### Google Cloud Platform

**Vertex AI training:**

```python
from google.cloud import aiplatform

# Placeholders: replace the project, region, and staging bucket with your own
aiplatform.init(
    project='your-project-id',
    location='us-central1',
    staging_bucket='gs://your-bucket',
)

job = aiplatform.CustomTrainingJob(
    display_name='vision-training',
    script_path='scripts/train.py',
    container_uri='gcr.io/cloud-aiplatform/training/pytorch-gpu.1-13:latest',
    requirements=['torchvision', 'wandb'],
)

job.run(
    args=[
        '--model', 'resnet50',
        '--num-classes', '10',
        '--epochs', '100',
    ],
    replica_count=1,
    machine_type='n1-standard-8',
    accelerator_type='NVIDIA_TESLA_V100',
    accelerator_count=1,
)
```

## Typical Workflows

### Workflow 1: Quick Classification Baseline (Lightning)

```bash
# 1. Setup environment
python scripts/setup_environment.py

# 2. Download data
python scripts/download_dataset.py --dataset cifar10 --data-dir ./data

# 3. Train with Lightning (minimal code!)
python scripts/train_lightning_wandb.py \
    --model resnet18 \
    --num-classes 10 \
    --data-dir ./data \
    --dataset cifar10 \
    --epochs 50 \
    --precision 16 \
    --wandb-project quick-baseline
```

**Vanilla PyTorch alternative:**

```bash
python scripts/train_with_wandb.py \
    --model resnet18 \
    --num-classes 10 \
    --data-dir ./data \
    --dataset cifar10 \
    --epochs 50 \
    --wandb-project quick-baseline
```

### Workflow 2: Model Comparison Study (Lightning)

```bash
# Train multiple models with Lightning
models=("resnet50" "convnext_small" "swin_s" "efficientnet_b2")

for model in "${models[@]}"; do
    python scripts/train_lightning_wandb.py \
        --model "$model" \
        --num-classes 10 \
        --data-dir ./data \
        --dataset cifar10 \
        --wandb-project model-comparison \
        --wandb-name "$model" \
        --wandb-tags comparison study-1 \
        --precision 16 \
        --early-stopping
done
```

Compare the runs in the WandB dashboard using parallel coordinates plots and metric comparisons.

### Workflow 3: Hyperparameter Tuning

**Option 1: Manual sweeps**

```bash
for lr in 0.1 0.01 0.001; do
    for bs in 32 64 128; do
        python scripts/train_with_wandb.py \
            --lr "$lr" \
            --batch-size "$bs" \
            --wandb-name "lr-${lr}-bs-${bs}"
    done
done
```

**Option 2: WandB Sweeps (recommended)**

Create `sweep_config.yaml`:

```yaml
program: scripts/train_with_wandb.py
method: bayes
metric:
  name: val/acc
  goal: maximize
# The agent passes each parameter as --key=value,
# so keys are written to match the script's CLI flags.
parameters:
  lr:
    distribution: log_uniform_values
    min: 0.0001
    max: 0.1
  batch-size:
    values: [32, 64, 128]
  model:
    values: [resnet18, resnet50, efficientnet_b0]
```

Run the sweep:

```bash
wandb sweep sweep_config.yaml
# Use the sweep ID printed by the previous command
wandb agent <sweep-id>
```

### Workflow 4: Custom Dataset

For ImageFolder-compatible datasets:

```
data/
├── train/
│   ├── class1/
│   │   ├── img1.jpg
│   │   └── img2.jpg
│   └── class2/
└── val/
    ├── class1/
    └── class2/
```

Then:

```bash
# Set --num-classes to the number of class folders
python scripts/train_with_wandb.py \
    --model resnet50 \
    --num-classes <num-classes> \
    --data-dir ./data \
    --dataset folder
```

Note: The current scripts use simplified dataset loading. For custom datasets, modify `train.py` or `train_with_wandb.py` to add `ImageFolder` loading (`torchvision.datasets.ImageFolder`).

## Advanced Configuration

### Using Config Files

Copy the template:

```bash
cp assets/project_template/config.yaml my_config.yaml
```

Edit `my_config.yaml` with your settings. Passing the file to the training script is a planned feature and is not yet supported:

```bash
python scripts/train.py --config my_config.yaml
```

### Model Selection Guide

**Quick decision tree:**

1. **Need fast inference?** → mobilenet_v3, convnext_tiny, efficientnet_b0
2. **Need high accuracy?** → convnext_base, swin_b, efficientnet_b3, vit_b_16
3. **Large dataset (>100k)?** → swin_b, vit_b_16, convnext_large
4. **Small dataset (<10k)?** → resnet18, convnext_tiny with pretrained weights
5. **Limited GPU memory?** → mobilenet_v2, resnet18, efficientnet_b0
6. **Medical/scientific imaging?** → densenet121, swin_b, resnet50
7. **Multi-scale/hierarchical tasks?** → swin_s, swin_b, convnext_base
8. **Document understanding?** → DiT-base, LayoutLMv3 (requires HuggingFace)
9. **Using Apple M3/M4?** → convnext_small, swin_s, resnet50

See [references/model_architectures.md](references/model_architectures.md) for detailed recommendations.

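When several of the questions in the decision tree apply at once, it can help to see which models are recommended more than once. The sketch below encodes the tree as a lookup table and ranks candidates by how many of the selected constraints name them. The constraint keys, the `shortlist` helper, and the vote-counting rule are illustrative assumptions, not part of the bundled scripts.

```python
from typing import Dict, List

# Candidate models per constraint, copied from the decision tree above.
# The dictionary keys are hypothetical labels chosen for this sketch.
RECOMMENDATIONS: Dict[str, List[str]] = {
    "fast_inference": ["mobilenet_v3", "convnext_tiny", "efficientnet_b0"],
    "high_accuracy": ["convnext_base", "swin_b", "efficientnet_b3", "vit_b_16"],
    "large_dataset": ["swin_b", "vit_b_16", "convnext_large"],
    "small_dataset": ["resnet18", "convnext_tiny"],
    "limited_gpu_memory": ["mobilenet_v2", "resnet18", "efficientnet_b0"],
    "medical_imaging": ["densenet121", "swin_b", "resnet50"],
    "multi_scale": ["swin_s", "swin_b", "convnext_base"],
    "document_understanding": ["dit_base", "layoutlmv3"],
    "apple_m3_m4": ["convnext_small", "swin_s", "resnet50"],
}


def shortlist(*constraints: str) -> List[str]:
    """Rank models by how many of the given constraints recommend them."""
    votes: Dict[str, int] = {}
    for constraint in constraints:
        if constraint not in RECOMMENDATIONS:
            raise KeyError(f"Unknown constraint: {constraint}")
        for model in RECOMMENDATIONS[constraint]:
            votes[model] = votes.get(model, 0) + 1
    # Most votes first; alphabetical order keeps ties stable.
    return sorted(votes, key=lambda m: (-votes[m], m))


if __name__ == "__main__":
    # efficientnet_b0 is listed under both constraints, so it ranks first.
    print(shortlist("fast_inference", "limited_gpu_memory"))
```

The top entry is only a starting point for the `--model` flag; the decision tree itself does not weight one constraint over another.
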
### Environment-Specific Best Practices

See [references/environment_setup.md](references/environment_setup.md) for:

- GPU setup (CUDA, MPS)
- Cloud platform configuration
- Distributed training
- Cost optimization
- Troubleshooting

## Bundled Resources

**PyTorch Lightning Scripts (Recommended):**

- `lightning_module.py`: LightningModule with 25+ models
- `lightning_data.py`: DataModule for data loading
- `train_lightning.py`: Clean Lightning training
- `train_lightning_wandb.py`: Lightning + WandB integration

**Vanilla PyTorch Scripts:**

- `train.py`: Basic PyTorch training
- `train_with_wandb.py`: PyTorch + WandB integration
- `evaluate.py`: Model evaluation

**Utilities:**

- `download_dataset.py`: Download popular datasets
- `preprocess_data.py`: Configure augmentation
- `setup_environment.py`: Environment detection

**References:**

- `pytorch_lightning.md`: Lightning guide, comparison, best practices
- `model_architectures.md`: Complete model guide with selection criteria
- `environment_setup.md`: Multi-platform setup and optimization

**Assets:**

- `project_template/`: Starter files (requirements.txt, config.yaml, README template)

## Common Use Cases

**User request:** "Train a model to classify my images"

1. Run `setup_environment.py` to detect the GPU
2. Run `download_dataset.py` or prepare custom data
3. Run `train_with_wandb.py` with a resnet50 baseline
4. Iterate on model selection based on results

**User request:** "Compare ResNet50 vs EfficientNet"

1. Train both models with the same config using `train_with_wandb.py`
2. Use a different `--wandb-name` for each
3. Compare metrics in the WandB dashboard

**User request:** "Set up training on AWS"

1. Read `references/environment_setup.md` for SageMaker setup
2. Create a SageMaker estimator with `scripts/train.py`
3. Configure S3 data paths

**User request:** "My training is slow"

1. Check GPU utilization: `nvidia-smi`
2. Review the troubleshooting section of `environment_setup.md`
3. Try a smaller model or a larger batch size

## Tips

- **Use PyTorch Lightning** for cleaner code and automatic features (multi-GPU, mixed precision)
- **Always use pretrained weights** unless the dataset is very different from ImageNet
- **Start with resnet50 or convnext_small** as a baseline, then experiment
- **Use WandB** to track all experiments systematically
- **Enable mixed precision** with `--precision 16` for up to 2-3x faster training on GPUs that support it
- **Monitor GPU usage** to optimize batch size (target 80-90% utilization)
- **Use early stopping** to avoid overfitting and save compute
- **Save checkpoints frequently**, especially on cloud platforms with time limits
- **Leverage cloud spot instances** for cost savings (often 70-80% cheaper)

### Quick Comparison: When to use what?

| Use Case | Framework | Script |
|----------|-----------|--------|
| Production training | Lightning | `train_lightning_wandb.py` |
| Multi-GPU training | Lightning | `train_lightning.py --devices 4` |
| Learning fundamentals | PyTorch | `train.py` |
| Quick experiments | Lightning | `train_lightning.py --fast-dev-run` |
| Custom training loop | PyTorch | `train.py` (modify as needed) |
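
For the GPU-monitoring tip and the "My training is slow" checklist above, `nvidia-smi` can be queried in CSV form and compared against the 80-90% target. This is a minimal sketch for NVIDIA GPUs only; `parse_gpu_stats` and `query_gpus` are hypothetical helper names, and it assumes `nvidia-smi` reports numeric values for every queried field.

```python
import shutil
import subprocess
from typing import Dict, List

# Standard nvidia-smi query flags: one CSV row per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> List[Dict[str, float]]:
    """Parse nvidia-smi CSV rows into utilization and memory percentages."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append({"util_pct": util, "mem_pct": round(100 * used / total, 1)})
    return stats


def query_gpus() -> List[Dict[str, float]]:
    """Return stats for each NVIDIA GPU, or an empty list if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(query_gpus()):
        if gpu["mem_pct"] < 80:
            hint = "room to raise the batch size"
        else:
            hint = "at or above the 80-90% memory target"
        print(f"GPU {index}: util {gpu['util_pct']}%, memory {gpu['mem_pct']}% ({hint})")
```

Low utilization alongside low memory use often points to a data-loading bottleneck or a batch size that is too small, which is where the troubleshooting section of `environment_setup.md` comes in.
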