# aws-s3

> Handles S3 bucket operations for data storage and transfer between EC2 instances in the EC2 Ops Kit project. Use when: working with S3 buckets, uploading/downloading data for training workflows, syncing files between instances, managing checkpoints, or adding S3 integration to the CLI.

- Author: Yannik
- Repository: pitcany/aws-setup
- Version: 20260205073012
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/pitcany/aws-setup
- Web: https://mule.run/skillshub/@@pitcany/aws-setup~aws-s3:20260205073012

---

---
name: aws-s3
description: |
  Handles S3 bucket operations for data storage and transfer between EC2 instances in the EC2 Ops Kit project.
  Use when: working with S3 buckets, uploading/downloading data for training workflows, syncing files between instances, managing checkpoints, or adding S3 integration to the CLI.
allowed-tools: Read, Edit, Write, Glob, Grep, Bash
---

# AWS S3 Skill

S3 is used in this project as the data bridge between CPU and GPU EC2 instances. There is no `ec2 s3` CLI command; all S3 operations use the AWS CLI directly. The project documents S3 workflows in `docs/setup-guide.md` and references S3 in the legacy config at `config/.env.example` (`S3_BUCKET_NAME`).

## Quick Start

### Upload data before GPU training

```bash
# Single file
aws s3 cp data.parquet s3://my-bucket/

# Directory (recursive)
aws s3 cp ./data/ s3://my-bucket/data/ --recursive

# Sync (only changed files; faster for repeated transfers)
aws s3 sync ./data/ s3://my-bucket/data/
```

### Download on GPU instance

```bash
aws s3 cp s3://my-bucket/data.parquet .
aws s3 sync s3://my-bucket/data/ ./data/
```

### Checkpoint to S3 during training (Python)

```python
import boto3

s3 = boto3.client('s3')

# upload_file(local_path, bucket, key)
s3.upload_file('checkpoint.pt', 'my-bucket', 'checkpoints/epoch_10.pt')
```

## Key Concepts

| Concept | Usage | Example |
|---------|-------|---------|
| `aws s3 cp` | Single file transfer (directories with `--recursive`) | `aws s3 cp model.pt s3://bucket/` |
| `aws s3 sync` | Incremental directory sync | `aws s3 sync ./data/ s3://bucket/data/` |
| `aws s3 ls` | List bucket contents | `aws s3 ls s3://bucket/ --recursive` |
| `aws s3 rm` | Delete objects | `aws s3 rm s3://bucket/old/ --recursive` |
| `aws s3api` | Low-level API (create bucket, etc.) | `aws s3api create-bucket --bucket name` |
| Same-region transfer | No data transfer charge between EC2 and S3 in the same region (request charges still apply) | Keep bucket and instances in same region |

## Common Patterns

### Data science pipeline (CPU -> S3 -> GPU)

**When:** Training models on GPU spot instances with data prepared on CPU.

```bash
# On CPU instance: upload training data
aws s3 sync ./prepared-data/ s3://my-bucket/training/

# Launch GPU spot
ec2 up --preset gpu-t4 --name train --spot --ttl-hours 8
ec2 ssh train

# On GPU instance: pull data, train, push results
aws s3 sync s3://my-bucket/training/ ./data/
python train.py
aws s3 cp model.pt s3://my-bucket/models/
```

### Bucket creation (one-time setup)

```bash
# Bucket names are globally unique, so append a timestamp
BUCKET_NAME="my-datascience-$(date +%s)"

# Outside us-east-1, LocationConstraint must match --region
aws s3api create-bucket \
  --bucket "$BUCKET_NAME" \
  --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2
```

## WARNING: No S3 CLI Integration

This project has **no `ec2 s3` command**. S3 is documented but not wired into the CLI. If adding S3 commands, follow the pattern in `lib/cmd_instances.sh`: create `lib/cmd_s3.sh` with `cmd_s3()`, add routing in `bin/ec2`, and source it. See the **aws-ec2** skill for the command-addition workflow.
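
As a rough illustration of what such a `lib/cmd_s3.sh` might look like, here is a minimal sketch. The subcommand names (`push`, `pull`, `ls`), the `AWS_CMD` override, and the exit codes are assumptions made for this example; they are not taken from `lib/cmd_instances.sh`, whose actual conventions should take precedence. Only `cmd_s3()` and `S3_BUCKET_NAME` come from the project description above. The routing in `bin/ec2` is not shown because its dispatch code does not appear in this document.

```shell
# lib/cmd_s3.sh (hypothetical sketch of an `ec2 s3` subcommand)
# AWS_CMD is an assumption of this sketch: it lets a test swap in a stub.
AWS_CMD="${AWS_CMD:-aws}"

cmd_s3() {
  local action="${1:-}"
  if [ "$#" -gt 0 ]; then shift; fi

  # S3_BUCKET_NAME is the variable referenced in config/.env.example
  local bucket="${S3_BUCKET_NAME:-}"
  if [ -z "$bucket" ]; then
    echo "error: S3_BUCKET_NAME is not set" >&2
    return 1
  fi

  case "$action" in
    push)
      # ec2 s3 push <local-dir> <prefix>   (hypothetical syntax)
      if [ "$#" -ne 2 ]; then
        echo "usage: ec2 s3 push <local-dir> <prefix>" >&2
        return 2
      fi
      "$AWS_CMD" s3 sync "$1" "s3://${bucket}/$2"
      ;;
    pull)
      # ec2 s3 pull <prefix> <local-dir>   (hypothetical syntax)
      if [ "$#" -ne 2 ]; then
        echo "usage: ec2 s3 pull <prefix> <local-dir>" >&2
        return 2
      fi
      "$AWS_CMD" s3 sync "s3://${bucket}/$1" "$2"
      ;;
    ls)
      # ec2 s3 ls [prefix]
      "$AWS_CMD" s3 ls "s3://${bucket}/${1:-}"
      ;;
    *)
      echo "usage: ec2 s3 {push|pull|ls} ..." >&2
      return 2
      ;;
  esac
}
```

With `S3_BUCKET_NAME=my-bucket` exported, `cmd_s3 push ./prepared-data/ training/` would run `aws s3 sync ./prepared-data/ s3://my-bucket/training/`, the same command used in the pipeline pattern above. Wrapping `sync` rather than `cp` keeps repeated transfers incremental.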

## See Also

- [patterns](references/patterns.md): Transfer patterns, cost, security, anti-patterns
- [workflows](references/workflows.md): Training pipeline, backup, bucket lifecycle

## Related Skills

- See the **aws-cli** skill for `--query`, `--output`, profile/region flags
- See the **aws-ec2** skill for instance lifecycle (launch GPU -> transfer data -> terminate)
- See the **aws-spot** skill for spot interruption handling with S3 checkpoints
- See the **bash** skill for scripting S3 operations with proper quoting and error handling