# substreams-dev > Expert knowledge for developing, building, and debugging Substreams projects on any blockchain. Use when working with substreams.yaml manifests, Rust modules, protobuf schemas, or blockchain data processing. - Author: Matthieu Vachon - Repository: streamingfast/substreams-skills - Version: 20260126163938 - Stars: 3 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/streamingfast/substreams-skills - Web: https://mule.run/skillshub/@@streamingfast/substreams-skills~substreams-dev:20260126163938 --- --- name: substreams-dev description: Expert knowledge for developing, building, and debugging Substreams projects on any blockchain. Use when working with substreams.yaml manifests, Rust modules, protobuf schemas, or blockchain data processing. license: Apache-2.0 compatibility: platforms: [claude-code, cursor, vscode, windsurf] metadata: version: 1.0.0 author: StreamingFast documentation: https://substreams.streamingfast.io --- # Substreams Development Expert Expert assistant for building Substreams projects - high-performance blockchain data indexing and transformation. ## Core Concepts ### What is Substreams? Substreams is a powerful blockchain indexing technology that enables: - **Parallel processing** of blockchain data with high performance - **Composable modules** written in Rust (map, store, index types) - **Protobuf schemas** for typed data structures - **Streaming-first** architecture with cursor-based reorg handling ### Key Components 1. **Manifest** (`substreams.yaml`): Defines modules, networks, dependencies 2. **Modules**: Map (transform), Store (aggregate), Index (filter) 3. **Protobuf**: Type-safe schemas for inputs and outputs 4. **WASM**: Rust code compiled to WebAssembly for execution ## Project Structure ``` my-substreams/ ├── substreams.yaml # Manifest ├── proto/ │ └── events.proto # Schema definitions ├── src/ │ └── lib.rs # Rust module code ├── Cargo.toml # Rust dependencies └── build/ # Generated files (gitignored) ``` ## Common Workflows ### Creating a New Project 1. **Initialize**: Use `substreams init` or create manifest manually 2. **Define schema**: Create `.proto` files for your data structures 3. **Implement modules**: Write Rust handlers in `src/lib.rs` 4. **Build**: Run `substreams build` to compile to `.spkg` 5. **Test**: Run `substreams run` with small block range (recommended: 1000 blocks) 6. **Deploy**: Publish to registry or deploy as service ### Module Types **Map Module** - Transforms input to output ```yaml - name: map_events kind: map inputs: - source: sf.ethereum.type.v2.Block output: type: proto:my.types.Events ``` **Store Module** - Aggregates data across blocks ```yaml - name: store_totals kind: store updatePolicy: add valueType: int64 inputs: - map: map_events ``` **Index Module** - Filters blocks for efficient querying ```yaml - name: index_transfers kind: index inputs: - map: map_events output: type: proto:sf.substreams.index.v1.Keys ``` ### Debugging Checklist When modules produce unexpected results: 1. **Validate manifest**: `substreams graph` to visualize dependencies 2. **Test small range**: Run 100-1000 blocks, inspect outputs carefully 3. **Check logs**: Look for WASM panics, protobuf decode errors 4. **Verify schema**: Ensure proto types match expected data 5. **Review inputs**: Confirm input modules produce correct data 6. **Initial block**: Check `initialBlock` is set appropriately ### Performance Optimization * **Use indexes** to skip irrelevant blocks * **Minimize store size** by storing only necessary data * **Production mode** enables parallel execution: `--production-mode` * **Module granularity**: Smaller, focused modules perform better * **Avoid deep nesting**: Flatten module dependencies when possible ## Manifest Reference See [references/manifest-spec.md](./references/manifest-spec.md) for complete specification. ### Key Sections **Package metadata**: ```yaml specVersion: v0.1.0 package: name: my-substreams version: v1.0.0 doc: Description of what this substreams does ``` **Protobuf imports**: ```yaml protobuf: files: - events.proto importPaths: - ./proto ``` **Binary reference** (WASM code): ```yaml binaries: default: type: wasm/rust-v1 file: ./target/wasm32-unknown-unknown/release/my_substreams.wasm ``` **Network configuration**: ```yaml network: mainnet ``` Supported networks: See [references/networks.md](./references/networks.md) ## Rust Module Development ### Map Handler Example ```rust use substreams::prelude::*; use substreams_ethereum::pb::eth::v2::Block; #[substreams::handlers::map] pub fn map_events(block: Block) -> Result { let mut events = Events::default(); for trx in block.transactions() { for log in trx.logs() { // Process logs, extract events if is_transfer_event(log) { events.transfers.push(extract_transfer(log)); } } } Ok(events) } ``` ### Store Handler Example ```rust #[substreams::handlers::store] pub fn store_totals(events: Events, store: StoreAddInt64) { for transfer in events.transfers { store.add(0, &transfer.token, transfer.amount as i64); } } ``` ### Best Practices * **Handle errors gracefully**: Use `Result` returns * **Log sparingly**: Excessive logging impacts performance * **Validate inputs**: Check for null/empty data before processing * **Use substreams helpers**: Leverage `substreams-ethereum` crate * **Test locally first**: Always test with `substreams run` before deploying * **Avoid excessive cloning**: Use ownership transfer (see Performance section below) ## Performance: Avoiding Excessive Cloning **CRITICAL:** One of the greatest performance impacts in Substreams is excessive cloning of data structures. ### The Problem Cloning large data structures is expensive: * ❌ **Cloning a Transaction**: Copies all fields, logs, traces * ❌ **Cloning a Block**: Copies the entire block including all transactions (EXTREMELY expensive) * ❌ **Cloning in loops**: Multiplies the cost by number of iterations ### The Solution: Ownership Transfer Use Rust's ownership system to transfer or borrow data instead of cloning. #### Bad Example (Excessive Cloning) ```rust #[substreams::handlers::map] pub fn map_events(block: Block) -> Result { let mut events = Events::default(); for trx in block.transactions() { // ❌ BAD: Cloning entire transaction let transaction = trx.clone(); for log in transaction.receipt.logs { // ❌ BAD: Cloning log let log_copy = log.clone(); if is_transfer_event(&log_copy) { events.transfers.push(extract_transfer(&log_copy)); } } } Ok(events) } ``` #### Good Example (Ownership Transfer) ```rust #[substreams::handlers::map] pub fn map_events(block: Block) -> Result { let mut events = Events::default(); // ✅ GOOD: Iterate by reference for trx in block.transactions() { // ✅ GOOD: Borrow, don't clone for log in &trx.receipt.logs { if is_transfer_event(log) { // ✅ GOOD: Only extract what you need events.transfers.push(extract_transfer(log)); } } } Ok(events) } fn is_transfer_event(log: &Log) -> bool { // Use reference, no cloning !log.topics.is_empty() && log.topics[0] == TRANSFER_EVENT_SIGNATURE } fn extract_transfer(log: &Log) -> Transfer { // Extract only the fields you need Transfer { from: Hex::encode(&log.topics[1]), to: Hex::encode(&log.topics[2]), amount: Hex::encode(&log.data), // Don't copy the entire log } } ``` ### When Cloning is Acceptable Clone only small, necessary data: ```rust // ✅ OK: Cloning small strings let token_address = Hex::encode(&log.address).clone(); // ✅ OK: Cloning primitive types let block_number = block.number.clone(); // ❌ BAD: Cloning entire structures let block_copy = block.clone(); // Never do this! let trx_copy = transaction.clone(); // Avoid this! ``` ### Performance Tips 1. **Iterate by reference**: Use `&` when iterating ```rust for log in &trx.receipt.logs { } // Good for log in trx.receipt.logs.clone() { } // Bad ``` 2. **Use references when appropriate**: Pass references to avoid unnecessary cloning ```rust fn process_log(log: &Log) { } // Good for read-only access fn process_log(log: Log) { } // Good when consuming/transforming data ``` 3. **Extract minimal data**: Only copy what you actually need ```rust // Good: Extract only needed fields let amount = parse_amount(&log.data); // Bad: Copy entire log just to get one field let log_copy = log.clone(); let amount = parse_amount(&log_copy.data); ``` 4. **Use** `into()` for consumption: When you need to consume data ```rust // When you truly need to take ownership events.transfers.push(Transfer { from: topics[1].into(), // Consumes the data to: topics[2].into(), }); ``` ### Common Pitfalls **Pitfall #1: Cloning in filters** ```rust // ❌ BAD block.transactions() .iter() .filter(|trx| trx.clone().to == target) // Clone every transaction! // ✅ GOOD block.transactions() .iter() .filter(|trx| trx.to == target) // Just compare ``` **Pitfall #2: Unnecessary defensive copies** ```rust // ❌ BAD let block_copy = block.clone(); for trx in block_copy.transactions() { } // Why clone the whole block? // ✅ GOOD for trx in block.transactions() { } // Use the block directly ``` **Pitfall #3: Cloning for mutation** ```rust // ❌ BAD let mut trx_copy = trx.clone(); trx_copy.value = process(trx_copy.value); // Clone just to mutate // ✅ GOOD let new_value = process(&trx.value); // Process reference, create new value ``` ### Measuring Impact Use `substreams run` with timing to measure performance: ```bash # Test with cloning (slow) time substreams run -s 17000000 -t +1000 map_events # Test without cloning (fast) time substreams run -s 17000000 -t +1000 map_events # You should see significant speedup (2-10x) by avoiding clones ``` ### Remember * 🎯 **Measure performance impact**: Use timing with `substreams run` to identify bottlenecks * 🎯 **Clone only when necessary**: Most of the time, borrowing is sufficient * 🎯 **Block cloning is almost never needed**: This is the #1 performance killer * 🎯 **Transaction cloning should be rare**: Extract only the data you need ## Common Patterns See [references/patterns.md](./references/patterns.md) for detailed examples: * Event extraction from logs * Store aggregation patterns * Multi-module composition * Parameterized modules * Dynamic data sources ## Development Tips 1. **Start small**: Begin with 1000 block range for testing 2. **Use GUI**: `substreams gui` for visual debugging (when available) 3. **Check estimates**: `substreams estimate` before processing large ranges 4. **Version control**: Commit `.spkg` files for reproducibility 5. **Document modules**: Add `doc:` fields in manifest for clarity ## Troubleshooting **Build fails**: * Check Rust toolchain: `rustup target add wasm32-unknown-unknown` * Verify proto imports are correct * Ensure binary path in manifest matches build output **Empty output**: * Confirm `initialBlock` is before first relevant block * Check module isn't filtered out by upstream index * Verify input data exists in block range **Performance issues**: * Add indexes to skip irrelevant blocks * Use `--production-mode` for large ranges * Check store size (use `substreams gui` or estimate) ## Resources * [Official Documentation](https://substreams.streamingfast.io) * [Module Types Guide](./references/module-types.md) * [Manifest Specification](./references/manifest-spec.md) * [Common Patterns](./references/patterns.md) * [Supported Networks](./references/networks.md) ## Getting Help * [Discord Community](https://discord.gg/streamingfast) * [GitHub Issues](https://github.com/streamingfast/substreams/issues) * [Documentation](https://substreams.streamingfast.io)