# zstd-compression-engineer > Expert guide for implementing Zstandard (zstd) compression and decompression. Use when working with zstd library for data compression tasks including simple compression, streaming operations, dictionary-based compression, or when users need help with compression performance optimization, error handling, or choosing the right API for their use case. - Author: Denis Dvornikov - Repository: dennypenta/home - Version: 20260127111221 - Stars: 1 - Forks: 0 - Last Updated: 2026-02-07 - Source: https://github.com/dennypenta/home - Web: https://mule.run/skillshub/@@dennypenta/home~zstd-compression-engineer:20260127111221 --- --- name: zstd-compression-engineer description: Expert guide for implementing Zstandard (zstd) compression and decompression. Use when working with zstd library for data compression tasks including simple compression, streaming operations, dictionary-based compression, or when users need help with compression performance optimization, error handling, or choosing the right API for their use case. --- # Zstd Compression Engineer Expert guidance for implementing Zstandard (zstd) compression in any programming language. ## Quick Decision Tree Choose your API based on the use case: 1. **Simple one-off compression** → Use `ZSTD_compress()` / `ZSTD_decompress()` 2. **Large files or unknown sizes** → Use streaming API (`ZSTD_compressStream2()` / `ZSTD_decompressStream()`) 3. **Many small similar files** → Use dictionary compression (`ZSTD_compress_usingCDict()`) 4. **Repeated operations** → Reuse contexts (`ZSTD_compressCCtx()` / `ZSTD_decompressDCtx()`) ## Core Implementation Patterns ### Pattern 1: Simple Compression ```c // Allocate destination buffer size_t dstCapacity = ZSTD_compressBound(srcSize); void* dst = malloc(dstCapacity); // Compress size_t compressedSize = ZSTD_compress(dst, dstCapacity, src, srcSize, compressionLevel); // Always check for errors if (ZSTD_isError(compressedSize)) { fprintf(stderr, "Compression failed: %s\n", ZSTD_getErrorName(compressedSize)); // Handle error } ``` **Key points:** - Use `ZSTD_compressBound()` to calculate required buffer size - Default compression level is 3 (balance of speed/ratio) - Levels 1-3: fast, 4-9: balanced, 10-19: high compression, 20-22: ultra (memory intensive) ### Pattern 2: Context Reuse for Multiple Operations ```c // Create context once ZSTD_CCtx* cctx = ZSTD_createCCtx(); // Use for multiple compressions for (each file) { size_t result = ZSTD_compressCCtx(cctx, dst, dstCapacity, src, srcSize, level); // Process result } // Cleanup ZSTD_freeCCtx(cctx); ``` **Benefits:** - Reuses allocated memory across operations - Better performance than creating new contexts - No impact on compression ratio ### Pattern 3: Streaming for Large Data See `references/streaming-api.md` for complete streaming implementation guide. **Use streaming when:** - Source data doesn't fit in memory - Decompressed size is unknown - Processing data incrementally (network streams, pipes) **Buffer size recommendations:** - Input: `ZSTD_CStreamInSize()` / `ZSTD_DStreamInSize()` - Output: `ZSTD_CStreamOutSize()` / `ZSTD_DStreamOutSize()` ### Pattern 4: Dictionary Compression See `references/dictionary-compression.md` for complete dictionary usage guide. **Use dictionaries when:** - Compressing many small similar files (< 100KB each) - Data has repeated patterns across files - Working with structured data (JSON, XML, logs) **Critical rule:** Pre-digest dictionaries with `ZSTD_createCDict()` for repeated use. Loading raw dictionaries repeatedly kills performance. ## Error Handling **Always check results:** ```c size_t result = ZSTD_compress(...); if (ZSTD_isError(result)) { const char* errMsg = ZSTD_getErrorName(result); // Handle error } ``` **Context recovery after errors:** - Contexts may be in undefined state after errors - Reset before reuse: `ZSTD_CCtx_reset()` or `ZSTD_DCtx_reset()` **Untrusted data validation:** - Always validate decompressed sizes from untrusted sources - Use `ZSTD_getFrameContentSize()` to check size before allocating - Implement application-specific size limits - Prefer streaming decompression for untrusted data ## Thread Safety **Per-thread contexts:** - Maintain separate `ZSTD_CCtx` per thread - Never share contexts across threads **Shared thread pools (optional):** ```c ZSTD_threadPool* pool = ZSTD_createThreadPool(numThreads); ZSTD_CCtx_refThreadPool(cctx, pool); ``` ## Common Pitfalls 1. **Forgetting to check `ZSTD_compressBound()`** → Buffer overflow 2. **Loading dictionaries repeatedly** → Performance degradation 3. **Not checking `ZSTD_isError()`** → Silent failures 4. **Sharing contexts across threads** → Undefined behavior 5. **Trusting decompressed sizes** → Memory exhaustion attacks ## Performance Tuning **Compression level selection:** - Level 1-3: Real-time compression, minimal CPU - Level 4-9: General purpose (recommended starting point) - Level 10-19: Offline compression, archival - Level 20-22: Maximum compression, high memory usage **Advanced parameters:** - Window log: Controls memory usage and compression ratio - Strategy: fast, dfast, greedy, lazy, btopt (automatic selection usually best) - See `references/api-reference.md` for complete parameter list ## Language-Specific Notes **C/C++:** Direct library access, use patterns above **Python:** Use `zstandard` package (python-zstandard) **Node.js:** Use `@mongodb-js/zstd` or `node-zstd` **Go:** Use `github.com/klauspost/compress/zstd` **Rust:** Use `zstd` crate **Java:** Use `com.github.luben:zstd-jni` All language bindings follow the same conceptual patterns: simple compression, streaming, dictionary support. ## Reference Documentation For detailed API specifications: - **Streaming API guide**: `references/streaming-api.md` - **Dictionary compression**: `references/dictionary-compression.md` - **Complete API reference**: `references/api-reference.md` - **Official docs**: https://facebook.github.io/zstd/doc/api_manual_latest.html ## Implementation Checklist When implementing zstd compression: - [ ] Choose correct API (simple/streaming/dictionary) - [ ] Calculate buffer sizes with `ZSTD_compressBound()` - [ ] Select appropriate compression level - [ ] Implement error checking with `ZSTD_isError()` - [ ] Reuse contexts for multiple operations - [ ] Handle context reset after errors - [ ] Validate untrusted data sizes - [ ] Test with actual data to verify correctness