# metacell
> Build metacells (metacell aggregation) from single-cell RNA-seq AnnData (.h5ad) using the metacells divide-and-conquer pipeline. Use when constructing metacell objects, choosing metacell size, propagating group annotations, or visualizing metacell embeddings from scRNA-seq data.
- Author: xuzhougeng
- Repository: xuzhougeng/single-cell-skills
- Version: 20260122111556
- Last Updated: 2026-02-06
- Source: https://github.com/xuzhougeng/single-cell-skills
- Web: https://mule.run/skillshub/@@xuzhougeng/single-cell-skills~metacell:20260122111556
---
---
name: metacell
description: Build metacells (metacell aggregation) from single-cell RNA-seq AnnData (.h5ad) using the metacells divide-and-conquer pipeline. Use when constructing metacell objects, choosing metacell size, propagating group annotations, or visualizing metacell embeddings from scRNA-seq data.
---
# Metacell construction
Use the bundled scripts to explore datasets, run the metacell pipeline, and visualize results. Prefer these scripts over reimplementing logic.
## Quick start
1) Explore h5ad inputs to choose parameters.
2) Run the metacell divide-and-conquer pipeline.
3) Visualize metacells on UMAP.
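
The three steps map onto three command lines. Below is a minimal sketch of assembling them from Python, using only the script paths and flags named in this document. The helper name `quick_start_commands` and the choice to drive the scripts through `subprocess` are illustrative assumptions, and every file name is a placeholder for your own paths.

```python
import sys


def quick_start_commands(h5ad_path, cells_path, metacells_path,
                         target_size=96, plot_path="metacells.pdf"):
    """Return the explore / pipeline / visualize commands as argument lists.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    py = sys.executable  # reuse the interpreter that runs this sketch
    return [
        # 1) survey the input to choose a target metacell size
        [py, "metacell/scripts/explore.py", h5ad_path],
        # 2) build metacells for one dataset
        [py, "metacell/scripts/pipeline.py", h5ad_path,
         "--target-metacell-size", str(target_size)],
        # 3) plot metacells on the single-cell UMAP
        [py, "metacell/scripts/visualize.py", cells_path, metacells_path,
         "-o", plot_path],
    ]
```

To execute the steps in order, pass each list to `subprocess.run(cmd, check=True)` so that a failing step stops the sequence.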
## Scripts
- `metacell/scripts/explore.py`
  - Scan a `.h5ad` file or a directory (default: `h5ad/`) and summarize cell/gene counts, likely cell-type column, and suggested target metacell size.
  - Writes a CSV summary (default: `metacell/species_characteristics.csv` next to this skill; configurable via `--output`).
  - Run when you need a quick survey before choosing `--target-metacell-size`.
- `metacell/scripts/pipeline.py`
  - Run the metacell divide-and-conquer pipeline on a single `.h5ad`.
  - Writes `.metacells.h5ad` by default.
  - By default also writes `_with_metacells.h5ad` containing `obs["metacell_name"]` (cell → metacell assignment). Control via `--output-cells` / `--no-output-cells`.
  - Optionally propagates a group column from cells to metacells (`--group-key`).
  - Use for the main metacell construction.
- `metacell/scripts/visualize.py`
  - Project metacells onto the single-cell UMAP and draw a clean scatter plot.
  - Computes UMAP if missing.
  - Uses a fast edge renderer (LineCollection) and supports filtering/capping edges.
  - Use for publication-style overview plots.
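
Group propagation with `--group-key` can be pictured as a vote within each metacell. The sketch below gives every metacell the most common group label among its member cells. Majority voting, the function name `propagate_groups`, and the plain-list inputs are assumptions made for illustration; `pipeline.py` may apply a different rule, so keep using the script for real runs.

```python
from collections import Counter, defaultdict


def propagate_groups(metacell_names, group_labels):
    """Map each metacell name to its most frequent cell-level group label.

    Assumption: majority vote. Both inputs are parallel sequences with one
    entry per cell, e.g. obs["metacell_name"] and obs["celltype_anno"].
    """
    votes = defaultdict(Counter)
    for metacell, group in zip(metacell_names, group_labels):
        votes[metacell][group] += 1
    # most_common(1) returns a one-element list holding (label, count)
    return {mc: counts.most_common(1)[0][0] for mc, counts in votes.items()}


cells = ["M1", "M1", "M1", "M2", "M2"]
groups = ["T cell", "T cell", "B cell", "B cell", "B cell"]
print(propagate_groups(cells, groups))
# {'M1': 'T cell', 'M2': 'B cell'}
```

When two labels tie, `Counter.most_common` returns the one counted first, so a production version might prefer an explicit tie rule.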
## Typical workflow
1) Explore inputs and choose target size.
   - Run: `python metacell/scripts/explore.py` (defaults to `h5ad/`)
   - Or single file: `python metacell/scripts/explore.py path/to/sample.h5ad`
   - Optional: `-o/--output path/to/summary.csv` (use `--output -` to write CSV to stdout)
   - Use the suggested `target_size` per dataset.
2) Build metacells for one dataset.
   - Run: `python metacell/scripts/pipeline.py path/to/data.h5ad --target-metacell-size 96`
   - This produces:
     - `.metacells.h5ad` (metacell-level AnnData)
     - `_with_metacells.h5ad` (cell-level AnnData with `obs["metacell_name"]`)
   - Add group propagation if needed: `--group-key celltype_anno` (set to empty to skip).
   - Use `--lateral-gene`, `--lateral-gene-pattern`, or `--noisy-gene` when needed.
3) Visualize metacells.
   - Run: `python metacell/scripts/visualize.py cells_with_metacells.h5ad cells.metacells.h5ad -o metacells.pdf`
   - Adjust `--celltype-key` to match annotation column in metacell obs.
   - Optional edge controls:
     - `--edge-weight-min 0.1` to drop weak edges
     - `--max-edges 50000` to cap total edges drawn (speeds up large graphs)
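
The two edge controls can be thought of as a threshold followed by a cap. The sketch below illustrates that idea on plain `(source, target, weight)` tuples. The function name `select_edges`, the inclusive `>=` comparison, and keeping the heaviest edges when capping are all assumptions; `visualize.py` may choose which edges to keep differently.

```python
def select_edges(edges, weight_min=0.0, max_edges=None):
    """Filter (source, target, weight) tuples, then cap how many remain.

    Assumptions: weights equal to weight_min are kept, and the cap keeps
    the heaviest edges rather than a random subset.
    """
    kept = [edge for edge in edges if edge[2] >= weight_min]
    if max_edges is not None and len(kept) > max_edges:
        # sort by weight, heaviest first, and truncate to the cap
        kept = sorted(kept, key=lambda edge: edge[2], reverse=True)[:max_edges]
    return kept


edges = [(0, 1, 0.05), (0, 2, 0.4), (1, 2, 0.2), (2, 3, 0.9)]
print(select_edges(edges, weight_min=0.1, max_edges=2))
# [(2, 3, 0.9), (0, 2, 0.4)]
```

Filtering before capping means the cap is only spent on edges that already pass the weight threshold.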
## Notes and checks
- Ensure dependencies are available: `scanpy`, `metacells`, `pandas`, `numpy`, `seaborn`, `matplotlib`.
- The pipeline expects float input; it will cast non-float `adata.X` to float32.
- `visualize.py` requires `obs["metacell_name"]` in the cell-level AnnData to project metacells. Use the `_with_metacells.h5ad` produced by `pipeline.py`.
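
The last note explains why the cell-level assignment matters: without a metacell name per cell there is nothing to tie a metacell to a position on the cell-level UMAP. One plausible projection, sketched below, places each metacell at the mean UMAP coordinate of its member cells. The centroid rule and the function name `metacell_centroids` are assumptions for illustration, not a description of what `visualize.py` does internally.

```python
from collections import defaultdict


def metacell_centroids(metacell_names, umap_coords):
    """Return {metacell name: (mean x, mean y)} over the member cells.

    Assumption: centroid projection. Inputs are parallel per-cell sequences,
    e.g. obs["metacell_name"] and the rows of a 2-D UMAP embedding.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running x sum, y sum, count
    for name, (x, y) in zip(metacell_names, umap_coords):
        acc = sums[name]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {name: (sx / n, sy / n) for name, (sx, sy, n) in sums.items()}


names = ["M1", "M1", "M2"]
coords = [(0.0, 0.0), (2.0, 4.0), (5.0, 5.0)]
print(metacell_centroids(names, coords))
# {'M1': (1.0, 2.0), 'M2': (5.0, 5.0)}
```

With scanpy, the per-cell coordinates are conventionally stored in `obsm["X_umap"]`.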