# metacell

> Build metacells (metacell aggregation) from single-cell RNA-seq AnnData (.h5ad) using the metacells divide-and-conquer pipeline. Use when constructing metacell objects, choosing metacell size, propagating group annotations, or visualizing metacell embeddings from scRNA-seq data.

- Author: xuzhougeng
- Repository: xuzhougeng/single-cell-skills
- Version: 20260122111556
- Last Updated: 2026-02-06
- Source: https://github.com/xuzhougeng/single-cell-skills
- Web: https://mule.run/skillshub/@@xuzhougeng/single-cell-skills~metacell:20260122111556

---

---
name: metacell
description: Build metacells (metacell aggregation) from single-cell RNA-seq AnnData (.h5ad) using the metacells divide-and-conquer pipeline. Use when constructing metacell objects, choosing metacell size, propagating group annotations, or visualizing metacell embeddings from scRNA-seq data.
---

# Metacell construction

Use the bundled scripts to explore datasets, run the metacell pipeline, and visualize results. Prefer these scripts over reimplementing their logic.

## Quick start

1) Explore h5ad inputs to choose parameters.
2) Run the metacell divide-and-conquer pipeline.
3) Visualize metacells on UMAP.
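
As a rough sketch, the three steps can be chained from Python by building one command line per script. The script paths and flags are the ones documented in this skill. The helper names (`default_outputs`, `build_commands`, `run_all`) are hypothetical, and the output naming (`cells.h5ad` giving `cells.metacells.h5ad` and `cells_with_metacells.h5ad`) is an assumption inferred from the example file names in this document. The target size is still chosen by you after reading the `explore.py` summary.

```python
import shlex
import subprocess
import sys
from pathlib import Path

SCRIPTS = Path("metacell/scripts")


def default_outputs(input_path):
    """Assumed default outputs of pipeline.py for a given input file."""
    path = Path(input_path)
    metacells = path.with_name(path.stem + ".metacells.h5ad")
    cells = path.with_name(path.stem + "_with_metacells.h5ad")
    return metacells, cells


def build_commands(input_path, target_size=96, group_key=None,
                   plot_path="metacells.pdf"):
    """Return the explore, pipeline, and visualize command lines, in order."""
    metacells, cells = default_outputs(input_path)
    explore = [sys.executable, str(SCRIPTS / "explore.py"), str(input_path)]
    pipeline = [
        sys.executable, str(SCRIPTS / "pipeline.py"), str(input_path),
        "--target-metacell-size", str(target_size),
    ]
    if group_key:
        pipeline += ["--group-key", group_key]
    visualize = [
        sys.executable, str(SCRIPTS / "visualize.py"),
        str(cells), str(metacells), "-o", str(plot_path),
    ]
    return [explore, pipeline, visualize]


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Print the commands instead of running them, so they can be reviewed first.
for command in build_commands("path/to/data.h5ad", group_key="celltype_anno"):
    print(shlex.join(command))
```

Calling `run_all(build_commands(...))` would execute the steps back to back. In practice it is usually better to run `explore.py` first, inspect its summary, and only then pass the chosen size to the remaining two commands.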

## Scripts

- `metacell/scripts/explore.py`
  - Scan a `.h5ad` file or a directory (default: `h5ad/`) and summarize cell and gene counts, the likely cell-type column, and a suggested target metacell size.
  - Writes a CSV summary (default: `metacell/species_characteristics.csv` next to this skill; configurable via `--output`).
  - Run when you need a quick survey before choosing `--target-metacell-size`.

- `metacell/scripts/pipeline.py`
  - Run the metacell divide-and-conquer pipeline on a single `.h5ad`.
  - Writes `<input>.metacells.h5ad` by default.
  - By default also writes `<input>_with_metacells.h5ad` containing `obs["metacell_name"]` (cell → metacell assignment). Control via `--output-cells` / `--no-output-cells`.
  - Optionally propagates a group column from cells to metacells (`--group-key`).
  - Use for the main metacell construction.

- `metacell/scripts/visualize.py`
  - Project metacells onto the single-cell UMAP and draw a clean scatter plot.
  - Computes UMAP if missing.
  - Uses a fast edge renderer (LineCollection) and supports filtering/capping edges.
  - Use for publication-style overview plots.
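
To make the edge filtering and capping mentioned for `visualize.py` concrete, here is a minimal sketch of one way such a step could work. It is not taken from the script: the function name `select_edges`, the `(source, target, weight)` tuple layout, the inclusive threshold, and the choice to keep the heaviest edges when capping are all assumptions made for illustration.

```python
import heapq


def select_edges(edges, weight_min=0.0, max_edges=None):
    """Drop weak edges, then optionally keep only the heaviest ones.

    edges: iterable of (source, target, weight) tuples (assumed layout).
    """
    # Step 1: drop edges below the weight threshold (assumed inclusive).
    kept = [edge for edge in edges if edge[2] >= weight_min]
    # Step 2: cap the total, keeping the heaviest edges (an assumption).
    if max_edges is not None and len(kept) > max_edges:
        kept = heapq.nlargest(max_edges, kept, key=lambda edge: edge[2])
    return kept


example_edges = [(0, 1, 0.05), (1, 2, 0.4), (2, 3, 0.9), (0, 3, 0.2)]
print(select_edges(example_edges, weight_min=0.1, max_edges=2))
```

Reducing the edge list before drawing is what keeps large graphs fast to render, whatever the exact selection rule is.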

## Typical workflow

1) Explore inputs and choose target size.
   - Run: `python metacell/scripts/explore.py` (defaults to `h5ad/`)
   - Or single file: `python metacell/scripts/explore.py path/to/sample.h5ad`
   - Optional: `-o/--output path/to/summary.csv` (use `--output -` to write CSV to stdout)
   - Use the `target_size` suggested for each dataset as the value of `--target-metacell-size` in the next step.

2) Build metacells for one dataset.
   - Run: `python metacell/scripts/pipeline.py path/to/data.h5ad --target-metacell-size 96`
   - This produces:
     - `<input>.metacells.h5ad` (metacell-level AnnData)
     - `<input>_with_metacells.h5ad` (cell-level AnnData with `obs["metacell_name"]`)
   - Add group propagation if needed: `--group-key celltype_anno` (pass an empty value to skip propagation).
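   - Use `--lateral-gene` or `--lateral-gene-pattern` to mark genes that should not drive how cells are grouped (for example cell-cycle or stress genes), and `--noisy-gene` for genes too variable to be held against a metacell's homogeneity.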

3) Visualize metacells.
   - Run: `python metacell/scripts/visualize.py cells_with_metacells.h5ad cells.metacells.h5ad -o metacells.pdf`
   - Adjust `--celltype-key` to match the annotation column in the metacell `obs`.
   - Optional edge controls:
     - `--edge-weight-min 0.1` to drop weak edges
     - `--max-edges 50000` to cap total edges drawn (speeds up large graphs)
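
Both group propagation (step 2) and the metacell positions used for plotting (step 3) can be understood as a group-by over `obs["metacell_name"]`. The sketch below illustrates that idea with pandas. It is not the code in `pipeline.py` or `visualize.py`: the function name `summarize_by_metacell`, the majority-vote rule, the use of the mean UMAP coordinate, and the `"Outliers"` label for unassigned cells are assumptions.

```python
import numpy as np
import pandas as pd


def summarize_by_metacell(cell_obs, umap, group_key="celltype_anno",
                          name_key="metacell_name", outlier_label="Outliers"):
    """Return one row per metacell: majority group, mean UMAP position, size."""
    coords = np.asarray(umap)
    frame = pd.DataFrame(
        {
            "metacell": cell_obs[name_key].astype(str).to_numpy(),
            "group": cell_obs[group_key].astype(str).to_numpy(),
            "umap_x": coords[:, 0],
            "umap_y": coords[:, 1],
        }
    )
    # Assumption: unassigned cells carry a dedicated label and are skipped.
    frame = frame[frame["metacell"] != outlier_label]
    grouped = frame.groupby("metacell")
    return pd.DataFrame(
        {
            # Majority vote; mode() returns sorted values, so ties are
            # resolved deterministically.
            group_key: grouped["group"].agg(lambda s: s.mode().iloc[0]),
            # Mean position of the member cells (illustrative choice).
            "umap_x": grouped["umap_x"].mean(),
            "umap_y": grouped["umap_y"].mean(),
            "n_cells": grouped.size(),
        }
    )
```

With a cell-level AnnData loaded as `cells`, a call might look like `summarize_by_metacell(cells.obs, cells.obsm["X_umap"])`, assuming the embedding is stored under scanpy's usual `X_umap` key.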

## Notes and checks

- Ensure dependencies are available: `scanpy`, `metacells`, `pandas`, `numpy`, `seaborn`, `matplotlib`.
- `pipeline.py` expects a floating-point `adata.X`; any other dtype is cast to float32.
- `visualize.py` requires `obs["metacell_name"]` in the cell-level AnnData to project metacells. Use the `<input>_with_metacells.h5ad` produced by `pipeline.py`.
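
A quick way to check the first note before running anything is to ask Python whether each listed package can be found. This is a small convenience sketch, not part of the bundled scripts. The function name `missing_modules` is hypothetical, and it only reports whether a module is importable, not which version is installed.

```python
from importlib.util import find_spec

# Packages listed in the notes above; import names match package names here.
REQUIRED = ("scanpy", "metacells", "pandas", "numpy", "seaborn", "matplotlib")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


print("Missing packages:", missing_modules() or "none")
```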