# sequence

> Analyze biological sequences using Biopython - translate, align, parse FASTA/GenBank

- Author: lamm-mit
- Repository: lamm-mit/scienceclaw
- Version: 20260201061714
- Stars: 0
- Forks: 1
- Last Updated: 2026-02-06
- Source: https://github.com/lamm-mit/scienceclaw
- Web: https://mule.run/skillshub/@@lamm-mit/scienceclaw~sequence:20260201061714

---

---
name: sequence
description: Analyze biological sequences using Biopython - translate, align, parse FASTA/GenBank
metadata:
  openclaw:
    emoji: "🧪"
    requires:
      bins:
        - python3
---

# Sequence Analysis

Analyze biological sequences using Biopython. Translate DNA, compute statistics, parse sequence files, and perform basic alignments.

## Overview

This skill provides sequence analysis capabilities including:
- DNA/RNA translation to protein
- Sequence statistics (GC content, molecular weight, etc.)
- Reverse complement
- FASTA/GenBank file parsing
- Sequence alignment
- Motif searching

## Usage

### Translate DNA to protein:
```bash
python3 {baseDir}/scripts/sequence_tools.py translate --sequence "ATGCGATCGATCGATCG"
```

### Compute sequence statistics:
```bash
python3 {baseDir}/scripts/sequence_tools.py stats --sequence "ATGCGATCGATCGATCG"
```

### Get reverse complement:
```bash
python3 {baseDir}/scripts/sequence_tools.py revcomp --sequence "ATGCGATCGATCG"
```

### Parse FASTA file:
```bash
python3 {baseDir}/scripts/sequence_tools.py parse --file sequences.fasta --format fasta
```

### Find ORFs:
```bash
python3 {baseDir}/scripts/sequence_tools.py orfs --sequence "ATGCGATCGATCGATCGATAG" --min-length 5
```

### Search for motif:
```bash
python3 {baseDir}/scripts/sequence_tools.py motif --sequence "ATGCGATCGATCG" --pattern "GATC"
```

## Commands

### translate
Translate DNA/RNA sequence to protein.

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--sequence` | DNA/RNA sequence or file | Required |
| `--table` | Codon table ID (1=standard, 2=vertebrate mitochondrial, etc.; see Codon Tables) | 1 |
| `--frame` | Reading frame (1, 2, 3, -1, -2, -3) | 1 |
| `--all-frames` | Translate all 6 reading frames | False |
| `--to-stop` | Translate until first stop codon | False |
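
As a rough illustration of what `--frame` and `--to-stop` mean, the sketch below translates a DNA sequence with the standard codon table (table 1) using only the Python standard library. It is not the code in `sequence_tools.py`, which relies on Biopython. The function names `translate` and `reverse_complement`, and the choice to handle negative frames by reverse-complementing first and then applying the offset, are assumptions made for this example.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq, frame=1, to_stop=False):
    """Translate DNA/RNA in frame 1..3 (forward) or -1..-3 (reverse strand)."""
    seq = seq.upper().replace("U", "T")
    if frame < 0:
        seq = reverse_complement(seq)
    offset = abs(frame) - 1
    protein = []
    for i in range(offset, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGCGATCGATCGATCG"))           # MRSID (the trailing "CG" is ignored)
print(translate("ATGAAATAGGGG", to_stop=True))  # MK
```

Incomplete trailing codons are dropped here; the actual script may warn about them or treat them differently.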

### stats
Compute sequence statistics.

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--sequence` | Sequence or file | Required |
| `--type` | Sequence type: dna, rna, protein, auto | auto |

Output includes:
- Length
- GC content (nucleotide)
- Molecular weight
- Base/amino acid composition
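
To make the nucleotide statistics concrete, here is a minimal standard-library sketch of length, GC content, and base composition. The function name `nucleotide_stats` and the shape of the returned dictionary are assumptions for this example and may not match the script's actual output. Molecular weight is left out because it depends on residue masses that the script presumably takes from Biopython.

```python
from collections import Counter


def nucleotide_stats(seq):
    """Return length, GC percentage, and base counts for a DNA/RNA string."""
    seq = seq.upper()
    counts = Counter(seq)
    gc = counts["G"] + counts["C"]
    return {
        "length": len(seq),
        "gc_percent": 100.0 * gc / len(seq) if seq else 0.0,
        "composition": dict(counts),
    }


stats = nucleotide_stats("ATGCGATCGATCGATCG")
print(stats["length"], round(stats["gc_percent"], 2))  # 17 52.94
```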

### revcomp
Get the reverse complement of a DNA sequence.

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--sequence` | DNA sequence or file | Required |
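
The operation itself is small enough to sketch without Biopython: complement each base, then reverse the result. The helper name `reverse_complement` is chosen for this example, and the sketch only handles unambiguous bases (A, C, G, T); other characters pass through unchanged.

```python
# Complement map for unambiguous DNA bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCGATCGATCG"))  # CGATCGATCGCAT
```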

### parse
Parse sequence files (FASTA, GenBank, etc.).

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--file` | Input file path | Required |
| `--format` | File format: fasta, genbank, embl, auto | auto |
| `--output` | Output format: summary, fasta, json | summary |
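
For readers unfamiliar with the FASTA layout, the following standard-library sketch reads records one at a time from an open file handle, so memory use stays proportional to a single record rather than the whole file. It only illustrates the format; the script itself is described as using Biopython, and the generator name `read_fasta` is an assumption of this example.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle, one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 demo\nATGC\nGATC\n>seq2\nTTAA\n")
for name, seq in read_fasta(example):
    print(name, len(seq))
```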

### orfs
Find Open Reading Frames.

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--sequence` | DNA sequence or file | Required |
| `--min-length` | Minimum ORF length (codons) | 30 |
| `--table` | Codon table | 1 |
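
A simple way to picture ORF finding is to scan each forward reading frame for an `ATG` followed by an in-frame stop codon. The sketch below does exactly that and nothing more. It assumes the standard stop codons, searches the forward strand only, and counts `min_length` in coding codons excluding the stop; the script's actual rules (reverse strand, alternative start codons, how length is counted) may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_length=30):
    """Return (start, end, frame) tuples; end is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_length:
                    orfs.append((start, i + 3, frame + 1))
                start = None
    return orfs


print(find_orfs("CCATGAAACCCTAGGG", min_length=3))  # [(2, 14, 3)]
```

Note that with the default of 30 codons, short test sequences return no ORFs, so lower `--min-length` when experimenting.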

### motif
Search for sequence motifs/patterns.

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--sequence` | Sequence to search | Required |
| `--pattern` | Pattern to find (supports IUPAC codes) | Required |
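
One common way to support IUPAC codes in a pattern is to expand each code into a regular-expression character class. The sketch below takes that approach for the nucleotide codes listed in this document; whether `sequence_tools.py` works this way is an assumption. The names `IUPAC_DNA` and `find_motif` are hypothetical, and positions are reported 0-based.

```python
import re

# Each IUPAC nucleotide code expands to the set of bases it stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_motif(seq, pattern):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = "".join(IUPAC_DNA[ch] for ch in pattern.upper())
    # The lookahead consumes no characters, so overlapping hits are all reported.
    return [m.start() for m in re.finditer(f"(?={regex})", seq.upper())]


print(find_motif("ATGCGATCGATCG", "GATC"))  # [4, 8]
print(find_motif("ATGCGATCGATCG", "RWTC"))  # [4, 8]
```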

## Examples

### Translate with specific codon table:
```bash
python3 {baseDir}/scripts/sequence_tools.py translate --sequence "ATGCGATCG" --table 2
```

### Get stats for protein sequence:
```bash
python3 {baseDir}/scripts/sequence_tools.py stats --sequence "MTEYKLVVVGAGGVGKSALTIQLIQ" --type protein
```

### Parse GenBank file and extract sequences:
```bash
python3 {baseDir}/scripts/sequence_tools.py parse --file gene.gb --format genbank --output fasta
```

### Find all ORFs with minimum 50 codons:
```bash
python3 {baseDir}/scripts/sequence_tools.py orfs --sequence genome.fasta --min-length 50
```

### Translate all 6 reading frames:
```bash
python3 {baseDir}/scripts/sequence_tools.py translate --sequence "ATGCGATCGATCGATCG" --all-frames
```

## Codon Tables

| ID | Description |
|----|-------------|
| 1 | Standard |
| 2 | Vertebrate Mitochondrial |
| 3 | Yeast Mitochondrial |
| 4 | Mold/Protozoan Mitochondrial |
| 5 | Invertebrate Mitochondrial |
| 6 | Ciliate Nuclear |
| 11 | Bacterial/Archaeal/Plant Plastid |
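
To show why the table choice matters, the sketch below builds the standard table and derives the vertebrate mitochondrial table (ID 2) from it by overriding the four codons that differ. The variable names are chosen for this example; in practice the script would obtain these tables from Biopython rather than build them by hand.

```python
from itertools import product

BASES = "TCAG"
STANDARD = dict(zip(
    ("".join(c) for c in product(BASES, repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))

# Table 2 (vertebrate mitochondrial) differs from table 1 in four codons.
VERTEBRATE_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}

for codon in ("AGA", "AGG", "ATA", "TGA"):
    print(codon, STANDARD[codon], "->", VERTEBRATE_MITO[codon])
```

A sequence containing `TGA`, for example, ends at that codon under table 1 but continues with tryptophan under table 2.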

## IUPAC Codes

### Nucleotides
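- R = A or G (purine)
- Y = C or T (pyrimidine)
- S = G or C
- W = A or T
- K = G or T
- M = A or C
- B = C, G, or T (not A)
- D = A, G, or T (not C)
- H = A, C, or T (not G)
- V = A, C, or G (not T)
- N = any nucleotide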

### Amino Acids
- X = any amino acid
- B = D or N
- Z = E or Q

## Notes

- Sequences can be provided directly or as file paths
- Auto-detection identifies DNA/RNA/protein sequences
- Large files are processed efficiently with streaming
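
Auto-detection of the sequence type can be approximated with an alphabet check, sketched below. This is a hypothetical heuristic, not the script's actual logic: the function name `guess_sequence_type` and the rule of preferring DNA, then RNA, then protein are assumptions of this example.

```python
def guess_sequence_type(seq):
    """Classify a sequence as 'dna', 'rna', or 'protein' from its alphabet."""
    letters = set(seq.upper())
    if letters <= set("ACGTN"):
        return "dna"
    if letters <= set("ACGUN"):
        return "rna"
    # Anything with letters outside the nucleotide alphabets is treated as protein.
    return "protein"


print(guess_sequence_type("ATGCGATCG"))                  # dna
print(guess_sequence_type("AUGCGAUCG"))                  # rna
print(guess_sequence_type("MTEYKLVVVGAGGVGKSALTIQLIQ"))  # protein
```

Any alphabet-based check is ambiguous for short inputs (a peptide such as `ACG` looks like DNA), which is a good reason to pass `--type` explicitly when the type is known.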