# bio-write-sequences > Write biological sequences to files (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when saving sequences, creating new sequence files, or outputting modified records. - Author: mdbabumiamssm - Repository: mdbabumiamssm/Universal-Life-Science-and-Clinical-Skills- - Version: 20260206103107 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/mdbabumiamssm/Universal-Life-Science-and-Clinical-Skills- - Web: https://mule.run/skillshub/@@mdbabumiamssm/Universal-Life-Science-and-Clinical-Skills-~bio-write-sequences:20260206103107 --- --- name: bio-write-sequences description: Write biological sequences to files (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when saving sequences, creating new sequence files, or outputting modified records. tool_type: python primary_tool: Bio.SeqIO --- # Write Sequences Write SeqRecord objects to sequence files using Biopython's Bio.SeqIO module. ## Required Import ```python from Bio import SeqIO from Bio.Seq import Seq from Bio.SeqRecord import SeqRecord ``` ## Core Functions ### SeqIO.write() - Write Records to File Write one or more SeqRecord objects to a file. ```python SeqIO.write(records, 'output.fasta', 'fasta') ``` **Parameters:** - `records` - Single SeqRecord, list, or iterator of SeqRecords - `handle` - Filename (string) or file handle - `format` - Output format string **Returns:** Number of records written (integer) ### record.format() - Get Formatted String Get a string representation without writing to file. ```python formatted = record.format('fasta') print(formatted) ``` ## Creating SeqRecord Objects ### Minimal SeqRecord ```python record = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1') ``` ### Full SeqRecord ```python record = SeqRecord( Seq('ATGCGATCGATCG'), id='seq1', name='sequence_one', description='Example sequence for demonstration' ) ``` ### With Annotations (for GenBank output) ```python from Bio.SeqFeature import SeqFeature, FeatureLocation record = SeqRecord( Seq('ATGCGATCGATCG'), id='seq1', annotations={'molecule_type': 'DNA'} ) record.features.append( SeqFeature(FeatureLocation(0, 9), type='gene', qualifiers={'gene': ['exampleGene']}) ) ``` ## Common Formats | Format | String | Notes | |--------|--------|-------| | FASTA | `'fasta'` | Most universal, sequence + header only | | FASTQ | `'fastq'` | Requires quality scores in letter_annotations | | GenBank | `'genbank'` | Requires annotations and molecule_type | | EMBL | `'embl'` | Similar requirements to GenBank | | Tab | `'tab'` | Simple ID + sequence tabular format | ## Code Patterns ### Write Single Record ```python record = SeqRecord(Seq('ATGC'), id='my_seq', description='test sequence') SeqIO.write(record, 'output.fasta', 'fasta') ``` ### Write Multiple Records ```python records = [ SeqRecord(Seq('ATGC'), id='seq1'), SeqRecord(Seq('GCTA'), id='seq2'), SeqRecord(Seq('TTAA'), id='seq3') ] count = SeqIO.write(records, 'output.fasta', 'fasta') print(f'Wrote {count} records') ``` ### Write to File Handle ```python with open('output.fasta', 'w') as handle: SeqIO.write(records, handle, 'fasta') ``` ### Write Modified Records ```python from Bio.Seq import Seq from Bio.SeqRecord import SeqRecord def uppercase_record(rec): return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description) records = SeqIO.parse('input.fasta', 'fasta') modified = (uppercase_record(rec) for rec in records) SeqIO.write(modified, 'output.fasta', 'fasta') ``` ### Append to Existing File ```python with open('output.fasta', 'a') as handle: SeqIO.write(new_records, handle, 'fasta') ``` ### Write FASTQ with Quality Scores ```python record = SeqRecord(Seq('ATGCGATCG'), id='read1') record.letter_annotations['phred_quality'] = [30, 30, 28, 25, 30, 30, 28, 25, 30] SeqIO.write(record, 'output.fastq', 'fastq') ``` ### Write GenBank Format ```python record = SeqRecord(Seq('ATGCGATCGATCG'), id='SEQ001', name='example') record.annotations['molecule_type'] = 'DNA' record.annotations['topology'] = 'linear' record.annotations['organism'] = 'Example organism' SeqIO.write(record, 'output.gb', 'genbank') ``` ## Common Errors | Error | Cause | Solution | |-------|-------|----------| | `TypeError: SeqRecord expected` | Passed raw string/Seq | Wrap in SeqRecord object | | `ValueError: missing molecule_type` | GenBank without annotations | Add `record.annotations['molecule_type'] = 'DNA'` | | `ValueError: missing quality scores` | FASTQ without phred_quality | Add quality scores to letter_annotations | | `ValueError: Sequences must all be the same length` | PHYLIP with unequal lengths | Pad or trim sequences first | ## Format-Specific Requirements ### FASTQ Must have quality scores: ```python record.letter_annotations['phred_quality'] = [30] * len(record.seq) ``` ### GenBank/EMBL Must have molecule_type: ```python record.annotations['molecule_type'] = 'DNA' # or 'RNA', 'protein' ``` ### PHYLIP All sequences must be same length. IDs truncated to 10 characters. ## Related Skills - read-sequences - Read sequences before modifying and writing - format-conversion - Direct format conversion without intermediate processing - filter-sequences - Filter sequences before writing subset - sequence-manipulation/seq-objects - Create SeqRecord objects to write - alignment-files - For SAM/BAM output, use samtools/pysam