bio-transcription-translation

Name: bio-transcription-translation
Author: GPTomics/bioSkills

$npx mdskill add GPTomics/bioSkills/bio-transcription-translation

Convert DNA, RNA, and protein sequences using Biopython.

Handles DNA to RNA transcription and RNA to protein translation.
Depends on Bio.Seq module within Biopython 1.83+.
Adapts code dynamically when package versions differ.
Returns transformed sequence objects ready for analysis.

SKILL.md

.github/skills/bio-transcription-translationView on GitHub ↗

---
name: bio-transcription-translation
description: Transcribe DNA to RNA and translate to protein using Biopython. Use when converting between DNA, RNA, and protein sequences, finding ORFs, or using alternative codon tables.
tool_type: python
primary_tool: Bio.Seq
---

## Version Compatibility

Reference examples tested with: BioPython 1.83+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Transcription and Translation

**"Translate my DNA sequence to protein"** → Transcribe DNA to RNA and translate to protein, handling alternative codon tables and six-frame translation.
- Python: `Seq.translate()`, `Seq.transcribe()` (BioPython)

Convert between DNA, RNA, and protein sequences using Biopython.

## Required Import

```python
from Bio.Seq import Seq
```

## Core Methods

### Transcription (DNA to RNA)

```python
dna = Seq('ATGCGATCGATCG')
rna = dna.transcribe()  # Returns Seq('AUGCGAUCGAUCG')
```

Transcription replaces T with U. Works on coding strand (5' to 3').

### Back Transcription (RNA to DNA)

```python
rna = Seq('AUGCGAUCGAUCG')
dna = rna.back_transcribe()  # Returns Seq('ATGCGATCGATCG')
```

### Translation (RNA/DNA to Protein)

```python
# From coding DNA (includes ATG start)
coding_dna = Seq('ATGTTTGGT')
protein = coding_dna.translate()  # Returns Seq('MFG')

# From RNA
rna = Seq('AUGUUUGGU')
protein = rna.translate()  # Returns Seq('MFG')
```

## Translation Options

### Stop at First Stop Codon

```python
seq = Seq('ATGTTTGGTTAAGGG')
protein = seq.translate(to_stop=True)  # Stops at TAA, excludes stop
```

### Include Stop Codon Symbol

```python
seq = Seq('ATGTTTGGTTAA')
protein = seq.translate()  # Returns Seq('MFG*')
```

### Alternative Codon Tables

Biopython supports NCBI codon tables. Common tables:

| ID | Name | Use Case |
|----|------|----------|
| 1 | Standard | Default, most organisms |
| 2 | Vertebrate Mitochondrial | Human/vertebrate mitochondria |
| 4 | Mold Mitochondrial | Fungi, protozoa mitochondria |
| 5 | Invertebrate Mitochondrial | Insects, worms mitochondria |
| 6 | Ciliate Nuclear | Tetrahymena, Paramecium |
| 11 | Bacterial/Archaeal | Prokaryotes, plastids |

```python
# Bacterial translation
seq = Seq('ATGTTTGGT')
protein = seq.translate(table=11)

# Mitochondrial translation
protein = seq.translate(table=2)

# By name
protein = seq.translate(table='Vertebrate Mitochondrial')
```

### CDS Translation (Complete Coding Sequence)

For validated coding sequences with proper start/stop:

```python
cds = Seq('ATGTTTGGTTAA')  # Must start with start codon, end with stop
protein = cds.translate(cds=True)  # Validates and removes stop
```

The `cds=True` option:
- Validates start codon (ATG or alternative)
- Validates stop codon at end
- Removes stop codon from result
- Raises error if invalid CDS

## Code Patterns

### Basic Transcription and Translation Pipeline

```python
dna = Seq('ATGTTTGGTCATTAA')
rna = dna.transcribe()
protein = rna.translate()
print(f'DNA: {dna}')
print(f'RNA: {rna}')
print(f'Protein: {protein}')
```

### Translate All Six Reading Frames

**Goal:** Translate a DNA sequence in all six frames (three forward, three reverse) to find all possible protein products.

**Approach:** For each strand, offset by 0, 1, and 2 bases, trim to a multiple of 3, and translate.

**Reference (BioPython 1.83+):**
```python
def six_frame_translation(seq):
    frames = []
    for strand, s in [('+', seq), ('-', seq.reverse_complement())]:
        for frame in range(3):
            length = 3 * ((len(s) - frame) // 3)
            fragment = s[frame:frame + length]
            frames.append((strand, frame, fragment.translate()))
    return frames

seq = Seq('ATGCGATCGATCGATCGATCG')
for strand, frame, protein in six_frame_translation(seq):
    print(f'{strand}{frame}: {protein}')
```

### Find All ORFs (Start to Stop)

**Goal:** Identify all open reading frames (M to stop codon) across both strands and all three frames.

**Approach:** Translate each of the six frames, then scan for Met-to-stop segments meeting the minimum length.

**Reference (BioPython 1.83+):**
```python
def find_orfs(seq, min_length=30):
    orfs = []
    for strand, s in [('+', seq), ('-', seq.reverse_complement())]:
        for frame in range(3):
            trans = s[frame:].translate()
            aa_start = 0
            while True:
                start = trans.find('M', aa_start)
                if start == -1:
                    break
                stop = trans.find('*', start)
                if stop == -1:
                    stop = len(trans)
                orf = trans[start:stop]
                if len(orf) * 3 >= min_length:
                    orfs.append((strand, frame, start * 3 + frame, str(orf)))
                aa_start = start + 1
    return orfs

seq = Seq('ATGCGATCGATCGATCGATCGTAA')
for strand, frame, pos, orf in find_orfs(seq, min_length=3):
    print(f'{strand} frame {frame} pos {pos}: {orf}')
```

### Translate with Quality Check

```python
def translate_cds_safe(seq):
    try:
        return seq.translate(cds=True)
    except Exception as e:
        return seq.translate(to_stop=True)  # Fallback
```

### Get Codon Table Info

```python
from Bio.Data import CodonTable

table = CodonTable.unambiguous_dna_by_id[1]
print(f'Start codons: {table.start_codons}')
print(f'Stop codons: {table.stop_codons}')
```

## Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `TranslationError: First codon is not a start codon` | Used `cds=True` without valid start | Remove `cds=True` or fix sequence |
| `TranslationError: Final codon is not a stop codon` | Used `cds=True` without stop codon | Remove `cds=True` or add stop codon |
| `TranslationError: Sequence length not multiple of 3` | Partial codons at end | Trim sequence to multiple of 3 |
| Unexpected amino acids | Wrong codon table | Specify correct table for organism |

## Decision Tree

```
Need to convert sequence?
├── DNA to RNA?
│   └── Use seq.transcribe()
├── RNA to DNA?
│   └── Use seq.back_transcribe()
├── DNA/RNA to protein?
│   ├── Complete CDS with start/stop?
│   │   └── Use translate(cds=True)
│   ├── Stop at first stop codon?
│   │   └── Use translate(to_stop=True)
│   ├── Non-standard organism?
│   │   └── Use translate(table=N)
│   └── Get all including stop symbol?
│       └── Use translate()
└── Find all ORFs?
    └── Translate all six frames, search for M...*
```

## Related Skills

- seq-objects - Create Seq objects for translation
- reverse-complement - Translate both strands (six-frame translation)
- codon-usage - Analyze codon bias in coding sequences
- sequence-io/read-sequences - Parse GenBank files with CDS features
- database-access/entrez-fetch - Fetch CDS sequences from NCBI for translation

More from GPTomics/bioSkills

Skill	Description
bio-admet-prediction	Predicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
bio-alignment-amplicon-clipping	Trim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
bio-alignment-filtering	Filter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
bio-alignment-indexing	Create and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
bio-alignment-io	Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
bio-alignment-msa-parsing	Parse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
bio-alignment-msa-statistics	Calculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
bio-alignment-multiple	Perform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
bio-alignment-pairwise	Perform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
bio-alignment-sorting	Sort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.