bio-small-rna-seq-mirge3-analysis
$ npx mdskill add GPTomics/bioSkills/bio-small-rna-seq-mirge3-analysis
Quantifies miRNAs and isomiRs with A-to-I editing analysis using miRge3
- Provides fast annotation and quantification of known miRNAs from small RNA-seq data
- Uses miRge3 with MirGeneDB and organism-specific libraries for accurate detection
- Analyzes isomiR variants and RNA editing events during read alignment and classification
- Delivers quantified miRNA counts and editing reports in structured output directories
---
name: bio-small-rna-seq-mirge3-analysis
description: Fast miRNA quantification with isomiR detection and A-to-I editing analysis using miRge3. Use when quantifying known miRNAs quickly or analyzing isomiR variants and RNA editing.
tool_type: python
primary_tool: miRge3
---
## Version Compatibility
Reference examples tested with: numpy 1.26+, pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# miRge3 Analysis
**"Quantify miRNAs with isomiR detection"** → Fast miRNA annotation and quantification with isomiR variant detection and A-to-I RNA editing analysis from small RNA-seq reads.
- CLI: `miRge3.0 annotate -s sample.fastq -lib miRge3_libs -on human -db mirgenedb -o results/`
## Basic Quantification
**Goal:** Quantify known miRNA expression from small RNA-seq FASTQ files.
**Approach:** Run miRge3 annotation pipeline with adapter trimming, organism-specific libraries, and multi-sample input.
```bash
# Run miRge3 on FASTQ files
miRge3.0 annotate \
    -s sample1.fastq.gz,sample2.fastq.gz \
    -lib miRge3_libs \
    -on human \
    -db mirbase \
    -o output_dir \
    -a TGGAATTCTCGGGTGCCAAGG \
    --threads 8

# Key options:
# -s: Input FASTQ files (comma-separated)
# -lib: Path to miRge3 library
# -on: Organism name
# -db: Database (mirbase or mirgenedb)
# -o: Output directory
# -a: 3' adapter sequence
# --threads: Number of CPU threads
```
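When many samples sit in one directory, the comma-separated `-s` value can be assembled programmatically. The following is a minimal sketch that only builds the argument list and does not run anything. `build_mirge3_command` is a hypothetical helper, not part of miRge3, and it assumes the flags shown above are valid for the installed version.

```python
from pathlib import Path


def build_mirge3_command(fastq_dir, lib_path, organism, database, adapter, out_dir):
    '''Hypothetical helper: build an argument list from the flags shown above.'''
    # Collect FASTQ files in sorted order so repeated runs see the same sample order
    samples = sorted(str(p) for p in Path(fastq_dir).glob('*.fastq.gz'))
    if not samples:
        raise FileNotFoundError(f'No *.fastq.gz files found in {fastq_dir}')
    return [
        'miRge3.0', 'annotate',
        '-s', ','.join(samples),  # -s takes a comma-separated list
        '-lib', str(lib_path),
        '-on', organism,
        '-db', database,
        '-a', adapter,
        '-o', str(out_dir),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed with `miRge3.0 --help`.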
## Install miRge3 Libraries
**Goal:** Download organism-specific reference libraries required for miRge3 annotation.
**Approach:** Use miRge3 built-in download command to fetch pre-built bowtie indices and annotations.
```bash
# Download pre-built libraries
# (confirm this option exists in your installed version with `miRge3.0 --help`)
miRge3.0 --download-library human mirbase
# Libraries include:
# - Bowtie indices for miRNAs, tRNAs, rRNAs
# - miRBase or MirGeneDB annotations
# - A-to-I editing sites
```
## IsomiR Detection
**Goal:** Identify and quantify isomiR variants including 5'/3' additions, deletions, and internal modifications.
**Approach:** Enable miRge3 isomiR mode to classify reads by their deviation from canonical miRNA sequences.
```bash
# Enable isomiR analysis
miRge3.0 annotate \
    -s sample.fastq.gz \
    -lib miRge3_libs \
    -on human \
    -db mirbase \
    --isomir \
    -o output_dir

# IsomiRs include:
# - 5' variants (templated and non-templated)
# - 3' variants (templated and non-templated)
# - Internal modifications
```
## A-to-I RNA Editing
**Goal:** Detect adenosine-to-inosine RNA editing events in miRNA sequences.
**Approach:** Enable the miRge3 A-to-I detection mode, which identifies editing sites and calculates editing frequencies.
```bash
# Detect A-to-I editing
miRge3.0 annotate \
    -s sample.fastq.gz \
    -lib miRge3_libs \
    -on human \
    -db mirbase \
    --AtoI \
    -o output_dir

# Outputs editing sites and frequencies
```
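An editing frequency is conventionally the fraction of reads covering a site that carry the edited base. Because inosine is read as guanosine during sequencing, edited reads show a G where the reference has an A. A minimal sketch of that arithmetic follows; the function name and its inputs are hypothetical, and this is not necessarily how miRge3 computes its reported values.

```python
def editing_frequency(a_reads, g_reads):
    '''Fraction of reads at a site showing G (edited) rather than A (reference).'''
    total = a_reads + g_reads
    if total == 0:
        return float('nan')  # no coverage, so the frequency is undefined
    return g_reads / total


print(editing_frequency(a_reads=90, g_reads=10))  # 0.1
```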
## Output Files
| File | Description |
|------|-------------|
| miR.Counts.csv | Raw read counts per miRNA |
| miR.RPM.csv | RPM normalized counts |
| isomiR.Counts.csv | IsomiR-level counts |
| isomiR.summary.csv | IsomiR summary per miRNA |
| annotation.report.html | Interactive QC report |
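
Before parsing results, it can help to confirm that a run produced the files listed in the table. The sketch below is one way to do that; `missing_outputs` is a hypothetical helper, and the file names are copied from the table, so they may need adjusting for a different miRge3 version.

```python
from pathlib import Path

# File names taken from the table above; adjust if your miRge3 version differs
EXPECTED_FILES = [
    'miR.Counts.csv',
    'miR.RPM.csv',
    'isomiR.Counts.csv',
    'isomiR.summary.csv',
    'annotation.report.html',
]


def missing_outputs(output_dir, expected=EXPECTED_FILES):
    '''Return the expected file names that are absent from output_dir.'''
    out = Path(output_dir)
    return [name for name in expected if not (out / name).is_file()]
```

An empty list means every expected file is present.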
## Python API
**Goal:** Run miRge3 quantification programmatically from Python.
**Approach:** Call the miRge3 annotate function directly with configuration parameters instead of CLI invocation.
```python
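# Import path and argument names are as given in this skill; confirm them
# against the installed package before relying on this call
from mirge3.annotate import annotate

# Run programmatically
annotate(
    samples=['sample1.fastq.gz', 'sample2.fastq.gz'],
    lib_path='miRge3_libs',
    organism='human',
    database='mirbase',
    adapter='TGGAATTCTCGGGTGCCAAGG',
    output_dir='results',
    threads=8
)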
```
## Parse miRge3 Output
**Goal:** Load miRge3 count matrices and isomiR tables into pandas for downstream analysis.
**Approach:** Read CSV output files and apply minimum count filtering to remove lowly-expressed miRNAs.
```python
import pandas as pd


def load_mirge3_counts(output_dir):
    '''Load miRge3 count matrix'''
    counts = pd.read_csv(f'{output_dir}/miR.Counts.csv', index_col=0)
    return counts


def load_isomirs(output_dir):
    '''Load isomiR-level counts'''
    isomirs = pd.read_csv(f'{output_dir}/isomiR.Counts.csv', index_col=0)
    return isomirs


# Filter low-expressed miRNAs
def filter_low_counts(counts, min_total=10):
    '''Keep miRNAs with total count >= threshold'''
    return counts[counts.sum(axis=1) >= min_total]
```
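To make the filtering rule concrete, here is a small worked example on a toy matrix. The counts are made up for illustration, and the snippet applies the same rule as `filter_low_counts` inline so that it runs on its own.

```python
import pandas as pd

# Toy count matrix: rows are miRNAs, columns are samples (values are made up)
counts = pd.DataFrame(
    {'sample1': [120, 3, 0], 'sample2': [95, 4, 1]},
    index=['hsa-miR-21-5p', 'hsa-miR-122-5p', 'hsa-miR-9-5p'],
)

# Same rule as filter_low_counts: keep rows whose total across samples is >= 10
row_totals = counts.sum(axis=1)
filtered = counts[row_totals >= 10]
print(filtered.index.tolist())  # ['hsa-miR-21-5p']
```

The threshold applies to the sum across all samples, so a miRNA with a few reads in every sample of a large cohort can still pass.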
## Compare Multiple Samples
**Goal:** Normalize and transform miRNA counts for cross-sample comparison.
**Approach:** Apply RPM normalization to account for library size, then log2-transform to compress the dynamic range and reduce the dependence of variance on expression level.
```python
import numpy as np


def normalize_rpm(counts):
    '''Normalize to reads per million'''
    # Column sums give the library size of each sample
    total_per_sample = counts.sum(axis=0)
    rpm = counts / total_per_sample * 1e6
    return rpm


def log_transform(rpm, pseudocount=1):
    '''Log2 transform with pseudocount'''
    return np.log2(rpm + pseudocount)
```
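A short worked example shows what the normalization does. The toy counts below are made up: `sample2` has twice the sequencing depth of `sample1` but the same composition, so the two samples end up with identical RPM values. This sketch divides by the total of the counts present in the matrix, which may differ from the denominator behind `miR.RPM.csv`.

```python
import numpy as np
import pandas as pd

# Toy matrix: sample2 is sample1 sequenced at twice the depth
counts = pd.DataFrame(
    {'sample1': [10, 30], 'sample2': [20, 60]},
    index=['hsa-miR-21-5p', 'hsa-let-7a-5p'],
)

# Divide each column by its own library size, then scale to one million
rpm = counts.div(counts.sum(axis=0), axis=1) * 1e6
log_rpm = np.log2(rpm + 1)

# Each sample's RPM values sum to one million by construction
print(rpm.sum(axis=0).round().tolist())  # [1000000.0, 1000000.0]
```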
## IsomiR Analysis
**Goal:** Summarize isomiR diversity metrics per canonical miRNA.
**Approach:** Group isomiR-level counts by parent miRNA and compute total reads, variant count, and dominant isoform.
```python
def summarize_isomirs(isomir_counts):
    '''Summarize isomiR diversity per miRNA'''
    # Total reads per isomiR across the numeric (per-sample) columns
    totals = isomir_counts.select_dtypes('number').sum(axis=1)
    # Group by canonical miRNA parsed from the row label (human hsa- IDs only)
    parent = totals.index.str.extract(r'(hsa-\w+-\d+[a-z]*)', expand=False)
    summary = totals.groupby(parent.to_numpy()).agg(
        ['sum', 'count', lambda x: x.idxmax()]
    )
    summary.columns = ['total_reads', 'n_isomirs', 'dominant_isomir']
    summary.index.name = 'miRNA'
    return summary
```
## Related Skills
- smrna-preprocessing - Prepare reads for miRge3
- mirdeep2-analysis - Alternative with novel miRNA discovery
- differential-mirna - DE analysis of miRge3 counts