bio-workflows-smrna-pipeline
$
npx mdskill add GPTomics/bioSkills/bio-workflows-smrna-pipelineAnalyzes small RNA-seq data from FASTQ files to identify differential miRNA expression
- Processes miRNA, piRNA, and other small RNA sequencing data for differential expression analysis
- Uses cutadapt for trimming, miRDeep2 for quantification, and DESeq2 for differential analysis
- Automates adapter trimming, genome alignment, quantification, and target prediction workflows
- Delivers results through structured output files and integrates with downstream analysis tools
SKILL.md
.github/skills/bio-workflows-smrna-pipelineView on GitHub ↗
---
name: bio-workflows-smrna-pipeline
description: End-to-end small RNA-seq analysis from FASTQ to differential miRNA expression. Use when analyzing miRNA, piRNA, or other small RNA sequencing data.
tool_type: mixed
primary_tool: miRDeep2
---
## Version Compatibility
Reference examples tested with: DESeq2 1.42+, cutadapt 4.4+
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Small RNA-seq Pipeline
**"Analyze my small RNA-seq data from FASTQ to differential miRNAs"** → Orchestrate adapter trimming (cutadapt), miRNA quantification (miRDeep2/miRge3), novel miRNA discovery, differential expression (DESeq2), and target prediction (miRanda).
## Pipeline Overview
```
FASTQ → cutadapt trim → miRDeep2 → Quantification → DESeq2 → Target prediction
```
## Step 1: Preprocessing
```bash
# Adapter trimming and size selection
cutadapt -a TGGAATTCTCGGGTGCCAAGG \
--minimum-length 18 --maximum-length 30 \
-o trimmed.fastq.gz reads.fastq.gz
```
## Step 2: miRDeep2 Analysis
```bash
# Align to genome
mapper.pl trimmed.fastq.gz -e -h -i -j -l 18 \
-m -p genome_index -s reads_collapsed.fa \
-t reads_collapsed_vs_genome.arf
# miRNA quantification and novel prediction
miRDeep2.pl reads_collapsed.fa genome.fa \
reads_collapsed_vs_genome.arf \
mature_ref.fa none hairpin_ref.fa
```
## Step 3: Differential Expression
```r
library(DESeq2)
counts <- read.csv('mirna_counts.csv', row.names = 1)
dds <- DESeqDataSetFromMatrix(counts, colData, ~condition)
dds <- DESeq(dds)
results <- results(dds)
```
## Step 4: Target Prediction
```bash
# miRanda for target prediction
miranda mature_mirnas.fa target_3utrs.fa -out targets.txt
```
## QC Checkpoints
1. **After trimming**: Size distribution should peak at 21-23nt
2. **After alignment**: >70% mapping rate expected
3. **After DE**: Check volcano plot and PCA
## Related Skills
- small-rna-seq/mirdeep2-analysis - Detailed miRDeep2
- small-rna-seq/differential-mirna - DE analysis
- small-rna-seq/target-prediction - Target analysis