bio-rna-structure-structure-probing
$
npx mdskill add GPTomics/bioSkills/bio-rna-structure-structure-probingConverts SHAPE-MaP data into reactivity profiles for RNA structure prediction.
- Transforms mutation rates from sequencing experiments into nucleotide-specific reactivity data.
- Depends on ShapeMapper2 CLI and ViennaRNA folding tools for processing.
- Adapts code patterns to match installed library versions and API signatures.
- Outputs per-nucleotide reactivity profiles to constrain thermodynamic structure models.
SKILL.md
.github/skills/bio-rna-structure-structure-probingView on GitHub ↗
---
name: bio-rna-structure-structure-probing
description: Analyzes experimental RNA structure probing data from SHAPE-MaP and DMS-MaPseq experiments using ShapeMapper2. Converts mutation rates to per-nucleotide reactivity profiles that constrain structure prediction. Use when processing SHAPE-MaP or DMS-MaPseq sequencing data to obtain experimental RNA structure information.
tool_type: cli
primary_tool: ShapeMapper2
---
## Version Compatibility
Reference examples tested with: STAR 2.7.11+, eggNOG-mapper 2.1+, matplotlib 3.8+, numpy 1.26+, pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Structure Probing
**"Process my SHAPE-MaP experiment to get RNA reactivity profiles"** → Convert mutation rates from SHAPE-MaP or DMS-MaPseq sequencing data into per-nucleotide reactivity profiles, then use reactivities as constraints for thermodynamic structure prediction.
- CLI: `shapemapper` (ShapeMapper2) for end-to-end SHAPE-MaP processing
- CLI: `RNAfold --shape` (ViennaRNA) for SHAPE-constrained folding
Analyze experimental RNA structure probing data (SHAPE-MaP, DMS-MaPseq) to obtain per-nucleotide reactivity profiles. High reactivity indicates flexible/single-stranded nucleotides; low reactivity indicates base-paired/structured positions. Reactivities constrain thermodynamic folding for more accurate structure prediction.
## Platform Note
ShapeMapper2 is Linux-only. On macOS, use Docker or Singularity:
```bash
# Docker
docker pull shapemapper2/shapemapper2
docker run -v $(pwd):/data shapemapper2/shapemapper2 shapemapper \
--target /data/target.fa --modified --R1 /data/mod_R1.fq.gz --R2 /data/mod_R2.fq.gz \
--untreated --R1 /data/unmod_R1.fq.gz --R2 /data/unmod_R2.fq.gz \
--out /data/results
# Singularity
singularity pull shapemapper2.sif docker://shapemapper2/shapemapper2
singularity exec -B $(pwd):/data shapemapper2.sif shapemapper \
--target /data/target.fa --modified --R1 /data/mod_R1.fq.gz ...
```
## ShapeMapper2 Pipeline
### Basic SHAPE-MaP Analysis
```bash
shapemapper \
--target target_rna.fa \
--name my_rna \
--modified --R1 modified_R1.fastq.gz --R2 modified_R2.fastq.gz \
--untreated --R1 untreated_R1.fastq.gz --R2 untreated_R2.fastq.gz \
--out results/ \
--nproc 8 \
--min-depth 5000
```
### With Denatured Control
A denatured control normalizes for sequence-dependent mutation biases.
```bash
shapemapper \
--target target_rna.fa \
--name my_rna \
--modified --R1 modified_R1.fastq.gz --R2 modified_R2.fastq.gz \
--untreated --R1 untreated_R1.fastq.gz --R2 untreated_R2.fastq.gz \
--denatured --R1 denatured_R1.fastq.gz --R2 denatured_R2.fastq.gz \
--out results/ \
--nproc 8
```
### Amplicon Mode (Targeted)
```bash
# For PCR-amplified targets
shapemapper \
--target target_rna.fa \
--name my_rna \
--amplicon \
--modified --R1 modified_R1.fastq.gz --R2 modified_R2.fastq.gz \
--untreated --R1 untreated_R1.fastq.gz --R2 untreated_R2.fastq.gz \
--out results/ \
--nproc 8
```
### Key ShapeMapper2 Options
| Option | Description |
|--------|-------------|
| `--target` | FASTA file with target RNA sequence(s) |
| `--name` | Name for output files |
| `--modified` | Modified sample FASTQ files |
| `--untreated` | Untreated control FASTQ files |
| `--denatured` | Denatured control FASTQ files |
| `--amplicon` | Amplicon sequencing mode |
| `--nproc` | Number of processors |
| `--min-depth` | Minimum read depth per nucleotide (default: 5000) |
| `--min-qual-to-count` | Minimum base quality (default: 20) |
| `--overwrite` | Overwrite existing output |
| `--star-aligner` | Use STAR instead of Bowtie2 for alignment |
### ShapeMapper2 Output Files
```
results/
├── my_rna_profile.txt # Per-nucleotide reactivity profile
├── my_rna_shapemapper_log.txt # Run log and QC metrics
├── my_rna_histograms/ # Mutation rate histograms
├── my_rna_profile.pdf # Reactivity bar plot
└── my_rna_map.shape # SHAPE reactivities for RNAfold
```
## Reactivity Interpretation
### SHAPE Reactivity Scale
| Value | Interpretation |
|-------|---------------|
| < 0.3 | Low reactivity, likely base-paired |
| 0.3 - 0.7 | Moderate, partially flexible or dynamic |
| > 0.7 | High reactivity, likely single-stranded |
| -999 | No data (insufficient read depth) |
### Quality Filters
| Metric | Threshold | Rationale |
|--------|-----------|-----------|
| Read depth | >= 5,000 | Minimum for reliable mutation rate estimation |
| Effective depth | >= 1,000 | After quality filtering |
| Modification rate (untreated) | < 0.5% | High background suggests RNA damage or mapping artifact |
| Modification rate (modified) | 1-10% | Too low = undermodified; too high = degraded |
## Structure-Guided Folding with SHAPE Data
### RNAfold with SHAPE Constraints
```bash
# Use ShapeMapper2 .shape output directly with RNAfold
RNAfold --shape=results/my_rna_map.shape < target_rna.fa
# Adjust SHAPE slope/intercept for different reagents
# Default (1NAI/NMIA): m=1.8, b=-0.6
RNAfold --shape=results/my_rna_map.shape --shapeMethod="Dm1.8b-0.6" < target_rna.fa
# Convert to hard constraints at extreme positions
# Reactivity > 0.7 forced unpaired, < 0.3 allowed to pair
RNAfold --shape=results/my_rna_map.shape --shapeConversion="S" < target_rna.fa
```
### Python: SHAPE-Constrained Folding
```python
import RNA
import pandas as pd
def load_shape_profile(profile_file):
'''Load ShapeMapper2 reactivity profile.'''
df = pd.read_csv(profile_file, sep='\t')
reactivities = df['Reactivity_profile'].tolist()
sequence = ''.join(df['Nucleotide'].tolist())
return sequence, reactivities
def fold_with_shape(sequence, reactivities, m=1.8, b=-0.6):
'''
Fold RNA constrained by SHAPE reactivities.
m, b: Deigan et al. (2009) parameters.
m=1.8, b=-0.6: standard for SHAPE (1M7, NMIA, NAI).
m=1.1, b=-0.3: suggested for DMS-MaPseq (A/C only).
'''
fc = RNA.fold_compound(sequence)
# Prepend -999 for 0-indexed padding (ViennaRNA expects 1-indexed)
shape_data = [-999.0] + [r if r != -999 else -999.0 for r in reactivities]
fc.sc_add_SHAPE_deigan(shape_data, m, b)
structure, mfe = fc.mfe()
# Also compute partition function for confidence
fc2 = RNA.fold_compound(sequence)
fc2.sc_add_SHAPE_deigan(shape_data, m, b)
_, pf_energy = fc2.pf()
centroid, _ = fc2.centroid()
return {
'mfe_structure': structure,
'mfe_energy': mfe,
'centroid_structure': centroid,
'ensemble_energy': pf_energy
}
def shape_agreement(structure, reactivities, low_threshold=0.3, high_threshold=0.7):
'''
Assess agreement between SHAPE data and predicted structure.
Returns fraction of nucleotides where reactivity agrees with structure
(low reactivity = paired, high reactivity = unpaired).
'''
agree, total = 0, 0
for i, (char, react) in enumerate(zip(structure, reactivities)):
if react == -999:
continue
total += 1
paired = char in '()'
if paired and react < low_threshold:
agree += 1
elif not paired and react > high_threshold:
agree += 1
return agree / total if total > 0 else 0.0
```
## DMS-MaPseq Analysis
DMS methylates unpaired A and C residues. SEISMIC-RNA (successor to DREEM) or rf-count process DMS-MaPseq data.
### SEISMIC-RNA
```bash
# Install
pip install seismic-rna
# Align reads to target
seismic align target.fa reads_R1.fq.gz reads_R2.fq.gz --out-dir seismic_out
# Relate mutations to reference
seismic relate seismic_out/align --out-dir seismic_out
# Compute mutation rates per position
seismic mask seismic_out/relate --out-dir seismic_out
# Cluster structural conformations (if RNA adopts multiple states)
seismic cluster seismic_out/mask --out-dir seismic_out --max-clusters 3
```
### rf-count (RNAframework)
```bash
# Count mutations with rf-count
rf-count -t target.fa -r modified.bam -rc untreated.bam -o rf_results/
# Normalize reactivities with rf-norm
rf-norm -t target.fa -i rf_results/ -o rf_norm/ -sm 3 -nm 2
```
### DMS vs SHAPE Differences
| Feature | SHAPE (1M7/NAI) | DMS-MaPseq |
|---------|-----------------|------------|
| Reactive nucleotides | All four (A, C, G, U) | Primarily A, C |
| Resolution | All positions | ~50% of positions |
| In-cell probing | NAI-N3 for in vivo | Standard DMS works in vivo |
| Reagent cost | Moderate | Low |
| Analysis tools | ShapeMapper2 | SEISMIC-RNA, rf-count |
## Visualization
### Plot Reactivity Profile
```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
def plot_reactivity_profile(profile_file, output_file='reactivity_profile.png', title=None):
'''Plot SHAPE/DMS reactivity bar chart with confidence coloring.'''
df = pd.read_csv(profile_file, sep='\t')
positions = df.index + 1
reactivities = df['Reactivity_profile'].values
valid = reactivities != -999
colors = np.where(reactivities > 0.7, '#FF0000', # high = red = unpaired
np.where(reactivities > 0.3, '#FFA500', # moderate = orange
'#000000')) # low = black = paired
fig, ax = plt.subplots(figsize=(max(8, len(positions) * 0.08), 4))
ax.bar(positions[valid], reactivities[valid], color=colors[valid], width=1.0, edgecolor='none')
ax.axhline(y=0.3, color='gray', linestyle='--', linewidth=0.5)
ax.axhline(y=0.7, color='gray', linestyle='--', linewidth=0.5)
ax.set_xlabel('Nucleotide position')
ax.set_ylabel('Reactivity')
ax.set_title(title or 'SHAPE reactivity profile')
ax.set_xlim(0, len(positions) + 1)
plt.tight_layout()
plt.savefig(output_file, dpi=150)
plt.close()
print(f'Saved reactivity profile to {output_file}')
```
### Arc Diagram with Reactivities
```python
import RNA
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
def plot_arc_diagram(sequence, structure, reactivities=None, output_file='arc_diagram.png'):
'''Draw arc diagram colored by SHAPE reactivity.'''
n = len(sequence)
pt = RNA.ptable(structure)
fig, ax = plt.subplots(figsize=(max(8, n * 0.1), 4))
if reactivities is not None:
valid_react = [r for r in reactivities if r != -999 and r >= 0]
vmax = np.percentile(valid_react, 95) if valid_react else 1.0
cmap = plt.cm.YlOrRd
for i in range(n):
r = reactivities[i]
color = 'gray' if r == -999 else cmap(min(r / vmax, 1.0))
ax.plot(i + 1, 0, 'o', color=color, markersize=3)
else:
for i in range(n):
ax.plot(i + 1, 0, 'o', color='black', markersize=3)
for i in range(1, n + 1):
j = pt[i]
if j > i:
center = (i + j) / 2
radius = (j - i) / 2
arc = patches.Arc((center, 0), j - i, j - i, angle=0,
theta1=0, theta2=180, color='steelblue', linewidth=0.5)
ax.add_patch(arc)
ax.set_xlim(0, n + 1)
ax.set_ylim(-1, n / 2 + 1)
ax.set_xlabel('Position')
ax.set_title('Arc diagram')
ax.set_aspect('equal')
plt.tight_layout()
plt.savefig(output_file, dpi=150)
plt.close()
```
## Related Skills
- secondary-structure-prediction - Unconstrained and SHAPE-constrained RNA folding
- clip-seq/binding-site-annotation - Protein-RNA binding site annotation
- epitranscriptomics/m6a-peak-calling - RNA modification detection