bio-gene-regulatory-networks-multiomics-grn

Name: bio-gene-regulatory-networks-multiomics-grn
Author: GPTomics/bioSkills

$npx mdskill add GPTomics/bioSkills/bio-gene-regulatory-networks-multiomics-grn

Integrate multiomics data to infer enhancer-driven gene regulatory networks

Solves cis-regulatory GRN inference from paired scRNA-seq and ATAC-seq datasets.
Depends on SCENIC+, pycisTopic, scanpy, pandas, matplotlib, and Cell Ranger tools.
Decides eRegulon assembly by linking transcription factors to enhancers via chromatin accessibility patterns.
Delivers results as TF-enhancer-gene triplets representing regulatory relationships.

SKILL.md

.github/skills/bio-gene-regulatory-networks-multiomics-grnView on GitHub ↗

---
name: bio-gene-regulatory-networks-multiomics-grn
description: Build enhancer-driven gene regulatory networks by integrating single-cell RNA-seq and ATAC-seq data using SCENIC+ to identify eRegulons linking transcription factors to enhancers and target genes. Use when analyzing 10x multiome or paired scRNA+scATAC data to infer cis-regulatory GRNs.
tool_type: python
primary_tool: SCENIC+
---

## Version Compatibility

Reference examples tested with: Cell Ranger 8.0+, MACS3 3.0+, matplotlib 3.8+, pandas 2.2+, scanpy 1.10+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Multiomics GRN Inference

**"Build an enhancer-driven gene regulatory network from my multiome data"** → Integrate scRNA-seq and scATAC-seq to identify eRegulons: transcription factor-enhancer-target gene triplets linking TF binding to chromatin accessibility and gene expression changes.
- Python: SCENIC+ pipeline with `scenicplus` for eRegulon assembly
- Python: `pycisTopic` for topic modeling of scATAC-seq regions

Build enhancer-driven gene regulatory networks from paired single-cell RNA-seq and ATAC-seq data. SCENIC+ extends SCENIC by linking TFs to their enhancers and target genes through eRegulons.

## SCENIC+ Overview

| Component | Tool | Purpose |
|-----------|------|---------|
| Topic modeling | cisTopic | Identify cis-regulatory topics from scATAC |
| Region-to-gene | SCENIC+ | Link accessible regions to target genes |
| TF-to-region | SCENIC+ | Connect TFs to their enhancer targets |
| eRegulon assembly | SCENIC+ | Combine into TF-enhancer-gene triplets |

## Input Preparation

### From 10x Multiome (CellRanger ARC)

```python
import scanpy as sc
import pycisTopic
from pycisTopic.cistopic_class import create_cistopic_object_from_fragments

# Load RNA from CellRanger ARC output
adata_rna = sc.read_10x_h5('filtered_feature_bc_matrix.h5', gex_only=True)
adata_rna.var_names_make_unique()
sc.pp.filter_cells(adata_rna, min_genes=200)
sc.pp.filter_genes(adata_rna, min_cells=3)

# Load ATAC fragments
fragments_file = 'atac_fragments.tsv.gz'
```

### Call Peaks with MACS3

```python
import subprocess

# Call peaks from fragments file for region universe
subprocess.run([
    'macs3', 'callpeak',
    '-t', 'atac_fragments.tsv.gz',
    '-f', 'BEDPE',
    '--nomodel', '--shift', '-75', '--extsize', '150',
    '-g', 'hs',
    '--keep-dup', 'all',
    '-n', 'multiome_peaks',
    '--outdir', 'macs3_output'
], check=True)
```

### Create cisTopic Object

```python
from pycisTopic.cistopic_class import create_cistopic_object_from_fragments
from pycisTopic.lda_models import run_cgs_models

# Create binary accessibility matrix from fragments
path_to_regions = 'macs3_output/multiome_peaks_peaks.narrowPeak'
cistopic_obj = create_cistopic_object_from_fragments(
    path_to_fragments=fragments_file,
    path_to_regions=path_to_regions,
    path_to_blacklist='hg38-blacklist.v2.bed',
    n_cpu=8
)

# Run LDA topic modeling
# Test multiple topic numbers; select by model metrics
models = run_cgs_models(cistopic_obj, n_topics=[10, 20, 30, 40, 50],
                        n_cpu=8, n_iter=300, random_state=42)

# Select best model (highest coherence, lowest perplexity)
from pycisTopic.lda_models import evaluate_models
model = evaluate_models(models, return_model=True)
cistopic_obj.add_LDA_model(model)
```

## SCENIC+ Workflow

**Goal:** Assemble enhancer-driven gene regulatory networks (eRegulons) linking transcription factors to their target enhancers and downstream genes from paired scRNA+scATAC data.

**Approach:** Create a SCENICPLUS object from preprocessed scRNA-seq AnnData and cisTopic ATAC object, then run the complete pipeline which performs motif enrichment, region-to-gene linking, and eRegulon assembly.

```python
import scenicplus
from scenicplus.scenicplus_class import SCENICPLUS
from scenicplus.wrappers.run_scenicplus import run_scenicplus

# Prepare SCENIC+ object
scplus_obj = SCENICPLUS(
    adata_rna,           # scRNA-seq AnnData
    cistopic_obj,        # cisTopic object with topics
    menr=None            # will be computed
)

# Run complete SCENIC+ pipeline
# This performs: motif enrichment, region-to-gene linking, eRegulon assembly
run_scenicplus(
    scplus_obj,
    variable=['GeneExpressionLevel'],
    species='hsapiens',
    assembly='hg38',
    tf_file='allTFs_hg38.txt',
    save_path='scenicplus_output/',
    biomart_host='http://www.ensembl.org',
    upstream=[1000, 150000],    # search space upstream of TSS
    downstream=[1000, 150000],  # search space downstream of TSS
    calculate_TF_eGRN_correlation=True,
    calculate_DEGs_DARs=True,
    export_to_loom_file=True,
    export_to_UCSC_file=True,
    n_cpu=8
)
```

## eRegulon Interpretation

**Goal:** Summarize the discovered eRegulons to identify the most active transcription factors and distinguish activating from repressive regulation.

**Approach:** Parse the eRegulon metadata table to count regions and target genes per TF, then separate activating (+) and repressive (-) eRegulons by their name suffix.

```python
import pandas as pd

# eRegulons link TFs -> enhancers -> target genes
eregulons = scplus_obj.uns['eRegulon_metadata']
print(f'Found {eregulons["Region_signature_name"].nunique()} eRegulons')

# Summarize eRegulons
ereg_summary = eregulons.groupby('TF').agg(
    n_regions=('Region', 'nunique'),
    n_genes=('Gene', 'nunique')
).sort_values('n_genes', ascending=False)
print(ereg_summary.head(20))

# Activating vs repressive eRegulons
# (+) suffix = activating (TF expression positively correlated with targets)
# (-) suffix = repressive (TF expression negatively correlated with targets)
activating = eregulons[eregulons['Region_signature_name'].str.contains(r'\(\+\)')]
repressive = eregulons[eregulons['Region_signature_name'].str.contains(r'\(-\)')]
print(f'Activating: {activating["TF"].nunique()}, Repressive: {repressive["TF"].nunique()}')
```

### eRegulon Activity Scoring

```python
from scenicplus.eregulon_enrichment import score_eRegulons

# Score eRegulon activity per cell (like AUCell but for eRegulons)
score_eRegulons(scplus_obj, ranking_key='eRegulon_AUC', enrichment_type='region')
score_eRegulons(scplus_obj, ranking_key='eRegulon_AUC_gene', enrichment_type='gene')

# eRegulon AUC matrix
ereg_auc = scplus_obj.uns['eRegulon_AUC']['Gene_based'].copy()
```

## Visualization

```python
import matplotlib.pyplot as plt
from scenicplus.plotting.dotplot import heatmap_dotplot

# eRegulon activity heatmap by cell type
heatmap_dotplot(
    scplus_obj,
    size_matrix=scplus_obj.uns['eRegulon_AUC']['Gene_based'],
    color_matrix=scplus_obj.uns['eRegulon_AUC']['Region_based'],
    group_variable='cell_type',
    save='eregulon_dotplot.pdf'
)

# Network visualization of top eRegulon
from scenicplus.plotting.grn_plot import plot_eRegulon

plot_eRegulon(scplus_obj, tf_name='PAX5', save='pax5_eregulon.pdf')
```

## FigR Alternative

FigR infers TF-gene regulatory interactions from paired scRNA+scATAC without topic modeling.

```r
library(FigR)
library(Seurat)
library(Signac)

# Load Seurat object with RNA and ATAC assays (from Signac)
seurat_obj <- readRDS('multiome_seurat.rds')

# Run FigR
# Step 1: compute peak-gene correlations
peak_gene_cors <- runGenePeakcorr(
    ATAC.se = seurat_obj@assays$ATAC,
    RNAmat = seurat_obj@assays$RNA@data,
    genome = 'hg38',
    nCores = 8,
    p.cut = 0.05  # significance threshold for peak-gene links
)

# Step 2: infer TF-gene regulation scores (DORC-based)
fig_results <- runFigR(
    ATAC.se = seurat_obj@assays$ATAC,
    dorcTab = peak_gene_cors,
    genome = 'hg38',
    dorcMat = getDORCScores(seurat_obj@assays$ATAC, peak_gene_cors),
    rnaMat = seurat_obj@assays$RNA@data,
    nCores = 8
)

# Top regulatory TF-gene links
top_links <- fig_results[order(abs(fig_results$Score), decreasing = TRUE), ]
head(top_links, 20)
```

## Resource Requirements

| Dataset Size | RAM | Time | Notes |
|-------------|-----|------|-------|
| 5K cells | 16 GB | 2-4 hours | Feasible on laptop |
| 20K cells | 64 GB | 8-12 hours | Standard server |
| 50K+ cells | 128 GB+ | 24+ hours | Subsample or use HPC |

## Related Skills

- scenic-regulons - RNA-only regulon inference with pySCENIC
- single-cell/scatac-analysis - scATAC-seq preprocessing with Signac/ArchR
- atac-seq/atac-peak-calling - Peak calling for region universe
- chip-seq/motif-analysis - Motif enrichment for TF identification

More from GPTomics/bioSkills

Skill	Description
bio-admet-prediction	Predicts ADMET properties using ADMETlab 3.0 API or DeepChem models. Estimates bioavailability, CYP inhibition, hERG liability, and 119 toxicity endpoints with uncertainty quantification. Filters for PAINS and other structural alerts. Use when filtering compounds for drug-likeness or prioritizing leads by predicted safety.
bio-alignment-amplicon-clipping	Trim PCR primers from aligned reads in amplicon-panel BAMs using samtools ampliconclip. Use when processing SARS-CoV-2 ARTIC, hereditary cancer panels, ctDNA hot-spot panels, or any amplicon assay where primer-derived bases would falsely confirm reference at primer footprints.
bio-alignment-filtering	Filter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
bio-alignment-indexing	Create and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
bio-alignment-io	Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.
bio-alignment-msa-parsing	Parse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.
bio-alignment-msa-statistics	Calculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.
bio-alignment-multiple	Perform multiple sequence alignment using MAFFT, MUSCLE5, ClustalOmega, or T-Coffee. Guides tool and algorithm selection based on dataset size, sequence divergence, and downstream application. Use when aligning three or more homologous sequences for phylogenetics, conservation analysis, or evolutionary studies.
bio-alignment-pairwise	Perform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.
bio-alignment-sorting	Sort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.