string-database

$ npx mdskill add aipoch/medical-research-skills/string-database

Resolve identifiers and retrieve protein networks with STRING.

  • Map gene symbols to protein identifiers for downstream analysis.
  • Fetch interaction edges with confidence scores from STRING.
  • Expand candidate lists by retrieving interaction partners.
  • Deliver static network images for reports or notebooks.

---
name: string-database
description: Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have gene symbols (e.g., `TP53`) and need to resolve them to STRING protein identifiers for downstream analysis.
- You want to retrieve a protein–protein interaction (PPI) network (functional/physical) with confidence scores for one or more proteins.
- You need to find interaction partners for a target protein to expand a candidate list (e.g., add top N neighbors).
- You want to perform functional enrichment (GO/KEGG/Reactome, etc.) for a protein set to interpret biological themes.
- You need a quick static visualization (PNG/SVG) of a STRING network for reports or notebooks.

## Key Features

- **ID Mapping**: Convert gene/protein names to STRING identifiers for a given organism.
- **Network Retrieval**: Fetch interaction edges with confidence scores from STRING.
- **Interaction Partners**: Expand a protein list by retrieving interaction partners.
- **Enrichment Analysis**:
  - Functional enrichment (e.g., GO, KEGG, Reactome)
  - PPI enrichment statistics
  - Functional annotations (e.g., PFAM/SMART where supported by STRING endpoints)
- **Visualization**: Download static network images (PNG/SVG).
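
Each feature above corresponds to a method of STRING's public REST API. As a rough orientation, the sketch below builds request URLs for those methods without sending them. The helper `build_string_url` and the `METHODS` table are illustrative names, not part of this skill's `StringClient`. The base URL, method names, and `required_score` parameter follow STRING's published API documentation as best understood here and should be checked against the current API version.

```python
from urllib.parse import urlencode

STRING_API_BASE = "https://string-db.org/api"

# Feature -> STRING REST method name (assumed from STRING's API docs; verify before use).
METHODS = {
    "id_mapping": "get_string_ids",
    "network": "network",
    "partners": "interaction_partners",
    "enrichment": "enrichment",
    "ppi_enrichment": "ppi_enrichment",
    "annotation": "functional_annotation",
}


def build_string_url(feature, identifiers, species, output_format="tsv",
                     caller_identity="my_analysis_tool", **extra):
    """Build (but do not send) a STRING request URL. Hypothetical helper."""
    params = {
        # STRING expects multiple identifiers separated by a carriage return.
        "identifiers": "\r".join(identifiers),
        "species": species,
        "caller_identity": caller_identity,
    }
    params.update(extra)  # e.g. required_score=700 (0-1000 scale)
    method = METHODS[feature]
    return f"{STRING_API_BASE}/{output_format}/{method}?{urlencode(params)}"


url = build_string_url("network", ["TP53", "MDM2"], species=9606, required_score=700)
print(url)
```

Static images use the same `network` method with the output format set to `image` or `svg`, for example `build_string_url("network", ["TP53"], 9606, output_format="image")`.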

## Dependencies

- Python `>=3.8`
- `requests` (tested with `>=2.28`)
- `pandas` (tested with `>=1.5`)

Install:

```bash
pip install requests pandas
```

## Example Usage

```python
from scripts.string_api import StringClient

def main():
    # STRING does not require a secret API key, but providing a caller identity is recommended.
    client = StringClient(caller_identity="my_analysis_tool")

    # 1) Map an identifier (e.g., TP53 in Homo sapiens; NCBI taxonomy ID 9606)
    protein_id = client.map_id(identifier="TP53", species=9606)
    print("Mapped ID:", protein_id)

    # 2) Download a network image and expand by adding interaction partners
    client.get_network_image(
        identifiers=[protein_id],
        output_file="tp53_network.png",
        add_color_nodes=10,  # add 10 partners
    )
    print("Saved network image to tp53_network.png")

    # 3) Run PPI enrichment (in practice pass several identifiers; one protein alone gives no informative statistic)
    ppi_stats = client.get_ppi_enrichment(identifiers=[protein_id])
    print("PPI enrichment:", ppi_stats)

if __name__ == "__main__":
    main()
```
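
The example hard-codes `caller_identity`. One possible alternative, sketched here under assumptions, is to read it from an environment variable and fall back to a default. The variable name `STRING_CALLER_IDENTITY` and the helper `resolve_caller_identity` are hypothetical; the skill does not define them.

```python
import os


def resolve_caller_identity(env_var="STRING_CALLER_IDENTITY",
                            default="my_analysis_tool"):
    """Return the caller identity from the environment, or a default.

    The environment variable name is an assumption made for this sketch.
    """
    value = os.environ.get(env_var, "").strip()
    return value or default


# Demo only: set the variable in-process so the lookup has something to find.
os.environ["STRING_CALLER_IDENTITY"] = "my_email@example.com"
identity = resolve_caller_identity()
print(identity)
```

The resolved value could then be passed along as `StringClient(caller_identity=identity)`.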

## Implementation Details

- **Client entry point**: `scripts/string_api.py` provides `StringClient`, the main wrapper around the STRING REST API.
- **Caller identity**:
  - STRING endpoints do **not** require an API key.
  - A `caller_identity` string (project name, email, or URL) is strongly recommended so that STRING can attribute requests and manage rate and load.
  - Pass it at initialization (e.g., `StringClient(caller_identity="my_email@example.com")`) or inject via environment variables in your own wrapper.
- **Organism selection**:
  - Most operations require a species identifier, given as an NCBI taxonomy ID (e.g., `9606` for human).
- **Network retrieval and scoring**:
  - Network endpoints return interactions with confidence scores. Filter by a score threshold either server-side (STRING's `required_score` request parameter, if the wrapper exposes it) or afterwards in your own analysis code.
- **Visualization**:
  - Static images are retrieved directly from STRING image endpoints and written to disk (PNG/SVG depending on the method/parameters).
- **Reference documentation**:
  - See `references/string_reference.md` for original API notes and endpoint details included with this skill.
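
The client-side score threshold mentioned under network retrieval can be sketched as follows. This is a minimal illustration using only the standard library, not the wrapper's own code. The field names `preferredName_A`, `preferredName_B`, and `score` (on a 0 to 1 scale) are assumed to match STRING's JSON network output; the partner names and score values are placeholders, not real STRING results; `filter_edges` is a hypothetical helper.

```python
# Placeholder edge records shaped like STRING's JSON network output.
# Partner names and scores are illustrative only, not real STRING data.
edges = [
    {"preferredName_A": "TP53", "preferredName_B": "PARTNER_A", "score": 0.95},
    {"preferredName_A": "TP53", "preferredName_B": "PARTNER_B", "score": 0.42},
    {"preferredName_A": "TP53", "preferredName_B": "PARTNER_C", "score": 0.81},
]


def filter_edges(edges, min_score=0.7, top_n=None):
    """Keep edges with score >= min_score, highest score first.

    If top_n is given, return at most that many edges.
    """
    kept = [e for e in edges if e["score"] >= min_score]
    kept.sort(key=lambda e: e["score"], reverse=True)
    return kept if top_n is None else kept[:top_n]


high_confidence = filter_edges(edges, min_score=0.7)
for edge in high_confidence:
    print(edge["preferredName_A"], edge["preferredName_B"], edge["score"])
```

STRING conventionally labels 0.4 as medium, 0.7 as high, and 0.9 as highest confidence, so `min_score=0.7` corresponds to the high-confidence cutoff. Combining the threshold with `top_n` gives a simple way to expand a candidate list by its strongest neighbors.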
