string-database

Name: string-database
Author: aipoch/medical-research-skills

$ npx mdskill add aipoch/medical-research-skills/string-database

Resolve identifiers and retrieve protein networks with STRING.

Map gene symbols to protein identifiers for downstream analysis.
Fetch interaction edges with confidence scores from STRING.
Expand candidate lists by retrieving interaction partners.
Deliver static network images for reports or notebooks.

---
name: string-database
description: Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have gene symbols (e.g., `TP53`) and need to resolve them to STRING protein identifiers for downstream analysis.
- You want to retrieve a protein–protein interaction (PPI) network (functional/physical) with confidence scores for one or more proteins.
- You need to find interaction partners for a target protein to expand a candidate list (e.g., add top N neighbors).
- You want to perform functional enrichment (GO/KEGG/Reactome, etc.) for a protein set to interpret biological themes.
- You need a quick static visualization (PNG/SVG) of a STRING network for reports or notebooks.

## Key Features

- **ID Mapping**: Convert gene/protein names to STRING identifiers for a given organism.
- **Network Retrieval**: Fetch interaction edges with confidence scores from STRING.
- **Interaction Partners**: Expand a protein list by retrieving interaction partners.
- **Enrichment Analysis**:
  - Functional enrichment (e.g., GO, KEGG, Reactome)
  - PPI enrichment statistics
  - Functional annotations (e.g., PFAM/SMART where supported by STRING endpoints)
- **Visualization**: Download static network images (PNG/SVG).
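
The features above correspond to methods of the public STRING REST API. The sketch below shows how a request URL for each one might be assembled. It is an assumption about the underlying HTTP calls, not a description of how `StringClient` is implemented; the helper `build_string_url` and the `FEATURE_METHODS` table are hypothetical names introduced for illustration. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed base URL of the public STRING REST API.
STRING_API_BASE = "https://string-db.org/api"

# Hypothetical mapping from the features listed above to
# (output format, STRING API method).
FEATURE_METHODS = {
    "id_mapping": ("tsv", "get_string_ids"),
    "network": ("tsv", "network"),
    "partners": ("tsv", "interaction_partners"),
    "functional_enrichment": ("tsv", "enrichment"),
    "ppi_enrichment": ("tsv", "ppi_enrichment"),
    "image": ("image", "network"),
}


def build_string_url(feature, identifiers, species,
                     caller_identity="my_analysis_tool", **extra):
    """Return the request URL for one feature without sending it."""
    output_format, method = FEATURE_METHODS[feature]
    params = {
        # Multiple identifiers are joined with carriage returns (%0D once encoded).
        "identifiers": "\r".join(identifiers),
        "species": species,
        "caller_identity": caller_identity,
    }
    params.update(extra)  # e.g. limit=10 for interaction partners
    return f"{STRING_API_BASE}/{output_format}/{method}?{urlencode(params)}"


url = build_string_url("partners", ["TP53"], species=9606, limit=10)
print(url)
```

Building the URL separately from sending it keeps the request easy to inspect or log before any call is made.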

## Dependencies

- Python `>=3.8`
- `requests` (tested with `>=2.28`)
- `pandas` (tested with `>=1.5`)

Install:

```bash
pip install requests pandas
```

## Example Usage

```python
from scripts.string_api import StringClient

def main():
    # STRING does not require a secret API key, but providing a caller identity is recommended.
    client = StringClient(caller_identity="my_analysis_tool")

    # 1) Map an identifier (e.g., TP53 in Homo sapiens; NCBI taxonomy ID 9606)
    protein_id = client.map_id(identifier="TP53", species=9606)
    print("Mapped ID:", protein_id)

    # 2) Download a network image and expand by adding interaction partners
    client.get_network_image(
        identifiers=[protein_id],
        output_file="tp53_network.png",
        add_color_nodes=10,  # add 10 partners
    )
    print("Saved network image to tp53_network.png")

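    # 3) Run PPI enrichment. The statistic is only informative for a set of
    #    several proteins; a single ID is passed here to keep the example short.
    ppi_stats = client.get_ppi_enrichment(identifiers=[protein_id])
    print("PPI enrichment:", ppi_stats)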

if __name__ == "__main__":
    main()
```
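
To give a sense of what the mapping step in the example resolves, here is a minimal sketch that parses a tab-separated ID-mapping table into a lookup dictionary. The column names are assumed to follow the TSV layout of STRING's `get_string_ids` method, the sample text is illustrative rather than a live response, and `parse_id_mapping` is a hypothetical helper that is not part of `StringClient`.

```python
import csv
import io

# Illustrative sample only, not a live STRING response. Column names are
# assumed to match the TSV output of STRING's get_string_ids method.
SAMPLE_TSV = (
    "queryIndex\tqueryItem\tstringId\tncbiTaxonId\ttaxonName\tpreferredName\tannotation\n"
    "0\tTP53\t9606.ENSP00000269305\t9606\tHomo sapiens\tTP53\tCellular tumor antigen p53\n"
)


def parse_id_mapping(tsv_text):
    """Map each query item to the first STRING identifier listed for it."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Keep only the first row per query item (assumed to be the best match).
        mapping.setdefault(row["queryItem"], row["stringId"])
    return mapping


mapping = parse_id_mapping(SAMPLE_TSV)
print(mapping)
```

A STRING identifier combines the NCBI taxonomy ID and a protein accession, which is why the species must be known before mapping.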

## Implementation Details

- **Client entry point**: `scripts/string_api.py` provides the main wrapper (e.g., `StringClient`) around the STRING REST API.
- **Caller identity**:
  - STRING endpoints do **not** require an API key.
  - A `caller_identity` string is strongly recommended (project name/email/URL) to support rate/load management.
  - Pass it at initialization (e.g., `StringClient(caller_identity="my_email@example.com")`) or inject via environment variables in your own wrapper.
- **Organism selection**:
  - Most operations require a species identifier, given as an NCBI taxonomy ID (e.g., `9606` for human).
- **Network retrieval and scoring**:
  - Network endpoints return interactions with confidence scores. Filter by score either server-side, using STRING's `required_score` parameter (0 to 1000) if the wrapper exposes it, or by applying a threshold to the returned edges in your analysis code.
- **Visualization**:
  - Static images are retrieved directly from STRING image endpoints and written to disk (PNG/SVG depending on the method/parameters).
- **Reference documentation**:
  - See `references/string_reference.md` for original API notes and endpoint details included with this skill.
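
The caller-identity and score-filtering notes above can be sketched in your own wrapper roughly as follows. This is a sketch under stated assumptions, not the skill's implementation: the environment variable `STRING_CALLER_IDENTITY` and the helpers `resolve_caller_identity`, `network_params`, and `filter_edges` are hypothetical, the edge records use placeholder names with made-up scores, and the `score` field is assumed to be on a 0 to 1 scale.

```python
import os


def resolve_caller_identity(default="my_analysis_tool"):
    """Read the caller identity from the environment, falling back to a default."""
    # STRING_CALLER_IDENTITY is a hypothetical variable name chosen for this sketch.
    return os.environ.get("STRING_CALLER_IDENTITY", default)


def network_params(identifiers, species, min_score=None):
    """Build query parameters for a STRING network request (nothing is sent)."""
    params = {
        "identifiers": "\r".join(identifiers),
        "species": species,
        "caller_identity": resolve_caller_identity(),
    }
    if min_score is not None:
        # Server-side filter: required_score uses a 0-1000 scale.
        params["required_score"] = int(round(min_score * 1000))
    return params


def filter_edges(edges, min_score):
    """Client-side filter: keep edges whose score is at or above min_score."""
    return [edge for edge in edges if float(edge["score"]) >= min_score]


# Placeholder edges with made-up scores, for illustration only.
edges = [
    {"preferredName_A": "PROT_A", "preferredName_B": "PROT_B", "score": 0.95},
    {"preferredName_A": "PROT_A", "preferredName_B": "PROT_C", "score": 0.41},
    {"preferredName_A": "PROT_A", "preferredName_B": "PROT_D", "score": 0.70},
]

high_confidence = filter_edges(edges, min_score=0.7)
print([edge["preferredName_B"] for edge in high_confidence])
```

Filtering server-side reduces the amount of data transferred, while filtering client-side lets you try several thresholds against one downloaded network.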
