reference-search

`npx mdskill add aipoch/medical-research-skills/reference-search`

Execute multi-database literature searches with reproducible strategies.

- Delivers structured results for evidence-based review tasks.
- Integrates with PubMed search scripts for data retrieval.
- Prioritizes packaged executable paths over generic answers.
- Outputs traceable lists for systematic research workflows.

SKILL.md (`.github/skills/reference-search`)
---
name: reference-search
description: Multi-database literature search and search-strategy design that outputs structured, reproducible result lists; use when you need reference retrieval, systematic searching, review topic selection, or to construct a traceable search strategy.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
# Reference Search

## When to Use

- Use this skill when you need multi-database literature search or search-strategy design that produces structured, reproducible result lists: reference retrieval, systematic searching, review topic selection, or building a traceable search strategy.
- Use this skill when an evidence-insight task needs a packaged method instead of ad-hoc, free-form output.
- Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
- Use this skill when `scripts/pubmed_search.py` is the most direct path to complete the request.
- Use this skill when you need the `reference-search` package behavior rather than a generic answer.

## Key Features

- Scope-focused workflow for multi-database literature search and search-strategy design with structured, reproducible result lists.
- Packaged executable path(s): `scripts/pubmed_search.py`.
- Reference material available in `references/` for task-specific guidance.
- Reusable packaged asset(s), including `assets/search_log_template.csv`.
- Structured execution path designed to keep outputs consistent and reviewable.

## Dependencies

- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: none; the packaged script relies on the Python standard library only (see the Dependencies table below).

## Example Usage

```bash
cd "20260316/scientific-skills/Evidence Insight/reference-search"
python -m py_compile scripts/pubmed_search.py
python scripts/pubmed_search.py --help
```

Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/pubmed_search.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.

## Implementation Details

- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/pubmed_search.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Packaged assets: reusable files are available under `assets/`.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.

## 1. When to Use

Use this skill in the following scenarios:

1. **Systematic or scoping reviews** where you must document a reproducible search strategy and export structured results.
2. **Rapid evidence retrieval** for a research question, with quick export to CSV/JSON for screening.
3. **Search strategy construction** (keywords, synonyms, Boolean logic, field restrictions) before running searches at scale.
4. **Review topic selection** by exploring the volume and distribution of literature for candidate topics.
5. **Traceable search logging** when you need to record search date, query string, and result counts for auditability.
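For scenario 5, a log entry needs at least the search date, the exact query string, and the result count. The sketch below appends one such row per search to a CSV file. It is not the packaged script's code: `append_search_log` and the column names in `LOG_FIELDS` are hypothetical and should be matched to the header of `assets/search_log_template.csv` before use.

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical column names: match them to the header row of
# assets/search_log_template.csv before relying on this.
LOG_FIELDS = ["search_date", "database", "query", "result_count"]


def append_search_log(log_path: Path, database: str, query: str, result_count: int) -> dict:
    """Append one search event to a CSV log, writing the header the first time."""
    row = {
        "search_date": date.today().isoformat(),  # ISO date keeps the log sortable
        "database": database,
        "query": query,                           # exact string sent to the database
        "result_count": result_count,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    needs_header = not log_path.exists() or log_path.stat().st_size == 0
    with log_path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=LOG_FIELDS)
        if needs_header:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Call it once per executed search, with `log_path` under `outputs/` so the file-write constraint described in Implementation Details still holds.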

## 2. Key Features

- **Multi-database search framework** (currently implemented for **PubMed**).
- **Automatic keyword extraction** and **search strategy construction** (Boolean logic + field constraints).
- **Structured outputs**:
  - Machine-readable **JSON**
  - Spreadsheet-friendly **CSV**
- **Reproducible search records** (query string, keywords, counts, and record list).
- **Compliance-oriented network access** restricted to official PubMed E-utilities endpoints.
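The Boolean strategy construction listed above commonly follows one pattern: synonyms for a concept are joined with OR, the concept groups are joined with AND, and each term carries a PubMed field tag. A minimal sketch of that pattern, assuming a hypothetical `build_query` helper (the packaged script's own keyword-extraction logic is not shown here and may differ):

```python
def build_query(concepts: list[list[str]], field: str = "Title/Abstract") -> str:
    """OR the synonyms within each concept group, then AND the groups together.

    Multi-word terms are wrapped in double quotes so PubMed searches them as a phrase.
    """
    blocks = []
    for synonyms in concepts:
        terms = []
        for term in synonyms:
            term = term.strip()
            if not term:
                continue  # skip blank synonyms
            if " " in term:
                term = f'"{term}"'
            terms.append(f"{term}[{field}]")
        if not terms:
            continue  # an empty group would produce an invalid "()" clause
        joined = " OR ".join(terms)
        # Parenthesize only when the group has more than one term.
        blocks.append(f"({joined})" if len(terms) > 1 else joined)
    if not blocks:
        raise ValueError("At least one non-empty concept group is required.")
    return " AND ".join(blocks)
```

For example, `build_query([["cachexia", "wasting syndrome"], ["pancreatic"]])` returns `(cachexia[Title/Abstract] OR "wasting syndrome"[Title/Abstract]) AND pancreatic[Title/Abstract]`.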

## 3. Dependencies

| Dependency | Version | Notes |
|---|---:|---|
| Python | 3.10+ | Uses Python standard library only (no third-party packages). |

## 4. Example Usage

### Run the PubMed search script

```bash
cd skills/reference-search
python scripts/pubmed_search.py
```

### Configure the script

Edit the `CONFIG` section in `scripts/pubmed_search.py`:

```python
from pathlib import Path

CONFIG = {
    "EMAIL": "your_email@example.com",          # Required (must be provided by the user)
    "API_KEY": "",                               # Optional (can increase rate limits)
    "RETMAX": 20,                                # Max number of records to return
    "OUTPUT_DIR": Path("outputs/pubmed_search"), # Allowed output directory
}
```

### Example output (JSON)

```json
{
  "query": "\"Cancer cachexia\"[Title] AND cachexia[Title/Abstract] AND pancreatic[Title/Abstract]",
  "keywords": ["cachexia", "pancreatic", "cancer", "weight", "muscle", "atrophy", "mortality", "treatment"],
  "count": 20,
  "records": [
    {
      "pmid": "36280389",
      "title": "Role of noncoding RNAs in pancreatic ductal adenocarcinoma associated cachexia.",
      "journal": "Journal of Cachexia, Sarcopenia and Muscle",
      "pubdate": "2022",
      "authors": "Wang X, Li Y, Zhang S"
    }
  ]
}
```
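A result object shaped like the example above maps directly onto both export formats. The sketch below writes the full object to JSON and the `records` array to CSV using only the standard library; the function name `export_results`, the file stem, and the column order in `RECORD_FIELDS` are assumptions, not the script's documented behavior.

```python
import csv
import json
from pathlib import Path

# Assumed CSV column order, mirroring the record keys in the JSON example.
RECORD_FIELDS = ["pmid", "title", "journal", "pubdate", "authors"]


def export_results(result: dict, output_dir: Path, stem: str = "pubmed_results") -> tuple[Path, Path]:
    """Write the whole result as JSON and its records as a CSV table."""
    output_dir.mkdir(parents=True, exist_ok=True)
    json_path = output_dir / f"{stem}.json"
    csv_path = output_dir / f"{stem}.csv"
    with json_path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII author names readable.
        json.dump(result, fh, ensure_ascii=False, indent=2)
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        # Missing keys become empty cells; unexpected keys are ignored.
        writer = csv.DictWriter(fh, fieldnames=RECORD_FIELDS, extrasaction="ignore")
        writer.writeheader()
        for record in result.get("records", []):
            writer.writerow(record)
    return json_path, csv_path
```

Pass an `output_dir` under `outputs/` (for example `CONFIG["OUTPUT_DIR"]`) to stay within the write constraint.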

## 5. Implementation Details

### Supported databases and endpoints

- **PubMed (NCBI E-utilities)** only.
  - `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi` (search)
  - `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi` (record summaries)
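Both endpoints take their parameters as an HTTP query string. The sketch below builds the two request URLs and flattens an ESummary JSON response into records like the example output. `db`, `term`, `id`, `retmax`, `retmode`, `email`, and `api_key` are documented E-utilities parameters; the helper names are hypothetical, and mapping ESummary's `fulljournalname` and `authors[].name` fields onto `journal` and `authors` is an assumption about how the packaged script fills those keys.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def _with_credentials(params: dict, email: str, api_key: str) -> dict:
    """Attach the contact email and, if present, the optional API key."""
    params = {**params, "email": email}
    if api_key:
        params["api_key"] = api_key
    return params


def esearch_url(term: str, email: str, retmax: int = 20, api_key: str = "") -> str:
    """URL that returns matching PMIDs and the total hit count as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(_with_credentials(params, email, api_key))}"


def esummary_url(pmids: list[str], email: str, api_key: str = "") -> str:
    """URL that returns document summaries for a batch of PMIDs as JSON."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "json"}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(_with_credentials(params, email, api_key))}"


def parse_esummary(payload: dict) -> list[dict]:
    """Flatten an ESummary JSON payload into records shaped like the example output."""
    result = payload.get("result", {})
    records = []
    for uid in result.get("uids", []):
        doc = result.get(uid, {})
        names = [a.get("name", "") for a in doc.get("authors", [])]
        records.append(
            {
                "pmid": uid,
                "title": doc.get("title", ""),
                "journal": doc.get("fulljournalname", ""),
                "pubdate": doc.get("pubdate", ""),
                "authors": ", ".join(n for n in names if n),
            }
        )
    return records
```

In an ESearch JSON response the PMIDs to pass to `esummary_url` are listed under `esearchresult.idlist`, and the total hit count under `esearchresult.count`.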

### Search workflow (recommended)

1. **Define requirements and scope**
   - Confirm research question and core concepts.
   - Set inclusion/exclusion criteria (time window, language, publication type).
2. **Design the search strategy**
   - Expand keywords with synonyms.
   - Combine with Boolean operators (AND/OR) and apply field restrictions (e.g., Title/Abstract/MeSH).
3. **Execute and export**
   - Run the script and export results to JSON/CSV.
   - If combining multiple sources, merge and deduplicate externally while preserving source labels.
4. **Record for reproducibility**
   - Save the final query string, search date, and result counts.

### Configuration parameters

- `EMAIL` (required): Must be provided by the user; **must not** be hard-coded as a real credential.
- `API_KEY` (optional): If provided, raises the NCBI E-utilities rate limit from 3 to 10 requests per second.
- `RETMAX`: Limits the number of returned records.
- `OUTPUT_DIR`: Must point to an `outputs/` subdirectory.
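These parameter rules can be checked before any request is sent. The sketch below, built around a hypothetical `validate_config` function, returns a list of user-facing problems for a `CONFIG` dict like the one in Example Usage. The email check is only a loose shape test, and treating `your_email@example.com` as an unfilled placeholder is an assumption.

```python
import re
from pathlib import Path

# Loose shape check only; it cannot prove the address is real or deliverable.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PLACEHOLDER_EMAILS = {"your_email@example.com"}  # assumed template value


def validate_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = []

    email = str(config.get("EMAIL", "")).strip()
    if not email or email in PLACEHOLDER_EMAILS:
        problems.append("EMAIL is required: supply your own contact address.")
    elif not EMAIL_PATTERN.match(email):
        problems.append("EMAIL does not look like an email address.")

    retmax = config.get("RETMAX")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(retmax, int) or isinstance(retmax, bool) or retmax < 1:
        problems.append("RETMAX must be a positive integer.")

    output_dir = Path(config.get("OUTPUT_DIR", ""))
    parts = output_dir.parts
    if output_dir.is_absolute() or not parts or parts[0] != "outputs" or ".." in parts:
        problems.append("OUTPUT_DIR must be a relative path under outputs/.")

    return problems
```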

### Security, compliance, and access constraints

- **Network access**: restricted to the official NCBI host `eutils.ncbi.nlm.nih.gov` only.
- **Prohibited**: any third-party URLs.
- **File read constraints**: do not read files outside the skill directory.
- **File write constraints**: write outputs only under `outputs/` (ensure the directory exists or is created by the script).
- **Timeout**: 20 seconds per API request.
- **Rate limiting**: 0.35 seconds between requests.
- **Error handling**: return semantic, user-facing error messages without exposing sensitive technical details.
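The network constraints above (one allowed host, a 20-second timeout, 0.35-second spacing, user-facing errors) can be enforced in a single small request wrapper. A sketch using only `urllib`; `RateLimitedClient`, `check_url`, and the error wording are illustrative, not the packaged script's code.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import urlparse

ALLOWED_HOST = "eutils.ncbi.nlm.nih.gov"  # the only host this skill may contact
TIMEOUT_S = 20.0                           # per-request timeout from the constraints above
MIN_INTERVAL_S = 0.35                      # minimum spacing between requests


def check_url(url: str) -> None:
    """Reject anything that is not HTTPS to the official E-utilities host."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname != ALLOWED_HOST:
        raise ValueError("Only the official NCBI E-utilities service may be queried.")


class RateLimitedClient:
    """Fetch JSON from E-utilities with host allowlisting, spacing, and a timeout."""

    def __init__(self) -> None:
        self._last_request: float | None = None

    def _wait_turn(self) -> None:
        # Sleep just long enough to keep MIN_INTERVAL_S between requests.
        if self._last_request is not None:
            remaining = MIN_INTERVAL_S - (time.monotonic() - self._last_request)
            if remaining > 0:
                time.sleep(remaining)

    def get_json(self, url: str) -> dict:
        check_url(url)  # validate before touching the network
        self._wait_turn()
        try:
            with urllib.request.urlopen(url, timeout=TIMEOUT_S) as response:
                return json.load(response)
        except (urllib.error.URLError, TimeoutError) as exc:
            # Show a plain message to the user; the original error stays chained for debugging.
            raise RuntimeError("PubMed is unreachable right now; please try again later.") from exc
        except json.JSONDecodeError as exc:
            raise RuntimeError("PubMed returned a response that could not be read.") from exc
        finally:
            self._last_request = time.monotonic()
```

Create one client per run and reuse it, so the spacing applies across consecutive ESearch and ESummary calls.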

### Included assets and references (in-repo)

- Templates:
  - `assets/search_log_template.csv`
  - `assets/search_results_template.csv`
- Additional guidance and checklists:
  - `references/guide.md`
  - `references/evaluation-checklist.md`
- Tests:
  - `tests/test_pubmed_search.py`
- External documentation:
  - PubMed E-utilities: https://www.ncbi.nlm.nih.gov/books/NBK25504/

More from aipoch/medical-research-skills

| Skill | Description |
|---|---|
| 3d-molecule-ray-tracer | Generate photorealistic rendering scripts for PyMOL and UCSF ChimeraX. |
| abstract-summarizer | Transform lengthy academic papers into concise, structured 250-word abstracts. |
| abstract-trimmer | Precision editing tool that reduces abstract word count through intelligent compression techniques, maintaining scientific rigor while meeting strict journal and conference requirements. |
| academic-abstract-refiner | Refines long medical academic texts into SCI-style unstructured Chinese and English abstracts; use when you need to condense drafts/reports/summaries into bilingual abstracts and generate Summary_Report.md. |
| academic-cv-generator | Generate structured academic CVs from free-form Chinese/English text and export to Word (.docx). Use this skill when you are asked to organize, generate, or optimize an academic CV (e.g., publications/projects/awards) into a consistent, formatted document with uniform-colored section headers and optional bilingual output. |
| academic-highlight-generator | Generates submission-ready Elsevier/SCI Highlights from manuscript text or extracted PDF/DOCX/TXT content. Use when a user needs 3-5 concise, evidence-grounded highlight bullets for a research paper, review, meta-analysis, case report, or bioinformatics manuscript. |
| academic-norm-review | Detects content similarity, verifies standardized citations and abbreviations, and flags potential academic integrity risks; use it before submission, during academic writing QA, or for compliance reviews. |
| academic-poster-generator | Complete workflow for generating academic research posters from PDF literature; use when you need to extract paper content from PDFs and produce a LaTeX-based poster (beamerposter/tikzposter/baposter) with mandatory figure generation and a final rendered HTML deliverable. |
| acronym-unpacker | Intelligent medical abbreviation disambiguation tool that resolves ambiguous acronyms using clinical context, specialty-specific knowledge, and document-level semantic analysis. |
| active-comparator-single-soc-faers-safety-comparison | Generates complete FAERS pharmacovigilance study designs for multi-drug or class-level safety comparison inside one predefined SOC or AE family using active comparators, disproportionality analysis, subgroup characterization, and reviewer-facing evidence control. |