arxiv-database

Name: arxiv-database
Author: aipoch/medical-research-skills

$npx mdskill add aipoch/medical-research-skills/arxiv-database

Search arXiv preprints by keyword, author, or category.

Retrieves abstracts, DOIs, and PDF links for scientific papers.
Depends on the arXiv API and Python script execution.
Executes structured queries via arxiv_search.py for reproducibility.
Delivers concrete files or metadata for offline reading.

SKILL.md

.github/skills/arxiv-databaseView on GitHub ↗

---
name: arxiv-database
description: Search and retrieve scientific preprints from arXiv; use it when you need to find papers by keyword/author/category, fetch metadata (abstract, DOI, PDF URL), or download PDFs for offline reading.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
# ArXiv Database Skill

## When to Use

- Use this skill when you need search and retrieve scientific preprints from arxiv; use it when you need to find papers by keyword/author/category, fetch metadata (abstract, doi, pdf url), or download pdfs for offline reading in a reproducible workflow.
- Use this skill when a evidence insight task needs a packaged method instead of ad-hoc freeform output.
- Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
- Use this skill when `scripts/arxiv_search.py` is the most direct path to complete the request.
- Use this skill when you need the `arxiv-database` package behavior rather than a generic answer.

## Key Features

- Scope-focused workflow aligned to: Search and retrieve scientific preprints from arXiv; use it when you need to find papers by keyword/author/category, fetch metadata (abstract, DOI, PDF URL), or download PDFs for offline reading.
- Packaged executable path(s): `scripts/arxiv_search.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.

## Dependencies

- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.

## Example Usage

```bash
cd "20260316/scientific-skills/Evidence Insight/arxiv-database"
python -m py_compile scripts/arxiv_search.py
python scripts/arxiv_search.py --help
```

Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/arxiv_search.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.

## Implementation Details

- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/arxiv_search.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.

## 1. When to Use

- You need to quickly find arXiv preprints by keyword, phrase, author, or category (e.g., `cs.AI`, `cs.CL`).
- You want to collect paper metadata (title, authors, publication date, abstract/summary, PDF link) for review or indexing.
- You need the latest submissions in a topic area (sorted by submission date or last updated date).
- You want to download one or more PDFs from search results for offline reading or batch processing.
- You have a known arXiv identifier and want to retrieve the corresponding paper directly.

## 2. Key Features

- arXiv query-based search (supports category filters, author filters, phrases, and ID lookups).
- Configurable result limits (`--max-results`).
- Sort control (`--sort-by`: `Relevance`, `LastUpdatedDate`, `SubmittedDate`).
- Metadata output per result (title, authors, published date, abstract/summary, PDF URL; DOI when available via arXiv metadata).
- Optional PDF download for returned results (`--download`) with configurable output directory (`--dir`).

## 3. Dependencies

- Python 3.8+
- `arxiv` (Python package) — version depends on your environment; install a recent release (e.g., `arxiv>=1.4.0`)

## 4. Example Usage

### Install dependencies

```bash
pip install "arxiv>=1.4.0"
```

### Run searches and downloads

**Search for papers in `cs.AI` about reinforcement learning (top 5 results):**
```bash
python scripts/arxiv_search.py --query "cat:cs.AI AND reinforcement learning" --max-results 5
```

**Search for “Large Language Models” in `cs.CL`:**
```bash
python scripts/arxiv_search.py --query "cat:cs.CL AND \"Large Language Models\""
```

**Get the latest 5 papers on “quantum computing” (sorted by submission date):**
```bash
python scripts/arxiv_search.py --query "quantum computing" --sort-by SubmittedDate --max-results 5
```

**Download a specific paper by arXiv ID:**
```bash
python scripts/arxiv_search.py --query "id:2101.12345" --download
```

**Download results into a specific directory:**
```bash
python scripts/arxiv_search.py --query "cat:cs.LG AND diffusion" --max-results 3 --download --dir ./papers
```

## 5. Implementation Details

- **Entry point:** `scripts/arxiv_search.py` wraps the `arxiv` Python API to execute queries against the arXiv search endpoint.
- **Query syntax:** The `--query` string is passed to arXiv search and can include:
- Category filters (e.g., `cat:cs.AI`)
- Author filters (e.g., `au:Smith`)
- Exact phrases using quotes (e.g., `"Large Language Models"`)
- ID lookup (e.g., `id:2101.12345`)
- Boolean operators such as `AND`
- **Result limiting:** `--max-results` controls how many entries are returned (default: `10`).
- **Sorting:** `--sort-by` selects the ordering of results:
- `Relevance` (default)
- `LastUpdatedDate`
- `SubmittedDate`
- **Downloads:** When `--download` is set, the script downloads the PDF for each returned result using the provided PDF URL and saves it to `--dir` (default: current working directory).
- **Metadata fields:** Each result includes core arXiv metadata (title, authors, published date, summary/abstract, PDF URL). DOI is included when present in arXiv’s metadata for that record.

More from aipoch/medical-research-skills

Skill	Description
3d-molecule-ray-tracer	Generate photorealistic rendering scripts for PyMOL and UCSF ChimeraX.
abstract-summarizer	Transform lengthy academic papers into concise, structured 250-word abstracts.
abstract-trimmer	Precision editing tool that reduces abstract word count through intelligent compression techniques, maintaining scientific rigor while meeting strict journal and conference requirements.
academic-abstract-refiner	Refines long medical academic texts into SCI-style unstructured Chinese and English abstracts; use when you need to condense drafts/reports/summaries into bilingual abstracts and generate Summary_Report.md.
academic-cv-generator	Generate structured academic CVs from free-form Chinese/English text and export to Word (.docx). Use this skill when you are asked to organize, generate, or optimize an academic CV (e.g., publications/projects/awards) into a consistent, formatted document with uniform-colored section headers and optional bilingual output.
academic-highlight-generator	Generates submission-ready Elsevier/SCI Highlights from manuscript text or extracted PDF/DOCX/TXT content. Use when a user needs 3-5 concise, evidence-grounded highlight bullets for a research paper, review, meta-analysis, case report, or bioinformatics manuscript.
academic-norm-review	Detects content similarity, verifies standardized citations and abbreviations, and flags potential academic integrity risks; use it before submission, during academic writing QA, or for compliance reviews.
academic-poster-generator	Complete workflow for generating academic research posters from PDF literature; use when you need to extract paper content from PDFs and produce a LaTeX-based poster (beamerposter/tikzposter/baposter) with mandatory figure generation and a final rendered HTML deliverable.
acronym-unpacker	Intelligent medical abbreviation disambiguation tool that resolves ambiguous acronyms using clinical context, specialty-specific knowledge, and document-level semantic analysis.
active-comparator-single-soc-faers-safety-comparison	Generates complete FAERS pharmacovigilance study designs for multi-drug or class-level safety comparison inside one predefined SOC or AE family using active comparators, disproportionality analysis, subgroup characterization, and reviewer-facing evidence control.