contamination-audit

Name: contamination-audit
Author: yogsoth-ai/de-anthropocentric-research-engine

$ npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/contamination-audit

Detect evidence of train-test data leakage, benchmark contamination, and memorization artifacts that inflate reported scores beyond genuine capability.

SKILL.md

.github/skills/contamination-audit

---
name: contamination-audit
description: Detect train-test data leakage and memorization artifacts
execution: subagent
prompt: ./prompt.md
input: benchmark_name, training_data_sources, test_set_metadata
used-by: benchmark-archaeology
---

# Contamination Audit SOP

Detect evidence of train-test data leakage, benchmark contamination, and memorization artifacts that inflate reported scores beyond genuine capability.

## Input

- **benchmark_name**: Name of the benchmark being audited
- **training_data_sources**: Known or suspected training data sources for models evaluated on this benchmark
- **test_set_metadata**: Information about the test set (size, creation date, source, public availability)
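
As a rough illustration, the three inputs could be represented as below. This is only a sketch: the SOP names the inputs but does not prescribe a schema, so the class names (`EvalSetMetadata`, `AuditInput`), the field types, and the `unknown_fields` helper are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EvalSetMetadata:
    """Hypothetical shape for test_set_metadata; None marks an unknown value."""
    size: Optional[int] = None
    creation_date: Optional[date] = None
    source: Optional[str] = None
    publicly_available: Optional[bool] = None


@dataclass
class AuditInput:
    """Hypothetical container for the three inputs listed above."""
    benchmark_name: str
    training_data_sources: list[str]
    test_set_metadata: EvalSetMetadata

    def unknown_fields(self) -> list[str]:
        """Names of metadata fields that are still unknown (None)."""
        return [
            name
            for name, value in vars(self.test_set_metadata).items()
            if value is None
        ]


# Placeholder values only; not a real benchmark.
audit_input = AuditInput(
    benchmark_name="example-benchmark",
    training_data_sources=["example-web-crawl"],
    test_set_metadata=EvalSetMetadata(size=1000, publicly_available=True),
)
print(audit_input.unknown_fields())
```

Recording unknown values explicitly, rather than omitting them, lets the audit report which parts of the assessment rest on missing information.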

## Procedure

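1. Assess temporal contamination risk: was the test set publicly available before the training cutoff of the evaluated models?
2. Check model papers and technical reports for disclosed contamination of this benchmark.
3. Search for independent contamination studies specific to this benchmark.
4. Analyze performance patterns indicative of memorization, such as a sharp accuracy drop on rephrased or newly written items.
5. Assess the benchmark's mitigation measures (canary strings, held-out splits, version rotation).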
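
The mechanical parts of steps 1, 4, and 5 can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function names, the risk labels, and the use of an accuracy gap between original and rephrased items as a memorization signal are assumptions. Steps 2 and 3 are literature searches and are not modeled here.

```python
from __future__ import annotations

from datetime import date
from typing import Optional


def temporal_risk(test_public_since: Optional[date],
                  training_cutoff: Optional[date]) -> str:
    """Step 1: could the test set have been present in the training data?"""
    if test_public_since is None or training_cutoff is None:
        return "unknown"   # a missing date means risk cannot be ruled out
    if test_public_since <= training_cutoff:
        return "elevated"  # test data was public before the cutoff
    return "low"           # test data was released after the cutoff


def accuracy_gap(original_correct: list[bool],
                 rephrased_correct: list[bool]) -> float:
    """Step 4: accuracy on original items minus accuracy on rephrased items.

    A large positive gap is one possible sign of memorization (an assumption
    of this sketch, not a threshold defined by the SOP).
    """
    if not original_correct or not rephrased_correct:
        raise ValueError("both result lists must be non-empty")
    original_acc = sum(original_correct) / len(original_correct)
    rephrased_acc = sum(rephrased_correct) / len(rephrased_correct)
    return original_acc - rephrased_acc


def contains_canary(text: str, canary: str) -> bool:
    """Step 5: does a sample of training text contain the benchmark's canary?"""
    return bool(canary) and canary in text
```

For example, `temporal_risk(date(2020, 1, 1), date(2021, 6, 1))` returns `"elevated"`, because the hypothetical test set was public before the hypothetical cutoff. An elevated temporal risk shows only that contamination was possible, not that it occurred, so it should be weighed together with the other steps.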

## Output

Contamination risk assessment with evidence and confidence levels.
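
One possible shape for this output is sketched below. The SOP does not define a report format, so the class names (`Finding`, `ContaminationReport`), the field names, and the JSON serialization are assumptions made for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Finding:
    """One piece of evidence with its own confidence level (hypothetical)."""
    step: str        # procedure step that produced the finding
    evidence: str    # what was observed, with a pointer to its source
    confidence: str  # e.g. "low", "medium", or "high"


@dataclass
class ContaminationReport:
    """Hypothetical container for the overall risk assessment."""
    benchmark_name: str
    risk_level: str
    findings: list[Finding] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the report, including nested findings, to JSON."""
        return json.dumps(asdict(self), indent=2)


# Placeholder content only; not the result of a real audit.
report = ContaminationReport(benchmark_name="example-benchmark",
                             risk_level="unknown")
report.findings.append(
    Finding(step="temporal", evidence="release date not documented",
            confidence="low")
)
print(report.to_json())
```

Attaching a confidence level to each finding, instead of a single overall score, keeps weakly supported evidence distinguishable from well-supported evidence in the final assessment.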
