construct-validity-assessment

Name: construct-validity-assessment
Author: yogsoth-ai/de-anthropocentric-research-engine

$ npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/construct-validity-assessment

Assesses if a benchmark accurately measures its claimed capability

Evaluates alignment between benchmark tasks and intended capability
Uses psychometric validity frameworks adapted for AI evaluation
Analyzes task examples for content, convergent, and discriminant validity
Produces a structured validity verdict with evidence for each dimension

SKILL.md

.github/skills/construct-validity-assessment

---
name: construct-validity-assessment
description: Evaluate whether a benchmark measures its claimed capability
execution: subagent
prompt: ./prompt.md
input: benchmark_name, claimed_capability, task_examples
used-by: benchmark-archaeology
---

# Construct Validity Assessment SOP

Evaluate whether a benchmark actually measures the capability it claims to measure, using psychometric validity frameworks adapted for AI evaluation.

## Input

- **benchmark_name**: Name of the benchmark
- **claimed_capability**: What the benchmark authors claim it measures
- **task_examples**: Representative examples from the benchmark
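
As a minimal sketch, the three inputs above could be bundled and sanity-checked before the assessment starts. The `AssessmentInput` class, its `validate` method, and the "ExampleBench" values are hypothetical names introduced here for illustration; the skill itself only specifies the three field names.

```python
from dataclasses import dataclass, field


@dataclass
class AssessmentInput:
    """Hypothetical container for the three documented inputs."""

    benchmark_name: str
    claimed_capability: str
    task_examples: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the input is usable."""
        problems = []
        if not self.benchmark_name.strip():
            problems.append("benchmark_name is empty")
        if not self.claimed_capability.strip():
            problems.append("claimed_capability is empty")
        if not self.task_examples:
            problems.append("task_examples is empty")
        return problems


# Illustrative values only; "ExampleBench" is not a real benchmark.
example = AssessmentInput(
    benchmark_name="ExampleBench",
    claimed_capability="multi-step arithmetic reasoning",
    task_examples=[
        "If a train travels 60 km in 1.5 hours, what is its average speed?",
    ],
)
print(example.validate())
```

Checking for empty fields up front keeps the later steps from assessing a construct that was never stated.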

## Procedure

1. Define the construct (the claimed capability) precisely
2. Analyze task requirements: what skills are actually needed to solve the examples?
3. Assess content validity: do the items representatively sample the construct?
4. Check convergent validity: do scores correlate with other measures of the same construct?
5. Check discriminant validity: are scores independent of measures of unrelated constructs?
6. Identify construct-irrelevant variance (confounds)
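
The quantitative parts of steps 3 to 6 can be sketched in a few lines, assuming per-model scores are available for the benchmark and for comparison measures. Everything here is an illustrative assumption rather than the skill's prescribed method: the helper names `pearson` and `content_coverage`, the sub-skill labels, and the made-up score lists.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def content_coverage(required: set[str], observed: set[str]) -> float:
    """Fraction of the construct's sub-skills that the examples exercise."""
    if not required:
        raise ValueError("required sub-skills must be non-empty")
    return len(required & observed) / len(required)


# Made-up per-model scores, for illustration only (one entry per model).
target = [0.2, 0.4, 0.6, 0.8]              # benchmark under assessment
same_construct = [0.25, 0.35, 0.65, 0.75]  # another measure of the same construct
unrelated = [0.5, 0.3, 0.6, 0.4]           # a measure of an unrelated construct

# Hypothetical sub-skill labels from steps 1 and 2.
required = {"decomposition", "arithmetic", "unit conversion"}
observed = {"arithmetic", "unit conversion", "reading comprehension"}

convergent_r = pearson(target, same_construct)   # step 4: higher supports validity
discriminant_r = pearson(target, unrelated)      # step 5: near zero supports validity
coverage = content_coverage(required, observed)  # step 3: share of construct sampled
irrelevant = observed - required                 # step 6: candidate confounds

print(f"convergent r = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
print(f"content coverage = {coverage:.2f}")
print(f"possible construct-irrelevant skills: {sorted(irrelevant)}")
```

A high convergent correlation together with a low discriminant correlation is the pattern that supports validity. Any cut-off for "high" or "low" is a judgment call that this sketch deliberately leaves out.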

## Output

A validity verdict with supporting evidence for each validity dimension (content, convergent, and discriminant).
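
One possible shape for such a verdict is sketched below. The `Rating` levels, the `DimensionFinding` record, and the "weakest dimension decides" aggregation rule are all assumptions made for illustration; the skill does not prescribe an output schema.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    SUPPORTED = "supported"
    MIXED = "mixed"
    NOT_SUPPORTED = "not supported"


@dataclass
class DimensionFinding:
    dimension: str        # "content", "convergent", or "discriminant"
    rating: Rating
    evidence: list[str]   # short statements backing the rating


def overall_verdict(findings: list[DimensionFinding]) -> Rating:
    """Assumed aggregation rule: the weakest dimension decides the verdict."""
    if not findings:
        raise ValueError("at least one finding is required")
    ratings = {f.rating for f in findings}
    if Rating.NOT_SUPPORTED in ratings:
        return Rating.NOT_SUPPORTED
    if Rating.MIXED in ratings:
        return Rating.MIXED
    return Rating.SUPPORTED


# Illustrative findings only; the evidence strings are invented.
findings = [
    DimensionFinding("content", Rating.MIXED,
                     ["examples cover 2 of 3 identified sub-skills"]),
    DimensionFinding("convergent", Rating.SUPPORTED,
                     ["scores track another measure of the same construct"]),
    DimensionFinding("discriminant", Rating.SUPPORTED,
                     ["scores are uncorrelated with an unrelated measure"]),
]

verdict = overall_verdict(findings)
print(verdict.value)
for f in findings:
    print(f"- {f.dimension}: {f.rating.value} ({'; '.join(f.evidence)})")
```

Keeping the evidence next to each rating means the overall verdict can be traced back to a specific dimension.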

More from yogsoth-ai/de-anthropocentric-research-engine

Skill	Description
abductive-hypothesis-generation	Strategy: inference to the best explanation when facing anomalies
ablation-brainstorm	Remove components one by one, observe system changes to reveal hidden dependencies and generate ideas from structural gaps.
ablation-component-mapping	Map system architecture to ablatable units for ablation studies
ablation-design	Design ablation studies to isolate component contributions in ML systems
ablation-execution	Remove components one by one from a system, record the response/impact of each removal.
abp-vulnerability-classification	Classify assumptions on 2 axes — load-bearing (how much conclusion depends on it) × vulnerable (how likely to be false). Focuses attention on High-Load × High-Vulnerable quadrant.
abstraction-extraction	Extract abstract principles from concrete domain cases. Strips domain-specific details to reveal transferable mechanisms.
abstraction-ladder	Perform bisociation at multiple abstraction levels
abstraction-laddering	Move between concrete and abstract framings — 3 levels up (Why?) and 3 levels down (How?) to find the most productive research level.
abstraction-to-design	Abstract biological principle to design principle. Bridge from biology to engineering.