robustness-design

Name: robustness-design
Author: yogsoth-ai/de-anthropocentric-research-engine

$ npx mdskill add yogsoth-ai/de-anthropocentric-research-engine/robustness-design

**Question**: Under what conditions does the method fail?

SKILL.md

.github/skills/robustness-design

---
name: robustness-design
description: "Design experiments to identify failure boundaries and robustness limits"
version: 1.0.0
category: experiment-execution
type: strategy
used-by: experiment-design
sops:
  - factor-identification
  - level-specification
  - baseline-selection
  - metric-specification
  - sample-size-estimation
  - design-matrix-construction
tactics:
  - statistical-method-selection
---

# Strategy: Robustness Design

**Question**: Under what conditions does the method fail?

## Methodology

- **Distribution Shift Testing**: Evaluate under covariate shift, label shift, and domain shift.
- **Adversarial Robustness**: Apply perturbation-based attacks (e.g., PGD, AutoAttack) across a range of perturbation budgets (epsilon).
- **Cross-Domain Transfer**: Test on domains not seen during training.
- **Noise Injection**: Inject Gaussian noise, label noise, and missing data at varying severity.
- **Stress Testing**: Push inputs to boundary conditions (extreme lengths, rare categories, edge cases).
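
As an illustration of the noise-injection item above, here is a minimal sketch assuming tabular NumPy features and integer class labels. The helper names (`add_gaussian_noise`, `flip_labels`, `drop_values`), the severity values, and the synthetic data are all hypothetical and not part of the skill itself.

```python
import numpy as np

def add_gaussian_noise(x, sigma, rng):
    """Return a copy of x with zero-mean Gaussian noise of std sigma added."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def flip_labels(y, rate, num_classes, rng):
    """Replace a fraction `rate` of labels with a different class."""
    y_noisy = y.copy()
    n_flip = int(round(rate * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    # An offset in [1, num_classes - 1] guarantees the new label differs.
    offsets = rng.integers(1, num_classes, size=n_flip)
    y_noisy[idx] = (y[idx] + offsets) % num_classes
    return y_noisy

def drop_values(x, rate, rng):
    """Set entries to NaN independently with probability `rate`."""
    x_missing = x.astype(float)  # astype returns a copy
    x_missing[rng.random(x.shape) < rate] = np.nan
    return x_missing

# Synthetic placeholder data (assumption: 200 samples, 4 features, 3 classes).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = rng.integers(0, 3, size=200)

# Sweep the same severity value across the three perturbation types.
for severity in (0.1, 0.3, 0.5):
    x_noisy = add_gaussian_noise(x, sigma=severity, rng=rng)
    y_noisy = flip_labels(y, rate=severity, num_classes=3, rng=rng)
    x_missing = drop_values(x, rate=severity, rng=rng)
    print(severity, float(np.mean(y_noisy != y)), float(np.isnan(x_missing).mean()))
```

Each helper returns a perturbed copy and leaves the clean data untouched, so every severity level is evaluated against the same clean reference.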

## Execution Flow

1. **factor-identification** → Identify robustness dimensions (noise type, shift type, severity)
2. **level-specification** → Define severity levels for each perturbation
3. **baseline-selection** → Select robust baselines for comparison
4. **metric-specification** → Define degradation metrics (absolute drop and drop relative to clean performance)
5. **design-matrix-construction** → Build perturbation grid
6. **sample-size-estimation** → Determine samples needed per condition
7. **statistical-method-selection** (tactic) → Choose tests for degradation significance
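
The grid-construction and degradation-metric steps above could be sketched as follows. This is one possible reading, assuming a one-factor-at-a-time grid and a higher-is-better metric. The factor names, severity levels, method names, and the two scores are illustrative placeholders, not values from the skill or from any experiment.

```python
from itertools import product

# Hypothetical robustness factors and severity levels.
factors = {
    "gaussian_noise": [0.1, 0.3, 0.5],
    "label_noise": [0.05, 0.1, 0.2],
    "missing_data": [0.1, 0.25, 0.5],
}
# Hypothetical methods: the method under test plus one robust baseline.
methods = ["proposed", "robust_baseline"]

def build_grid(factors, methods):
    """One-factor-at-a-time grid: each row perturbs one factor at one level."""
    rows = []
    for method, (factor, levels) in product(methods, factors.items()):
        for level in levels:
            rows.append({"method": method, "factor": factor, "severity": level})
    return rows

def degradation(clean_score, perturbed_score):
    """Absolute and relative drop of a higher-is-better metric versus clean."""
    absolute = clean_score - perturbed_score
    relative = absolute / clean_score if clean_score else float("nan")
    return absolute, relative

grid = build_grid(factors, methods)
print(len(grid))  # 2 methods x 3 factors x 3 levels = 18 rows

# Placeholder scores, used only to show the call.
abs_drop, rel_drop = degradation(clean_score=0.90, perturbed_score=0.72)
print(round(abs_drop, 3), round(rel_drop, 3))
```

A full factorial grid (every combination of factors) is the alternative design. It exposes interactions between perturbations but grows multiplicatively, which matters for the run budget.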

## Budget Gate

| Robustness Type | Conditions | Severities | Min Runs | Notes |
|----------------|-----------|-----------|----------|-------|
| Single perturbation | 1 | 3-5 | 3-5 | Quick sanity check |
| Multi-perturbation | 3-5 | 3 each | 9-15 | Standard robustness eval |
| Adversarial sweep | 1 attack | 5-10 epsilon | 5-10 | Adversarial robustness curve |
| Comprehensive | 5+ types | 3-5 each | 50+ | Publication-ready robustness |
| Cross-domain | N domains | 1 | N | Transfer evaluation |
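
For the first three rows, the Min Runs column matches the product of conditions and severity levels. The sketch below reproduces that arithmetic. The function name `min_runs` and its `seeds` parameter are assumptions added here. The Comprehensive row's 50+ is larger than its plain product (5 x 5 = 25), and the table does not say why. Repeated seeds are one possible explanation, modeled here purely as an assumption.

```python
def min_runs(conditions, severities_per_condition, seeds=1):
    """Runs = conditions x severity levels per condition x seeds.

    `seeds` is an assumption; with seeds=1 this is one run per grid cell.
    """
    return conditions * severities_per_condition * seeds

print(min_runs(1, 3), min_runs(1, 5))   # single perturbation: 3 and 5
print(min_runs(3, 3), min_runs(5, 3))   # multi-perturbation: 9 and 15
print(min_runs(1, 5), min_runs(1, 10))  # adversarial sweep: 5 and 10
print(min_runs(5, 5, seeds=2))          # 5 types, 5 levels, 2 seeds: 50
```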
