generate-asset-actions
$
npx mdskill add UKGovernmentBEIS/inspect_evals/generate-asset-actionsRegenerate asset action tiers from asset inventory.
- Updates asset policies based on host reliability and maintenance status.
- Depends on ASSETS.yaml and internal audit manifest tools.
- Classifies targets by comparing upstream history and domain trust.
- Outputs updated YAML and markdown summaries for audit tracking.
SKILL.md
.github/skills/generate-asset-actionsView on GitHub ↗
---
name: generate-asset-actions
description: Generate asset-actions.yaml from ASSETS.yaml by classifying assets into priority tiers. Use when the user asks to regenerate, update, or refresh the asset actions.
---
# Generate Asset Policy
Regenerate `internal/audits/asset-actions.yaml` and `internal/audits/audit-summary.md` from `ASSETS.yaml`.
If ASSETS.yaml may be stale, run `uv run python tools/generate_asset_manifest.py` first.
Run `uv run python tools/summarise_asset_manifest.py` to get aggregate counts (by type, by state, totals). Use these numbers when populating `audit-summary.md`.
## Classification
Read `ASSETS.yaml`. For each asset, determine target stage first, then priority. Process both `state: floating` assets AND `state: pinned` assets that match known-unstable sources (since their target is `controlled`, they are not yet at their target stage).
### Target stages (per ADR-0007)
The target stage depends on **host reliability**, not asset type:
- **`controlled`** (Stage 2) — any asset where upstream has broken before, maintainer is unresponsive/deprecated, OR host is unreliable (personal repos, Google Drive, `.edu` domains, university servers, any host without version control). This applies to `git_clone`, `direct_url`, and `huggingface` alike.
- **`pinned`** (Stage 1) — assets on reliable, version-controlled hosts (GitHub, HuggingFace, well-known CDNs) with no history of breakage.
Per ADR-0007: "Anything hosted on a less reliable domain (personal websites, Google Drive, university servers, or any host without version control) should skip straight to Stage 2."
### Priority tiers
1. **Urgent** — all other floating refs on reliable hosts. Target is `pinned`.
2. **High** — matches a known-unstable source (see registry below). Target is `controlled`.
3. **Medium** — unreliable host (`drive.google.com`, `.edu` domains, personal repos/websites) not already in the known-unstable registry. Target is `controlled`.
For assets with `state: pinned` and a `{SHA}` placeholder but no checksum, classify as **Low** (target: `pinned` with checksum).
Omit assets already at their target stage.
Every entry needs: `eval`, `source`, `type`, `state`, `target`, `action`, `reason`.
## Known-Unstable Sources
Update this list when new instability is discovered.
| Source | Eval | Incident |
| -------------------------------- | ---------- | ------------------------------------- |
| `xlang-ai/OSWorld` | osworld | Files removed (PR #958) |
| `openai/evals` | makemesay | Deprecated upstream |
| `corebench.cs.princeton.edu` | core_bench | University server, no versioning |
| `epatey/fonts` | osworld | Personal repo |
| `ShishirPatil/gorilla` | bfcl | Data format issues (PR #954) |
| `yunx-z/MLRC-Bench` | mlrc_bench | Broken task |
| `LRudL/sad` | sad | Upstream bugs (issues #7, #8) |
| `meg-tong/sycophancy-eval` | sycophancy | Invalid JSON/NaN, workaround in code |
| `josancamon/paperbench` | paperbench | Paper ID mismatch (HF discussion #2) |
| `sentientfutures/moru-benchmark` | moru | Exact duplicate rows |
## Verification
1. `asset-actions.yaml` parses as valid YAML
2. Every floating asset in ASSETS.yaml appears in urgent, high, or medium
3. `floating_assets + needing_checksums + no_action_needed == total_external_assets`
4. Numbers in `audit-summary.md` match output of `summarise_asset_manifest.py`