generate-asset-actions

Name: generate-asset-actions
Author: UKGovernmentBEIS/inspect_evals

$ npx mdskill add UKGovernmentBEIS/inspect_evals/generate-asset-actions

Regenerate asset action tiers from asset inventory.

Updates asset policies based on host reliability and maintenance status.
Depends on ASSETS.yaml and internal audit manifest tools.
Classifies targets by comparing upstream history and domain trust.
Outputs updated YAML and markdown summaries for audit tracking.

SKILL.md

.github/skills/generate-asset-actions

---
name: generate-asset-actions
description: Generate asset-actions.yaml from ASSETS.yaml by classifying assets into priority tiers. Use when the user asks to regenerate, update, or refresh the asset actions.
---

# Generate Asset Policy

Regenerate `internal/audits/asset-actions.yaml` and `internal/audits/audit-summary.md` from `ASSETS.yaml`.

If `ASSETS.yaml` may be stale, run `uv run python tools/generate_asset_manifest.py` first.

Run `uv run python tools/summarise_asset_manifest.py` to get aggregate counts (by type, by state, totals). Use these numbers when populating `audit-summary.md`.

## Classification

Read `ASSETS.yaml`. For each asset, determine target stage first, then priority. Process both `state: floating` assets AND `state: pinned` assets that match known-unstable sources (since their target is `controlled`, they are not yet at their target stage).

### Target stages (per ADR-0007)

The target stage depends on **host reliability**, not asset type:

- **`controlled`** (Stage 2) — any asset where upstream has broken before, maintainer is unresponsive/deprecated, OR host is unreliable (personal repos, Google Drive, `.edu` domains, university servers, any host without version control). This applies to `git_clone`, `direct_url`, and `huggingface` alike.
- **`pinned`** (Stage 1) — assets on reliable, version-controlled hosts (GitHub, HuggingFace, well-known CDNs) with no history of breakage.

Per ADR-0007: "Anything hosted on a less reliable domain (personal websites, Google Drive, university servers, or any host without version control) should skip straight to Stage 2."
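
The stage rule above can be sketched as a small function. This is an illustrative sketch, not the repository's implementation: the host lists, the function names, and the `known_unstable` / `personal_host` flags are assumptions introduced here. Personal repos and "has broken before" cannot be detected from a URL alone, so the sketch takes them as explicit inputs.

```python
from urllib.parse import urlparse

# Hypothetical host lists for illustration only; the skill names these
# categories but does not give an exhaustive list.
UNRELIABLE_HOSTS = {"drive.google.com"}
UNRELIABLE_SUFFIXES = (".edu",)


def host_of(source: str) -> str:
    """Return the lowercase hostname of a URL, or "" for non-URL sources."""
    return (urlparse(source).hostname or "").lower()


def is_unreliable_host(source: str) -> bool:
    """True when the hostname is in the assumed unreliable lists."""
    host = host_of(source)
    return host in UNRELIABLE_HOSTS or host.endswith(UNRELIABLE_SUFFIXES)


def target_stage(source: str, *, known_unstable: bool = False,
                 personal_host: bool = False) -> str:
    """Pick the target stage from host reliability, not asset type."""
    if known_unstable or personal_host or is_unreliable_host(source):
        return "controlled"  # Stage 2
    return "pinned"  # Stage 1
```

Note that the asset type never enters the decision, which mirrors the statement that the rule applies to `git_clone`, `direct_url`, and `huggingface` alike.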

### Priority tiers

1. **Urgent** — all other floating refs on reliable hosts. Target is `pinned`.
2. **High** — matches a known-unstable source (see registry below). Target is `controlled`.
3. **Medium** — unreliable host (`drive.google.com`, `.edu` domains, personal repos/websites) not already in the known-unstable registry. Target is `controlled`.

For assets with `state: pinned` and a `{SHA}` placeholder but no checksum, classify as **Low** (target: `pinned` with checksum).

Omit assets already at their target stage.
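
One way to read the tier rules is as an ordered decision, sketched below. The function name and boolean inputs are hypothetical. The sketch reads "all other floating refs" in the Urgent tier as "floating refs that match neither High nor Medium", and it assumes the absence of a checksum is available as a flag.

```python
from typing import Optional


def priority_tier(state: str, *, known_unstable: bool, unreliable_host: bool,
                  has_checksum: bool = False) -> Optional[tuple[str, str]]:
    """Return (tier, target), or None when the asset should be omitted."""
    if state not in ("floating", "pinned"):
        return None  # e.g. already controlled: nothing left to do
    if known_unstable:
        # Applies to floating and pinned assets alike; target is `controlled`.
        return ("high", "controlled")
    if state == "floating":
        if unreliable_host:
            return ("medium", "controlled")
        return ("urgent", "pinned")
    # Remaining case: pinned and not in the known-unstable registry.
    if not has_checksum:
        return ("low", "pinned")
    return None  # pinned with checksum: already at its target stage
```

Checking the registry first means a known-unstable source is never downgraded to Medium just because its host is also unreliable.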

Every entry needs: `eval`, `source`, `type`, `state`, `target`, `action`, `reason`.
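
A quick completeness check for the required keys might look like the following. The helper name is hypothetical, and the sample entry values are invented for illustration rather than taken from the real `asset-actions.yaml`.

```python
REQUIRED_FIELDS = ("eval", "source", "type", "state", "target", "action", "reason")


def missing_fields(entry: dict) -> list[str]:
    """Return the required keys that are absent or empty in an entry."""
    return [key for key in REQUIRED_FIELDS if not entry.get(key)]


# Illustrative entry; every value here is made up.
example_entry = {
    "eval": "example_eval",
    "source": "https://example.edu/data.zip",
    "type": "direct_url",
    "state": "floating",
    "target": "controlled",
    "action": "mirror to a controlled location",
    "reason": "university server, no versioning",
}
```

Calling `missing_fields(example_entry)` returns an empty list; dropping `reason` from the entry would return `["reason"]`.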

## Known-Unstable Sources

Update this list when new instability is discovered.

| Source                           | Eval       | Incident                              |
| -------------------------------- | ---------- | ------------------------------------- |
| `xlang-ai/OSWorld`               | osworld    | Files removed (PR #958)               |
| `openai/evals`                   | makemesay  | Deprecated upstream                   |
| `corebench.cs.princeton.edu`     | core_bench | University server, no versioning      |
| `epatey/fonts`                   | osworld    | Personal repo                         |
| `ShishirPatil/gorilla`           | bfcl       | Data format issues (PR #954)          |
| `yunx-z/MLRC-Bench`              | mlrc_bench | Broken task                           |
| `LRudL/sad`                      | sad        | Upstream bugs (issues #7, #8)         |
| `meg-tong/sycophancy-eval`       | sycophancy | Invalid JSON/NaN, workaround in code  |
| `josancamon/paperbench`          | paperbench | Paper ID mismatch (HF discussion #2)  |
| `sentientfutures/moru-benchmark` | moru       | Exact duplicate rows                  |
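
The registry can be mirrored as a lookup table for use during classification. This is a sketch under assumptions: the case-insensitive substring match and the function name are choices made here, not something the skill prescribes, and the table would need to be kept in sync with the one above by hand.

```python
from typing import Optional

# Source fragment -> incident, copied from the registry table above.
KNOWN_UNSTABLE = {
    "xlang-ai/OSWorld": "Files removed (PR #958)",
    "openai/evals": "Deprecated upstream",
    "corebench.cs.princeton.edu": "University server, no versioning",
    "epatey/fonts": "Personal repo",
    "ShishirPatil/gorilla": "Data format issues (PR #954)",
    "yunx-z/MLRC-Bench": "Broken task",
    "LRudL/sad": "Upstream bugs (issues #7, #8)",
    "meg-tong/sycophancy-eval": "Invalid JSON/NaN, workaround in code",
    "josancamon/paperbench": "Paper ID mismatch (HF discussion #2)",
    "sentientfutures/moru-benchmark": "Exact duplicate rows",
}


def known_unstable_incident(source: str) -> Optional[str]:
    """Return the recorded incident if the source matches the registry."""
    lowered = source.lower()
    for fragment, incident in KNOWN_UNSTABLE.items():
        # Assumed matching rule: case-insensitive substring of the source.
        if fragment.lower() in lowered:
            return incident
    return None
```

Substring matching is deliberately loose, so a stricter comparison on the parsed owner and repository name may be preferable if false positives appear.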

## Verification

1. `asset-actions.yaml` parses as valid YAML
2. Every floating asset in `ASSETS.yaml` appears in urgent, high, or medium
3. `floating_assets + needing_checksums + no_action_needed == total_external_assets`
4. Numbers in `audit-summary.md` match output of `summarise_asset_manifest.py`
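
Checks 2 and 3 lend themselves to a short script, sketched below with plain dictionaries. The layout is an assumption: it supposes the actions file, once loaded, has top-level `urgent`, `high`, and `medium` keys holding lists of entries, and that the summary counts are available under the names used in check 3. Check 1 needs a YAML parser and check 4 needs the script output, so neither is shown.

```python
FLOATING_TIERS = ("urgent", "high", "medium")


def uncovered_floating(assets: list[dict], actions: dict) -> list[str]:
    """Check 2: floating sources missing from the urgent/high/medium tiers."""
    covered = {
        entry["source"]
        for tier in FLOATING_TIERS
        for entry in actions.get(tier, [])
    }
    return [
        asset["source"]
        for asset in assets
        if asset["state"] == "floating" and asset["source"] not in covered
    ]


def counts_balance(summary: dict) -> bool:
    """Check 3: the three buckets add up to the external asset total."""
    parts = (
        summary["floating_assets"]
        + summary["needing_checksums"]
        + summary["no_action_needed"]
    )
    return parts == summary["total_external_assets"]
```

An empty list from `uncovered_floating` and `True` from `counts_balance` would indicate that both checks pass under these assumptions.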
