build-repo-context
$
npx mdskill add UKGovernmentBEIS/inspect_evals/build-repo-contextExtracts institutional knowledge from GitHub history into a shared context document.
- Helps agents understand repo conventions and avoid common mistakes.
- Integrates with GitHub CLI to fetch PRs, issues, and review comments.
- Decides scope by checking existing document headers for date and PR ranges.
- Outputs an updated markdown file containing distilled knowledge and patterns.
SKILL.md
.github/skills/build-repo-contextView on GitHub ↗
---
name: build-repo-context
description: Crawl repository PRs, issues, and review comments to distill institutional knowledge into a shared knowledge base. Run periodically by "context agents" to maintain agent_artefacts/repo_context/REPO_CONTEXT.md. Trigger only on specific request.
---
# Build Repo Context
Crawl GitHub history (PRs, issues, review comments) and distill institutional knowledge into `agent_artefacts/repo_context/REPO_CONTEXT.md`. This document helps worker agents understand repo conventions, common mistakes, and known tech debt before making changes.
## Workflow
### 1. Setup
1. Create `agent_artefacts/repo_context/` if it doesn't exist
2. Read existing `agent_artefacts/repo_context/REPO_CONTEXT.md` if present (will be updated, not replaced)
### 2. Identify What's New
Use the header of `REPO_CONTEXT.md` to determine what to process. The header contains the last-updated date and PR range (e.g., `PRs processed: #965-#1050`).
- **First run** (no `REPO_CONTEXT.md`): Fetch the most recent 50 merged PRs + all open issues
- **Incremental runs**: Fetch PRs merged after the highest PR number in the header, and issues updated since the last-updated date
Use the `gh` CLI to list candidates:
```bash
# First run: recent merged PRs
gh pr list --state merged --limit 50 --json number,title,labels,additions,deletions,reviewDecision,mergedAt
# Incremental: PRs merged since last crawl
gh pr list --state merged --search "merged:>YYYY-MM-DD" --limit 50 --json number,title,labels,additions,deletions,reviewDecision,mergedAt
# Open issues
gh issue list --state open --limit 100 --json number,title,labels,createdAt,updatedAt
```
### 3. Triage
Fast pass over PR titles and metadata. **Skip** these categories (they rarely contain design insights):
- Dependency bumps (titles matching `bump`, `update dependencies`, `renovate`, `dependabot`)
- Changelog-only updates (titles matching `changelog`, `scriv`)
- Bot-generated PRs with no review comments
- PRs with fewer than 5 lines changed and no review comments
**Prioritize** PRs that have:
- Review comments (especially multiple rounds — that's where design discussion lives)
- Changes touching shared utilities (`src/inspect_evals/utils/`, `CONTRIBUTING.md`, `BEST_PRACTICES.md`, `AGENTS.md`)
**Cap at 50 PRs per run** to keep execution time reasonable.
### 4. Extract
For each selected PR, fetch:
```bash
# PR body and metadata
gh pr view <N> --json body,title,labels,files,reviewDecision,comments,reviews
# Review comments (inline code review feedback)
gh api repos/{owner}/{repo}/pulls/<N>/comments --paginate
# Issue comments (general discussion)
gh api repos/{owner}/{repo}/issues/<N>/comments --paginate
```
For open issues, fetch body and comments similarly.
**Link traversal**: If a comment references another PR/issue (e.g., "see #123" or "fixed in #456"), continue to crawl recursively up to 3 hops in total. Do not recurse to an existing PR/issue in the chain to prevent loops.
### 5. Distill
This is the core intellectual work. For each PR/issue, extract **actionable insights** in these categories:
- **Design decisions**: What architectural choice was made and why? What alternatives were rejected?
- **Reviewer corrections**: What mistakes did reviewers catch? These reveal common pitfalls.
- **Established conventions**: What patterns were deliberately chosen that future contributors should follow?
- **Tech debt acknowledged**: What shortcuts were taken intentionally? What should NOT be "fixed" without discussion?
- **Common agent mistakes**: If review comments mention agent-generated code issues, capture the pattern.
**Quality requirements for each insight**:
- Must cite source PR/issue number (e.g., "Per PR #973...")
- Must be actionable ("Do X" / "Don't do Y"), not descriptive ("PR #123 added X")
- Must add nuance beyond what CONTRIBUTING.md and BEST_PRACTICES.md already state
- Must be relevant to future contributors, not just historically interesting
- Must be broadly applicable beyond a single issue or evaluation. If the context is excessively narrow, leave it out.
- Must reflect team convention, not a single maintainer's code style or proposal. If in doubt, leave it out.
**Skip**:
- Bot comments (dependabot, renovate, CI status checks)
- Feature announcements without design implications
- Trivial PRs (typo fixes, version bumps) unless they reveal a convention
- Duplicate insights already captured in REPO_CONTEXT.md
### 6. Merge Into REPO_CONTEXT.md
Integrate new insights into the existing document structure. **Do not just append** — place each insight in the appropriate section and deduplicate:
- If a new insight updates or supersedes an existing one, replace it
- If a section is getting too long, distill further (combine related insights)
- Update the header metadata (last updated date, PR watermark)
- Keep total document size between 500-1000 lines (aggressive distillation if over)
**Each insight appears in exactly one section** — do not repeat the same rule across multiple sections with different framing (see step 7).
### 7. Deduplicate & Consolidate
After merging, review the full document for **cross-section duplication**. This is critical — incremental runs naturally introduce duplication because the same convention surfaces in multiple PR reviews (e.g., "use `@pytest.mark.docker`" might appear as a reviewer correction, an established convention, AND a testing recipe).
**Process**:
1. For each insight, search the entire document for overlapping content. Look for insights that cover the same topic even if phrased differently.
2. Keep each insight in **exactly one location** — the most specific section that fits. Prefer this priority:
- "Rules & Conventions" for mandatory practices ("always do X", "never do Y")
- "Testing Recipes" for detailed how-to patterns (mock setup, test structure)
- "Known Tech Debt" for acknowledged issues that should not be fixed without discussion
- "CI/Tooling" for build/CI/tooling specifics
- "Open Issues" for bugs and design direction
3. Remove the duplicate occurrences, keeping the most complete/specific version.
4. Combine related insights that are split across bullets into a single, richer bullet.
**Common duplication patterns to watch for**:
- The same pytest marker rule appearing in both "Rules" and "Testing Recipes"
- Reviewer corrections that duplicate established conventions (merge into the convention)
- Agent mistakes that are just the inverse of an established convention (keep only the convention)
- API usage patterns appearing in both rules and recipes (keep the rule brief, detail in recipes)
## Bounding Rules
| Rule | Limit |
| ---------------------------- | ------------------------------------------- |
| First run scope | Most recent 50 merged PRs + all open issues |
| Incremental run scope | New items since last crawl |
| Max PRs per run | 50 |
| Link traversal depth | 3 hops |
| Target REPO_CONTEXT.md size | 500-1000 lines |
| Max issues per run | 100 |
## Insight Quality Guidelines
These are critical — the value of REPO_CONTEXT.md depends on insight quality:
1. **Every insight must cite its source** PR or issue number. It is acceptable to cite multiple sources for the same insight.
2. **Insights must be actionable**: "Do X" / "Don't do Y", not "PR #123 added X"
3. **Don't duplicate existing docs**: Only add nuance that CONTRIBUTING.md and BEST_PRACTICES.md miss
4. **Skip noise**: Bot comments, feature announcements without design implications, trivial PRs
5. **Focus on**: Reviewer corrections, design trade-offs, rejected alternatives, acknowledged tech debt, common agent mistakes
6. **Be specific**: "Use `hf_dataset()` wrapper instead of raw `load_dataset()` for HuggingFace datasets (PR #842)" is better than "Use the right dataset loading function"
7. **Date-stamp volatile insights**: If an insight might become stale (e.g., "Currently X is broken"), include the date so agents can verify
## Expected Output
After running this workflow:
```text
agent_artefacts/repo_context/
└── REPO_CONTEXT.md # Distilled institutional knowledge (committed)
```
## Verification Checklist
After each run, verify:
1. `REPO_CONTEXT.md` exists and has well-structured content
2. Insights cite source PR/issue numbers
3. Insights are actionable, not merely descriptive
4. **No duplicate insights across sections** — search for key terms (e.g., `sample ID`, `get_model`, `@pytest.mark`) and confirm each appears in exactly one place
5. Document stays under ~1000 lines
6. Header metadata (date, PR range) is updated
7. Incremental runs don't reprocess already-crawled PRs