code-quality-fix-all
$
npx mdskill add UKGovernmentBEIS/inspect_evals/code-quality-fix-allFix code quality issues from reviews with validation and testing.
- Resolves problems found in code quality review artifacts.
- Depends on code-quality-review-all skill and agent_artefacts.
- Selects targets via user questions and complexity filters.
- Delivers corrected code with step-by-step validation checks.
SKILL.md
.github/skills/code-quality-fix-allView on GitHub ↗
---
name: code-quality-fix-all
description: Fix code quality issues identified in a code quality review stored in agent_artefacts/code_quality/<topic>/. Systematically addresses issues found by the code-quality-review-all skill for ANY code quality topic, with validation and testing at each step. Use when user asks to fix issues from a code quality review, or asks to fix issues from agent_artefacts/code_quality/<topic>.
---
# Code Quality Fix All
Fix code quality issues identified in a code quality review. This skill systematically addresses issues found by the `code-quality-review-all` skill for ANY code quality topic, with validation and testing at each step.
## Expected Arguments
When invoked, this skill expects the path to a code quality topic as an argument (e.g., `agent_artefacts/code_quality/private_api_imports`).
If not provided, the skill will ask the user for the topic path. Within the topic path, there are several files:
- README.md - contains description of the issue and examples of how to fix it
- results.json - contains list of all identified issues
- SUMMARY.md - contains summary of the identified issues
**Filters and options** are specified interactively after the skill starts by using the `AskUserQuestion` tool to present options unless specified otherwise in arguments.
- Which issue types to target:
- all
- specific types
- Fix complexity level (easy only, medium and below, or all)
- Which evaluations to fix (all, specific ones, evaluations with small number of issues)
- Maximum number of issues to fix in this run
## Workflow
### Phase 1: Understanding the Topic and Planning
1. **Read topic documentation**
- Read the README.md to understand:
- What code quality issue this topic addresses
- Why it matters (stability, maintainability, etc.)
- How to detect the issue
- How to fix the issue (fix patterns, examples)
- Read results.json to get all identified issues
- Identify which issues are in scope based on arguments
2. **Analyze and categorize issues**
- Analyze fix complexity based on:
- issue_description
- suggested_fix from results.json
- Fix examples in README.md
- Classify as:
- **Easy**: Single-line changes, clear fix pattern in README
- **Medium**: Multi-line changes, well-documented fix approach
- **Hard**: No clear fix pattern, requires research or copying code
- Group issues by evaluation and issue type
- Generate statistics for presenting to user
3. **Ask user for filtering preferences**
- Use `AskUserQuestion` tool to ask:
- Which evaluations to fix? (all / specific ones / most affected)
- Which issue types to target? (all / specific types)
- Fix complexity level? (easy only / easy+medium / all)
- Max issues per run? (all / limit to specific number)
- Apply filters based on user responses
- Present filtered plan with:
- Number of issues to fix
- Breakdown by evaluation and issue type
- Complexity distribution
- Ask for final confirmation to proceed
4. **Validate understanding of fixes**
- For each unique issue type in scope:
- Check if README.md documents how to fix it
- Look for "Good Examples" and "Bad Examples" sections
- Check "suggested_fix" field in results.json
- If fix approach is unclear for any issue type:
- Research the correct approach
- Update `<topic>/README.md` with findings
- Ask user for guidance if still uncertain
### Phase 2: Pre-Fix Validation
For each issue to be fixed:
1. **Read and understand context**
- Read the entire file containing the issue (not just the line)
- Understand how the problematic code is used
- Look for related issues in the same file
- Check for patterns that might affect the fix (e.g., multiple occurrences)
- Identify any cascading changes needed (related imports, type hints, etc.)
2. **Validate the suggested fix**
- Review the "suggested_fix" from results.json
- Check against fix patterns in README.md
- Verify the fix won't break functionality
- For complex fixes:
- Check if dependencies/alternatives actually exist
- Validate that replacement code follows same patterns
- Consider edge cases
3. **Estimate change scope**
- Count how many lines will change for this fix
- Identify if cascading changes are needed
- Determine if multiple files need updating
- **If changes exceed 100 lines for a single issue:**
- Alert user with:
- Issue details
- Why the change is large
- What will change
- Get explicit approval before proceeding
### Phase 3: Applying Fixes
1. **Apply fixes systematically**
- Create a new branch to apply fixes to, with a name like agent/<short_description_of_issue>
- Process one evaluation at a time
- Within each evaluation, group by issue type
- For each fix:
- Use Edit tool to apply the change
- Follow the suggested_fix guidance
- Apply fix patterns from README.md
- Handle related issues in same file together
- Add comments if the fix requires it (e.g., copied code attribution)
- Track what was fixed
1. **Verify changes compile/parse**
- After fixing each file, validate:
- File is syntactically valid (Python can parse it)
- No obvious import errors introduced
- Code follows repository patterns
- If validation fails:
- Investigate the issue
- Attempt to fix validation error
- Rollback change if cannot be resolved
2. **Track progress**
- Maintain list of:
- Issues successfully fixed (file, line, issue type)
- Issues that couldn't be fixed (with reasons)
- Evaluations that have been modified
- Files that were changed
### Phase 4: Testing and Validation
1. **Run linting**
- Run repository's linter on modified files (ruff, flake8, mypy, etc.)
- Check for:
- Import errors
- Type checking errors
- Style violations introduced
- Fix any linting issues that result from changes
- If linting issues can't be fixed, document them
2. **Run unit tests**
- Identify test files for each modified evaluation
- Run unit tests for affected evaluations using pytest:
**Basic test commands:**
```bash
# Install relevant packages in the event of import failure
uv sync --extra test
# Run tests for a specific evaluation
uv run pytest tests/<evaluation_name>/
# Run a specific test file
uv run pytest tests/test_file.py
# Run a specific test
uv run pytest tests/test_file.py::TestClass::test_method
# Run slow tests (excluded by default)
uv run pytest --runslow tests/
# Skip dataset download tests
uv run pytest -m 'not dataset_download' tests/
# Run only slow tests
uv run pytest -m slow tests/
```
**Test markers to be aware of:**
- `@pytest.mark.slow` - Tests taking >10 seconds
- `@pytest.mark.dataset_download` - Tests that download datasets
- `@pytest.mark.docker` - Tests using Docker
- `@pytest.mark.huggingface` - HuggingFace-related tests
- Focus on tests for the specific evaluation
- Look for test failures or errors
- **IMPORTANT**: Do NOT run full evaluations (they take too long) unless user explicitly requests it
3. **Handle test failures**
- For each test failure:
- Read test output carefully
- Determine if failure is caused by the fix
- Check if it's a pre-existing failure
- If caused by fix:
- Try to adjust the fix to make tests pass
- If cannot be resolved, rollback the change
- Document the issue for user review
- If pre-existing:
- Note it but don't block on it
- Inform user
### Phase 5: Re-Review and Handle Remaining Issues
1. **Update results.json with fix status**
- For each issue that was fixed, add `"fix_status"` field after `"suggested_fix"`:
```json
{
...
"suggested_fix": "...",
"fix_status": "fixed - please review"
}
```
- For issues that couldn't be fixed, add explanation:
```json
"fix_status": "not fixed - reason: ..."
```
- **IMPORTANT**: Do NOT remove any entries from results.json - only add/update "fix_status"
- The code-quality-review-all skill owns results.json and is responsible for removing entries
2. **Re-run code quality review**
- **IMPORTANT**: Use Task tool to spawn subagent running code-quality-review-all skill
- Pass the same topic path
- This will update results.json with current state
- Compare results before and after to identify:
- Issues that are now resolved (no longer appear)
- New issues that may have been introduced
- Issues that still remain despite fix attempts
3. **Fix remaining issues if in scope**
- For each new or remaining in-scope issue:
- Investigate why previous fix didn't work
- Attempt alternative fix approach
- Update "fix_status" with attempt results
- Repeat this process until no more in-scope issues can be fixed
4. **Update topic's README.md**
- Add any knowledge that you have discovered that will be useful in detecting or fixing topic-related issues in the future
- Do not remove examples of bad code or patterns that were fixed - they will be useful in future reviews and fixes of future evaluations.
5. **Update SUMMARY.md**
- Add a "Recent Fixes" section with:
- Date of fix run
- Number of issues fixed
- Which evaluations were updated
- Keep historical data (don't remove past information)
- Update recommendations to reflect remaining work
6. **Run markdown linters**
- Use `uv run pre-commit run markdownlint-fix` to fix markdown linting issues
### Phase 6: Create PR Description and Present Results
1. **Create/Update PR description (cumulative)**
- Read existing `PR_DESCRIPTION.md` if it exists (from previous runs)
- **Cumulative tracking**: PR description represents ALL changes from branch base, not just this run
- If PR_DESCRIPTION.md exists:
- Parse existing content to extract previous runs' data
- Append information from this run
- Update cumulative statistics
- If PR_DESCRIPTION.md doesn't exist (first run):
- Create new file
- Format for GitHub/GitLab pull request with:
- **Summary**: Brief overview of the code quality topic and total fixes (2-3 sentences)
- **Overall Changes** (cumulative from all runs):
- Total issues fixed across all runs by type
- Total evaluations affected
- Total files modified
- **Fix Sessions**: List each run session with:
- Date/time of run
- Issues fixed in that session
- Complexity level targeted (easy/medium/all)
- **Fixed Issues** (cumulative): Table or list with all file paths and issue types from all runs
- **Testing** (from latest run):
- Which tests were run
- Pass/fail status
- Any test issues encountered
- **Remaining Issues** (current state):
- Count of issues still open
- Brief note on what remains
- **Review Notes** (cumulative):
- Any complications or special considerations from any run
- Areas that need extra attention during review
- Use proper markdown formatting for PR readability
- **IMPORTANT**: Do NOT commit PR_DESCRIPTION.md - it's only for creating the PR
- Example structure:
```markdown
## Summary
Fix private API imports code quality issues across evaluations.
## Overall Changes
- Total issues fixed: 25
- Evaluations affected: 8
## Fix Sessions
### Session 1: 2026-01-18 10:30 (Easy issues)
- Fixed 10 easy issues
- Targeted: Easy complexity, All evaluations
### Session 2: 2026-01-18 14:15 (Medium issues)
- Fixed 15 medium issues
- Targeted: Medium complexity, Specific evaluations
## Fixed Issues
[Table of all fixed issues from all sessions]
## Testing
[Latest test results]
## Remaining Issues
6 issues remain (4 hard, 2 require investigation)
## Review Notes
- Session 1: All tests passed
- Session 2: One test required adjustment in fortress/scorer.py
```
2. **Present results to user**
- Show high-level summary for **this run**:
- X issues fixed in this session
- Y issues remain
- Z tests passed
- Show **cumulative progress** from PR_DESCRIPTION.md:
- Total issues fixed across all runs
- Number of fix sessions completed
- Show before/after statistics from SUMMARY.md
- List modified files from this run
- Display content of PR_DESCRIPTION.md for user review
- **Do NOT automatically commit** - let user review changes
3. **Offer next steps**
- **Create commit and PR**: Offer to:
- Commit all changes (source files, results.json, SUMMARY.md)
- Create pull request with description from PR_DESCRIPTION.md
- Note: PR_DESCRIPTION.md itself is NOT committed (it's just for PR description)
- The PR description includes **cumulative changes from all fix sessions** on this branch
- **Run more fixes**: If issues remain, suggest running skill again with different filters
- Running again will **append** to PR_DESCRIPTION.md, creating cumulative tracking
- This allows iterative fixing: easy issues first, then medium, then hard
- **Manual review needed**: List any issues that require manual attention
## Important Guidelines
### Safety First
- **Never batch all fixes blindly** - Validate each fix type before applying en masse
- **Always read before editing** - Understand context before changing code
- **Verify fixes don't break functionality** - Run tests incrementally
- **Be conservative** - Skip fixes you're uncertain about rather than risk breaking code
- **Get approval for large changes** - Alert user when fixes exceed 100 lines
- **Have rollback strategy** - Be able to revert if fixes cause problems
### Context is Critical
- **Understand the quality issue** - Read README.md thoroughly
- **Understand why code was written that way** - There might be good reasons
- **Look for patterns** - Similar issues often need similar fixes
- **Check related code** - Fixes might require updating nearby code
- **Read existing comments** - Developers might have documented why they used certain patterns
### Validation at Every Step
- **Verify fix patterns from README** - Don't guess how to fix
- **Check suggested_fix in results.json** - Use provided guidance
- **Validate changes compile** - Ensure code parses after changes
- **Run linters** - Catch style and import issues
- **Run tests** - Detect regressions immediately
- **Re-run review** - Verify fixes actually resolve issues
### Communication
- **Show plan before executing** - Let user see what will be fixed
- **Alert for large changes** - Get approval for fixes >10 lines
- **Report uncertainties** - Flag issues where fix approach is unclear
- **Show progress** - Keep user informed during fixes
- **Explain failures** - Document why certain issues couldn't be fixed
- **Provide detailed reports** - Create comprehensive fix reports
### What NOT to Do
- **Don't assume you know how to fix** - Always consult README.md and results.json
- **Don't remove entries from results.json** - Only add/update "fix_status" field
- **Don't replace PR_DESCRIPTION.md** - Append to it to maintain cumulative history across runs
- **Don't run full evaluations** - Only run unit tests (evaluations are slow)
- **Don't commit automatically** - Let user review changes first
- **Don't commit PR_DESCRIPTION.md** - It's only for creating the PR
- **Don't fix issues you can't validate** - Skip rather than risk breaking
- **Don't ignore test failures** - Investigate or rollback
- **Don't make unrelated changes** - Only fix the specific quality issues
- **Don't assume all issues of same type are identical** - Context matters