skill-benchmark

$ npx mdskill add HoangNguyen0403/agent-skills-standard/skill-benchmark

Quantifies AI skill effectiveness against legacy constraints

  • Measures implementation quality improvements from active skills
  • Analyzes source files, active skills registry, and anti-pattern severity
  • Ranks legacy issues by severity and selects candidates automatically
  • Generates prioritized compliance delta and skill applicability reports

SKILL.md

.github/skills/skill-benchmark
---
name: skill-benchmark
description: "Benchmark AI skill effectiveness by measuring implementation quality against legacy constraints."
metadata:
  triggers:
    keywords:
    - skill benchmark
    - workflow
---
# Skill Benchmark Skill

> [!IMPORTANT]
> Benchmark AI skill effectiveness by measuring implementation quality against legacy constraints.

## Instructions

When the user asks to perform this workflow, execute the following steps:


# 📊 Skill Benchmark Orchestrator

> **Goal**: Quantify how much active skills improve implementation quality. Deliver a prioritized compliance delta and skill applicability report.

---

## Step 1 — Project Context & Active Skills

Identify the project's tech stack and list every active skill registered in `AGENTS.md`.

```bash
# 1. Largest TypeScript source files by line count (wc also prints a "total" row)
find src \( -name "*.ts" -o -name "*.tsx" \) -print0 | xargs -0 wc -l 2>/dev/null | sort -rn | head -n 20
# 2. Check the active skill registry
head -n 80 AGENTS.md
```

---

## Step 2 — Auto-Select a Legacy Trap

Pick the file to benchmark automatically: rank candidate files by the severity of the anti-patterns they contain and choose the highest-ranked one.

- 🔴 **P0**: Hardcoded secrets; Logic inside UI components.
- 🟠 **P1**: Wrong Router pattern; Global state for local concerns; Missing design tokens.
- 🟡 **P2**: Raw user-facing strings (i18n).
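
The ranking above can be sketched as a weighted grep count. This is a minimal illustration, not the skill's actual selection logic: the function names `score_file` and `rank_files`, the weights (100/10/1), and the three regexes are all assumptions, and each regex is only a rough proxy for one anti-pattern per tier.

```bash
#!/usr/bin/env bash
# Sketch: rank candidate files by weighted anti-pattern hits.
# Regexes and weights are illustrative assumptions, not rules taken from any skill.

# score_file FILE -> prints "<score> <file>"
score_file() {
  local file=$1 p0 p1 p2
  # P0 (weight 100): lines that look like a hardcoded secret assignment.
  p0=$(grep -Eic "(api[_-]?key|secret|password)[[:space:]]*[:=][[:space:]]*['\"][^'\"]+['\"]" "$file")
  # P1 (weight 10): hex colour literals, a rough proxy for missing design tokens.
  p1=$(grep -Ec "#[0-9a-fA-F]{6}" "$file")
  # P2 (weight 1): raw text between tags, a rough proxy for untranslated strings.
  p2=$(grep -Ec ">[[:space:]]*[A-Za-z][^<>{}]*</" "$file")
  echo "$(( ${p0:-0} * 100 + ${p1:-0} * 10 + ${p2:-0} )) $file"
}

# rank_files FILE... -> one "<score> <file>" line per file, highest score first
rank_files() {
  local f
  for f in "$@"; do
    score_file "$f"
  done | sort -rn
}
```

Calling `rank_files src/components/*.tsx | head -n 1` (the path is hypothetical) would print the top candidate. The wide gap between weights keeps a single P0 hit ahead of a handful of P1 or P2 hits, which matches the priority order of the list.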

---

## Step 3 — Build Eval-Driven Scorecard

Source the scorecard from the skill's `evals/evals.json`, not from hardcoded patterns.
If `<SKILLS>/common/common-skill-creator/references/benchmark.md` has been synced, follow its Scorecard Rubric:

1. Read `<SKILLS>/<category>/<skill>/evals/evals.json`.
2. Generate columns for **Failure Pattern** and **Success Pattern**.
3. Refactor the file, citing the exact skill rule for each change.

---

## Step 4 — Benchmark Report & Compliance Delta

Output the scorecard and compliance score using the templates in `<SKILLS>/common/common-skill-creator/references/benchmark.md` when synced.

- **Compliance Score Before vs After**.
- **Δ Delta: +Z%** 🚀.
- **Eval Alignment**: How well does the skill teach what the eval tests?
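
As a worked example of the before/after arithmetic, the sketch below assumes the compliance score is simply the share of scorecard rows that pass; the real formula is whatever the rubric in `benchmark.md` defines. The function names `compliance_score` and `compliance_delta` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compliance score and delta.
# Assumption: score = passed rows / total rows, as an integer percentage.

# compliance_score PASSED TOTAL -> integer percentage, rounded down
compliance_score() {
  local passed=$1 total=$2
  if [ "$total" -le 0 ]; then
    echo 0   # avoid division by zero on an empty scorecard
    return
  fi
  echo $(( passed * 100 / total ))
}

# compliance_delta BEFORE_PASSED AFTER_PASSED TOTAL -> three report lines
compliance_delta() {
  local before after
  before=$(compliance_score "$1" "$3")
  after=$(compliance_score "$2" "$3")
  printf 'Before: %d%%\nAfter: %d%%\nDelta: %+d%%\n' \
    "$before" "$after" "$(( after - before ))"
}
```

For a 10-row scorecard where 3 rows passed before the refactor and 9 pass after it, `compliance_delta 3 9 10` reports 30%, 90%, and a delta of +60%.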

---

## Step 5 — Skill Applicability & Iteration

For every `❌ FAIL`, identify the root cause using the **Iteration Table** in:
`<SKILLS>/common/common-skill-creator/references/benchmark.md` when synced.

1. Signal not matching file? → Refine trigger.
2. Rule too vague? → Add Anti-Pattern rule.
3. Conflict? → Ensure P0 overrides P1.

### Suggested .skillsrc Exclusions

Recommend any skills that are noisy or non-applicable for the project.

```yaml
exclude:
  - <skill-id> # reason
```
