skill-benchmark

Name: skill-benchmark
Author: HoangNguyen0403/agent-skills-standard

$ npx mdskill add HoangNguyen0403/agent-skills-standard/skill-benchmark

Quantifies AI skill effectiveness against legacy constraints

Measures implementation quality improvements from active skills
Analyzes source files, active skills registry, and anti-pattern severity
Ranks legacy issues by severity and selects candidates automatically
Generates prioritized compliance delta and skill applicability reports

SKILL.md

.github/skills/skill-benchmark

---
name: skill-benchmark
description: "Benchmark AI skill effectiveness by measuring implementation quality against legacy constraints."
metadata:
  triggers:
    keywords:
      - skill benchmark
      - workflow
---
# Skill Benchmark Skill

> [!IMPORTANT]
> Benchmark AI skill effectiveness by measuring implementation quality against legacy constraints.

## Instructions

When the user asks to perform this workflow, execute the following steps:

# 📊 Skill Benchmark Orchestrator

> **Goal**: Quantify how much active skills improve implementation quality. Deliver a prioritized compliance delta and skill applicability report.

---

## Step 1 — Project Context & Active Skills

Identify the tech stack and all active skills in `AGENTS.md`.

```bash
# 1. Source file sizes: top 20 by line count (wc's "total" line sorts first)
find src \( -name "*.ts" -o -name "*.tsx" \) -print0 | xargs -0 wc -l 2>/dev/null | sort -rn | head -20
# 2. Check the active skill registry
head -80 AGENTS.md
```

---

## Step 2 — Auto-Select a Legacy Trap

Pick the target file automatically, without asking the user. Rank candidate files by the severity of the anti-patterns they contain:

- 🔴 **P0**: Hardcoded secrets; Logic inside UI components.
- 🟠 **P1**: Wrong Router pattern; Global state for local concerns; Missing design tokens.
- 🟡 **P2**: Raw user-facing strings (i18n).
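
The ranking above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual selection logic: the `Finding` shape, the `selectLegacyTrap` name, the tie-break by finding count, and the sample file paths are all assumptions made for the example.

```typescript
// Severity tiers taken from the list above.
type Severity = "P0" | "P1" | "P2";

// Hypothetical record of one detected anti-pattern.
interface Finding {
  file: string;
  severity: Severity;
  pattern: string;
}

// Lower rank means more severe, so P0 sorts first.
const SEVERITY_RANK: Record<Severity, number> = { P0: 0, P1: 1, P2: 2 };

// Pick the file whose worst finding is most severe.
// Ties are broken by the number of findings (an assumption, not a stated rule).
function selectLegacyTrap(findings: Finding[]): string | undefined {
  const worst: Record<string, number> = {};
  const count: Record<string, number> = {};
  for (const f of findings) {
    const rank = SEVERITY_RANK[f.severity];
    const prev = worst[f.file];
    worst[f.file] = prev === undefined ? rank : Math.min(prev, rank);
    count[f.file] = (count[f.file] || 0) + 1;
  }
  const files = Object.keys(worst).sort(
    (a, b) => worst[a] - worst[b] || count[b] - count[a]
  );
  return files.length > 0 ? files[0] : undefined;
}

// Hypothetical findings for illustration only.
const sampleFindings: Finding[] = [
  { file: "src/Login.tsx", severity: "P2", pattern: "raw user-facing string" },
  { file: "src/Checkout.tsx", severity: "P0", pattern: "logic inside UI component" },
  { file: "src/Checkout.tsx", severity: "P1", pattern: "missing design tokens" },
  { file: "src/api.ts", severity: "P0", pattern: "hardcoded secret" },
];

console.log(selectLegacyTrap(sampleFindings)); // "src/Checkout.tsx"
```

Both `src/Checkout.tsx` and `src/api.ts` contain a P0 finding; the first wins here only because of the assumed tie-break on finding count.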

---

## Step 3 — Build Eval-Driven Scorecard

Source your scorecard from `evals/evals.json`, not from hardcoded patterns.
Follow the Scorecard Rubric in `<SKILLS>/common/common-skill-creator/references/benchmark.md`, if that file has been synced:

1. Read `<SKILLS>/<category>/<skill>/evals/evals.json`.
2. Generate columns for **Failure Pattern** and **Success Pattern**.
3. Refactor the file, citing the exact skill rule for each change.
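
As a rough sketch of step 2, the snippet below turns eval entries into a scorecard with **Failure Pattern** and **Success Pattern** columns. The real schema of `evals/evals.json` is not shown here, so the `EvalCase` fields (`id`, `failurePattern`, `successPattern`), the `buildScorecard` helper, and the sample entries are hypothetical; adapt them to the actual file and to the rubric in `benchmark.md`.

```typescript
// Hypothetical shape of one entry in evals/evals.json; the real schema may differ.
interface EvalCase {
  id: string;
  failurePattern: string;
  successPattern: string;
}

// Escape pipes so a pattern cannot break the markdown table layout.
function cell(text: string): string {
  return text.replace(/\|/g, "\\|");
}

// Render one markdown table row per eval case.
function buildScorecard(evals: EvalCase[]): string {
  const header = [
    "| Eval | Failure Pattern | Success Pattern |",
    "| --- | --- | --- |",
  ];
  const rows = evals.map(
    (e) => `| ${cell(e.id)} | ${cell(e.failurePattern)} | ${cell(e.successPattern)} |`
  );
  return header.concat(rows).join("\n");
}

// Hypothetical eval entries, standing in for the parsed JSON file.
const sampleEvals: EvalCase[] = [
  {
    id: "no-hardcoded-secrets",
    failurePattern: "API key literal in source",
    successPattern: "Key read from environment config",
  },
  {
    id: "i18n-strings",
    failurePattern: "Raw user-facing string in JSX",
    successPattern: "String resolved through the i18n layer",
  },
];

console.log(buildScorecard(sampleEvals));
```

Keeping the columns derived from the eval data, rather than from a fixed list in code, is what keeps the scorecard aligned with what the evals actually test.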

---

## Step 4 — Benchmark Report & Compliance Delta

Output the scorecard and compliance score using the templates in `<SKILLS>/common/common-skill-creator/references/benchmark.md`, if that file has been synced.

- **Compliance Score Before vs After**.
- **Δ Delta: +Z%** 🚀.
- **Eval Alignment**: How well does the skill teach what the eval tests?
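
One possible way to compute the before/after score and the delta is sketched below. It assumes the compliance score is the percentage of passed scorecard checks and that the delta is expressed in percentage points; neither definition is confirmed by this document, and the `CheckResult` type and function names are illustrative only.

```typescript
// Hypothetical result of one scorecard check.
interface CheckResult {
  rule: string;
  passed: boolean;
}

// Compliance score as the rounded percentage of passed checks (assumed definition).
function complianceScore(results: CheckResult[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.passed).length;
  return Math.round((passed / results.length) * 100);
}

// Format the delta in percentage points, with an explicit sign for improvements.
function formatDelta(before: number, after: number): string {
  const delta = after - before;
  const sign = delta >= 0 ? "+" : "";
  return `Δ Delta: ${sign}${delta}%`;
}

// Hypothetical results for the same file before and after the refactor.
const beforeResults: CheckResult[] = [
  { rule: "no-hardcoded-secrets", passed: false },
  { rule: "logic-out-of-ui", passed: false },
  { rule: "design-tokens", passed: false },
  { rule: "i18n-strings", passed: true },
];
const afterResults: CheckResult[] = [
  { rule: "no-hardcoded-secrets", passed: true },
  { rule: "logic-out-of-ui", passed: true },
  { rule: "design-tokens", passed: false },
  { rule: "i18n-strings", passed: true },
];

const scoreBefore = complianceScore(beforeResults); // 25
const scoreAfter = complianceScore(afterResults); // 75
console.log(formatDelta(scoreBefore, scoreAfter)); // "Δ Delta: +50%"
```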

---

## Step 5 — Skill Applicability & Iteration

For every `❌ FAIL`, identify the root cause using the **Iteration Table** in `<SKILLS>/common/common-skill-creator/references/benchmark.md`, if that file has been synced:

1. Signal not matching file? → Refine trigger.
2. Rule too vague? → Add Anti-Pattern rule.
3. Conflict? → Ensure P0 overrides P1.
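
The three checks above can be read as a lookup from root cause to remediation. The sketch below encodes only those three entries; the `RootCause` labels, the `FailedCheck` shape, and the sample failures are assumptions for illustration, and the full Iteration Table in `benchmark.md` may contain more cases.

```typescript
// Hypothetical labels for the three root causes listed above.
type RootCause = "signal-mismatch" | "vague-rule" | "priority-conflict";

// Remediation text taken from the list above.
const REMEDIATION: Record<RootCause, string> = {
  "signal-mismatch": "Refine trigger",
  "vague-rule": "Add Anti-Pattern rule",
  "priority-conflict": "Ensure P0 overrides P1",
};

// Hypothetical record of a failed scorecard check with its diagnosed cause.
interface FailedCheck {
  rule: string;
  rootCause: RootCause;
}

// Produce one action line per failed check.
function iterationPlan(failures: FailedCheck[]): string[] {
  return failures.map((f) => `${f.rule}: ${REMEDIATION[f.rootCause]}`);
}

// Hypothetical failures for illustration only.
const sampleFailures: FailedCheck[] = [
  { rule: "design-tokens", rootCause: "signal-mismatch" },
  { rule: "i18n-strings", rootCause: "vague-rule" },
];

console.log(iterationPlan(sampleFailures).join("\n"));
```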

### Suggested .skillsrc Exclusions

Recommend any skills that are noisy or non-applicable for the project.

```yaml
exclude:
- [skill-id] # reason
```
