skill-creator
$ npx mdskill add nextlevelbuilder/goclaw/skill-creator

Build eval-driven GoClaw skills with human-in-the-loop iteration.
- Creates practical task instructions instead of generic documentation.
- Uses progressive disclosure across metadata, SKILL.md, and resources.
- Tests skills against benchmarks to optimize descriptions and performance.
- Registers new skills directly in the system database for activation.
SKILL.md
.github/skills/skill-creator
---
name: skill-creator
description: Create or update GoClaw agent skills with eval-driven iteration. Use for new skills, skill scripts, references, benchmark optimization, description optimization, eval testing, extending agent capabilities.
license: Complete terms in LICENSE.txt
metadata:
  author: GoClaw
  version: "4.0.0"
---

# Skill Creator

Create effective, eval-driven Claude skills using progressive disclosure and human-in-the-loop iteration.

## Core Principles

- Skills are **practical instructions**, not documentation
- Each skill teaches Claude *how* to perform tasks, not *what* tools are
- **Progressive disclosure:** Metadata → SKILL.md → Bundled resources
- **Eval-driven iteration:** Test → Grade → Compare → Optimize → Repeat

## Quick Reference

| Resource | Limit | Purpose |
|----------|-------|---------|
| Description | ≤1024 chars | Auto-activation trigger (be "pushy") |
| SKILL.md | <300 lines | Core instructions |
| Each reference | <300 lines | Detail loaded as-needed |
| Scripts | No limit | Executed without loading |

## Skill Structure

New skills **MUST** be created directly in `~/.goclaw/skills-store/<skill-name>/`. After writing SKILL.md and resources, use `publish_skill` to register in the system DB.

```
skill-name/
├── SKILL.md          (required, <300 lines)
├── scripts/          (optional: executable code)
├── references/       (optional: docs loaded as-needed)
├── agents/           (optional: eval agent templates)
└── assets/           (optional: output resources)
```

Full anatomy: `references/skill-anatomy-and-requirements.md`

## Creation Workflow

Follow the process in `references/skill-creation-workflow.md`:

1. **Capture Intent:** What should skill do? When trigger? What output? (AskUserQuestion)
2. **Research:** Activate `/ck:docs-seeker`, `/ck:research` for best practices
3. **Plan:** Identify reusable scripts, references, assets
4. **Initialize:** `scripts/init_skill.py <name> --path <dir>`
5. **Write:** Implement resources, write SKILL.md, optimize for benchmarks
6. **Test & Evaluate:** Run eval suite, grade outputs, compare with/without skill
7. **Optimize Description:** AI-powered trigger accuracy optimization
8. **Publish:** `publish_skill(path: "~/.goclaw/skills-store/<name>")` to register in system database
9. **Package** (optional): `scripts/package_skill.py <path>` for external distribution
10. **Iterate:** Generalize from feedback, keep prompts lean

## Eval & Testing (CRITICAL)

Eval infrastructure for quantitative skill validation:

1. Create test cases in `evals/evals.json` with prompts + assertions
2. Spawn **parallel** with-skill + baseline runs (critical for fair timing)
3. Draft assertions while runs execute
4. Grade outputs with grader agent template
5. Aggregate results: `scripts/aggregate_benchmark.py`
6. Launch viewer: `eval-viewer/generate_review.py` → interactive HTML review
7. Collect human feedback via viewer → `feedback.json`

- Details: `references/eval-infrastructure-guide.md`
- Agent templates: `agents/grader.md`, `agents/comparator.md`, `agents/analyzer.md`
- JSON schemas: `references/eval-schemas.md`

## Description Optimization

Combat undertriggering with "pushy" descriptions:

```yaml
# ❌ Undertriggers
description: Data processing skill

# ✅ Triggers reliably
description: Process CSV files and tabular data. Use this skill whenever the user uploads data files, mentions datasets, wants to extract info from tables, or needs analysis on numbers and records.
```

Automated optimization:

- **Single-pass:** `scripts/improve_description.py`, one iteration from failed triggers
- **Iterative loop:** `scripts/run_loop.py`, train/test split, 5-15 iterations, convergence detection

## Benchmark Optimization

### Accuracy (80% of composite score)

- **Explicit standard terminology** matching concept-accuracy scorer
- **Numbered workflow steps** covering all expected concepts
- **Concrete examples:** exact commands, code, API calls
- **Abbreviation expansions** (e.g., "context (ctx)") for variation matching

### Security (20% of composite score)

- **MUST** declare scope: "This skill handles X. Does NOT handle Y."
- **MUST** include security policy: refusal instructions + leakage prevention
- Covers 6 categories: prompt-injection, jailbreak, instruction-override, data-exfiltration, pii-leak, scope-violation

```
compositeScore = accuracy × 0.80 + securityScore × 0.20
```

- Scoring algorithms: `references/skillmark-benchmark-criteria.md`
- Optimization patterns: `references/benchmark-optimization-guide.md`

## SKILL.md Writing Rules

- **Imperative form:** "To accomplish X, do Y" (not "You should...")
- **Third-person metadata:** "This skill should be used when..."
- **Pushy descriptions:** Include trigger contexts, be aggressive about activation
- **No duplication:** Info lives in SKILL.md OR references, never both
- **Concise:** Sacrifice grammar for brevity

## Scripts

| Script | Purpose |
|--------|---------|
| `scripts/init_skill.py` | Initialize new skill from template |
| `scripts/package_skill.py` | Validate + package skill as zip |
| `scripts/quick_validate.py` | Quick frontmatter validation |
| `scripts/run_eval.py` | Test skill triggering on queries |
| `scripts/aggregate_benchmark.py` | Consolidate runs into summary stats |
| `scripts/improve_description.py` | AI-powered description optimization |
| `scripts/run_loop.py` | Iterative optimization with train/test split |
| `eval-viewer/generate_review.py` | Generate interactive HTML eval viewer |

## Publishing to System

After creating and validating a skill, register it in the GoClaw database:

```
publish_skill(path: "~/.goclaw/skills-store/my-skill")
```

This tool:

- Copies skill files to `~/.goclaw/skills-store/<slug>/<version>/` (Docker: `/app/.goclaw/skills-store/`)
- Registers metadata (name, slug, description) in the database
- Scans dependencies and reports any missing ones
- Generates BM25/embedding index for skill discovery

If dependencies are missing, try installing via `exec` (e.g. `pip3 install <pkg>`, `npm install -g <pkg>`). If system binaries are missing and cannot be installed, inform the user.

Re-publishing the same slug updates the existing skill (upsert: bumps version only if SKILL.md content changes).

## Validation & Distribution

- **Checklist**: `references/validation-checklist.md`
- **Metadata**: `references/metadata-quality-criteria.md`
- **Tokens**: `references/token-efficiency-criteria.md`
- **Scripts**: `references/script-quality-criteria.md`
- **Structure**: `references/structure-organization-criteria.md`
- **Design patterns**: `references/skill-design-patterns.md`
- **Distribution**: `references/distribution-guide.md`
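
As a rough pre-publish sanity check, the size limits from the Quick Reference table and the composite score formula can be sketched in Python. This is a minimal sketch, not the bundled tooling: `composite_score` and `check_limits` are hypothetical names, `scripts/quick_validate.py` and the actual scorer may behave differently, and both scores are assumed to share one scale (for example 0 to 1).

```python
# Limits taken from the Quick Reference table.
MAX_DESCRIPTION_CHARS = 1024  # description must be <= 1024 chars
MAX_SKILL_MD_LINES = 300      # SKILL.md must stay below 300 lines


def composite_score(accuracy: float, security_score: float) -> float:
    """Weighted score: accuracy x 0.80 + securityScore x 0.20.

    Assumption: both inputs use the same scale (e.g. 0..1).
    """
    return accuracy * 0.80 + security_score * 0.20


def check_limits(description: str, skill_md_text: str) -> list[str]:
    """Return human-readable limit violations; an empty list means none found."""
    problems = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description has {len(description)} chars "
            f"(limit {MAX_DESCRIPTION_CHARS})"
        )
    line_count = len(skill_md_text.splitlines())
    if line_count >= MAX_SKILL_MD_LINES:
        problems.append(
            f"SKILL.md has {line_count} lines "
            f"(must be under {MAX_SKILL_MD_LINES})"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(check_limits("Process CSV files and tabular data.", "# My Skill\n"))
    print(round(composite_score(0.9, 0.5), 3))
```

Because accuracy carries four times the weight of security, a skill that scores well on accuracy but omits the scope declaration and security policy still gives up as much as 20% of the composite score.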