skill-creator

Name: skill-creator
Author: nextlevelbuilder/goclaw

$ npx mdskill add nextlevelbuilder/goclaw/skill-creator

Build eval-driven GoClaw skills with human-in-the-loop iteration.

Creates practical task instructions instead of generic documentation.
Uses progressive disclosure across metadata, SKILL.md, and resources.
Tests skills against benchmarks to optimize descriptions and performance.
Registers new skills directly in the system database for activation.

SKILL.md

.github/skills/skill-creator

---
name: skill-creator
description: Create or update GoClaw agent skills with eval-driven iteration. Use for new skills, skill scripts, references, benchmark optimization, description optimization, eval testing, extending agent capabilities.
license: Complete terms in LICENSE.txt
metadata:
  author: GoClaw
  version: "4.0.0"
---

# Skill Creator

Create effective, eval-driven Claude skills using progressive disclosure and human-in-the-loop iteration.

## Core Principles

- Skills are **practical instructions**, not documentation
- Each skill teaches Claude *how* to perform tasks, not *what* tools are
- **Progressive disclosure:** Metadata → SKILL.md → Bundled resources
- **Eval-driven iteration:** Test → Grade → Compare → Optimize → Repeat

## Quick Reference

| Resource | Limit | Purpose |
|----------|-------|---------|
| Description | ≤1024 chars | Auto-activation trigger (be "pushy") |
| SKILL.md | <300 lines | Core instructions |
| Each reference | <300 lines | Detail loaded as-needed |
| Scripts | No limit | Executed without loading |

## Skill Structure

New skills **MUST** be created directly in `~/.goclaw/skills-store/<skill-name>/`.
After writing SKILL.md and resources, use `publish_skill` to register in the system DB.

```
skill-name/
├── SKILL.md              (required, <300 lines)
├── scripts/              (optional: executable code)
├── references/           (optional: docs loaded as-needed)
├── agents/               (optional: eval agent templates)
└── assets/               (optional: output resources)
```
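As a rough illustration of this layout, the sketch below creates the directory tree under the skills store. `scaffold_skill` and its stub SKILL.md contents are hypothetical; the template actually produced by `scripts/init_skill.py` may differ.

```python
from pathlib import Path

# Optional subdirectories from the layout above; SKILL.md is the only required file.
OPTIONAL_DIRS = ("scripts", "references", "agents", "assets")


def scaffold_skill(name: str, store: Path = Path.home() / ".goclaw" / "skills-store") -> Path:
    """Create <store>/<name>/ with a stub SKILL.md and empty optional dirs (hypothetical helper)."""
    skill_dir = store / name
    skill_dir.mkdir(parents=True, exist_ok=False)  # refuse to clobber an existing skill
    for sub in OPTIONAL_DIRS:
        (skill_dir / sub).mkdir()
    stub = (
        "---\n"
        f"name: {name}\n"
        "description: TODO. Use this skill whenever ...\n"
        "---\n\n"
        f"# {name}\n"
    )
    (skill_dir / "SKILL.md").write_text(stub, encoding="utf-8")
    return skill_dir
```

Using `exist_ok=False` makes a second call with the same name fail with `FileExistsError` instead of silently overwriting work in progress.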

Full anatomy: `references/skill-anatomy-and-requirements.md`

## Creation Workflow

Follow the process in `references/skill-creation-workflow.md`:

1. **Capture Intent** — What should skill do? When trigger? What output? (AskUserQuestion)
2. **Research** — Activate `/ck:docs-seeker`, `/ck:research` for best practices
3. **Plan** — Identify reusable scripts, references, assets
4. **Initialize** — `scripts/init_skill.py <name> --path <dir>`
5. **Write** — Implement resources, write SKILL.md, optimize for benchmarks
6. **Test & Evaluate** — Run eval suite, grade outputs, compare with/without skill
7. **Optimize Description** — AI-powered trigger accuracy optimization
8. **Publish** — `publish_skill(path: "~/.goclaw/skills-store/<name>")` to register in system database
9. **Package** (optional) — `scripts/package_skill.py <path>` for external distribution
10. **Iterate** — Generalize from feedback, keep prompts lean

## Eval & Testing (CRITICAL)

Eval infrastructure for quantitative skill validation:
1. Create test cases in `evals/evals.json` with prompts + assertions
2. Spawn **parallel** with-skill + baseline runs (critical for fair timing)
3. Draft assertions while runs execute
4. Grade outputs with grader agent template
5. Aggregate results: `scripts/aggregate_benchmark.py`
6. Launch viewer: `eval-viewer/generate_review.py` → interactive HTML review
7. Collect human feedback via viewer → `feedback.json`

Details: `references/eval-infrastructure-guide.md`
Agent templates: `agents/grader.md`, `agents/comparator.md`, `agents/analyzer.md`
JSON schemas: `references/eval-schemas.md`
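A minimal sketch of grading and comparing with-skill and baseline runs follows. It assumes a simplified `evals.json` shape (`id`, `prompt`, `assertions`) and treats each assertion as a case-insensitive required substring. The actual schema lives in `references/eval-schemas.md`, and the grader agent template judges outputs far more flexibly than substring matching.

```python
import json
from statistics import mean

# Hypothetical evals.json contents; the real schema is in references/eval-schemas.md.
EVALS_JSON = """
[
  {"id": "csv-summary", "prompt": "Summarize sales.csv",
   "assertions": ["total", "average"]},
  {"id": "csv-filter", "prompt": "Rows where region is EU",
   "assertions": ["EU"]}
]
"""


def grade(output: str, assertions: list[str]) -> float:
    """Fraction of assertions satisfied; an assertion here is a required substring (assumption)."""
    if not assertions:
        return 1.0
    hits = sum(1 for a in assertions if a.lower() in output.lower())
    return hits / len(assertions)


def compare(evals: list[dict], with_skill: dict[str, str], baseline: dict[str, str]) -> dict[str, float]:
    """Mean score per configuration plus the delta attributable to the skill."""
    ws = mean(grade(with_skill[e["id"]], e["assertions"]) for e in evals)
    bl = mean(grade(baseline[e["id"]], e["assertions"]) for e in evals)
    return {"with_skill": ws, "baseline": bl, "delta": ws - bl}


evals = json.loads(EVALS_JSON)
report = compare(
    evals,
    with_skill={"csv-summary": "Total: 10, average: 2", "csv-filter": "3 EU rows"},
    baseline={"csv-summary": "Total: 10", "csv-filter": "3 rows"},
)
print(report)  # {'with_skill': 1.0, 'baseline': 0.25, 'delta': 0.75}
```

Here the with-skill outputs satisfy every assertion while the baseline satisfies a quarter of them on average, so the skill's measured contribution is 0.75.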

## Description Optimization

Combat undertriggering with "pushy" descriptions:

```yaml
# ❌ Undertriggers
description: Data processing skill
# ✅ Triggers reliably
description: Process CSV files and tabular data. Use this skill whenever
  the user uploads data files, mentions datasets, wants to extract info
  from tables, or needs analysis on numbers and records.
```
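Descriptions often span several lines in the frontmatter, as in the example above. The sketch below reads simple `key: value` frontmatter with indented continuation lines and flags descriptions that lack an explicit trigger context. Both the parser and the trigger-phrase heuristic (`TRIGGER_HINTS`) are assumptions for illustration, not GoClaw's validation logic; a real implementation should use a proper YAML parser.

```python
TRIGGER_HINTS = ("use this skill", "use when", "use for", "whenever")  # assumed heuristic
MAX_DESCRIPTION_CHARS = 1024


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` frontmatter between `---` lines.

    Indented lines are folded into the previous key's value. This is NOT a YAML
    parser: nested mappings (such as `metadata:`) and quoted strings are not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("frontmatter must start with ---")
    fields: dict[str, str] = {}
    key = None
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line[:1] in (" ", "\t") and key is not None:
            fields[key] = f"{fields[key]} {line.strip()}".strip()  # continuation line
        elif ":" in line:
            raw_key, _, value = line.partition(":")
            key = raw_key.strip()
            fields[key] = value.strip()
    raise ValueError("frontmatter is missing its closing ---")


def description_warnings(description: str) -> list[str]:
    """Flag over-long descriptions and ones with no trigger phrase."""
    warnings = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        warnings.append(f"description exceeds {MAX_DESCRIPTION_CHARS} chars")
    if not any(hint in description.lower() for hint in TRIGGER_HINTS):
        warnings.append("no explicit trigger context (e.g. 'Use this skill whenever ...')")
    return warnings
```

Applied to the two examples above, "Data processing skill" gets a warning, while the multi-line CSV description parses into one string and passes.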

Automated optimization:

- **Single-pass:** `scripts/improve_description.py` — one iteration from failed triggers
- **Iterative loop:** `scripts/run_loop.py` — train/test split, 5-15 iterations, convergence detection
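The train/test loop can be sketched as follows. `triggers` stands in for whatever decides whether a description activates on a query, and `improve` for the step that rewrites the description from failed triggers; both are hypothetical callables. The deterministic split, the `min_gain` threshold, and the 15-iteration cap are assumptions loosely modeled on the one-line description of `scripts/run_loop.py`, not its actual implementation.

```python
from typing import Callable

Query = tuple[str, bool]                     # (user query, should the skill trigger?)
Trigger = Callable[[str, str], bool]         # (description, query) -> did it trigger?
Improve = Callable[[str, list[Query]], str]  # (description, failures) -> new description


def split(queries: list[Query], every: int = 3) -> tuple[list[Query], list[Query]]:
    """Deterministic split: every `every`-th query is held out for testing."""
    test = queries[::every]
    train = [q for i, q in enumerate(queries) if i % every != 0]
    return train, test


def accuracy(description: str, queries: list[Query], triggers: Trigger) -> float:
    """Fraction of queries where triggering matches the expected label."""
    return sum(triggers(description, q) == want for q, want in queries) / len(queries)


def optimize(description: str, queries: list[Query], triggers: Trigger, improve: Improve,
             max_iters: int = 15, min_gain: float = 0.01) -> tuple[str, float]:
    """Improve on training failures; keep the description with the best held-out accuracy."""
    train, test = split(queries)
    best, best_test = description, accuracy(description, test, triggers)
    prev_train = accuracy(description, train, triggers)
    for _ in range(max_iters):
        failures = [(q, want) for q, want in train if triggers(description, q) != want]
        if not failures:
            break  # every training query already behaves as expected
        description = improve(description, failures)
        train_acc = accuracy(description, train, triggers)
        test_acc = accuracy(description, test, triggers)
        if test_acc > best_test:  # select on held-out queries, not training queries
            best, best_test = description, test_acc
        if train_acc - prev_train < min_gain:  # convergence: no meaningful training gain
            break
        prev_train = train_acc
    return best, best_test
```

Selecting the winner by held-out accuracy rather than training accuracy is what guards against a description that merely memorizes the training queries.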

## Benchmark Optimization

### Accuracy (80% of composite score)

- **Explicit standard terminology** matching concept-accuracy scorer
- **Numbered workflow steps** covering all expected concepts
- **Concrete examples** — exact commands, code, API calls
- **Abbreviation expansions** (e.g., "context (ctx)") for variation matching

### Security (20% of composite score)

- **MUST** declare scope: "This skill handles X. Does NOT handle Y."
- **MUST** include security policy: refusal instructions + leakage prevention
- Security tests cover 6 categories: prompt-injection, jailbreak, instruction-override, data-exfiltration, pii-leak, scope-violation

```
compositeScore = accuracy × 0.80 + securityScore × 0.20
```
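A worked instance of the formula, assuming both scores are on the same 0 to 1 scale (the function name is illustrative):

```python
def composite_score(accuracy: float, security_score: float) -> float:
    """compositeScore = accuracy × 0.80 + securityScore × 0.20 (both on the same scale)."""
    return accuracy * 0.80 + security_score * 0.20


# 0.90 × 0.80 + 0.75 × 0.20 = 0.72 + 0.15
print(round(composite_score(0.90, 0.75), 4))  # prints 0.87
```

Because accuracy carries four times the weight of security, a one-point accuracy gain moves the composite as much as a four-point security gain.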

Scoring algorithms: `references/skillmark-benchmark-criteria.md`
Optimization patterns: `references/benchmark-optimization-guide.md`

## SKILL.md Writing Rules

- **Imperative form:** "To accomplish X, do Y" (not "You should...")
- **Third-person metadata:** "This skill should be used when..."
- **Pushy descriptions:** Include trigger contexts, be aggressive about activation
- **No duplication:** Info lives in SKILL.md OR references, never both
- **Concise:** Sacrifice grammar for brevity
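Two of these rules lend themselves to a quick lint pass. The hypothetical `lint_skill` below flags second-person phrasing in SKILL.md and long lines that appear verbatim in both SKILL.md and a reference file. The regex and the 40-character threshold are arbitrary assumptions, and paraphrased duplication will not be caught.

```python
import re

# Assumed heuristic for non-imperative phrasing.
SECOND_PERSON = re.compile(r"\byou (should|must|need to|can)\b", re.IGNORECASE)


def lint_skill(skill_md: str, references: dict[str, str], min_len: int = 40) -> list[str]:
    """Flag second-person phrasing in SKILL.md and verbatim SKILL.md/reference duplicates."""
    issues: list[str] = []
    skill_lines = skill_md.splitlines()
    for n, line in enumerate(skill_lines, start=1):
        if SECOND_PERSON.search(line):
            issues.append(f"SKILL.md:{n}: rewrite in imperative form")
    # Only long lines are compared, so headings and short bullets do not cause false positives.
    long_lines = {ln.strip() for ln in skill_lines if len(ln.strip()) >= min_len}
    for ref_name, text in references.items():
        for n, line in enumerate(text.splitlines(), start=1):
            if line.strip() in long_lines:
                issues.append(f"{ref_name}:{n}: duplicates a line in SKILL.md")
    return issues
```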

## Scripts

| Script | Purpose |
|--------|---------|
| `scripts/init_skill.py` | Initialize new skill from template |
| `scripts/package_skill.py` | Validate + package skill as zip |
| `scripts/quick_validate.py` | Quick frontmatter validation |
| `scripts/run_eval.py` | Test skill triggering on queries |
| `scripts/aggregate_benchmark.py` | Consolidate runs into summary stats |
| `scripts/improve_description.py` | AI-powered description optimization |
| `scripts/run_loop.py` | Iterative optimization with train/test split |
| `eval-viewer/generate_review.py` | Generate interactive HTML eval viewer |

## Publishing to System

After creating and validating a skill, register it in the GoClaw database:

```
publish_skill(path: "~/.goclaw/skills-store/my-skill")
```

This tool:
- Copies skill files to `~/.goclaw/skills-store/<slug>/<version>/` (Docker: `/app/.goclaw/skills-store/`)
- Registers metadata (name, slug, description) in the database
- Scans dependencies and reports any missing ones
- Generates BM25/embedding index for skill discovery

If dependencies are missing, try installing via `exec` (e.g. `pip3 install <pkg>`, `npm install -g <pkg>`).
If system binaries are missing and cannot be installed, inform the user.
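Before falling back to `exec`, it can help to see exactly what is missing. This sketch uses the standard-library `importlib.util.find_spec` and `shutil.which`; it is not how `publish_skill` scans dependencies, and the `pip3 install` hints assume the import name matches the package name, which is not always true (for example, `yaml` is installed as `PyYAML`).

```python
import importlib.util
import shutil


def find_missing(python_modules: list[str], binaries: list[str]) -> dict[str, list[str]]:
    """Report Python modules that cannot be imported and binaries not found on PATH."""
    missing_modules = []
    for name in python_modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # raised when a dotted name's parent package is missing
            found = False
        if not found:
            missing_modules.append(name)
    missing_binaries = [b for b in binaries if shutil.which(b) is None]
    return {"python": missing_modules, "binaries": missing_binaries}


def install_hints(missing: dict[str, list[str]]) -> list[str]:
    """Suggested follow-ups; assumes pip package name == import name."""
    hints = [f"pip3 install {m}" for m in missing["python"]]
    hints += [f"{b}: system binary missing; install it or inform the user" for b in missing["binaries"]]
    return hints
```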

Re-publishing the same slug updates the existing skill (upsert — bumps version only if SKILL.md content changes).
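The upsert rule can be illustrated with a content hash. This is a hypothetical in-memory model: `SkillRecord`, the SHA-256 comparison, and integer versions are assumptions, since the actual version format stored under `<slug>/<version>/` is not specified here.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SkillRecord:
    slug: str
    version: int
    content_hash: str


def upsert(registry: dict[str, SkillRecord], slug: str, skill_md: str) -> SkillRecord:
    """Insert a new slug, or bump its version only when SKILL.md content changed."""
    digest = hashlib.sha256(skill_md.encode("utf-8")).hexdigest()
    existing = registry.get(slug)
    if existing is None:
        record = SkillRecord(slug, 1, digest)
    elif existing.content_hash == digest:
        return existing  # unchanged SKILL.md: keep the current version
    else:
        record = SkillRecord(slug, existing.version + 1, digest)
    registry[slug] = record
    return record
```

Re-publishing identical content is therefore a no-op, while any edit to SKILL.md yields a new version.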

## Validation & Distribution

- **Checklist**: `references/validation-checklist.md`
- **Metadata**: `references/metadata-quality-criteria.md`
- **Tokens**: `references/token-efficiency-criteria.md`
- **Scripts**: `references/script-quality-criteria.md`
- **Structure**: `references/structure-organization-criteria.md`
- **Design patterns**: `references/skill-design-patterns.md`
- **Distribution**: `references/distribution-guide.md`
