routing-calibration-loop
$
npx mdskill add vllm-project/semantic-router/routing-calibration-loop- Use when a signal, projection, decision, or maintained routing example needs to be checked against a live router apiserver - Use when a routing failure must be classified as a bad probe, bad routing policy, or bad validator rule instead of blindly patching the profile - Use when a maintainer wants the loop `eval -> update -> validate -> deploy -> eval` to be run with versioned evidence
SKILL.md
.github/skills/routing-calibration-loopView on GitHub ↗
--- name: routing-calibration-loop category: support description: Calibrates routing changes against a live router endpoint with executable probes, local DSL validation, versioned deploys, and structured failure review. Use when tuning signals, projections, decisions, or maintained route examples against a real apiserver. --- # Routing Calibration Loop ## Trigger - Use when a signal, projection, decision, or maintained routing example needs to be checked against a live router apiserver - Use when a routing failure must be classified as a bad probe, bad routing policy, or bad validator rule instead of blindly patching the profile - Use when a maintainer wants the loop `eval -> update -> validate -> deploy -> eval` to be run with versioned evidence ## Required Surfaces - `harness_docs` ## Conditional Surfaces - `harness_exec` - `router_service_platform` - `router_config_contract` - `signal_runtime` - `decision_logic` - `algorithm_selection` - `dsl_crd` - `docs_examples` ## Stop Conditions - No live router base URL is available and no local replacement environment has been chosen - No probe manifest exists and the task cannot safely infer executable probes from maintained examples - A deploy would change remote runtime state without capturing the current version or without a rollback path - Local validation fails for reasons that are not yet understood or recorded ## Workflow 1. Start from executable probes, not prose examples. - Prefer a machine-readable manifest. [`deploy/recipes/balance.probes.yaml`](../../../../../deploy/recipes/balance.probes.yaml) is the default maintained example, not the only supported target. - The manifest should stay profile-generic: point to any owned routing YAML / DSL pair through `routing_assets`, and group probes by decision with multiple variants when robustness matters. - Treat each probe as both a test case and a specification fragment. 2. Baseline the live router before editing policy. - Use [`tools/agent/scripts/router_calibration_loop.py`](../../../../../tools/agent/scripts/router_calibration_loop.py) to snapshot `/config/router` and `/config/router/versions`, then run `/api/v1/eval` across the probe suite. - Record which decision actually fired, which signals matched, and which signals were expected but absent. 3. Classify every failure under one of three buckets before changing anything. - `query_quality`: the prompt is not a robust representative of the intended route. - `routing_design`: the signal / projection / decision design is too broad, too narrow, or too brittle. - `validator_quality`: the runtime behavior is reasonable but static validation is over-reporting or under-reporting. 4. Edit the canonical authoring surface locally. - For maintained routing, edit the owned YAML / DSL asset pair instead of patching only the live server. - Do not add narrow trigger-phrase hacks just to pass one probe. 5. Run local validation before deploying. - Use the runner's `run` or `validate` path to execute `sr-dsl validate` against the DSL source, or against a YAML file through decompile-then-validate. - Prefer manifest-owned assets as defaults, but allow explicit YAML / DSL overrides for any other routing profile. - Keep validation output with the loop artifacts so validator behavior can be reviewed alongside runtime eval output. 6. Deploy durably and re-evaluate. - Use `PUT /config/router` for versioned full-document replacement so the live router exactly matches the canonical YAML being calibrated. - After every config update, wait for `GET /ready` to return `ready=true` before trusting `eval` results. Do not treat a successful update response as proof that router initialization has finished. - Re-run the same probe suite after deploy and compare before / after success rate and per-probe traces. 7. Close the loop with structured reflection. - `0. Query quality`: Is the probe semantically representative, or is it a brittle phrase trigger? - `1. Routing design`: Are the signal, projection, and decision boundaries robust, or merely sufficient for this probe set? - `2. Validator quality`: Do warnings or failures reflect real ambiguity, or missing static semantics? 8. If a durable architecture gap remains, update the indexed debt entry instead of leaving the mismatch only in chat or the report. ## Gotchas - The calibration loop now deploys with `PUT /config/router`, so the calibrated YAML must be a complete router document, not a partial merge fragment. - Do not declare success just because one crafted query passes. Probe quality is part of the task; decision-level robustness should be checked with multiple variants, not just one trigger phrase. - If runtime eval looks correct and validation still looks wrong, assume validator semantics may need work rather than forcing a worse route design. - If deploy succeeds but success rate regresses, capture the returned version and use the versions endpoint before continuing. ## Must Read - [AGENTS.md](../../../../../AGENTS.md) - [deploy/amd/README.md](../../../../../deploy/amd/README.md) - [deploy/recipes/balance.probes.yaml](../../../../../deploy/recipes/balance.probes.yaml) - [tools/agent/scripts/router_calibration_loop.py](../../../../../tools/agent/scripts/router_calibration_loop.py) ## Standard Commands - `python3 tools/agent/scripts/router_calibration_loop.py eval --router-url http://<router-host>:8080 --probes <profile>.probes.yaml` - `python3 tools/agent/scripts/router_calibration_loop.py run --router-url http://<router-host>:8080 --probes <profile>.probes.yaml` - `python3 tools/agent/scripts/router_calibration_loop.py run --router-url http://<router-host>:8080 --probes <profile>.probes.yaml --yaml <routing>.yaml --dsl <routing>.dsl` - `python3 tools/agent/scripts/router_calibration_loop.py deploy --router-url http://<router-host>:8080 --yaml <routing>.yaml --dsl <routing>.dsl --ready-timeout 300` - `make agent-report ENV=amd CHANGED_FILES="deploy/recipes/balance.yaml,deploy/recipes/balance.dsl,deploy/amd/README.md"` - `make agent-ci-gate CHANGED_FILES="tools/agent/skills/maintainer/routing-calibration/SKILL.md,tools/agent/scripts/router_calibration_loop.py,deploy/recipes/balance.probes.yaml"` ## Acceptance - Each calibration round produces a probe report with before / after outcomes, live decision traces, and the captured deploy version when a deploy occurs - Failures are explicitly reviewed under query quality, routing design, and validator quality instead of being patched blindly - Maintained routing changes are validated locally before deploy and re-evaluated on the live endpoint after deploy - The loop leaves behind executable probes or maintained examples that are stronger than the ones it started with, ideally by improving decision-level variant coverage instead of adding single-example hacks