routing-calibration-loop

Name: routing-calibration-loop
Author: vllm-project/semantic-router

$ npx mdskill add vllm-project/semantic-router/routing-calibration-loop

SKILL.md

.github/skills/routing-calibration-loop

---
name: routing-calibration-loop
category: support
description: Calibrates routing changes against a live router endpoint with executable probes, local DSL validation, versioned deploys, and structured failure review. Use when tuning signals, projections, decisions, or maintained route examples against a real apiserver.
---

# Routing Calibration Loop

## Trigger

- Use when a signal, projection, decision, or maintained routing example needs to be checked against a live router apiserver
- Use when a routing failure must be classified as a bad probe, bad routing policy, or bad validator rule instead of blindly patching the profile
- Use when a maintainer wants the loop `eval -> update -> validate -> deploy -> eval` to be run with versioned evidence

## Required Surfaces

- `harness_docs`

## Conditional Surfaces

- `harness_exec`
- `router_service_platform`
- `router_config_contract`
- `signal_runtime`
- `decision_logic`
- `algorithm_selection`
- `dsl_crd`
- `docs_examples`

## Stop Conditions

- No live router base URL is available and no local replacement environment has been chosen
- No probe manifest exists and the task cannot safely infer executable probes from maintained examples
- A deploy would change remote runtime state without capturing the current version or without a rollback path
- Local validation fails for reasons that are not yet understood or recorded
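
The stop conditions above can be checked as a preflight step before any loop round starts. The following is a minimal sketch, not part of the maintained runner: `LoopState`, its field names, and `stop_reasons` are hypothetical and only mirror the four conditions listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoopState:
    # All field names are hypothetical; they only mirror the stop conditions.
    router_url: Optional[str] = None
    local_env_chosen: bool = False
    probe_manifest: Optional[str] = None
    probes_inferable_from_examples: bool = False
    deploy_planned: bool = False
    captured_version: Optional[str] = None
    rollback_path_known: bool = False
    unexplained_validation_failures: int = 0


def stop_reasons(state: LoopState) -> List[str]:
    """Return the stop conditions that currently block the loop (empty list = proceed)."""
    reasons = []
    if not state.router_url and not state.local_env_chosen:
        reasons.append("no live router base URL and no local replacement environment")
    if not state.probe_manifest and not state.probes_inferable_from_examples:
        reasons.append("no probe manifest and probes cannot be safely inferred")
    # The version / rollback condition only matters when a deploy is planned.
    if state.deploy_planned and not (state.captured_version and state.rollback_path_known):
        reasons.append("deploy planned without a captured version or rollback path")
    if state.unexplained_validation_failures > 0:
        reasons.append("local validation failures not yet understood or recorded")
    return reasons
```

An empty list means none of the stop conditions apply; any non-empty result should halt the round and be recorded with the loop artifacts.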

## Workflow

1. Start from executable probes, not prose examples.
   - Prefer a machine-readable manifest. [`deploy/recipes/balance.probes.yaml`](../../../../../deploy/recipes/balance.probes.yaml) is the default maintained example, not the only supported target.
   - The manifest should stay profile-generic: point to any owned routing YAML / DSL pair through `routing_assets`, and group probes by decision with multiple variants when robustness matters.
   - Treat each probe as both a test case and a specification fragment.
2. Baseline the live router before editing policy.
   - Use [`tools/agent/scripts/router_calibration_loop.py`](../../../../../tools/agent/scripts/router_calibration_loop.py) to snapshot `/config/router` and `/config/router/versions`, then run `/api/v1/eval` across the probe suite.
   - Record which decision actually fired, which signals matched, and which signals were expected but absent.
3. Classify every failure into one of three buckets before changing anything.
   - `query_quality`: the prompt is not a robust representative of the intended route.
   - `routing_design`: the signal / projection / decision design is too broad, too narrow, or too brittle.
   - `validator_quality`: the runtime behavior is reasonable but static validation is over-reporting or under-reporting.
4. Edit the canonical authoring surface locally.
   - For maintained routing, edit the owned YAML / DSL asset pair instead of patching only the live server.
   - Do not add narrow trigger-phrase hacks just to pass one probe.
5. Run local validation before deploying.
   - Use the runner's `run` or `validate` path to execute `sr-dsl validate` against the DSL source, or against a YAML file through decompile-then-validate.
   - Prefer manifest-owned assets as defaults, but allow explicit YAML / DSL overrides for any other routing profile.
   - Keep validation output with the loop artifacts so validator behavior can be reviewed alongside runtime eval output.
6. Deploy durably and re-evaluate.
   - Use `PUT /config/router` for versioned full-document replacement so the live router exactly matches the canonical YAML being calibrated.
   - After every config update, wait for `GET /ready` to return `ready=true` before trusting `eval` results. Do not treat a successful update response as proof that router initialization has finished.
   - Re-run the same probe suite after deploy and compare before / after success rate and per-probe traces.
7. Close the loop with structured reflection.
   - `0. Query quality`: Is the probe semantically representative, or is it a brittle phrase trigger?
   - `1. Routing design`: Are the signal, projection, and decision boundaries robust, or merely sufficient for this probe set?
   - `2. Validator quality`: Do warnings or failures reflect real ambiguity, or missing static semantics?
8. If a durable architecture gap remains, update the indexed debt entry instead of leaving the mismatch only in chat or the report.
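
The readiness wait in step 6 can be sketched as a small polling helper. This is an illustration only, not the runner's implementation: it assumes `GET /ready` returns a JSON object with a boolean `ready` field, and the names `fetch_ready` and `wait_until_ready` are hypothetical. The probe, sleep, and clock functions are injectable so the loop can be exercised without a live router.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_ready(base_url: str) -> bool:
    """Return True if GET /ready reports ready=true.

    Assumption: the response body is JSON shaped like {"ready": true}.
    """
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/ready", timeout=5) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection errors, HTTP errors, and malformed JSON all count as "not ready yet".
        return False
    return isinstance(body, dict) and body.get("ready") is True


def wait_until_ready(
    base_url: str,
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    probe: Callable[[str], bool] = fetch_ready,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the router is ready; return False if the timeout elapses first."""
    deadline = clock() + timeout_s
    while True:
        if probe(base_url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would run `wait_until_ready("http://<router-host>:8080")` after the `PUT /config/router` call returns and only start the eval pass when it returns `True`. A `False` result is better treated as a failed deploy round than as a reason to evaluate anyway.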

## Gotchas

- The calibration loop now deploys with `PUT /config/router`, so the calibrated YAML must be a complete router document, not a partial merge fragment.
- Do not declare success just because one crafted query passes. Probe quality is part of the task; decision-level robustness should be checked with multiple variants, not just one trigger phrase.
- If runtime eval looks correct and validation still looks wrong, assume validator semantics may need work rather than forcing a worse route design.
- If deploy succeeds but success rate regresses, capture the returned version and consult the versions endpoint (`/config/router/versions`) before continuing.
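
The point about decision-level robustness can be made concrete with a coverage check over probe results. The sketch below is hypothetical: `ProbeResult`, `weak_decisions`, and the default of three variants per decision are illustrative assumptions, and the real runner's report format may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ProbeResult:
    # Hypothetical record shape; the real runner's report format may differ.
    probe_id: str
    expected_decision: str
    actual_decision: str


def weak_decisions(results: Iterable[ProbeResult], min_variants: int = 3) -> Dict[str, str]:
    """Return {decision: reason} for decisions whose probe coverage is not yet convincing."""
    by_decision = defaultdict(list)
    for result in results:
        by_decision[result.expected_decision].append(result)

    weak = {}
    for decision, group in sorted(by_decision.items()):
        failed = [r.probe_id for r in group if r.actual_decision != r.expected_decision]
        if failed:
            # Any failing variant means the decision is not robust yet.
            weak[decision] = f"failing variants: {', '.join(failed)}"
        elif len(group) < min_variants:
            # All variants pass, but there are too few of them to trust the result.
            weak[decision] = f"only {len(group)} variant(s), want at least {min_variants}"
    return weak
```

A decision that passes on a single crafted query still shows up as weak here, which keeps "one trigger phrase passed" from being reported as success.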

## Must Read

- [AGENTS.md](../../../../../AGENTS.md)
- [deploy/amd/README.md](../../../../../deploy/amd/README.md)
- [deploy/recipes/balance.probes.yaml](../../../../../deploy/recipes/balance.probes.yaml)
- [tools/agent/scripts/router_calibration_loop.py](../../../../../tools/agent/scripts/router_calibration_loop.py)

## Standard Commands

- `python3 tools/agent/scripts/router_calibration_loop.py eval --router-url http://<router-host>:8080 --probes <profile>.probes.yaml`
- `python3 tools/agent/scripts/router_calibration_loop.py run --router-url http://<router-host>:8080 --probes <profile>.probes.yaml`
- `python3 tools/agent/scripts/router_calibration_loop.py run --router-url http://<router-host>:8080 --probes <profile>.probes.yaml --yaml <routing>.yaml --dsl <routing>.dsl`
- `python3 tools/agent/scripts/router_calibration_loop.py deploy --router-url http://<router-host>:8080 --yaml <routing>.yaml --dsl <routing>.dsl --ready-timeout 300`
- `make agent-report ENV=amd CHANGED_FILES="deploy/recipes/balance.yaml,deploy/recipes/balance.dsl,deploy/amd/README.md"`
- `make agent-ci-gate CHANGED_FILES="tools/agent/skills/maintainer/routing-calibration/SKILL.md,tools/agent/scripts/router_calibration_loop.py,deploy/recipes/balance.probes.yaml"`

## Acceptance

- Each calibration round produces a probe report with before / after outcomes, live decision traces, and the captured deploy version when a deploy occurs
- Failures are explicitly reviewed under query quality, routing design, and validator quality instead of being patched blindly
- Maintained routing changes are validated locally before deploy and re-evaluated on the live endpoint after deploy
- The loop leaves behind executable probes or maintained examples that are stronger than the ones it started with, ideally by improving decision-level variant coverage instead of adding single-example hacks
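
The before / after comparison in the probe report can be sketched as follows. This is an assumption-laden illustration: `compare_rounds` is a hypothetical helper, and it assumes each eval round has already been reduced to a map from probe id to pass / fail.

```python
from typing import Dict


def compare_rounds(before: Dict[str, bool], after: Dict[str, bool]) -> dict:
    """Compare per-probe pass/fail maps from two eval rounds over the same suite."""

    def rate(outcomes: Dict[str, bool]) -> float:
        # Success rate as a fraction; an empty round counts as 0.0.
        return sum(outcomes.values()) / len(outcomes) if outcomes else 0.0

    shared = sorted(before.keys() & after.keys())
    return {
        "before_rate": rate(before),
        "after_rate": rate(after),
        # Probes that failed before the deploy and pass after it.
        "fixed": [p for p in shared if not before[p] and after[p]],
        # Probes that passed before the deploy and fail after it.
        "regressed": [p for p in shared if before[p] and not after[p]],
        # Probes present in only one round; the suites should normally match.
        "unmatched": sorted(before.keys() ^ after.keys()),
    }
```

An unchanged success rate can hide a swap between fixed and regressed probes, so the per-probe lists are worth reviewing alongside the two rates.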
