regression-search

$ npx mdskill add sonichi/sutando/regression-search

Automates regression detection in phone-call data using keyword analysis

  • Identifies when a feature stopped working by analyzing call transcripts
  • Uses find-regression.py and diagnose-call.py scripts with call data from JSONL files
  • Classifies calls as working or broken based on refusal/error patterns and timestamps
  • Outputs sorted timelines, call metrics, and diagnostic snippets for investigation

SKILL.md

.github/skills/regression-search
---
name: regression-search
description: "Search phone-call history for when a feature regressed (find-regression.py) and drill into a single call to see what went wrong (diagnose-call.py). Skips reading 100+ transcripts by hand."
---

# Regression Search

Two scripts for hunting down bad calls without reading every transcript:

1. **`find-regression.py`** — search `results/calls/calls.jsonl` for calls touching a feature, classify each as working/broken, print a sorted timeline.
2. **`diagnose-call.py`** — drill into a single call by SID, report refusals/errors/silences/repeated requests, optionally show metrics from `data/call-metrics.jsonl`.

Closes [#188](https://github.com/sonichi/sutando/issues/188).

## When to use

- "When did the X feature stop working?" — pass the feature keyword.
- "Has feature Y improved?" — see the broken/working trend over time.
- Before shipping a fix — sanity check that the regression is reproducible.

## Usage

```bash
python3 skills/regression-search/scripts/find-regression.py "record"
python3 skills/regression-search/scripts/find-regression.py "summon" --since 2026-04-01
python3 skills/regression-search/scripts/find-regression.py "play" --json
```

Flags:
- `--since YYYY-MM-DD` — only show calls on/after this date
- `--json` — machine-readable output
- `--show-snippet` — print a one-line transcript snippet for each call
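
As a rough illustration of what the keyword search and `--since` filter do, here is a minimal sketch. It is not the code in `find-regression.py`. The record fields (`sid`, `timestamp`, `transcript`) and the helper names `load_calls` and `filter_calls` are assumptions made for this example, since the actual schema of `calls.jsonl` is not documented here.

```python
import json
from datetime import date, datetime


def load_calls(lines):
    """Parse JSONL input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def filter_calls(calls, keyword, since=None):
    """Keep calls that mention `keyword` on/after `since`, oldest first."""
    keyword = keyword.lower()
    matched = []
    for call in calls:
        # Hypothetical fields: "timestamp" (ISO 8601) and "transcript" (plain text).
        when = datetime.fromisoformat(call["timestamp"]).date()
        if since is not None and when < since:
            continue
        if keyword in call["transcript"].lower():
            matched.append(call)
    # ISO 8601 strings in one format sort chronologically as plain strings.
    return sorted(matched, key=lambda c: c["timestamp"])


# Made-up sample records standing in for results/calls/calls.jsonl.
sample = [
    '{"sid": "CA01", "timestamp": "2026-03-28T10:00:00", "transcript": "Please record this call."}',
    '{"sid": "CA02", "timestamp": "2026-04-02T09:30:00", "transcript": "Start recording now."}',
    '{"sid": "CA03", "timestamp": "2026-04-03T14:15:00", "transcript": "Play some music."}',
]
calls = load_calls(sample)
recent = filter_calls(calls, "record", since=date(2026, 4, 1))
print([c["sid"] for c in recent])  # ['CA02']
```

Substring matching is why the query `record` also catches "recording"; the same property produces the false positives listed under Limitations.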

## Heuristics

A call is **broken** for a query if any of:
- Sutando refuses ("I can't", "I'm not able", "I'm unable", "sorry I cannot")
- Sutando reports an error ("error", "failed", "didn't work", "something went wrong")
- The user repeats the same request 2+ times in a row (Sutando didn't respond usefully)
- Sutando says "(Silence)" after the user mentions the feature

Otherwise, the call is **working** if Sutando's response includes the feature keyword.

These are intentionally crude — the goal is "good enough to find the regression window without reading 163 transcripts." Tune as you find false positives.
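
The rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not the script's actual code: the `(speaker, text)` turn format, the function name `classify`, and the `"unknown"` fallback for calls that match neither rule are all hypothetical. "Repeats the same request" is read here as two consecutive user turns with identical text.

```python
REFUSALS = ("i can't", "i'm not able", "i'm unable", "sorry i cannot")
ERRORS = ("error", "failed", "didn't work", "something went wrong")


def classify(turns, keyword):
    """Label one call as "broken", "working", or "unknown" for a feature keyword.

    `turns` is an assumed format: a list of (speaker, text) pairs where
    speaker is "user" or "sutando".
    """
    keyword = keyword.lower()
    user_mentioned = False      # has the user brought up the feature yet?
    reply_has_keyword = False   # has Sutando's reply mentioned it?
    previous_request = None     # text of the previous user turn
    for speaker, text in turns:
        lowered = text.lower().strip()
        if speaker == "user":
            if lowered == previous_request:
                return "broken"  # same request twice in a row
            previous_request = lowered
            if keyword in lowered:
                user_mentioned = True
        else:
            if any(phrase in lowered for phrase in REFUSALS + ERRORS):
                return "broken"  # refusal or reported error
            if lowered == "(silence)" and user_mentioned:
                return "broken"  # silence after the feature was mentioned
            if keyword in lowered:
                reply_has_keyword = True
    # "unknown" is an assumption; the rules do not say how such calls are labeled.
    return "working" if reply_has_keyword else "unknown"


ok = [("user", "Can you record this?"), ("sutando", "Sure, recording started.")]
refused = [("user", "Record this"), ("sutando", "Sorry, I'm unable to do that.")]
print(classify(ok, "record"), classify(refused, "record"))  # working broken
```

Plain substring checks keep the rules easy to tune, at the cost of matches such as "error" inside an unrelated word.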

## Limitations

- Keyword matching only. "recording doesn't stop" and "recording won't start" both match `record`, so distinct failures land in the same timeline. The linked issue (#188) lists this as future work.
- No semantic understanding. A call where Sutando talks about recording but the user wanted something else still matches.
- Doesn't correlate with git commits — manual step for now.

## diagnose-call.py

```bash
python3 skills/regression-search/scripts/diagnose-call.py de1f04733fc2
python3 skills/regression-search/scripts/diagnose-call.py CA701fc4129779... --metrics
python3 skills/regression-search/scripts/diagnose-call.py de1f04733fc2 --json
```

Accepts a full SID or just its last 12 characters. Reports turn counts, refusals, errors, silences, repeated user requests, and the ending style (normal, abrupt user end, or Sutando silence). With `--metrics`, it also pulls a per-event tool-call timeline from `data/call-metrics.jsonl` (requires PR #223). The exit code is 1 if any issues are found and 0 if the call is clean, which makes the script usable as a CI check.

Typical workflow: run `find-regression.py` to surface broken candidates, then `diagnose-call.py <sid>` to drill into the worst one.
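
The SID shorthand and the exit-code convention could look roughly like the sketch below. The helper names `resolve_sid` and `exit_code`, the report keys, and the sample SIDs are hypothetical; how `diagnose-call.py` actually resolves a suffix or handles ambiguous matches is not specified here.

```python
def resolve_sid(query, known_sids):
    """Return the one known SID that equals `query` or ends with it."""
    matches = [sid for sid in known_sids if sid == query or sid.endswith(query)]
    if len(matches) != 1:
        # Assumed behavior: refuse to guess when zero or several calls match.
        raise LookupError(f"{len(matches)} calls match {query!r}")
    return matches[0]


def exit_code(report):
    """Return 1 if the report lists any issue, 0 if the call is clean."""
    # Hypothetical report keys mirroring the issue types the script reports.
    issue_keys = ("refusals", "errors", "silences", "repeated_requests")
    return 1 if any(report.get(key) for key in issue_keys) else 0


# Made-up SIDs for illustration only.
known = ["CA" + "a" * 20 + "0123456789ab", "CA" + "b" * 20 + "ba9876543210"]
print(resolve_sid("0123456789ab", known) == known[0])  # True
print(exit_code({"refusals": ["I can't"], "errors": []}))  # 1
print(exit_code({"refusals": [], "errors": []}))  # 0
```

In a CI job, a nonzero exit status from the script fails the step without any output parsing.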

## Future work

- Auto-correlate regression windows with git log
- Smarter NLP-based query matching (query: "recording doesn't stop" vs "recording won't start")
