Install: `npx mdskill add openai/openai-agents-python/examples-auto-run`

Execute Python examples automatically with logging and rerun support.
- Enables agents to run example scripts without manual intervention.
- Enables optional dependency extras for litellm, sqlalchemy, redis, temporal, and other integrations.
- Uses auto-input mode to approve actions automatically and records failures in a rerun list.
- Delivers logs and status updates through dedicated shell helpers.
Source file: `SKILL.md` in `.github/skills/examples-auto-run`
---
name: examples-auto-run
description: Run Python examples in auto mode with logging, rerun helpers, and background control.
---
# examples-auto-run
## What it does
- Runs `uv run examples/run_examples.py` with:
  - Optional dependency extras enabled by default: `litellm`, `any-llm`, `sqlalchemy`, `redis`, `blaxel`, `modal`, `runloop`, and `temporal`.
  - `EXAMPLES_INTERACTIVE_MODE=auto` (auto-input/auto-approve).
  - Per-example logs under `.tmp/examples-start-logs/`.
  - Main summary log path passed via `--main-log` (also under `.tmp/examples-start-logs/`).
- Generates a rerun list of failures at `.tmp/examples-rerun.txt` when `--write-rerun` is set.
- Provides start/stop/status/logs/tail/collect/rerun helpers via `run.sh`.
- A background option keeps the process running and records its PID in a pidfile; `stop` ends the job and cleans up the pidfile.
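The background mode can be pictured with a small pidfile pattern. This is a minimal sketch, not the contents of `run.sh`: the default pidfile path, the `is_running`/`bg_start`/`bg_status`/`bg_stop` names, and sending output to a caller-supplied log file are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of pidfile-based background control. PIDFILE's default path
# and the function names are illustrative assumptions, not taken from run.sh.

PIDFILE="${PIDFILE:-.tmp/examples-auto-run.pid}"

# Succeeds only if the pidfile exists and names a live process.
is_running() {
  [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# bg_start LOGFILE CMD [ARGS...]: run CMD in the background, output to LOGFILE.
bg_start() {
  local log="$1"; shift
  if is_running; then
    echo "already running (pid $(cat "$PIDFILE"))" >&2
    return 1
  fi
  mkdir -p "$(dirname "$PIDFILE")" "$(dirname "$log")"
  "$@" >"$log" 2>&1 &
  echo "$!" >"$PIDFILE"   # $! is the PID of the job just backgrounded
}

bg_status() {
  if is_running; then
    echo "running (pid $(cat "$PIDFILE"))"
  else
    echo "not running"
  fi
}

# Stop the recorded process (if any) and remove the pidfile so status stays accurate.
bg_stop() {
  if [ -f "$PIDFILE" ]; then
    kill "$(cat "$PIDFILE")" 2>/dev/null || true
    rm -f "$PIDFILE"
  fi
}
```

For instance, `bg_start .tmp/examples-start-logs/stdout_manual.log uv run examples/run_examples.py --write-rerun` would launch a run whose output lands next to the other stdout logs. Because `kill -0` only checks that the PID is alive, a stale pidfile left by a crashed job reads as "not running" instead of blocking a new start.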
## Usage
```bash
# Start (auto mode; interactive included by default)
.agents/skills/examples-auto-run/scripts/run.sh start [extra args to run_examples.py]
# Examples:
.agents/skills/examples-auto-run/scripts/run.sh start --filter basic
.agents/skills/examples-auto-run/scripts/run.sh start --include-server --include-audio
# Check status
.agents/skills/examples-auto-run/scripts/run.sh status
# Stop running job
.agents/skills/examples-auto-run/scripts/run.sh stop
# List logs
.agents/skills/examples-auto-run/scripts/run.sh logs
# Tail latest log (or specify one)
.agents/skills/examples-auto-run/scripts/run.sh tail
.agents/skills/examples-auto-run/scripts/run.sh tail main_20260113-123000.log
# Collect rerun list from a main log (defaults to latest main_*.log)
.agents/skills/examples-auto-run/scripts/run.sh collect
# Rerun only failed entries from rerun file (auto mode)
.agents/skills/examples-auto-run/scripts/run.sh rerun
```
## Codex execution environment
- When Codex invokes `start` or `rerun`, run the command outside the Codex sandbox by default
(`sandbox_permissions=require_escalated`). Many examples start nested sandboxes, browsers, npm
helpers, or cloud/local service processes; running from inside the Codex sandbox can produce
environment-only failures such as `sandbox-exec: sandbox_apply: Operation not permitted`,
Playwright cache permission errors, or npm cache permission errors.
- Use sandboxed execution only when the user explicitly asks for it, or for a narrow dry-run or log-inspection command that does not execute examples.
## Defaults (overridable via env)
- `EXAMPLES_INTERACTIVE_MODE=auto`
- `EXAMPLES_INCLUDE_INTERACTIVE=1`
- `EXAMPLES_INCLUDE_SERVER=0`
- `EXAMPLES_INCLUDE_AUDIO=0`
- `EXAMPLES_INCLUDE_EXTERNAL=0`
- `EXAMPLES_UV_EXTRAS="litellm any-llm sqlalchemy redis blaxel modal runloop temporal"` (set to an empty string to disable extras)
- Auto-approvals in auto mode: `APPLY_PATCH_AUTO_APPROVE=1`, `SHELL_AUTO_APPROVE=1`, `AUTO_APPROVE_MCP=1`
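The defaults above behave like ordinary shell fallbacks. The following is a minimal sketch, assuming `run.sh` applies them roughly this way (the source does not show its mechanism). It fills in only the variables the caller left unset and expands `EXAMPLES_UV_EXTRAS` into the `uv run --extra ...` form mentioned in the notes. `build_uv_cmd` is a hypothetical name, and it prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: apply documented defaults only where the caller has not set a
# value, then expand EXAMPLES_UV_EXTRAS into repeated `--extra` flags for `uv run`.

: "${EXAMPLES_INTERACTIVE_MODE:=auto}"
: "${EXAMPLES_INCLUDE_INTERACTIVE:=1}"
: "${EXAMPLES_INCLUDE_SERVER:=0}"
: "${EXAMPLES_INCLUDE_AUDIO:=0}"
: "${EXAMPLES_INCLUDE_EXTERNAL:=0}"
# `${VAR-default}` (no colon) keeps an explicitly empty value, which disables all extras.
EXAMPLES_UV_EXTRAS="${EXAMPLES_UV_EXTRAS-litellm any-llm sqlalchemy redis blaxel modal runloop temporal}"
export EXAMPLES_INTERACTIVE_MODE EXAMPLES_INCLUDE_INTERACTIVE EXAMPLES_INCLUDE_SERVER \
  EXAMPLES_INCLUDE_AUDIO EXAMPLES_INCLUDE_EXTERNAL EXAMPLES_UV_EXTRAS

# Auto mode also turns on the documented auto-approval switches (unless already set).
if [ "$EXAMPLES_INTERACTIVE_MODE" = "auto" ]; then
  : "${APPLY_PATCH_AUTO_APPROVE:=1}" "${SHELL_AUTO_APPROVE:=1}" "${AUTO_APPROVE_MCP:=1}"
  export APPLY_PATCH_AUTO_APPROVE SHELL_AUTO_APPROVE AUTO_APPROVE_MCP
fi

# Print the uv command line; extra arguments are forwarded to run_examples.py.
build_uv_cmd() {
  local -a extras cmd=(uv run)
  local extra
  read -r -a extras <<<"$EXAMPLES_UV_EXTRAS"   # split the extras list on whitespace
  for extra in "${extras[@]}"; do
    cmd+=(--extra "$extra")
  done
  cmd+=(examples/run_examples.py "$@")
  printf '%s\n' "${cmd[*]}"
}
```

The two expansion forms differ on purpose: `${VAR:=default}` also replaces an empty value, whereas `${VAR-default}` only replaces an unset one. That difference is what lets `EXAMPLES_UV_EXTRAS=""` switch extras off.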
## Log locations
- Main logs: `.tmp/examples-start-logs/main_*.log`
- Per-example logs (from `run_examples.py`): `.tmp/examples-start-logs/<module_path>.log`
- Rerun list: `.tmp/examples-rerun.txt`
- Stdout logs: `.tmp/examples-start-logs/stdout_*.log`
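`tail` and `collect` default to the latest `main_*.log`. Here is a minimal sketch of picking that file. It assumes, without confirmation from the source, that "latest" can be decided from the timestamp embedded in names such as `main_20260113-123000.log`. `latest_main_log` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest main_*.log in a log directory.
# Assumes names embed a sortable YYYYMMDD-HHMMSS timestamp, so the last
# match of the (alphabetically sorted) glob belongs to the newest run.
latest_main_log() {
  local dir="${1:-.tmp/examples-start-logs}"
  local f latest=""
  for f in "$dir"/main_*.log; do
    [ -e "$f" ] || continue   # glob matched nothing and stayed literal
    latest="$f"
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}
```

Used as `tail -f "$(latest_main_log)"`, it follows the newest run. Sorting by name instead of `ls -t` avoids depending on modification times, which copying or touching a log would change.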
## Notes
- The runner delegates to `uv run --extra ... examples/run_examples.py`, which already writes per-example logs and supports `--collect`, `--rerun-file`, and `--print-auto-skip`.
- `start` uses `--write-rerun` so failures are captured automatically.
- If `.tmp/examples-rerun.txt` exists and is non-empty, invoking the skill with no args runs `rerun` by default.
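The no-argument rule in the last note reduces to a file-size test. This is a minimal sketch: `default_action` is a hypothetical name, and falling back to `start` when there is nothing to rerun is an assumption the source does not state.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the no-argument dispatch rule.
# The fallback to `start` is an assumption, not documented behavior.
default_action() {
  local rerun_file="${1:-.tmp/examples-rerun.txt}"
  if [ -s "$rerun_file" ]; then   # -s: file exists and its size is greater than zero
    echo rerun
  else
    echo start
  fi
}
```

Note that `-s` checks byte size only, so a rerun file containing nothing but a newline would still count as non-empty.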
## Behavioral validation (Codex/LLM responsibility)
The runner does not perform any automated behavioral validation. After every foreground `start` or `rerun`, **Codex must manually validate** all exit-0 entries:
1. Read the example source (and comments) to infer intended flow, tools used, and expected key outputs.
2. Open the matching per-example log under `.tmp/examples-start-logs/`.
3. Confirm the intended actions/results occurred; flag omissions or divergences.
4. Do this for **all passed examples**, not just a sample.
5. Report immediately after the run with concise citations to the exact log lines that justify the validation.
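Step 5 asks for citations to exact log lines. A small helper along these lines can print matches as `file:line:text`. `cite_log_lines` is a hypothetical name, and choosing which strings to look for (derived from the example source in step 1) remains a manual judgment. The helper only locates evidence; it does not validate behavior by itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each fixed-string pattern, print matching lines as
# file:line:text, or report on stderr that the pattern is missing from the log.
cite_log_lines() {
  local log="$1"; shift
  local pattern
  for pattern in "$@"; do
    # -H: always print the file name, -n: line numbers, -F: literal match
    grep -Hn -F -- "$pattern" "$log" || echo "MISSING in $log: $pattern" >&2
  done
}
```

A `MISSING` line is a prompt to look closer at that example, not proof of failure, since the expected output may be worded differently in the log.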