vellum-browser-use
$
npx mdskill add vellum-ai/vellum-assistant/vellum-browser-useUse this skill to browse the web. All browser operations are executed through the `assistant browser` CLI, invoked via `bash` or `host_bash`. Each operation is a subcommand:
SKILL.md
.github/skills/vellum-browser-useView on GitHub ↗
---
name: vellum-browser-use
description: Browse the web using `assistant browser` CLI commands
compatibility: "Designed for Vellum personal assistants"
metadata:
emoji: "🌐"
vellum:
category: "browsing"
display-name: "Browser"
activation-hints:
- "Load first if you need to browse the web (navigating, clicking, extracting web content) via `assistant browser` commands"
---
Use this skill to browse the web. All browser operations are executed through the `assistant browser` CLI, invoked via `bash` or `host_bash`. Each operation is a subcommand:
| Command | Description |
| ------------------------------------- | -------------------------------------------------- |
| `assistant browser navigate` | Navigate to a URL |
| `assistant browser snapshot` | List interactive elements on the current page |
| `assistant browser screenshot` | Take a visual screenshot |
| `assistant browser click` | Click an element |
| `assistant browser type` | Type text into an input |
| `assistant browser press-key` | Press a keyboard key |
| `assistant browser scroll` | Scroll the page or a specific element |
| `assistant browser select-option` | Select an option from a native `<select>` element |
| `assistant browser hover` | Hover over an element to reveal menus/tooltips |
| `assistant browser wait-for` | Wait for a condition |
| `assistant browser extract` | Extract page text content |
| `assistant browser wait-for-download` | Wait for a file download to complete |
| `assistant browser fill-credential` | Fill a stored credential into a form field |
| `assistant browser attach` | Attach the Chrome debugger to the active tab |
| `assistant browser detach` | Detach the Chrome debugger from the active tab |
| `assistant browser close` | Close the browser page |
| `assistant browser status` | Diagnose browser backend readiness and setup steps |
## Getting Started — Check Browser Readiness
Before using any browser commands, run `assistant browser --json status` first to check which browser backends are available. The status command returns JSON with readiness information for each backend mode:
```bash
assistant browser --json status
```
The response includes:
- `recommendedMode` — the best available backend (use this)
- `modes[]` — per-mode status with `available`, `summary`, and `userActions` (remediation steps)
## Browser Modes
Use `--browser-mode <mode>` on the `assistant browser` parent command to pin the browser backend:
| Value | Backend | Description |
| ------------- | ---------------- | --------------------------------------------------------------------- |
| `auto` | Automatic | Default. Picks the best available backend based on context. |
| `extension` | Chrome extension | Routes through the user's Chrome browser via the extension debugger. |
| `cdp-inspect` | CDP inspect | Connects to an already-running Chrome instance via DevTools Protocol. |
| `local` | Playwright | Drives a dedicated Playwright-managed Chromium instance. |
```bash
assistant browser --browser-mode extension navigate --url http://www.example.com
```
### Prefer the Chrome Extension
The **Chrome extension** (`extension` mode) is the preferred browser backend. It is:
- More secure than Chrome's native remote debugging
- Uses the user's real browser profile (cookies, sessions, saved logins)
- Best experience for interacting with the user's actual browsing context
If the status check shows the extension is **not available**, encourage the user to install and pair it:
1. Install the **Vellum Assistant Chrome Extension** from the Chrome Web Store: https://chromewebstore.google.com/detail/vellum-assistant-browser/hphbdmpffeigpcdjkckleobjmhhokpne
2. Open the extension in Chrome and pair it with the assistant.
The status response's `userActions` array for the `extension` mode provides these same steps when the extension is not connected.
### Fallback Modes
If the user declines to install the extension:
- **`cdp-inspect`** — Connects to an already-running Chrome instance via DevTools Protocol (Chrome 146+). Requires enabling remote debugging in Chrome settings.
- **`local`** — Drives a dedicated Playwright-managed Chromium instance. Last resort — does not use the user's browser profile.
Only fall back to these if the user explicitly indicates they do not want to install the extension. Prefer `cdp-inspect` over `local`.
## Targeting a Specific Client
When multiple clients support `host_browser` (e.g. two Chrome profiles, a macOS client and a Chrome extension), use `--target-client-id <id>` on the `assistant browser` parent command to pin all operations in the invocation to one specific client:
```bash
assistant browser --target-client-id <client-id> navigate --url https://example.com
```
Obtain client IDs from:
```bash
assistant clients list --capability host_browser
```
Omit `--target-client-id` when only one client is connected — the default interface-preference order (`chrome-extension` first, then `macos`) picks the best available client automatically.
## Session Management
Use `--session <id>` on the `assistant browser` parent command to group sequential operations so they share browser state (same page, cookies, etc.). Different session IDs create independent browser contexts.
```bash
assistant browser --session myflow navigate --url https://example.com
assistant browser --session myflow snapshot
assistant browser --session myflow click --element-id e3
```
Omitting `--session` uses the `default` session.
## Machine-Readable Output
Use `--json` on the `assistant browser` parent command to get structured JSON output suitable for parsing in scripts:
```bash
assistant browser --json navigate --url https://example.com
# {"ok":true,"content":"Page title: Example Domain"}
assistant browser --json snapshot
# {"ok":true,"content":"...element list..."}
assistant browser --json screenshot
# {"ok":true,"content":"...","screenshots":[{"mediaType":"image/jpeg","data":"<base64>"}]}
```
Error responses use `{"ok":false,"error":"..."}`.
## Screenshots
To save a screenshot to disk, use `--output <path>`:
```bash
assistant browser screenshot --output page.jpg
assistant browser screenshot --full-page --output full.jpg
```
To receive base64 screenshot data in JSON output:
```bash
assistant browser --json screenshot
```
The response includes a `screenshots` array with `mediaType` and `data` (base64) fields.
## Typical Workflow
1. `assistant browser --json status` to check backend readiness — if the extension is not available, help the user install it
2. (Optional) `assistant browser attach` to establish the session
3. `assistant browser navigate --url <url>` to load a page
4. `assistant browser snapshot` to discover interactive elements
5. Use `click`, `type`, `press-key`, `scroll`, `select-option`, or `hover` to interact
6. `assistant browser extract` or `assistant browser screenshot --output <path>` to capture results
7. **Always** `assistant browser detach` when you are done — this releases the debugger so the user can browse freely
## Interaction Strategies
**Date pickers / calendars:** Click the date input to open the picker, re-snapshot to see calendar controls, click month navigation arrows to reach the target month, then click the target date. For `<input type="date">`, use `type` with `YYYY-MM-DD` format.
**Native `<select>` elements:** Use `select-option` with `--value`, `--label`, or `--index`. Do not try to click individual `<option>` elements.
**ARIA / custom dropdowns:** Click to open, take a new `snapshot`, then click the desired option by `--element-id`.
**Autocomplete inputs:** Type the search text, wait 500-1000ms (`wait-for --duration`), re-snapshot for suggestions, then click the suggestion or use `press-key --key ArrowDown` + `press-key --key Enter`.
**Multi-step forms:** Complete each step, wait for the next section to load, re-snapshot to discover new elements, then proceed.
**Dynamic content:** After interactions that change the page, use `wait-for` (with `--selector` or `--text`) or re-snapshot to see updated elements before continuing.
**Scrolling:** Use `scroll --direction down` to reveal below-the-fold content before snapshotting. Long pages may require multiple scrolls.
**Hover menus / tooltips:** Use `hover` to reveal hidden menus or tooltips, then re-snapshot to see newly revealed elements.
## Verification
After critical actions (form submission, booking confirmation, checkout), take a screenshot and then read the saved image to visually verify results before reporting success to the user:
```bash
assistant browser screenshot --output /tmp/verify.jpg
```
Then read the saved image to inspect it before reporting success. Use `file_read` if the screenshot was taken via `bash`, or `host_file_read` if it was taken via `host_bash`.