vellum-browser-use

Name: vellum-browser-use
Author: vellum-ai/vellum-assistant

$npx mdskill add vellum-ai/vellum-assistant/vellum-browser-use

Use this skill to browse the web. All browser operations are executed through the `assistant browser` CLI, invoked via `bash` or `host_bash`. Each operation is a subcommand:

SKILL.md

.github/skills/vellum-browser-useView on GitHub ↗

---
name: vellum-browser-use
description: Browse the web using `assistant browser` CLI commands
compatibility: "Designed for Vellum personal assistants"
metadata:
  emoji: "🌐"
  vellum:
    category: "browsing"
    display-name: "Browser"
    activation-hints:
      - "Load first if you need to browse the web (navigating, clicking, extracting web content) via `assistant browser` commands"
---

Use this skill to browse the web. All browser operations are executed through the `assistant browser` CLI, invoked via `bash` or `host_bash`. Each operation is a subcommand:

| Command                               | Description                                        |
| ------------------------------------- | -------------------------------------------------- |
| `assistant browser navigate`          | Navigate to a URL                                  |
| `assistant browser snapshot`          | List interactive elements on the current page      |
| `assistant browser screenshot`        | Take a visual screenshot                           |
| `assistant browser click`             | Click an element                                   |
| `assistant browser type`              | Type text into an input                            |
| `assistant browser press-key`         | Press a keyboard key                               |
| `assistant browser scroll`            | Scroll the page or a specific element              |
| `assistant browser select-option`     | Select an option from a native `<select>` element  |
| `assistant browser hover`             | Hover over an element to reveal menus/tooltips     |
| `assistant browser wait-for`          | Wait for a condition                               |
| `assistant browser extract`           | Extract page text content                          |
| `assistant browser wait-for-download` | Wait for a file download to complete               |
| `assistant browser fill-credential`   | Fill a stored credential into a form field         |
| `assistant browser attach`            | Attach the Chrome debugger to the active tab       |
| `assistant browser detach`            | Detach the Chrome debugger from the active tab     |
| `assistant browser close`             | Close the browser page                             |
| `assistant browser status`            | Diagnose browser backend readiness and setup steps |

## Getting Started — Check Browser Readiness

Before using any browser commands, run `assistant browser --json status` first to check which browser backends are available. The status command returns JSON with readiness information for each backend mode:

```bash
assistant browser --json status
```

The response includes:

- `recommendedMode` — the best available backend (use this)
- `modes[]` — per-mode status with `available`, `summary`, and `userActions` (remediation steps)

## Browser Modes

Use `--browser-mode <mode>` on the `assistant browser` parent command to pin the browser backend:

| Value         | Backend          | Description                                                           |
| ------------- | ---------------- | --------------------------------------------------------------------- |
| `auto`        | Automatic        | Default. Picks the best available backend based on context.           |
| `extension`   | Chrome extension | Routes through the user's Chrome browser via the extension debugger.  |
| `cdp-inspect` | CDP inspect      | Connects to an already-running Chrome instance via DevTools Protocol. |
| `local`       | Playwright       | Drives a dedicated Playwright-managed Chromium instance.              |

```bash
assistant browser --browser-mode extension navigate --url http://www.example.com
```

### Prefer the Chrome Extension

The **Chrome extension** (`extension` mode) is the preferred browser backend. It is:

- More secure than Chrome's native remote debugging
- Uses the user's real browser profile (cookies, sessions, saved logins)
- Best experience for interacting with the user's actual browsing context

If the status check shows the extension is **not available**, encourage the user to install and pair it:

1. Install the **Vellum Assistant Chrome Extension** from the Chrome Web Store: https://chromewebstore.google.com/detail/vellum-assistant-browser/hphbdmpffeigpcdjkckleobjmhhokpne
2. Open the extension in Chrome and pair it with the assistant.

The status response's `userActions` array for the `extension` mode provides these same steps when the extension is not connected.

### Fallback Modes

If the user declines to install the extension:

- **`cdp-inspect`** — Connects to an already-running Chrome instance via DevTools Protocol (Chrome 146+). Requires enabling remote debugging in Chrome settings.
- **`local`** — Drives a dedicated Playwright-managed Chromium instance. Last resort — does not use the user's browser profile.

Only fall back to these if the user explicitly indicates they do not want to install the extension. Prefer `cdp-inspect` over `local`.

## Targeting a Specific Client

When multiple clients support `host_browser` (e.g. two Chrome profiles, a macOS client and a Chrome extension), use `--target-client-id <id>` on the `assistant browser` parent command to pin all operations in the invocation to one specific client:

```bash
assistant browser --target-client-id <client-id> navigate --url https://example.com
```

Obtain client IDs from:

```bash
assistant clients list --capability host_browser
```

Omit `--target-client-id` when only one client is connected — the default interface-preference order (`chrome-extension` first, then `macos`) picks the best available client automatically.

## Session Management

Use `--session <id>` on the `assistant browser` parent command to group sequential operations so they share browser state (same page, cookies, etc.). Different session IDs create independent browser contexts.

```bash
assistant browser --session myflow navigate --url https://example.com
assistant browser --session myflow snapshot
assistant browser --session myflow click --element-id e3
```

Omitting `--session` uses the `default` session.

## Machine-Readable Output

Use `--json` on the `assistant browser` parent command to get structured JSON output suitable for parsing in scripts:

```bash
assistant browser --json navigate --url https://example.com
# {"ok":true,"content":"Page title: Example Domain"}

assistant browser --json snapshot
# {"ok":true,"content":"...element list..."}

assistant browser --json screenshot
# {"ok":true,"content":"...","screenshots":[{"mediaType":"image/jpeg","data":"<base64>"}]}
```

Error responses use `{"ok":false,"error":"..."}`.

## Screenshots

To save a screenshot to disk, use `--output <path>`:

```bash
assistant browser screenshot --output page.jpg
assistant browser screenshot --full-page --output full.jpg
```

To receive base64 screenshot data in JSON output:

```bash
assistant browser --json screenshot
```

The response includes a `screenshots` array with `mediaType` and `data` (base64) fields.

## Typical Workflow

1. `assistant browser --json status` to check backend readiness — if the extension is not available, help the user install it
2. (Optional) `assistant browser attach` to establish the session
3. `assistant browser navigate --url <url>` to load a page
4. `assistant browser snapshot` to discover interactive elements
5. Use `click`, `type`, `press-key`, `scroll`, `select-option`, or `hover` to interact
6. `assistant browser extract` or `assistant browser screenshot --output <path>` to capture results
7. **Always** `assistant browser detach` when you are done — this releases the debugger so the user can browse freely

## Interaction Strategies

**Date pickers / calendars:** Click the date input to open the picker, re-snapshot to see calendar controls, click month navigation arrows to reach the target month, then click the target date. For `<input type="date">`, use `type` with `YYYY-MM-DD` format.

**Native `<select>` elements:** Use `select-option` with `--value`, `--label`, or `--index`. Do not try to click individual `<option>` elements.

**ARIA / custom dropdowns:** Click to open, take a new `snapshot`, then click the desired option by `--element-id`.

**Autocomplete inputs:** Type the search text, wait 500-1000ms (`wait-for --duration`), re-snapshot for suggestions, then click the suggestion or use `press-key --key ArrowDown` + `press-key --key Enter`.

**Multi-step forms:** Complete each step, wait for the next section to load, re-snapshot to discover new elements, then proceed.

**Dynamic content:** After interactions that change the page, use `wait-for` (with `--selector` or `--text`) or re-snapshot to see updated elements before continuing.

**Scrolling:** Use `scroll --direction down` to reveal below-the-fold content before snapshotting. Long pages may require multiple scrolls.

**Hover menus / tooltips:** Use `hover` to reveal hidden menus or tooltips, then re-snapshot to see newly revealed elements.

## Verification

After critical actions (form submission, booking confirmation, checkout), take a screenshot and then read the saved image to visually verify results before reporting success to the user:

```bash
assistant browser screenshot --output /tmp/verify.jpg
```

Then read the saved image to inspect it before reporting success. Use `file_read` if the screenshot was taken via `bash`, or `host_file_read` if it was taken via `host_bash`.

More from vellum-ai/vellum-assistant

Skill	Description
acp	Spawn external coding agents via the Agent Client Protocol (ACP)
amazon	Shop on Amazon and Amazon Fresh through your browser
api-mapping	Record and analyze API surfaces of web services
app-builder	Build and edit small, personal visual tools and artifacts — dashboards, trackers, calculators, data visualizations, charts, simple landing pages, and slide decks the user wants for THEMSELVES. This is the right skill whenever the user asks to "visualize this," "make a chart," or "build an artifact" for their own use, or to edit an app they already built here. Do NOT reach for a ui_show dynamic_page to fake an artifact — build a real persistent app here. NOT for complex, multi-user, or shippable products — those go to a real project folder with a coding agent (see Scope below).
app-control	Drive a specific named macOS app via raw input bypassing the Accessibility tree
assistant-migration	Migrate from ChatGPT, Claude, OpenClaw, Hermes, Manus, and other AI assistants into Vellum by inspecting their data exports, conversation archives, files, prompts, custom instructions, memory, saved memories, tools, GPTs, workflows, integrations, and relationships, then mapping as much as safely possible into Vellum primitives. Handles single-source and multi-source migrations with a unified, deduplicated inventory.
chatgpt-import	Import conversation history from ChatGPT into Vellum
cli-discover	Discover which CLI tools are installed, their versions, and authentication status
computer-use	Control the macOS desktop
contacts	Manage contacts, communication channels, access control, and invite links