web-browser

$ npx mdskill add megalithic/dotfiles/web-browser

Automates web page interactions by connecting to a running browser with authenticated sessions via agent-browser CLI.

  • Enables agents to perform browser tasks without manual intervention, such as accessing logged-in accounts.
  • Integrates with the agent-browser CLI and requires a browser running with remote debugging enabled on port 9222.
  • Decides actions by checking for existing tabs to avoid disrupting user work and opens new ones only when necessary.
  • Presents results through command-line outputs and maintains session continuity by reusing authenticated browser connections.

SKILL.md

.github/skills/web-browser
---
name: web-browser
description: "Interact with web pages using agent-browser CLI. MUST run 'browser connect 9222' FIRST to use existing browser with authenticated sessions."
---

# Web Browser Skill

Browser automation using `agent-browser` CLI connected to your running browser.

## 🚨 MANDATORY FIRST STEP

**EVERY browser session MUST start with:**

```bash
browser connect 9222
```

This connects to your running browser with all authenticated sessions (Asana, Figma, GitHub, etc.).

**WITHOUT THIS STEP:**
- Commands will fail or time out
- You'll get isolated sessions without logins
- The user will have to re-authenticate everywhere

## ⚠️ CRITICAL REQUIREMENTS

### 1. ALWAYS connect to port 9222 FIRST

Before ANY browser operation, you MUST connect to the remote debugging port:

```bash
browser connect 9222
```

This is REQUIRED for accessing authenticated sessions. Without this step, commands will fail or create isolated sessions without your logins.

### 2. NEVER take over existing tabs

When navigating to a URL:
- First check whether a tab for that URL already exists: `browser tab list`
- If found, switch to it: `browser tab <index>`
- If NOT found, open a NEW tab: `browser open <url>`

**NEVER navigate an existing tab to a different URL** - this destroys the user's work/context.

## Correct workflow

```bash
# 1. ALWAYS connect first (required every session)
browser connect 9222

# 2. Check for existing tab
browser tab list

# 3a. If a tab already exists for your URL, switch to it
#     (14 is an example index taken from the `tab list` output)
browser tab 14

# 3b. If no tab exists for your URL, open a NEW tab
browser open https://app.asana.com/...

# 4. Interact
browser snapshot -i
browser click @e5
```

## Check if browser is listening

```bash
lsof -i :9222 -sTCP:LISTEN
```
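
The check above can be wrapped in a small guard that runs before `browser connect 9222`. This is a minimal sketch rather than part of the skill: the function names `port_listening` and `require_debug_port` are hypothetical, and it assumes `lsof` is installed.

```bash
# Hypothetical helpers, not part of agent-browser.

# Return 0 if a process is listening on the given TCP port (default 9222).
port_listening() {
  local port=${1:-9222}
  lsof -i ":$port" -sTCP:LISTEN >/dev/null 2>&1
}

# Print a hint to stderr and return non-zero when the port is not open.
require_debug_port() {
  local port=${1:-9222}
  if ! port_listening "$port"; then
    echo "nothing is listening on port $port; start the browser with --remote-debugging-port=$port" >&2
    return 1
  fi
}
```

One way to use it is `require_debug_port 9222 && browser connect 9222`, so the connect step only runs when the port is actually open.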

## Common commands

After connecting, use standard agent-browser commands:

### Navigation & tabs
```bash
browser tab list                    # List all tabs
browser tab 14                      # Switch to tab by index
browser open https://example.com    # Open URL (NEW tab)
browser back                        # Go back
browser reload                      # Reload page
```

### Inspection
```bash
browser snapshot -i                 # Get interactive elements with @refs
browser screenshot                  # Take screenshot
browser get title                   # Get page title
browser get url                     # Get current URL
browser get text @e1                # Get text of element
```

### Interaction
```bash
browser click @e1                   # Click element
browser fill @e2 "search text"      # Clear and type
browser type @e3 "append text"      # Type without clearing
browser select @e4 "option"         # Select dropdown
browser press Enter                 # Press key
browser scroll down 500             # Scroll
```

### Waiting
```bash
browser wait @e1                    # Wait for element
browser wait 2000                   # Wait milliseconds
```

## Tab targeting by URL

Instead of remembering tab numbers, find tabs by URL:

```bash
browser tab list | rg -i asana
browser tab list | rg -i localhost:4000
```
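
The same filter can drive the switch-or-open decision from the "NEVER take over existing tabs" rule. The sketch below is illustrative only: `tab_action` is a hypothetical helper, it uses plain `grep` instead of `rg`, and it makes no assumption about the column layout of `browser tab list`, so it reports which action applies and leaves picking the index to you.

```bash
# Hypothetical helper, not part of agent-browser.
# Reads tab-list text on stdin and prints "switch" if any line contains
# the pattern (case-insensitive, fixed string), otherwise prints "open".
tab_action() {
  local pattern=$1
  if [ -z "$pattern" ]; then
    echo "usage: tab_action <url-fragment>" >&2
    return 2
  fi
  if grep -qiF -- "$pattern"; then
    echo switch
  else
    echo open
  fi
}
```

For example, `browser tab list | tab_action asana` prints `switch` when an Asana tab is already open, in which case you would use `browser tab <index>`; otherwise it prints `open`, and `browser open <url>` is the right command.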

## Notes

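- Tab indexes are assigned by CDP (Chrome DevTools Protocol) and do not follow the visual tab order in the browser
- `snapshot -i` returns refs such as `@e1` and `@e2`, which the interaction commands take as targets
- After any page change (navigation, clicks), re-run `snapshot -i` so the refs match the current page
- Your browser must be running with `--remote-debugging-port=9222`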
