web-scraper

Name: web-scraper
Author: guia-matthieu/clawfu-skills

$ npx mdskill add guia-matthieu/clawfu-skills/web-scraper

Extracts structured data from websites for competitor research, lead generation, and content audits using BeautifulSoup and requests.

Helps with collecting pricing, product listings, contact information, and monitoring website changes.
Integrates with BeautifulSoup, requests, pandas, click, and lxml for web scraping and data processing.
Uses analysis frameworks to structure data and identify opportunities based on user-defined strategic priorities.
Presents results as usable structured data, such as extracted elements or links, for further agent processing.

SKILL.md

.github/skills/web-scraper

---
name: web-scraper
description: "Extract structured data from websites. Use when: collecting competitor pricing; scraping product listings; extracting contact information; gathering research data; monitoring website changes"
license: MIT
metadata:
  author: ClawFu
  version: 1.0.0
  mcp-server: "@clawfu/mcp-skills"
---

# Web Scraper

> Extract structured data from websites using BeautifulSoup and requests - turn any webpage into usable data.

## When to Use This Skill

- **Competitor research** - Scrape pricing, features, positioning
- **Lead generation** - Extract contact info from directories
- **Content audit** - Pull headings, links, metadata
- **Price monitoring** - Track competitor pricing changes
- **Data collection** - Gather research data from multiple sources


## What Claude Does vs What You Decide

| Claude Does | You Decide |
|-------------|------------|
| Structures analysis frameworks | Strategic priorities |
| Synthesizes market data | Competitive positioning |
| Identifies opportunities | Resource allocation |
| Creates strategic options | Final strategy selection |
| Suggests implementation approaches | Execution decisions |

## Dependencies

```bash
pip install beautifulsoup4 requests pandas click lxml
```

## Commands

### Scrape Elements
```bash
python scripts/main.py scrape https://example.com --selector "h1,h2,p"
python scripts/main.py scrape https://example.com --selector ".product-price"
```

### Extract Links
```bash
python scripts/main.py links https://example.com
python scripts/main.py links https://example.com --internal-only
```
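
The `--internal-only` flag presumably keeps only links on the same host as the scraped page. The implementation in `scripts/main.py` is not shown here, so the following is only a minimal sketch of one way such filtering could work with the standard library. `filter_internal` is a hypothetical helper, and comparing exact hostnames (so subdomains count as external) is an assumption.

```python
from urllib.parse import urljoin, urlparse


def filter_internal(page_url, hrefs):
    """Resolve hrefs against page_url and keep those on the same host.

    Hypothetical helper for illustration; not part of scripts/main.py.
    """
    base_host = urlparse(page_url).netloc.lower()
    internal = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolves relative paths like "/about"
        parsed = urlparse(absolute)
        same_host = parsed.netloc.lower() == base_host
        if parsed.scheme in ("http", "https") and same_host:
            if absolute not in internal:  # keep first-seen order, drop duplicates
                internal.append(absolute)
    return internal


links = filter_internal(
    "https://example.com/blog/",
    ["/about", "post-1", "https://other.example.org/x", "mailto:hi@example.com"],
)
print(links)
```

Resolving each `href` with `urljoin` before comparing hosts matters because most internal links on a page are relative and carry no hostname of their own.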

### Extract Emails
```bash
python scripts/main.py emails https://example.com
python scripts/main.py emails https://example.com --depth 2
```
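
As a rough illustration of what email extraction involves, the sketch below pulls addresses out of page text with a regular expression and removes duplicates. It is an assumption about the general approach, not the skill's actual code: `EMAIL_RE` and `extract_emails` are hypothetical names, the pattern is deliberately simple, and the crawling implied by `--depth` is not covered.

```python
import re

# Deliberately simple pattern: it catches common addresses,
# not every form allowed by the email RFCs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique addresses found in text, lowercased, in first-seen order."""
    seen = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        if address not in seen:
            seen.append(address)
    return seen


sample = (
    'Contact <a href="mailto:Sales@example.com">sales@example.com</a> '
    "or support@example.co.uk."
)
emails = extract_emails(sample)
print(emails)
```

Lowercasing before deduplication avoids counting `Sales@example.com` and `sales@example.com` as two contacts.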

### Extract Structured Data
```bash
python scripts/main.py structured https://example.com/article --schema article
python scripts/main.py structured https://example.com/product --schema product
```

## Examples

### Example 1: Scrape Competitor Pricing
```bash
python scripts/main.py scrape https://competitor.com/pricing --selector ".price,.plan-name"

# Output (plan names and prices shown paired):
# Extracted 6 elements
# 1. Starter - $29/mo
# 2. Pro - $99/mo
# 3. Enterprise - Contact us
```

### Example 2: Extract Article Content
```bash
python scripts/main.py structured https://blog.example.com/post --schema article

# Output: article_data.json
# {
#   "title": "How to Scale Your Startup",
#   "author": "Jane Doe",
#   "date": "2024-01-15",
#   "content": "...",
#   "word_count": 1523
# }
```
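
A record shaped like the example output could be assembled roughly as follows. This is a sketch under stated assumptions: the `Article` dataclass and `build_article` helper are hypothetical, the field values are taken as already extracted, and `word_count` is assumed to be a whitespace-delimited token count, which may differ from how the skill counts words.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Article:
    title: str
    author: str
    date: str  # ISO 8601 date string, as in the example output
    content: str
    word_count: int


def build_article(title, author, date, content):
    """Assemble a record; word_count is a whitespace-delimited token count."""
    return Article(title, author, date, content, len(content.split()))


article = build_article(
    "How to Scale Your Startup",
    "Jane Doe",
    "2024-01-15",
    "Growth starts with focus.",
)
payload = json.dumps(asdict(article), indent=2)
print(payload)
```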

## CSS Selector Reference

| Selector | Description | Example |
|----------|-------------|---------|
| `tag` | Element type | `h1`, `p`, `div` |
| `.class` | Class name | `.price`, `.title` |
| `#id` | Element ID | `#main-content` |
| `tag.class` | Tag with class | `div.product` |
| `tag[attr]` | Has attribute | `a[href]` |
| `parent > child` | Direct child | `ul > li` |
| `tag1, tag2` | Multiple | `h1, h2, h3` |

## Ethical Scraping Guidelines

1. **Check robots.txt** - Respect the site's crawling rules
2. **Rate limit** - Don't overload servers (1-2 requests per second)
3. **Identify yourself** - Use a descriptive User-Agent
4. **Cache responses** - Don't re-scrape unchanged pages
5. **Terms of Service** - Check whether scraping is allowed
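
The first three guidelines can be sketched with the standard library alone. The example below is illustrative and not taken from the skill's code: `USER_AGENT`, `ROBOTS_TXT`, and the `Throttle` class are hypothetical. The robots rules are parsed from an inline string so the example runs offline. In practice they would be loaded from the site's `/robots.txt` with `set_url` and `read`.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-research-bot"  # hypothetical; use a name that identifies you

# Example rules, parsed offline for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


class Throttle:
    """Sleep as needed so calls are at least min_interval seconds apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle(min_interval=0.5)  # at most about 2 requests per second
for url in ["https://example.com/pricing", "https://example.com/private/report"]:
    if not robots.can_fetch(USER_AGENT, url):
        print("skipping (disallowed):", url)
        continue
    throttle.wait()
    print("ok to fetch:", url)  # the actual HTTP request would go here
```

Checking `can_fetch` before waiting means disallowed URLs never consume a slot in the request budget.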

## Skill Boundaries

### What This Skill Does Well
- Structuring strategic analysis
- Identifying market opportunities
- Creating strategic frameworks
- Synthesizing competitive data

### What This Skill Cannot Do
- Replace market research
- Guarantee strategic success
- Know proprietary competitor info
- Make executive decisions

## Related Skills

- [competitor-monitor](../competitor-monitor/) - Monitor competitor changes
- [pdf-extractor](../pdf-extractor/) - Extract from PDFs

## Skill Metadata


- **Mode**: centaur
```yaml
category: automation
subcategory: data-extraction
dependencies: [beautifulsoup4, requests, pandas, click, lxml]
difficulty: intermediate
time_saved: 5+ hours/week
```
