supadata

Name: supadata
Author: vm0-ai/vm0-skills

$npx mdskill add vm0-ai/vm0-skills/supadata

Fetch YouTube transcripts and web data via Supadata API.

Extract video content and channel statistics from YouTube.
Integrates with Supadata API using x-api-key authentication.
Accepts video URLs and text preferences to customize output.
Returns plain text or timestamped JSON chunks for analysis.

SKILL.md

.github/skills/supadataView on GitHub ↗

---
name: supadata
description: Supadata API for YouTube/web data. Use when user mentions "Supadata",
  "YouTube data", "channel stats", or web scraping data.
---

## Troubleshooting

If requests fail, run `zero doctor check-connector --env-name SUPADATA_TOKEN` or `zero doctor check-connector --url https://api.supadata.ai/v1/transcript --method POST`

## How to Use

All examples below assume you have `SUPADATA_TOKEN` set.

The base URL for the API is:

- `https://api.supadata.ai/v1`

Authentication uses the `x-api-key` header.

### 1. Get YouTube Video Transcript

Extract transcript from a YouTube video:

Write to `/tmp/supadata_url.txt`:

```
https://www.youtube.com/watch?v=dQw4w9WgXcQ
```

```bash
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"
```

**Parameters:**

- `url`: Video URL (required)
- `text`: Return plain text (`true`) or timestamped chunks (`false`, default)
- `lang`: Preferred language (ISO 639-1 code, e.g., `en`, `zh`)
- `mode`: `native` (existing only), `generate` (AI), `auto` (default)

### 2. Get Transcript with Timestamps

Get transcript with timing information:

```bash
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=false" | jq '.content[:3]'
```

Response format:
```json
{
  "content": [
  {"text": "Hello", "offset": 0, "duration": 1500, "lang": "en"}
  ],
  "lang": "en",
  "availableLangs": ["en", "es", "zh"]
}
```

### 3. Get TikTok/Instagram/X Transcript

Extract transcript from other platforms:

```bash
# TikTok
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"

# Instagram Reel
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"
```

Supported platforms: YouTube, TikTok, Instagram, X (Twitter), Facebook

### 4. Native Transcript Only (Save Credits)

Fetch only existing transcripts without AI generation:

```bash
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true" -d "mode=native"
```

Use `mode=native` to avoid AI generation costs (1 credit vs 2 credits/min).

### 5. Get YouTube Channel Metadata

Get channel information:

```bash
curl -s "https://api.supadata.ai/v1/youtube/channel" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "id=@mkbhd" | jq '{name, subscriberCount, videoCount}
```

Accepts channel URL, channel ID, or handle (e.g., `@mkbhd`).

### 6. Get YouTube Video Metadata

Get video information:

```bash
curl -s "https://api.supadata.ai/v1/youtube/video" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" | jq '{title, viewCount, likeCount, duration}
```

### 7. Get Social Media Metadata

Get metadata from any supported platform:

```bash
curl -s "https://api.supadata.ai/v1/metadata" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt"
```

Works with YouTube, TikTok, Instagram, X, Facebook posts.

### 8. Scrape Web Page to Markdown

Extract web page content:

```bash
curl -s "https://api.supadata.ai/v1/web/scrape" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt"
```

Returns page content in Markdown format, ideal for AI processing.

### 9. Map Website Links

Get all links from a website:

```bash
curl -s "https://api.supadata.ai/v1/web/map" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" | jq '.urls[:10]'
```

### 10. Crawl Website (Async)

Start a crawl job for multiple pages.

Write to `/tmp/supadata_request.json`:

```json
{
  "url": "https://example.com",
  "maxPages": 10
}
```

Then run:

```bash
# Start crawl
JOB_ID="$(curl -s "https://api.supadata.ai/v1/web/crawl" -X POST -H "x-api-key: $SUPADATA_TOKEN" -H "Content-Type: application/json" -d @/tmp/supadata_request.json | jq -r '.jobId')"

echo "Job ID: ${JOB_ID}"

# Check status
curl -s "https://api.supadata.ai/v1/web/crawl/<your-job-id>" -H "x-api-key: $SUPADATA_TOKEN" | jq '{status, pagesCompleted}'
```

Status values: `queued`, `active`, `completed`, `failed`

### 11. Translate Transcript

Translate a YouTube transcript to another language:

```bash
curl -s "https://api.supadata.ai/v1/youtube/transcript/translate" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "lang=zh" -d "text=true"
```

## Response Handling

**Synchronous (HTTP 200):** Direct result returned.

**Asynchronous (HTTP 202):** Returns `jobId` for polling:
```json
{"jobId": "abc123"}
```

Poll the job endpoint until status is `completed`.

## Guidelines

1. **Use `mode=native` to save credits**: Only fetches existing transcripts
2. **URL encode parameters**: Use `--data-urlencode` for URLs
3. **Check available languages**: Response includes `availableLangs` array
4. **Handle async responses**: Some requests return job IDs for polling
5. **Max file size**: 1GB for direct file URLs
6. **Supported formats**: MP4, WEBM, MP3, FLAC, MPEG, M4A, OGG, WAV

More from vm0-ai/vm0-skills

Skill	Description
account-reconciliation	Perform account reconciliations comparing general ledger balances against subledgers, bank statements, or external records. Use for bank reconciliation, GL-to-subledger reconciliation, intercompany reconciliation, balance sheet reconciliation, reconciling item analysis, outstanding item aging, or clearing open items.
agentphone	Build AI phone agents with AgentPhone API. Use when the user wants to make phone calls, send/receive SMS, manage phone numbers, create voice agents, set up webhooks, or check usage — anything related to telephony, phone numbers, or voice AI.
ahrefs	Ahrefs SEO API for backlink and keyword analysis. Use when user mentions
amplitude	Amplitude product analytics API. Use when user mentions "Amplitude",
analysis-qa	Quality-check a data analysis before sharing — verify joins, aggregations, denominators, time ranges, and metric definitions. Detect pitfalls like survivorship bias, average-of-averages, join explosion, timezone mismatches, incomplete periods, and selection bias. Includes documentation templates for reproducible analyses.
anthropic-managed-agents	Anthropic Managed Agents API for programmatically creating, running, and streaming AI agents on Anthropic's cloud infrastructure. Use when the user mentions "Managed Agents", "Anthropic agent sessions", or needs to create/run/stream an Anthropic agent with tool use (bash, git, web), attach GitHub repositories, or inject secrets via Vault. Do NOT use for standard Claude Messages API — use the Claude API skill instead.
apify	Apify web scraping platform. Use when user mentions "scrape website",
asana	Asana API for tasks and projects. Use when user mentions "Asana", "asana.com",
atlassian	Atlassian API for Confluence and Jira. Use when user mentions "Confluence
attio	Attio REST API for AI-native CRM operations — manage companies, people, deals, and custom objects, plus notes, tasks, lists, and comments. Use when the user mentions "Attio", "CRM record", "create company", "add person", "list entry", "CRM note", or "CRM task".