supadata
$ npx mdskill add vm0-ai/vm0-skills/supadata
Fetch YouTube transcripts and web data via the Supadata API.
- Extract video content and channel statistics from YouTube.
- Integrates with Supadata API using x-api-key authentication.
- Accepts video URLs and text preferences to customize output.
- Returns plain text or timestamped JSON chunks for analysis.
SKILL.md
.github/skills/supadata
---
name: supadata
description: Supadata API for YouTube/web data. Use when user mentions "Supadata",
"YouTube data", "channel stats", or web scraping data.
---
## Troubleshooting
If requests fail, run `zero doctor check-connector --env-name SUPADATA_TOKEN` or `zero doctor check-connector --url https://api.supadata.ai/v1/transcript --method POST`
## How to Use
All examples below assume you have `SUPADATA_TOKEN` set.
The base URL for the API is:
- `https://api.supadata.ai/v1`
Authentication uses the `x-api-key` header.
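Because every request shares the same base URL and header, the GET calls in this document could be wrapped in a small helper. This is a minimal sketch, not part of the Supadata API: `supadata_get`, `SUPADATA_BASE_URL`, and `SUPADATA_CURL` are hypothetical names introduced here for illustration.

```bash
# Hypothetical helper: GET a Supadata endpoint with the x-api-key header.
# SUPADATA_BASE_URL and SUPADATA_CURL are illustrative names, not API settings.
SUPADATA_BASE_URL="https://api.supadata.ai/v1"

supadata_get() {
  local path="$1"
  shift
  if [ -z "${SUPADATA_TOKEN:-}" ]; then
    echo "SUPADATA_TOKEN is not set" >&2
    return 1
  fi
  # -G sends any -d / --data-urlencode arguments as query parameters.
  "${SUPADATA_CURL:-curl}" -s "${SUPADATA_BASE_URL}${path}" \
    -H "x-api-key: ${SUPADATA_TOKEN}" -G "$@"
}
```

For example, `supadata_get /transcript --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"` would issue the same request as the first transcript example.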
### 1. Get YouTube Video Transcript
Extract the transcript from a YouTube video.
Write the video URL to `/tmp/supadata_url.txt`:
```
https://www.youtube.com/watch?v=dQw4w9WgXcQ
```
```bash
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"
```
**Parameters:**
- `url`: Video URL (required)
- `text`: Return plain text (`true`) or timestamped chunks (`false`, default)
- `lang`: Preferred language (ISO 639-1 code, e.g., `en`, `zh`)
- `mode`: `native` (existing only), `generate` (AI), `auto` (default)
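One detail worth checking when creating `/tmp/supadata_url.txt`: curl's `--data-urlencode "name@file"` form loads the file including any newlines, so a trailing newline left by an editor or `echo` is encoded as `%0A` and sent as part of the `url` value. Whether the API tolerates that is not stated here, so the sketch below simply avoids it. `write_url_file` is a hypothetical helper name.

```bash
# Hypothetical helper: write a URL to a file with no trailing newline,
# so that --data-urlencode "url@FILE" encodes exactly the URL and nothing more.
write_url_file() {
  local url="$1"
  local file="${2:-/tmp/supadata_url.txt}"
  # printf '%s' writes the string as-is, without appending a newline.
  printf '%s' "$url" > "$file"
}
```

Calling `write_url_file "https://www.youtube.com/watch?v=dQw4w9WgXcQ"` writes to the default path used by the examples.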
### 2. Get Transcript with Timestamps
Get transcript with timing information:
```bash
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=false" | jq '.content[:3]'
```
Response format:
```json
{
"content": [
{"text": "Hello", "offset": 0, "duration": 1500, "lang": "en"}
],
"lang": "en",
"availableLangs": ["en", "es", "zh"]
}
```
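The `offset` and `duration` values in the sample look like milliseconds (a duration of 1500 for a single word), although the unit is not stated here. Assuming milliseconds, a chunk offset could be turned into a readable `MM:SS` timestamp as sketched below. `format_offset_ms` is a hypothetical helper name.

```bash
# Hypothetical helper: convert a chunk offset to MM:SS.
# Assumption: the offset is expressed in milliseconds.
format_offset_ms() {
  local ms="$1"
  local total_seconds=$(( ms / 1000 ))
  # Minutes are not wrapped at 60, so long videos print values like 62:05.
  printf '%02d:%02d\n' $(( total_seconds / 60 )) $(( total_seconds % 60 ))
}
```

With this helper, `format_offset_ms 61500` prints `01:01`.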
### 3. Get TikTok/Instagram/X Transcript
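Extract a transcript from other platforms. The request is the same as for YouTube; only the URL written to `/tmp/supadata_url.txt` changes:
```bash
# TikTok: write the TikTok video URL to /tmp/supadata_url.txt first
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"
# Instagram Reel: write the Reel URL to /tmp/supadata_url.txt first
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"
```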
Supported platforms: YouTube, TikTok, Instagram, X (Twitter), Facebook
### 4. Native Transcript Only (Save Credits)
Fetch only existing transcripts without AI generation:
```bash
curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true" -d "mode=native"
```
Use `mode=native` to avoid AI generation costs (1 credit for a native transcript versus 2 credits per minute for AI generation).
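The trade-off can be made concrete with a rough estimate. This sketch uses the figures quoted above and assumes that partial minutes are rounded up, which is not confirmed here. `estimate_credits` is a hypothetical helper, not an API call.

```bash
# Hypothetical helper: rough credit estimate for one transcript request.
# Usage: estimate_credits MODE DURATION_SECONDS
estimate_credits() {
  local mode="$1"
  local duration_seconds="$2"
  local minutes
  case "$mode" in
    native)
      # A native transcript costs a flat 1 credit.
      echo 1
      ;;
    generate)
      # Assumption: partial minutes are billed as whole minutes.
      minutes=$(( (duration_seconds + 59) / 60 ))
      echo $(( minutes * 2 ))
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}
```

Under these assumptions a 10-minute video costs 1 credit with `native` and 20 credits with `generate`. The `auto` mode is left out because its cost depends on whether a native transcript exists.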
### 5. Get YouTube Channel Metadata
Get channel information:
```bash
curl -s "https://api.supadata.ai/v1/youtube/channel" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "id=@mkbhd" | jq '{name, subscriberCount, videoCount}'
```
Accepts channel URL, channel ID, or handle (e.g., `@mkbhd`).
### 6. Get YouTube Video Metadata
Get video information:
```bash
curl -s "https://api.supadata.ai/v1/youtube/video" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" | jq '{title, viewCount, likeCount, duration}'
```
### 7. Get Social Media Metadata
Get metadata from any supported platform:
```bash
curl -s "https://api.supadata.ai/v1/metadata" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt"
```
Works with YouTube, TikTok, Instagram, X, and Facebook posts.
### 8. Scrape Web Page to Markdown
Extract web page content:
```bash
curl -s "https://api.supadata.ai/v1/web/scrape" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt"
```
Returns page content in Markdown format, ideal for AI processing.
### 9. Map Website Links
Get all links from a website:
```bash
curl -s "https://api.supadata.ai/v1/web/map" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" | jq '.urls[:10]'
```
### 10. Crawl Website (Async)
Start a crawl job for multiple pages.
Write to `/tmp/supadata_request.json`:
```json
{
"url": "https://example.com",
"maxPages": 10
}
```
Then run:
```bash
# Start crawl
JOB_ID="$(curl -s "https://api.supadata.ai/v1/web/crawl" -X POST -H "x-api-key: $SUPADATA_TOKEN" -H "Content-Type: application/json" -d @/tmp/supadata_request.json | jq -r '.jobId')"
echo "Job ID: ${JOB_ID}"
# Check status (uses the JOB_ID captured above)
curl -s "https://api.supadata.ai/v1/web/crawl/${JOB_ID}" -H "x-api-key: $SUPADATA_TOKEN" | jq '{status, pagesCompleted}'
```
Status values: `queued`, `active`, `completed`, `failed`
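The status check can be repeated until the job reaches a terminal state. The loop below is a minimal sketch: `fetch_crawl_status` and `wait_for_crawl` are hypothetical helper names, the 5-second interval and 60-attempt cap are arbitrary defaults, and the `.status` field is taken from the `jq` filter used in the status check.

```bash
# Hypothetical helper: print the current status of a crawl job.
fetch_crawl_status() {
  curl -s "https://api.supadata.ai/v1/web/crawl/$1" \
    -H "x-api-key: $SUPADATA_TOKEN" | jq -r '.status'
}

# Poll until the job is completed or failed, or until attempts run out.
# Usage: wait_for_crawl JOB_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
wait_for_crawl() {
  local job_id="$1"
  local interval="${2:-5}"
  local max_attempts="${3:-60}"
  local attempt=0
  local status
  while [ "$attempt" -lt "$max_attempts" ]; do
    status="$(fetch_crawl_status "$job_id")"
    case "$status" in
      completed) echo "$status"; return 0 ;;
      failed)    echo "$status"; return 1 ;;
    esac
    # queued or active: wait and try again
    attempt=$(( attempt + 1 ))
    sleep "$interval"
  done
  echo "timeout"
  return 2
}
```

After starting a crawl, `wait_for_crawl "${JOB_ID}"` blocks until the job finishes and returns a non-zero exit code on failure or timeout.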
### 11. Translate Transcript
Translate a YouTube transcript to another language:
```bash
curl -s "https://api.supadata.ai/v1/youtube/transcript/translate" -H "x-api-key: $SUPADATA_TOKEN" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "lang=zh" -d "text=true"
```
## Response Handling
**Synchronous (HTTP 200):** Direct result returned.
**Asynchronous (HTTP 202):** Returns `jobId` for polling:
```json
{"jobId": "abc123"}
```
Poll the job endpoint until status is `completed`.
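To tell the two cases apart in a script, the HTTP status code can be captured with curl's `-w '%{http_code}'` option while the body is written to a file with `-o`. A minimal sketch follows. `classify_response` and `supadata_request` are hypothetical helper names, and treating every other status code as an error is a simplification.

```bash
# Hypothetical helper: map an HTTP status code to a handling strategy.
classify_response() {
  case "$1" in
    200) echo "sync" ;;    # result is in the response body
    202) echo "async" ;;   # body contains a jobId to poll
    *)   echo "error" ;;
  esac
}

# Hypothetical wrapper: save the body to BODY_FILE and print the strategy.
# Usage: supadata_request BODY_FILE [curl arguments...]
supadata_request() {
  local body_file="$1"
  shift
  local code
  code="$(curl -s -o "$body_file" -w '%{http_code}' \
    -H "x-api-key: $SUPADATA_TOKEN" "$@")"
  classify_response "$code"
}
```

A caller could then read `jobId` from the saved body with `jq -r '.jobId'` when the printed value is `async`.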
## Guidelines
1. **Use `mode=native` to save credits**: Only fetches existing transcripts
2. **URL encode parameters**: Use `--data-urlencode` for URLs
3. **Check available languages**: Response includes `availableLangs` array
4. **Handle async responses**: Some requests return job IDs for polling
5. **Max file size**: 1GB for direct file URLs
6. **Supported formats**: MP4, WEBM, MP3, FLAC, MPEG, M4A, OGG, WAV
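A direct file URL could be checked against the supported formats before spending a request. This sketch looks only at the file extension, which is an assumption about how a URL maps to a format: query strings and fragments are stripped, and the actual content type is not inspected. `is_supported_format` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if the URL's file extension is a supported format.
is_supported_format() {
  local url="$1"
  local path="${url%%[?#]*}"   # drop the query string and fragment
  local ext="${path##*.}"      # keep the text after the last dot
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp4|webm|mp3|flac|mpeg|m4a|ogg|wav) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_format "https://example.com/talk.MP3?download=1"` succeeds, while a `.pdf` URL fails.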