transcribe

Name: transcribe
Author: vellum-ai/vellum-assistant

$ npx mdskill add vellum-ai/vellum-assistant/transcribe

Transcribe audio and video files using the configured speech-to-text (STT) provider. Multiple STT providers are supported, including OpenAI Whisper, Deepgram, and Google Gemini. The active provider is selected in Settings under Speech-to-Text (`services.stt`).

SKILL.md

.github/skills/transcribe

---
name: transcribe
description: Transcribe audio and video files using the configured speech-to-text provider
compatibility: "Designed for Vellum personal assistants"
metadata:
  emoji: "🎙️"
  vellum:
    display-name: "Transcribe"
    activation-hints:
      - "User has an audio or video file on disk they want converted to text"
      - "User wants speech-to-text on a recording, voice memo, podcast, or meeting capture"
      - "User asks for a transcript of a media file (mp3, wav, m4a, mp4, mov, etc.)"
---

## Usage Notes

- The tool accepts a `file_path` (absolute path to a local audio or video file) to transcribe.
- Supported formats: any video (mp4, mov, etc.) or audio (mp3, wav, m4a, etc.) file.
- For video files, audio is automatically extracted via ffmpeg before transcription.
- Large files are automatically split into chunks for processing.
- If no STT provider credentials are configured, the tool will return an error with setup instructions.
- The STT provider (`services.stt`) is shared between transcription and telephony call paths.
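
The notes above describe three steps: checking the input path, extracting audio from video with ffmpeg, and splitting large files into chunks. The following is a minimal Python sketch of that flow, not the skill's actual implementation. The function names, the extension sets, the 16 kHz mono output, and the 600-second chunk length are all assumptions made for illustration; only the `ffmpeg` flags (`-y`, `-i`, `-vn`, `-ac`, `-ar`) are standard ffmpeg options.

```python
from pathlib import PurePosixPath

# Hypothetical extension sets; the skill's real list is not documented here.
VIDEO_EXTS = {".mp4", ".mov"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}


def classify_media(file_path: str) -> str:
    """Return "video" or "audio" for an absolute POSIX path, else raise ValueError."""
    path = PurePosixPath(file_path)  # assumes macOS/Linux-style paths
    if not path.is_absolute():
        raise ValueError("file_path must be an absolute path")
    ext = path.suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    raise ValueError(f"unrecognized media extension: {ext!r}")


def build_extract_command(video_path: str, audio_out: str) -> list[str]:
    """Build an ffmpeg argv that drops the video stream and writes mono 16 kHz audio."""
    # -y overwrites the output, -vn disables video, -ac/-ar set channels and sample rate.
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_out]


def plan_chunks(duration_s: float, chunk_s: float = 600.0) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most chunk_s seconds."""
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks
```

A caller might run `classify_media(file_path)` first, pass the result of `build_extract_command(...)` to `subprocess.run` only when the file is a video, and then transcribe each window returned by `plan_chunks(...)` in order. Keeping the command builder and the chunk planner as pure functions makes them testable without ffmpeg installed.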

## Maintenance

When adding or modifying an STT provider, follow the onboarding checklist at `assistant/docs/stt-provider-onboarding.md`. That document covers the daemon catalog, config schema, adapter wiring, client catalog parity, and required tests.
