speechbrain

Name: speechbrain
Author: mkurman/zorai

$ npx mdskill add mkurman/zorai/speechbrain

Transcribe, analyze, and synthesize speech using PyTorch models.

Convert audio to text, identify speakers, and generate speech.
Depends on PyTorch and pre-trained SpeechBrain models.
Executes tasks via recipe-based training and inference APIs.
Returns transcripts, scores, and synthesized audio outputs.

SKILL.md

.github/skills/speechbrain

---
name: speechbrain
description: "SpeechBrain — PyTorch speech toolkit. ASR, speaker recognition, speech separation, diarization, enhancement, language identification, and TTS. Recipe-based training with pre-trained model zoo."
tags: [speech-recognition, speaker-diarization, speaker-verification, speech-embeddings, speechbrain]
---
## Overview

SpeechBrain is an open-source PyTorch toolkit for speech processing. It covers ASR (speech-to-text), speaker recognition, speech separation, diarization, enhancement, language identification, emotion recognition, and text-to-speech, and it provides both pretrained models and recipe-based training.

## Installation

```bash
uv pip install speechbrain
```
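
To confirm that the install succeeded, one option is to query the package metadata from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of SpeechBrain.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report whether speechbrain is visible to the current interpreter.
version = installed_version("speechbrain")
print(f"speechbrain version: {version or 'not installed'}")
```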

## Speech Recognition

```python
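from speechbrain.inference.ASR import EncoderDecoderASR

# Download the pretrained model on first use and cache it under savedir.
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr",
)
# transcribe_file loads the audio file and returns the decoded text.
transcript = asr_model.transcribe_file("audio.wav")
print(f"Transcript: {transcript}")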
```
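
For more than one recording, the single-file call above can be wrapped in a small loop. The sketch below is one possible way to organize that, not a SpeechBrain API: `transcribe_directory` is a hypothetical helper that accepts any callable mapping a file path to text, so `asr_model.transcribe_file` can be passed in directly.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_directory(
    folder: str,
    transcribe: Callable[[str], str],
    pattern: str = "*.wav",
) -> Dict[str, str]:
    """Apply `transcribe` to every file in `folder` matching `pattern`.

    Returns a dict mapping file name to transcript, in sorted file-name order.
    """
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).glob(pattern)):
        # The callable receives a plain string path, as in the single-file call.
        results[path.name] = transcribe(str(path))
    return results
```

With the model loaded as in the snippet above, `transcribe_directory("recordings", asr_model.transcribe_file)` would return one transcript per `.wav` file. Here `recordings` is a placeholder directory name.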

## Speaker Verification

```python
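from speechbrain.inference.speaker import SpeakerRecognition

# Download the pretrained ECAPA-TDNN verification model and cache it under savedir.
verification = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec",
)
# verify_files returns single-element tensors: a similarity score and a boolean decision.
score, prediction = verification.verify_files("speaker1.wav", "speaker2.wav")
# Convert both to Python scalars so the float format spec applies.
print(f"Same speaker: {prediction.item()} (score: {score.item():.3f})")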
```
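
The `score` returned above measures how similar the two speaker embeddings are, and `prediction` is that score compared against a decision threshold. The sketch below illustrates the idea in plain Python with toy vectors. It assumes cosine similarity and a threshold of 0.25, and the function names are hypothetical. It is not SpeechBrain's implementation; real embeddings come from the pretrained model.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(
    a: Sequence[float],
    b: Sequence[float],
    threshold: float = 0.25,  # assumed value, for illustration only
) -> Tuple[float, bool]:
    """Return (score, same_speaker), where same_speaker is score > threshold."""
    score = cosine_similarity(a, b)
    return score, score > threshold
```

Identical directions give a score of 1.0 and orthogonal ones give 0.0, so raising the threshold makes the decision stricter and trades false accepts for false rejects.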

## References
- [SpeechBrain docs](https://speechbrain.github.io/)
- [SpeechBrain GitHub](https://github.com/speechbrain/speechbrain)
