pydub-automation

$npx mdskill add guia-matthieu/clawfu-skills/pydub-automation

Automates batch audio processing tasks like format conversion, normalization, and assembly using PyDub for scalable content production.

  • Helps with repetitive audio tasks for large file batches, such as converting formats and normalizing loudness.
  • Integrates with the PyDub library, which abstracts FFmpeg for accessible audio operations in Python.
  • Decisions are based on predefined batch processing needs for consistent and efficient audio handling.
  • Delivers results through automated execution, enabling audio pipelines that save time in content workflows.

SKILL.md

.github/skills/pydub-automationView on GitHub ↗
---
name: pydub-automation
description: "Automate repetitive audio tasks with Python using PyDub for batch processing, format conversion, normalization, and content assembly. Use when: Processing large numbers of audio files consistently; Converting between audio formats at scale; Normalizing loudness across a batch of files; Assembling intros/outros automatically to episodes; Trimming silence or extracting segments programmatically"
license: MIT
metadata:
  author: ClawFu
  version: 1.0.0
  mcp-server: "@clawfu/mcp-skills"
---

# PyDub Audio Automation

> Automate repetitive audio tasks with Python using PyDub for batch processing, format conversion, normalization, and content assembly.

## When to Use This Skill

- Processing large numbers of audio files consistently
- Converting between audio formats at scale
- Normalizing loudness across a batch of files
- Assembling intros/outros automatically to episodes
- Trimming silence or extracting segments programmatically
- Building audio pipelines for content production

## Methodology Foundation

**Source**: PyDub Library (James Robert) + Python Audio Processing

**Core Principle**: "Audio operations that take hours manually can run in minutes with code." PyDub provides a high-level interface that abstracts FFmpeg's complexity, making common operations accessible to non-audio engineers.

**Why This Matters**: Content teams producing regular podcasts, courses, or video content spend significant time on repetitive audio tasks. Automation enables consistent quality at scale while freeing humans for creative work.


## What Claude Does vs What You Decide

| Claude Does | You Decide |
|-------------|------------|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |

## What This Skill Does

1. **Batch processes audio files** - Apply same operations to hundreds of files
2. **Converts formats** - MP3, WAV, FLAC, OGG, and more
3. **Normalizes loudness** - Consistent levels across episodes
4. **Assembles content** - Concatenate intros, content, outros
5. **Extracts segments** - Trim, split, and slice audio programmatically

## How to Use

### Generate Processing Script
```
Help me write a PyDub script to [describe task].
Input files: [format, location]
Output requirements: [format, specs]
```

### Create Batch Workflow
```
Create a Python script that processes all audio files in a folder:
- Input: [source folder, file type]
- Operations: [what to do]
- Output: [destination, naming convention]
```

### Debug Audio Script
```
This PyDub script isn't working as expected:
[paste code]
Expected: [what you want]
Actual: [what's happening]
```

## Instructions

When automating audio with PyDub, follow this methodology:

### Step 1: Setup and Prerequisites

```python
## Installation

# Install PyDub
pip install pydub

# FFmpeg is required (PyDub uses it under the hood)
# macOS:
brew install ffmpeg

# Ubuntu/Debian:
sudo apt-get install ffmpeg

# Windows:
# Download from ffmpeg.org, add to PATH
```

```python
## Basic Imports

from pydub import AudioSegment
from pydub.effects import normalize, compress_dynamic_range
from pydub.silence import detect_silence, split_on_silence
import os
from pathlib import Path
```

---

### Step 2: Core Operations

```python
## Loading and Saving Audio

# Load audio file (format auto-detected from extension)
audio = AudioSegment.from_file("input.mp3")
audio = AudioSegment.from_file("input.wav", format="wav")

# Save audio file
audio.export("output.mp3", format="mp3", bitrate="192k")
audio.export("output.wav", format="wav")

# Export with metadata
audio.export(
    "output.mp3",
    format="mp3",
    bitrate="192k",
    tags={"artist": "Brand Name", "album": "Podcast"}
)
```

```python
## Basic Properties

print(f"Duration: {len(audio)} ms")
print(f"Channels: {audio.channels}")
print(f"Frame rate: {audio.frame_rate} Hz")
print(f"Sample width: {audio.sample_width} bytes")
print(f"dBFS: {audio.dBFS}")  # Volume level
```

---

### Step 3: Volume and Normalization

```python
## Volume Adjustments

# Increase volume by 6 dB
louder = audio + 6

# Decrease volume by 3 dB
quieter = audio - 3

# Normalize to target level (0 dB = maximum)
normalized = normalize(audio)

# Normalize to specific headroom
def normalize_to_target(audio, target_dBFS=-16):
    """Normalize audio to target loudness."""
    change_in_dBFS = target_dBFS - audio.dBFS
    return audio.apply_gain(change_in_dBFS)

normalized = normalize_to_target(audio, target_dBFS=-16)
```

```python
## Batch Normalization

def normalize_folder(input_dir, output_dir, target_dBFS=-16):
    """Normalize all audio files in a folder."""
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(exist_ok=True)

    for file in input_path.glob("*.mp3"):
        audio = AudioSegment.from_file(file)
        normalized = normalize_to_target(audio, target_dBFS)

        output_file = output_path / file.name
        normalized.export(output_file, format="mp3", bitrate="192k")
        print(f"Processed: {file.name}")

# Usage
normalize_folder("raw_episodes/", "processed_episodes/", target_dBFS=-16)
```

---

### Step 4: Concatenation and Assembly

```python
## Basic Concatenation

intro = AudioSegment.from_file("intro.mp3")
content = AudioSegment.from_file("episode.mp3")
outro = AudioSegment.from_file("outro.mp3")

# Concatenate (+ operator)
full_episode = intro + content + outro

# Add silence between segments
silence = AudioSegment.silent(duration=2000)  # 2 seconds
full_episode = intro + silence + content + silence + outro

full_episode.export("final_episode.mp3", format="mp3")
```

```python
## Podcast Assembly Script

def assemble_episode(
    content_file,
    intro_file="assets/intro.mp3",
    outro_file="assets/outro.mp3",
    output_file=None,
    intro_fade_ms=500,
    outro_fade_ms=500
):
    """
    Assemble podcast episode with intro and outro.
    Includes crossfade for professional sound.
    """
    intro = AudioSegment.from_file(intro_file)
    content = AudioSegment.from_file(content_file)
    outro = AudioSegment.from_file(outro_file)

    # Apply fade out to intro, fade in to content
    intro = intro.fade_out(intro_fade_ms)
    content = content.fade_in(intro_fade_ms).fade_out(outro_fade_ms)
    outro = outro.fade_in(outro_fade_ms)

    # Crossfade join
    episode = intro.append(content, crossfade=intro_fade_ms)
    episode = episode.append(outro, crossfade=outro_fade_ms)

    # Generate output filename if not provided
    if output_file is None:
        output_file = content_file.replace(".mp3", "_final.mp3")

    episode.export(output_file, format="mp3", bitrate="192k")
    print(f"Assembled: {output_file} ({len(episode)/1000:.1f}s)")
    return output_file

# Usage
assemble_episode("episode_042_raw.mp3")
```

---

### Step 5: Trimming and Splitting

```python
## Time-Based Trimming

# Extract segment (milliseconds)
# audio[start:end]
first_30_seconds = audio[:30000]
last_minute = audio[-60000:]
middle_section = audio[60000:120000]

# Remove first 5 seconds (skip intro)
without_intro = audio[5000:]
```

```python
## Silence-Based Operations

from pydub.silence import detect_silence, split_on_silence

# Detect silence regions
# Returns list of [start, end] in milliseconds
silence_ranges = detect_silence(
    audio,
    min_silence_len=1000,  # Minimum 1 second silence
    silence_thresh=-40     # dB threshold for "silence"
)

# Split on silence (useful for chapter markers)
chunks = split_on_silence(
    audio,
    min_silence_len=500,
    silence_thresh=-40,
    keep_silence=250  # Keep 250ms of silence on each side
)

# Export chunks
for i, chunk in enumerate(chunks):
    chunk.export(f"segment_{i:03d}.mp3", format="mp3")
```

```python
## Trim Silence from Start/End

def trim_silence(audio, silence_thresh=-50, chunk_size=10):
    """Remove silence from beginning and end of audio."""

    # Find first non-silent moment
    start_trim = 0
    for i in range(0, len(audio), chunk_size):
        if audio[i:i+chunk_size].dBFS > silence_thresh:
            start_trim = max(0, i - 100)  # Keep 100ms before
            break

    # Find last non-silent moment
    end_trim = len(audio)
    for i in range(len(audio), 0, -chunk_size):
        if audio[i-chunk_size:i].dBFS > silence_thresh:
            end_trim = min(len(audio), i + 100)  # Keep 100ms after
            break

    return audio[start_trim:end_trim]
```

---

### Step 6: Format Conversion

```python
## Batch Format Conversion

def convert_folder(input_dir, output_dir, output_format="mp3", **export_kwargs):
    """
    Convert all audio files to specified format.

    Example kwargs for MP3:
        bitrate="192k"
    Example kwargs for WAV:
        parameters=["-ac", "1"]  # mono
    """
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(exist_ok=True)

    # Supported input formats
    extensions = ["*.mp3", "*.wav", "*.flac", "*.ogg", "*.m4a"]

    for ext in extensions:
        for file in input_path.glob(ext):
            audio = AudioSegment.from_file(file)

            output_file = output_path / f"{file.stem}.{output_format}"
            audio.export(output_file, format=output_format, **export_kwargs)
            print(f"Converted: {file.name} → {output_file.name}")

# Usage: Convert WAVs to MP3
convert_folder("recordings/", "mp3_output/", output_format="mp3", bitrate="192k")

# Usage: Convert to mono WAV
convert_folder("stereo/", "mono/", output_format="wav", parameters=["-ac", "1"])
```

---

### Step 7: Complete Pipeline Example

```python
## Full Podcast Processing Pipeline

from pydub import AudioSegment
from pydub.effects import normalize
from pathlib import Path
import json
from datetime import datetime

class PodcastProcessor:
    """
    Complete podcast processing pipeline.

    Operations:
    1. Normalize loudness
    2. Trim silence
    3. Add intro/outro
    4. Export with metadata
    """

    def __init__(self, config_file="podcast_config.json"):
        with open(config_file) as f:
            self.config = json.load(f)

        self.intro = AudioSegment.from_file(self.config["intro_file"])
        self.outro = AudioSegment.from_file(self.config["outro_file"])

    def process_episode(self, input_file, episode_number, title):
        """Process a single episode through the full pipeline."""

        print(f"Processing Episode {episode_number}: {title}")

        # 1. Load and normalize
        audio = AudioSegment.from_file(input_file)
        target_db = self.config.get("target_loudness", -16)
        audio = self._normalize_to_target(audio, target_db)
        print(f"  ✓ Normalized to {target_db} dBFS")

        # 2. Trim silence
        audio = self._trim_silence(audio)
        print(f"  ✓ Trimmed silence (duration: {len(audio)/1000:.1f}s)")

        # 3. Add intro/outro with crossfade
        fade_ms = self.config.get("crossfade_ms", 500)
        intro = self.intro.fade_out(fade_ms)
        outro = self.outro.fade_in(fade_ms)
        audio = audio.fade_in(fade_ms).fade_out(fade_ms)

        final = intro.append(audio, crossfade=fade_ms)
        final = final.append(outro, crossfade=fade_ms)
        print(f"  ✓ Added intro/outro (total: {len(final)/1000:.1f}s)")

        # 4. Export with metadata
        output_dir = Path(self.config["output_dir"])
        output_dir.mkdir(exist_ok=True)

        filename = f"episode_{episode_number:03d}.mp3"
        output_path = output_dir / filename

        final.export(
            output_path,
            format="mp3",
            bitrate=self.config.get("bitrate", "192k"),
            tags={
                "title": f"Episode {episode_number}: {title}",
                "artist": self.config["podcast_name"],
                "album": self.config["podcast_name"],
                "track": episode_number,
                "date": datetime.now().strftime("%Y"),
            }
        )
        print(f"  ✓ Exported: {output_path}")

        return output_path

    def _normalize_to_target(self, audio, target_dBFS):
        change = target_dBFS - audio.dBFS
        return audio.apply_gain(change)

    def _trim_silence(self, audio, threshold=-50):
        # Simple trim - just first/last 100ms chunks
        chunk_size = 100
        start = 0
        for i in range(0, min(5000, len(audio)), chunk_size):
            if audio[i:i+chunk_size].dBFS > threshold:
                start = max(0, i - 100)
                break

        end = len(audio)
        for i in range(len(audio), max(0, len(audio)-5000), -chunk_size):
            if audio[i-chunk_size:i].dBFS > threshold:
                end = min(len(audio), i + 100)
                break

        return audio[start:end]


# Example config file (podcast_config.json):
"""
{
    "podcast_name": "Marketing Mastery",
    "intro_file": "assets/intro.mp3",
    "outro_file": "assets/outro.mp3",
    "output_dir": "processed_episodes/",
    "target_loudness": -16,
    "crossfade_ms": 500,
    "bitrate": "192k"
}
"""

# Usage:
# processor = PodcastProcessor("podcast_config.json")
# processor.process_episode("raw/ep42.mp3", 42, "The Power of Audio")
```

## Examples

### Example 1: Batch Normalize Podcast Backlog

**Context**: 50 podcast episodes with inconsistent loudness levels.

**Input**:
> Write a script to normalize all MP3 files in my episodes/ folder to -16 LUFS, keeping the originals as backup.

**Output**:

```python
#!/usr/bin/env python3
"""
Batch normalize podcast episodes to -16 dBFS.
Creates backups before processing.
"""

from pydub import AudioSegment
from pathlib import Path
import shutil

def batch_normalize(
    input_dir: str,
    target_dBFS: float = -16,
    create_backup: bool = True
):
    """
    Normalize all MP3 files in directory to target loudness.

    Args:
        input_dir: Folder containing MP3 files
        target_dBFS: Target loudness level
        create_backup: If True, saves originals to backup/ folder
    """
    input_path = Path(input_dir)

    # Create backup folder if needed
    if create_backup:
        backup_path = input_path / "backup"
        backup_path.mkdir(exist_ok=True)

    # Find all MP3 files
    mp3_files = list(input_path.glob("*.mp3"))
    print(f"Found {len(mp3_files)} MP3 files")

    for i, file in enumerate(mp3_files, 1):
        print(f"\n[{i}/{len(mp3_files)}] Processing: {file.name}")

        # Backup original
        if create_backup:
            shutil.copy(file, backup_path / file.name)

        # Load and check current level
        audio = AudioSegment.from_file(file)
        original_dBFS = audio.dBFS
        print(f"  Original level: {original_dBFS:.1f} dBFS")

        # Calculate adjustment needed
        adjustment = target_dBFS - original_dBFS

        if abs(adjustment) < 0.5:
            print(f"  Already at target, skipping")
            continue

        # Apply normalization
        normalized = audio.apply_gain(adjustment)

        # Export (overwrite original)
        normalized.export(file, format="mp3", bitrate="192k")
        print(f"  Normalized: {original_dBFS:.1f} → {target_dBFS} dBFS")

    print(f"\n✓ Complete! Processed {len(mp3_files)} files")
    if create_backup:
        print(f"  Backups saved to: {backup_path}")


if __name__ == "__main__":
    import sys

    input_dir = sys.argv[1] if len(sys.argv) > 1 else "episodes/"
    batch_normalize(input_dir, target_dBFS=-16)
```

**Usage**:
```bash
python normalize_episodes.py episodes/
```

---

### Example 2: Auto-Add Intro/Outro to New Episodes

**Context**: Weekly podcast needs intro/outro added to each new recording.

**Input**:
> Create a script that watches a folder for new recordings and automatically adds my standard intro and outro.

**Output**:

```python
#!/usr/bin/env python3
"""
Auto-assemble podcast episodes when new recordings are added.
Watch 'incoming/' folder, output to 'ready/' folder.
"""

from pydub import AudioSegment
from pathlib import Path
import time

# Configuration
WATCH_DIR = Path("incoming/")
OUTPUT_DIR = Path("ready/")
INTRO_FILE = Path("assets/intro.mp3")
OUTRO_FILE = Path("assets/outro.mp3")
CROSSFADE_MS = 500
TARGET_DBFS = -16

def assemble_episode(input_file: Path) -> Path:
    """Add intro/outro and normalize a podcast episode."""

    print(f"\nProcessing: {input_file.name}")

    # Load audio
    intro = AudioSegment.from_file(INTRO_FILE)
    content = AudioSegment.from_file(input_file)
    outro = AudioSegment.from_file(OUTRO_FILE)

    # Normalize content to target
    adjustment = TARGET_DBFS - content.dBFS
    content = content.apply_gain(adjustment)
    print(f"  Normalized: {content.dBFS:.1f} dBFS")

    # Apply fades
    intro = intro.fade_out(CROSSFADE_MS)
    content = content.fade_in(CROSSFADE_MS).fade_out(CROSSFADE_MS)
    outro = outro.fade_in(CROSSFADE_MS)

    # Assemble with crossfade
    episode = intro.append(content, crossfade=CROSSFADE_MS)
    episode = episode.append(outro, crossfade=CROSSFADE_MS)

    # Export
    output_file = OUTPUT_DIR / input_file.name
    episode.export(output_file, format="mp3", bitrate="192k")
    print(f"  Created: {output_file} ({len(episode)/1000/60:.1f} min)")

    return output_file


def watch_and_process():
    """Watch folder and process new files."""

    WATCH_DIR.mkdir(exist_ok=True)
    OUTPUT_DIR.mkdir(exist_ok=True)

    processed = set()
    print(f"Watching {WATCH_DIR} for new recordings...")
    print(f"Output to: {OUTPUT_DIR}")
    print("Press Ctrl+C to stop\n")

    while True:
        for file in WATCH_DIR.glob("*.mp3"):
            if file.name not in processed:
                try:
                    assemble_episode(file)
                    processed.add(file.name)
                    # Move original to archive
                    archive = WATCH_DIR / "processed"
                    archive.mkdir(exist_ok=True)
                    file.rename(archive / file.name)
                except Exception as e:
                    print(f"  Error: {e}")

        time.sleep(5)  # Check every 5 seconds


if __name__ == "__main__":
    watch_and_process()
```

## Checklists & Templates

### PyDub Project Setup

```
## Requirements

□ Python 3.7+ installed
□ pip install pydub
□ FFmpeg installed and in PATH
□ Test: python -c "from pydub import AudioSegment; print('OK')"

## Project Structure

project/
├── scripts/
│   ├── normalize.py
│   ├── assemble.py
│   └── convert.py
├── assets/
│   ├── intro.mp3
│   └── outro.mp3
├── incoming/      # Raw recordings
├── processed/     # Final output
└── config.json    # Settings
```

---

### Common Operations Reference

```python
## Quick Reference

# Load
audio = AudioSegment.from_file("file.mp3")

# Save
audio.export("out.mp3", format="mp3", bitrate="192k")

# Volume
louder = audio + 6  # +6 dB
quieter = audio - 3  # -3 dB

# Trim
first_30s = audio[:30000]  # milliseconds
last_min = audio[-60000:]

# Concatenate
combined = audio1 + audio2 + audio3

# Fade
audio = audio.fade_in(500).fade_out(500)

# Crossfade
combined = audio1.append(audio2, crossfade=500)

# Normalize
from pydub.effects import normalize
normalized = normalize(audio)

# Silence
silence = AudioSegment.silent(duration=2000)

# Properties
print(len(audio))  # duration in ms
print(audio.dBFS)  # volume level
```

## Skill Boundaries

### What This Skill Does Well
- Structuring audio production workflows
- Providing technical guidance
- Creating quality checklists
- Suggesting creative approaches

### What This Skill Cannot Do
- Replace audio engineering expertise
- Make subjective creative decisions
- Access or edit audio files directly
- Guarantee commercial success

## References

- PyDub. "GitHub Repository" (jiaaro/pydub) - Library documentation
- GeeksforGeeks. "Create an Audio Editor in Python Using PyDub"
- Real Python. "Working with Audio in Python"
- FFmpeg Documentation - Underlying encoder

## Related Skills

- [audio-editing](../audio-editing/) - Manual editing fundamentals
- [transcription-to-content](../transcription-to-content/) - Processing transcripts
- [podcast-production](../podcast-production/) - Full podcast workflow

---

## Skill Metadata (Internal Use)

```yaml
name: pydub-automation
category: audio
subcategory: automation
version: 1.0
author: MKTG Skills
source_expert: PyDub Library
source_work: jiaaro/pydub
difficulty: intermediate
estimated_value: Hours saved per batch (5-50 hours depending on scale)
tags: [python, automation, audio, batch-processing, pydub]
created: 2026-01-26
updated: 2026-01-26
```

More from guia-matthieu/clawfu-skills

SkillDescription
aarrr-metricsMeasure and optimize growth using the AARRR (Pirate Metrics) framework with stage-specific KPIs and funnel analysis
ab-test-stats"Calculate A/B test statistical significance. Use when: determining if test results are significant; calculating required sample size; estimating test duration; analyzing conversion experiments; making data-driven decisions"
account-healthAssess customer account health using product usage, support sentiment, payment status, and relationship signals
ad-spend-optimizer"Analyze paid advertising performance across channels and recommend budget reallocation to maximize ROAS and minimize CAC. Use when: planning quarterly ad budget allocation, diagnosing underperforming ad channels, deciding whether to scale spend on a channel, calculating marginal ROI across Google Ads, Meta, LinkedIn, or TikTok, rebalancing media mix after performance shifts, or setting up a test-and-scale framework for new channels."
ai-bot-log-auditUse when analyzing server logs to understand how AI crawlers (GPTBot, ClaudeBot, PerplexityBot) interact with your site. Use when optimizing content placement for LLM retrieval, diagnosing why AI search isn't citing your content, or auditing crawl patterns to find optimization gaps.
ai-storyboard-2x2"Créez des storyboards visuellement cohérents en utilisant la technique des 2x2 Grid Shots de PJ Ace, garantissant éclairage, personnages et décors uniformes entre les plans. Use when: **Après avoir finalisé un script vidéo** - Transformer le concept en visuels; **Besoin de cohérence visuelle** - Personnages et éclairage constants entre les plans; **Préparer des assets pour animation** - Frames prêtes pour Veo, Runway, Kling; **Présenter un storyboard client** - Visualisation avant production;..."
ai-video-concept"Développez une idée créative et structurez un script vidéo optimisé pour la génération IA, en suivant la méthode des scènes de 8 secondes de PJ Ace. Use when: **Démarrer une publicité vidéo IA** - Transformer une idée brute en script structuré; **Créer du contenu vidéo pour les réseaux sociaux** - TikTok, Reels, YouTube Shorts; **Développer un concept de campagne** - Avant de passer au storyboard; **Pitcher une idée vidéo** - Présenter un concept à un client ou une équipe; **Adapter un messag..."
ai-video-prompting"Générez des prompts optimisés pour chaque modèle de génération vidéo IA (Veo 3, Runway Gen-3, Kling 2.6, Pika), en exploitant leurs forces spécifiques. Use when: **Animer des frames de storyboard** - Transformer des images fixes en vidéo; **Choisir le bon modèle** - Sélectionner Veo, Runway, Kling ou Pika selon le besoin; **Optimiser la qualité de génération** - Prompts structurés pour meilleurs résultats; **Créer des transitions fluides** - Scene extension, first/last frame; **Utiliser le mo..."
ai-video-qa"Validez la qualité de vos vidéos IA avant publication avec une checklist complète couvrant technique, créatif, et positionnement marque. Use when: **Avant publication** - Dernière validation avant mise en ligne; **Revue client** - Préparer les points de feedback anticipés; **Itération qualité** - Identifier les problèmes à corriger; **Go/No-Go decision** - Décider si la vidéo est prête; **Post-mortem** - Analyser pourquoi une vidéo a (ou n'a pas) performé"
ai-voice-design"Concevez et générez des voix IA pour vos vidéos en utilisant ElevenLabs ou Qwen3-TTS, avec clonage vocal, design par description, et synchronisation lip-sync. Use when: **Créer une voix de marque** - Définir le ton vocal pour une campagne; **Cloner une voix existante** - Reproduire une voix avec autorisation; **Designer une voix originale** - Créer une voix à partir d'une description; **Multi-personnages** - Gérer plusieurs voix dans une même vidéo; **Lip-sync vidéo IA** - Synchroniser voix e..."