pydub-automation
$
npx mdskill add guia-matthieu/clawfu-skills/pydub-automationAutomates batch audio processing tasks like format conversion, normalization, and assembly using PyDub for scalable content production.
- Helps with repetitive audio tasks for large file batches, such as converting formats and normalizing loudness.
- Integrates with the PyDub library, which abstracts FFmpeg for accessible audio operations in Python.
- Decisions are based on predefined batch processing needs for consistent and efficient audio handling.
- Delivers results through automated execution, enabling audio pipelines that save time in content workflows.
SKILL.md
.github/skills/pydub-automationView on GitHub ↗
---
name: pydub-automation
description: "Automate repetitive audio tasks with Python using PyDub for batch processing, format conversion, normalization, and content assembly. Use when: Processing large numbers of audio files consistently; Converting between audio formats at scale; Normalizing loudness across a batch of files; Assembling intros/outros automatically to episodes; Trimming silence or extracting segments programmatically"
license: MIT
metadata:
author: ClawFu
version: 1.0.0
mcp-server: "@clawfu/mcp-skills"
---
# PyDub Audio Automation
> Automate repetitive audio tasks with Python using PyDub for batch processing, format conversion, normalization, and content assembly.
## When to Use This Skill
- Processing large numbers of audio files consistently
- Converting between audio formats at scale
- Normalizing loudness across a batch of files
- Assembling intros/outros automatically to episodes
- Trimming silence or extracting segments programmatically
- Building audio pipelines for content production
## Methodology Foundation
**Source**: PyDub Library (James Robert) + Python Audio Processing
**Core Principle**: "Audio operations that take hours manually can run in minutes with code." PyDub provides a high-level interface that abstracts FFmpeg's complexity, making common operations accessible to non-audio engineers.
**Why This Matters**: Content teams producing regular podcasts, courses, or video content spend significant time on repetitive audio tasks. Automation enables consistent quality at scale while freeing humans for creative work.
## What Claude Does vs What You Decide
| Claude Does | You Decide |
|-------------|------------|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |
## What This Skill Does
1. **Batch processes audio files** - Apply same operations to hundreds of files
2. **Converts formats** - MP3, WAV, FLAC, OGG, and more
3. **Normalizes loudness** - Consistent levels across episodes
4. **Assembles content** - Concatenate intros, content, outros
5. **Extracts segments** - Trim, split, and slice audio programmatically
## How to Use
### Generate Processing Script
```
Help me write a PyDub script to [describe task].
Input files: [format, location]
Output requirements: [format, specs]
```
### Create Batch Workflow
```
Create a Python script that processes all audio files in a folder:
- Input: [source folder, file type]
- Operations: [what to do]
- Output: [destination, naming convention]
```
### Debug Audio Script
```
This PyDub script isn't working as expected:
[paste code]
Expected: [what you want]
Actual: [what's happening]
```
## Instructions
When automating audio with PyDub, follow this methodology:
### Step 1: Setup and Prerequisites
```python
## Installation
# Install PyDub
pip install pydub
# FFmpeg is required (PyDub uses it under the hood)
# macOS:
brew install ffmpeg
# Ubuntu/Debian:
sudo apt-get install ffmpeg
# Windows:
# Download from ffmpeg.org, add to PATH
```
```python
## Basic Imports
from pydub import AudioSegment
from pydub.effects import normalize, compress_dynamic_range
from pydub.silence import detect_silence, split_on_silence
import os
from pathlib import Path
```
---
### Step 2: Core Operations
```python
## Loading and Saving Audio
# Load audio file (format auto-detected from extension)
audio = AudioSegment.from_file("input.mp3")
audio = AudioSegment.from_file("input.wav", format="wav")
# Save audio file
audio.export("output.mp3", format="mp3", bitrate="192k")
audio.export("output.wav", format="wav")
# Export with metadata
audio.export(
"output.mp3",
format="mp3",
bitrate="192k",
tags={"artist": "Brand Name", "album": "Podcast"}
)
```
```python
## Basic Properties
print(f"Duration: {len(audio)} ms")
print(f"Channels: {audio.channels}")
print(f"Frame rate: {audio.frame_rate} Hz")
print(f"Sample width: {audio.sample_width} bytes")
print(f"dBFS: {audio.dBFS}") # Volume level
```
---
### Step 3: Volume and Normalization
```python
## Volume Adjustments
# Increase volume by 6 dB
louder = audio + 6
# Decrease volume by 3 dB
quieter = audio - 3
# Normalize to target level (0 dB = maximum)
normalized = normalize(audio)
# Normalize to specific headroom
def normalize_to_target(audio, target_dBFS=-16):
"""Normalize audio to target loudness."""
change_in_dBFS = target_dBFS - audio.dBFS
return audio.apply_gain(change_in_dBFS)
normalized = normalize_to_target(audio, target_dBFS=-16)
```
```python
## Batch Normalization
def normalize_folder(input_dir, output_dir, target_dBFS=-16):
"""Normalize all audio files in a folder."""
input_path = Path(input_dir)
output_path = Path(output_dir)
output_path.mkdir(exist_ok=True)
for file in input_path.glob("*.mp3"):
audio = AudioSegment.from_file(file)
normalized = normalize_to_target(audio, target_dBFS)
output_file = output_path / file.name
normalized.export(output_file, format="mp3", bitrate="192k")
print(f"Processed: {file.name}")
# Usage
normalize_folder("raw_episodes/", "processed_episodes/", target_dBFS=-16)
```
---
### Step 4: Concatenation and Assembly
```python
## Basic Concatenation
intro = AudioSegment.from_file("intro.mp3")
content = AudioSegment.from_file("episode.mp3")
outro = AudioSegment.from_file("outro.mp3")
# Concatenate (+ operator)
full_episode = intro + content + outro
# Add silence between segments
silence = AudioSegment.silent(duration=2000) # 2 seconds
full_episode = intro + silence + content + silence + outro
full_episode.export("final_episode.mp3", format="mp3")
```
```python
## Podcast Assembly Script
def assemble_episode(
content_file,
intro_file="assets/intro.mp3",
outro_file="assets/outro.mp3",
output_file=None,
intro_fade_ms=500,
outro_fade_ms=500
):
"""
Assemble podcast episode with intro and outro.
Includes crossfade for professional sound.
"""
intro = AudioSegment.from_file(intro_file)
content = AudioSegment.from_file(content_file)
outro = AudioSegment.from_file(outro_file)
# Apply fade out to intro, fade in to content
intro = intro.fade_out(intro_fade_ms)
content = content.fade_in(intro_fade_ms).fade_out(outro_fade_ms)
outro = outro.fade_in(outro_fade_ms)
# Crossfade join
episode = intro.append(content, crossfade=intro_fade_ms)
episode = episode.append(outro, crossfade=outro_fade_ms)
# Generate output filename if not provided
if output_file is None:
output_file = content_file.replace(".mp3", "_final.mp3")
episode.export(output_file, format="mp3", bitrate="192k")
print(f"Assembled: {output_file} ({len(episode)/1000:.1f}s)")
return output_file
# Usage
assemble_episode("episode_042_raw.mp3")
```
---
### Step 5: Trimming and Splitting
```python
## Time-Based Trimming
# Extract segment (milliseconds)
# audio[start:end]
first_30_seconds = audio[:30000]
last_minute = audio[-60000:]
middle_section = audio[60000:120000]
# Remove first 5 seconds (skip intro)
without_intro = audio[5000:]
```
```python
## Silence-Based Operations
from pydub.silence import detect_silence, split_on_silence
# Detect silence regions
# Returns list of [start, end] in milliseconds
silence_ranges = detect_silence(
audio,
min_silence_len=1000, # Minimum 1 second silence
silence_thresh=-40 # dB threshold for "silence"
)
# Split on silence (useful for chapter markers)
chunks = split_on_silence(
audio,
min_silence_len=500,
silence_thresh=-40,
keep_silence=250 # Keep 250ms of silence on each side
)
# Export chunks
for i, chunk in enumerate(chunks):
chunk.export(f"segment_{i:03d}.mp3", format="mp3")
```
```python
## Trim Silence from Start/End
def trim_silence(audio, silence_thresh=-50, chunk_size=10):
"""Remove silence from beginning and end of audio."""
# Find first non-silent moment
start_trim = 0
for i in range(0, len(audio), chunk_size):
if audio[i:i+chunk_size].dBFS > silence_thresh:
start_trim = max(0, i - 100) # Keep 100ms before
break
# Find last non-silent moment
end_trim = len(audio)
for i in range(len(audio), 0, -chunk_size):
if audio[i-chunk_size:i].dBFS > silence_thresh:
end_trim = min(len(audio), i + 100) # Keep 100ms after
break
return audio[start_trim:end_trim]
```
---
### Step 6: Format Conversion
```python
## Batch Format Conversion
def convert_folder(input_dir, output_dir, output_format="mp3", **export_kwargs):
"""
Convert all audio files to specified format.
Example kwargs for MP3:
bitrate="192k"
Example kwargs for WAV:
parameters=["-ac", "1"] # mono
"""
input_path = Path(input_dir)
output_path = Path(output_dir)
output_path.mkdir(exist_ok=True)
# Supported input formats
extensions = ["*.mp3", "*.wav", "*.flac", "*.ogg", "*.m4a"]
for ext in extensions:
for file in input_path.glob(ext):
audio = AudioSegment.from_file(file)
output_file = output_path / f"{file.stem}.{output_format}"
audio.export(output_file, format=output_format, **export_kwargs)
print(f"Converted: {file.name} → {output_file.name}")
# Usage: Convert WAVs to MP3
convert_folder("recordings/", "mp3_output/", output_format="mp3", bitrate="192k")
# Usage: Convert to mono WAV
convert_folder("stereo/", "mono/", output_format="wav", parameters=["-ac", "1"])
```
---
### Step 7: Complete Pipeline Example
```python
## Full Podcast Processing Pipeline
from pydub import AudioSegment
from pydub.effects import normalize
from pathlib import Path
import json
from datetime import datetime
class PodcastProcessor:
"""
Complete podcast processing pipeline.
Operations:
1. Normalize loudness
2. Trim silence
3. Add intro/outro
4. Export with metadata
"""
def __init__(self, config_file="podcast_config.json"):
with open(config_file) as f:
self.config = json.load(f)
self.intro = AudioSegment.from_file(self.config["intro_file"])
self.outro = AudioSegment.from_file(self.config["outro_file"])
def process_episode(self, input_file, episode_number, title):
"""Process a single episode through the full pipeline."""
print(f"Processing Episode {episode_number}: {title}")
# 1. Load and normalize
audio = AudioSegment.from_file(input_file)
target_db = self.config.get("target_loudness", -16)
audio = self._normalize_to_target(audio, target_db)
print(f" ✓ Normalized to {target_db} dBFS")
# 2. Trim silence
audio = self._trim_silence(audio)
print(f" ✓ Trimmed silence (duration: {len(audio)/1000:.1f}s)")
# 3. Add intro/outro with crossfade
fade_ms = self.config.get("crossfade_ms", 500)
intro = self.intro.fade_out(fade_ms)
outro = self.outro.fade_in(fade_ms)
audio = audio.fade_in(fade_ms).fade_out(fade_ms)
final = intro.append(audio, crossfade=fade_ms)
final = final.append(outro, crossfade=fade_ms)
print(f" ✓ Added intro/outro (total: {len(final)/1000:.1f}s)")
# 4. Export with metadata
output_dir = Path(self.config["output_dir"])
output_dir.mkdir(exist_ok=True)
filename = f"episode_{episode_number:03d}.mp3"
output_path = output_dir / filename
final.export(
output_path,
format="mp3",
bitrate=self.config.get("bitrate", "192k"),
tags={
"title": f"Episode {episode_number}: {title}",
"artist": self.config["podcast_name"],
"album": self.config["podcast_name"],
"track": episode_number,
"date": datetime.now().strftime("%Y"),
}
)
print(f" ✓ Exported: {output_path}")
return output_path
def _normalize_to_target(self, audio, target_dBFS):
change = target_dBFS - audio.dBFS
return audio.apply_gain(change)
def _trim_silence(self, audio, threshold=-50):
# Simple trim - just first/last 100ms chunks
chunk_size = 100
start = 0
for i in range(0, min(5000, len(audio)), chunk_size):
if audio[i:i+chunk_size].dBFS > threshold:
start = max(0, i - 100)
break
end = len(audio)
for i in range(len(audio), max(0, len(audio)-5000), -chunk_size):
if audio[i-chunk_size:i].dBFS > threshold:
end = min(len(audio), i + 100)
break
return audio[start:end]
# Example config file (podcast_config.json):
"""
{
"podcast_name": "Marketing Mastery",
"intro_file": "assets/intro.mp3",
"outro_file": "assets/outro.mp3",
"output_dir": "processed_episodes/",
"target_loudness": -16,
"crossfade_ms": 500,
"bitrate": "192k"
}
"""
# Usage:
# processor = PodcastProcessor("podcast_config.json")
# processor.process_episode("raw/ep42.mp3", 42, "The Power of Audio")
```
## Examples
### Example 1: Batch Normalize Podcast Backlog
**Context**: 50 podcast episodes with inconsistent loudness levels.
**Input**:
> Write a script to normalize all MP3 files in my episodes/ folder to -16 LUFS, keeping the originals as backup.
**Output**:
```python
#!/usr/bin/env python3
"""
Batch normalize podcast episodes to -16 dBFS.
Creates backups before processing.
"""
from pydub import AudioSegment
from pathlib import Path
import shutil
def batch_normalize(
input_dir: str,
target_dBFS: float = -16,
create_backup: bool = True
):
"""
Normalize all MP3 files in directory to target loudness.
Args:
input_dir: Folder containing MP3 files
target_dBFS: Target loudness level
create_backup: If True, saves originals to backup/ folder
"""
input_path = Path(input_dir)
# Create backup folder if needed
if create_backup:
backup_path = input_path / "backup"
backup_path.mkdir(exist_ok=True)
# Find all MP3 files
mp3_files = list(input_path.glob("*.mp3"))
print(f"Found {len(mp3_files)} MP3 files")
for i, file in enumerate(mp3_files, 1):
print(f"\n[{i}/{len(mp3_files)}] Processing: {file.name}")
# Backup original
if create_backup:
shutil.copy(file, backup_path / file.name)
# Load and check current level
audio = AudioSegment.from_file(file)
original_dBFS = audio.dBFS
print(f" Original level: {original_dBFS:.1f} dBFS")
# Calculate adjustment needed
adjustment = target_dBFS - original_dBFS
if abs(adjustment) < 0.5:
print(f" Already at target, skipping")
continue
# Apply normalization
normalized = audio.apply_gain(adjustment)
# Export (overwrite original)
normalized.export(file, format="mp3", bitrate="192k")
print(f" Normalized: {original_dBFS:.1f} → {target_dBFS} dBFS")
print(f"\n✓ Complete! Processed {len(mp3_files)} files")
if create_backup:
print(f" Backups saved to: {backup_path}")
if __name__ == "__main__":
import sys
input_dir = sys.argv[1] if len(sys.argv) > 1 else "episodes/"
batch_normalize(input_dir, target_dBFS=-16)
```
**Usage**:
```bash
python normalize_episodes.py episodes/
```
---
### Example 2: Auto-Add Intro/Outro to New Episodes
**Context**: Weekly podcast needs intro/outro added to each new recording.
**Input**:
> Create a script that watches a folder for new recordings and automatically adds my standard intro and outro.
**Output**:
```python
#!/usr/bin/env python3
"""
Auto-assemble podcast episodes when new recordings are added.
Watch 'incoming/' folder, output to 'ready/' folder.
"""
from pydub import AudioSegment
from pathlib import Path
import time
# Configuration
WATCH_DIR = Path("incoming/")
OUTPUT_DIR = Path("ready/")
INTRO_FILE = Path("assets/intro.mp3")
OUTRO_FILE = Path("assets/outro.mp3")
CROSSFADE_MS = 500
TARGET_DBFS = -16
def assemble_episode(input_file: Path) -> Path:
"""Add intro/outro and normalize a podcast episode."""
print(f"\nProcessing: {input_file.name}")
# Load audio
intro = AudioSegment.from_file(INTRO_FILE)
content = AudioSegment.from_file(input_file)
outro = AudioSegment.from_file(OUTRO_FILE)
# Normalize content to target
adjustment = TARGET_DBFS - content.dBFS
content = content.apply_gain(adjustment)
print(f" Normalized: {content.dBFS:.1f} dBFS")
# Apply fades
intro = intro.fade_out(CROSSFADE_MS)
content = content.fade_in(CROSSFADE_MS).fade_out(CROSSFADE_MS)
outro = outro.fade_in(CROSSFADE_MS)
# Assemble with crossfade
episode = intro.append(content, crossfade=CROSSFADE_MS)
episode = episode.append(outro, crossfade=CROSSFADE_MS)
# Export
output_file = OUTPUT_DIR / input_file.name
episode.export(output_file, format="mp3", bitrate="192k")
print(f" Created: {output_file} ({len(episode)/1000/60:.1f} min)")
return output_file
def watch_and_process():
"""Watch folder and process new files."""
WATCH_DIR.mkdir(exist_ok=True)
OUTPUT_DIR.mkdir(exist_ok=True)
processed = set()
print(f"Watching {WATCH_DIR} for new recordings...")
print(f"Output to: {OUTPUT_DIR}")
print("Press Ctrl+C to stop\n")
while True:
for file in WATCH_DIR.glob("*.mp3"):
if file.name not in processed:
try:
assemble_episode(file)
processed.add(file.name)
# Move original to archive
archive = WATCH_DIR / "processed"
archive.mkdir(exist_ok=True)
file.rename(archive / file.name)
except Exception as e:
print(f" Error: {e}")
time.sleep(5) # Check every 5 seconds
if __name__ == "__main__":
watch_and_process()
```
## Checklists & Templates
### PyDub Project Setup
```
## Requirements
□ Python 3.7+ installed
□ pip install pydub
□ FFmpeg installed and in PATH
□ Test: python -c "from pydub import AudioSegment; print('OK')"
## Project Structure
project/
├── scripts/
│ ├── normalize.py
│ ├── assemble.py
│ └── convert.py
├── assets/
│ ├── intro.mp3
│ └── outro.mp3
├── incoming/ # Raw recordings
├── processed/ # Final output
└── config.json # Settings
```
---
### Common Operations Reference
```python
## Quick Reference
# Load
audio = AudioSegment.from_file("file.mp3")
# Save
audio.export("out.mp3", format="mp3", bitrate="192k")
# Volume
louder = audio + 6 # +6 dB
quieter = audio - 3 # -3 dB
# Trim
first_30s = audio[:30000] # milliseconds
last_min = audio[-60000:]
# Concatenate
combined = audio1 + audio2 + audio3
# Fade
audio = audio.fade_in(500).fade_out(500)
# Crossfade
combined = audio1.append(audio2, crossfade=500)
# Normalize
from pydub.effects import normalize
normalized = normalize(audio)
# Silence
silence = AudioSegment.silent(duration=2000)
# Properties
print(len(audio)) # duration in ms
print(audio.dBFS) # volume level
```
## Skill Boundaries
### What This Skill Does Well
- Structuring audio production workflows
- Providing technical guidance
- Creating quality checklists
- Suggesting creative approaches
### What This Skill Cannot Do
- Replace audio engineering expertise
- Make subjective creative decisions
- Access or edit audio files directly
- Guarantee commercial success
## References
- PyDub. "GitHub Repository" (jiaaro/pydub) - Library documentation
- GeeksforGeeks. "Create an Audio Editor in Python Using PyDub"
- Real Python. "Working with Audio in Python"
- FFmpeg Documentation - Underlying encoder
## Related Skills
- [audio-editing](../audio-editing/) - Manual editing fundamentals
- [transcription-to-content](../transcription-to-content/) - Processing transcripts
- [podcast-production](../podcast-production/) - Full podcast workflow
---
## Skill Metadata (Internal Use)
```yaml
name: pydub-automation
category: audio
subcategory: automation
version: 1.0
author: MKTG Skills
source_expert: PyDub Library
source_work: jiaaro/pydub
difficulty: intermediate
estimated_value: Hours saved per batch (5-50 hours depending on scale)
tags: [python, automation, audio, batch-processing, pydub]
created: 2026-01-26
updated: 2026-01-26
```