webex-voice-interface
$ npx mdskill add automateyournetwork/netclaw/webex-voice-interface
Respond to WebEx voice clips with text and MP3 voice replies using edge-tts
- Replies to WebEx voice messages with audio responses
- Uses OpenClaw for transcription and edge-tts for text-to-speech conversion
- Processes transcribed text with NetClaw tools to generate accurate responses
- Delivers results by uploading MP3 files and posting text responses in WebEx threads
SKILL.md
.github/skills/webex-voice-interface
---
name: webex-voice-interface
description: "Respond to WebEx voice clips with both text and an MP3 voice reply using edge-tts. Voice IN is already handled by OpenClaw transcription. Use when a user sends a voice message in WebEx, you need to reply with audio, or you want to generate a spoken MP3 response."
license: Apache-2.0
user-invocable: true
metadata:
{ "openclaw": { "requires": { "bins": ["python3"], "env": ["TTS_MCP_SCRIPT", "MCP_CALL"] } } }
---
# WebEx Voice Interface
## How It Works
```
User sends voice clip in WebEx space
        |
        v
OpenClaw transcribes automatically (built-in)
        |
        v
NetClaw processes with full skill set
(pyATS, NetBox, ServiceNow, all 43 MCP servers)
        |
        v
python3 $MCP_CALL "python3 -u $TTS_MCP_SCRIPT" text_to_speech -> MP3 file
        |
        v
Upload MP3 to WebEx thread + post text response
```
## Voice Response Workflow
### Step 1: Process the question
Treat the transcribed voice message identically to a typed text message.
Use the full NetClaw skill set -- pyATS, NetBox, ServiceNow, etc.
### Step 2: Generate voice response
After composing your text response, call `text_to_speech`:
```bash
python3 $MCP_CALL "python3 -u $TTS_MCP_SCRIPT" text_to_speech '{"text":"R1 has 3 OSPF neighbors, all in FULL state on Area 0...","voice":"en-US-GuyNeural"}'
```
This returns JSON with an `output_path` to the generated MP3 file.
To list available voices:
```bash
python3 $MCP_CALL "python3 -u $TTS_MCP_SCRIPT" list_voices '{"language":"en"}'
```
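The `text_to_speech` call above can also be driven from Python. This is a minimal sketch, not the skill's own implementation: it assumes the tool prints a single JSON object containing `output_path` on stdout (the exact reply shape is not specified here), and the helper names `build_tts_command`, `parse_output_path`, and `synthesize` are hypothetical.

```python
import json
import os
import subprocess


def build_tts_command(text, voice="en-US-GuyNeural", env=None):
    """Build the argv list for the text_to_speech call shown above."""
    env = os.environ if env is None else env
    # Second argument is the server command string, as in the bash example.
    server_cmd = f"python3 -u {env['TTS_MCP_SCRIPT']}"
    tool_args = json.dumps({"text": text, "voice": voice})
    return ["python3", env["MCP_CALL"], server_cmd, "text_to_speech", tool_args]


def parse_output_path(stdout):
    """Extract output_path from the tool's JSON reply (assumed shape)."""
    reply = json.loads(stdout)
    return reply["output_path"]


def synthesize(text, voice="en-US-GuyNeural"):
    """Run the TTS tool and return the path of the generated MP3."""
    result = subprocess.run(
        build_tts_command(text, voice),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return parse_output_path(result.stdout)
```

Passing an argv list rather than a shell string means apostrophes or quotes in the response text need no extra escaping.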
### Step 3: Deliver both text and voice
Post the text response in the WebEx space/thread AND upload the MP3 file as an attachment via the Messages API (multipart/form-data with `files` parameter):
> **Voice Response**
> [MP3 audio file attached]
>
> R1 has 3 OSPF neighbors, all in FULL state on Area 0:
> - 2.2.2.2 (R2) via Gi1 -- FULL/DR
> - 3.3.3.3 (R3) via Gi2 -- FULL/BDR
**Always deliver text AND voice.** Text is primary (searchable, accessible).
Voice is supplementary.
## Voice Selection
| Voice | Description |
|-------|-------------|
| en-US-GuyNeural | Professional male -- **default** |
| en-US-JennyNeural | Professional female |
| en-US-AriaNeural | Conversational female |
| en-GB-RyanNeural | British male |
Users can request a voice change:
- "Switch to a female voice" -> use en-US-JennyNeural
- "Use a British accent" -> use en-GB-RyanNeural
Call `list_voices` to see all 300+ available voices.
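One way to handle voice-change requests is a small keyword lookup. The sketch below uses only the voices from the table above; the keyword list and the `pick_voice` helper are hypothetical, and mapping "conversational" to en-US-AriaNeural is an assumption based on the table's description.

```python
DEFAULT_VOICE = "en-US-GuyNeural"

# Keyword -> voice, checked in order. "male" is deliberately absent because
# it is a substring of "female" and would match the wrong requests.
VOICE_HINTS = [
    ("british", "en-GB-RyanNeural"),
    ("conversational", "en-US-AriaNeural"),  # assumption, see lead-in
    ("female", "en-US-JennyNeural"),
]


def pick_voice(request, current=DEFAULT_VOICE):
    """Return the voice matching a user request, or keep the current one."""
    lowered = request.lower()
    for keyword, voice in VOICE_HINTS:
        if keyword in lowered:
            return voice
    return current
```

Requests that mention none of the keywords leave the current voice unchanged, so ordinary questions never switch voices by accident.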
## Performance
| Phase | Latency |
|-------|---------|
| edge-tts synthesis | 1-2 seconds |
| WebEx MP3 upload | < 1 second |
Together, synthesis and upload typically add under 3 seconds to the response time.
## Fallback
If TTS fails, deliver the text response immediately. Do not block on voice.
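A sketch of this fallback, assuming the caller supplies two callables: `synthesize(text)` returning an MP3 path, and `post_message(text, mp3_path)` posting to WebEx. Both names, and `deliver_reply` itself, are hypothetical.

```python
def deliver_reply(text, synthesize, post_message):
    """Always post the text; attach audio only if synthesis succeeded.

    Returns True when a voice attachment was included.
    """
    try:
        mp3_path = synthesize(text)
    except Exception:
        # Broad catch is deliberate: any TTS failure degrades to text-only.
        mp3_path = None
    post_message(text, mp3_path)
    return mp3_path is not None
```

The text is posted on both paths, so a TTS outage costs the user only the audio attachment.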
## Tips for Voice Responses
- **Keep it concise** -- under 100 words works best for spoken delivery
- **Avoid tables** -- describe data conversationally for voice
- **Don't spell out acronyms** -- write "OSPF", not "O-S-P-F" (edge-tts handles the pronunciation)
- **Use natural phrasing** -- the text will be read aloud, so write for the ear
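The first two tips can be checked mechanically before calling `text_to_speech`. This is a rough sketch: the `spoken_text_issues` helper is hypothetical, and the table detection is a simple heuristic that only looks for lines starting with a pipe.

```python
MAX_SPOKEN_WORDS = 100  # "under 100 words" from the tips above


def spoken_text_issues(text):
    """Return a list of problems; an empty list means the text looks fine."""
    issues = []
    word_count = len(text.split())
    if word_count >= MAX_SPOKEN_WORDS:
        issues.append(f"too long for speech: {word_count} words")
    # Markdown table rows start with a pipe and read badly aloud.
    if any(line.lstrip().startswith("|") for line in text.splitlines()):
        issues.append("contains a table; describe the data conversationally")
    return issues
```

A non-empty result is a cue to shorten or rephrase the spoken version while still posting the full text response.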
## WebEx File Upload for Voice
The WebEx Messages API supports file attachments via multipart upload:
```
POST https://webexapis.com/v1/messages
Content-Type: multipart/form-data
- roomId: <space-id>
- parentId: <thread-parent-message-id> (if threading)
- text: "Voice Response: R1 has 3 OSPF neighbors..."
- files: @/tmp/netclaw-tts/response.mp3
```
Files up to 100 MB are supported. MP3 voice responses are typically under 1 MB.
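The request above could be sent from Python roughly as follows. This sketch assumes the third-party `requests` package is installed and that `token` is a valid WebEx access token; the helper names are hypothetical, and the byte limit is a conservative reading of the 100 MB figure.

```python
import os

WEBEX_MESSAGES_URL = "https://webexapis.com/v1/messages"
MAX_FILE_BYTES = 100 * 1000 * 1000  # conservative reading of "100 MB"


def build_message_fields(room_id, text, parent_id=None):
    """Build the non-file form fields for the multipart request."""
    fields = {"roomId": room_id, "text": text}
    if parent_id:
        fields["parentId"] = parent_id  # only set when replying in a thread
    return fields


def send_voice_reply(token, room_id, text, mp3_path, parent_id=None):
    """POST the text plus the MP3 attachment and return the parsed reply."""
    import requests  # third-party; imported here so the helpers above need only stdlib

    if os.path.getsize(mp3_path) > MAX_FILE_BYTES:
        raise ValueError(f"{mp3_path} exceeds the attachment size limit")
    with open(mp3_path, "rb") as audio:
        response = requests.post(
            WEBEX_MESSAGES_URL,
            headers={"Authorization": f"Bearer {token}"},
            data=build_message_fields(room_id, text, parent_id),
            files={"files": (os.path.basename(mp3_path), audio, "audio/mpeg")},
            timeout=30,
        )
    response.raise_for_status()
    return response.json()
```

Leaving `Content-Type` unset lets `requests` generate the multipart boundary itself.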
## GAIT Integration
Record voice interactions in the GAIT audit trail:
```
Input: Voice clip from @user (transcript: "What are your interfaces?")
Action: Queried R1 interfaces via pyATS
Output: 4 interfaces found -- text + voice response delivered to WebEx
```
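A record in this shape can be assembled with a small formatter. The sketch only builds the three-line text shown above; how the record is then written to GAIT is not covered here, and `format_gait_record` is a hypothetical helper.

```python
def format_gait_record(user, transcript, action, output):
    """Format a voice interaction as the three-line audit entry shown above."""
    return "\n".join([
        f'Input: Voice clip from @{user} (transcript: "{transcript}")',
        f"Action: {action}",
        f"Output: {output}",
    ])
```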