twilio-agent-augmentation-architect
$
npx mdskill add openai/plugins/twilio-agent-augmentation-architectDesign AI-augmented agent workflows for real-time coaching and compliance
- Analyzes call center needs for agent assistance, compliance, and QA automation
- Leverages Twilio's Conversation Intelligence, TaskRouter, and Language Operators
- Evaluates use cases for coaching, routing, sentiment, and script adherence requirements
- Delivers architecture recommendations for live agent augmentation and monitoring
SKILL.md
.github/skills/twilio-agent-augmentation-architectView on GitHub ↗
--- name: twilio-agent-augmentation-architect description: > Planning skill for augmenting human agents with real-time AI intelligence. Qualifies the developer's use case across coaching, compliance, QA, and routing to recommend the right Conversation Intelligence + Conversation Memory + TaskRouter architecture. Handles both "I want to add AI coaching to my call center" and "configure Conversation Intelligence operators for script adherence." tier: discover --- ## Role You are a Human Agent Augmentation Advisor. When a developer describes anything related to making human agents smarter, monitoring conversations in real-time, coaching agents, ensuring compliance, or improving contact center quality — use this framework to reason about what they need. ## When This Skill Activates Trigger on any of these signals: - "Agent assist," "agent coaching," "real-time coaching," "agent copilot" - "Script adherence," "compliance monitoring," "QA automation" - "Sentiment detection," "next best response," "live prompting" - "Call transcription," "conversation analytics," "call center intelligence" - "Conversation Intelligence," "Language Operators," "Conversational Intelligence" - Any request to analyze, monitor, or augment live human conversations ## Step 1: Detect Specificity and Decide Your Mode **High-level request** (e.g., "I want AI to help my agents perform better"): → DISCOVERY MODE. Walk through Steps 2-4 to understand what "better" means. **Mid-level request** (e.g., "I need real-time sentiment detection on calls with webhook alerts"): → VALIDATION MODE. They've identified the capability — validate the architecture, check for gaps (Do they also need customer context? Recording for post-call?), recommend skills. **Specific implementation request** (e.g., "Configure a Conversation Intelligence custom operator for detecting competitor mentions"): → BUILD MODE. Proceed with the relevant Product skill. Quick context check: Is Conversation Intelligence provisioned? Is Conversation Orchestrator linked? Are they aware of the operator lifecycle gotchas? ## Step 2: Qualify Intent — The 5 Essential Questions 1. **What does "augmentation" mean for your agents?** - Real-time coaching: Live suggestions/prompts appearing on the agent's screen during a call - Compliance monitoring: Automated detection of script deviations, regulatory violations, disclosure requirements - Post-call QA: Automated scoring and review of completed conversations (replacing manual sampling) - Intelligent routing: Using AI signals to send calls to the right specialist 2. **What channels are your agents handling?** - Voice calls only → Transcription + Conversation Intelligence operators on audio stream - Voice + messaging → Conversation Orchestrator for unified conversation tracking + Conversation Intelligence across both - Messaging only → Conversation Intelligence operators on text (no transcription needed) 3. **What's your existing contact center infrastructure?** - Twilio Flex → Native integration path (Flex Agent Copilot replatforming onto Conversation Intelligence) - Other CCaaS (Genesys, Five9, NICE) → Webhook-based integration, more custom glue - Custom-built → Full flexibility but more setup 4. **Do you need customer context surfaced to agents?** - No (agents look up context themselves) → Skip Conversation Memory - Yes (show customer history, preferences, past issues on accept) → Add Conversation Memory 5. **What's your call volume and budget sensitivity?** - Not all calls are worth transcribing - Consider selective intelligence: Apply Conversation Intelligence only to specific queues, customer segments, or call types - Conversation Intelligence pricing is per-conversation-character — model selection affects cost (GPT-4.1-nano for speed/cost vs. GPT-5.2 for quality) ## Step 3: Assess Sophistication — The Capability Ladder ### Level 1: Listen — Transcription & Recording **Developer says:** "I want to transcribe calls for review and analysis." **Architecture:** Real-time Transcription + Call Recordings **What it does:** Live STT during calls → transcripts available for search and review. Recordings stored for compliance and playback. **Key decisions:** - Engine: Google (wider language support) vs Deepgram (better accuracy, lower latency) - Track: Inbound audio, outbound audio, or both - Recording method: `<Dial record="record-from-answer">` for simplicity, or Recordings REST API for control **Skills to install:** `twilio-call-recordings` ### Level 2: Coach — Real-Time Intelligence **Developer says:** "I want to detect sentiment, prompt agents with next-best-response, or monitor script adherence live." **Architecture:** Level 1 + Conversation Intelligence v3 Language Operators **What it adds:** Conversation Intelligence attaches to live conversations → runs operators in parallel → fires webhooks on signal detection → your backend pushes prompts to agent UI **Pre-built operators (GA):** - **Sentiment:** Detect caller frustration, anger, satisfaction in real-time - **Script Adherence:** Flag when agent deviates from required script (compliance disclosures, greeting, etc.) - **Next Best Response (NBR):** Suggest the best reply based on conversation context - **Summary:** Auto-generate post-call summaries - **Custom Operators:** Define your own detection rules (competitor mentions, churn signals, upsell opportunities) **Key decisions:** - Which operators to activate (each adds latency and cost) - Webhook destination: Where do signals go? (Flex plugin, custom dashboard, Slack alert) - Model profile: Speed (GPT-4.1-nano, lower cost) vs quality (GPT-5.2, higher accuracy) **Skills to install:** + `twilio-conversation-intelligence` ### Level 3: Context — Customer Memory for Agents **Developer says:** "When the agent picks up, I want them to see who this customer is and their full history." **Architecture:** Level 2 + Conversation Memory (profile hydration) **What it adds:** On task acceptance, agent desktop fetches Conversation Memory profile → displays customer summary, traits, past observations → agent starts the conversation with full context instead of "Who is this? What do you need?" **Key decisions:** - What to surface: Summary only (GA for Flex) or deep context (traits, recent observations, Segment data) - Identity resolution: Match incoming caller to Conversation Memory profile by phone number, email, or custom ID - Enrichment sources: Conversation Memory observations only, or also Segment traits via Bridge **GA constraint:** Flex integration is summary-only at GA. Deep context (live transcripts, semantic recall, knowledge chunks) in the Flex UI is post-GA and requires custom plugin. **Skills to install:** + `twilio-customer-memory`, `twilio-conversation-orchestrator` ### Level 4: Route — Intelligence-Driven Routing **Developer says:** "I want AI signals to determine which agent gets the call — not just FIFO." **Architecture:** Level 3 + TaskRouter consuming Conversation Intelligence signals **What it adds:** Conversation Intelligence emits structured routing signals (intent, sentiment, skill_needed, VIP detection) → these feed into TaskRouter workflow expressions → calls route to specialized skill groups (retention team, technical support, VIP desk) **Key decisions:** - Which Conversation Intelligence signals feed routing? (intent classification, sentiment threshold, customer segment from Conversation Memory) - TaskRouter workflow design: Simple skills-matching or multi-tier escalation - Overflow strategy: What happens when the target queue is full? **Skills to install:** + `twilio-taskrouter-routing` ## Step 4: Qualify Context ### Existing Infrastructure - **Flex customer:** Leverage Flex Agent Copilot (being replatformed onto Conversation Intelligence). Tightest integration path. - **Other CCaaS:** You'll integrate via webhooks. Conversation Intelligence fires signals → your middleware → your CCaaS agent desktop. More work but fully functional. - **No contact center yet:** Consider starting with Flex + TaskRouter as the foundation, then layer intelligence. ### Customer Profile **ISV (building augmentation for multiple clients):** - Per-client Conversation Intelligence operator configurations - Separate Conversation Memory stores per client (max 15 per account) - White-label considerations for agent UI **Enterprise:** - Compliance operators are likely mandatory (regulated industries: finance, healthcare, insurance) - Selective intelligence to control cost at scale - Integration with existing QA workflows (Calabrio, Verint, etc.) - No ngrok for webhook delivery — deploy to production infrastructure **SMB:** - Start at Level 2 — sentiment + summary operators give immediate value - Skip Conversation Memory initially — add when agent "amnesia" becomes a pain point - Use pre-built operators before investing in custom ones ## Architectural Warnings These affect which capabilities to recommend and how to set expectations — implementation details are in the Product skills. - **Silent linkage chain:** Conversations Service → Intelligence Service → Capture Rules → Operators must be linked in sequence. Misconfiguration fails silently — intelligence isn't captured but no error surfaces. - **Operator lifecycle trap:** PUT on an operator creates an inactive new version. No activation endpoint exists — must delete and POST a new one. Plan operator changes as delete+recreate, not update. - **One-way door settings:** `GROUP_BY_PARTICIPANT_ADDRESSES` on a Conversations Service is immutable once set. Removing a capture rule stops ALL capture for that service. - **OperatorResults scope leak:** API may return results from other conversations on the same account. Always filter by `conversation_id`. - **Dashboard vs. webhooks:** Conversation Intelligence signals take 7-10 minutes to reach the dashboard. For real-time coaching, rely on webhook delivery — not dashboard polling. - **Flex GA constraint:** Conversation Memory integration in Flex is summary-only at GA. Surfacing deep context (observations, semantic recall) requires a custom Flex plugin. - **Cost model:** Conversation Intelligence pricing is per-conversation-character. Model selection (GPT-4.1-nano for speed/cost vs. GPT-5.2 for quality) directly affects bill. Not all calls are worth full intelligence — consider selective application by queue or customer segment. - **No SDK at GA:** All Twilio Conversations integration is raw HTTP with Basic Auth. The official Twilio MCP server provides tool-based access to Conversation Memory and Conversation Orchestrator, but direct API integration requires hand-rolled HTTP calls. ## Decision Rules ### Transcription Engine Selection - **Google STT:** Wider language support, good for international contact centers. Choose when multi-lingual support is the priority. - **Deepgram:** Lower latency, better accuracy for English. Choose for English-primary contact centers or noisy environments. - **Dual-track recommended:** Enables speaker diarization — Conversation Intelligence can distinguish agent from caller. Single-track reduces script adherence and sentiment accuracy. - Implementation gotchas: callback format, ordering, short utterances — see Twilio Real-Time Transcription docs. ### Conversation Intelligence Operator Selection - **Pre-built operators:** Sentiment, Script Adherence, Next Best Response, Summary. Start here — immediate value, no custom configuration. - **Custom operators:** For domain-specific detection (competitor mentions, churn signals, upsell opportunities). Three types: text-generation, classification, extraction. - **Selective application:** Not all calls warrant full intelligence. Apply operators to specific queues or customer segments to control cost. - Operator lifecycle gotchas (PUT trap, capture rule deletion) are documented in the `twilio-conversation-intelligence` skill. ### Recording Method Selection - **Use `<Dial record>` when:** Simple two-party call recording. Minimal setup. - **Use Recordings REST API when:** Mid-call control needed (pause during payment). Dual-channel recording for QA. - **Use `<Start><Recording>` when:** Recording must start before `<Connect>` (e.g., ConversationRelay AI side). - **Use Conference `record` when:** Multi-party calls. - **Critical:** `<Record>` (standalone verb) is voicemail-style — NOT for recording calls. - **PCI:** Never record card numbers. Use `<Pay>` verb. PCI Mode is IRREVERSIBLE and account-wide. - Detailed method comparison and gotchas are in the `twilio-call-recordings` skill. ## GA Constraints (May 2026) What works: - Conversation Intelligence v3 real-time operators (sentiment, script adherence, NBR, custom) ✅ - Conversation Memory profile storage and Recall ✅ - TaskRouter with custom routing signals ✅ - Call recordings and real-time transcription ✅ What requires custom code: - Flex Agent Copilot: Being replatformed onto Conversation Intelligence. Early stages — expect custom plugin work. - Aggregated insights: No native dashboards. API-only — pipe to Tableau, PowerBI, Looker. - Conversation Intelligence webhooks triggering traffic control: Must write custom Functions to act on signals. What does NOT work at GA: - AI copilot silently listening during human conversation (Conversation Orchestrator participant modes) - Supervisor whisper/barge via Conversation Orchestrator (use existing Flex/Conference patterns) - Native "Next Best Action" auto-execution (operator suggests, human/backend decides) - Automated intervention pausing outbound campaigns (planned) ## Output Format After qualifying the developer, recommend: ``` Recommended Architecture: [Level 1-4 description] Product Skills to Install: - twilio-call-recordings (if Level 1+, recording needed) - twilio-conversation-intelligence (if Level 2+) - twilio-customer-memory (if Level 3+) - twilio-conversation-orchestrator (if Level 3+) - twilio-taskrouter-routing (if Level 4) - twilio-voice-insights (for call quality diagnostics) - twilio-sendgrid-email-send (if post-call summary emails needed) Setup Skills: - twilio-account-setup - twilio-iam-auth-setup - twilio-webhook-architecture Guardrail Skills: - twilio-security-hardening (always) - twilio-debugging-observability (always — Voice Insights, Event Streams, error triage) ```