service-mesh-observability
$
npx mdskill add wshobson/agents/service-mesh-observabilityDeploy comprehensive observability for service meshes with tracing and metrics.
- Enables debugging latency issues and defining SLOs for service communication.
- Integrates with Istio, Linkerd, and standard mesh monitoring dashboards.
- Analyzes golden signals like request rate, error rate, and latency P50.
- Generates actionable reports on service dependencies and connectivity health.
SKILL.md
.github/skills/service-mesh-observabilityView on GitHub ↗
--- name: service-mesh-observability description: Implement comprehensive observability for service meshes including distributed tracing, metrics, and visualization. Use when setting up mesh monitoring, debugging latency issues, or implementing SLOs for service communication. --- # Service Mesh Observability Complete guide to observability patterns for Istio, Linkerd, and service mesh deployments. ## When to Use This Skill - Setting up distributed tracing across services - Implementing service mesh metrics and dashboards - Debugging latency and error issues - Defining SLOs for service communication - Visualizing service dependencies - Troubleshooting mesh connectivity ## Core Concepts ### 1. Three Pillars of Observability ``` ┌─────────────────────────────────────────────────────┐ │ Observability │ ├─────────────────┬─────────────────┬─────────────────┤ │ Metrics │ Traces │ Logs │ │ │ │ │ │ • Request rate │ • Span context │ • Access logs │ │ • Error rate │ • Latency │ • Error details │ │ • Latency P50 │ • Dependencies │ • Debug info │ │ • Saturation │ • Bottlenecks │ • Audit trail │ └─────────────────┴─────────────────┴─────────────────┘ ``` ### 2. Golden Signals for Mesh | Signal | Description | Alert Threshold | | -------------- | ------------------------- | ----------------- | | **Latency** | Request duration P50, P99 | P99 > 500ms | | **Traffic** | Requests per second | Anomaly detection | | **Errors** | 5xx error rate | > 1% | | **Saturation** | Resource utilization | > 80% | ## Templates and detailed worked examples Full template library and detailed worked examples live in `references/details.md`. Read that file when you need the concrete templates. ## Best Practices ### Do's - **Sample appropriately** - 100% in dev, 1-10% in prod - **Use trace context** - Propagate headers consistently - **Set up alerts** - For golden signals - **Correlate metrics/traces** - Use exemplars - **Retain strategically** - Hot/cold storage tiers ### Don'ts - **Don't over-sample** - Storage costs add up - **Don't ignore cardinality** - Limit label values - **Don't skip dashboards** - Visualize dependencies - **Don't forget costs** - Monitor observability costs