enterprise-agent-ops
$ npx mdskill add affaan-m/ECC/enterprise-agent-ops

Operate long-lived agent systems with lifecycle, security, and observability controls
- Manages runtime lifecycle and safety for continuously running agent workloads
- Integrates with PM2, systemd, container orchestrators, and CI/CD systems
- Enforces least-privilege access and tracks metrics for failure analysis
- Provides audit logs, rollback capabilities, and gradual recovery from incidents
SKILL.md
.github/skills/enterprise-agent-ops
---
name: enterprise-agent-ops
description: Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.
origin: ECC
---

# Enterprise Agent Ops

Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.

## Operational Domains

1. runtime lifecycle (start, pause, stop, restart)
2. observability (logs, metrics, traces)
3. safety controls (scopes, permissions, kill switches)
4. change management (rollout, rollback, audit)

## Baseline Controls

- immutable deployment artifacts
- least-privilege credentials
- environment-level secret injection
- hard timeout and retry budgets
- audit log for high-risk actions

## Metrics to Track

- success rate
- mean retries per task
- time to recovery
- cost per successful task
- failure class distribution

## Incident Pattern

When failure spikes:

1. freeze new rollout
2. capture representative traces
3. isolate failing route
4. patch with smallest safe change
5. run regression + security checks
6. resume gradually

## Deployment Integrations

This skill pairs with:

- PM2 workflows
- systemd services
- container orchestrators
- CI/CD gates
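
The "hard timeout and retry budgets" item under Baseline Controls could be sketched roughly as follows. This is an illustrative Python sketch, not part of the skill itself: `RetryBudget`, `run_with_budget`, `BudgetExhausted`, and all default values are hypothetical names and numbers chosen for the example. Note that the deadline is only checked between attempts, so it does not pre-empt a task that hangs; a truly hard timeout would likely need process-level enforcement from the supervisor or orchestrator.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class BudgetExhausted(Exception):
    """Raised when the retry budget or the overall deadline is used up."""


@dataclass
class RetryBudget:
    # All defaults are hypothetical; tune them per workload.
    max_attempts: int = 3           # total attempts, including the first
    deadline_seconds: float = 30.0  # wall-clock limit for the whole task
    backoff_seconds: float = 0.5    # base delay, doubled after each failure


def run_with_budget(
    task: Callable[[], T],
    budget: RetryBudget,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Run `task`, retrying on any exception until the budget runs out."""
    start = clock()
    last_error: Optional[BaseException] = None
    for attempt in range(1, budget.max_attempts + 1):
        # Stop before starting a new attempt once the deadline has passed.
        if clock() - start >= budget.deadline_seconds:
            break
        try:
            return task()
        except Exception as exc:
            last_error = exc
            if attempt < budget.max_attempts:
                delay = budget.backoff_seconds * 2 ** (attempt - 1)
                remaining = budget.deadline_seconds - (clock() - start)
                if remaining <= 0:
                    break
                # Never sleep past the deadline.
                sleep(min(delay, remaining))
    raise BudgetExhausted(
        f"gave up (max_attempts={budget.max_attempts}, "
        f"deadline_seconds={budget.deadline_seconds})"
    ) from last_error
```

Injecting `sleep` and `clock` keeps the helper testable without real delays. A caller would typically catch `BudgetExhausted`, record the failure class, and write an audit entry rather than retry again.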
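
Most of the entries under Metrics to Track can be derived from per-task records. The sketch below is one possible way to do so, under stated assumptions: the `TaskRecord` fields and the `summarize` function are hypothetical, and "cost per successful task" is interpreted here as total spend (including failed tasks) divided by the number of successes. Time to recovery is left out because it depends on incident start and end timestamps rather than on task records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class TaskRecord:
    # Hypothetical record shape; adapt it to whatever the agent runtime logs.
    succeeded: bool
    retries: int
    cost_usd: float
    failure_class: Optional[str] = None  # e.g. "timeout"; None on success


def summarize(records: Iterable[TaskRecord]) -> Dict[str, Any]:
    """Aggregate task records into the tracked metrics."""
    records = list(records)
    total = len(records)
    successes = sum(1 for r in records if r.succeeded)
    total_cost = sum(r.cost_usd for r in records)
    # Count failures by class; unlabeled failures fall under "unknown".
    failures = Counter(
        r.failure_class or "unknown" for r in records if not r.succeeded
    )
    return {
        "success_rate": successes / total if total else None,
        "mean_retries_per_task": (
            sum(r.retries for r in records) / total if total else None
        ),
        # Assumption: failed tasks still cost money, so their spend is included.
        "cost_per_successful_task": (
            total_cost / successes if successes else None
        ),
        "failure_class_distribution": dict(failures),
    }
```

Returning `None` for undefined ratios (no tasks, or no successes) avoids division errors and makes gaps visible on a dashboard instead of reporting a misleading zero.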