Related skills
datadog, sre, pagerduty, aws, prometheus
Description
- Incident Command: Lead Sev1/Sev2/Sev3 incidents; single point of accountability.
- Stakeholder Orchestration: Mobilize cross-functional teams using RACI.
- Real-Time Decision Making: Classify severity and coordinate remediation.
- Communication Leadership: Update internal execs; manage external comms.
- Program Management: Evolve Major Incident Management across product lines.
- Metrics & Analytics: Track MTTR and trends; present quarterly insights.
Requirements
- 7+ years in technical ops, SRE/DevOps, or incident management.
- 3+ years in program management or incident command leadership.
- Proven Sev1/Sev2 incident management in high-availability SaaS/AdTech.
- Experience coordinating cross-functional response teams during outages.
- Monitoring/observability tools: Nagios, Prometheus, Grafana, Datadog, PagerDuty.
- ITIL, SRE, and SLO/SLI frameworks; incident management best practices.