Added
19 days ago
Type
Full time
Salary
Salary not provided

Related skills

Datadog, SRE, PagerDuty, AWS, Prometheus

📋 Description

  • Incident Command: Lead Sev1/Sev2/Sev3 incidents as the single point of accountability.
  • Stakeholder Orchestration: Mobilize cross-functional teams using RACI.
  • Real-Time Decision Making: Classify severity and coordinate remediation.
  • Communication Leadership: Update internal execs; manage external comms.
  • Program Management: Evolve Major Incident Management across product lines.
  • Metrics & Analytics: Track MTTR and trends; present quarterly insights.

🎯 Requirements

  • 7+ years in technical ops, SRE/DevOps, or incident management.
  • 3+ years in program management or incident command leadership.
  • Proven Sev1/Sev2 incident management in high-availability SaaS/AdTech.
  • Experience coordinating cross-functional response teams during outages.
  • Experience with monitoring/observability and alerting tools: Nagios, Prometheus, Grafana, Datadog, PagerDuty.
  • Knowledge of ITIL, SRE, and SLO/SLI frameworks and incident management best practices.