Added: 21 days ago
Type: Full time
Salary: Not provided
Related skills: Datadog, Terraform, PagerDuty, Grafana, Jira

Description
- Own and refine incident management; act as incident commander and coordinate post-incident follow-ups.
- Manage the central on-call system and the integrations used by 100+ teams; automate them with self-serve tooling such as Terraform.
- Analyze MTTR, incident frequency, and SLA data to identify trends and guide reliability initiatives.
- Improve change management processes and automation to reduce risk and friction.
- Collaborate with engineering teams to standardize operations and build automated workflows.
- Leverage AI for incident analysis, alerting, and automation.
Requirements
- Experience managing and participating in a 24/7 on-call rotation and incident response.
- Experience with on-call systems (Rootly, PagerDuty, Opsgenie, etc.).
- Experience with monitoring/observability tools (Datadog, New Relic, Grafana, etc.).
- Strong communication skills: able to manage incidents and action items with audiences ranging from engineers to directors, and to handle public messaging.
- Experience with ITSM tools (Jira/Jira Service Management) for ticketing and change management.
- 3+ years of incident response experience.