Related skills
slack, datadog, pagerduty, kubernetes, confluence
Description
- Incident Commander for major incidents; coordinate cross-functional response teams.
- Own incident communications with leadership, CS, and partners; update status page.
- Lead blameless post-incident reviews; identify root causes and assign owners/actions.
- Analyze incident trends and production bugs; report findings to product/engineering.
- Enforce incident management framework: severity, SLA targets, escalations, readiness gates.
- Oversee and mentor the on-shift Operations Engineer; triage, runbooks, documentation quality.
Requirements
- 6+ years in incident management, SRE, NOC leadership, or technical operations in a production environment.
- Proven incident management experience: coordinating multi-team response and executive stakeholders.
- ITIL Foundation; knowledge of incident, problem, and change management.
- Observability depth: logs, traces, metrics; Datadog or Grafana; SLOs.
- Hands-on tools: Datadog, PagerDuty/OpsGenie, Jira Service Management, Slack, Confluence.
- Willingness to work 24x7 shift-based operations and weekend on-call.
Benefits
- Convenient work tools.
- Latest Mac workstations and hardware.
- Google tools (Chat, Gmail, Drive), plus Confluence, Jira, and GitLab.
- Professional growth: trainings and conferences.
- Health insurance for employees and dependents.
- Flexible hours and no dress code.