Related skills
Datadog, Grafana, Kubernetes, GCP, Splunk
Description
- Act as Incident Commander for major incidents; coordinate cross-functional response and meet SLA targets.
- Own incident communications to leadership, CS, and partners; update status pages.
- Facilitate blameless PIRs; lead root-cause analysis and assign owners with deadlines.
- Analyze incident trends; create Problem tickets; report findings to product/engineering.
- Enforce incident framework: severity model, priority matrix, SLAs, escalations.
- Oversee and mentor the Operations Engineer on your shift; ensure knowledge transfer.
Requirements
- 6+ years in incident management, SRE, NOC, or production ops.
- Proven incident management: multi-team coordination and exec comms.
- Excellent English communication; exec updates, PIRs, status reports.
- ITIL foundation; incident, problem, and change lifecycles.
- Observability depth: logs/traces/metrics in Datadog, Grafana, Splunk, New Relic.
- Hands-on tooling: Datadog, PagerDuty/OpsGenie, JIRA Service Management, Slack, Confluence.
Benefits
- Gaming/payments/fintech experience; incident comms and status pages.
- JIRA Service Management admin; SLOs and burn-rate alerts.
- Familiarity with Datadog Service Catalog, scorecards, and SLOs.
- Experience building an Ops function; Kubernetes, cloud infra (GCP), microservices.
- ITIL Foundation certification a plus.