📋 Description
- Provide on-call monitoring as part of a follow-the-sun model, covering EMEA shifts.
- Incident management: coordinate mitigation and recovery across teams.
- Communicate with merchants during incidents and keep them updated on progress.
- Lead initiatives to improve the monitoring strategy, automating where possible.
- Investigate alerts and give engineering teams feedback on logging and alerting.
- Problem management: analyze incident trends; drive long-term fixes.
🎯 Requirements
- 5+ years in incident management, problem management, and platform monitoring.
- Experience with problem management: trend analysis, root cause analysis (RCA), and preventive action.
- Strong communication skills, including translating technical topics for diverse audiences through dashboards.
- Willing to participate in an on-call rotation in a fast-paced environment.
- Experience with monitoring/logging tools: Prometheus, Grafana, ELK.
- Experience with observability platforms: Datadog, Dynatrace, Splunk.
- Strong analytical and problem-solving skills, with the ability to analyze complex systems.