Added: 27 minutes ago
Type: Full time
Salary: Not provided

Related skills

Datadog, Grafana, Prometheus, Splunk, ELK Stack

📋 Description

  • On-call monitoring in a follow-the-sun model, covering EMEA shifts.
  • Incident management: coordinate mitigation and recovery across teams.
  • Communicate with merchants during incidents; provide updates.
  • Improve monitoring strategy by leading initiatives; automate where possible.
  • Investigate alerts; provide feedback to engineering on logging/alerts.
  • Problem management: analyze incident trends; drive long-term fixes.

🎯 Requirements

  • 5+ years of experience in incident management, problem management, and platform monitoring.
  • Experience with problem management: trends, RCA, preventative action.
  • Strong communication skills; able to translate technical topics for diverse audiences, including via dashboards.
  • Willing to participate in an on-call rotation in a fast-paced environment.
  • Experience with monitoring/logging tools: Prometheus, Grafana, ELK.
  • Experience with observability platforms: Datadog, Dynatrace, Splunk.
  • Analytical/problem-solving skills; ability to analyze complex systems.