Related skills
Docker, PagerDuty, AWS, Grafana, Prometheus
Description
- Lead platform incident investigations across teams to minimize customer impact.
- Design observability solutions and alerting to improve detection coverage.
- Build automation tools and reusable monitoring patterns to improve reliability.
- Serve as first responder for Databricks Platform incidents.
- Own incident lifecycle from detection to postmortem.
- Collaborate on cross-functional investigations with cloud providers.
Requirements
- 5+ years in SRE, DevOps, or production engineering.
- Cloud experience with AWS/Azure/GCP; Docker and Kubernetes.
- Monitoring/logging/alerting with ELK, Prometheus, Grafana, PagerDuty.
- Strong Python for production automation.
- Experience owning the incident lifecycle in production environments.
- BS/MS/PhD in CS/CE or related Engineering field.
Benefits
- Hybrid work options; Amsterdam office.
- Comprehensive benefits; region-specific details available.
- Diversity and inclusion commitment.
- Benefits portal with details.