Sr Platform Monitoring Engineer

Added: 4 days ago
Type: Full time
Salary: Not provided

Related skills

Docker, PagerDuty, AWS, Grafana, Prometheus

📋 Description

  • Lead platform incident investigations across teams to minimize customer impact.
  • Design observability solutions and alerting to improve detection coverage.
  • Build automation tools and reusable monitoring patterns to improve reliability.
  • Serve as first responder for Databricks Platform incidents.
  • Own incident lifecycle from detection to postmortem.
  • Collaborate on cross-functional investigations with cloud providers.

🎯 Requirements

  • 5+ years in SRE, DevOps, or production engineering.
  • Cloud experience with AWS/Azure/GCP; Docker and Kubernetes.
  • Monitoring/logging/alerting with ELK, Prometheus, Grafana, PagerDuty.
  • Strong Python for production automation.
  • Experience owning the incident lifecycle in production environments.
  • BS/MS/PhD in CS/CE or related Engineering field.

🎁 Benefits

  • Hybrid work options; Amsterdam office.
  • Comprehensive benefits; region-specific details available.
  • Diversity and inclusion commitment.
  • Benefits portal with details.