Related skills
datadog, terraform, grafana, prometheus, python
📋 Description
- Lead reliability strategy for scalable, high-availability systems.
- Drive automation to reduce toil and boost productivity.
- Architect observability, monitoring, and alerting with chaos testing.
- Partner with engineering, product, and operations to embed SRE practices.
- Build and mentor a global, high-performing SRE team.
- Oversee capacity planning for future-state demand.
🎯 Requirements
- BS/MS/PhD in CS, Engineering, or related field.
- 7+ years in the field; 3+ years leading SRE or similar teams.
- Kubernetes, Terraform, and CI/CD experience.
- Experience with Datadog, Prometheus, and Grafana, plus incident.io and PagerDuty.
- Proficiency in Python, Go, or Java; ability to review code.
- Proven experience building highly available, scalable systems.