Related skills
datadog terraform aws grafana prometheus
📋 Description
- Lead a 9-member global SRE team to ensure reliability and on-call readiness.
- Define reliability standards and SLO-based practices across services.
- Collaborate with DevOps, Security, Database, and Product Engineering to improve reliability and velocity.
- Own observability strategy; drive monitoring, alerting, and incident response.
- Drive automation of infrastructure deployment using Terraform, Kubernetes, and cloud-native tools.
- Ensure uptime, SLA compliance, and scalability of production systems on AWS.
🎯 Requirements
- Bachelor's degree in Computer Science, Information Science, Engineering, or related field, or equivalent experience.
- 2+ years as a manager or team lead with direct reports.
- 5+ years in SRE, DevOps, Cloud Engineering, or similar roles.
- Experience with AWS and automation tools (Terraform, CloudFormation, Ansible) and Kubernetes.
- Strong programming skills for automation (Python, Go, or similar).
- Experience with on-call/incident management systems (PagerDuty, VictorOps, OpsGenie) and observability tools (Datadog, Prometheus, Grafana).