Related skills
docker terraform aws grafana prometheus
Description
- Automate ops and infra tools with Python or Go to reduce toil
- Design scalable, fault-tolerant infra on AWS/GCP/Azure
- Own reliability, performance, and SLOs for core services
- Own observability stack for monitoring, logging, alerting
- Lead incident response, post-mortems, and root-cause analyses
- Collaborate with product/engineering on reliability and scale
Requirements
- 7+ years in SRE/DevOps or similar, building/operating large-scale, highly available systems
- Deep expertise with AWS, Docker, Kubernetes, and Terraform
- Strong proficiency in Python, Java, or Go
- Knowledge of Prometheus, Grafana, and the ELK Stack
- Willingness to challenge the status quo, identify weaknesses, and propose innovative reliability solutions
- Excellent communication and collaboration; ability to connect with cross-functional teams
Benefits
- Generous PTO, plus company holidays
- Medical and dental insurance
- Paid parental leave for all parents (12 weeks)
- Fertility and family planning support
- Early-detection cancer testing through Galleri
- Pension scheme and company contribution