Related skills
docker terraform aws grafana prometheus
Description
- Automate tasks and infra with Python/Go to reduce toil
- Design scalable fault-tolerant infra on AWS/GCP/Azure
- Own reliability, performance, and SLOs for core services
- Own observability stack for monitoring, logging, alerts
- Lead incident response, post-mortems, root cause analyses
- Collaborate with product/engineering on reliable system design
Requirements
- 7+ years in SRE/DevOps for high-availability systems
- Cloud expertise (AWS preferred), Docker/Kubernetes, Terraform
- Python/Java/Go for automation and monitoring
- Prometheus, Grafana, ELK Stack for monitoring/logging
- Challenge the status quo; propose reliability improvements
- Ownership and accountability for mission-critical systems
Benefits
- Generous PTO plus company holidays
- Medical, dental, and vision coverage for you and family
- Paid parental leave (12 weeks)
- Fertility and family planning support
- HSA with company contribution
- 401k with company stock options