Added
7 days ago
Type
Full time
Salary
Salary not provided
Related skills
terraform grafana prometheus python kubernetes
Description
- Architect and maintain self-healing systems with 99.9%+ availability targets.
- Use AI/ML to automate infra governance and detect IaC anti-patterns.
- Implement adaptive SLIs/SLOs that evolve automatically from real-time data.
- Build AIOps-based observability and auto-remediation pipelines.
- Apply predictive modeling to forecast failures before they impact users.
- Lead chaos, performance, and resilience testing programs.
Requirements
- 10+ years in software/systems engineering, with 5+ years in SRE.
- Strong experience with GCP (preferred) or AWS, Kubernetes, and Terraform.
- Proficiency in Python or Go for automation and tooling.
- Experience with observability stacks (Prometheus/Grafana/OpenTelemetry) and service meshes.
- Hands-on AIOps experience: anomaly detection, predictive analytics, ML-assisted operations.
- Strong communication and influencing skills: data over hierarchy.
Benefits
- Access to cutting-edge technologies in a transformative environment.
- Professional growth and leadership development pathways.
- A chance to shape reliable and scalable systems with impact.