Related skills
bash grafana prometheus python kubernetes
Description
- Develop automation to manage infrastructure rollouts across clouds
- Improve telemetry to identify customer-impacting events
- Partner with engineering to optimize cloud service performance
- Debug live site events and perform RCA/postmortems
- Participate in an SLA-driven on-call rotation (after-hours, weekends)
Requirements
- 5 years of experience as a Site Reliability Engineer
- Infrastructure automation experience; Python, Bash scripting a plus
- Experience with the Prometheus monitoring stack; Grafana, Mimir and Loki a plus
- Knowledge of Kubernetes and the container ecosystem
- Familiarity with AWS, Azure, or Google Cloud
- Experience debugging, diagnosing and troubleshooting complex production software