Develop automation platform to manage infrastructure rollouts across cloud providers
Optimize telemetry platform to identify customer-impacting events and surface data for debugging
Partner with engineering team to optimize performance of services for cloud architecture
Debug live-site events and conduct follow-up postmortems and root cause analysis (RCA)
Participate in an SLA-driven on-call rotation, including after-hours and rotating holidays

🎯 Requirements

7+ years SRE experience; B.S. in Computer Science or related field
Infrastructure automation experience; scripting (Python, Bash) required
Experience with the Prometheus monitoring stack; Grafana, Mimir and Loki are a plus
Knowledge of Kubernetes and the container ecosystem
Experience with AWS, Azure or Google Cloud
On-call experience and incident response

