Site Reliability Engineer

Added: less than a minute ago
Type: Full time
Salary: Not provided

Related skills

Bash, Grafana, Prometheus, Python, Kubernetes

📋 Description

  • Develop automation to manage infrastructure rollouts across clouds
  • Improve telemetry to identify customer-impacting events
  • Partner with engineering to optimize cloud service performance
  • Debug live site events and perform RCA/postmortems
  • Participate in an SLA-driven on-call rotation (after-hours, weekends)

🎯 Requirements

  • 5 years of experience as a Site Reliability Engineer
  • Infrastructure automation experience; Python or Bash scripting a plus
  • Experience with the Prometheus monitoring stack; Grafana, Mimir and Loki a plus
  • Knowledge of Kubernetes and the container ecosystem
  • Familiarity with AWS, Azure, or Google Cloud
  • Experience debugging, diagnosing, and troubleshooting complex production software
