Lead Site Reliability Engineer

Added: less than a minute ago
Type: Full-time
Salary: Not provided

Related skills

Terraform, Grafana, Prometheus, Kubernetes, Elasticsearch

📋 Description

  • Hands-on Cloud Platform Engineering role to ensure observable, reliable, scalable cloud platforms.
  • Serve on the SRE team as a production gatekeeper; manage the backlog and reliability improvements.
  • Lead investigations into outages, performance, and cost issues.
  • Drive automation of low-value tasks while balancing project delivery.
  • Collaborate with DevOps/engineering to establish SLOs, SLAs, and error budgets.
  • Develop/maintain monitoring dashboards/alerts (Grafana, Azure Monitor, Prometheus).

🎯 Requirements

  • 6+ years of Site Reliability Engineering experience
  • Excellent technical, analytical and troubleshooting skills
  • Experience with Azure cloud
  • Programming/scripting in Python, PowerShell, or C#
  • Infrastructure as code and version control (ARM, Bicep, Git)
  • Strong experience with monitoring/observability stacks (Azure Monitor, Prometheus, Grafana, Elasticsearch)