Manager, Site Reliability Engineering

Added: 5 hours ago
Type: Full time
Salary: Not provided

Related skills

Datadog, Terraform, AWS, Grafana, Prometheus

📋 Description

  • Lead a 9-member global SRE team to ensure reliability and on-call readiness.
  • Define reliability standards and SLO-based practices across services.
  • Collaborate with DevOps, Security, Database, and Product Engineering to improve reliability and velocity.
  • Own observability strategy; drive monitoring, alerting, and incident response.
  • Drive automation of infrastructure deployment using Terraform, Kubernetes, and cloud-native tools.
  • Ensure uptime, meet SLAs, and keep production systems on AWS scalable.

🎯 Requirements

  • Bachelor's degree in Computer Science, Information Science, Engineering, or related field, or equivalent experience.
  • 2+ years as a manager or team lead with direct reports.
  • 5+ years in SRE, DevOps, Cloud Engineering, or similar roles.
  • Experience with AWS and automation tools (Terraform, CloudFormation, Ansible) and Kubernetes.
  • Strong programming skills for automation (Python, Go, or similar).
  • Experience with on-call/incident management systems (PagerDuty, VictorOps, OpsGenie) and observability tools (Datadog, Prometheus, Grafana).