Director, Site Reliability Engineering

Added: 1 minute ago
Type: Full time

Related skills

Datadog, Terraform, Grafana, Prometheus, Python

📋 Description

  • Lead reliability strategy for scalable, high-availability systems.
  • Drive automation to reduce toil and boost productivity.
  • Architect observability, monitoring, and alerting with chaos testing.
  • Partner with engineering, product, and operations to embed SRE practices.
  • Build and mentor a global, high-performing SRE team.
  • Oversee capacity planning and forecast future demand.

🎯 Requirements

  • BS/MS/PhD in CS, Engineering, or related field.
  • 7+ years in field; 3+ years leading SRE or similar teams.
  • Kubernetes, Terraform, and CI/CD experience.
  • Datadog, Prometheus, Grafana; incident.io and PagerDuty.
  • Proficiency in Python, Go, or Java; ability to review code.
  • Proven experience building highly available, scalable systems.