Manager, Site Reliability Engineering

Added
1 hour ago
Type
Full time
Salary
Upgrade to Premium to se...

Related skills

datadog azure terraform github actions terragrunt

πŸ“‹ Description

  • Lead and grow the SRE team across hiring, performance, and on-call health.
  • Set cadence for standups, incident reviews, SLOs, post-incident learning, capacity planning.
  • Build blameless learning culture, technical depth, customer empathy, ownership.
  • Partner with DevSecOps, Security Eng, DRE, Incident & Change Mgmt, and product leadership to remove cross-team friction.
  • Own reliability strategy; define SLOs/SLIs and embed them in the release process.

🎯 Requirements

  • 10+ years in SRE/DevOps/Platform Eng roles.
  • 4+ years direct people management, including remote on-call teams.
  • Ownership of reliability for customer-facing SaaS; define SLOs/SLIs and drive prioritization.
  • Azure experience: 3+ years on AKS, networking, identity; AWS/GCP equivalent.
  • Observability fluency with Datadog or equivalents across metrics, logs, traces, RUM, and Synthetics.
  • AI in operations β€” hands-on experience integrating AI/LLM tooling into incident workflows.

🎁 Benefits

  • Free premium medical, dental, life and vision insurance.
  • Generous 401(k) match.
  • Company-sponsored virtual events, happy hours and team-building; birthday treat.
  • Unlimited PTO β€” we believe in time off.
  • Virtual yoga, meditation or boot camp classes offered daily.
Share job

Meet JobCopilot: Your Personal AI Job Hunter

Automatically Apply to Engineering Jobs. Just set your preferences and Job Copilot will do the rest β€” finding, filtering, and applying while you focus on what matters.

Related Engineering Jobs

See more Engineering jobs β†’