Related skills
datadog azure terraform github actions terragruntπ Description
- Lead and grow the SRE team across hiring, performance, and on-call health.
- Set cadence for standups, incident reviews, SLOs, post-incident learning, capacity planning.
- Build blameless learning culture, technical depth, customer empathy, ownership.
- Partner with DevSecOps, Security Eng, DRE, Incident & Change Mgmt, and product leadership to remove cross-team friction.
- Own reliability strategy; define SLOs/SLIs and embed them in the release process.
π― Requirements
- 10+ years in SRE/DevOps/Platform Eng roles.
- 4+ years direct people management, including remote on-call teams.
- Ownership of reliability for customer-facing SaaS; define SLOs/SLIs and drive prioritization.
- Azure experience: 3+ years on AKS, networking, identity; AWS/GCP equivalent.
- Observability fluency with Datadog or equivalents across metrics, logs, traces, RUM, and Synthetics.
- AI in operations β hands-on experience integrating AI/LLM tooling into incident workflows.
π Benefits
- Free premium medical, dental, life and vision insurance.
- Generous 401(k) match.
- Company-sponsored virtual events, happy hours and team-building; birthday treat.
- Unlimited PTO β we believe in time off.
- Virtual yoga, meditation or boot camp classes offered daily.
Meet JobCopilot: Your Personal AI Job Hunter
Automatically Apply to Engineering Jobs. Just set your
preferences and Job Copilot will do the rest β finding, filtering, and applying while you focus on what matters.
Help us maintain the quality of jobs posted on Empllo!
Is this position not a remote job?
Let us know!