Related skills: Datadog, AWS, Kubernetes, EKS, observability
Description
- Own and drive SRE strategy across observability, incident management, reliability, and platform operations
- Serve as the go-to consultant for infrastructure and reliability across teams
- Lead architecture decisions for monitoring, alerting, and SLO frameworks, and document them in RFCs
- Provide second-level (L2) on-call support for complex incidents and build the organization's incident response capability
- Define multi-quarter SRE initiatives with cross-team dependencies
- Define and maintain SLIs/SLOs for tier-1 flows: contributions, disbursements, and reporting
- Contribute to ActBlue's reliability roadmap and anticipate the impact of upstream decisions
- Prefer automation over manual processes; reduce toil through tooling
Requirements
- 8+ years in SRE, DevOps, or systems/infrastructure engineering
- Deep expertise in observability tooling (Datadog)
- Strong experience with Kubernetes and cloud-native infrastructure (AWS EKS)
- Experience defining and operating SLIs and SLOs in production
- Proven ability to lead cross-functional reliability initiatives
- Strong incident management: on-call, post-mortems, blameless culture
Benefits
- Flexible schedules and unlimited time off
- Comprehensive health, dental, and vision insurance for you and your family
- 401K with employer match
- Paid medical, family, and parental leave
- Home-office setup allowance for remote employees
- Snack and digital subscription perks