Staff Software Engineer, Platform

Added
12 minutes ago
Type
Full time

Related skills

datadog aws kubernetes eks observability

📋 Description

  • Own and drive SRE strategy in observability, incidents, reliability, and platform ops
  • Serve as go-to consultant for infrastructure and reliability across teams
  • Lead architecture decisions, including RFCs, for monitoring, alerting, and SLO frameworks
  • Provide L2 on-call support for complex incidents; build incident response capability
  • Define multi-quarter SRE initiatives with cross-team dependencies
  • Define and maintain SLIs/SLOs for tier-1 flows: contributions, disbursements, reporting
  • Contribute to ActBlue's reliability roadmap; anticipate upstream decisions
  • Prefer automation over manual processes; reduce toil through tooling

🎯 Requirements

  • 8+ years in SRE, DevOps, or systems/infrastructure engineering
  • Deep expertise in observability tooling (Datadog)
  • Strong Kubernetes and cloud-native infrastructure experience (AWS EKS)
  • Experience defining and operating SLIs and SLOs in production
  • Proven ability to lead cross-functional reliability initiatives
  • Strong incident management: on-call, post-mortems, blameless culture

🎁 Benefits

  • Flexible schedules and unlimited time off
  • Comprehensive health, dental, and vision insurance for you and your family
  • 401(k) with employer match
  • Paid medical, family, and parental leave
  • Home-office setup allowance for remote employees
  • Snacks and digital subscription perks