Senior Site Reliability Engineer

Added: 1 hour ago
Type: Full time

Related skills

Datadog, Terraform, AWS, S3, EKS

📋 Description

  • Design and implement resilient infrastructure for high availability at scale
  • Build tools for deployment, monitoring, and recovery of systems
  • Drive incident response to reduce downtime and improve MTTR
  • Partner with engineering to bake reliability, resilience and observability into services
  • Automate infrastructure workflows using IaC and cloud-native tools
  • Guide engineers in reliability practices to raise the engineering bar

🎯 Requirements

  • Strong experience operating distributed systems in production on AWS (EKS, RDS, Route53, S3)
  • Strong programming and automation skills using Go or Python
  • Proficiency with infrastructure as code (Terraform or Pulumi)
  • A passion for observability, with hands-on experience in metrics, logging, and tracing using Datadog
  • Solid cross-functional communication with product, platform and security teams
  • An operational mindset that treats reliability and resilience as core product requirements

🎁 Benefits

  • Full medical, dental, and vision insurance + OneMedical membership
  • Healthcare and Dependent Care FSA
  • 401(k) with company match
  • Flexible PTO
  • Wellbeing + Learning & Growth reimbursements
  • Paid parental leave + Fertility benefits