Related skills
datadog terraform aws s3 eks
Description
- Design and implement resilient infrastructure for high availability at scale
- Build tools for deployment, monitoring, and recovery of systems
- Drive incident response and reduce downtime to improve MTTR
- Partner with engineering to bake reliability, resilience and observability into services
- Automate infrastructure workflows using IaC and cloud-native tools
- Guide engineers in reliability practices to raise the engineering bar
Requirements
- Strong experience operating distributed systems in production on AWS (EKS, RDS, Route53, S3)
- Strong programming and automation skills using Go or Python
- Proficiency with infrastructure as code (Terraform or Pulumi)
- A passion for observability, with hands-on experience in metrics, logging, and tracing using Datadog
- Solid cross-functional communication skills for working with product, platform, and security teams
- An operational mindset that treats reliability and resilience as core product requirements
Benefits
- Full medical, dental, and vision insurance + OneMedical membership
- Healthcare and Dependent Care FSA
- 401(k) with company match
- Flexible PTO
- Wellbeing + Learning & Growth reimbursements
- Paid parental leave + Fertility benefits
Help us maintain the quality of jobs posted on Empllo!