Related skills
Terraform, AWS, Grafana, Python, Kubernetes
Description
- Design, implement, maintain, and monitor reliable production systems at scale.
- Lead incident response, mitigate production issues, and conduct postmortem analysis.
- Proactively monitor performance, analyze failures, identify bottlenecks, and propose solutions.
- Create and support observability/monitoring tools and vendor integrations.
- Promote cross-functional collaboration to improve reliability, scalability, resilience, and security.
- Train and mentor other engineers.
Requirements
- 5+ years of reliability-focused engineering in fast-paced, enterprise environments.
- Cloud: AWS, Azure, and/or GCP.
- Infrastructure as code: Terraform or Crossplane.
- Developing applications in Python, Ruby, or Go.
- Deploying and operating Kubernetes at scale.
- Experience with monitoring tools like Grafana, New Relic, or Datadog.
Benefits
- Company-subsidized medical, dental, and vision plans.
- 401(k) plan with company match.
- Annual bonus.
- Flexible PTO (2 weeks strongly encouraged).
- 16-week paid parental leave and disability benefits.
- Workplace flexibility and modern work schedules.