Location
Type
Full time
Related skills
datadog terraform aws grafana python
Description
- Design automated reliability and self-healing systems to protect production.
- Own and improve incident management tooling and on-call health.
- Develop observability infrastructure: monitoring, alerts, SLOs, latency visibility.
- Contribute to AI-driven tooling for autonomous remediation.
- Drive incident prevention by eliminating operational toil.
- Partner with product engineering teams to diagnose reliability gaps.
Requirements
- 8+ years designing and building software in teams.
- Bachelor's degree in CS/Engineering or equivalent.
- 3+ years in infrastructure and/or platform engineering.
- Expertise in observability, reliability, and data analysis.
- Experience with Datadog, New Relic, or Grafana.
- Familiarity with AWS or GCP and IaC (Terraform).
Benefits
- Flexible, employee-led remote model with optional in-person offices.
- Professional development stipend and growth opportunities.
- Comprehensive health and parental leave plans.
- Equity and performance-based rewards in a high-growth company.