Added 13 days ago
Type: Full time
Related skills: Datadog, Node.js, Terraform, AWS, Prometheus

Description
- Drive and refine SRE practices across services: SLIs/SLOs, error budgets, reliability reviews.
- Design end-to-end observability: metrics, logs, traces, dashboards, alerts.
- Partner with product/engineering to design reliable services and rollout strategies.
- Evolve and operate AWS infrastructure using Terraform IaC.
- Contribute code and tooling for reliability libraries and health checks.
- Define SLIs/SLOs with owners to guide reliability and release decisions.
Requirements
- 5+ years in SRE/DevOps/Infra with production systems.
- Experience leading multi-sprint, multi-engineer reliability and infrastructure initiatives that delivered impact.
- Expertise in SRE: SLIs/SLOs, error budgets, post-incident reviews.
- Experience writing production-grade software in Python or Node.js/TypeScript.
- Observability experience with Datadog, Prometheus, Grafana, Honeycomb, or New Relic.
- AWS production experience; Terraform IaC; Docker/ECS/EKS/Kubernetes.
Benefits
- Generous equity grant.
- MacBook computer provided.
- Comprehensive benefits package.
- Flexible PTO and hybrid work schedules.
- Work from home stipend.
- Hubs in Los Angeles, San Francisco, Toronto, and Raleigh, with hybrid in-office days.