Added
15 days ago
Type
Full time
Related skills
datadog terraform aws grafana prometheus
Description
- Drive SRE practices across services (SLIs/SLOs, error budgets, reliability reviews)
- Design and maintain observability (metrics, logs, traces, dashboards, alerts)
- Partner with product/engineering to design reliable services; review architectures, failures, rollout strategies
- Evolve AWS infrastructure with Terraform IaC
- Contribute reliability code to libraries, tooling, and health checks
- Define and iterate SLIs/SLOs with owners to guide releases
Requirements
- 5+ years in SRE/DevOps or production infra
- Proven record of leading multi-sprint, multi-engineer projects with impact
- Strong SRE practices: SLOs, toil reduction, safe deployments, post-incident reviews
- Production code in Python or Node.js/TypeScript
- Interest in AI-assisted tooling, with the ability to validate and improve its outputs
- Observability skills with Datadog/Prometheus/Grafana/Honeycomb/New Relic
Benefits
- Generous equity grant, become an owner
- MacBook computer provided
- Comprehensive benefits package
- Flexible PTO and hybrid work schedules
- Work from home stipend
- Hubs in Los Angeles, San Francisco, Toronto, and Raleigh with hybrid days