Added
14 days ago
Type
Full time
Related skills
datadog node.js terraform aws prometheus
Description
- Own end-to-end reliability domains: strategy, roadmap, execution.
- Drive SRE practices: SLIs/SLOs, error budgets, reviews.
- Lead multi-engineer reliability initiatives across teams.
- Design and maintain observability: metrics, logs, traces, alerts.
- Partner with product/engineering on reliable services and capacity.
- Contribute tooling and code; use AI/LLMs to accelerate delivery.
Requirements
- 8+ years operating complex SaaS and production systems.
- Led multi-engineer, multi-sprint reliability/perf initiatives.
- Led at least one org-wide reliability or performance initiative.
- Deep expertise in observability, incident management, or data/search.
- Strong software engineering: Python or TS/Node.js; AI tooling.
- AWS production experience with Terraform IaC and container orchestration (ECS, or Kubernetes on EKS).
Benefits
- Generous equity grant; own part of the company.
- MacBook provided.
- Comprehensive benefits package.
- Flexible PTO and hybrid work schedules.
- Work-from-home stipend.
- Hubs in LA, SF, Toronto, Raleigh with hybrid schedules and lunch.