Added
less than a minute ago
Location
Type
Full time
Salary
Upgrade to Premium to se...
Related skills
azure terraform aws grafana prometheusπ Description
- Hybrid role: 3 days/week in San Jose, CA or remote; reports to Production Engineering.
- Drive automation-first culture with code to cut toil and build self-healing systems.
- Design highly available, scalable infra across AWS, Azure, GCP, and bare-metal.
- Implement observability with Prometheus, Grafana, OpenTelemetry; set SLIs/SLOs.
- Lead Incident Commander duties; develop playbooks; post-incident analyses.
- Partner with Engineering for operability reviews.
π― Requirements
- 10+ years of reliability, scalability, and availability for large-scale production services.
- Deep programming in Python, Go, or C/C++.
- Strong networking, Linux/FreeBSD, and distributed architecture.
- Experience in 24/7 on-call rotation and incident management.
- ITIL framework experience; drive maturity via operability reviews.
π Benefits
- Various health plans
- Time off for vacation and sick time
- Parental leave options
- Retirement options
- Education reimbursement
- In-office perks, and more!
Meet JobCopilot: Your Personal AI Job Hunter
Automatically Apply to Engineering Jobs. Just set your
preferences and Job Copilot will do the rest β finding, filtering, and applying while you focus on what matters.
Help us maintain the quality of jobs posted on Empllo!
Is this position not a remote job?
Let us know!