Related skills
terraform aws python kubernetes go
Description
- Own reliability and operational health of production systems
- Lead the NOC: shift structure, escalation paths, incident standards, readiness, reporting
- Act as senior escalation point and incident commander for high severity events
- Design and improve monitoring, alerting, and tooling for early detection
- Drive root cause analysis and post-incident reviews that lead to concrete follow-up actions
- Build and maintain runbooks, readiness checklists, and service health standards
Requirements
- 7+ years in SRE/infrastructure with production ownership
- Strong AWS production systems experience
- Kubernetes and containerized services experience
- Observability across metrics, logs, tracing, alerting
- Incident response programs, on-call ops, post-incident reviews
- Infrastructure as code with Terraform
Benefits
- Employee Stock Ownership Plan for long-term upside
- Comprehensive health coverage (medical, dental, vision)
- Mental health and wellness support
- Hands-on exposure to key clients in a scaling global tech company
- Continuous learning through real ownership
- Direct collaboration with the Founders and tech leadership