Type: Full time
Related skills: terraform, aws, grafana, prometheus, python
📋 Description
- Act as Incident Commander during incidents to ensure timely resolution
- Coordinate cross-functional teams and maintain clear incident communications
- Lead root-cause analysis and implement long-term fixes
- Own post-incident reviews (PIRs) and ensure they produce actionable outcomes
- Develop incident playbooks and escalation processes to improve readiness
- Define and track SLOs, KPIs, and reliability goals
🎯 Requirements
- 4+ years in production engineering, cloud ops, SRE, or incident response
- Deep knowledge of Kubernetes-based infrastructure, AWS, and GCP
- Familiarity with ITIL and SRE best practices
- Proficiency in Prometheus and Grafana for monitoring and alerting
- Hands-on automation experience with Python, Bash, and Terraform
- Strong decision-making under pressure, clear communication, and mentoring skills
🎁 Benefits
- Medical, dental, and vision insurance - 100% paid by CoreWeave
- Company-paid life insurance
- Health Savings Account and Flexible Spending Account
- 401(k) with generous employer match
- Tuition reimbursement and ESPP eligibility
- Paid parental leave and childcare support