Production Engineer – Team Lead

Type
Full time

Related skills

Terraform, AWS, Grafana, Prometheus, Python

📋 Description

  • Act as Incident Commander during incidents to ensure timely resolution
  • Coordinate cross-functional teams and maintain clear incident comms
  • Lead root-cause analysis and implement long-term fixes
  • Own post-incident reviews (PIRs) and ensure they produce actionable outcomes
  • Develop and maintain incident playbooks and escalation processes to keep teams ready
  • Define and track SLOs, KPIs, and reliability goals

🎯 Requirements

  • 4+ years in production engineering, cloud ops, SRE, or incident response
  • Deep knowledge of Kubernetes-based infra, AWS, and GCP
  • Familiarity with ITIL and SRE best practices
  • Proficiency in Prometheus and Grafana for monitoring and alerts
  • Hands-on automation with Python, Bash, Terraform
  • Strong decision-making under pressure, clear communication, and mentoring ability

🎁 Benefits

  • Medical, dental, and vision insurance, 100% paid by CoreWeave
  • Company-paid life insurance
  • Health Savings Account and Flexible Spending Account
  • 401(k) with generous employer match
  • Tuition Reimbursement and ESPP eligibility
  • Paid parental leave and childcare support