Operations Manager, Fleet Reliability

Added: 22 days ago
Type: Full time

Related skills

SRE, change management, observability, reliability, incident management

📋 Description

  • Build and lead a 24/7 team of reliability and observability engineers.
  • Document provisioning, validation, and troubleshooting of server nodes.
  • Drive automation and event-driven remediation to improve resilience.
  • Provide 24/7 engineering support for high-criticality node delivery.
  • Enhance onboarding, documentation, enablement, and performance management.
  • Shape culture and communications to enable CoreWeave across teams.

🎯 Requirements

  • 7+ years in software or infra engineering with 2+ years in leadership.
  • Strong SRE fundamentals, incident management, observability, and change management.
  • Champions automation and cross-team tooling to improve reliability.
  • Enjoys helping people grow; extends influence to partners and leadership.
  • Experience leading reliability programs for high-scale fleets.
  • Strong communication and leadership skills.

🎁 Benefits

  • Medical, dental, and vision insurance—100% paid by CoreWeave.
  • Company-paid Life Insurance.
  • 401(k) with generous employer match.
  • Flexible PTO and Paid Parental Leave.
  • Tuition Reimbursement and ESPP.
  • Catered lunch and a casual work environment.