Fleet Reliability Technical Program Manager

Related skills

Grafana, Prometheus, observability, reliability, incident management

📋 Description

  • Own end-to-end fleet reliability from Day 0 provisioning to Day 2 steady-state.
  • Lead cross-functional programs to improve fleet delivery, readiness, and stability.
  • Define program plans, milestones, dependencies, risks, and success criteria.
  • Proactively manage cross-team dependencies and unblock execution.
  • Establish and own fleet reliability metrics and dashboards.
  • Use data and post-incident learnings to prioritize reliability investments.

🎯 Requirements

  • Bachelor's degree in Computer Engineering or related field.
  • 10+ years in technical program management for large-scale compute infrastructure.
  • Experience with observability/monitoring/telemetry (Prometheus, Grafana, OpenTelemetry).
  • Experience leading cross-functional programs for reliability and availability.
  • Strong technical knowledge across compute, storage, networking, or SRE.
  • Data-driven approach, using metrics to guide prioritization and decisions.
  • Excellent communication and stakeholder management, including executive reporting.

🎁 Benefits

  • Base salary: $188,000-$275,000 USD.
  • Medical, dental, and vision insurance (100% paid).
  • 401(k) with employer match.
  • Flexible PTO and paid parental leave.
  • Tuition reimbursement and ESPP eligibility.
  • Mental wellness benefits and family-forming support.