Data Center Incident Program Manager

Type
Full time

Related skills

PagerDuty, Jira, ServiceNow, RCA

📋 Description

  • Define incident severity levels, criteria, and escalation thresholds.
  • Establish end-to-end incident response standards and lifecycle stages.
  • Build governance artifacts: runbooks, war room formats, templates.
  • Create notification trees and stakeholder communication templates with escalation paths.
  • Define RACI across Facilities, Hardware Ops, Network, Security, vendors.
  • Set and manage SLAs/OLAs for acknowledgment, escalation, and reporting.

🎯 Requirements

  • 7+ years in mission-critical infrastructure or data center operations.
  • Direct experience leading major incidents (P1/P0) and war rooms.
  • Familiarity with facilities systems, hardware ops, or networks.
  • Experience leading executive updates and post-incident reviews.
  • Root cause analysis and corrective action tracking.
  • Calm and decisive under pressure; hyperscale AI compute experience preferred.