Technical Program Manager, Reliability Engineering

Added: 5 days ago
Type: Full time

Related skills

Datadog, SRE, PagerDuty, monitoring, infrastructure

📋 Description

  • Own the Safeguards Eng ops review and cadence
  • Drive incident tracking and post-mortems across teams
  • Establish and maintain SLOs with partner teams
  • Maintain runbooks and incident ownership clarity
  • Drive platform migrations and infrastructure projects
  • Coordinate evals platform improvements

🎯 Requirements

  • Solid technical program management experience in operational or infrastructure settings
  • Enough understanding of production ML systems to triage incidents
  • Strong ability to close loops and follow up on action items
  • Cross-team collaboration and influence without direct authority
  • Thrive when balancing keeping the lights on with new platform work
  • Interest in AI safety and reliable ML systems

🎁 Benefits

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • Collaborative SF office space for teamwork

🛃 Visa sponsorship
