Related skills
Datadog, SRE, PagerDuty, monitoring, infrastructure
Description
- Own the Safeguards Engineering ops review and its cadence
- Drive incident tracking and post-mortems across teams
- Establish and maintain SLOs with partner teams
- Maintain runbooks and incident ownership clarity
- Drive platform migrations and infra projects
- Coordinate evals platform improvements
Requirements
- Solid technical program management experience in operational or infrastructure settings
- Understand production ML systems well enough to triage incidents
- Strong ability to close loops and follow up on action items
- Cross-team collaboration and influence without direct authority
- Thrive on balancing keeping the lights on with new platform work
- Interest in AI safety and reliable ML systems
Benefits
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours
- Collaborative SF office space for teamwork
Visa sponsorship