Related skills
distributed systems, incident response, monitoring, observability, reliability engineering
Description
- Define and evolve reliability strategy for AI-enabled development.
- Set multi-year goals and roadmaps across observability, debugging, incident management, service health, and readiness.
- Lead cross-team initiatives to reduce reliability risk as velocity and incidents grow.
- Collaborate to improve monitoring, alerting, debugging, SLOs/SLAs, and incident response.
- Identify AI-enabled reliability risks and design scalable guardrails.
- Provide technical leadership and mentorship to raise reliability and quality.
- Align with senior stakeholders on reliability priorities and risks.
- Many Dropbox teams run on-call rotations; engineers participate in the rotation.
Requirements
- BS degree in Computer Science or related field, or equivalent experience.
- 12+ years of experience in software engineering, SRE, or related roles.
- Define and deliver multi-year reliability strategies with measurable impact.
- Deep experience with distributed systems, observability, incident response, SLOs/SLAs, debugging, and reliability risk management.
- Diagnose complex problems, debug production systems, automate operational workflows, and design resilient software components.
- Influence engineering roadmaps across multiple teams and make technical decisions for the broader organization.
- Strong communication and collaboration to align stakeholders and drive execution.
Benefits
- The Engineering Career Framework is publicly viewable.
- Flexible, remote-friendly work environment.
- On-call rotations across teams.