Overview
Dropbox is seeking a Staff Site Reliability Engineer focused on incident and disaster readiness to join our team remotely from select locations in Canada. In this senior individual contributor (IC5) role, you will lead incident response, disaster recovery planning, and reliability improvements across Dropbox infrastructure, and collaborate with engineering teams to design resilient systems and robust incident management processes.
Responsibilities
- Lead and participate in incident response, blameless postmortems, capacity planning, and disaster recovery exercises.
- Develop and maintain runbooks, on-call rotations, and playbooks to minimize outages and speed restoration of services.
- Design, implement, and continuously improve monitoring, alerting, and observability to drive reliability and meet SLOs, as measured by SLIs.
- Collaborate with product and engineering teams to define reliability targets and measure progress against error budgets and goals.
- Mentor and coach junior engineers; champion reliability best practices across teams.
- Contribute to disaster drills and resilience initiatives across Dropbox infrastructure.
Requirements
- Extensive experience in Site Reliability Engineering, incident management, and disaster recovery.
- Strong background in distributed systems, cloud environments, monitoring stacks, and on-call practices.
- Experience with Kubernetes and modern infrastructure tooling; familiarity with infrastructure as code (IaC) and observability platforms.
- Excellent collaboration, communication, and leadership skills.
- Bachelor's degree in Computer Science or related field, or equivalent practical experience.
About Dropbox
Dropbox is a leading global cloud storage and collaboration platform enabling teams to work more effectively together.