Staff Reliability Engineer - Robinhood Command Center

Added
11 days ago
Type
Full time

Related skills

grafana prometheus observability opentelemetry

📋 Description

  • Lead the reliability and observability strategy for Robinhood infrastructure.
  • Partner with engineers to raise operational excellence.
  • Lead incident mitigation, coordinating service owners and rollbacks.
  • Develop and maintain incident management processes for timely resolution.
  • Own incident discovery with dashboards and alerts tied to user journeys.
  • Drive post-incident governance and durable reliability improvements.

🎯 Requirements

  • 8+ years of software engineering experience, including work on production systems.
  • 4+ years in reliability engineering, infrastructure, distributed systems, or production operations.
  • Hands-on experience in incident leadership roles (e.g., IMOC, on-call).
  • Strong communication during high-severity incidents.
  • Deep knowledge of reliability, observability, and fault-tolerant design.
  • Familiarity with OpenTelemetry, Prometheus, and Grafana.

🎁 Benefits

  • 100% paid health insurance for employees.
  • 90% coverage for dependents.
  • Lifestyle wallet for wellness and learning.
  • Employer-paid life and disability insurance.
  • Fertility benefits and mental health support.
  • Paid time off, holidays, sick time, and parental leave.