Staff Software Engineer, Robinhood Command Center

Added
17 days ago
Type
Full time

Related skills

Grafana, Prometheus, observability, OpenTelemetry

📋 Description

  • Serve as a senior technical leader for reliability and observability strategy.
  • Partner with engineers to raise operational excellence and incident response.
  • Lead incident mitigation by coordinating owners, rollbacks, and traffic shifts.
  • Develop and maintain incident management processes to minimize downtime.
  • Own dashboards and alerts tied to critical user journeys (CUJs), availability, and metrics.
  • Drive post-incident governance and postmortem standards.

🎯 Requirements

  • 8+ years of software engineering, including production systems.
  • 4+ years in reliability engineering, infra, or production ops.
  • Hands-on experience in incident leadership roles (IMOC, on-call).
  • Strong communication during high-severity incidents.
  • Deep knowledge of reliability, observability, and fault-tolerant design.
  • Familiarity with observability stacks (OpenTelemetry, Prometheus, Grafana).

🎁 Benefits

  • Challenging, high-impact work to grow your career.
  • Performance-based pay with equity, bonuses, and 401(k) matching.
  • 100% paid health insurance for employees; 90% for dependents.
  • Lifestyle wallet for wellness, learning, and more.
  • Life & disability insurance, fertility and mental health benefits.
  • Generous time off: holidays, PTO, sick time, parental leave, and more.