Staff Site Reliability Engineer

Added
2 hours ago
Type
Full time

Related skills

Terraform, distributed systems, observability, CDK, SLOs

📋 Description

  • Participate in high-impact incident response with calm decision-making
  • Define and evolve org-wide incident practices and reliability tooling
  • Architect observability platforms for actionable insights on health and paths
  • Lead reliability practices, including alerting hygiene and SLO design
  • Guide teams in building resilient, fault-tolerant services
  • Mentor engineers in operational rigor and reliability principles

🎯 Requirements

  • 8+ years operating and scaling production infra in cloud-native environments
  • Deep expertise in incident response, debugging distributed systems, and reliability improvements
  • Strong knowledge of observability stacks (metrics, logs, traces), alerting, and SLO design
  • Experience implementing fault isolation, graceful degradation, and chaos engineering
  • Proficiency with IaC and config management (Terraform, CDK, etc.)
  • Proven ability to influence teams through standards, tooling, and culture

🎁 Benefits

  • Flexible, hybrid work environment
  • Unlimited Vacation
  • 100% paid employee health benefit options (including medical, dental, and vision)
  • Commuter Benefits
  • 401(k) with employer-funded match
  • Corporate wellness program with Wellhub