Staff Site Reliability Engineer

Added
12 days ago
Type
Full time

Related skills

Datadog, Terraform, AWS, Grafana, Prometheus

📋 Description

  • Own end-to-end reliability domains: strategy, roadmap, execution
  • Drive SRE practices: SLIs/SLOs, error budgets, reviews
  • Lead multi-sprint, multi-engineer reliability initiatives
  • Design and maintain end-to-end observability: metrics, logs, traces, dashboards
  • Serve as the subject-matter expert (SME) in at least one reliability area to guide decisions
  • Partner with product and engineering to design reliable services

🎯 Requirements

  • 8+ years operating complex SaaS systems and reliability initiatives
  • Led multi-sprint, multi-engineer reliability initiatives with impact
  • Led an org-wide reliability or performance initiative end-to-end
  • Strong software engineering: production-quality code in Python or Node.js/TypeScript
  • Regularly use LLMs and AI-assisted tooling to accelerate delivery
  • Deep expertise in at least one reliability domain (observability, incident management, performance, data/search)

🎁 Benefits

  • Generous equity grant; own a part of the company
  • MacBook provided
  • Comprehensive benefits package
  • Flexible PTO and hybrid work schedules
  • Work from home stipend
  • Hubs in LA, SF, Toronto, Raleigh with hybrid days