Senior Site Reliability Engineer

Related skills

Datadog, Docker, Terraform, AWS, Python

📋 Description

  • Drive and refine modern SRE practices across services
  • Design observability across metrics, logs, traces, and dashboards
  • Partner with product/engineering to design reliable services
  • Evolve AWS infrastructure via Terraform IaC
  • Contribute code to reliability tooling and health checks
  • Participate in incident response and post-incident reviews

🎯 Requirements

  • 5+ years in SRE, DevOps, or production engineering
  • Experience leading multi-sprint reliability/infrastructure initiatives
  • Hands-on with SLIs/SLOs, error budgets, and toil reduction
  • Proficient in Python or TypeScript/Node.js
  • Observability stack experience: Datadog, Prometheus, Grafana
  • AWS production experience; Terraform IaC; Docker/Kubernetes

🎁 Benefits

  • Generous equity grant
  • MacBook provided
  • Comprehensive benefits package
  • Flexible PTO and hybrid work schedules
  • Work from home stipend
  • Hubs in Los Angeles, San Francisco, Toronto, and Raleigh with hybrid schedules