Site Reliability Engineer

Related skills

Datadog, Docker, Terraform, AWS, Prometheus

📋 Description

  • Drive SRE practices: SLIs/SLOs, error budgets, reviews
  • Build observability: metrics, logs, traces, dashboards, alerts
  • Partner with product/engineering on reliable services
  • Evolve AWS infra with Terraform; networking/compute
  • Contribute reliability tooling and health checks
  • Define SLIs/SLOs with owners to guide releases

🎯 Requirements

  • 2+ years in SRE/DevOps or infra on production systems
  • Strong observability: metrics, logs, traces; dashboards/alerts
  • AWS prod experience; Terraform IaC; Docker/Kubernetes
  • Production code in Python or TypeScript/Node.js
  • Datadog, Prometheus, Grafana experience (New Relic welcome)
  • Interested in using LLMs; strong communication; hybrid work

🎁 Benefits

  • Generous equity grant; become an owner
  • MacBook computer provided
  • Comprehensive benefits package
  • Flexible PTO and hybrid work schedules
  • Work from home stipend
  • Hybrid hubs in LA, SF, Toronto, Raleigh with in-office days