Site Reliability Engineer

Added
16 days ago
Type
Full time

Related skills

datadog docker terraform aws grafana

📋 Description

  • Drive SRE practices across services: SLIs/SLOs, error budgets, reliability reviews
  • Design end-to-end observability with metrics, logs, traces, dashboards, alerts
  • Collaborate with product/engineering to build reliable services and rollout strategies
  • Evolve AWS infrastructure with Terraform IaC and automation
  • Contribute code to reliability libraries, tooling, and health checks
  • Participate in incident response and post-incident reviews

🎯 Requirements

  • 2+ years in Site Reliability Engineering, DevOps, or infrastructure roles on production systems
  • Strong SRE practices: SLIs/SLOs, error budgets, toil reduction
  • Production coding experience in Python or Node.js/TypeScript
  • Experience with AWS, Terraform-based IaC, and containers (Docker, ECS, EKS, Kubernetes)
  • Observability tooling: Datadog, Prometheus, Grafana, Honeycomb, or New Relic
  • Incident management experience, including post-incident follow-ups, is a strong plus

🎁 Benefits

  • Generous equity grant; own part of the company
  • MacBook provided
  • Comprehensive benefits package
  • Flexible PTO and hybrid work schedules
  • Work-from-home stipend
  • Hubs in LA, SF, Toronto, and Raleigh with hybrid days and lunch