Site Reliability Engineer

Added: 16 days ago
Type: Full time

Related skills

Datadog, Docker, Terraform, AWS, Prometheus

📋 Description

  • Drive SRE practice across services: define and iterate SLIs/SLOs and error budgets with service owners, and run reliability reviews
  • Design and maintain end-to-end observability: metrics, logs, traces, dashboards, and alerts
  • Partner with product and engineering to design reliable services; review architectures and rollouts
  • Evolve AWS infrastructure (networking, compute, data stores) using Terraform (IaC)
  • Contribute reliability code, tooling, and health checks

🎯 Requirements

  • 2+ years in SRE/DevOps on production systems
  • Strong SRE practices: SLIs/SLOs, error budgets, toil reduction
  • Proficiency in Python or Node.js/TypeScript
  • Experience with observability tools such as Datadog, Prometheus, Grafana, Honeycomb, or New Relic
  • Production experience with AWS, Terraform (IaC), and Docker/Kubernetes
  • Incident management experience is a plus; strong communication skills

🎁 Benefits

  • Generous equity grant: own part of the company
  • MacBook provided
  • Comprehensive benefits package
  • Flexible PTO and hybrid work schedules
  • Work from home stipend
  • Hybrid hubs in LA, SF, Toronto, and Raleigh with in-office lunch