Senior Site Reliability Engineer

Added
28 days ago
Type
Full time
Salary
Salary not provided

Related skills

datadog terraform github actions aws prometheus

📋 Description

  • Design scalable, fault-tolerant, self-healing multi-region AWS systems.
  • Define and track SLOs/SLIs; drive decisions and budgets.
  • Run blameless post-incident reviews; implement preventive measures.
  • Build automation to eliminate manual work; create internal tools.
  • Move from basic monitoring to deep observability with actionable insights.
  • Refine on-call to reduce alert fatigue; improve MTTR.

🎯 Requirements

  • Bachelor’s degree in Computer Engineering or related field.
  • 5+ years SRE experience.
  • 3+ years AWS with container orchestration.
  • 2+ years Kubernetes experience.
  • Observability with Prometheus, Datadog, OpenTelemetry.
  • Terraform IaC; chaos engineering; incident management.

🎁 Benefits

  • Remote-first work arrangement with flexible hours.
  • Health Insurance coverage.
  • Work-from-anywhere stipend.
  • Wellness and learning credits.
  • Annual all-expenses-paid company retreat.
  • On-call rotation with paid standby and rest periods.
Share job

Meet JobCopilot: Your Personal AI Job Hunter

Automatically Apply to Engineering Jobs. Just set your preferences and Job Copilot will do the rest — finding, filtering, and applying while you focus on what matters.

Related Engineering Jobs

See more Engineering jobs →