Principal Site Reliability Engineer

Added: 23 days ago
Type: Full time
Salary: Not provided

Related skills

Azure, Terraform, AWS, Python, Kubernetes

📋 Description

  • End-to-end reliability ownership for distributed systems, measured against SLOs.
  • Incident response and operational excellence for high-severity incidents.
  • Observability and operational insights to surface health risks.
  • Automation, tooling and engineering rigor to reduce toil.
  • Infrastructure, cloud, and IaC work using Terraform, Pulumi, and Kubernetes.
  • Technical leadership and org impact; mentor engineers and set standards.

🎯 Requirements

  • 7+ years in SRE, platform, cloud or infra roles.
  • Ability to define and operationalize SLIs, SLOs, and error budgets.
  • Systems thinking and distributed systems fundamentals.
  • Experience with Python or Go for automation and tooling.
  • Experience with cloud providers (Azure, AWS, GCP) and Kubernetes.
  • Terraform or Pulumi IaC experience with production deployments.

🎁 Benefits

  • Hybrid and onsite options; flexible work arrangements.
  • Rolling applications with no fixed deadline.
  • Inclusive, diverse workplace with equal opportunities.
  • Reasonable accommodations on request.
  • Compliance with privacy rights and policies.