Site Reliability Engineer - Infrastructure

Added: less than a minute ago
Type: Full time
Salary: Not provided

Related skills

Datadog, Node.js, Docker, Terraform, AWS

📋 Description

  • Design scalable, highly available blueprints for a global automation platform.
  • Define SLIs, SLOs, and error budgets to balance velocity and reliability.
  • Build and maintain observability pipelines with metrics, logs, and traces.
  • Participate in incident resolution and blameless postmortems to improve reliability.
  • Cultivate a culture of learning from outages to harden the platform.
  • Develop and automate CI/CD pipelines with canary or blue/green releases.

🎯 Requirements

  • 6+ years of experience in software engineering or SRE, including technical leadership
  • Thorough understanding of how to apply SLI and SLO principles to improve reliability
  • Deep proficiency in Linux/Unix-based infrastructure at scale
  • Extensive experience with cloud providers, with a strong preference for AWS
  • Expert-level experience running Kubernetes in production
  • Infrastructure as Code using Terraform

🎁 Benefits

  • RSU grant and annual bonus
  • Multinational team with 42 nationalities
  • Learning & Development plan with 2 learning days per year
  • Laptop (MacBook) and 34" curved monitor provided
  • 25 vacation days, 4 sick days
  • Remote working allowance