Senior Site Reliability Engineer

Added: 3 days ago
Type: Full time

Related skills

Terraform, Helm, Python, Kubernetes, Go

📋 Description

  • Architect highly available distributed systems across global data centers, focusing on performance and DR.
  • Define and enforce SLOs/SLIs, manage error budgets, and lead post-mortems.
  • Participate in on-call rotations, escalating complex infrastructure outages.
  • Identify and automate manual operations to reduce toil.
  • Design multi-layer monitoring for on-prem and SaaS tools using Prometheus, Grafana, and ELK.
  • Act as a technical mentor to upskill the team across global regions.

🎯 Requirements

  • 10+ years in high-traffic environments where downtime has a direct financial or operational impact.
  • Advanced experience managing production Kubernetes clusters and apps with Helm and ArgoCD.
  • Proficiency with IaC for cloud and on-prem resources, ideally Terraform.
  • Hands-on experience with Consul, Vault, and HAProxy.
  • Experience managing and troubleshooting large-scale MTAs, particularly Postfix.
  • Proficiency in Go or Python.
  • Good to have: Experience managing Next Generation Firewalls (NGFW), ideally Palo Alto GlobalProtect.
  • Good to have: Experience managing LDAP infrastructure.

🎁 Benefits

  • Diversity, Equity and Inclusion commitments.
  • Equal opportunity employer policy.