Principal Site Reliability Engineer

Added: 17 days ago
Type: Full time

Related skills

GitOps, Terraform, Python, Kubernetes, Argo CD

📋 Description

  • Act as technical leader for reliability across domains; set direction and standards.
  • Define SLOs/SLIs, error budgets, and KPIs aligned to customer journeys.
  • Own incident response maturity; lead incidents and improve RCAs and remediation.
  • Architect automation to reduce toil using Python and Argo Workflows.
  • Advance GitOps with Argo CD, including promotions, canaries, and guardrails.
  • Scale infrastructure management with Crossplane and Terraform; reusable patterns and controls.

🎯 Requirements

  • 8+ years in SRE/platform roles operating production services at scale.
  • Principal-level impact; leading cross-team reliability improvements.
  • Kubernetes operations and troubleshooting; safe rollout/rollback patterns and guardrails.
  • GitOps with Argo CD; workflows with Argo Workflows.
  • Crossplane and Terraform; reusable platform patterns and controls.
  • Incident management and on-call leadership; measurable improvements.

🎁 Benefits

  • Be part of a mission-driven company transforming healthcare.
  • Flexible, remote-friendly company with personality and heart.
  • Employee-driven programs for personal and professional development.
  • Join the Arcadian Community and grow within the company.