Senior Production Engineer (Reliability)

Related skills

Python, Kubernetes, Go, observability, tracing

📋 Description

  • Own critical systems and frameworks; drive architecture and evolution.
  • Lead end-to-end projects improving availability, scalability, and automation.
  • Build observability, alerting, automated remediation, and resilience.
  • Participate in incident response; drive root-cause analyses and fixes.
  • Improve runbooks, sources of truth, deployment workflows, and tooling.
  • Eliminate single points of failure; reduce toil via automation and refactors.
  • Ship production code in Python, Go, or similar; participate in on-call.
  • Maintain and mature long-term projects; ensure reliability and operability.
  • Collaborate with platform teams to integrate features with reliability tooling.

🎯 Requirements

  • 7+ years building and operating distributed systems or cloud platforms.
  • Ability to debug complex production issues end-to-end across services and infrastructure.
  • Strong Python/Go skills; experience shipping and operating production services and tools.
  • Deep knowledge of cloud-native tech and distributed patterns (Kubernetes).
  • Experience with observability stacks: metrics, tracing, logs, SLOs/SLIs.
  • Proven track record delivering reliability improvements through engineering.

🎁 Benefits

  • Medical, dental, and vision insurance – 100% paid.
  • Company-paid Life Insurance.
  • Health Savings Account (HSA).
  • 401(k) with generous employer match.
  • Flexible PTO.
  • Catered lunch daily in office.